Eric,
The fact that I ended up having write access to the CVS does not make my opinion 
special. When it comes to deciding about the direction of HttpClient evolution, I have 
only one (1.0) vote to command, not 1.5 and not even 1.01.

I see your point. I agree that at the end of the days, dealing with a problem 
immediately or dealing with it a moment later does not change much. The problem still 
needs to be dealt with. However, I see one advantage in dropping connection when the 
connection is known to be bad: it is damn sure. When the connection stays open, 
readStatusLine will be making the decision as to where a new response starts on a 
fairly flaky ground: presence of a sequence of characters (HTTP) at the beginning of 
the line. It may guess it right, or it may not. If the connection is dropped and 
reopened, this problem can be avoided altogether.

Anyways, let Christian come up with a patch. If there's a consensus that the patch can 
cause more harm than good, it can still be turned down.

Oleg

-----Original Message-----
From: Eric Johnson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 16:39
To: Commons HttpClient Project
Subject: Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while
trying to retrieve status line


Oleg,

No apologies needed.  As far as I'm concerned, your opinions with 
respect to HttpClient carry far more weight - I seem to contribute a 
patch about once every three months (and Mike just did a much better 
rewrite of my latest patch), but otherwise just stir up trouble on the 
mailing list!

One more wrench.  It strikes me that if HttpClient does support HTTP 
pipelining in the future, then an "available" check will mislead you.  
Of course there might be data available, as that would be precisely the 
point.  Only when looking at the data can you determine whether or not 
it is valid to have that data already in the queue....  Granted, it is 
perhaps early to worry about pipelining!

It seems like we have a spectrum of possibilities to deal with here:

Malicious servers -- bad servers with immediate extra data -- bad 
servers with delayed extra data -- other bad servers (bad cookies et. 
al.) -- mostly compliant servers -- 100% compliant servers (yeah, right!).

It seems like we need to stop the malicious servers from derailing 
HttpClient.  As for the two types of bad servers, I'm not sure I see the 
point of catching the "bad" behavior on some requests immediately, when 
we're guaranteed to catch the problem on a subsequent request 100% of 
the time.  I'm guessing that the minor performance hit of reissuing a 
request when operating in strict mode will be swamped by the fact that 
in either case you have to open a new connection?

I guess that's part of why I ask whether there are specific "badly" 
behaved servers for which enforcing a "strict" mode with an "available" 
check will actually prove useful.  It could be that for every "bad" 
server out there, they almost all fall into the "delayed" extra data 
category, rather than the "immediate" category.  I think my concern 
comes down to this: How much benefit does HttpClient get out of 
enforcing the rule in two places, one with the "available" check, and 
one with the scan at the beginning of reading the next response? Is the 
answer 5% of bad servers, or 75% of bad servers?  And if you set a 
"strict" mode, can HttpClient even operate with those servers anyway?

Absent the real-world data, I'd stick with only trying to deal with the 
bad behavior at the beginning of the next response, rather than at the 
end of the previous one.

-Eric.

Kalnichevski, Oleg wrote:

>Eric,
>Just to clarify things: there's <strong>no way</strong> we touch 2.0 branch for any 
>other reason but fixing a <strong>serious</strong> bug. 
>
>As far as I am concerned the suggested patch is an enhancement, and not a bug. If any 
>of my comments were interpreted otherwise, I offer my apologies. It was not meant 
>that way. 
>
>Anyways, if I understand Christian right, the idea is to drop those connections that 
>are known for sure to be screwy, instead of reusing them and getting into a trouble 
>later when processing subsequent responses. The logic in readStatusLine will be 
>enhanced to optionally terminate the infinite loop. That is it.
>
>I hope that addresses your concerns somewhat.
>
>Oleg
>
>-----Original Message-----
>From: Eric Johnson [mailto:[EMAIL PROTECTED]
>Sent: Tuesday, November 11, 2003 15:35
>To: Commons HttpClient Project
>Subject: Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while
>trying to retrieve status line
>
>
>Christian Kohlschütter wrote:
>
>  
>
>>I perfectly agree - I do not see a bullet-proof solution either.
>>
>>I should correct my assumption:
>>
>>"Can we assume that reusing the HTTP connection is unreliable/should be 
>>avoided if there are more bytes *INSTANTLY* available than specified with 
>>Content-Length"
>>
>>"Instantly" means without waiting/blocking, so at least for this situation, a 
>>simple workaround would be feasible.
>>
>>I think that the currently used SocketInputStream's available() method _does_ 
>>return values > 0.
>> 
>>
>>    
>>
>Unfortunately, I think that depends.  I seem to recall we had 
>difficulties with this function in the past, particularly related to 
>different JVM versions, and also with different implementations of 
>secure sockets.  Granted, some of those implementations were/are buggy, 
>but we have to live with them, I think.  Before we commit such a change 
>to the 2.0 release branch, we'd have to run it through tests across 
>numerous JVMs on numerous platforms with numerous JCE libraries.  We 
>also run the risk that the "available" function could misbehave not only 
>by giving an incorrect response, but also by blocking for a short period 
>of time (1ms?), which would be disastrous for performance.
>
>I think the "instantly" available criteria is misleading, too.  There's 
>absolutely no reason to prevent you from hitting a pathological case 
>where the packet boundary splits right where the extra data is sent, 
>thus leading the "instantly" available check to return false, even 
>though the data would be read on the subsequent response.  In fact, such 
>behavior could be entirely dependent on the misbehaving server.  The 
>case that I've encountered stemmed from a server that tossed in an extra 
>CRLF after a previous response.  As I recall, the extra CRLF were, not 
>surprisingly, written to the socket as a separate chunk, and therefore 
>stood a good a chance as any of being received by the client as part of 
>a separate packet, and therefore would not be "instantly" available anyway.
>
>In the end, I've got a few concerns:
>* I'm not sure I see the point of trying to catch the problem at the end 
>of the previous response, which would seem to be the point of the 
>"available" check, rather than at the beginning of reading the next 
>response.
>* Are there particular servers that demonstrate bad behavior that we 
>want to catch with this new option?  Do we have test cases for those 
>particular servers?
>* Has it been tested across a myriad of environments?
>
>Such changes for the 2.1 CVS HEAD are fine by me.  I'm much more 
>concerned about the 2.0 branch, however, and keeping what changes we do 
>there to a minimum.  This change seems to straddle the boundary between 
>a bug-fix and additional functionality, at least from where I sit.  Then 
>again, I've not looked closely at the patch, I've only followed the 
>discussion.
>
>-Eric.
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>  
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to