Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
On Monday 17 November 2003 21:05, Oleg Kalnichevski wrote: I have not found any mentioning of unexpected content in the RFC, so this is another reason why I would be a bit cautious about throwing a protocol exception. It would suffice to spit out a warning, drop the connection and move on. Alright. I can live with that. :-) Christian - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Christian, Feel free to make changes to the patch that you deem necessary. Once you think it is ready, I suggest we once again ask all the interested parties to raise their objections and express concerns. If there's no significant opposition to the final revision of the patch, and it is OKed by at least by two committers, I'll consider it good to be checked in. Oleg -Original Message- From: Christian Kohlschütter [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 18, 2003 11:13 To: Commons HttpClient Project Subject: Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line On Monday 17 November 2003 21:05, Oleg Kalnichevski wrote: I have not found any mentioning of unexpected content in the RFC, so this is another reason why I would be a bit cautious about throwing a protocol exception. It would suffice to spit out a warning, drop the connection and move on. Alright. I can live with that. :-) Christian - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
[EMAIL PROTECTED] wrote: --- Additional Comments From [EMAIL PROTECTED] 2003-11-17 14:38 --- Odi, agreed - whitespace is a wrong term. CRLF is better. CRLF or LF is correct ;-) (see RFC2616, section 19.3). Would you then prefer my first version of the patch, or do you have another idea how to handle this? Sorry I did not look at the patch. I just outlined my idea of how it should be in my opinion. Odi Ps. Please use the list for discussion and only post decisions to Bugzilla. Short lists of bug notes are read quicker when fixing bugs. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Odi, That would be REALLY cool! A simple authenticating proxy (or a proxy that could effectively 'fake' popular authentication schemes) would be a very much appreciated contribution. By the way, have a look at the Christian Kohlschütter's SimpleHttpServer: http://nagoya.apache.org/bugzilla/showattachment.cgi?attach_id=9066 I think that can be a good starting point for a better framework than SimpleHttpconnection. Oleg On Mon, 2003-11-17 at 15:00, [EMAIL PROTECTED] wrote: DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT http://nagoya.apache.org/bugzilla/show_bug.cgi?id=24560. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE. http://nagoya.apache.org/bugzilla/show_bug.cgi?id=24560 HttpClient loops endlessly while trying to retrieve status line --- Additional Comments From [EMAIL PROTECTED] 2003-11-17 14:00 --- Oleg, thank you for reviewing my patch. I think the reviewed version is OK in general (AFAICS from the diff - I haven't applied it yet). Just a few comments (new ideas?) by me: In my opinion, strict mode should be very pedantic about standards compliance. HttpClient should notify the user wherever a problematic, non-standards situation is detected. Of course, trailing whitespace should be silently ignored, but any other characters should be regarded as unexpected (is there a section in RFC 2616 for that? I haven't found it yet). The question is: Is (non-whitespace) garbage following the response (caused by a wrong Content-Length header, for example) unexpected enough? In your version of the patch, there is no chance to get informed about such a situation - and in 'lenient' mode, the detection is disabled completely (did you check the TestBadContentLength testcase? does it pass?). Regarding the ProtocolException/ResponseConsumedWatcher thing, of course, it _is_ a workaround to get that Exception thrown to the caller. However, I would appreciate it if the user _would_ receive that Exception (somehow). I even think it is not such a bad idea to keep that in responseConsumed(), just to inform every HttpClient component that there was an error while reading the response (the interface is not public, anyway). Instead of throwing an Exception, we could also have a boolean without/with errors return value, of course... In short, I would prefer the following behaviour: - For any mode: If garbage is detected, read (up to a certain limit of bytes N) until end of garbage (maximum of N bytes) or until a non-whitespace character is received; N is something 10 (should be user-definable). - For any mode, close the connection (the conncetion is definitely unreliable). - For strict mode, throw a ProtocolException if anything else but whitespace has been received. - (Optionally) introduce an extra pedantic mode (inherits strict mode) and throw a ProtocolException even if N bytes of _whitespace_ garbage have been received. Best regards, Christian - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Christian, see my comments in-line In my opinion, strict mode should be very pedantic about standards compliance. HttpClient should notify the user wherever a problematic, non-standards situation is detected. I do not mind being over-pedantic, but not at the expense of code quality. My personal impression was that throwing a protocol exception from responseConsumed required too much of an ugly plumbing. I believe that a big fat log warning should be enough. A protocol exception thrown from responseConsumed would not make HttpClient more reliable (dirty connection would be dropped, anyway), it would just make it, well, more pedantic. That is it. Of course, trailing whitespace should be silently ignored, but any other characters should be regarded as unexpected (is there a section in RFC 2616 for that? I haven't found it yet). This is what RFC has to say: quote In the interest of robustness, servers SHOULD ignore any empty line(s) received where a Request-Line is expected. In other words, if the server is reading the protocol stream at the beginning of a message and receives a CRLF first, it should ignore the CRLF. /quote http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1 I have not found any mentioning of unexpected content in the RFC, so this is another reason why I would be a bit cautious about throwing a protocol exception. It would suffice to spit out a warning, drop the connection and move on. Folks, any strong options on this issue - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
I have not found any mentioning of unexpected content in the RFC, so this is another reason why I would be a bit cautious about throwing a protocol exception. It would suffice to spit out a warning, drop the connection and move on. Folks, any strong options on this issue I would prefer a warning to an exception. Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
[EMAIL PROTECTED] wrote: Odi has a point here. There is no reliable way of telling where one response body ends and another one starts, if content length information is messed up. That's why the Content-Length header is part of the HTTP 1.1 protocol in the first place. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
[EMAIL PROTECTED] wrote: Agreed. In other words, can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes available than specified with Content-Length? In this case, at least, I would suggest to close the current connection and open a fresh one. Christian This is the pragmatic way of solving the problem. It always works and is very reliable. But it is expensive. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Am Dienstag, 11. November 2003 12:32 schrieb Ortwin Glück: [EMAIL PROTECTED] wrote: Agreed. In other words, can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes available than specified with Content-Length? In this case, at least, I would suggest to close the current connection and open a fresh one. Christian This is the pragmatic way of solving the problem. It always works and is very reliable. But it is expensive. Has this already been implemented or should we do it now?
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Am Dienstag, 11. November 2003 12:32 schrieb Ortwin Glück: [EMAIL PROTECTED] wrote: Agreed. In other words, can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes available than specified with Content-Length? In this case, at least, I would suggest to close the current connection and open a fresh one. Christian This is the pragmatic way of solving the problem. It always works and is very reliable. But it is expensive. btw, what do you mean with expensive? I think an available() check or read()/unread() call pair at the right point would be enough. Christian
RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
In other words, can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes available than specified with Content-Length? Christian, Again, there's a similar problem. How long do you think we should wait on a connection to be really sure that there's no garbage coming out of it? I think this timeout is the performance hit Odi was talking about. Besides, regardless how long you wait there can never be 100% certainty that the connection is completely 'clean'. You could still potentially get some garbage out of persistent connection when reading the status line of the following response. So, we are pretty much back to where we started You are welcome to come up with a better solution than the exiting, but I am afraid there's none that it is truly bullet-proof. Oleg - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Am Dienstag, 11. November 2003 13:01 schrieb Kalnichevski, Oleg: In other words, can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes available than specified with Content-Length? Christian, Again, there's a similar problem. How long do you think we should wait on a connection to be really sure that there's no garbage coming out of it? I think this timeout is the performance hit Odi was talking about. Besides, regardless how long you wait there can never be 100% certainty that the connection is completely 'clean'. You could still potentially get some garbage out of persistent connection when reading the status line of the following response. So, we are pretty much back to where we started You are welcome to come up with a better solution than the exiting, but I am afraid there's none that it is truly bullet-proof. Oleg I perfectly agree - I do not see a bullet-proof solution either. I should correct my assumption: Can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes *INSTANTLY* available than specified with Content-Length Instantly means without waiting/blocking, so at least for this situation, a simple workaround would be feasible. I think that the currently used SocketInputStream's available() method _does_ return values 0. The only additional thing to change is HttpConnection.WrappedInputStream, as it currently lacks an available() method call to the underlying stream. Christian - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Christian Kohlschütter wrote: In this case, at least, I would suggest to close the current connection and open a fresh one. This is the pragmatic way of solving the problem. It always works and is very reliable. But it is expensive. btw, what do you mean with expensive? Expensive: Throwing away the (known bad) connection and opening a new one, is a performance hit of course. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes *INSTANTLY* available than specified with Content-Length Instantly means without waiting/blocking, so at least for this situation, a simple workaround would be feasible. Christian, Fair enough. Feel free to extend your patch to include the check to force-close those connections that report input data available beyond 'content-length' value/terminal chunk. I think that the currently used SocketInputStream's available() method _does_ return values 0. The only additional thing to change is HttpConnection.WrappedInputStream, as it currently lacks an available() method call to the underlying stream. This one is a bug, because it effectively rendered HttpConnection#isResponseAvailable meaningless. I'll provide a fix for it tonight. Thanks for tracking this one down Oleg - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
On Tuesday 11 November 2003 13:44, Ortwin Glück wrote: Expensive: Throwing away the (known bad) connection and opening a new one, is a performance hit of course. Sure. But this happens only if we detected a connection to be bad, and then it is a very rare situation. In strict mode, the detection of bad connection should cause an exception to be thrown, instead of reopening another connection. What do you think? Christian - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
In strict mode, the detection of bad connection should cause an exception to be thrown, instead of reopening another connection. What do you think? Sounds reasonable. Oleg - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Eric, Just to clarify things: there's strongno way/strong we touch 2.0 branch for any other reason but fixing a strongserious/strong bug. As far as I am concerned the suggested patch is an enhancement, and not a bug. If any of my comments were interpreted otherwise, I offer my apologies. It was not meant that way. Anyways, if I understand Christian right, the idea is to drop those connections that are known for sure to be screwy, instead of reusing them and getting into a trouble later when processing subsequent responses. The logic in readStatusLine will be enhanced to optionally terminate the infinite loop. That is it. I hope that addresses your concerns somewhat. Oleg -Original Message- From: Eric Johnson [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 11, 2003 15:35 To: Commons HttpClient Project Subject: Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line Christian Kohlschütter wrote: I perfectly agree - I do not see a bullet-proof solution either. I should correct my assumption: Can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes *INSTANTLY* available than specified with Content-Length Instantly means without waiting/blocking, so at least for this situation, a simple workaround would be feasible. I think that the currently used SocketInputStream's available() method _does_ return values 0. Unfortunately, I think that depends. I seem to recall we had difficulties with this function in the past, particularly related to different JVM versions, and also with different implementations of secure sockets. Granted, some of those implementations were/are buggy, but we have to live with them, I think. Before we commit such a change to the 2.0 release branch, we'd have to run it through tests across numerous JVMs on numerous platforms with numerous JCE libraries. We also run the risk that the available function could misbehave not only by giving an incorrect response, but also by blocking for a short period of time (1ms?), which would be disastrous for performance. I think the instantly available criteria is misleading, too. There's absolutely no reason to prevent you from hitting a pathological case where the packet boundary splits right where the extra data is sent, thus leading the instantly available check to return false, even though the data would be read on the subsequent response. In fact, such behavior could be entirely dependent on the misbehaving server. The case that I've encountered stemmed from a server that tossed in an extra CRLF after a previous response. As I recall, the extra CRLF were, not surprisingly, written to the socket as a separate chunk, and therefore stood a good a chance as any of being received by the client as part of a separate packet, and therefore would not be instantly available anyway. In the end, I've got a few concerns: * I'm not sure I see the point of trying to catch the problem at the end of the previous response, which would seem to be the point of the available check, rather than at the beginning of reading the next response. * Are there particular servers that demonstrate bad behavior that we want to catch with this new option? Do we have test cases for those particular servers? * Has it been tested across a myriad of environments? Such changes for the 2.1 CVS HEAD are fine by me. I'm much more concerned about the 2.0 branch, however, and keeping what changes we do there to a minimum. This change seems to straddle the boundary between a bug-fix and additional functionality, at least from where I sit. Then again, I've not looked closely at the patch, I've only followed the discussion. -Eric. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
Oleg, No apologies needed. As far as I'm concerned, your opinions with respect to HttpClient carry far more weight - I seem to contribute a patch about once every three months (and Mike just did a much better rewrite of my latest patch), but otherwise just stir up trouble on the mailing list! One more wrench. It strikes me that if HttpClient does support HTTP pipelining in the future, then an available check will mislead you. Of course there might be data available, as that would be precisely the point. Only when looking at the data can you determine whether or not it is valid to have that data already in the queue Granted, it is perhaps early to worry about pipelining! It seems like we have a spectrum of possibilities to deal with here: Malicious servers -- bad servers with immediate extra data -- bad servers with delayed extra data -- other bad servers (bad cookies et. al.) -- mostly compliant servers -- 100% compliant servers (yeah, right!). It seems like we need to stop the malicious servers from derailing HttpClient. As for the two types of bad servers, I'm not sure I see the point of catching the bad behavior on some requests immediately, when we're guaranteed to catch the problem on a subsequent request 100% of the time. I'm guessing that the minor performance hit of reissuing a request when operating in strict mode will be swamped by the fact that in either case you have to open a new connection? I guess that's part of why I ask whether there are specific badly behaved servers for which enforcing a strict mode with an available check will actually prove useful. It could be that for every bad server out there, they almost all fall into the delayed extra data category, rather than the immediate category. I think my concern comes down to this: How much benefit does HttpClient get out of enforcing the rule in two places, one with the available check, and one with the scan at the beginning of reading the next response? Is the answer 5% of bad servers, or 75% of bad servers? And if you set a strict mode, can HttpClient even operate with those servers anyway? Absent the real-world data, I'd stick with only trying to deal with the bad behavior at the beginning of the next response, rather than at the end of the previous one. -Eric. Kalnichevski, Oleg wrote: Eric, Just to clarify things: there's strongno way/strong we touch 2.0 branch for any other reason but fixing a strongserious/strong bug. As far as I am concerned the suggested patch is an enhancement, and not a bug. If any of my comments were interpreted otherwise, I offer my apologies. It was not meant that way. Anyways, if I understand Christian right, the idea is to drop those connections that are known for sure to be screwy, instead of reusing them and getting into a trouble later when processing subsequent responses. The logic in readStatusLine will be enhanced to optionally terminate the infinite loop. That is it. I hope that addresses your concerns somewhat. Oleg -Original Message- From: Eric Johnson [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 11, 2003 15:35 To: Commons HttpClient Project Subject: Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line Christian Kohlschütter wrote: I perfectly agree - I do not see a bullet-proof solution either. I should correct my assumption: Can we assume that reusing the HTTP connection is unreliable/should be avoided if there are more bytes *INSTANTLY* available than specified with Content-Length Instantly means without waiting/blocking, so at least for this situation, a simple workaround would be feasible. I think that the currently used SocketInputStream's available() method _does_ return values 0. Unfortunately, I think that depends. I seem to recall we had difficulties with this function in the past, particularly related to different JVM versions, and also with different implementations of secure sockets. Granted, some of those implementations were/are buggy, but we have to live with them, I think. Before we commit such a change to the 2.0 release branch, we'd have to run it through tests across numerous JVMs on numerous platforms with numerous JCE libraries. We also run the risk that the available function could misbehave not only by giving an incorrect response, but also by blocking for a short period of time (1ms?), which would be disastrous for performance. I think the instantly available criteria is misleading, too. There's absolutely no reason to prevent you from hitting a pathological case where the packet boundary splits right where the extra data is sent, thus leading the instantly available check to return false, even though the data would be read on the subsequent response. In fact, such behavior could be entirely dependent on the misbehaving server. The case that I've encountered stemmed from a server that tossed in an extra CRLF after
Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line
The only additional thing to change is HttpConnection.WrappedInputStream, as it currently lacks an available() method call to the underlying stream. Problem fixed. WrappedInputStream now overrides InputStream#available method. Thanks. Oleg - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]