Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-18 Thread Christian Kohlschtter
On Monday 17 November 2003 21:05, Oleg Kalnichevski wrote:

 I have not found any mentioning of unexpected content in the RFC, so
 this is another reason why I would be a bit cautious about throwing a
 protocol exception. It would suffice to spit out a warning, drop the
 connection and move on.

Alright. I can live with that. :-)

Christian

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-18 Thread Kalnichevski, Oleg
Christian,
Feel free to make changes to the patch that you deem necessary. Once you think it is 
ready, I suggest we once again ask all the interested parties to raise their 
objections and express concerns. If there's no significant opposition to the final 
revision of the patch, and it is OKed by at least by two committers, I'll consider it 
good to be checked in.

Oleg

-Original Message-
From: Christian Kohlschütter [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 18, 2003 11:13
To: Commons HttpClient Project
Subject: Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while
trying to retrieve status line


On Monday 17 November 2003 21:05, Oleg Kalnichevski wrote:

 I have not found any mentioning of unexpected content in the RFC, so
 this is another reason why I would be a bit cautious about throwing a
 protocol exception. It would suffice to spit out a warning, drop the
 connection and move on.

Alright. I can live with that. :-)

Christian

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-17 Thread Ortwin Glück
[EMAIL PROTECTED] wrote:
--- Additional Comments From [EMAIL PROTECTED]  2003-11-17 14:38 ---
Odi,
agreed - whitespace is a wrong term. CRLF is better. CRLF or LF is correct ;-)
(see RFC2616, section 19.3).
Would you then prefer my first version of the patch, or do you have another idea
how to handle this?
Sorry I did not look at the patch. I just outlined my idea of how it 
should be in my opinion.

Odi

Ps. Please use the list for discussion and only post decisions to 
Bugzilla. Short lists of bug notes are read quicker when fixing bugs.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-17 Thread Oleg Kalnichevski
Odi,
That would be REALLY cool! A simple authenticating proxy (or a proxy
that could effectively 'fake' popular authentication schemes) would be a
very much appreciated contribution. By the way, have a look at the
Christian Kohlschütter's SimpleHttpServer:
 
http://nagoya.apache.org/bugzilla/showattachment.cgi?attach_id=9066

I think that can be a good starting point for a better framework than
SimpleHttpconnection.

Oleg


On Mon, 2003-11-17 at 15:00, [EMAIL PROTECTED] wrote:
 DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
 RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
 http://nagoya.apache.org/bugzilla/show_bug.cgi?id=24560.
 ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
 INSERTED IN THE BUG DATABASE.
 
 http://nagoya.apache.org/bugzilla/show_bug.cgi?id=24560
 
 HttpClient loops endlessly while trying to retrieve status line
 
 
 
 
 
 --- Additional Comments From [EMAIL PROTECTED]  2003-11-17 14:00 ---
 Oleg,
 
 thank you for reviewing my patch.
 
 I think the reviewed version is OK in general (AFAICS from the diff - I haven't
 applied it yet). Just a few comments (new ideas?) by me:
 
 In my opinion, strict mode should be very pedantic about standards compliance.
 HttpClient should notify the user wherever a problematic, non-standards
 situation is detected.
 
 Of course, trailing whitespace should be silently ignored, but any other
 characters should be regarded as unexpected (is there a section in RFC 2616
 for that? I haven't found it yet).
 
 The question is: Is (non-whitespace) garbage following the response (caused by a
 wrong Content-Length header, for example) unexpected enough?
 
 In your version of the patch, there is no chance to get informed about such a
 situation - and in 'lenient' mode, the detection is disabled completely (did you
 check the TestBadContentLength testcase? does it pass?).
 
 Regarding the ProtocolException/ResponseConsumedWatcher thing, of course, it
 _is_ a workaround to get that Exception thrown to the caller. However, I would
 appreciate it if the user _would_ receive that Exception (somehow).
 
 I even think it is not such a bad idea to keep that in responseConsumed(), just
 to inform every HttpClient component that there was an error while reading the
 response (the interface is not public, anyway). Instead of throwing an
 Exception, we could also have a boolean without/with errors return value, of
 course...
 
 In short, I would prefer the following behaviour:
 
 - For any mode: If garbage is detected, read (up to a certain limit of bytes
 N) until end of garbage (maximum of N bytes) or until a non-whitespace
 character is received; N is something  10 (should be user-definable).
 
 - For any mode, close the connection (the conncetion is definitely unreliable).
 
 - For strict mode, throw a ProtocolException if anything else but whitespace has
 been received.
 
 - (Optionally) introduce an extra pedantic mode (inherits strict mode) and
 throw a ProtocolException even if N bytes of _whitespace_ garbage have been
 received.
 
 
 Best regards,
 
 Christian
 
 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-17 Thread Oleg Kalnichevski
Christian, see my comments in-line
 In my opinion, strict mode should be very pedantic about standards compliance.
 HttpClient should notify the user wherever a problematic, non-standards
 situation is detected.
 

I do not mind being over-pedantic, but not at the expense of code
quality. My personal impression was that throwing a protocol exception
from responseConsumed required too much of an ugly plumbing. I believe
that a big fat log warning should be enough. A protocol exception thrown
from responseConsumed would not make HttpClient more reliable (dirty
connection would be dropped, anyway), it would just make it, well, more
pedantic. That is it.


 Of course, trailing whitespace should be silently ignored, but any other
 characters should be regarded as unexpected (is there a section in RFC 2616
 for that? I haven't found it yet).

This is what RFC has to say:
quote
In the interest of robustness, servers SHOULD ignore any empty line(s)
received where a Request-Line is expected. In other words, if the server
is reading the protocol stream at the beginning of a message and
receives a CRLF first, it should ignore the CRLF.
/quote
http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.1

I have not found any mentioning of unexpected content in the RFC, so
this is another reason why I would be a bit cautious about throwing a
protocol exception. It would suffice to spit out a warning, drop the
connection and move on.

Folks, any strong options on this issue


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-17 Thread Michael Becke

I have not found any mentioning of unexpected content in the RFC, so
this is another reason why I would be a bit cautious about throwing a
protocol exception. It would suffice to spit out a warning, drop the
connection and move on.
Folks, any strong options on this issue
I would prefer a warning to an exception.

Mike

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Ortwin Glück
[EMAIL PROTECTED] wrote:
Odi has a point here. There is no reliable way of telling where one response
body ends and another one starts, if content length information is messed up.
That's why the Content-Length header is part of the HTTP 1.1 protocol in 
the first place.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Ortwin Glück


[EMAIL PROTECTED] wrote:
Agreed. 
 
In other words, can we assume that reusing the HTTP connection is 
unreliable/should be avoided if there are more bytes available than specified 
with Content-Length? 
 
In this case, at least, I would suggest to close the current connection and 
open a fresh one. 
 
 
Christian
This is the pragmatic way of solving the problem. It always works and is 
very reliable. But it is expensive.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Christian Kohlschütter
Am Dienstag, 11. November 2003 12:32 schrieb Ortwin Glück:
 [EMAIL PROTECTED] wrote:

  Agreed. 
   
  In other words, can we assume that reusing the HTTP connection is 
  unreliable/should be avoided if there are more bytes available than
  specified 
 with Content-Length?
   
  In this case, at least, I would suggest to close the current connection
  and 
 open a fresh one.
   
   
  Christian

 This is the pragmatic way of solving the problem. It always works and is 
 very reliable. But it is expensive.

Has this already been implemented or should we do it now?


Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Christian Kohlschütter
Am Dienstag, 11. November 2003 12:32 schrieb Ortwin Glück:
 [EMAIL PROTECTED] wrote:

  Agreed. 
   
  In other words, can we assume that reusing the HTTP connection is 
  unreliable/should be avoided if there are more bytes available than
  specified 
 with Content-Length?
   
  In this case, at least, I would suggest to close the current connection
  and 
 open a fresh one.
   
   
  Christian

 
 This is the pragmatic way of solving the problem. It always works and is 
 very reliable. But it is expensive.

btw, what do you mean with expensive?

I think an available() check or read()/unread() call pair at the right point 
would be enough.


Christian


RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Kalnichevski, Oleg
 In other words, can we assume that reusing the HTTP connection is 
 unreliable/should be avoided if there are more bytes available than specified 
 with Content-Length? 

Christian,
Again, there's a similar problem. How long do you think we should wait on a connection 
to be really sure that there's no garbage coming out of it? I think this timeout is 
the performance hit Odi was talking about. Besides, regardless how long you wait there 
can never be 100% certainty that the connection is completely 'clean'. You could still 
potentially get some garbage out of persistent connection when reading the status line 
of the following response. So, we are pretty much back to where we started

You are welcome to come up with a better solution than the exiting, but I am afraid 
there's none that it is truly bullet-proof.

Oleg

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Christian Kohlschütter
Am Dienstag, 11. November 2003 13:01 schrieb Kalnichevski, Oleg:
  In other words, can we assume that reusing the HTTP connection is
  unreliable/should be avoided if there are more bytes available than
  specified with Content-Length?

 Christian,
 Again, there's a similar problem. How long do you think we should wait on a
 connection to be really sure that there's no garbage coming out of it? I
 think this timeout is the performance hit Odi was talking about. Besides,
 regardless how long you wait there can never be 100% certainty that the
 connection is completely 'clean'. You could still potentially get some
 garbage out of persistent connection when reading the status line of the
 following response. So, we are pretty much back to where we started

 You are welcome to come up with a better solution than the exiting, but I
 am afraid there's none that it is truly bullet-proof.

 Oleg

I perfectly agree - I do not see a bullet-proof solution either.

I should correct my assumption:

Can we assume that reusing the HTTP connection is unreliable/should be 
avoided if there are more bytes *INSTANTLY* available than specified with 
Content-Length

Instantly means without waiting/blocking, so at least for this situation, a 
simple workaround would be feasible.

I think that the currently used SocketInputStream's available() method _does_ 
return values  0.

The only additional thing to change is HttpConnection.WrappedInputStream, as 
it currently lacks an available() method call to the underlying stream.


Christian

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Ortwin Glück


Christian Kohlschütter wrote:
In this case, at least, I would suggest to close the current connection
and 
 open a fresh one.

This is the pragmatic way of solving the problem. It always works and is 
very reliable. But it is expensive.


btw, what do you mean with expensive?
Expensive: Throwing away the (known bad) connection and opening a new 
one, is a performance hit of course.

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Kalnichevski, Oleg
 Can we assume that reusing the HTTP connection is unreliable/should be 
 avoided if there are more bytes *INSTANTLY* available than specified with 
 Content-Length

 Instantly means without waiting/blocking, so at least for this situation, a 
 simple workaround would be feasible.
 

Christian,
Fair enough. Feel free to extend your patch to include the check to force-close those 
connections that report input data available beyond 'content-length' value/terminal 
chunk.


 I think that the currently used SocketInputStream's available() method _does_ 
 return values  0.
 
 The only additional thing to change is HttpConnection.WrappedInputStream, as 
 it currently lacks an available() method call to the underlying stream.

This one is a bug, because it effectively rendered HttpConnection#isResponseAvailable 
meaningless. I'll provide a fix for it tonight. Thanks for tracking this one down

Oleg

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Christian Kohlschütter
On Tuesday 11 November 2003 13:44, Ortwin Glück wrote:
 
 Expensive: Throwing away the (known bad) connection and opening a new 
 one, is a performance hit of course.

Sure. But this happens only if we detected a connection to be bad, and then it 
is a very rare situation.

In strict mode, the detection of bad connection should cause an exception to 
be thrown, instead of reopening another connection. What do you think?


Christian

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Kalnichevski, Oleg
 In strict mode, the detection of bad connection should cause an exception to 
 be thrown, instead of reopening another connection. What do you think?

Sounds reasonable. 

Oleg

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Kalnichevski, Oleg
Eric,
Just to clarify things: there's strongno way/strong we touch 2.0 branch for any 
other reason but fixing a strongserious/strong bug. 

As far as I am concerned the suggested patch is an enhancement, and not a bug. If any 
of my comments were interpreted otherwise, I offer my apologies. It was not meant that 
way. 

Anyways, if I understand Christian right, the idea is to drop those connections that 
are known for sure to be screwy, instead of reusing them and getting into a trouble 
later when processing subsequent responses. The logic in readStatusLine will be 
enhanced to optionally terminate the infinite loop. That is it.

I hope that addresses your concerns somewhat.

Oleg

-Original Message-
From: Eric Johnson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 15:35
To: Commons HttpClient Project
Subject: Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while
trying to retrieve status line


Christian Kohlschütter wrote:

I perfectly agree - I do not see a bullet-proof solution either.

I should correct my assumption:

Can we assume that reusing the HTTP connection is unreliable/should be 
avoided if there are more bytes *INSTANTLY* available than specified with 
Content-Length

Instantly means without waiting/blocking, so at least for this situation, a 
simple workaround would be feasible.

I think that the currently used SocketInputStream's available() method _does_ 
return values  0.
  

Unfortunately, I think that depends.  I seem to recall we had 
difficulties with this function in the past, particularly related to 
different JVM versions, and also with different implementations of 
secure sockets.  Granted, some of those implementations were/are buggy, 
but we have to live with them, I think.  Before we commit such a change 
to the 2.0 release branch, we'd have to run it through tests across 
numerous JVMs on numerous platforms with numerous JCE libraries.  We 
also run the risk that the available function could misbehave not only 
by giving an incorrect response, but also by blocking for a short period 
of time (1ms?), which would be disastrous for performance.

I think the instantly available criteria is misleading, too.  There's 
absolutely no reason to prevent you from hitting a pathological case 
where the packet boundary splits right where the extra data is sent, 
thus leading the instantly available check to return false, even 
though the data would be read on the subsequent response.  In fact, such 
behavior could be entirely dependent on the misbehaving server.  The 
case that I've encountered stemmed from a server that tossed in an extra 
CRLF after a previous response.  As I recall, the extra CRLF were, not 
surprisingly, written to the socket as a separate chunk, and therefore 
stood a good a chance as any of being received by the client as part of 
a separate packet, and therefore would not be instantly available anyway.

In the end, I've got a few concerns:
* I'm not sure I see the point of trying to catch the problem at the end 
of the previous response, which would seem to be the point of the 
available check, rather than at the beginning of reading the next 
response.
* Are there particular servers that demonstrate bad behavior that we 
want to catch with this new option?  Do we have test cases for those 
particular servers?
* Has it been tested across a myriad of environments?

Such changes for the 2.1 CVS HEAD are fine by me.  I'm much more 
concerned about the 2.0 branch, however, and keeping what changes we do 
there to a minimum.  This change seems to straddle the boundary between 
a bug-fix and additional functionality, at least from where I sit.  Then 
again, I've not looked closely at the patch, I've only followed the 
discussion.

-Eric.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Eric Johnson
Oleg,

No apologies needed.  As far as I'm concerned, your opinions with 
respect to HttpClient carry far more weight - I seem to contribute a 
patch about once every three months (and Mike just did a much better 
rewrite of my latest patch), but otherwise just stir up trouble on the 
mailing list!

One more wrench.  It strikes me that if HttpClient does support HTTP 
pipelining in the future, then an available check will mislead you.  
Of course there might be data available, as that would be precisely the 
point.  Only when looking at the data can you determine whether or not 
it is valid to have that data already in the queue  Granted, it is 
perhaps early to worry about pipelining!

It seems like we have a spectrum of possibilities to deal with here:

Malicious servers -- bad servers with immediate extra data -- bad 
servers with delayed extra data -- other bad servers (bad cookies et. 
al.) -- mostly compliant servers -- 100% compliant servers (yeah, right!).

It seems like we need to stop the malicious servers from derailing 
HttpClient.  As for the two types of bad servers, I'm not sure I see the 
point of catching the bad behavior on some requests immediately, when 
we're guaranteed to catch the problem on a subsequent request 100% of 
the time.  I'm guessing that the minor performance hit of reissuing a 
request when operating in strict mode will be swamped by the fact that 
in either case you have to open a new connection?

I guess that's part of why I ask whether there are specific badly 
behaved servers for which enforcing a strict mode with an available 
check will actually prove useful.  It could be that for every bad 
server out there, they almost all fall into the delayed extra data 
category, rather than the immediate category.  I think my concern 
comes down to this: How much benefit does HttpClient get out of 
enforcing the rule in two places, one with the available check, and 
one with the scan at the beginning of reading the next response? Is the 
answer 5% of bad servers, or 75% of bad servers?  And if you set a 
strict mode, can HttpClient even operate with those servers anyway?

Absent the real-world data, I'd stick with only trying to deal with the 
bad behavior at the beginning of the next response, rather than at the 
end of the previous one.

-Eric.

Kalnichevski, Oleg wrote:

Eric,
Just to clarify things: there's strongno way/strong we touch 2.0 branch for any other reason but fixing a strongserious/strong bug. 

As far as I am concerned the suggested patch is an enhancement, and not a bug. If any of my comments were interpreted otherwise, I offer my apologies. It was not meant that way. 

Anyways, if I understand Christian right, the idea is to drop those connections that are known for sure to be screwy, instead of reusing them and getting into a trouble later when processing subsequent responses. The logic in readStatusLine will be enhanced to optionally terminate the infinite loop. That is it.

I hope that addresses your concerns somewhat.

Oleg

-Original Message-
From: Eric Johnson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 11, 2003 15:35
To: Commons HttpClient Project
Subject: Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while
trying to retrieve status line
Christian Kohlschütter wrote:

 

I perfectly agree - I do not see a bullet-proof solution either.

I should correct my assumption:

Can we assume that reusing the HTTP connection is unreliable/should be 
avoided if there are more bytes *INSTANTLY* available than specified with 
Content-Length

Instantly means without waiting/blocking, so at least for this situation, a 
simple workaround would be feasible.

I think that the currently used SocketInputStream's available() method _does_ 
return values  0.

   

Unfortunately, I think that depends.  I seem to recall we had 
difficulties with this function in the past, particularly related to 
different JVM versions, and also with different implementations of 
secure sockets.  Granted, some of those implementations were/are buggy, 
but we have to live with them, I think.  Before we commit such a change 
to the 2.0 release branch, we'd have to run it through tests across 
numerous JVMs on numerous platforms with numerous JCE libraries.  We 
also run the risk that the available function could misbehave not only 
by giving an incorrect response, but also by blocking for a short period 
of time (1ms?), which would be disastrous for performance.

I think the instantly available criteria is misleading, too.  There's 
absolutely no reason to prevent you from hitting a pathological case 
where the packet boundary splits right where the extra data is sent, 
thus leading the instantly available check to return false, even 
though the data would be read on the subsequent response.  In fact, such 
behavior could be entirely dependent on the misbehaving server.  The 
case that I've encountered stemmed from a server that tossed in an extra 
CRLF after

Re: DO NOT REPLY [Bug 24560] - HttpClient loops endlessly while trying to retrieve status line

2003-11-11 Thread Oleg Kalnichevski

 The only additional thing to change is HttpConnection.WrappedInputStream, as 
 it currently lacks an available() method call to the underlying stream.
 

Problem fixed. WrappedInputStream now overrides InputStream#available
method. Thanks.

Oleg


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]