Hello All,

I ran across another interesting problem today in handling status codes
inside of:

src/main/java/org/apache/mina/filter/codec/http/HttpResponseLineDecodingState.java

This is a mina-2.0 snapshot.

I'm building an RSS client that needs to poll thousands of RSS channels. I'm
using conditional GETs so as not to waste bandwidth.

I noticed a particular site was raising a "Bad Status Code" exception
whenever the server responded with a 304 (Not Modified).

It seems that the server (Apache version ??) was sending back a status code
without the Reason Phrase.

I found the BNF notation that describes the Reason Phrase in RFC2616:

http://www.w3.org/Protocols/HTTP/1.1/rfc2616bis/issues/#i94

Which states:

TEXT           = <any OCTET except CTLs, but including LWS>
LWS            = [CRLF] 1*( SP | HT )
CRLF           = CR LF
Reason-Phrase  = *<TEXT, excluding CR, LF>

This means that a Reason Phrase could be empty and still be considered valid
(i.e., zero or more octets).
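
To make the grammar concrete, here is a rough sketch of a status-line parser
that accepts an empty Reason Phrase. It is not MINA code; the class and method
names (StatusLineSketch, parse) are made up for illustration. Note that
RFC 2616 still requires the SP after the status code even when the phrase is
empty; the sketch is deliberately lenient and also accepts a line that omits
that SP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of MINA: parses one status line (CRLF already stripped).
class StatusLineSketch {

    // HTTP-Version SP Status-Code [SP Reason-Phrase]; the phrase may be empty.
    private static final Pattern STATUS_LINE =
            Pattern.compile("(HTTP/\\d+\\.\\d+) (\\d{3})(?: ([^\\r\\n]*))?");

    static final class Parsed {
        final String version;
        final int code;
        final String reason;

        Parsed(String version, int code, String reason) {
            this.version = version;
            this.code = code;
            this.reason = reason;
        }
    }

    static Parsed parse(String line) {
        Matcher m = STATUS_LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad status line: " + line);
        }
        // group(3) is null when the SP and Reason Phrase are both missing.
        String reason = m.group(3) == null ? "" : m.group(3);
        return new Parsed(m.group(1), Integer.parseInt(m.group(2)), reason);
    }
}
```

With this, parse("HTTP/1.1 304") and parse("HTTP/1.1 304 Not Modified") both
yield code 304; only the reason field differs.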

The state machine expects to see a Reason Phrase and when it doesn't, it
consumes part of the next header (Date) and then throws an exception trying
to convert this value to an Integer.

What I did was override the isTerminator() method for the
ConsumeToLinearWhitespaceDecodingState adding a check for a CR.

This stops the scanner from pulling in excess bytes. I wasn't sure how the
remaining states would handle this but AFTER_READ_STATUS_CODE returns
immediately as does READ_REASON_PHRASE (since we left a remaining LF byte on
the input buffer) and we cleanly move to a final acceptance state.
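
The effect of the extra terminator can be reproduced outside MINA with a very
simplified model of a "consume until terminator" scanner. Everything here
(TerminatorSketch, consumeUntil, the sample response with its Date value) is
invented for illustration and only mimics the behaviour described above; it is
not how the real decoding states are implemented.

```java
import java.nio.charset.StandardCharsets;

// Simplified stand-in for a consume-to-whitespace state; not MINA's actual class.
class TerminatorSketch {

    interface Terminator {
        boolean test(byte b);
    }

    // Made-up response: status line with no SP and no Reason Phrase after the code.
    static final String SAMPLE =
            "HTTP/1.1 304\r\nDate: Fri, 04 Jan 2008 19:28:40 GMT\r\n\r\n";

    // Collects bytes from 'start' up to (not including) the first terminator byte.
    static String consumeUntil(byte[] in, int start, Terminator t) {
        int i = start;
        while (i < in.length && !t.test(in[i])) {
            i++;
        }
        return new String(in, start, i - start, StandardCharsets.US_ASCII);
    }

    // Original behaviour as described: stop on SP or HT only.
    static boolean spaceOrTab(byte b) {
        return b == 32 || b == 9;
    }

    // Patched behaviour: also stop on CR, leaving the LF for the later states.
    static boolean spaceTabOrCr(byte b) {
        return b == 32 || b == 9 || b == 13;
    }
}
```

Starting at offset 9 (just past "HTTP/1.1 "), the SP/HT-only predicate returns
"304\r\nDate:", which Integer.parseInt rejects, while the predicate that also
stops on CR returns "304".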

I've run about 500 feeds through this and nothing seems to have broken.

Here's a patch:

--- HttpResponseLineDecodingState.orig.java     2008-01-04 14:29:25.000000000 -0500
+++ HttpResponseLineDecodingState.java  2008-01-04 14:28:40.000000000 -0500
@@ -80,6 +80,10 @@
             }
             return AFTER_READ_STATUS_CODE;
         }
+        @Override
+        protected boolean isTerminator(byte b) {
+            return b == 32 || b == 9 || b == 13;
+        }
     };
 
     private final DecodingState AFTER_READ_STATUS_CODE = new LinearWhitespaceSkippingState() {


This _should_ be safe since the response line has to be terminated with a
CR/LF pair. As long as we leave the LF byte, it's enough to satisfy the
state requirements for the trailing states.

You can use this site for testing. If you send the ETag back in an
If-None-Match header, you should get a 304 with no Reason Phrase.

    http://www.mattweber.org/feed/
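
For anyone who wants to try this without MINA, a conditional GET can be
sketched with the plain JDK HttpURLConnection. This is only an assumption
about how one might test it, not what my client does; the class name
ConditionalGetSketch and the ETag value you pass in are placeholders (use
whatever ETag the server returned on a previous full GET).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical test helper using the JDK client rather than MINA.
class ConditionalGetSketch {

    // Builds (but does not yet send) a GET carrying the cached ETag.
    static HttpURLConnection prepare(String url, String etag) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            // The validator from the earlier response goes in If-None-Match.
            conn.setRequestProperty("If-None-Match", etag);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sends the request; true means the server answered 304 (Not Modified).
    static boolean notModified(HttpURLConnection conn) {
        try {
            return conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling notModified(prepare(url, etag)) performs the actual request, so it
needs network access.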

Thanks,
-Eric

P.S. Did I mention how much fun I've been having with mina? Love it!

