Your patch has been applied. Thanks again!

Cheers,
Trustin

On Jan 4, 2008 1:24 PM, Trustin Lee <[EMAIL PROTECTED]> wrote:
> Hi Eric,
>
> Thank you so much for your bug report.  I'm forwarding this message to
> our official mailing list for better and faster assistance.  :)
>
> HTH,
> Trustin
>
>
> On Jan 4, 2008 1:18 PM, Eric Gaumer <[EMAIL PROTECTED]> wrote:
> > Hey Trustin,
> >
> > I'm building an enterprise RSS retriever for submitting RSS feeds to a
> > search engine (leveraging metadata found in the channel).
> >
> > I was working in Python with Twisted, but when I came across MINA I decided
> > to port things over to Java. You've created a nice framework for
> > asynchronous sockets.
> >
> > I'm using a 2.0 snapshot from about a week ago.
> >
> > The protocol-http-client is geared more for connecting to a single site and
> > retrieving multiple pages. For the RSS fetcher, I need to connect to many
> > sites and fetch a single page.
> >
> > I removed the blocking call to connect and added a listener, etc... I am
> > using the filter-codec-http package because most of what's being done here
> > is over my head right now and I see no reason to re-implement it.
> >
> > Everything is running great. I'm able to fetch and parse 500 RSS channels in
> > about 41s and 4,175 RSS channels in 5m 59s. This is quite good considering
> > there is about 1s average latency just in a typical request/response.
> >
> > The one problem I have noticed is that I get a fair number (maybe 10%)
> > of IllegalArgumentExceptions. Initially I just removed the offending feeds
> > from my tests.
> >
> > Tonight I had a chance to return to these problematic feeds and investigate
> > further. After some tests, I found that they all used Chunked Transfer
> > Coding.
> >
> > Looking at the raw response and reviewing the code in
> > src/main/java/org/apache/mina/filter/codec/http/ChunkedBodyDecodingState.java,
> > I found that the error was generated because of a space between the chunk
> > size and the CR/LF bytes.
> >
> > For instance:
> >
> >    8a·(CR)(LF)
> >
> > Rather than:
> >
> >    8a(CR)(LF)
> >
> > I've browsed the RFC but I won't claim to fully understand it all. Should
> > whitespace be handled or should an exception be thrown here?
> >
> > Here is an example with some debug output:
> >
> > Byte:13
> >
> > LENGTH:eb1
> >
> > Byte:13
> >
> > LENGTH:f48
> >
> > Byte:13
> >
> > LENGTH:15b
> >
> > Byte:13
> >
> > LENGTH:d77
> >
> > Byte:13
> >
> > LENGTH:b81
> >
> > Byte:13
> >
> > LENGTH:f39
> >
> > Byte:13
> >
> > LENGTH:d0
> >
> > Byte:13
> >
> > LENGTH:53b
> >
> > Byte:13
> >
> > LENGTH:3f30
> >
> > Byte:13
> >
> > LENGTH:3a1
> >
> > Byte:13
> >
> > LENGTH:e4c
> >
> > Byte:32
> >
> > Illegal Argument Here: 32
> >
> > Callback Exception: org.apache.mina.filter.codec.ProtocolDecoderException:
> > java.lang.IllegalArgumentException
> >
> > So you see that we read "8", "a", and " ", which causes an exception inside
> > isTerminator().
> >
> > This happens on a fair number of sites, so I'm assuming it's valid to have
> > whitespace here? The RFC provides an EBNF-style grammar but doesn't
> > explicitly mention anything about allowing whitespace here. Yet some
> > servers seem to occasionally add this whitespace.
> >
> > At any rate here is a simple patch that I did which seems to make these
> > errors disappear.
> >
> >
> > --- ChunkedBodyDecodingState.orig.java  2008-01-03 22:48:14.000000000 -0500
> > +++ ChunkedBodyDecodingState.java       2008-01-03 22:49:34.000000000 -0500
> > @@ -94,7 +94,7 @@
> >                         .throwDecoderException("Expected a chunk length.");
> >             }
> >
> > -            String length = product.getString(asciiDecoder);
> > +            String length = product.getString(asciiDecoder).trim();
> >             lastChunkLength = Integer.parseInt(length, 16);
> >             if (chunkHasExtension) {
> >                 return SKIP_CHUNK_EXTENSION;
> > @@ -106,7 +106,7 @@
> >         @Override
> >         protected boolean isTerminator(byte b) {
> >             if (!(b >= '0' && b <= '9' || b >= 'a' && b <= 'f' || b >= 'A'
> > -                    && b <= 'F')) {
> > +                    && b <= 'F' || b == ' ')) {
> >                 if (b == '\r' || b == ';') {
> >                     chunkHasExtension = b == ';';
> >                     return true;
> >
> >
> > Maybe this is helpful to you. Thanks again for a great framework. I expect
> > to get much use out of it ;-)
> >
> > -Eric
> >
> >
> >
>
>
>
> --
> what we call human nature is actually human habit
> --
> http://gleamynode.net/
> --
> PGP Key ID: 0x0255ECA6
>
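
For anyone reading this thread in the archive, here is a minimal standalone
sketch of the idea behind the quoted patch: tolerate whitespace between the
hex chunk size and the CR/LF (or the ';' that starts a chunk extension).
ChunkSizeParser and parseChunkSize are hypothetical names used only for
illustration. They are not part of MINA, and the sketch works on an already
extracted line rather than on the byte-by-byte decoding state that
ChunkedBodyDecodingState uses.

```java
// Hypothetical helper, not MINA code: parses the chunk-size line of an
// HTTP chunked body, with the trailing CR/LF already removed.
final class ChunkSizeParser {

    static int parseChunkSize(String line) {
        // Anything after ';' is a chunk extension and is ignored here.
        int semi = line.indexOf(';');
        String hex = (semi >= 0 ? line.substring(0, semi) : line).trim();
        if (hex.isEmpty()) {
            throw new IllegalArgumentException("Expected a chunk length.");
        }
        // Chunk sizes are hexadecimal, so parse with radix 16.
        return Integer.parseInt(hex, 16);
    }

    public static void main(String[] args) {
        System.out.println(parseChunkSize("8a"));              // 138
        System.out.println(parseChunkSize("8a "));             // 138
        System.out.println(parseChunkSize("8a ;name=value"));  // 138
    }
}
```

The trim() call plays the same role as the one added in the patch; without
it, Integer.parseInt("8a ", 16) throws NumberFormatException.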



-- 
what we call human nature is actually human habit
--
http://gleamynode.net/
--
PGP Key ID: 0x0255ECA6
