Your patch has been applied. Thanks again!

Cheers,
Trustin

On Jan 4, 2008 1:24 PM, Trustin Lee <[EMAIL PROTECTED]> wrote:
> Hi Eric,
>
> Thank you so much for your bug report. I'm forwarding this message to
> our official mailing list for better and faster assistance. :)
>
> HTH,
> Trustin
>
> On Jan 4, 2008 1:18 PM, Eric Gaumer <[EMAIL PROTECTED]> wrote:
> > Hey Trustin,
> >
> > I'm building an enterprise RSS retriever for submitting RSS feeds to a
> > search engine (leveraging meta data found in the channel).
> >
> > I was working in Python with Twisted but when I came across mina I decided
> > to port things over to Java. You've created a nice framework for
> > asynchronous sockets.
> >
> > I'm using a 2.0 snapshot from about a week ago.
> >
> > The protocol-http-client is geared more for connecting to a single site and
> > retrieving multiple pages. For the RSS fetcher, I need to connect to many
> > sites and fetch a single page.
> >
> > I removed the blocking call to connect and added a listener, etc... I am
> > using the filter-codec-http package because most of what's being done here
> > is over my head right now and I see no reason to re-implement it.
> >
> > Everything is running great. I'm able to fetch and parse 500 RSS channels in
> > about 41s and 4,175 RSS channels in 5m 59s. This is quite good considering
> > there is about 1s average latency just in a typical request/response.
> >
> > The one problem I have noticed is that I would get a fair amount (maybe 10%)
> > of IllegalArgumentExceptions. Initially I just removed the offending feeds
> > from my tests.
> >
> > Tonight I had a chance to return to these problematic feeds and investigate
> > further. After some tests, I found that they all used Chunked Transfer
> > Coding.
> >
> > Looking at the raw response and reviewing the code in
> > src/main/java/org/apache/mina/filter/codec/http/ChunkedBodyDecodingState.java
> > I found that the error was generated because of a space between the chunk
> > size and the CR/LF bytes.
> >
> > For instance:
> >
> > 8a·(CR)(LF)
> >
> > Rather than:
> >
> > 8a(CR)(LF)
> >
> > I've browsed the RFC but I won't claim to fully understand it all. Should
> > whitespace be handled or should an exception be thrown here?
> >
> > Here is an example with some debug output:
> >
> > Byte:13
> > LENGTH:eb1
> > Byte:13
> > LENGTH:f48
> > Byte:13
> > LENGTH:15b
> > Byte:13
> > LENGTH:d77
> > Byte:13
> > LENGTH:b81
> > Byte:13
> > LENGTH:f39
> > Byte:13
> > LENGTH:d0
> > Byte:13
> > LENGTH:53b
> > Byte:13
> > LENGTH:3f30
> > Byte:13
> > LENGTH:3a1
> > Byte:13
> > LENGTH:e4c
> > Byte:32
> > Illegal Argument Here: 32
> > Callback Exception: org.apache.mina.filter.codec.ProtocolDecoderException:
> > java.lang.IllegalArgumentException
> >
> > So you see that we read "8", "a", and " " which causes an exception inside
> > isTerminator()
> >
> > This happens on a fair amount of sites so I'm assuming it's valid to have
> > whitespace here? The RFC provides an EBNF style language but doesn't
> > explicitly mention anything about allowing whitespace here. Yet, some
> > servers seem to occasionally add this whitespace.
> >
> > At any rate here is a simple patch that I did which seems to make these
> > errors disappear.
> >
> >
> > --- ChunkedBodyDecodingState.orig.java 2008-01-03 22:48:14.000000000 -0500
> > +++ ChunkedBodyDecodingState.java 2008-01-03 22:49:34.000000000 -0500
> > @@ -94,7 +94,7 @@
> >                      .throwDecoderException("Expected a chunk length.");
> >          }
> >
> > -        String length = product.getString(asciiDecoder);
> > +        String length = product.getString(asciiDecoder).trim();
> >          lastChunkLength = Integer.parseInt(length, 16);
> >          if (chunkHasExtension) {
> >              return SKIP_CHUNK_EXTENSION;
> > @@ -106,7 +106,7 @@
> >          @Override
> >          protected boolean isTerminator(byte b) {
> >              if (!(b >= '0' && b <= '9' || b >= 'a' && b <= 'f' || b >= 'A'
> > -                    && b <= 'F')) {
> > +                    && b <= 'F' || b == ' ')) {
> >                  if (b == '\r' || b == ';') {
> >                      chunkHasExtension = b == ';';
> >                      return true;
> >
> >
> > Maybe this is helpful to you. Thanks again for a great framework. I expect
> > to get much use out of it ;-)
> >
> > -Eric
>
> --
> what we call human nature is actually human habit
> --
> http://gleamynode.net/
> --
> PGP Key ID: 0x0255ECA6

--
what we call human nature is actually human habit
--
http://gleamynode.net/
--
PGP Key ID: 0x0255ECA6
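
P.S. For anyone following the thread outside the MINA source tree, here is a minimal standalone sketch of the behaviour the patch is after: accept a chunk-size line with whitespace before the CR/LF, and still reject anything that is not hexadecimal. The class name ChunkSizeLine and the method parseChunkSize are hypothetical and are not part of MINA. The sketch also assumes the whole line (without its CR/LF) is already available as a String, whereas ChunkedBodyDecodingState inspects one byte at a time.

```java
public final class ChunkSizeLine {

    private ChunkSizeLine() {
    }

    /**
     * Parses the size from a chunk-size line given WITHOUT its trailing CR/LF,
     * for example "8a", "8a " or "8a;name=value".
     * Hypothetical helper for illustration only; not MINA API.
     */
    public static int parseChunkSize(String line) {
        // Anything after ';' is a chunk extension and is ignored here.
        int semicolon = line.indexOf(';');
        String size = semicolon >= 0 ? line.substring(0, semicolon) : line;

        // Tolerate padding such as the space in "8a (CR)(LF)".
        size = size.trim();

        if (size.isEmpty()) {
            throw new IllegalArgumentException("Expected a chunk length.");
        }
        // Integer.parseInt would also accept a leading '+' or '-',
        // so check every character explicitly.
        for (int i = 0; i < size.length(); i++) {
            if (!isHexDigit(size.charAt(i))) {
                throw new IllegalArgumentException(
                        "Not a hex digit in chunk length: '" + size.charAt(i) + "'");
            }
        }
        // Throws NumberFormatException if the value does not fit in an int.
        return Integer.parseInt(size, 16);
    }

    private static boolean isHexDigit(char c) {
        return c >= '0' && c <= '9' || c >= 'a' && c <= 'f' || c >= 'A' && c <= 'F';
    }

    public static void main(String[] args) {
        System.out.println(parseChunkSize("8a"));
        System.out.println(parseChunkSize("8a "));
        System.out.println(parseChunkSize("3f30;name=value"));
    }
}
```

Unlike the byte-level check in the patch, this version trims first and validates afterwards, so a space in the middle of the digits (for example "8 a") is still rejected rather than passed on to Integer.parseInt.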