Okay, saw the code in the http-protocol plugin. I remember looking at this about a year ago. RFC 2616 (HTTP/1.1) does say, as Jerome pointed out:

"A server MUST NOT send transfer-codings to an HTTP/1.0 client."

Even so, I can attest that there are servers out there that return chunked content no matter what the client is. We had a socket implementation akin to the HttpResponse.java in the http-protocol plugin and were stumped on how to tell whether the response was chunked or not, as we could not reliably use the Transfer-Encoding header. The only way we could see was to look at the initial hex characters denoting the size of the first chunk. More from RFC 2616:

"The chunk-size field is a string of hex digits indicating the size of the chunk. The chunked encoding is ended by any chunk whose size is zero, followed by the trailer, which is terminated by an empty line."

But in practice this was error prone, since an un-chunked body can also happen to start with hex characters.

Switching over to apache httpclient eliminated this problem, as it transparently handles chunked and un-chunked content. But httpclient is much more heavyweight, so the conversion could only be done after implementing some basic resource pooling on the primary httpclient object.

It does look like this would be a serious refactor job, as nutch uses all java.net classes. On the other hand, it might simplify some areas of the nutch protocol classes, and httpclient does have some interesting built-in support for multi-threading and performance tuning of requests.

I hope this helps towards a solution.

Best Regards,

Chris

--- Andrzej Bialecki <[EMAIL PROTECTED]> wrote:

> Chris Fellows wrote:
> > Just remembered, got around it by using HTTPClient
> > which handles reading the response (chunked or not)
> > transparently. Haven't looked at the nutch code, but
> > if we were to use HTTPClient 3.0.x or later, should
> > take care of it.
>
> Take a look at protocol-httpclient. This discussion is on whether/how to
> fix protocol-http. The other plugin already supports this.
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
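
For illustration, the chunk-size sniffing heuristic described above might look roughly like the following Java sketch. This is not the code from the socket implementation mentioned above or from the nutch plugin; the class name ChunkSniffer and the method looksChunked are hypothetical. It only checks whether the body starts with hex digits, an optional chunk extension, and a CRLF, and the last case in main shows the kind of false positive that makes the approach error prone.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only (not part of nutch or httpclient).
public class ChunkSniffer {

    private static boolean isHexDigit(byte b) {
        return (b >= '0' && b <= '9')
            || (b >= 'a' && b <= 'f')
            || (b >= 'A' && b <= 'F');
    }

    /**
     * Guess whether a response body begins with a chunk-size line:
     * one or more hex digits, an optional ";extension", then CRLF.
     * This is only a heuristic and can misfire on plain bodies.
     */
    public static boolean looksChunked(byte[] body) {
        int i = 0;
        // Consume the leading hex digits of the would-be chunk size.
        while (i < body.length && isHexDigit(body[i])) {
            i++;
        }
        if (i == 0) {
            return false; // no hex digits at all
        }
        // Skip an optional chunk extension such as ";name=value".
        if (i < body.length && body[i] == ';') {
            while (i < body.length && body[i] != '\r') {
                i++;
            }
        }
        // The chunk-size line must end with CRLF.
        return i + 1 < body.length && body[i] == '\r' && body[i + 1] == '\n';
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // A genuinely chunked body: 0x1a = 26 bytes, then the zero-size last chunk.
        System.out.println(looksChunked(
            ascii("1a\r\nabcdefghijklmnopqrstuvwxyz\r\n0\r\n\r\n"))); // true
        // An ordinary HTML body.
        System.out.println(looksChunked(ascii("<html><body>hi</body></html>"))); // false
        // A plain-text body that merely starts with hex characters: false positive.
        System.out.println(looksChunked(ascii("cafe\r\nis open today\r\n"))); // true
    }
}
```

A client library that parses the framing itself, as httpclient does, avoids having to guess like this.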