> > I did some research and I traced the problem to be somewhere inside
> > HttpRequest of protocol-httpclient.
> I had a similar report from someone else, and I'll try to find out what
> is happening. Thanks for this debugging output, it is helpful - if you
> find something else, please let me know.
It seems that in most cases (I don't know whether in every case) the
problem is inside HttpResponse, in the line
    while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1 &&
           tryAndRead > 0) {
where read returns just one byte (bufferFilled == 1). Normally it returns
buffer.length, and it does return full buffers earlier from the same
socket, but for some reason it then goes on a rampage and starts
returning one byte at a time.
I created an ugly workaround: a counter that starts from 10 and
decreases every time bufferFilled == 1. Once the counter reaches zero,
the read is aborted by breaking out of the inner while loop. This
leaves the fetched page corrupted, but at least it won't halt the
whole fetch of thousands of pages.
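
Roughly, the counter workaround described above could be sketched as
follows. This is only an illustration, not the actual patch: the class
name TrickleGuard, the method readWithGuard and the constant
MAX_SINGLE_BYTE_READS are made-up names, and the tryAndRead condition of
the real loop is left out for brevity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of protocol-httpclient.
class TrickleGuard {
    // Assumed limit, mirroring the "starts from 10" counter in the mail.
    static final int MAX_SINGLE_BYTE_READS = 10;

    /**
     * Reads until end of stream, or gives up once read() has returned
     * exactly one byte MAX_SINGLE_BYTE_READS times. In the second case
     * the returned content is truncated.
     */
    static byte[] readWithGuard(InputStream in, int bufferSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        int remaining = MAX_SINGLE_BYTE_READS;
        int bufferFilled;
        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, bufferFilled);
            // Count down on every one-byte read; the counter never resets.
            if (bufferFilled == 1 && --remaining == 0) {
                break; // abort the read instead of stalling the fetch
            }
        }
        return out.toByteArray();
    }
}
```

A stream that delivers full buffers is read to the end as usual, while
a stream that trickles one byte per call is cut off after ten bytes.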
- Juho Mäkinen, http://www.juhonkoti.net
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general