Does the fetcher ever complete? If not, some of the fetcher threads could be stuck. There was a bug in the PDF parser that caused it to hang on some documents. As you encounter more such pages, more threads get stuck and your crawl slows down.
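
To make the reasoning above concrete, here is a rough sketch of how stuck threads would erode throughput over time. It is a toy model, not Nutch code: the function name, the per-page hang probability, and the assumption that a hung thread never recovers are all hypothetical, chosen only for illustration.

```python
def expected_live_threads(threads: int, hang_probability: float, pages_per_thread: int) -> float:
    """Expected number of threads still fetching after each has tried pages_per_thread pages.

    Assumptions (hypothetical, for illustration only):
    - every page independently hangs its thread with probability hang_probability
    - a hung thread never recovers, so it stops contributing to the crawl rate
    """
    if threads < 0 or pages_per_thread < 0:
        raise ValueError("threads and pages_per_thread must be non-negative")
    if not 0.0 <= hang_probability <= 1.0:
        raise ValueError("hang_probability must be between 0 and 1")
    # A thread survives only if none of the pages it has tried so far hung it.
    return threads * (1.0 - hang_probability) ** pages_per_thread


if __name__ == "__main__":
    # 10 threads, as in the original report; 0.001 is a made-up hang probability.
    for pages in (0, 100, 500, 1000, 2000):
        live = expected_live_threads(10, 0.001, pages)
        print(f"{pages:5d} pages per thread: about {live:.1f} of 10 threads still working")
```

If the crawl rate is roughly proportional to the number of working threads, a steady decline over many hours is what this model predicts, while a slowdown from an external network problem would not necessarily follow that shape.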

You might also try more threads. I've noticed that it sometimes takes a few minutes for the fetcher to "settle down", so its initial performance is not representative of the overall rate.

Doug

[EMAIL PROTECTED] wrote:
Hello,

I am using Nutch 0.5 and am wondering whether anyone has noticed that
the fetcher sometimes slows down continuously from the moment it is
started?

I am using 10 threads, and I noticed that the fetcher started at
about 100KB/second, went up to 200KB/second, and then the crawling rate
started continuously going down. After half a day it was crawling at a
rate of 30KB/second.  The fetch list consists of a number of random
hosts, so I don't think this should be caused by the delay between
requests to the same host.  There was no other network traffic on my
server.  Of course, there could be something external to my machine and
network card, but I couldn't check that.

Has anyone seen this with Nutch?  Should I suspect Nutch, or something
local to my installation or even external to my machine?

Thanks,
Otis



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers

