I've been experimenting with Nutch 0.9 lately and have noticed something odd
that I can't explain.  I'm doing whole-web crawls using inject, generate,
and fetch with -noParsing and 200 threads, running the same commands on
the same machine, where I'm the only user.  I have a shell script that
estimates how many URLs are being fetched per hour by reading the
log file once a minute.
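
For anyone who wants to reproduce the measurement, here is a minimal Python
sketch of this kind of estimate. It is not my actual shell script. It assumes
the fetcher writes one timestamped "fetching <url>" line per URL in the
default log4j layout; the regex, the sample line, and the function name
urls_per_hour are illustrative assumptions, so adjust them to your own log.
Note that it counts fetch starts, not completed fetches.

```python
import re
from datetime import datetime

# Assumed log line shape (hypothetical; adjust to your log4j layout):
# 2007-06-05 12:00:01,123 INFO  fetcher.Fetcher - fetching http://example.com/
FETCH_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ .*\bfetching \S+"
)

def urls_per_hour(lines):
    """Estimate the fetch rate from timestamped 'fetching <url>' log lines."""
    stamps = []
    for line in lines:
        m = FETCH_LINE.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S"))
    if len(stamps) < 2:
        return 0.0  # not enough data points to measure an interval
    elapsed = (max(stamps) - min(stamps)).total_seconds()
    if elapsed <= 0:
        return 0.0
    # n timestamps span n - 1 fetch intervals
    return (len(stamps) - 1) * 3600.0 / elapsed
```

Feeding it only the lines appended to the log since the previous read gives a
per-minute rate; feeding it the whole log gives the average for the run.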

If I fetch a list of 50k URLs I average ~120k URLs per hour.  With a
fetch list of 500k URLs I get ~170k/hour, and with a fetch list of 1M
URLs I average only ~40k/hour.
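
To put those rates in terms of wall-clock time, here is a quick calculation.
It assumes each average rate holds for the entire run, which is a
simplification; the numbers themselves are the ones quoted above.

```python
# (fetch list size in URLs, observed average rate in URLs/hour)
runs = [(50_000, 120_000), (500_000, 170_000), (1_000_000, 40_000)]

def hours_to_finish(list_size, rate_per_hour):
    """Time to fetch the whole list if the average rate held throughout."""
    return list_size / rate_per_hour

durations = {size: hours_to_finish(size, rate) for size, rate in runs}

for size, hours in durations.items():
    print(f"{size:>9,} URLs: {hours:.2f} h")
```

So the 50k list finishes in under half an hour, the 500k list in about three
hours, and the 1M list takes roughly a full day.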

I can't think of any reason I would be seeing a speed variance like this
based solely on the size of the fetch list.  Any ideas?  For now I'm
sticking with a fetch list size of 500k (obviously), but I'd like to know
why I'm seeing this variance.

Thanks,
Tim