On 12/01/2004 01:42 PM, Luke Baker wrote:
Here's a patch that will allow users to configure how many threads may access the same host at the same time. Right now, Nutch only allows one thread at a time to access any given host. The default will still be 1 thread per host.
The somewhat fuzzy part of this is that it will wait fetcher.server.delay only when the last thread accessing a host is popped off. With 1 thread per host, this behaves identically to the current code.
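To make the described behaviour concrete, here is a minimal Java sketch, assuming a simple monitor-based design. The names `HostThrottle`, `acquire`, `release`, `maxThreadsPerHost` and `serverDelayMs` are hypothetical and are not the code in the patch. Up to `maxThreadsPerHost` threads may hold a host at once, and the `fetcher.server.delay` pause starts only when the last of them releases it. With a limit of 1, every release is the last one, so this reduces to one-at-a-time access with a delay after each fetch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-host throttle; names are illustrative, not the patch's actual code.
class HostThrottle {
    private final int maxThreadsPerHost;
    private final long serverDelayMs;
    // Threads currently fetching from each host.
    private final Map<String, Integer> active = new HashMap<>();
    // Earliest wall-clock time (ms) at which a new fetch from each host may start.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    HostThrottle(int maxThreadsPerHost, long serverDelayMs) {
        if (maxThreadsPerHost < 1) {
            throw new IllegalArgumentException("maxThreadsPerHost must be >= 1");
        }
        this.maxThreadsPerHost = maxThreadsPerHost;
        this.serverDelayMs = serverDelayMs;
    }

    // Blocks until the calling thread may fetch from host.
    synchronized void acquire(String host) throws InterruptedException {
        while (true) {
            int count = active.getOrDefault(host, 0);
            long remaining = nextAllowed.getOrDefault(host, 0L) - System.currentTimeMillis();
            if (count < maxThreadsPerHost && remaining <= 0) {
                active.put(host, count + 1);
                return;
            }
            if (count >= maxThreadsPerHost) {
                wait();           // host saturated: wait for some release
            } else {
                wait(remaining);  // host idle but still inside its delay window
            }
        }
    }

    // Called when a thread finishes fetching from host.
    synchronized void release(String host) {
        int count = active.getOrDefault(host, 0);
        if (count == 0) {
            throw new IllegalStateException("release without acquire: " + host);
        }
        if (count == 1) {
            // Last thread off this host: only now does the server delay start.
            active.remove(host);
            nextAllowed.put(host, System.currentTimeMillis() + serverDelayMs);
        } else {
            active.put(host, count - 1);
        }
        notifyAll();
    }

    synchronized int activeCount(String host) {
        return active.getOrDefault(host, 0);
    }
}
```

Each fetcher thread would call `acquire(host)` before fetching and `release(host)` in a `finally` block afterwards. Because the delay timer is started only on the final release, threads that overlap on a host do not pause between their own requests; pausing after every request instead would mean setting `nextAllowed` on every release.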
Any thoughts on committing this to CVS? I've used it for a crawl of several million documents, and it worked great. This will be a great benefit for those doing intranet crawls whose infrastructure can afford a fast-crawling Nutch.
Sorry. I somehow missed this when you first posted it. This looks like a great addition!
I'm not really sure how fetcher.server.delay should work when multiple threads are permitted to access a host. I guess it might be better if each request paused, but I think the behaviour you implement is acceptable.
I took the liberty of making a few simplifications and cosmetic changes, ran the unit tests and committed it. Can you please check that it still works for you as committed?
Thanks for the patch!
Doug
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
