Hey,
Here's a patch that lets users configure how many threads may access the same host at the same time. Right now Nutch only allows one thread at a time to access any given host. The default will still be 1 thread per host.
The somewhat fuzzy part is that it waits for fetcher.server.delay only when the last thread accessing a host finishes and is popped off. With 1 thread per host, this gives behavior identical to the current one.
Let me know what you think. I've tested it a little and it seems to work as it is supposed to.
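To make the accounting concrete, here is a rough sketch of the idea, not the patch itself. The class name HostLimiter, its method names, and the maxThreadsPerHost parameter are hypothetical and chosen only for illustration; the delay argument stands in for fetcher.server.delay. Time is passed in explicitly so the behavior is easy to check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks active fetches per host and applies the
// politeness delay only when the last active fetch on a host finishes.
final class HostLimiter {
    private final int maxThreadsPerHost;   // assumed config value, default 1
    private final long serverDelayMs;      // stands in for fetcher.server.delay
    private final Map<String, Integer> active = new HashMap<>();
    private final Map<String, Long> blockedUntil = new HashMap<>();

    HostLimiter(int maxThreadsPerHost, long serverDelayMs) {
        this.maxThreadsPerHost = maxThreadsPerHost;
        this.serverDelayMs = serverDelayMs;
    }

    // Returns true if a thread may start fetching from this host now.
    synchronized boolean tryAcquire(String host, long nowMs) {
        int count = active.getOrDefault(host, 0);
        if (count >= maxThreadsPerHost) {
            return false;                  // host already at its thread limit
        }
        if (count == 0) {
            Long until = blockedUntil.get(host);
            if (until != null && nowMs < until) {
                return false;              // still inside the delay window
            }
            blockedUntil.remove(host);
        }
        active.put(host, count + 1);
        return true;
    }

    // Called when a thread finishes a fetch from this host.
    synchronized void release(String host, long nowMs) {
        Integer count = active.get(host);
        if (count == null) {
            throw new IllegalStateException("no active fetch for " + host);
        }
        if (count > 1) {
            active.put(host, count - 1);   // others still active: no delay
        } else {
            active.remove(host);           // last thread popped off: start delay
            blockedUntil.put(host, nowMs + serverDelayMs);
        }
    }
}
```

With maxThreadsPerHost set to 1, every release is the last one, so the delay applies after each fetch, which matches the current single-thread behavior.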
Thanks,
Luke Baker
Hey,
Does anybody have thoughts on committing this to CVS? I've used it for a crawl of several million documents, and it worked great. This will be a great benefit for those doing intranet crawls whose infrastructure can handle a fast-crawling Nutch.
Thoughts?
Thanks,
Luke Baker
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
