Please ignore this question, and sorry for wasting your time. I was watching the stdout from Nutch without realising that the fetches were in fact being queued. A batch of fetch messages is logged for each URL on the page just visited, but those requests are then queued and processed at the appropriate time interval.
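For anyone who hits the same confusion, here is a rough sketch of the behaviour I think I was seeing. It is a toy model, not Nutch's actual code: the function name `schedule_fetches`, its parameters, and the example URLs are all made up for illustration, and I am assuming that every URL is queued (and logged) at time 0 and that each host handles one request at a time, with the delay counted from the end of the previous request.

```python
from collections import defaultdict
from urllib.parse import urlparse


def schedule_fetches(urls, server_delay=4.0, fetch_duration=0.0):
    """Return (url, start_time) pairs for a toy per-host politeness queue.

    Assumptions (illustrative only, not taken from Nutch): all URLs are
    queued at time 0, each host serves one request at a time, and the next
    request to a host may start server_delay seconds after the previous
    request to that host has finished.
    """
    # host -> earliest time (in seconds) at which the next request may start
    next_allowed = defaultdict(float)
    schedule = []
    for url in urls:
        host = urlparse(url).netloc
        start = next_allowed[host]
        schedule.append((url, start))
        # The host stays blocked for the fetch itself plus the politeness delay.
        next_allowed[host] = start + fetch_duration + server_delay
    return schedule


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(3)]
    urls.append("http://example.org/")
    for url, start in schedule_fetches(urls):
        print(f"t={start:4.1f}s  {url}")
```

In this model the number of fetcher threads does not change the spacing for a single host: the URLs all appear in the log at once, but the requests to any one host still start at least `server_delay` seconds apart.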
-----Original Message-----
From: Joel Halbert <[email protected]>
Reply-To: [email protected]
To: nutch users <[email protected]>
Subject: N 0.9 - fetcher.threads.per.host
Date: Tue, 28 Apr 2009 17:42:29 +0100

Hi,

I have noticed that the following settings do not interplay as I expected:

fetcher.threads.fetch
fetcher.threads.per.host

Assuming that I have the following settings:

fetcher.server.delay = 4
fetcher.threads.fetch = 10
fetcher.threads.per.host = 1

then I assumed that the min time between requests to an individual host would be 4 seconds. However it seems that fetcher.threads.per.host is being applied on a per thread basis. If I only have one site in my list of urls to crawl then it appears that 10 fetcher threads are created anyway and they all make concurrent requests of the site.

Is this expected or have I misunderstood how these settings are to be used?

Thanks,
Joel
