Please ignore this question, and sorry for wasting everyone's time.

I was watching the stdout output from Nutch without realising that the
fetches were in fact being queued. A batch of fetch messages is logged
for each URL on the page just visited, but those URLs are then queued
and processed at the appropriate time interval.
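
For anyone who hits the same confusion, here is a rough sketch of the
behaviour as I now understand it. This is a toy model, not Nutch's
actual Fetcher code: the class PoliteFetchQueues and its methods are
made up for illustration, the delay is measured from the start of each
request for simplicity, and time is passed in explicitly so the example
is deterministic. URLs are queued immediately (which is when the log
lines appear), but each host only releases one URL per delay interval.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a toy model of per-host fetch queues, NOT Nutch's real code.
public class PoliteFetchQueues {
    // Plays the role of fetcher.server.delay, in milliseconds (assumption).
    private final long delayMs;
    // One FIFO queue of pending URLs per host.
    private final Map<String, Deque<String>> queues = new HashMap<>();
    // Earliest time at which each host may be contacted again.
    private final Map<String, Long> nextAllowed = new HashMap<>();

    public PoliteFetchQueues(long delayMs) {
        this.delayMs = delayMs;
    }

    // Queueing is immediate, so many URLs can be "seen" at the same moment.
    public void enqueue(String host, String url) {
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    // Returns the next URL for the host if its delay has elapsed at nowMs,
    // otherwise null (nothing queued, or the host is still "resting").
    public String poll(String host, long nowMs) {
        Deque<String> q = queues.get(host);
        if (q == null || q.isEmpty()) {
            return null;
        }
        if (nowMs < nextAllowed.getOrDefault(host, 0L)) {
            return null;
        }
        nextAllowed.put(host, nowMs + delayMs);
        return q.poll();
    }

    public static void main(String[] args) {
        PoliteFetchQueues queues = new PoliteFetchQueues(4000);
        // Three URLs are queued at once, as if found on one page.
        for (int i = 1; i <= 3; i++) {
            queues.enqueue("example.com", "http://example.com/page" + i);
        }
        // Simulated clock ticking once per second.
        for (long t = 0; t <= 8000; t += 1000) {
            String url = queues.poll("example.com", t);
            if (url != null) {
                System.out.println(t + " ms: fetch " + url);
            }
        }
    }
}
```

With a 4000 ms delay, the three queued URLs are released at 0, 4000 and
8000 ms on the simulated clock, even though all three were queued at
the same moment.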


-----Original Message-----
From: Joel Halbert <[email protected]>
Reply-To: [email protected]
To: nutch users <[email protected]>
Subject: N 0.9 - fetcher.threads.per.host
Date: Tue, 28 Apr 2009 17:42:29 +0100

Hi,

I have noticed that the following settings do not interplay as I
expected:

fetcher.threads.fetch
fetcher.threads.per.host

Assuming that I have the following settings:

fetcher.server.delay = 4
fetcher.threads.fetch = 10
fetcher.threads.per.host = 1

then I assumed that the minimum time between requests to an individual
host would be 4 seconds. However, it seems that fetcher.threads.per.host
is being applied on a per-thread basis. If I have only one site in my
list of URLs to crawl, it appears that 10 fetcher threads are created
anyway and they all make concurrent requests to the site.

Is this expected or have I misunderstood how these settings are to be
used?

Thanks,

Joel    
