On Wed, Aug 26, 2009 at 9:43 AM, Paul Tomblin<[email protected]> wrote:
> On Wed, Aug 26, 2009 at 10:34 AM, Ken
> Krugler<[email protected]> wrote:
>> If the sites you are crawling are under your control, or you have an
>> understanding with the site ops people, then you can alter Nutch's default
>> settings to make it run at near full speed.
>
> What settings would those be?  I tried increasing the number of
> threads from 10 to 125, but it had absolutely no discernible effect on
> the crawl speed.
>

Paul,

   I'd read the nutch-default.xml file. I believe the properties you'd
like to examine start in the section labelled <!-- fetcher properties
-->:

fetcher.threads.per.host
fetcher.server.delay
fetcher.server.min.delay
fetcher.max.crawl.delay


I'm guessing there are others, but those four looked the most closely
related. Spending a bit of time reading the descriptions in
conf/nutch-default.xml is very helpful for tracking these things down.
Override those values in conf/nutch-site.xml rather than changing
nutch-default.xml directly (at least that's what everything I've read
recommends).
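
As a rough sketch, an override in conf/nutch-site.xml might look
something like the following. The four property names are the ones
listed above; the values are purely illustrative guesses on my part,
not tested recommendations, and they only make sense for sites you
control or have permission to crawl hard. Check each property's
description in conf/nutch-default.xml for its units and default before
changing it.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- All values below are hypothetical examples, not recommendations. -->

  <!-- Allow more than one concurrent fetch against the same host. -->
  <property>
    <name>fetcher.threads.per.host</name>
    <value>10</value>
  </property>

  <!-- Shorten the delay between successive requests to one server. -->
  <property>
    <name>fetcher.server.delay</name>
    <value>0.5</value>
  </property>

  <!-- Minimum per-server delay; see its description for when it applies. -->
  <property>
    <name>fetcher.server.min.delay</name>
    <value>0.0</value>
  </property>

  <!-- Upper bound on the crawl delay the fetcher will tolerate. -->
  <property>
    <name>fetcher.max.crawl.delay</name>
    <value>30</value>
  </property>
</configuration>
```

If raising the overall thread count made no difference, the per-host
settings are the likelier bottleneck when most of the URLs live on a
small number of hosts.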

Thanks,
    Kirby


> --
> http://www.linkedin.com/in/paultomblin
>
