I've found that the best speed increases I've made have come from boosting the IO
by upgrading the HD.  I use iSCSI with a ZFS file system.  Another improvement
comes if your hardware handles lots of threads.
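
A rough way to see why adding threads alone may not push CPU load up: if each
fetch mostly waits (on DNS, the remote site, or disk), the threads sit blocked
and use almost no CPU. Below is a minimal Python sketch of that effect. It is
not Nutch code; fake_fetch, run_crawl, and the sleep times are made-up names
and numbers for illustration only, with time.sleep standing in for the wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(wait_seconds):
    """Hypothetical stand-in for one page fetch: the thread just blocks,
    as it would while waiting on DNS or a slow web server."""
    time.sleep(wait_seconds)
    return wait_seconds


def run_crawl(num_pages, num_threads, wait_seconds=0.05):
    """Run num_pages fake fetches on num_threads worker threads.

    Returns (wall_seconds, cpu_seconds) for the whole batch.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the process, all threads
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() forces every fetch to finish before we stop the clocks
        list(pool.map(fake_fetch, [wait_seconds] * num_pages))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    for threads in (1, 10, 20):
        wall, cpu = run_crawl(num_pages=20, num_threads=threads)
        print(f"{threads:3d} threads: wall={wall:.2f}s cpu={cpu:.2f}s")
```

In a run like this, wall time should drop as threads are added while CPU time
stays small, which is roughly what a fetcher limited by waiting rather than by
computation would look like.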

Alex


--- On Tue, 4/28/09, Raymond Balmès <[email protected]> wrote:

> From: Raymond Balmès <[email protected]>
> Subject: Re: dual core and crawling
> To: [email protected]
> Date: Tuesday, April 28, 2009, 10:54 AM
> the full web.
> 
> 2009/4/28 Dennis Kubes <[email protected]>
> 
> > Are you crawling only certain domains or doing a full web crawl?
> >
> > Dennis
> >
> >
> > Raymond Balmès wrote:
> >
> >> Actually, even if I put 100 threads it does not go faster. I have a
> >> 30 Mbit/s fiber internet connection, so that shouldn't be the problem.
> >>
> >> I thought if I put more threads I could fetch more sites in parallel
> >> and so use more of the bandwidth & the CPU... so waiting on DNS should be
> >> seen.
> >> Or is it that I need to run multiple fetchers in parallel? But I'm not
> >> sure how to do that and merge the results back at the end.
> >>
> >> -Ray-
> >>
> >> 2009/4/28 Dennis Kubes <[email protected]>
> >>
> >>> Java Threads do take advantage of multiple cores.  The fetcher does use
> >>> multiple threads.  Also having multiple fetcher tasks on a single machine
> >>> will utilize more of the CPU.  Even with 50 threads on a single machine,
> >>> depending on the websites being crawled the utilization might not get
> >>> that much higher.  Much of the time spent in fetching is spent waiting
> >>> on DNS and the websites being fetched.
> >>>
> >>> Dennis
> >>>
> >>>
> >>> Raymond Balmès wrote:
> >>>
> >>>> I use a dual-core Intel. I observed that the crawl never gets above
> >>>> the 50% mark for CPU load, despite the fact that I used -threads 50...
> >>>> does Nutch take advantage of multi-cores?
> >>>> Am I missing a setting somewhere?
> >>>>
> >>>> -Ray-
> >>>>
> >>>>
> >>>>
> >>


