Well, I also monitored the HD bandwidth: pretty normal, nothing really special.
Only during the crawldb merge phase is there saturation, and it is pretty short.
What do you call lots of threads? More than 100?

-Ray-

2009/4/28 Alex Basa <[email protected]>

>
> I've found that the best speed increases I've made came from boosting the IO by
> upgrading the HD.  I use iSCSI with a ZFS file system.  Another
> improvement is if your hardware handles lots of threads.
>
> Alex
>
>
> --- On Tue, 4/28/09, Raymond Balmès <[email protected]> wrote:
>
> > From: Raymond Balmès <[email protected]>
> > Subject: Re: dual core and crawling
> > To: [email protected]
> > Date: Tuesday, April 28, 2009, 10:54 AM
> > The full web.
> >
> > 2009/4/28 Dennis Kubes <[email protected]>
> >
> > > Are you crawling only certain domains or doing a full web crawl?
> > >
> > > Dennis
> > >
> > >
> > > Raymond Balmès wrote:
> > >
> > >> Actually, even if I put 100 threads it does not go faster. I have a 30 Mbit/s
> > >> fiber internet connection, so that shouldn't be the problem.
> > >>
> > >> I thought that with more threads I could fetch more sites in parallel
> > >> and so use more of the bandwidth & the CPU... so waiting on DNS should be
> > >> seen.
> > >> Or is it that I need to run multiple fetchers in parallel? But I'm not sure
> > >> how to do that and merge the results back at the end.
> > >>
> > >> -Ray-
> > >>
> > >> 2009/4/28 Dennis Kubes <[email protected]>
> > >>
> > >>> Java threads do take advantage of multiple cores.  The fetcher does use
> > >>> multiple threads.  Also, having multiple fetcher tasks on a single machine
> > >>> will utilize more of the CPU.  Even with 50 threads on a single machine,
> > >>> depending on the websites being crawled, the utilization might not get that
> > >>> much higher.  Much of the time spent in fetching is spent waiting on DNS
> > >>> and the websites being fetched.
> > >>>
> > >>> Dennis
> > >>>
> > >>>
> > >>> Raymond Balmès wrote:
> > >>>
> > >>>> I use a dual-core Intel. I observed that the crawl never gets above the 50% mark
> > >>>> for CPU load, despite the fact that I used -threads 50... Does Nutch take
> > >>>> advantage of multi-cores?
> > >>>> Am I missing a setting somewhere?
> > >>>>
> > >>>> -Ray-
> > >>>>
> > >>>>
> > >>>>
> > >>
>
>
>
>

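To illustrate the point quoted above, that most fetch time is spent waiting on DNS and on the remote sites rather than on the CPU, here is a minimal Java sketch. It is not Nutch's fetcher: the class `IoBoundFetchSketch` and the methods `fakeFetch` and `simulate` are hypothetical names made up for this example, and `Thread.sleep` merely stands in for the network wait.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class IoBoundFetchSketch {

    // Hypothetical stand-in for one fetch: the thread only waits, as it would
    // on a DNS lookup or a slow remote server, and uses almost no CPU.
    public static void fakeFetch(long waitMillis) throws InterruptedException {
        Thread.sleep(waitMillis);
    }

    // Runs `tasks` fake fetches on a fixed pool of `threads` threads and
    // returns the elapsed wall-clock time in milliseconds.
    public static long simulate(int threads, int tasks, long waitMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                pending.add(pool.submit(() -> {
                    fakeFetch(waitMillis);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // block until this fetch has finished
            }
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // Same workload (100 fake fetches of 50 ms each) at several pool sizes.
        for (int threads : new int[] {1, 10, 50, 100}) {
            System.out.println(threads + " threads: "
                    + simulate(threads, 100, 50) + " ms");
        }
    }
}
```

In this toy setup the wall-clock time should drop as threads are added, because the waits overlap, while CPU load stays low at every pool size. A real crawl stops scaling once something else becomes the limit, such as bandwidth, DNS resolution or per-host politeness delays, which would be consistent with 100 threads not going faster than 50.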