Well, I also monitored the HD bandwidth: it looks pretty normal, nothing really special. Only during the crawldb merge phase is there saturation, and it is pretty short. What do you call "lots of threads"? More than 100?
-Ray-

2009/4/28 Alex Basa <[email protected]>

> I've found that the best speed increases I've made are in boosting the IO by
> upgrading the HD. I use iSCSI with a ZFS file system. Another
> improvement is if your hardware handles lots of threads.
>
> Alex
>
> --- On Tue, 4/28/09, Raymond Balmès <[email protected]> wrote:
>
> > From: Raymond Balmès <[email protected]>
> > Subject: Re: dual core and crawling
> > To: [email protected]
> > Date: Tuesday, April 28, 2009, 10:54 AM
> >
> > the full web.
> >
> > 2009/4/28 Dennis Kubes <[email protected]>
> >
> > > Are you crawling only certain domains or doing a full web crawl?
> > >
> > > Dennis
> > >
> > > Raymond Balmès wrote:
> > >
> > >> Actually even if I put 100 threads it does not go faster. I have a
> > >> 30Mbit/s fiber internet connection so that shouldn't be the problem.
> > >>
> > >> I thought if I would put more threads I could fetch more sites in
> > >> parallel and so use more of the bandwidth & the CPU... so waiting
> > >> on DNS should be seen.
> > >> Or is it that I need to run multiple fetchers in parallel, but I'm
> > >> not sure how to do that and merge the results back at the end.
> > >>
> > >> -Ray-
> > >>
> > >> 2009/4/28 Dennis Kubes <[email protected]>
> > >>
> > >>> Java threads do take advantage of multiple cores. The fetcher does
> > >>> use multiple threads. Also having multiple fetcher tasks on a
> > >>> single machine will utilize more of the CPU. Even with 50 threads
> > >>> on a single machine, depending on the websites being crawled the
> > >>> utilization might not get that much higher. Much of the time spent
> > >>> in fetching is spent waiting on DNS and the websites being fetched.
> > >>>
> > >>> Dennis
> > >>>
> > >>> Raymond Balmès wrote:
> > >>>
> > >>>> I use a dual-core Intel. I observed the crawl never gets above
> > >>>> the 50% mark for CPU load, despite the fact that I used
> > >>>> -threads 50... does Nutch take advantage of multi-cores?
> > >>>> Am I missing a setting somewhere?
> > >>>>
> > >>>> -Ray-
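
Dennis's point in the quoted thread, that fetcher threads spend most of their time waiting on DNS and on the remote sites, can be illustrated with a small Java sketch. This is not Nutch code: the class IoBoundFetchSketch and the methods simulatedFetch and runFetches are hypothetical names, and each "fetch" is just a Thread.sleep standing in for network wait. Under those assumptions it shows why adding threads shortens wall-clock time while leaving the CPU mostly idle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class IoBoundFetchSketch {

    // Hypothetical stand-in for one fetch: the thread only sleeps, as it
    // would while blocked on DNS or a slow web server. No CPU work is done.
    public static void simulatedFetch(long waitMillis) throws InterruptedException {
        Thread.sleep(waitMillis);
    }

    // Runs `tasks` simulated fetches on a pool of `threads` threads and
    // returns the elapsed wall-clock time in milliseconds.
    public static long runFetches(int threads, int tasks, long waitMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                // Returning null makes this lambda a Callable, so it may
                // throw the checked InterruptedException from the sleep.
                futures.add(pool.submit(() -> {
                    simulatedFetch(waitMillis);
                    return null;
                }));
            }
            // Wait for every simulated fetch to finish.
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        int tasks = 50;
        long waitMillis = 20;
        // Same amount of "work" with 1, 10 and 50 threads.
        for (int threads : new int[] {1, 10, 50}) {
            long elapsed = runFetches(threads, tasks, waitMillis);
            System.out.println(threads + " threads: " + elapsed + " ms");
        }
    }
}
```

In this sketch the elapsed time drops as threads are added, yet a CPU monitor would show little load in every case, because sleeping (like waiting on the network) costs almost no CPU. A real crawl also does parsing and other CPU work, so the numbers here say nothing about Nutch's actual throughput.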
