The biggest speed increases I've seen came from boosting I/O by upgrading the hard disk. I use iSCSI with a ZFS file system. It also helps if your hardware handles lots of threads well.
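
For what it's worth, here is a rough sketch of Dennis's point in the quoted thread that fetch threads spend most of their time waiting rather than computing. It is not Nutch code: `fake_fetch` and `timed_crawl` are hypothetical names, and `time.sleep` only stands in for DNS and HTTP latency, which is an assumption about how a real fetch behaves.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url, wait_seconds=0.05):
    """Hypothetical stand-in for a page fetch: sleeps instead of doing DNS + HTTP."""
    time.sleep(wait_seconds)  # the thread is blocked here and uses almost no CPU
    return url


def timed_crawl(urls, threads):
    """Fetch all urls with a pool of `threads` workers; return (wall, cpu) seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(fake_fetch, urls))
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    assert len(results) == len(urls)
    return wall, cpu


if __name__ == "__main__":
    # Made-up URLs; nothing is actually requested over the network.
    urls = ["http://example.com/page%d" % i for i in range(40)]
    for threads in (1, 10, 40):
        wall, cpu = timed_crawl(urls, threads)
        print("threads=%2d  wall=%.2fs  cpu=%.3fs" % (threads, wall, cpu))
```

In this toy setup the wall-clock time should drop as threads are added while the CPU time stays small, which is consistent with a crawl that never loads the CPU heavily. If adding threads stops helping on a real crawl, the limit is probably somewhere other than the CPU.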
Alex

--- On Tue, 4/28/09, Raymond Balmès <[email protected]> wrote:

> From: Raymond Balmès <[email protected]>
> Subject: Re: dual core and crawling
> To: [email protected]
> Date: Tuesday, April 28, 2009, 10:54 AM
>
> the full web.
>
> 2009/4/28 Dennis Kubes <[email protected]>
>
> > Are you crawling only certain domains or doing a full web crawl?
> >
> > Dennis
> >
> > Raymond Balmès wrote:
> >
> >> Actually even if I put 100 threads it does not go faster. I have a
> >> 30Mbit/s fiber internet connection so that shouldn't be the problem.
> >>
> >> I thought if I would put more threads I could fetch more sites in
> >> parallel and so use more of the bandwidth & the CPU... so waiting on
> >> DNS should be seen.
> >> Or is it that I need to run multiple fetchers in parallel, but I'm not
> >> sure how to do that and merge the results back at the end.
> >>
> >> -Ray-
> >>
> >> 2009/4/28 Dennis Kubes <[email protected]>
> >>
> >>> Java Threads do take advantage of multiple cores. The fetcher does
> >>> use multiple threads. Also having multiple fetcher tasks on a single
> >>> machine will utilize more of the CPU. Even with 50 threads on a
> >>> single machine, depending on the websites being crawled the
> >>> utilization might not get that much higher. Much of the time spent
> >>> in fetching is spent waiting on DNS and the websites being fetched.
> >>>
> >>> Dennis
> >>>
> >>> Raymond Balmès wrote:
> >>>
> >>>> I use a dual core Intel. I observed the crawl never gets above the
> >>>> 50% mark of CPU load, despite the fact that I used -threads 50...
> >>>> does nutch take advantage of multi-cores?
> >>>> Do I miss a setting somewhere?
> >>>>
> >>>> -Ray-
