I have no idea; I'm using the default configuration, so probably yes, I only have one DNS server. Will look into it.
-Raymond-

2009/4/29 Dennis Kubes <[email protected]>

> How are you doing your DNS for fetching? If you have a single server
> handling DNS requests for you, you may be overloading it and causing an
> inadvertent DoS attack on it.
>
> Dennis
>
> Raymond Balmès wrote:
>
>> I think I'm just hitting the same issues as reported in this other thread,
>> and since I'm new to Nutch I can't compare:
>> http://www.mail-archive.com/[email protected]/msg13665.html
>> by Doğacan Güne
>>
>> At the beginning of a fetch cycle everything goes damn fast, and then it
>> keeps slowing down and "spinwaiting". Something looks wrong.
>>
>> -Ray-
>>
>> 2009/4/28 Raymond Balmès <[email protected]>
>>
>>> Well, I also monitored the HD bandwidth: pretty normal, nothing really
>>> special. Only during the crawldb merge phase is there a saturation, and
>>> it is pretty short.
>>> What do you call lots of threads? More than 100?
>>>
>>> -Ray-
>>>
>>> 2009/4/28 Alex Basa <[email protected]>
>>>
>>>> I've found that the best speed increases I've made came from boosting
>>>> the IO by upgrading the HD. I use iSCSI with a ZFS file system. Another
>>>> improvement is if your hardware handles lots of threads.
>>>>
>>>> Alex
>>>>
>>>> --- On Tue, 4/28/09, Raymond Balmès <[email protected]> wrote:
>>>>
>>>>> From: Raymond Balmès <[email protected]>
>>>>> Subject: Re: dual core and crawling
>>>>> To: [email protected]
>>>>> Date: Tuesday, April 28, 2009, 10:54 AM
>>>>>
>>>>> The full web.
>>>>>
>>>>> 2009/4/28 Dennis Kubes <[email protected]>
>>>>>
>>>>>> Are you crawling only certain domains or doing a full web crawl?
>>>>>>
>>>>>> Dennis
>>>>>>
>>>>>> Raymond Balmès wrote:
>>>>>>
>>>>>>> Actually, even if I use 100 threads it does not go faster. I have a
>>>>>>> 30 Mbit/s fiber internet connection, so that shouldn't be the problem.
>>>>>>> I thought that with more threads I could fetch more sites in parallel
>>>>>>> and so use more of the bandwidth & the CPU... so waiting on DNS should
>>>>>>> be seen.
>>>>>>> Or do I need to run multiple fetchers in parallel? I'm not sure how to
>>>>>>> do that and merge the results back at the end.
>>>>>>>
>>>>>>> -Ray-
>>>>>>>
>>>>>>> 2009/4/28 Dennis Kubes <[email protected]>
>>>>>>>
>>>>>>>> Java threads do take advantage of multiple cores, and the fetcher
>>>>>>>> does use multiple threads. Also, running multiple fetcher tasks on a
>>>>>>>> single machine will utilize more of the CPU. Even with 50 threads on
>>>>>>>> a single machine, depending on the websites being crawled, the
>>>>>>>> utilization might not get that much higher: much of the time spent
>>>>>>>> fetching is spent waiting on DNS and on the websites being fetched.
>>>>>>>>
>>>>>>>> Dennis
>>>>>>>>
>>>>>>>> Raymond Balmès wrote:
>>>>>>>>
>>>>>>>>> I use a dual-core Intel, and I observed that the crawls never get
>>>>>>>>> above the 50% CPU load mark, despite the fact that I used
>>>>>>>>> -threads 50... Does Nutch take advantage of multiple cores?
>>>>>>>>> Am I missing a setting somewhere?
>>>>>>>>>
>>>>>>>>> -Ray-

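On Dennis's DNS question: a common way to keep many fetcher threads from hammering one DNS server is to cache resolved addresses for a while (or to run a local caching resolver). The following is a minimal illustrative sketch, not Nutch code: `CachingResolver` and its `ttl_s` parameter are hypothetical, and the upstream resolve function is injectable so the example runs offline (in real use it could be `socket.gethostbyname`, which is the default here).

```python
import socket
import threading
import time
from typing import Callable


class CachingResolver:
    """Hypothetical host-name cache: each host is sent to the upstream
    resolver at most about once per `ttl_s` seconds, so many threads
    fetching from the same hosts do not flood a single DNS server."""

    def __init__(self, resolve: Callable[[str], str] = socket.gethostbyname,
                 ttl_s: float = 300.0) -> None:
        self._resolve = resolve
        self._ttl_s = ttl_s
        self._cache: dict[str, tuple[str, float]] = {}  # host -> (address, expiry)
        self._lock = threading.Lock()

    def lookup(self, host: str) -> str:
        now = time.monotonic()
        with self._lock:
            entry = self._cache.get(host)
            if entry is not None and entry[1] > now:
                return entry[0]  # still fresh: no upstream query
        # Query upstream outside the lock so one slow lookup does not block
        # threads resolving other hosts. Two threads missing on the same host
        # at the same moment may both query; the later result wins.
        address = self._resolve(host)
        with self._lock:
            self._cache[host] = (address, now + self._ttl_s)
        return address


if __name__ == "__main__":
    calls: list[str] = []

    def fake_resolve(host: str) -> str:
        # Offline stand-in for socket.gethostbyname that records each query.
        calls.append(host)
        return "192.0.2.1"

    resolver = CachingResolver(fake_resolve, ttl_s=60)
    for _ in range(100):
        resolver.lookup("example.com")
    print(len(calls))  # 1 upstream query instead of 100
```

The TTL is a trade-off: a longer one saves more queries but keeps stale addresses around longer if a site moves.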