I have no idea, I'm using the default configuration... so probably yes, I
only have one DNS server. Will look into it.

-Raymond-
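
To see how much a single DNS server is being hit, one rough check is to
put a small in-memory cache in front of the lookups and count how many
requests actually go upstream. This is only a sketch under assumptions:
the class DnsCacheSketch and its methods are hypothetical names and not
part of Nutch. Only InetAddress.getByName is the real JDK call. The JVM
also keeps its own lookup cache (see the networkaddress.cache.ttl
security property), so the numbers are indicative at best.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper, not part of Nutch: caches hostname lookups in memory
// so repeated fetches from the same host do not hit the DNS server again.
class DnsCacheSketch {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger upstreamLookups = new AtomicInteger();
    private final Function<String, String> resolver;

    // The resolver is pluggable so the cache can be exercised without a network.
    DnsCacheSketch(Function<String, String> resolver) {
        this.resolver = resolver;
    }

    // Default resolver: the JVM's own lookup via InetAddress.
    static String systemLookup(String host) {
        try {
            return InetAddress.getByName(host).getHostAddress();
        } catch (UnknownHostException e) {
            return "unresolved"; // assumption: failures are cached too
        }
    }

    // Returns the cached address, calling the resolver only on a cache miss.
    String resolve(String host) {
        return cache.computeIfAbsent(host, h -> {
            upstreamLookups.incrementAndGet();
            return resolver.apply(h);
        });
    }

    // Number of lookups that were not served from the cache.
    int upstreamLookups() {
        return upstreamLookups.get();
    }

    public static void main(String[] args) {
        DnsCacheSketch dns = new DnsCacheSketch(DnsCacheSketch::systemLookup);
        for (String host : new String[] {"localhost", "localhost", "localhost"}) {
            long start = System.nanoTime();
            String ip = dns.resolve(host);
            long micros = (System.nanoTime() - start) / 1000;
            System.out.printf("%s -> %s (%d us)%n", host, ip, micros);
        }
        System.out.println("upstream lookups: " + dns.upstreamLookups());
    }
}
```

If the upstream count stays close to the number of URLs fetched, the
crawl is spread over many distinct hosts and a local caching DNS server
is probably worth trying.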

2009/4/29 Dennis Kubes <[email protected]>

> How are you doing your DNS for fetching?  If you have a single server
> handling DNS requests for you, you may be overloading it and causing an
> inadvertent DoS attack on it.
>
> Dennis
>
>
> Raymond Balmès wrote:
>
>> I think I'm just hitting the same issues as reported in this other thread
>> and since I'm new to nutch I can't compare
>> http://www.mail-archive.com/[email protected]/msg13665.html
>> by Doğacan Güne
>>
>> At the beginning of a fetch cycle everything is going damn fast and then it
>> keeps slowing down and "spinwaiting"; something looks wrong.
>>
>> -Ray-
>>
>>
>>
>>
>> 2009/4/28 Raymond Balmès <[email protected]>
>>
>>> Well, I also monitored the HD bandwidth: pretty normal, nothing really
>>> special. Only during the crawldb merge phase is there saturation, and it
>>> is pretty short.
>>> What do you call lots of threads? More than 100?
>>>
>>> -Ray-
>>>
>>> 2009/4/28 Alex Basa <[email protected]>
>>>
>>>
>>>> I've found that the best speed increases I've made came from boosting the
>>>> IO by upgrading the HD.  I use iSCSI with a ZFS file system.  Another
>>>> improvement is if your hardware handles lots of threads.
>>>>
>>>> Alex
>>>>
>>>>
>>>> --- On Tue, 4/28/09, Raymond Balmès <[email protected]> wrote:
>>>>
>>>>> From: Raymond Balmès <[email protected]>
>>>>> Subject: Re: dual core and crawling
>>>>> To: [email protected]
>>>>> Date: Tuesday, April 28, 2009, 10:54 AM
>>>>>
>>>>> the full web.
>>>>
>>>>> 2009/4/28 Dennis Kubes <[email protected]>
>>>>>
>>>>>> Are you crawling only certain domains or doing a full web crawl?
>>>>>
>>>>>> Dennis
>>>>>>
>>>>>>
>>>>>> Raymond Balmès wrote:
>>>>>>
>>>>>>> Actually, even if I put 100 threads it does not go faster. I have a
>>>>>>> 30Mbit/s fiber internet connection so that shouldn't be the problem.
>>>>>>> I thought if I would put more threads I could fetch more sites in
>>>>>>> parallel and so use more of the bandwidth & the CPU... so waiting on
>>>>>>> DNS should be seen.
>>>>>>> Or is it that I need to run multiple fetchers in parallel? But I'm not
>>>>>>> sure how to do that and merge the results back at the end.
>>>>>>>
>>>>>>> -Ray-
>>>>>>>
>>>>>>> 2009/4/28 Dennis Kubes <[email protected]>
>>>>>>>
>>>>>>>> Java Threads do take advantage of multiple cores.  The fetcher does
>>>>>>>> use multiple threads.  Also having multiple fetcher tasks on a single
>>>>>>>> machine will utilize more of the CPU.  Even with 50 threads on a
>>>>>>>> single machine, depending on the websites being crawled the
>>>>>>>> utilization might not get that much higher.  Much of the time spent
>>>>>>>> in fetching is spent waiting on DNS and the websites being fetched.
>>>>>>>>
>>>>>>>> Dennis
>>>>>>>>
>>>>>>>>
>>>>>>>> Raymond Balmès wrote:
>>>>>>>>
>>>>>>>>> I use a dual-core Intel. I observed that the crawl never gets above
>>>>>>>>> the 50% mark of CPU load, despite the fact that I used -threads 50...
>>>>>>>>> Does Nutch take advantage of multi-cores?
>>>>>>>>> Do I miss a setting somewhere?
>>>>>>>>>
>>>>>>>>> -Ray-
>>>>>>>>>
