the full web.

2009/4/28 Dennis Kubes <[email protected]>

> Are you crawling only certain domains or doing a full web crawl?
>
> Dennis
>
>
> Raymond Balmès wrote:
>
>> Actually, even if I set 100 threads it does not go any faster. I have a
>> 30 Mbit/s fiber internet connection, so that shouldn't be the problem.
>>
>> I thought that with more threads I could fetch more sites in parallel
>> and so use more of the bandwidth and the CPU... so waiting on DNS should
>> be seen.
>> Or is it that I need to run multiple fetchers in parallel? I'm not sure
>> how to do that and merge the results back at the end.
>>
>> -Ray-
>>
>> 2009/4/28 Dennis Kubes <[email protected]>
>>
>>> Java threads do take advantage of multiple cores. The fetcher does use
>>> multiple threads. Also, having multiple fetcher tasks on a single machine
>>> will utilize more of the CPU. Even with 50 threads on a single machine,
>>> depending on the websites being crawled, the utilization might not get
>>> that much higher. Much of the time spent in fetching is spent waiting on
>>> DNS and the websites being fetched.
>>>
>>> Dennis
>>>
>>>
>>> Raymond Balmès wrote:
>>>
>>>> I use a dual-core Intel. I observed that the crawl never gets above the
>>>> 50% CPU load mark, despite the fact that I used -threads 50... Does Nutch
>>>> take advantage of multiple cores?
>>>> Am I missing a setting somewhere?
>>>>
>>>> -Ray-
>>>>
>>>>
>>>>
>>
