Are you crawling only certain domains or doing a full web crawl?

Dennis

Raymond Balmès wrote:
Actually, even if I use 100 threads it does not go any faster. I have a 30 Mbit/s
fiber internet connection, so that shouldn't be the problem.

I thought that with more threads I could fetch more sites in parallel
and so use more of the bandwidth and the CPU, so waiting on DNS should be
seen.
Or is it that I need to run multiple fetchers in parallel? I'm not sure how
to do that and merge the results back at the end.

-Ray-

2009/4/28 Dennis Kubes <[email protected]>

Java threads do take advantage of multiple cores.  The fetcher does use
multiple threads.  Also, having multiple fetcher tasks on a single machine
will utilize more of the CPU.  Even with 50 threads on a single machine,
depending on the websites being crawled, the utilization might not get that
much higher.  Much of the time spent in fetching is spent waiting on DNS and
the websites being fetched.
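
A minimal Java sketch of the point above, not Nutch code: each simulated
fetch just sleeps, standing in for the time a fetcher thread spends waiting
on DNS and the remote site. The class name FetchWaitSketch, the methods
simulatedFetch and runCrawl, and the 50 ms wait are all assumptions made up
for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class FetchWaitSketch {

    // Hypothetical stand-in for one page fetch: the thread only sleeps,
    // imitating time spent blocked on DNS and the remote web server.
    static int simulatedFetch(long waitMillis) throws InterruptedException {
        Thread.sleep(waitMillis);
        return 1; // one page "fetched"
    }

    // Runs `pages` simulated fetches on a pool of `threads` threads and
    // returns the elapsed wall-clock time in milliseconds.
    static long runCrawl(int threads, int pages, long waitMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < pages; i++) {
                results.add(pool.submit(() -> simulatedFetch(waitMillis)));
            }
            int fetched = 0;
            for (Future<Integer> f : results) {
                fetched += f.get(); // blocks until that fetch is done
            }
            if (fetched != pages) {
                throw new IllegalStateException("expected " + pages + " pages, got " + fetched);
            }
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws Exception {
        int pages = 8;
        long waitMillis = 50; // assumed per-page wait, purely illustrative
        System.out.println("1 thread : " + runCrawl(1, pages, waitMillis) + " ms");
        System.out.println("8 threads: " + runCrawl(8, pages, waitMillis) + " ms");
    }
}
```

Because the threads are asleep rather than computing, the run with more
threads should finish sooner while the CPU stays close to idle in both
cases, which is why a high thread count alone need not show up as high CPU
load.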

Dennis


Raymond Balmès wrote:

I use a dual-core Intel machine. I observed that the crawl never gets above
the 50% mark for CPU load, even though I used -threads 50. Does Nutch take
advantage of multiple cores?
Am I missing a setting somewhere?

-Ray-


