Actually, even if I use 100 threads it does not go any faster. I have a 30 Mbit/s fiber internet connection, so that shouldn't be the problem.
I thought that with more threads I could fetch more sites in parallel and so use more of the bandwidth and the CPU... so the time spent waiting on DNS should be masked. Or is it that I need to run multiple fetchers in parallel? I'm not sure how to do that and merge the results back at the end.

-Ray-

2009/4/28 Dennis Kubes <[email protected]>

> Java Threads do take advantage of multiple cores. The fetcher does use
> multiple threads. Also having multiple fetcher tasks on a single machine
> will utilize more of the CPU. Even with 50 threads on a single machine,
> depending on the websites being crawled the utilization might not get that
> much higher. Much of the time spent in fetching is spent waiting on DNS and
> the websites being fetched.
>
> Dennis
>
>
> Raymond Balmès wrote:
>
>> I use a dual core intel, I observed the crawls never gets above 50% mark
>> CPU load, despite the fact that used -threads 50... does nutch take
>> advantage of multi-cores ?
>> Do I miss a setting somewhere ?
>>
>> -Ray-
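To illustrate the point in the quoted reply that fetch threads spend most of their time waiting, here is a minimal Python sketch. It is not Nutch code: `simulated_fetch` and `run_batch` are hypothetical names, and `time.sleep` is only an assumed stand-in for DNS lookups and slow servers. It compares wall-clock time with CPU time for a batch of "fetches" run on 50 threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_fetch(wait_seconds: float) -> float:
    """Hypothetical stand-in for one page fetch.

    The thread only sleeps, as if blocked on DNS or a slow web server,
    so it consumes almost no CPU while it waits.
    """
    time.sleep(wait_seconds)
    return wait_seconds


def run_batch(num_pages: int, num_threads: int, wait_seconds: float = 0.1):
    """Run num_pages simulated fetches on num_threads threads.

    Returns (pages_fetched, wall_seconds, cpu_seconds).
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time used by this process

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(pool.map(simulated_fetch, [wait_seconds] * num_pages))

    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return len(results), wall, cpu


if __name__ == "__main__":
    pages, wall, cpu = run_batch(num_pages=100, num_threads=50)
    print(f"pages fetched: {pages}")
    print(f"wall time:     {wall:.3f} s")
    print(f"CPU time:      {cpu:.3f} s")
```

In this sketch the threads shorten the wall-clock time compared with fetching one page at a time, but the CPU time stays small because every thread is blocked while it waits. A real crawl also parses and writes data, so its CPU share would be higher than in this simulation.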
