the full web.

2009/4/28 Dennis Kubes <[email protected]>

> Are you crawling only certain domains or doing a full web crawl?
>
> Dennis
>
>
> Raymond Balmès wrote:
>
>> Actually, even if I put 100 threads it does not go faster. I have a
>> 30Mbit/s fiber internet connection, so that shouldn't be the problem.
>>
>> I thought if I would put more threads I could fetch more sites in
>> parallel and so use more of the bandwidth & the CPU... so waiting on
>> DNS should be seen.
>> Or is it that I need to run multiple fetchers in parallel? But I'm not
>> sure how to do that and merge the results back at the end.
>>
>> -Ray-
>>
>> 2009/4/28 Dennis Kubes <[email protected]>
>>
>>> Java threads do take advantage of multiple cores. The fetcher does use
>>> multiple threads. Also, having multiple fetcher tasks on a single
>>> machine will utilize more of the CPU. Even with 50 threads on a single
>>> machine, depending on the websites being crawled, the utilization
>>> might not get that much higher. Much of the time spent in fetching is
>>> spent waiting on DNS and the websites being fetched.
>>>
>>> Dennis
>>>
>>>
>>> Raymond Balmès wrote:
>>>
>>>> I use a dual-core Intel. I observed that the crawl never gets above
>>>> the 50% CPU load mark, despite the fact that I used -threads 50...
>>>> Does Nutch take advantage of multi-cores?
>>>> Am I missing a setting somewhere?
>>>>
>>>> -Ray-
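
To illustrate Dennis's point that a fetcher spends most of its time waiting rather than computing, here is a minimal Python sketch. It is not Nutch code. It assumes a politeness rule of one request in flight per host (Nutch's fetcher.threads.per.host setting is, as far as I know, of this kind) and stands in for network and DNS latency with a sleep. The name simulate_fetch and all parameter values are made up for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def simulate_fetch(num_threads, num_hosts, urls_per_host, latency=0.02):
    """Return wall-clock seconds needed to 'fetch' every URL.

    Assumption (hypothetical model, not Nutch itself): at most one
    request may be in flight per host, and each request costs
    `latency` seconds of pure waiting with no CPU work.
    """
    # One lock per host enforces the one-request-per-host rule.
    host_locks = [threading.Lock() for _ in range(num_hosts)]

    def fetch(host):
        with host_locks[host]:
            time.sleep(latency)  # stands in for DNS lookup + HTTP wait

    # Interleave hosts so neighbouring queue entries hit different hosts.
    queue = [h for _ in range(urls_per_host) for h in range(num_hosts)]

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(fetch, queue))  # drain the iterator to wait for all
    return time.perf_counter() - start


if __name__ == "__main__":
    # Few hosts: threads beyond the number of hosts just queue on the locks.
    print("2 hosts, 2 threads :", simulate_fetch(2, 2, 10))
    print("2 hosts, 50 threads:", simulate_fetch(50, 2, 10))
    # Many hosts: extra threads shorten wall time, yet the CPU stays idle.
    print("20 hosts, 1 thread  :", simulate_fetch(1, 20, 1))
    print("20 hosts, 20 threads:", simulate_fetch(20, 20, 1))
```

Under these assumptions, adding threads only helps while there are more distinct hosts than threads, and even then the threads are sleeping, so CPU load stays low. That is one possible reason the question of "certain domains or full web" matters; it does not by itself explain the numbers Ray reported.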
