On Wed, Mar 11, 2009 at 4:41 PM, Sami Siren <ssi...@gmail.com> wrote:
> dayz...@gmail.com wrote:
>> Hi,
>>
>> If I want to run several parsers on a single quad-core machine
>> simultaneously, would I still need to have Hadoop set up as a
>> single-node cluster?
>>
> I think that the fetcher is currently the only component that can take
> advantage of multiple cores when running in "local" mode. We should
> perhaps address that at some point, since it is not that hard to
> parallelize at least some of the processing inside the individual tools,
> so that single-machine users could benefit from multiple cores.

But for parsing on a single quad-core machine, I can set up 2 map and 2
reduce tasks and get about 80-90% utilisation of the 4 CPUs. When only the
reduce tasks are left, I get about 50% utilisation, as expected (2/4). Or
do I actually not get a performance gain, even though more CPUs are being
utilised?

Michael

> I am not sure, but I think that the only way to do it properly is to run
> a jobtracker and a tasktracker on that machine and to configure proper
> block sizes and numbers of map and reduce tasks.
>
>> Can several updatedbs be run simultaneously? I believe not, since the db
>> seems to be locked while it is being updated.
>>
> Locking prevents multiple applications from accessing the crawl db
> simultaneously (the same goes for the linkdb).
>
> --
> Sami Siren
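P.S. For reference, here is roughly what the pseudo-distributed setup Sami
describes would look like in conf/hadoop-site.xml. This is only a sketch
against the Hadoop 0.19-era property names; the hostnames, ports, and
values are placeholders I have not verified for this workload:

  <configuration>
    <!-- Any value other than "local" here switches Hadoop from the
         single-threaded LocalJobRunner to a real jobtracker, which can
         run map and reduce tasks in parallel. -->
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://localhost:9000</value>
    </property>
    <!-- Concurrent tasks per tasktracker: 2 maps + 2 reduces matches the
         four cores discussed above. -->
    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>2</value>
    </property>
    <!-- Hints for how many map and reduce tasks each job is split into. -->
    <property>
      <name>mapred.map.tasks</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>4</value>
    </property>
    <!-- The "proper block sizes" knob: dfs.block.size is in bytes
         (64 MB here). -->
    <property>
      <name>dfs.block.size</name>
      <value>67108864</value>
    </property>
  </configuration>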