Are you fetching URLs from a random set, or all from a single domain? If they are all from a single domain (including subdomains), then the partitioner for the fetcher will put them all into a single map task.
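Roughly, the idea behind the host-based partitioning is sketched below (this is a simplified illustration, not the actual Nutch 1.0 partitioner class; the class and method names are only for explanation). Because the partition is derived from the host alone, every URL of a single host, whatever its path, hashes to the same partition, and that partition becomes one fetch map task.

    // Simplified sketch of host-based partitioning (illustrative names only,
    // not the real Nutch class). All URLs sharing a host hash to the same
    // partition, so a single-host crawl ends up in one fetch map task.
    import java.net.URL;

    public class HostPartitionSketch {
        // Returns the partition (0 .. numPartitions-1) for a given URL string.
        static int getPartition(String urlString, int numPartitions) {
            String host;
            try {
                host = new URL(urlString).getHost();
            } catch (Exception e) {
                host = urlString; // fall back to the raw string on malformed URLs
            }
            // Same host -> same hash -> same partition, regardless of path.
            return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }

        public static void main(String[] args) {
            // With 15 partitions, every page of example.com lands in the same slot:
            System.out.println(getPartition("http://example.com/page1", 15));
            System.out.println(getPartition("http://example.com/page2", 15)); // same value
        }
    }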

Dennis

caezar wrote:
Hi All,

I've got 15 machines in my Hadoop farm. 'mapred.map.tasks' and
'mapred.reduce.tasks' are set to 15.

But at any moment while the fetch job is executing, some number of map tasks
have completed, one map task is running, and all the other map tasks are
pending. So why is only one task running? Is this the correct behavior, or
have I maybe missed something in the configuration?
When other jobs are executed, there are always several tasks running
simultaneously.

Nutch version is 1.0.

Besides, what is the best setting for 'fetcher.threads.fetch'?
