FetcherJob should run more reduce tasks than default
----------------------------------------------------

                 Key: NUTCH-884
                 URL: https://issues.apache.org/jira/browse/NUTCH-884
             Project: Nutch
          Issue Type: Improvement
          Components: fetcher
    Affects Versions: 2.0
            Reporter: Andrzej Bialecki
            Assignee: Andrzej Bialecki
             Fix For: 2.0

FetcherJob now performs fetching in the reduce phase. In a typical Hadoop setup there are far fewer reduce tasks than map tasks, so the maximum total throughput of Fetcher is reduced proportionally.

I propose that FetcherJob set the number of reduce tasks to the number of map tasks. This way the fetching will be more granular.
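
To illustrate the proposal, here is a minimal, self-contained Java sketch. It is not the FetcherJob code: the class ReduceTaskSketch and its helpers are hypothetical, a plain Map stands in for the Hadoop configuration, and the figures (20 map tasks, 100,000 URLs) are made up for illustration. It also assumes URLs are spread evenly across reduce tasks, which a real partitioner may not guarantee. In an actual Hadoop job the chosen value would presumably be passed to Job.setNumReduceTasks(int).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not part of Nutch or Hadoop.
final class ReduceTaskSketch {

    // Hadoop property that hints at the number of map tasks (older naming).
    static final String MAP_TASKS_KEY = "mapred.map.tasks";

    // Pick a reduce task count equal to the configured map task count.
    // Falls back to 1 when the property is missing or not positive.
    static int chooseReduceTasks(Map<String, String> conf) {
        int mapTasks = Integer.parseInt(conf.getOrDefault(MAP_TASKS_KEY, "1"));
        return Math.max(1, mapTasks);
    }

    // URLs each reduce task must fetch, rounded up,
    // assuming an even split across reduce tasks.
    static long urlsPerReduceTask(long totalUrls, int reduceTasks) {
        return (totalUrls + reduceTasks - 1) / reduceTasks;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MAP_TASKS_KEY, "20");      // illustrative value
        long totalUrls = 100_000L;          // illustrative value

        int proposed = chooseReduceTasks(conf);
        System.out.println("URLs per reduce task with 2 reduce tasks: "
                + urlsPerReduceTask(totalUrls, 2));
        System.out.println("URLs per reduce task with " + proposed
                + " reduce tasks: " + urlsPerReduceTask(totalUrls, proposed));
    }
}
```

Under these assumptions, matching the reduce count to the map count shrinks each fetching task's share of the URL list, which is the finer granularity the proposal is after.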