Lewis John McGibbney created NUTCH-2148:
-------------------------------------------
Summary: Review and update mapred --> mapreduce config params in crawl script
Key: NUTCH-2148
URL: https://issues.apache.org/jira/browse/NUTCH-2148
Project: Nutch
Issue Type: New Feature
Components: bin
Affects Versions: 1.10, 2.3.1
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 1.11, 2.3.1

Configuration parameters inside $NUTCH_HOME/src/bin/crawl currently include

{code}
commonOptions="-D mapred.reduce.tasks=$numTasks -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true"
{code}

as well as

{code}
skipRecordsOptions="-D mapred.skip.attempts.to.start.skipping=2 -D mapred.skip.map.max.skip.records=1"
__bin_nutch parse $commonOptions $skipRecordsOptions "$CRAWL_PATH"/segments/$SEGMENT
{code}

Most of these use the old mapred.* property names, which Hadoop 2.x deprecates in favour of mapreduce.* equivalents. In all honesty, this should have been addressed as part of the upgrade to Hadoop 2.4.0. Whoops!

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
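For discussion, here is a rough sketch of what the two option strings might look like with Hadoop 2.x names. This is not the committed fix. The old-to-new mapping is assumed from Hadoop's deprecated-properties table. mapred.child.java.opts is left alone because, as far as I can tell, it is not on that list and still acts as the fallback for mapreduce.map.java.opts / mapreduce.reduce.java.opts. The values for numTasks, CRAWL_PATH and SEGMENT are placeholders. The __bin_nutch function here is a hypothetical stub that only prints the command line, so the snippet runs without a Nutch install.

```shell
#!/usr/bin/env bash
# Sketch only: mapreduce.* names assumed from Hadoop 2.x's deprecated-properties
# mapping; this is NOT the committed NUTCH-2148 change.

numTasks=2                # placeholder value for illustration
CRAWL_PATH=crawl          # placeholder crawl directory
SEGMENT=20151001000000    # placeholder segment name

# mapred.reduce.tasks                       -> mapreduce.job.reduces
# mapred.reduce.tasks.speculative.execution -> mapreduce.reduce.speculative
# mapred.map.tasks.speculative.execution    -> mapreduce.map.speculative
# mapred.compress.map.output                -> mapreduce.map.output.compress
# mapred.child.java.opts is kept as-is (assumed not deprecated in Hadoop 2.x).
commonOptions="-D mapreduce.job.reduces=$numTasks -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true"

# mapred.skip.attempts.to.start.skipping -> mapreduce.task.skip.start.attempts
# mapred.skip.map.max.skip.records       -> mapreduce.map.skip.maxrecords
skipRecordsOptions="-D mapreduce.task.skip.start.attempts=2 -D mapreduce.map.skip.maxrecords=1"

# Hypothetical stand-in for the crawl script's __bin_nutch helper:
# it just prints the bin/nutch command line instead of running it.
__bin_nutch() {
  echo "bin/nutch $*"
}

# The option strings are left unquoted on purpose, as in the original script,
# so each "-D key=value" pair is word-split into separate arguments.
__bin_nutch parse $commonOptions $skipRecordsOptions "$CRAWL_PATH"/segments/$SEGMENT
```

Only the property names change. The values (reduce count, heap size, speculative execution off, map output compression on, record skipping) are carried over unchanged from the current script.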