Lewis John McGibbney created NUTCH-2148:
-------------------------------------------

             Summary: Review and update mapred --> mapreduce config params in crawl script
                 Key: NUTCH-2148
                 URL: https://issues.apache.org/jira/browse/NUTCH-2148
             Project: Nutch
          Issue Type: New Feature
          Components: bin
    Affects Versions: 1.10, 2.3.1
            Reporter: Lewis John McGibbney
            Assignee: Lewis John McGibbney
             Fix For: 1.11, 2.3.1


The configuration parameters in $NUTCH_HOME/src/bin/crawl currently include
{code}
commonOptions="-D mapred.reduce.tasks=$numTasks -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true"
{code}
as well as
{code}
  skipRecordsOptions="-D mapred.skip.attempts.to.start.skipping=2 -D mapred.skip.map.max.skip.records=1"
  __bin_nutch parse $commonOptions $skipRecordsOptions "$CRAWL_PATH"/segments/$SEGMENT
{code}
In all honesty, this should have been addressed as part of the upgrade to 
Hadoop 2.4.0. Whoops.
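
For reference, a rough sketch of what the updated options might look like, based on the Hadoop 2.x deprecated properties table. This is not the committed patch. The value of numTasks is a placeholder for illustration, and leaving mapred.child.java.opts unchanged assumes Hadoop 2.x still honours that key, which should be checked against mapred-default.xml for the target version.

```shell
#!/bin/sh
# Placeholder value for illustration only; the crawl script sets this itself.
numTasks=2

# mapred.* keys from the crawl script mapped to their mapreduce.* names.
# Assumption: mapred.child.java.opts is still honoured by Hadoop 2.x, so it is left as-is.
commonOptions="-D mapreduce.job.reduces=$numTasks \
-D mapred.child.java.opts=-Xmx1000m \
-D mapreduce.reduce.speculative=false \
-D mapreduce.map.speculative=false \
-D mapreduce.map.output.compress=true"

# Skip-record options under their mapreduce.* names.
skipRecordsOptions="-D mapreduce.task.skip.start.attempts=2 \
-D mapreduce.map.skip.maxrecords=1"

# Print the assembled option strings so they can be inspected before use.
echo "$commonOptions"
echo "$skipRecordsOptions"
```

The backslash-newline pairs inside the double quotes are dropped by the shell, so each variable still expands to a single line of -D options, as the existing __bin_nutch calls expect.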



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
