recovery after synchronous Mapper failures on some records
----------------------------------------------------------

                 Key: HADOOP-1706
                 URL: https://issues.apache.org/jira/browse/HADOOP-1706
             Project: Hadoop
          Issue Type: New Feature
          Components: contrib/streaming
            Reporter: arkady borkovsky


It is sometimes hard or impossible to make sure that the Mapper reacts 
correctly to all the errors in the input data -- especially when reusing legacy 
or third-party code.
It would be nice if the Streaming infrastructure had the following feature:
   * check the exit code of the mapper command; 
   * if the command has crashed: 
      * log the record that was being processed at the time of the failure to the error log 
      * restart the command
      * feed it the remainder of the input 
This way most of the data gets processed; a sketch of the idea appears below.
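
A minimal sketch of such a supervision loop, in Java, might look like the 
following. Everything here is illustrative -- the class name, method names, 
and fault accounting are not part of the existing streaming code, and reading 
the mapper's output is omitted for brevity:

    // Illustrative sketch only -- not the actual streaming implementation.
    // Feeds records to an external command, restarts it after a crash,
    // and gives up once the configured fault limit is exceeded.
    import java.io.*;
    import java.util.Iterator;

    public class FaultTolerantPipeRunner {
        private final String[] command;  // the streaming mapper command line
        private final int maxFaults;     // faults allowed per task; 0 disables recovery
        private int faults = 0;

        public FaultTolerantPipeRunner(String[] command, int maxFaults) {
            this.command = command;
            this.maxFaults = maxFaults;
        }

        public void run(Iterator<String> records) throws IOException, InterruptedException {
            Process proc = new ProcessBuilder(command).start();
            BufferedWriter stdin = writerFor(proc);
            while (records.hasNext()) {
                String record = records.next();
                try {
                    stdin.write(record);
                    stdin.newLine();
                    stdin.flush();
                } catch (IOException brokenPipe) {
                    // The command died; because of OS pipe buffering, the record
                    // blamed here is only approximately the one that killed it.
                    proc.waitFor();
                    faults++;
                    System.err.println("mapper exited with code " + proc.exitValue()
                            + " (fault " + faults + ") near record: " + record);
                    if (faults > maxFaults) {
                        throw new IOException("fault limit exceeded: " + faults);
                    }
                    proc = new ProcessBuilder(command).start();  // restart the command
                    stdin = writerFor(proc);                     // feed it the remainder
                }
            }
            stdin.close();
            proc.waitFor();
        }

        private static BufferedWriter writerFor(Process p) {
            return new BufferedWriter(new OutputStreamWriter(p.getOutputStream()));
        }
    }

In a real task the loop would also have to drain the command's stdout 
concurrently so that the output pipe does not back up and block the mapper.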

This feature should be disabled by default -- the user should explicitly 
specify how many faults are allowed per task.
Once that number is exceeded, the whole job should fail without retries.
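
For instance, the limit could be passed through an ordinary job configuration 
setting on the streaming command line; the property name below is made up 
purely for illustration:

    $ hadoop jar hadoop-streaming.jar \
          -input in -output out \
          -mapper ./legacy-mapper \
          -jobconf stream.mapper.max.faults=10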

BTW: this functionality was described in the original MapReduce paper.
