[ http://issues.apache.org/jira/browse/HADOOP-153?page=comments#action_12375455 ]
eric baldeschwieler commented on HADOOP-153:
--------------------------------------------

sounds good. The acceptable % should probably be configurable. I'd be inclined to use something more like 1%. You could turn the feature off by configuring 0%, which should arguably be the default.

> skip records that throw exceptions
> ----------------------------------
>
>          Key: HADOOP-153
>          URL: http://issues.apache.org/jira/browse/HADOOP-153
>      Project: Hadoop
>         Type: New Feature
>   Components: mapred
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Doug Cutting
>      Fix For: 0.2
>
> MapReduce should skip records that throw exceptions.
> If the exception is thrown under RecordReader.next() then RecordReader
> implementations should automatically skip to the start of a subsequent record.
> Exceptions in map and reduce implementations can simply be logged, unless
> they happen under RecordWriter.write(). Cancelling partial output could be
> hard. So such output errors will still result in task failure.
> This behaviour should be optional, but enabled by default. A count of errors
> per task and job should be maintained and displayed in the web ui. Perhaps
> if some percentage of records (>50%?) result in exceptions then the task
> should fail. This would stop jobs early that are misconfigured or have buggy
> code.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
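
For illustration only, the configurable threshold discussed in this thread might look roughly like the sketch below. Nothing here is Hadoop code: the class `SkippingRunner`, its constructor parameter `maxFailurePercent`, and the accessor names are hypothetical, and the sketch assumes the record count is known up front so the allowed number of failures can be computed once. A real task reading a stream would need some other way to bound the ratio. Configuring 0% makes the first exception fail the task, which matches the "feature off" case.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of percentage-based record skipping; not a Hadoop API. */
class SkippingRunner {
    private final double maxFailurePercent; // 0 means "never skip": first error fails the task
    private long processed = 0;
    private long failed = 0;

    SkippingRunner(double maxFailurePercent) {
        if (maxFailurePercent < 0 || maxFailurePercent > 100) {
            throw new IllegalArgumentException("percentage must be in [0, 100]");
        }
        this.maxFailurePercent = maxFailurePercent;
    }

    /** Applies mapper to each record, skipping failures until the budget is exceeded. */
    <T> void run(List<T> records, Consumer<? super T> mapper) {
        // Assumption: total size is known, so the failure budget is fixed in advance.
        long allowed = (long) Math.floor(records.size() * maxFailurePercent / 100.0);
        for (T record : records) {
            processed++;
            try {
                mapper.accept(record);
            } catch (RuntimeException e) {
                failed++; // this count is what would be reported per task and job
                if (failed > allowed) {
                    // Too many bad records: stop early instead of finishing the input.
                    throw new IllegalStateException(
                        failed + " failed records exceeds limit of " + allowed, e);
                }
                // Otherwise the error would be logged and the record skipped.
            }
        }
    }

    long getProcessed() { return processed; }

    long getFailed() { return failed; }
}
```

With 100 records and a 1% limit, one bad record is skipped and the run completes; with a 0% limit the same input fails on the first exception.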