[ https://issues.apache.org/jira/browse/HADOOP-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592734#action_12592734 ]
Joydeep Sen Sarma commented on HADOOP-153:
------------------------------------------

hey folks - we are having a discussion on a similar jira (covering a smaller subset of issues) - 3144. we are actually hitting this problem (corrupted records causing OOM) and have a simple workaround specific to our problem, but i am a little intrigued by the proposal here.

for the recordreader issues - why not simply let the record reader skip the bad record(s)? as the discussion here mentions, there would have to be additional APIs in the record reader for it to skip problematic records. if the framework trusts record readers to be able to skip bad records, why bother re-executing? why not allow them to detect and skip bad records on the very first try? if the TT/JT want to keep track of, and impose a limit on, the bad records skipped, they could ask the record reader to report that count through an API.

exceptions from map/reduce functions are different - if they make the entire task unstable due to OOM issues, then re-execution makes sense. but if we separate the two issues, we may get a more lightweight way of tolerating pure data corruption/validity issues (as we are trying to do in 3144).

> skip records that throw exceptions
> ----------------------------------
>
>                 Key: HADOOP-153
>                 URL: https://issues.apache.org/jira/browse/HADOOP-153
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: mapred
>    Affects Versions: 0.2.0
>            Reporter: Doug Cutting
>            Assignee: Devaraj Das
>
> MapReduce should skip records that throw exceptions.
> If the exception is thrown under RecordReader.next() then RecordReader
> implementations should automatically skip to the start of a subsequent record.
> Exceptions in map and reduce implementations can simply be logged, unless
> they happen under RecordWriter.write(). Cancelling partial output could be
> hard, so such output errors will still result in task failure.
> This behaviour should be optional, but enabled by default.
> A count of errors per task and job should be maintained and displayed in the
> web ui. Perhaps if some percentage of records (>50%?) result in exceptions
> then the task should fail. This would stop jobs early that are misconfigured
> or have buggy code.
> Thoughts?

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
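to make the shape of the proposal concrete, here is a minimal sketch of a record reader wrapper that skips records whose read throws, exposes the skipped count for the TT/JT to query, and fails once more than 50% of the records seen are bad. all names here (RecordSource, SkippingReader, nextGood, skippedRecords) are illustrative, not actual Hadoop APIs, and the "minimum 10 records before enforcing the threshold" rule is an arbitrary assumption to avoid failing on the first corrupt record:

```java
/**
 * Hypothetical record source: next() returns a record, or null at end of
 * input, and may throw on a corrupt record. Not a real Hadoop interface.
 */
interface RecordSource {
    String next() throws Exception;
}

/**
 * Sketch of the proposal: skip records whose read throws, keep a count the
 * framework could poll, and fail the task once >50% of records are bad.
 */
class SkippingReader {
    private final RecordSource source;
    private long good = 0, bad = 0;

    SkippingReader(RecordSource source) {
        this.source = source;
    }

    /** Returns the next readable record, or null when input is exhausted. */
    String nextGood() {
        while (true) {
            try {
                String rec = source.next();
                if (rec != null) good++;
                return rec;
            } catch (Exception e) {
                bad++;  // skip the corrupt record and keep reading
                // crude >50% check, after a minimum of 10 records seen
                if (bad > good && bad + good >= 10) {
                    throw new RuntimeException("too many bad records: " + bad);
                }
            }
        }
    }

    /** API the TT/JT could poll to track or limit skipped records. */
    long skippedRecords() {
        return bad;
    }
}
```

note that the skipping and the accounting both live in the reader here, which is the point of the comment above: the framework never needs to re-execute the task to get past a corrupt record, it only needs a way to ask the reader how many it skipped.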