[ https://issues.apache.org/jira/browse/MAPREDUCE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126971#comment-13126971 ]
Tom White commented on MAPREDUCE-1932:
--------------------------------------

I wonder whether we want to add this to the new API, when we could instead suggest that people launch their own subprocess (as Owen suggests here: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201108.mbox/%3ccafqou9ekv+sbvav-bsf5dorjo68vsj6ztqxywwut+qhs3v3...@mail.gmail.com%3e).

As I understand it, the record skipping feature finds bad records by doing a binary search on the record range covered by a given task, so it has to re-run the task many times until the size of the window is below a given threshold. Also, I'm not sure how it copes with the case of multiple corrupted records in a single split.

> record skipping doesn't work with the new map/reduce api
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-1932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.20.1
>            Reporter: Owen O'Malley
>            Assignee: Harsh J
>         Attachments: mapreduce.1932.skippingreader.r1.diff
>
>
> The new HADOOP-1230 map/reduce api doesn't support the record skipping
> features.
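
To make the cost concern in the comment concrete, here is a minimal sketch of the binary-search narrowing as the comment describes it. It is a standalone simulation, not Hadoop code: the names `attempt_fails`, `narrow_bad_window`, `is_bad`, and `max_window` are hypothetical, and a "task attempt" is modelled as simply scanning a slice of an in-memory list.

```python
from typing import Callable, Sequence, Tuple


def attempt_fails(records: Sequence[str], lo: int, hi: int,
                  is_bad: Callable[[str], bool]) -> bool:
    """Simulate one task attempt over records[lo:hi]; it fails if any record is bad."""
    return any(is_bad(r) for r in records[lo:hi])


def narrow_bad_window(records: Sequence[str],
                      is_bad: Callable[[str], bool],
                      max_window: int = 1) -> Tuple[int, int, int]:
    """Halve the failing range until it holds at most max_window records.

    Returns (lo, hi, attempts), where records[lo:hi] is the window to skip
    and attempts counts the simulated task runs, including the first failure.
    """
    lo, hi = 0, len(records)
    attempts = 1  # the initial attempt over the whole range
    if not attempt_fails(records, lo, hi, is_bad):
        raise ValueError("no failing record in range")
    while hi - lo > max_window:
        mid = (lo + hi) // 2
        attempts += 1  # every narrowing step costs one more re-run
        if attempt_fails(records, lo, mid, is_bad):
            hi = mid   # the failure is in the first half
        else:
            lo = mid   # assumed to be in the second half
    return lo, hi, attempts


records = ["ok"] * 16
records[11] = "BAD"
print(narrow_bad_window(records, lambda r: r == "BAD"))
```

In this simulation the number of re-runs grows with the logarithm of the range size, which is the overhead the comment points at. With two bad records in the range, a single pass isolates only the one nearest the start, so a further round of narrowing would be needed for the other; whether and how the real feature does that is the open question raised above.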