[jira] [Commented] (MAPREDUCE-1932) record skipping doesn't work with the new map/reduce api

Tom White (Commented) (JIRA) Thu, 13 Oct 2011 14:43:37 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13126971#comment-13126971 ]


Tom White commented on MAPREDUCE-1932:
--------------------------------------

I wonder whether we want to add record skipping to the new API at all, when we 
could instead suggest that people launch their own subprocess (as Owen suggests here: 
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201108.mbox/%3ccafqou9ekv+sbvav-bsf5dorjo68vsj6ztqxywwut+qhs3v3...@mail.gmail.com%3e).
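The linked message is truncated, so the following is only an assumption about the idea: run the fragile per-record work in a child process, so that a crash loses a single record instead of killing the whole task. Below is a minimal Python sketch under that assumption. `WORKER` and `process_isolated` are hypothetical names, and the integer-doubling "worker" is a stand-in for real record handling.

```python
import subprocess
import sys
from typing import Iterable, List, Tuple

# Hypothetical worker: reads one record on stdin, "processes" it by doubling
# an integer, and exits non-zero on bad input (int() raises ValueError).
WORKER = "import sys\nprint(int(sys.stdin.read()) * 2)\n"


def process_isolated(records: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Run each record through WORKER in its own child process.

    Returns (outputs of records that succeeded, records that were skipped).
    """
    ok: List[str] = []
    skipped: List[str] = []
    for rec in records:
        try:
            result = subprocess.run(
                [sys.executable, "-c", WORKER],
                input=rec,
                capture_output=True,
                text=True,
                timeout=30,  # treat a hung child like a crashed one
            )
        except subprocess.TimeoutExpired:
            skipped.append(rec)
            continue
        if result.returncode == 0:
            ok.append(result.stdout.strip())
        else:
            # The child failed; only this record is lost, not the whole task.
            skipped.append(rec)
    return ok, skipped


if __name__ == "__main__":
    print(process_isolated(["21", "abc", "5"]))  # prints (['42', '10'], ['abc'])
```

Spawning one process per record is expensive; a practical version would more likely keep one long-lived child and restart it only after a crash, skipping the record that was in flight.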

As I understand it, the record skipping feature finds bad records by doing a 
binary search on the record range covered by a given task, so it has to re-run 
the task many times until the size of the window is below a given threshold. 
Also, I'm not sure how it copes with the case of multiple corrupted records in 
a single split.
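To make the cost described above concrete, here is a minimal Python simulation of the narrowing loop. It is a sketch of halving search under stated assumptions, not Hadoop's implementation: `narrow_bad_range` and the `task_fails` callback are hypothetical names (not part of the old API's `SkipBadRecords` class), each call to `task_fails` stands in for one full task re-run, and the starting range is assumed to contain at least one bad record.

```python
from typing import Callable, Tuple


def narrow_bad_range(
    start: int,
    end: int,
    task_fails: Callable[[int, int], bool],
    max_skip: int,
) -> Tuple[Tuple[int, int], int]:
    """Halve the suspect range [start, end) until it holds at most max_skip records.

    task_fails(lo, hi) simulates re-running the task over records [lo, hi)
    and reports whether it crashed. Returns the final suspect window and
    the number of re-runs it took to find it.
    """
    if max_skip < 1:
        raise ValueError("max_skip must be at least 1")
    reruns = 0
    while end - start > max_skip:
        mid = (start + end) // 2
        reruns += 1  # every probe costs one more task attempt
        if task_fails(start, mid):
            end = mid    # a bad record is in the left half
        else:
            start = mid  # left half is clean, so the bad record is on the right
    return (start, end), reruns
```

With 1,000 records and one bad record at index 637, `narrow_bad_range(0, 1000, lambda lo, hi: lo <= 637 < hi, 1)` returns `((637, 638), 10)`: ten re-runs, roughly log2(1000). Allowing a 100-record skip window cuts that to four re-runs, returning `((625, 687), 4)`. With several bad records, this loop isolates only one window per search, which is the multiple-corruption case raised above.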
                
> record skipping doesn't work with the new map/reduce api
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-1932
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1932
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.20.1
>            Reporter: Owen O'Malley
>            Assignee: Harsh J
>         Attachments: mapreduce.1932.skippingreader.r1.diff
>
>
> The new HADOOP-1230 map/reduce api doesn't support the record skipping 
> features.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
