[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12749333#action_12749333 ]
Chris Douglas commented on MAPREDUCE-830:
-----------------------------------------

(related comments in HADOOP-4012)

* Though it's not changed in bzip, {{getEnd}} is part of the API, so it should be called in {{LineRecordReader}}.
* Since the codec has state, the API demands that {{LineRecordReader}} synchronize on the codec before creating a splittable stream and calling {{getStart}} and {{getEnd}}, to avoid race conditions (unless a better solution is found in HADOOP-4012).
* The default dir for unit tests is usually "/tmp", not ".".

> Providing BZip2 splitting support for Text data
> -----------------------------------------------
>
>                 Key: MAPREDUCE-830
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Abdul Qadeer
>            Assignee: Abdul Qadeer
>             Fix For: 0.21.0
>
>         Attachments: MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) is providing
> support to handle BZip2 compressed data such that the input compressed file
> is split at arbitrary points. This JIRA uses that functionality in
> LineRecordReader. The benefit of this work is that, if a user provides
> BZip2 compressed Text data, it will be split by Hadoop and hence will be
> processed by multiple mappers, so BZip2 compressed data will be able to
> fully utilize the cluster's power. Currently a BZip2 compressed Text file goes
> to one mapper and is not split. The enhancement in this JIRA therefore provides
> splitting support and considerable performance gains.
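
The synchronization point in the comment above can be illustrated with a minimal sketch. It is not taken from the patch and does not use the Hadoop codec API: {{StatefulCodec}}, {{createSplittableStream}}, {{openSplit}} and the 100-byte block size are hypothetical stand-ins, assumed only to show why stream creation and the {{getStart}}/{{getEnd}} reads should happen under one lock when the codec holds state.

```java
import java.util.Arrays;

public class CodecSyncSketch {

    /** Hypothetical stand-in for a codec that keeps per-call state; NOT the Hadoop API. */
    static class StatefulCodec {
        static final long BLOCK = 100; // illustrative block size in bytes (assumption)
        private long start;
        private long end;

        /** Aligns the requested range outward to block boundaries and remembers the result. */
        void createSplittableStream(long requestedStart, long requestedEnd) {
            start = (requestedStart / BLOCK) * BLOCK;
            end = ((requestedEnd + BLOCK - 1) / BLOCK) * BLOCK;
        }

        long getStart() { return start; }

        long getEnd() { return end; }
    }

    /**
     * Holds the codec's monitor across stream creation and both getters, so another
     * thread cannot overwrite the codec's state between the three calls.
     */
    static long[] openSplit(StatefulCodec codec, long requestedStart, long requestedEnd) {
        synchronized (codec) {
            codec.createSplittableStream(requestedStart, requestedEnd);
            return new long[] { codec.getStart(), codec.getEnd() };
        }
    }

    public static void main(String[] args) throws InterruptedException {
        StatefulCodec shared = new StatefulCodec(); // one codec shared by two readers
        Thread a = new Thread(() ->
                System.out.println("split A: " + Arrays.toString(openSplit(shared, 150, 420))));
        Thread b = new Thread(() ->
                System.out.println("split B: " + Arrays.toString(openSplit(shared, 1000, 1250))));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Without the {{synchronized}} block, one reader could create its stream, be preempted, and then read the adjusted start and end that belong to the other reader's split. Whether HADOOP-4012 ends up removing the shared state, and with it the need for this lock, is left open in the comment.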