[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12753999#action_12753999 ]
Hudson commented on MAPREDUCE-830:
----------------------------------

Integrated in Hadoop-Hdfs-trunk-Commit #27 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/27/])
    Add support for splittable compression to TextInputFormats. Contributed by Abdul Qadeer

> Providing BZip2 splitting support for Text data
> -----------------------------------------------
>
>                 Key: MAPREDUCE-830
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.21.0
>            Reporter: Abdul Qadeer
>            Assignee: Abdul Qadeer
>             Fix For: 0.21.0
>
>         Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch,
> MapReduce-830-version1.patch
>
>
> HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) adds
> support for handling BZip2 compressed data such that the compressed input
> file can be split at arbitrary points. This JIRA uses that functionality in
> LineRecordReader. The benefit of this work is that, if a user provides
> BZip2 compressed Text data, it will be split by Hadoop and hence will be
> processed by multiple mappers, so BZip2 compressed data will be able to
> fully utilize the cluster. Currently, a BZip2 compressed Text file goes
> to a single mapper and is not split. The enhancement in this JIRA provides
> splitting support and a considerable performance gain.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
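
For readers unfamiliar with how a line-oriented reader can work on splits cut at arbitrary points, here is a minimal sketch of the usual convention: every split except the first discards its leading (possibly partial) line, and every split keeps reading until it has finished the line that starts at or before its end offset. This is a simplified model on uncompressed bytes, not the Hadoop code. The names `read_split` and `make_splits` are hypothetical, and the BZip2 block alignment provided by HADOOP-4012 is not modeled at all.

```python
def read_split(data: bytes, start: int, end: int) -> list[bytes]:
    """Return the lines owned by the byte range [start, end) of data.

    Hypothetical helper illustrating the split convention only.
    """
    pos = start
    if start != 0:
        # Not the first split: discard the (possibly partial) first line.
        # The previous split is responsible for reading it.
        newline = data.find(b"\n", start)
        if newline == -1:
            return []
        pos = newline + 1

    records = []
    # A line that begins at or before `end` belongs to this split,
    # even if the line itself runs past `end`.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        if newline == -1:
            records.append(data[pos:])
            break
        records.append(data[pos:newline])
        pos = newline + 1
    return records


def make_splits(length: int, split_size: int) -> list[tuple[int, int]]:
    """Cut [0, length) into consecutive ranges of at most split_size bytes."""
    return [(s, min(s + split_size, length)) for s in range(0, length, split_size)]


text = b"alpha\nbravo\ncharlie\ndelta\necho\n"
splits = make_splits(len(text), 7)  # boundaries fall in the middle of lines
per_split = [read_split(text, s, e) for s, e in splits]
recovered = [line for part in per_split for line in part]

# Each line is read exactly once, in order, whatever the split boundaries are.
assert recovered == text.split(b"\n")[:-1]
```

Each call to `read_split` stands in for one mapper. Because no split needs to know where its neighbours' lines begin, the ranges can be processed independently, which is the property that lets a split file be spread across multiple mappers.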