[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755306#action_12755306 ]

Hudson commented on MAPREDUCE-830:

Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #6 (See [http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/6/]) . Add support for splittable compression to TextInputFormats. Contributed by Abdul Qadeer

Providing BZip2 splitting support for Text data
---
Key: MAPREDUCE-830
URL: https://issues.apache.org/jira/browse/MAPREDUCE-830
Project: Hadoop Map/Reduce
Issue Type: Improvement
Affects Versions: 0.21.0
Reporter: Abdul Qadeer
Assignee: Abdul Qadeer
Fix For: 0.21.0
Attachments: M830-2.patch, M830-3.patch, M830-4.patch, M830-4.patch, MapReduce-830-version1.patch

HADOOP-4012 (https://issues.apache.org/jira/browse/HADOOP-4012) provides support for handling BZip2-compressed data so that a compressed input file can be split at arbitrary points. This JIRA uses that functionality in LineRecordReader.

The benefit of this work is that if the user provides BZip2-compressed text data, Hadoop will split it, so it will be processed by multiple mappers. BZip2-compressed data will therefore be able to make full use of the cluster. Currently, a BZip2-compressed text file is not split and goes to a single mapper. The enhancement in this JIRA adds splitting support and yields considerable performance gains.

-- This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
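Splitting a file at arbitrary points only works if every line is still read by exactly one mapper. The plain-Java sketch below models that rule on an uncompressed byte array: a split that does not begin at offset 0 throws away its first (possibly partial) line, and every split keeps reading while the next line starts at or before its end. The class and method names (SplitLineReaderSketch, readSplit, readAll) are hypothetical, and this is an illustration modeled on the convention LineRecordReader follows, not Hadoop's implementation. With splittable BZip2 input, the codec would additionally move each split boundary to a compressed-block boundary before this rule is applied; the sketch omits the codec entirely.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: models how a line-oriented reader
// divides a text file among byte-range splits without losing or duplicating lines.
class SplitLineReaderSketch {

  /**
   * Returns the lines owned by the split covering byte offsets [start, end].
   * A line beginning at offset p is owned when start < p <= end,
   * or when start == 0 and p == 0.
   */
  static List<String> readSplit(byte[] data, int start, int end) {
    List<String> lines = new ArrayList<>();
    int pos = start;
    if (start != 0) {
      // Not the first split: discard everything up to and including the next
      // newline. The previous split reads one line past its own end, so it
      // already owns whatever line is in progress at 'start'.
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      pos++; // step over the newline itself
    }
    // Read whole lines for as long as the next line starts at or before 'end';
    // the final line read may run past 'end' into the next split.
    while (pos <= end && pos < data.length) {
      int lineStart = pos;
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
      pos++; // step over the newline (or past the end of the buffer)
    }
    return lines;
  }

  /** Reads every split defined by consecutive boundaries and concatenates the results. */
  static List<String> readAll(byte[] data, int[] boundaries) {
    List<String> all = new ArrayList<>();
    for (int i = 0; i + 1 < boundaries.length; i++) {
      all.addAll(readSplit(data, boundaries[i], boundaries[i + 1]));
    }
    return all;
  }

  public static void main(String[] args) {
    byte[] data = "alpha\nbravo\ncharlie\ndelta\necho\n".getBytes(StandardCharsets.UTF_8);
    // Three splits, cut in the middle of "bravo" and at the start of "delta".
    System.out.println(readAll(data, new int[] {0, 7, 20, data.length}));
    // prints [alpha, bravo, charlie, delta, echo]
  }
}
```

Concatenating the per-split results in split order reproduces the file's lines wherever the boundaries fall: the one extra line each split reads past its end is exactly the line the next split discards.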
[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753832#action_12753832 ]

Hadoop QA commented on MAPREDUCE-830:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12418869/M830-3.patch against trunk revision 813585.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The patch appears to cause tar ant target to fail.
-1 findbugs. The patch appears to cause Findbugs to fail.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/24/testReport/
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/24/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h3.grid.sp2.yahoo.net/24/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753930#action_12753930 ]

Hadoop QA commented on MAPREDUCE-830:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12419222/M830-4.patch against trunk revision 813585.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/59/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/59/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/59/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h6.grid.sp2.yahoo.net/59/console

This message is automatically generated.
[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753979#action_12753979 ]

Hudson commented on MAPREDUCE-830:

Integrated in Hadoop-Mapreduce-trunk-Commit #30 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/30/]) . Add support for splittable compression to TextInputFormats. Contributed by Abdul Qadeer
[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12753999#action_12753999 ]

Hudson commented on MAPREDUCE-830:

Integrated in Hadoop-Hdfs-trunk-Commit #27 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/27/]) . Add support for splittable compression to TextInputFormats. Contributed by Abdul Qadeer
[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752304#action_12752304 ]

Chris Douglas commented on MAPREDUCE-830:

(also includes a workaround for MAPREDUCE-959, which was getting irritating, and updates the unit tests to JUnit4 semantics)
[jira] Commented: (MAPREDUCE-830) Providing BZip2 splitting support for Text data
[ https://issues.apache.org/jira/browse/MAPREDUCE-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749333#action_12749333 ]

Chris Douglas commented on MAPREDUCE-830:

(related comments in HADOOP-4012)
* Though the end offset is not changed by the bzip codec, {{getEnd}} is part of the API, so it should be called in {{LineRecordReader}}.
* Since the codec has state, the API demands that {{LineRecordReader}} synchronize on the codec before creating a splittable stream and calling {{getStart}} and {{getEnd}}, to avoid race conditions (unless a better solution is found in HADOOP-4012).
* The default directory for unit tests is usually /tmp, not the current directory (.).
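The second point describes a check-then-act race: if a shared codec stores the adjusted offsets as mutable state, one thread's createStream can overwrite them before another thread reads them back. A minimal sketch of the locking pattern the comment asks for follows. StatefulSplitCodec, SplitOpener, openSplit, and the round-up-to-block rule are hypothetical stand-ins, not Hadoop's codec API; getStart and getEnd only mirror the method names used in the comment.

```java
// Hypothetical stand-ins, not Hadoop classes: a codec that keeps the adjusted
// split offsets of the most recently created stream as mutable fields.
class StatefulSplitCodec {
  private final long blockSize;
  private long start; // written by createStream, read by getStart
  private long end;   // written by createStream, read by getEnd

  StatefulSplitCodec(long blockSize) {
    this.blockSize = blockSize;
  }

  /** Pretends to open a stream, moving both offsets up to the next block boundary. */
  Object createStream(long splitStart, long splitEnd) {
    start = roundUpToBlock(splitStart);
    end = roundUpToBlock(splitEnd);
    return new Object(); // placeholder for a decompressing stream
  }

  long getStart() { return start; }

  long getEnd() { return end; }

  private long roundUpToBlock(long offset) {
    return ((offset + blockSize - 1) / blockSize) * blockSize;
  }
}

class SplitOpener {
  /**
   * Creates the stream and reads the adjusted offsets while holding the codec's
   * monitor, so no other thread can call createStream in between.
   * Returns {adjustedStart, adjustedEnd}.
   */
  static long[] openSplit(StatefulSplitCodec codec, long splitStart, long splitEnd) {
    synchronized (codec) {
      codec.createStream(splitStart, splitEnd);
      return new long[] {codec.getStart(), codec.getEnd()};
    }
  }

  public static void main(String[] args) throws InterruptedException {
    StatefulSplitCodec shared = new StatefulSplitCodec(100);
    Thread[] workers = new Thread[4];
    for (int t = 0; t < workers.length; t++) {
      final long splitStart = t * 1000L + 1;
      workers[t] = new Thread(() -> {
        long[] offsets = openSplit(shared, splitStart, splitStart + 500);
        // Each thread sees offsets derived from its own split, never another thread's.
        System.out.println(splitStart + " -> " + offsets[0] + ".." + offsets[1]);
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      w.join();
    }
  }
}
```

Locking on the shared codec instance is the approach the comment describes. The alternative it leaves open for HADOOP-4012 is to return the offsets on the stream object itself, so no shared state needs a lock; later Hadoop releases expose them that way through SplitCompressionInputStream.getAdjustedStart() and getAdjustedEnd().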