[ https://issues.apache.org/jira/browse/HADOOP-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins updated HADOOP-7076:
--------------------------------
    Target Version/s: 0.23.1
       Fix Version/s:     (was: 0.23.1)

The javadoc warnings are unrelated. Filed HADOOP-7881.

> Splittable Gzip
> ---------------
>
>                 Key: HADOOP-7076
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7076
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: io
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>         Attachments: HADOOP-7076-2011-01-26.patch,
> HADOOP-7076-2011-01-29.patch, HADOOP-7076-2011-02-05.patch,
> HADOOP-7076-2011-02-06.patch, HADOOP-7076-2011-05-18.patch,
> HADOOP-7076-2011-08-05-2255.patch, HADOOP-7076-2011-08-05-2315.patch,
> HADOOP-7076-2011-12-04-2332.patch, HADOOP-7076.patch
>
>
> Files compressed with the gzip codec are not splittable due to the nature of
> the codec.
> This limits the options you have scaling out when reading large gzipped input
> files.
> Given the fact that gunzipping a 1GiB file usually takes only 2 minutes I
> figured that for some use cases wasting some resources may result in a
> shorter job time under certain conditions.
> So reading the entire input file from the start for each split (wasting
> resources!!) may lead to additional scalability.
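
The idea in the description (every split decompresses the file from the start and throws away whatever precedes its own range) can be sketched in a few lines of Python. This is only an illustration, not the code in the attached patches: the names `plan_splits` and `read_split` are hypothetical, and splits are assumed here to be byte ranges over the decompressed stream, which sidesteps record-boundary handling.

```python
import gzip
import io


def plan_splits(total_size, num_splits):
    """Cut [0, total_size) into roughly equal byte ranges (hypothetical helper)."""
    step = max(1, -(-total_size // num_splits))  # ceiling division, never 0
    return [(s, min(s + step, total_size)) for s in range(0, total_size, step)]


def read_split(compressed, start, end, chunk_size=64 * 1024):
    """Return decompressed bytes [start, end), always decompressing from byte 0.

    Assumption for this sketch: start and end are offsets in the
    decompressed stream, not in the compressed file.
    """
    out = bytearray()
    pos = 0  # current offset in the decompressed stream
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as stream:
        while pos < end:
            chunk = stream.read(min(chunk_size, end - pos))
            if not chunk:
                break  # stream ended before the requested range did
            chunk_start = pos
            pos += len(chunk)
            if pos <= start:
                continue  # wasted work: this chunk lies before our split
            out += chunk[max(0, start - chunk_start):]
    return bytes(out)


# Demo: four independent "tasks" each re-read the same gzip stream.
original = b"".join(b"record %d\n" % i for i in range(1000))
compressed = gzip.compress(original)
splits = plan_splits(len(original), 4)
parts = [read_split(compressed, s, e) for s, e in splits]
assert b"".join(parts) == original
```

Each call to `read_split` pays the decompression cost for everything before its range, so the last split decompresses nearly the whole file. The trade-off described above is that these calls can run in parallel, so the downstream processing per split may still finish sooner than one task handling the entire file.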