[ https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Roelofs updated MAPREDUCE-469:
-----------------------------------

    Attachment: MR-469.v2.yahoo-0.20.2xx-branch.patch

Expanded test coverage uncovered a bug on Friday, and trunk update today has breakage, so this version is against Yahoo's 0.20S+ branch.

Still not quite final; I haven't finished updating the unit test to exercise both native and built-in gzip and built-in bzip2 at multiple buffer sizes, and I've left some (mostly) commented-out debug statements in place in case that turns up anything further.

Reviewer questions:
 - Currently the new BuiltInGzipDecompressor class inherits directly from JDK Inflater, but I suspect I should extend BuiltInZlibInflater instead.
 - Is it worthwhile to encapsulate the state label and associated variables into a private inner class (BuiltInGzipDecompressor.java, first FIXME comment)?

The other FIXMEs are either related to the two items above or else are largely unrelated to this issue.

> Support concatenated gzip and bzip2 files
> -----------------------------------------
>
>                 Key: MAPREDUCE-469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Tom White
>            Assignee: Greg Roelofs
>         Attachments: grr-hadoop-common.dif.20100614c, grr-hadoop-mapreduce.dif.20100614c, MR-469.v2.yahoo-0.20.2xx-branch.patch
>
>
> When running MapReduce with concatenated gzip files as input only the first part is read, which is confusing, to say the least. Concatenated gzip is described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
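
For readers unfamiliar with the symptom in the issue description, here is a minimal Java sketch of what "only the first part is read" means for a multi-member gzip stream. It is an illustration only and is not taken from the attached patch: the class name ConcatGzipSketch and its helpers gzip, concat, and decode are hypothetical. It assumes every member carries the fixed 10-byte header with no optional fields (which is what java.util.zip.GZIPOutputStream writes), and it skips the 8-byte trailer without verifying the CRC.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.GZIPOutputStream;
import java.util.zip.Inflater;

// Hypothetical illustration class; not part of Hadoop or of the patch.
final class ConcatGzipSketch {
    static final int HEADER_LEN = 10;  // fixed gzip header, assuming FLG == 0
    static final int TRAILER_LEN = 8;  // CRC32 + ISIZE (not verified here)

    // Compress a string into a single gzip member.
    static byte[] gzip(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Byte-level equivalent of `cat a.gz b.gz`.
    static byte[] concat(byte[] a, byte[] b) {
        byte[] r = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, r, a.length, b.length);
        return r;
    }

    // Decode the first member only, or loop over every member.
    static String decode(byte[] data, boolean allMembers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64];
        int pos = 0;
        while (pos + HEADER_LEN + TRAILER_LEN <= data.length) {
            if ((data[pos] & 0xff) != 0x1f || (data[pos + 1] & 0xff) != 0x8b) {
                throw new IllegalArgumentException("not a gzip member at offset " + pos);
            }
            if (data[pos + 3] != 0) {
                throw new IllegalArgumentException("optional header fields not handled");
            }
            Inflater inf = new Inflater(true);  // raw deflate, no zlib wrapper
            try {
                inf.setInput(data, pos + HEADER_LEN, data.length - pos - HEADER_LEN);
                while (!inf.finished()) {
                    int n = inf.inflate(buf);
                    if (n == 0 && !inf.finished()) {
                        throw new IllegalArgumentException("truncated deflate stream");
                    }
                    out.write(buf, 0, n);
                }
                // Bytes the inflater did not consume belong to the trailer
                // and to any members that follow.
                pos = data.length - inf.getRemaining() + TRAILER_LEN;
            } catch (DataFormatException e) {
                throw new IllegalArgumentException(e);
            } finally {
                inf.end();
            }
            if (!allMembers) {
                break;  // the behavior reported in the issue
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] both = concat(gzip("first\n"), gzip("second\n"));
        System.out.print(decode(both, false));  // first member only
        System.out.print(decode(both, true));   // all members
    }
}
```

Stopping after the first member silently drops everything that follows it. Restarting at the next header once the inflater reports finished() recovers the full input, which is the behavior the issue asks for.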