[ 
https://issues.apache.org/jira/browse/MAPREDUCE-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872077#action_12872077
 ] 

Greg Roelofs commented on MAPREDUCE-469:
----------------------------------------

The bzip2 part is reportedly fixed on the trunk (HADOOP-4012); I haven't yet 
verified this for myself, but I have no reason to believe it doesn't work.

I'm working on one half of the gzip part, namely the native-libraries 
portion.  I appear to have a working proof of concept, but my testing so far 
has been extremely minimal.  The other half, the java.util.zip portion, could 
be addressed with something similar to Duncan Loveday's 
MultiMemberGZIPInputStream workaround 
(http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4691425), but the license 
on his actual code is unclear.  (On the other hand, he has an Apache account 
and apparently still works at BT, so it might be possible to get that 
clarified.)
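
For illustration only, here is a minimal Python sketch of the multi-member 
handling at issue.  It is not the Hadoop native code or the 
MultiMemberGZIPInputStream workaround; the function names 
(read_first_member, read_all_members) are hypothetical.  It assumes the 
input is a plain sequence of RFC 1952 gzip members with no trailing padding.  
A decoder that stops at the first member's trailer sees only the first part; 
one that restarts on the leftover bytes sees everything.

```python
import gzip
import zlib


def read_first_member(data: bytes) -> bytes:
    """Decode only the first gzip member (the behavior the issue describes)."""
    # wbits=31 selects gzip framing: header, deflate stream, CRC32 trailer.
    decoder = zlib.decompressobj(wbits=31)
    # Decoding stops at the end of the first member; the rest is ignored here.
    return decoder.decompress(data)


def read_all_members(data: bytes) -> bytes:
    """Decode every gzip member in a concatenated stream."""
    parts = []
    while data:
        decoder = zlib.decompressobj(wbits=31)
        parts.append(decoder.decompress(data))
        parts.append(decoder.flush())
        if not decoder.eof:
            raise ValueError("truncated gzip member")
        # Bytes after this member's trailer belong to the next member.
        data = decoder.unused_data
    return b"".join(parts)


# Two independently compressed members, concatenated as with `cat a.gz b.gz`.
concatenated = gzip.compress(b"first part\n") + gzip.compress(b"second part\n")

first_only = read_first_member(concatenated)
everything = read_all_members(concatenated)
```

Here first_only holds just the first member's contents, while everything 
holds both, which is the behavior this issue asks for.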

Ravi, do you mind if I assign this issue to myself?

> Support concatenated gzip and bzip2 files
> -----------------------------------------
>
>                 Key: MAPREDUCE-469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-469
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Tom White
>            Assignee: Ravi Gummadi
>
> When running MapReduce with concatenated gzip files as input only the first 
> part is read, which is confusing, to say the least. Concatenated gzip is 
> described in http://www.gnu.org/software/gzip/manual/gzip.html#Advanced-usage 
> and in http://www.ietf.org/rfc/rfc1952.txt. (See original report at 
> http://www.nabble.com/Problem-with-Hadoop-and-concatenated-gzip-files-to21383097.html)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
