[ 
https://issues.apache.org/jira/browse/HADOOP-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abdul Qadeer updated HADOOP-3646:
---------------------------------

    Attachment: HADOOP-3646version3.patch

This patch tries to correct the bugs reported by FindBugs.  I have left one 
warning unresolved: "MS_OOI_PKGPROTECT: Field should be moved out of an 
interface and made package protected."  The warning arises from the Ant BZip2 
code.  As discussed in https://issues.apache.org/jira/browse/HADOOP-1823 we 
are using this bzip2 code only for the short term.  This warning, along with 
all the splitting support requirements, will be posted to the Ant JIRA so that 
we can later remove this short-term copy of bzip2 and import newer bzip2 code 
as an external jar file.
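
For readers unfamiliar with MS_OOI_PKGPROTECT, here is a minimal sketch of the 
pattern FindBugs flags and the shape of the fix it suggests.  All names below 
(TableConstants, TableHolder, TABLE, corruptFirstEntry) are hypothetical and 
the values are arbitrary; this is not the Ant source, only an illustration of 
why a mutable array declared in an interface draws the warning.

```java
// Hypothetical names throughout; an illustration, not the Ant bzip2 code.
interface TableConstants {
    // Interface fields are implicitly public static final. The reference is
    // fixed, but the array contents stay writable by any code that can see
    // the interface. This is the pattern MS_OOI_PKGPROTECT reports.
    int[] TABLE = {619, 720, 127};
}

// Shape of the suggested fix: move the field into a class and give it
// package-private visibility so only code in the same package can reach it.
final class TableHolder {
    static final int[] TABLE = {619, 720, 127};

    private TableHolder() {
    }
}

class MutableInterfaceFieldDemo {
    /** Overwrites the first entry of the interface table and returns it. */
    static int corruptFirstEntry(int value) {
        TableConstants.TABLE[0] = value;
        return TableConstants.TABLE[0];
    }

    public static void main(String[] args) {
        // Prints -1: the "constant" table was changed from outside.
        System.out.println(corruptFirstEntry(-1));
    }
}
```

Applying the fix to the copied code would mean diverging from upstream, which 
is why the warning is left for the Ant project to resolve.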

> Providing bzip2 as codec
> ------------------------
>
>                 Key: HADOOP-3646
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3646
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: conf, io
>    Affects Versions: 0.19.0
>            Reporter: Abdul Qadeer
>            Assignee: Abdul Qadeer
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-3646.patch, HADOOP-3646.patch, 
> HADOOP-3646version3.patch
>
>   Original Estimate: 1008h
>  Remaining Estimate: 1008h
>
> Hadoop recognizes gzip-compressed input and automatically decompresses the 
> data before providing it to the mapper.  But Hadoop cannot split a gzip 
> stream, due to the very nature of gzip compression.  Consequently one gzip 
> stream (e.g. a whole file) can go to only one mapper.  In contrast, a 
> bzip2-compressed stream can be split at its block delimiters.
> We are interested in extending Hadoop to support splittable bzip2 with a 
> codec.  (https://issues.apache.org/jira/browse/HADOOP-1823 uses an input 
> reader to split the bzip2 files, which must be provided by the user and can 
> handle FileInputFormat.  A user who wants to use some other input format or 
> to do custom record handling must write a new input reader!)
> We now have a patch that provides a basic bzip2 codec equivalent to the 
> current gzip codec.  We are in the process of extending it to support 
> splitting.
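
To illustrate what "split at its block delimiters" means, here is a minimal 
sketch that scans a byte array for the 48-bit bzip2 block-header magic 
(0x314159265359).  The magic is not byte-aligned in a real stream, so the scan 
moves one bit at a time, most significant bit first.  The class and method 
names (Bzip2BlockScanner, findBlockStarts) are hypothetical and this is not 
the code in the attached patch; a real splitter must also cope with the same 
bit pattern appearing by chance inside compressed data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a sketch of the idea, not the patch's implementation.
class Bzip2BlockScanner {
    // 48-bit block-header magic defined by the bzip2 format.
    static final long BLOCK_MAGIC = 0x314159265359L;
    static final long MASK_48 = 0xFFFFFFFFFFFFL;

    /** Returns the bit offset of every position where the magic begins. */
    static List<Long> findBlockStarts(byte[] data) {
        List<Long> starts = new ArrayList<>();
        long window = 0;   // the most recent 48 bits seen
        long bitsRead = 0; // total number of bits consumed so far
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) { // most significant bit first
                window = ((window << 1) | ((b >> i) & 1)) & MASK_48;
                bitsRead++;
                if (bitsRead >= 48 && window == BLOCK_MAGIC) {
                    starts.add(bitsRead - 48);
                }
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // The magic shifted right by four bits, so it starts at bit offset 4.
        byte[] sample = {0x03, 0x14, 0x15, (byte) 0x92, 0x65, 0x35, (byte) 0x90};
        System.out.println(findBlockStarts(sample)); // prints [4]
    }
}
```

Each offset found this way is a candidate point where a mapper could begin 
decompressing independently.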

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
