[ https://issues.apache.org/jira/browse/HADOOP-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12548075 ]

Doug Cutting commented on HADOOP-1823:
--------------------------------------

Why did you need to modify Ant's bzip2 code?  Could it not be used as is?

I'd hate to have to copy this code into Hadoop.  We could create our own jar of 
it, extracted from Ant's jar, or perhaps this would be an appropriate place to 
use Subversion's "externals" feature.  We could link to a tagged version of the 
sources in Ant's tree.

I note there's also a Commons project which has copied this code from Ant, but 
it does not yet have any releases.  I guess we could include its nightly jar, 
since it is code that's already been released by Ant...

> want InputFormat for bzip2 files
> --------------------------------
>
>                 Key: HADOOP-1823
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1823
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Doug Cutting
>         Attachments: bzip2.jar
>
>
> Unlike gzip, the bzip2 file format supports splitting.  Compression is by 
> blocks (900k by default) and blocks are separated by a synchronization marker 
> (a 48-bit approximation of Pi).  This would permit very large compressed 
> files to be split into multiple map tasks, which is not currently possible 
> unless using a Hadoop-specific file format.
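
To make the splitting idea in the description concrete, here is a rough Python
sketch (not the Ant code, and not a proposed Hadoop implementation) that scans
a bzip2 stream for the 48-bit block marker 0x314159265359. The function name
find_block_starts is made up for illustration. Blocks are not byte-aligned, so
the scan works bit by bit and reports bit offsets. A real splitter would also
have to cope with the marker pattern appearing by chance inside compressed
data, which this sketch ignores.

```python
from typing import List

# 48-bit marker that starts every bzip2 block (BCD digits of pi).
BLOCK_MAGIC = 0x314159265359
MASK_48 = (1 << 48) - 1


def find_block_starts(data: bytes) -> List[int]:
    """Return the bit offsets at which the block marker begins.

    Hypothetical helper for illustration only. It slides a 48-bit
    window over the input one bit at a time, most significant bit
    of each byte first, and records every position where the window
    equals the block marker.
    """
    offsets = []
    window = 0
    bits_seen = 0
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            window = ((window << 1) | bit) & MASK_48
            bits_seen += 1
            if bits_seen >= 48 and window == BLOCK_MAGIC:
                # The marker started 48 bits before the current position.
                offsets.append(bits_seen - 48)
    return offsets
```

Given the offsets, a map task assigned a byte range could presumably start
decoding at the first marker at or after the start of its range and stop at
the first marker past the end of it, much as line-oriented splits do with
newlines.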

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
