[ https://issues.apache.org/jira/browse/HADOOP-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549164 ]
Doug Cutting commented on HADOOP-1823:
--------------------------------------
> Nothing is byte aligned.
I see. Sigh. Okay, then I think we should probably pursue two avenues:
- commit a forked version (using 'svn cp', ideally)
- submit a patch to ant, adding support for scanning
The current modified version adds a new ctor that takes a blockSize parameter.
That seems strange: would anyone ever really pass anything but 9 for that
parameter? Mightn't we instead pass a boolean, 'scanForNextBlock' or some such?
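For illustration, here is a minimal sketch in Java of the bit-level scanning that a 'scanForNextBlock' mode would need, since the block marker can start at any bit offset. It assumes the 48-bit block magic 0x314159265359 (the BCD digits of pi, i.e. the "48-bit approximation of Pi" in the issue description) and MSB-first bit packing. The class and method names are hypothetical; this is not the ant or Hadoop code.

```java
/**
 * Hypothetical sketch: locate a bzip2 block-start marker at an arbitrary
 * (not necessarily byte-aligned) bit offset. Not ant's or Hadoop's code.
 */
public final class BlockMarkerScanner {
    // Assumed 48-bit block magic: the BCD digits 3.14159265359
    static final long BLOCK_MAGIC = 0x314159265359L;
    static final long MASK_48 = (1L << 48) - 1;

    /**
     * Returns the bit offset of the first bit of the first marker found at
     * or after fromBit, or -1 if no marker occurs in data.
     */
    static long findMarker(byte[] data, long fromBit) {
        long window = 0;    // the most recent 48 bits, oldest bit highest
        long bitsRead = 0;  // number of bits shifted into the window so far
        long totalBits = (long) data.length * 8;
        for (long bit = fromBit; bit < totalBits; bit++) {
            int b = data[(int) (bit >>> 3)] & 0xFF;
            // Bits are read MSB-first within each byte.
            int value = (b >>> (7 - (int) (bit & 7))) & 1;
            window = ((window << 1) | value) & MASK_48;
            bitsRead++;
            if (bitsRead >= 48 && window == BLOCK_MAGIC) {
                return bit - 47; // offset of the marker's first bit
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Place the marker at bit offset 3, deliberately not byte aligned.
        byte[] data = new byte[10];
        long shifted = BLOCK_MAGIC << (64 - 48 - 3);
        for (int i = 0; i < 8; i++) {
            data[i] = (byte) (shifted >>> (56 - 8 * i));
        }
        System.out.println(findMarker(data, 0)); // prints 3
    }
}
```

A single boolean suffices for this: a reader opened at an arbitrary split offset would call something like findMarker before decoding, while a reader opened at the start of the file would not. A real reader would also have to cope with false matches, since the 48-bit pattern can in principle occur by chance inside compressed data.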
> want InputFormat for bzip2 files
> --------------------------------
>
> Key: HADOOP-1823
> URL: https://issues.apache.org/jira/browse/HADOOP-1823
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Doug Cutting
> Attachments: bzip2.jar
>
>
> Unlike gzip, the bzip2 file format supports splitting. Compression is by
> blocks (900k by default) and blocks are separated by a synchronization marker
> (a 48-bit approximation of Pi). This would permit very large compressed
> files to be split into multiple map tasks, which is not currently possible
> unless using a Hadoop-specific file format.
--
This message is automatically generated by JIRA.