[ https://issues.apache.org/jira/browse/HADOOP-1823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Utkarsh Srivastava updated HADOOP-1823:
---------------------------------------

    Attachment: bzip2.jar

bzip2 input format, required libraries and a test case

> want InputFormat for bzip2 files
> --------------------------------
>
>                 Key: HADOOP-1823
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1823
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Doug Cutting
>         Attachments: bzip2.jar
>
>
> Unlike gzip, the bzip2 file format supports splitting.  Compression is by
> blocks (900k by default) and blocks are separated by a synchronization marker
> (a 48-bit approximation of Pi).  This would permit very large compressed
> files to be split into multiple map tasks, which is not currently possible
> unless using a Hadoop-specific file format.
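
To make the splitting idea in the description concrete, here is a minimal sketch in Python of locating block boundaries in a bzip2 stream. It is an illustration only and is not the code in the attached bzip2.jar. The marker value 0x314159265359 is the 48-bit block header defined by the bzip2 format; the function name find_block_offsets and the sample payload are assumptions made for this example. The scan works bit by bit because blocks are not aligned to byte boundaries.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359  # 48-bit block marker (BCD digits of pi)
MAGIC_BITS = 48


def find_block_offsets(data: bytes) -> list:
    """Return the bit offsets at which the 48-bit block marker starts.

    Hypothetical helper for illustration; bzip2 packs bits most
    significant bit first, so the scan follows the same order.
    """
    offsets = []
    window = 0
    mask = (1 << MAGIC_BITS) - 1
    for i, byte in enumerate(data):
        for bit in range(7, -1, -1):
            # Shift the next bit into a sliding 48-bit window.
            window = ((window << 1) | ((byte >> bit) & 1)) & mask
            consumed = i * 8 + (7 - bit) + 1  # bits read so far
            if consumed >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(consumed - MAGIC_BITS)
    return offsets


# Assumed sample input: pseudo-random bytes, so the data does not shrink
# and spans several blocks at the smallest block size (compresslevel=1).
payload = random.Random(0).getrandbits(8 * 250_000).to_bytes(250_000, "big")
compressed = bz2.compress(payload, compresslevel=1)
offsets = find_block_offsets(compressed)

print(offsets[0])    # 32: the first block follows the 4-byte "BZh1" header
print(len(offsets))  # number of block markers found
```

A splittable reader could use the same scan from an arbitrary byte offset: skip forward to the next marker and begin decoding there, leaving the preceding partial block to whichever task owns the previous split. A 48-bit pattern can in principle also occur by chance inside compressed data, so a real reader would need to handle a false match.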