Hi Daniel, 

Yes, the bzip2 compression codec produces splittable files.

According to this Hadoop JIRA improvement (fix version 0.21.0, so a stock 
0.20.2 install would not include it), splitting of bzip2-compressed files 
in Hadoop jobs is supported:
https://issues.apache.org/jira/browse/HADOOP-4012
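
For a rough idea of why the format can be split at all: every compressed
block in a bzip2 stream starts with the same 48-bit marker (0x314159265359),
and blocks are not byte-aligned, so a reader can in principle scan the bits
for that marker to find a block boundary. The Python sketch below is only an
illustration using the standard bz2 module, not Hadoop's code; the helper
name find_block_offsets and the sample data are my own assumptions.

```python
import bz2
import random

BLOCK_MAGIC = 0x314159265359  # 48-bit marker that starts every bzip2 block
MASK_48 = (1 << 48) - 1


def find_block_offsets(compressed):
    """Return the bit offset of every block marker found in the stream."""
    offsets = []
    window = 0   # sliding window holding the last 48 bits seen
    bit_pos = 0  # number of bits consumed so far
    for byte in compressed:
        for shift in range(7, -1, -1):  # most significant bit first
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            bit_pos += 1
            if bit_pos >= 48 and window == BLOCK_MAGIC:
                offsets.append(bit_pos - 48)
    return offsets


# Incompressible sample data (hypothetical), so it fills several blocks.
rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(250_000))

# compresslevel=1 selects 100 kB blocks, so 250 kB spans more than one block.
compressed = bz2.compress(data, compresslevel=1)
offsets = find_block_offsets(compressed)
print(len(offsets), "block markers at bit offsets", offsets)
```

The first marker sits right after the 4-byte "BZh1" stream header. The
marker could also turn up by chance inside compressed data, so treat this as
a demonstration of the idea rather than a robust splitter.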

-- 
Rohit Bakhshi
http://www.hortonworks.com/




On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote:

> Hi All,
> 
> I have a cluster of 6 datanodes, all running Hadoop version 0.20.2 (r911707), 
> that takes a series of bzip2-compressed text files as input.
> 
> I have read conflicting articles regarding whether or not Hadoop can split 
> these bzip2 files. Can anyone give me a definite answer?
> 
> Thanks in advance, Dan.
