Hi Daniel,

The bzip2 compression codec supports splittable files.
According to this Hadoop JIRA improvement, Hadoop jobs can split bzip2-compressed input files: https://issues.apache.org/jira/browse/HADOOP-4012

Whether your jobs can use this depends on whether your Hadoop release includes that change, so check the JIRA's fix version against the version running on your cluster.

--
Rohit Bakhshi
www.hortonworks.com (http://www.hortonworks.com/)

On Friday, February 24, 2012 at 7:43 AM, Daniel Baptista wrote:
> Hi All,
>
> I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707
> that take a series of bzip2-compressed text files as input.
>
> I have read conflicting articles regarding whether or not hadoop can split
> these bzip2 files. Can anyone give me a definite answer?
>
> Thanks in advance, Dan.