On Fri, 24 Feb 2012 15:43:10 GMT, Daniel Baptista wrote: 
>Hi All,
>
>I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707 
>that take a series of bzip2 compressed text files as input.
>
>I have read conflicting articles on whether Hadoop can split these
>bzip2 files. Can anyone give me a definitive answer?
>
>Thanks in advance, Dan.

Support for bzip2 splitting was only added in 0.21.0; see 
https://issues.apache.org/jira/browse/MAPREDUCE-830

You need to upgrade to 0.21.0 or later (or backport the patch to
0.20.2) if you want bzip2 splitting.

(And since 1.0.0 is a fork from 0.20-security, it also lacks bzip2
splitting, AFAIK.  Hopefully some future 1.x will pick up more of the
0.21 features.)
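
As background on why bzip2 can be split at all: every compressed block
in a bzip2 stream starts with the 48-bit marker 0x314159265359, so a
reader dropped at an arbitrary offset can scan forward to the next block
boundary. The Python sketch below only illustrates that idea. It is not
Hadoop's code, and the name find_block_starts is made up for this
example. Blocks are not byte-aligned, so the scan goes bit by bit; the
same bit pattern can in principle also occur by chance inside compressed
data, which this sketch does not try to handle.

```python
import bz2
from typing import List

BLOCK_MAGIC = 0x314159265359  # 48-bit marker that begins each bzip2 block
MAGIC_BITS = 48


def find_block_starts(data: bytes) -> List[int]:
    """Return the bit offsets at which the block marker appears in data.

    Hypothetical helper for illustration only.
    """
    offsets = []
    window = 0
    mask = (1 << MAGIC_BITS) - 1
    bits_seen = 0
    for byte in data:
        # Feed bits most-significant first, keeping a sliding 48-bit window.
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bits_seen += 1
            if bits_seen >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(bits_seen - MAGIC_BITS)
    return offsets


if __name__ == "__main__":
    sample = bz2.compress(b"hello hadoop\n" * 1000)
    # The first block follows the 4-byte "BZh<level>" stream header.
    print(find_block_starts(sample))
```

A splittable reader can start decompressing at any such offset, which is
what lets several map tasks share one large .bz2 file.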

   -John Heidemann
