On Fri, 24 Feb 2012 15:43:10 GMT, Daniel Baptista wrote:
>Hi All,
>
>I have a cluster of 6 datanodes, all running hadoop version 0.20.2, r911707
>that take a series of bzip2 compressed text files as input.
>
>I have read conflicting articles regarding whether or not hadoop can split
>these bzip2 files, can anyone give me a definite answer?
>
>Thanks in advance, Dan.

Support for bzip2 splitting was only added in 0.21.0; see
https://issues.apache.org/jira/browse/MAPREDUCE-830

You need to roll forward (or backport the patch) if you want bzip2
splitting.

(And since 1.0.0 is a fork from 0.20-security, it also lacks bzip2
splitting, AFAIK. Hopefully some future 1.x will pick up more of the
0.21 features.)

-John Heidemann
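
For background on why bzip2 can be split at all: every compressed block in a bzip2 stream starts with the 48-bit marker 0x314159265359, so a reader can seek into the middle of a file and scan forward to the next block boundary. The sketch below is only an illustration of that property of the file format using Python's standard bz2 module. It is not Hadoop's code, and the function name and sample data are made up for the example. Blocks are not byte-aligned, so the scan has to work bit by bit.

```python
import bz2

# 48-bit marker that begins every bzip2 compressed block.
BLOCK_MAGIC = 0x314159265359
MASK_48 = (1 << 48) - 1


def find_block_offsets(data):
    """Return the bit offsets at which the block marker occurs in data.

    Hypothetical helper for illustration: slides a 48-bit window over the
    stream one bit at a time (most significant bit first).
    """
    offsets = []
    window = 0
    bitpos = 0
    for byte in data:
        for shift in range(7, -1, -1):
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            bitpos += 1
            if bitpos >= 48 and window == BLOCK_MAGIC:
                offsets.append(bitpos - 48)
    return offsets


# Sample input of roughly 500 KB. compresslevel=1 uses the smallest block
# size (about 100 KB of input per block), so several blocks are produced.
text = b"".join(b"record %d\n" % i for i in range(40000))
compressed = bz2.compress(text, compresslevel=1)

offsets = find_block_offsets(compressed)
print("blocks found:", len(offsets))
print("first block starts at bit:", offsets[0])
```

The first marker sits right after the 4-byte "BZh1" stream header. A splitting reader would start each split at the first marker at or after its assigned offset. In principle a marker-like bit pattern could also occur by chance inside compressed data, which this sketch does not try to handle.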