Re: Hadoop and .tgz files

2008-12-02 Thread John Heidemann
On Mon, 01 Dec 2008 12:16:28 EST, Ryan LeCompte wrote: 
I believe I spoke a little too soon. Looks like Hadoop supports .gz
files, not .tgz. :-)


On Mon, Dec 1, 2008 at 10:46 AM, Ryan LeCompte [EMAIL PROTECTED] wrote:
 Hello all,

 I'm using Hadoop 0.19 and just discovered that it has no problems
 processing .tgz files that contain text files. I was under the
 impression that it wouldn't be able to break a .tgz file up into
 multiple maps, but instead just treat it as 1 map per .tgz file. Was
 this a recent change or enhancement? I'm noticing that it is breaking
 up the .tgz file into multiple maps.

 Thanks,
 Ryan



Work is in progress to support splitting of .bz2 files.
See http://issues.apache.org/jira/browse/HADOOP-4012

I don't believe splitting of .tgz files is possible: something
compressed with gzip can only be uncompressed from the beginning.

   -John Heidemann
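
To illustrate the point about gzip, here is a minimal sketch in Python
(not Hadoop code; the helper name try_gunzip and the sample data are
made up for the example). It compresses some text lines, then tries to
decompress the stream once from the start and once from a slice that
begins halfway through, the way a second map task would see it.

```python
import gzip
import zlib


def try_gunzip(blob):
    """Return the decompressed bytes, or None if blob cannot be read as gzip."""
    try:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        return zlib.decompress(blob, wbits=31)
    except zlib.error:
        return None


# Hypothetical sample input: 10,000 short text lines.
text = b"".join(b"line %d\n" % i for i in range(10000))
compressed = gzip.compress(text)
mid = len(compressed) // 2

from_start = try_gunzip(compressed)
from_middle = try_gunzip(compressed[mid:])

print("from start :", "ok" if from_start == text else "failed")
print("from middle:", "ok" if from_middle is not None else "failed")
```

The slice from the middle fails right away because it has no gzip
header. Even without that check, the DEFLATE data inside refers back to
earlier bytes in the stream, so a reader that starts partway through has
nothing to resolve those references against.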



Hadoop and .tgz files

2008-12-01 Thread Ryan LeCompte
Hello all,

I'm using Hadoop 0.19 and just discovered that it has no problems
processing .tgz files that contain text files. I was under the
impression that it wouldn't be able to break a .tgz file up into
multiple maps, but instead just treat it as 1 map per .tgz file. Was
this a recent change or enhancement? I'm noticing that it is breaking
up the .tgz file into multiple maps.

Thanks,
Ryan


Re: Hadoop and .tgz files

2008-12-01 Thread Ryan LeCompte
I believe I spoke a little too soon. Looks like Hadoop supports .gz
files, not .tgz. :-)


On Mon, Dec 1, 2008 at 10:46 AM, Ryan LeCompte [EMAIL PROTECTED] wrote:
 Hello all,

 I'm using Hadoop 0.19 and just discovered that it has no problems
 processing .tgz files that contain text files. I was under the
 impression that it wouldn't be able to break a .tgz file up into
 multiple maps, but instead just treat it as 1 map per .tgz file. Was
 this a recent change or enhancement? I'm noticing that it is breaking
 up the .tgz file into multiple maps.

 Thanks,
 Ryan