Re: How to enable compression of blockfiles?

2008-08-08 Thread lohit
I think at present only SequenceFiles can be compressed. 
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html
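For reference, something along these lines should work for writing text records into a block-compressed SequenceFile so the data lands in HDFS already compressed (just a sketch, not tested; the output path and record contents are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class CompressedSeqFileWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/user/mike/docs.seq");   // made-up output path

    // Instantiate the codec through ReflectionUtils so it picks up the conf.
    GzipCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK,   // compress runs of records together
        codec);
    try {
      // Each document becomes one record and is compressed on write.
      writer.append(new LongWritable(1), new Text("first document body ..."));
      writer.append(new LongWritable(2), new Text("second document body ..."));
    } finally {
      writer.close();
    }
  }
}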
If you have plain text files, they are stored as-is into blocks. You can store 
them as .gz files and Hadoop recognizes the extension and processes them. But a 
.gz file is not splittable, meaning each map will consume an entire .gz file.
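If the data is produced by a MapReduce job, another way to end up with compressed files in HDFS is to turn on output compression so the job writes .gz part files directly. A minimal sketch using the old org.apache.hadoop.mapred API (class and job names here are placeholders, not tested):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class GzipOutputCopy {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(GzipOutputCopy.class);
    conf.setJobName("gzip-output-copy");

    // Map-only identity job: the defaults are TextInputFormat + IdentityMapper,
    // so each output record is (byte offset, line).
    conf.setNumReduceTasks(0);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Ask TextOutputFormat to gzip its part files (part-00000.gz, ...).
    FileOutputFormat.setCompressOutput(conf, true);
    FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);

    JobClient.runJob(conf);
  }
}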
Thanks,
Lohit



- Original Message 
From: Michael K. Tung <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, August 8, 2008 1:09:01 PM
Subject: How to enable compression of blockfiles?

Hello, I have a simple question.  How do I configure DFS to store compressed
block files?  I've noticed by looking at the "blk_" files that the text
documents I am storing are uncompressed.  Currently our Hadoop deployment is
taking up 10x the disk space compared to our system before we moved to
Hadoop. I've tried modifying the io.seqfile.compress.blocksize option
without success and haven't been able to find anything online regarding
this.  Is there any way to do this, or do I need to manually compress my data
before storing it to HDFS?

Thanks,

Michael Tung

