I think at present only SequenceFiles can be compressed. 
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html
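Something like this should write a block-compressed SequenceFile (untested sketch; the path and the key/value classes are just examples, substitute your own):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class CompressedSeqFileWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Placeholder output path, replace with your own
    Path out = new Path("/user/mike/docs.seq");

    // BLOCK compression compresses batches of records together,
    // which usually gives a much better ratio on text than RECORD.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK);
    try {
      writer.append(new LongWritable(1), new Text("first document text..."));
      writer.append(new LongWritable(2), new Text("second document text..."));
    } finally {
      writer.close();
    }
  }
}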
If you have plain text files, they are stored as-is into blocks. You can store
them as .gz files and Hadoop will recognize and process them, but gzip is not
splittable, meaning each map will consume a whole .gz file.
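If your data comes out of MapReduce jobs, you can also have the jobs emit block-compressed SequenceFile output directly, which stays both compressed and splittable. Roughly, with the mapred API (untested sketch; the class name is just a placeholder):

import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class CompressedOutputConf {
  public static JobConf configure(JobConf job) {
    // Write SequenceFiles instead of plain text output
    job.setOutputFormat(SequenceFileOutputFormat.class);
    // Compress the output; BLOCK compression keeps the files splittable
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, DefaultCodec.class);
    SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
    return job;
  }
}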
Thanks,
Lohit



----- Original Message ----
From: Michael K. Tung <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, August 8, 2008 1:09:01 PM
Subject: How to enable compression of blockfiles?

Hello, I have a simple question.  How do I configure DFS to store compressed
block files?  Looking at the "blk_" files, I've noticed that the text
documents I am storing are uncompressed.  Currently our Hadoop deployment is
taking up 10x the disk space compared to our system before moving to
Hadoop. I've tried modifying the io.seqfile.compress.blocksize option
without success and haven't been able to find anything online regarding
this.  Is there any way to do this, or do I need to manually compress my data
before storing it in HDFS?

Thanks,

Michael Tung
