I think at present only SequenceFiles can be compressed:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html

If you have plain text files, they are stored as-is into blocks. You can store them as .gz and Hadoop recognizes and processes the gz files, but they are not splittable, meaning each map will consume the whole .gz file. (The io.seqfile.compress.blocksize setting you tried only applies to block-compressed SequenceFiles, which is why it had no effect on your plain text files.)

Thanks,
Lohit
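P.S. In case it helps, here is a minimal sketch of writing a block-compressed SequenceFile through the Java API. The output path, key/value types, and codec are just illustrative choices, not the only options:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.compress.DefaultCodec;

    public class CompressedSeqFileWrite {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/tmp/docs.seq"); // illustrative output path

        // BLOCK compression compresses batches of records together and
        // usually gives much better ratios on text than RECORD compression.
        // DefaultCodec (zlib) works without native libraries.
        SequenceFile.Writer writer = SequenceFile.createWriter(
            fs, conf, out,
            LongWritable.class, Text.class,
            SequenceFile.CompressionType.BLOCK,
            new DefaultCodec());
        try {
          writer.append(new LongWritable(1), new Text("first document body"));
          writer.append(new LongWritable(2), new Text("second document body"));
        } finally {
          writer.close();
        }
      }
    }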
----- Original Message ----
From: Michael K. Tung <[EMAIL PROTECTED]>
To: core-user@hadoop.apache.org
Sent: Friday, August 8, 2008 1:09:01 PM
Subject: How to enable compression of blockfiles?

Hello, I have a simple question: how do I configure DFS to store compressed block files? Looking at the "blk_" files, I've noticed that the text documents I am storing are uncompressed. Currently our Hadoop deployment is taking up 10x the disk space compared to our system before we moved to Hadoop. I've tried modifying the io.seqfile.compress.blocksize option without success and haven't been able to find anything online about this. Is there any way to do this, or do I need to manually compress my data before storing it to HDFS?

Thanks,
Michael Tung