SequenceFile BLOCK compression works on smaller chunks (around 1 MB, I think), so the compression ratio will be worse than when compressing the complete file.
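Here is a rough sketch of the effect, not a reproduction of what Hive or Hadoop actually does. It uses Python's zlib as a stand-in for the Hadoop codecs, and the synthetic log data, the helper names (make_sample_log, whole_size, chunked_size) and the chunk sizes are all made up for illustration. It compresses the same bytes once as a whole and once as independent chunks, so you can see how the total grows as the chunks shrink.

```python
import zlib


def make_sample_log(num_lines: int) -> bytes:
    """Build synthetic, repetitive log-like text (hypothetical data)."""
    lines = [
        f"2009-07-27 08:{i % 60:02d}:00 GET /page/{i % 500} 200\n"
        for i in range(num_lines)
    ]
    return "".join(lines).encode("ascii")


def whole_size(data: bytes) -> int:
    """Compressed size when the data is compressed as a single stream."""
    return len(zlib.compress(data))


def chunked_size(data: bytes, chunk_size: int) -> int:
    """Total compressed size when each chunk is compressed independently.

    Every chunk starts with an empty dictionary and carries its own
    header and checksum, which is where the extra bytes come from.
    """
    total = 0
    for start in range(0, len(data), chunk_size):
        total += len(zlib.compress(data[start:start + chunk_size]))
    return total


if __name__ == "__main__":
    data = make_sample_log(20000)
    print("uncompressed:", len(data))
    print("whole stream:", whole_size(data))
    # Illustrative chunk sizes only; not Hadoop defaults.
    for size in (65536, 4096, 1024):
        print(f"chunks of {size}:", chunked_size(data, size))
```

On your own logs the gap between the whole-stream and chunked numbers may be much smaller or larger than on this synthetic input, so it is worth running the comparison on a real sample before drawing conclusions.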
________________________________
From: Saurabh Nanda <saurabhna...@gmail.com>
Reply-To: <hive-user@hadoop.apache.org>
Date: Mon, 27 Jul 2009 08:38:08 -0700
To: <hive-user@hadoop.apache.org>
Subject: Re: Re: bz2 Splits.

#2 Compressed logs in textfile tables: 60sec (filesize of 736 MB over 8 compressed files)
#3 Compressed logs in sequencefile tables: 101sec (filesize of 4,773 MB over 126 compressed files)

Why is there such a *big* difference in compression ratios between the gzip utility and Hive?

Uncompressed file size: approx 3500 MB
Gzip utility: approx 250 MB
org.apache.hadoop.io.compress.GzipCodec (BLOCK): approx 1600 MB
org.apache.hadoop.io.compress.DefaultCodec (BLOCK): approx 1700 MB

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com