SequenceFile BLOCK compression works on smaller chunks (around 1 MB, I think), so 
the compression ratio would be lower than when compressing the complete file.
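
A rough way to see the effect outside Hadoop is to compress the same bytes once as a whole and once as independent chunks, then compare the totals. This is only a sketch using Python's standard zlib module, not Hadoop's SequenceFile code; the synthetic log lines, the helper names (make_log, whole_size, chunked_size) and the chunk sizes are all made up for illustration.

```python
import random
import zlib


def make_log(n_lines, seed=0):
    """Build synthetic, repetitive log-like data (hypothetical format)."""
    rng = random.Random(seed)
    lines = [
        "2009-07-27 08:%02d:%02d GET /page/%d 200\n"
        % (rng.randrange(60), rng.randrange(60), rng.randrange(1000))
        for _ in range(n_lines)
    ]
    return "".join(lines).encode("ascii")


def whole_size(data):
    """Compressed size when the data is compressed as one stream."""
    return len(zlib.compress(data))


def chunked_size(data, chunk_size):
    """Total compressed size when each chunk is compressed on its own.

    Every chunk starts with an empty history and carries its own
    stream header, which is where the extra bytes come from.
    """
    return sum(
        len(zlib.compress(data[i:i + chunk_size]))
        for i in range(0, len(data), chunk_size)
    )


data = make_log(50000)
print("whole file:", whole_size(data))
# Chunk sizes are arbitrary picks for the comparison.
for chunk in (1 << 20, 64 << 10, 4 << 10):
    print("chunk", chunk, "->", chunked_size(data, chunk))
```

The penalty grows as the chunks get smaller. How much is lost at around 1 MB depends on the data, so it is worth measuring on a sample of the real logs.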


________________________________
From: Saurabh Nanda <saurabhna...@gmail.com>
Reply-To: <hive-user@hadoop.apache.org>
Date: Mon, 27 Jul 2009 08:38:08 -0700
To: <hive-user@hadoop.apache.org>
Subject: Re: Re: bz2 Splits.


#2 Compressed logs in textfile tables: 60sec (filesize of 736 MB over 8 
compressed files)
#3 Compressed logs in sequencefile tables: 101sec (filesize of 4,773 MB over 
126 compressed files)

Why is there such a *big* difference in compression ratios between the gzip 
utility and Hive?

Uncompressed file size: approx 3500 MB
Gzip utility: approx 250 MB
org.apache.hadoop.io.compress.GzipCodec (BLOCK): approx 1600 MB
org.apache.hadoop.io.compress.DefaultCodec (BLOCK): approx 1700 MB

Saurabh.
--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
