And also hive.exec.compress.*. So that makes it three sets of configuration variables:
mapred.output.compress.*
io.seqfile.compress.*
hive.exec.compress.*

What's the relationship between these configuration parameters, and which ones should I set to achieve a well-compressed output table?

Saurabh.

On Fri, Feb 19, 2010 at 7:16 PM, Saurabh Nanda <saurabhna...@gmail.com> wrote:

> I'm confused here, Zheng. There are two sets of configuration variables:
> those starting with io.* and those starting with mapred.*. For making sure
> that the final output table is compressed, which ones do I have to set?
>
> Saurabh.
>
> On Fri, Feb 19, 2010 at 12:37 AM, Zheng Shao <zsh...@gmail.com> wrote:
>
>> Did you also:
>>
>> SET mapred.output.compression.codec=org.apache....GzipCodec;
>>
>> Zheng
>>
>> On Thu, Feb 18, 2010 at 8:25 AM, Saurabh Nanda <saurabhna...@gmail.com> wrote:
>>
>> > Hi Zheng,
>> >
>> > I cross-checked. I am setting the following in my Hive script before
>> > the INSERT command:
>> >
>> > SET io.seqfile.compression.type=BLOCK;
>> > SET hive.exec.compress.output=true;
>> >
>> > A 132 MB (gzipped) input file going through a cleanup and getting
>> > populated into a SequenceFile table is growing to 432 MB. What could
>> > be going wrong?
>> >
>> > Saurabh.
>> >
>> > On Wed, Feb 3, 2010 at 2:26 PM, Saurabh Nanda <saurabhna...@gmail.com> wrote:
>> >
>> >> Thanks, Zheng. Will do some more tests and get back.
>> >>
>> >> Saurabh.
>> >>
>> >> On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <zsh...@gmail.com> wrote:
>> >>
>> >>> I would first check whether it is really block compression or
>> >>> record compression.
>> >>> Also, maybe the block size is too small, but I am not sure whether
>> >>> that is tunable in SequenceFile or not.
>> >>>
>> >>> Zheng
>> >>>
>> >>> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <saurabhna...@gmail.com> wrote:
>> >>>
>> >>> > Hi,
>> >>> >
>> >>> > The size of my gzipped weblog files is about 35 MB. However, upon
>> >>> > enabling block compression and inserting the logs into another
>> >>> > Hive table (SequenceFile), the file size bloats up to about 233 MB.
>> >>> > I've done similar processing on a local Hadoop/Hive cluster, and
>> >>> > while the compression is not as good as gzipping, it still is not
>> >>> > this bad. What could be going wrong?
>> >>> >
>> >>> > I looked at the header of the resulting file, and here's what it
>> >>> > says:
>> >>> >
>> >>> > SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
>> >>> >
>> >>> > Does Amazon Elastic MapReduce behave differently, or am I doing
>> >>> > something wrong?
>> >>> >
>> >>> > Saurabh.

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
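
To summarize how the three families typically fit together: hive.exec.compress.output is the Hive-level switch, which Hive translates into the Hadoop job-level settings (mapred.output.compress and friends); mapred.output.compression.codec picks the codec; and for SequenceFile output the per-record vs. per-block choice is generally read from mapred.output.compression.type, while io.seqfile.compression.type only supplies a default for writers created directly against the SequenceFile API, which would explain why setting it alone appears to have no effect. A minimal sketch of a Hive session that sets all three layers for block-compressed gzip SequenceFiles; the table names are hypothetical, and the exact interaction can vary across Hive/Hadoop versions, so treat this as a starting point rather than a definitive recipe:

-- Hive-level switch: ask Hadoop to compress the job's final output.
SET hive.exec.compress.output=true;

-- Hadoop job-level setting the switch above relies on:
-- the codec used for the output files...
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- ...and, for SequenceFile output, per-block rather than per-record
-- compression. SequenceFile output formats generally read this key,
-- not io.seqfile.compression.type, which only sets a default for
-- writers created directly through the SequenceFile API.
SET mapred.output.compression.type=BLOCK;

-- Hypothetical table names, for illustration only.
INSERT OVERWRITE TABLE weblogs_seq
SELECT * FROM weblogs_raw;

With these set, the resulting file headers should report both GzipCodec and block compression; if a header still indicates record compression, mapred.output.compression.type is the setting to double-check first.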