I'm confused here, Zheng. There are two sets of configuration variables: those starting with io.* and those starting with mapred.*. To make sure that the final output table is compressed, which ones do I have to set?
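For reference, my current understanding (please correct me if I'm wrong) is that hive.exec.compress.output is the Hive-level switch, the mapred.* properties tell Hadoop to compress job output and name the codec, and io.seqfile.compression.type chooses RECORD vs BLOCK for sequencefile tables. So a fully compressed output would need something like:

```sql
-- Assumed combination; not yet confirmed on this thread
SET hive.exec.compress.output=true;
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET io.seqfile.compression.type=BLOCK;
```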
Saurabh.

On Fri, Feb 19, 2010 at 12:37 AM, Zheng Shao <zsh...@gmail.com> wrote:
> Did you also:
>
> SET mapred.output.compression.codec=org.apache....GZipCode;
>
> Zheng
>
> On Thu, Feb 18, 2010 at 8:25 AM, Saurabh Nanda <saurabhna...@gmail.com> wrote:
> > Hi Zheng,
> >
> > I cross-checked. I am setting the following in my Hive script before the
> > INSERT command:
> >
> > SET io.seqfile.compression.type=BLOCK;
> > SET hive.exec.compress.output=true;
> >
> > A 132 MB (gzipped) input file going through a cleanup and getting populated
> > into a sequencefile table is growing to 432 MB. What could be going wrong?
> >
> > Saurabh.
> >
> > On Wed, Feb 3, 2010 at 2:26 PM, Saurabh Nanda <saurabhna...@gmail.com> wrote:
> >> Thanks, Zheng. Will do some more tests and get back.
> >>
> >> Saurabh.
> >>
> >> On Mon, Feb 1, 2010 at 1:22 PM, Zheng Shao <zsh...@gmail.com> wrote:
> >>> I would first check whether it is really block compression or
> >>> record compression.
> >>> Also, maybe the block size is too small, but I am not sure whether that
> >>> is tunable in SequenceFile or not.
> >>>
> >>> Zheng
> >>>
> >>> On Sun, Jan 31, 2010 at 9:03 PM, Saurabh Nanda <saurabhna...@gmail.com> wrote:
> >>> > Hi,
> >>> >
> >>> > The size of my gzipped weblog files is about 35 MB. However, upon
> >>> > enabling block compression and inserting the logs into another Hive
> >>> > table (sequencefile), the file size bloats up to about 233 MB. I've
> >>> > done similar processing on a local Hadoop/Hive cluster, and while the
> >>> > compression is not as good as gzipping, it still is not this bad.
> >>> > What could be going wrong?
> >>> >
> >>> > I looked at the header of the resulting file, and here's what it says:
> >>> >
> >>> > SEQ^F"org.apache.hadoop.io.BytesWritable^Yorg.apache.hadoop.io.Text^A^@'org.apache.hadoop.io.compress.GzipCodec
> >>> >
> >>> > Does Amazon Elastic MapReduce behave differently, or am I doing
> >>> > something wrong?
> >>> >
> >>> > Saurabh.
> >>> > --
> >>> > http://nandz.blogspot.com
> >>> > http://foodieforlife.blogspot.com
> >>>
> >>> --
> >>> Yours,
> >>> Zheng

--
http://nandz.blogspot.com
http://foodieforlife.blogspot.com
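Incidentally, the ^A^@ pair in the header quoted above is telling: in the SequenceFile header those two bytes are the "compressed" and "block-compressed" boolean flags, and ^A^@ decodes to compressed=true, block-compressed=false, i.e. RECORD compression, which would explain the bloat Zheng suspected. A minimal Python sketch (assuming a version-6 file and single-byte vint lengths, which hold for class names under 128 characters) that decodes such a header:

```python
def parse_seqfile_header(data: bytes) -> dict:
    """Decode the leading fields of a Hadoop SequenceFile header.

    Simplified version-6 layout: the magic bytes 'SEQ', a version byte,
    the key and value class names (each a length byte then UTF-8 text),
    a 'compressed?' boolean byte, a 'block-compressed?' boolean byte,
    and, if compressed, the codec class name.
    """
    if data[:3] != b"SEQ":
        raise ValueError("not a SequenceFile header")
    pos = 4  # skip magic and version byte

    def read_name(pos: int):
        # Hadoop writes these lengths as vints; one byte suffices for
        # class names shorter than 128 characters.
        length = data[pos]
        start = pos + 1
        return data[start:start + length].decode("utf-8"), start + length

    key_class, pos = read_name(pos)
    value_class, pos = read_name(pos)
    compressed = bool(data[pos])
    block_compressed = bool(data[pos + 1])
    pos += 2
    codec = None
    if compressed:
        codec, pos = read_name(pos)
    return {
        "key_class": key_class,
        "value_class": value_class,
        "compressed": compressed,
        "block_compressed": block_compressed,
        "codec": codec,
    }


# The header quoted in the thread: ^F is version 6, '"' (0x22) and
# ^Y (0x19) are class-name lengths, ^A^@ is compressed=1, block=0.
header = (
    b"SEQ\x06"
    b"\x22org.apache.hadoop.io.BytesWritable"
    b"\x19org.apache.hadoop.io.Text"
    b"\x01\x00"
    b"\x27org.apache.hadoop.io.compress.GzipCodec"
)
info = parse_seqfile_header(header)
print(info["compressed"], info["block_compressed"])  # True False
```

So despite SET io.seqfile.compression.type=BLOCK, the file on disk appears to be record-compressed, which is consistent with the setting not reaching the job configuration.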