On Tue, Jul 28, 2009 at 2:22 AM, Zheng Shao <zsh...@gmail.com> wrote:
> Yes we do compress all tables.
>
> Zheng
>
> On Mon, Jul 27, 2009 at 11:08 PM, Saurabh Nanda <saurabhna...@gmail.com> wrote:
>>
>>> In our setup, we didn't change io.seqfile.compress.blocksize (1MB) and
>>> it's still fairly good.
>>> You are free to try 100MB for better compression ratio, but I would
>>> recommend to keep the default setting to minimize the possibilities of
>>> hitting unknown bugs.
>>
>> Makes sense. Better compression brought down a count(1) query from 100+ sec
>> down to 40sec. The ETL phase is now taking 510sec as opposed to 700sec
>> earlier.
>>
>> Do you also compress all tables, not just the raw ones? Would you recommend
>> it?
>>
>> Saurabh.
>> --
>> http://nandz.blogspot.com
>> http://foodieforlife.blogspot.com
>
> --
> Yours,
> Zheng

Saurabh,

Thank you for the wiki page on this. Keep up the good work, and please post all your findings about compression. Many people (including me) will benefit from an explanation of the different types of compression available and the trade-offs of different codecs and options.

I am really excited, as I have (shamefully) had some large tables with multiple text files building up, and the thought of smaller data and faster queries is giving me goosebumps.

Edward
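
For anyone who wants a feel for the block-size trade-off discussed earlier in the thread, here is a rough Python sketch. It is not Hadoop or Hive code: it uses Python's zlib as a stand-in codec, and the helper names (`compressed_size`, `sample_rows`) and the sample data are made up for illustration. It only shows the general effect that compressing larger independent blocks tends to give a better ratio than compressing many small ones.

```python
import random
import zlib


def compressed_size(data: bytes, block_size: int) -> int:
    """Total size after compressing each block_size chunk independently."""
    total = 0
    for start in range(0, len(data), block_size):
        total += len(zlib.compress(data[start:start + block_size]))
    return total


def sample_rows(n_rows: int, seed: int = 0) -> bytes:
    """Build hypothetical tab-separated rows, loosely like a raw log table."""
    rng = random.Random(seed)
    pages = ["/home", "/search", "/cart", "/checkout"]
    rows = [
        f"2009-07-27\t{rng.randint(1, 5000)}\t{rng.choice(pages)}\n"
        for _ in range(n_rows)
    ]
    return "".join(rows).encode("ascii")


if __name__ == "__main__":
    data = sample_rows(50_000)
    # Compare a few block sizes; each block is compressed on its own.
    for block_size in (1_024, 65_536, 1_048_576):
        size = compressed_size(data, block_size)
        print(f"block={block_size:>9} bytes  ratio={len(data) / size:.2f}")
```

With zlib the gain should flatten out once blocks are well past its 32 KiB window, so this says nothing about whether 100MB beats 1MB for a SequenceFile; that would need measuring on real tables with the codec actually in use.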