Thanks Harsh, Hong and Ted. I still have a question on how to build the native library for the gzip compression type. I found the information below on the wiki:

******************************************************************************
In particular the various packages you would need on the target platform are:
- C compiler (e.g. GNU C Compiler <http://gcc.gnu.org/>)
- GNU Autotools chain: autoconf <http://www.gnu.org/software/autoconf/>, automake <http://www.gnu.org/software/automake/>, libtool <http://www.gnu.org/software/libtool/>
- zlib development package (stable version >= 1.2.0)
- lzo development package (stable version >= 2.0)

Once you have the prerequisites, use the standard build.xml and pass along the compile.native flag (set to true) to build the native hadoop library:

$ ant -Dcompile.native=true <target>
***************************************************************************

So what is meant by "development package" here? I know for lzo there is the hadoop-lzo package, but what is the equivalent for gzip? I think gzip is written in C and shouldn't be built using ant/ivy? Sorry, I am just a beginner knocking on the door of Hadoop. Thanks in advance for your answers!

I have run "make install" on the GNU gzip source code on my cluster node. Would it work if I directly copy the generated libraries to the dir $HADOOP_HOME/lib/native/Linux_amd64-64?

Stan.Lee

On Wed, May 19, 2010 at 4:31 PM, stan lee <lee.stan...@gmail.com> wrote:
> Got the meaning now. As sort uses SequenceFileFormat to write the
> output file, and to use gzip as the compression type we need the native
> library... will try that.
>
>
> On Wed, May 19, 2010 at 3:17 PM, stan lee <lee.stan...@gmail.com> wrote:
>> Thanks all. So if we don't call the setCompressOutput() and
>> setOutputCompressorClass() functions in the sort program, and instead just
>> set mapred.output.compress to true and mapred.output.compression.codec to
>> org.apache.hadoop.io.compress.GzipCodec, wouldn't that produce a compressed
>> output file like part-xxxx.gz?
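On the "development package" point: for gzip it most likely means the zlib headers and shared library that the native Hadoop build links against (zlib-devel on RPM systems, zlib1g-dev on Debian), not the gzip command-line tool, so "make install"-ing GNU gzip would not provide it. A quick shell check, assuming typical Linux install paths:

```shell
# "Development package" = zlib headers + shared library for the native
# build to link against, not the gzip CLI. The header path below is the
# usual Linux default; adjust for your distro.
if [ -e /usr/include/zlib.h ]; then
    echo "zlib headers present"
else
    echo "install the zlib development package (zlib-devel / zlib1g-dev)"
fi

# GzipCodec writes standard gzip streams, so ordinary gzip/gunzip can
# round-trip the same data:
printf 'hello world' | gzip -c | gunzip -c
```

This is only a sketch of the prerequisite check; the actual native build is still the `ant -Dcompile.native=true` invocation quoted from the wiki above.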
>>
>> On Wed, May 19, 2010 at 1:31 AM, Harsh J <qwertyman...@gmail.com> wrote:
>>> Hi stan,
>>>
>>> You can do something of this sort if you use FileOutputFormat, from
>>> within your Job Driver:
>>>
>>> FileOutputFormat.setCompressOutput(job, true);
>>> FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>>> // GzipCodec from org.apache.hadoop.io.compress.
>>> // and where 'job' is either a JobConf or Job object.
>>>
>>> This will write the simple file output in Gzip format. You also have
>>> BZip2Codec.
>>>
>>> On Tue, May 18, 2010 at 9:14 PM, stan lee <lee.stan...@gmail.com> wrote:
>>> > Hi Guys,
>>> >
>>> > I am trying to use compression to reduce the IO workload when running
>>> > a job, but failed. I have several questions which need your help.
>>> >
>>> > For lzo compression, I found the guide
>>> > http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it
>>> > say "Note that you must have both 32-bit and 64-bit liblzo2 installed"?
>>> > I am not sure whether it means we also need 32-bit liblzo2 installed
>>> > even when we are on a 64-bit system. If so, why?
>>> >
>>> > Also, if I don't use lzo compression and try to use gzip to compress
>>> > the final reduce output file, I just set the value below in
>>> > mapred-site.xml, but it doesn't seem to work (how can I find the final
>>> > .gz compressed file? I used "hadoop dfs -l <dir>" and didn't find it).
>>> > My questions: can we use gzip to compress the final result when it's
>>> > not a streaming job? How can we ensure that compression has been
>>> > enabled during a job execution?
>>> >
>>> > <property>
>>> >   <name>mapred.output.compress</name>
>>> >   <value>true</value>
>>> > </property>
>>> >
>>> > Thanks!
>>> > Stan Lee
>>>
>>> --
>>> Harsh J
>>> www.harshj.com
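As the thread points out, mapred.output.compress alone only turns compression on; the codec has to be named as well. A fuller mapred-site.xml fragment, sketched with the Hadoop 0.20.x property names used in this thread (verify against your version before relying on it), would be:

```xml
<!-- Sketch only: 0.20.x-era property names; adjust for your cluster. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<!-- Only relevant when the output is a SequenceFile: -->
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
```

With these set cluster-wide, a job that uses FileOutputFormat should produce gzip-compressed part files without any setCompressOutput()/setOutputCompressorClass() calls in the driver.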