Thanks Harsh, Hong and Ted. I still have a question on how to build the native library for the gzip compression type. I found the information below on the wiki:

******************************************************************************
In particular the various packages you would need on the target platform are:
- C compiler (e.g. GNU C Compiler <http://gcc.gnu.org/>)
- GNU Autotools chain: autoconf <http://www.gnu.org/software/autoconf/>, automake <http://www.gnu.org/software/automake/>, libtool <http://www.gnu.org/software/libtool/>
- zlib development package (stable version >= 1.2.0)
- lzo development package (stable version >= 2.0)

Once you have the prerequisites, use the standard build.xml and pass along the compile.native flag (set to true) to build the native hadoop library:

$ ant -Dcompile.native=true <target>
***************************************************************************

So what is meant by "development package" here? I know for lzo there is the hadoop-lzo package, but what is the equivalent for gzip? I think gzip is written in C and shouldn't be built using ant/ivy? Sorry, I am just a beginner knocking on the door of Hadoop. Thanks in advance for your answers!

I have run "make install" on the GNU gzip source code on my cluster node. Would it work if I directly copy the generated libraries to the dir $HADOOP_HOME/lib/native/Linux_amd64-64?

Stan.Lee

On Wed, May 19, 2010 at 4:31 PM, stan lee <lee.stan...@gmail.com> wrote:
> Got the meaning now. As sort uses SequenceFileFormat to write the
> output file, and to use gzip as the compression type we need the native
> library... will try that.
>
>
> On Wed, May 19, 2010 at 3:17 PM, stan lee <lee.stan...@gmail.com> wrote:
>> Thanks all. So if we don't call the setCompressOutput() and
>> setOutputCompressorClass() functions in the sort program, and instead just
>> set mapred.output.compress to true and mapred.output.compression.codec to
>> org.apache.hadoop.io.compress.GzipCodec, wouldn't that produce a compressed
>> output file like part-xxxx.gz?
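On the "development package" point: for gzip it most likely means the zlib headers and shared library that the native Hadoop build links against (zlib-devel on RPM systems, zlib1g-dev on Debian), not the gzip command-line tool, so "make install"-ing GNU gzip would not provide it. A quick shell check, assuming typical Linux install paths:

```shell
# "Development package" = zlib headers + shared library for the native
# build to link against, not the gzip CLI. The header path below is the
# usual Linux default; adjust for your distro.
if [ -e /usr/include/zlib.h ]; then
    echo "zlib headers present"
else
    echo "install the zlib development package (zlib-devel / zlib1g-dev)"
fi

# GzipCodec writes standard gzip streams, so ordinary gzip/gunzip can
# round-trip the same data:
printf 'hello world' | gzip -c | gunzip -c
```

This is only a sketch of the prerequisite check; the actual native build is still the `ant -Dcompile.native=true` invocation quoted from the wiki above.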
>>
>> On Wed, May 19, 2010 at 1:31 AM, Harsh J <qwertyman...@gmail.com> wrote:
>>> Hi stan,
>>>
>>> You can do something of this sort if you use FileOutputFormat, from
>>> within your Job Driver:
>>>
>>> FileOutputFormat.setCompressOutput(job, true);
>>> FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>>> // GzipCodec from org.apache.hadoop.io.compress.
>>> // and where 'job' is either a JobConf or Job object.
>>>
>>> This will write the simple file output in Gzip format. You also have
>>> BZip2Codec.
>>>
>>> On Tue, May 18, 2010 at 9:14 PM, stan lee <lee.stan...@gmail.com> wrote:
>>> > Hi Guys,
>>> >
>>> > I am trying to use compression to reduce the IO workload when running
>>> > a job, but failed. I have several questions which need your help.
>>> >
>>> > For lzo compression, I found the guide
>>> > http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it
>>> > say "Note that you must have both 32-bit and 64-bit liblzo2 installed"?
>>> > I am not sure whether it means we also need 32-bit liblzo2 installed
>>> > even when we are on a 64-bit system. If so, why?
>>> >
>>> > Also, if I don't use lzo compression and try to use gzip to compress
>>> > the final reduce output file, I just set the value below in
>>> > mapred-site.xml, but it doesn't seem to work (how can I find the final
>>> > .gz compressed file? I used "hadoop dfs -l <dir>" and didn't find it).
>>> > My questions: can we use gzip to compress the final result when it's
>>> > not a streaming job? How can we ensure that compression has been
>>> > enabled during a job execution?
>>> >
>>> > <property>
>>> >   <name>mapred.output.compress</name>
>>> >   <value>true</value>
>>> > </property>
>>> >
>>> > Thanks!
>>> > Stan Lee
>>>
>>> --
>>> Harsh J
>>> www.harshj.com
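As the thread points out, mapred.output.compress alone only turns compression on; the codec has to be named as well. A fuller mapred-site.xml fragment, sketched with the Hadoop 0.20.x property names used in this thread (verify against your version before relying on it), would be:

```xml
<!-- Sketch only: 0.20.x-era property names; adjust for your cluster. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
<!-- Only relevant when the output is a SequenceFile: -->
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
```

With these set cluster-wide, a job that uses FileOutputFormat should produce gzip-compressed part files without any setCompressOutput()/setOutputCompressorClass() calls in the driver.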