Stan,
See my comments inline.
Thanks, Hong
On May 18, 2010, at 8:44 AM, stan lee wrote:
Hi Guys,
I am trying to use compression to reduce the I/O workload when running
a job, but failed. I have several questions that need your help.
For LZO compression, I found a guide at
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it
say "Note that you must have both 32-bit and 64-bit liblzo2
installed"? I am not sure whether that means we also need 32-bit
liblzo2 installed even when we are on a 64-bit system. If so, why?
The answer on the wiki page is to the question of how to set up the
native libraries so that both 32-bit AND 64-bit java would work. If
you adhere to an environment with the same flavor of java across the
whole cluster, then the solution would not apply to you.
Also, if I don't use LZO compression and instead try to use gzip to
compress the final reduce output, I just set the value below in
mapred-site.xml, but it doesn't seem to work. (How can I find the
final compressed .gz file? I used "hadoop dfs -ls <dir>" and didn't
find it.) My questions: can we use gzip to compress the final result
when it's not a streaming job, and how can we verify that compression
was enabled during job execution?
<property>
<name>mapred.output.compress</name>
<value>true</value>
</property>
The truth is, this option is honored by the individual OutputFormat
implementations. If you use TextOutputFormat, you should see files
like "part-xxxx.gz" in the output directory. If you write your own
output format class, follow the implementations of TextOutputFormat
or SequenceFileOutputFormat to set up compression properly.
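For reference, with the old (mapred.*) property names, enabling gzip
output usually takes both of the properties below; the second one
selects the codec explicitly (without it you get the default
zlib/deflate codec rather than .gz files):

```xml
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

You can then confirm that compression took effect by listing the
output directory with "hadoop dfs -ls <dir>" and checking that the
part files carry a .gz extension.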