Stan,

See my comments inline.

Thanks, Hong

On May 18, 2010, at 8:44 AM, stan lee wrote:

Hi Guys,

I am trying to use compression to reduce the I/O load when running
a job, but it failed. I have several questions and would appreciate your help.

For LZO compression, I found a guide at
http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ. Why does it say "Note that you must have both 32-bit and 64-bit liblzo2 installed"? I am not sure whether that means we also need the 32-bit liblzo2 installed even when we are
on a 64-bit system. If so, why?

That answer on the wiki page is for the question of how to set up the native libraries so that both 32-bit AND 64-bit Java would work. If you run the same flavor of Java across the whole cluster, it does not apply to you: you only need the liblzo2 build that matches your JVM.

Also, when I do not use LZO and instead try to gzip the final reduce output file, I only set the value below in mapred-site.xml, but it does not seem to work. (How can I find the final compressed .gz file? I ran "hadoop dfs -ls <dir>" and did not find one.) My questions: can we use gzip to compress the final result when it is not a streaming job? And how can we confirm
that compression was enabled during a job execution?

<property>
      <name>mapred.output.compress</name>
      <value>true</value>
</property>


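This option is honored by the OutputFormat implementation, not by the framework itself, so it works for any job, streaming or not, as long as the output format supports it. If you use TextOutputFormat with the gzip codec selected, you should see files like "part-xxxx.gz" in the output directory, which is also how you can confirm that compression was applied. If you write your own output format class, then you should follow the implementation of TextOutputFormat or SequenceFileOutputFormat to set up compression properly.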
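
For reference, here is a minimal sketch of the settings that should give gzip output with TextOutputFormat. The second property and the codec class name are what I recall from the stock mapred-default.xml of the 0.20 line, so treat them as assumptions and check them against your release. If the codec is left unset, I believe the default is DefaultCodec, which writes "part-xxxx.deflate" rather than ".gz", and that may be why you did not find a .gz file.

```xml
<!-- Sketch only: property names assumed from the 0.20-era mapred-default.xml -->
<property>
      <name>mapred.output.compress</name>
      <value>true</value>
</property>
<property>
      <!-- Selects the codec; GzipCodec produces the ".gz" suffix -->
      <name>mapred.output.compression.codec</name>
      <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```

These are read on the job-submission side, so they need to be in the configuration of the client that submits the job. Once the job finishes, listing the output directory should show the compressed part files.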
