Another approach, arguably a bit better, is to use the
hadoop-gpl-compression project (http://code.google.com/p/hadoop-gpl-compression/).
It also incorporates Johan Oskarsson's H-4640 patch. A detailed
description of how to use it with an LZO-less Hadoop distribution can be
found at http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ.
Thanks, Hong
On Jul 21, 2009, at 8:05 PM, Gross, Danny wrote:
Thanks Aaron, for the quick response.
Best Regards,
Danny
-----Original Message-----
From: Aaron Kimball [mailto:aa...@cloudera.com]
Sent: Tuesday, July 21, 2009 9:10 PM
To: common-user@hadoop.apache.org
Subject: Re: native-lzo library not available issue with terasort
Native LZO support was removed from Hadoop due to licensing
restrictions. See
http://www.cloudera.com/blog/2009/06/24/parallel-lzo-splittable-compression-for-hadoop/
for a writeup on how to reenable it in your local build.
- Aaron
On Tue, Jul 21, 2009 at 7:02 PM, Gross, Danny <danny.gr...@spansion.com> wrote:
Hello,
I've been running terasort on multiple cluster configurations, and
attempted to duplicate some of the configuration settings that Yahoo!
used for the Minute Sort.
In particular, I set the mapred.map.output.compression.codec property
to the value "org.apache.hadoop.io.compress.LzoCodec" in
hadoop-site.xml. I am using hadoop-0.19.1.
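For reference, the setting described above would look roughly like the fragment below inside the <configuration> element of hadoop-site.xml. The codec property and its value are taken from the message; the mapred.compress.map.output entry is an assumption added here, since the codec is only consulted when map output compression is switched on, and the message does not say how that was set.

```xml
<!-- hadoop-site.xml (fragment; goes inside <configuration>) -->

<!-- Assumption: map output compression is enabled. Not stated in the message. -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<!-- From the message: codec used for intermediate map output. -->
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.LzoCodec</value>
</property>
```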
The teragen program runs fine, and completes with improved time with my
new settings. However, when I run the terasort program, the following
error is thrown from the map tasks, and the job ultimately fails:
"java.lang.RuntimeException: native-lzo library not available
    at org.apache.hadoop.io.compress.LzoCodec.getCompressorType(LzoCodec.java:130)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:98)
    at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:93)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:961)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:842)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)"
I've searched other places for an answer, and am coming up short. Any
help out there would be greatly appreciated.
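A rough diagnostic for this kind of failure is to check whether the JVM on a task node can locate a given native library at all. The sketch below is not Hadoop's own check (LzoCodec does its own detection internally); it only calls System.loadLibrary, which searches java.library.path. The class name NativeLibCheck and the default library name "hadoop" are assumptions made for illustration.

```java
// Minimal sketch: reports whether the JVM can load a native library by name.
// NativeLibCheck is a hypothetical helper, not part of Hadoop.
public class NativeLibCheck {

    // Returns true if System.loadLibrary succeeds for the given short name
    // (e.g. "foo" maps to libfoo.so on Linux), false if it cannot be loaded.
    static boolean isLoadable(String name) {
        try {
            System.loadLibrary(name);
            return true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: "hadoop" is used as the default name only as an example;
        // pass the library names you actually want to test as arguments.
        String[] names = args.length > 0 ? args : new String[] {"hadoop"};
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
        for (String name : names) {
            System.out.println(name + " loadable: " + isLoadable(name));
        }
    }
}
```

Running it with the same -Djava.library.path value that the task JVMs use shows whether the library is visible from that path. A "false" result means the library is missing or built for a different platform; a "true" result does not by itself prove that the codec will work.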
Best regards,
Danny
Danny B. Gross
Solutions Engineering
Spansion, Inc.
email: danny.gr...@spansion.com