Hi,

Every time after starting the cluster, I first run

    hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

and it succeeds. But when I then run

    nutch crawl url -dir crawl -depth 3

I get these errors:
-------------------------------------------------------------------------
10/08/07 22:53:30 INFO crawl.Crawl: crawl started in: crawl
.....................................................................
10/08/07 22:53:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
.....................................................................
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
.....................................................................
        ... 9 more
Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
.....................................................................
        ... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
.....................................................................
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 16 more
-------------------------------------------------------------------------
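The last "Caused by" puzzles me, because GzipCodec ships inside hadoop-core itself. To check whether the class is loadable at all from the Hadoop classpath, I put together this tiny throwaway probe (my own sketch; CodecProbe is just a name I made up, not anything from Hadoop or Nutch), which I plan to run via "hadoop CodecProbe" after compiling it:
-------------------------------------------------------------------------
// CodecProbe.java -- my own quick sketch, not part of Hadoop or Nutch.
// Tries to load each codec class by name and reports which are missing.
public class CodecProbe {
    public static void main(String[] args) {
        String[] codecs = {
            "org.apache.hadoop.io.compress.GzipCodec",
            "org.apache.hadoop.io.compress.DefaultCodec",
            "org.apache.hadoop.io.compress.BZip2Codec",
            "com.hadoop.compression.lzo.LzoCodec"
        };
        for (String name : codecs) {
            try {
                Class.forName(name);
                System.out.println("OK:      " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("MISSING: " + name);
            }
        }
    }
}
-------------------------------------------------------------------------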
Either way, GzipCodec didn't get loaded successfully here. Maybe it isn't loaded by default; I don't know, but I think it should be. So I followed this FAQ, http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ, to install lzo, then ran "nutch crawl url -dir crawl -depth 3" again and got these errors:
-------------------------------------------------------------------------
10/08/07 22:40:41 INFO crawl.Crawl: crawl started in: crawl
.....................................................................
10/08/07 22:40:42 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
10/08/07 22:40:42 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
.....................................................................
        at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.....................................................................
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 9 more
Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
.....................................................................
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
        ... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
.....................................................................
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 16 more
-------------------------------------------------------------------------

Then I reran "hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'", and now it fails too:
-------------------------------------------------------------------------
java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:400)
.....................................................................
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.....................................................................
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 22 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
        ... 27 more
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
.....................................................................
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 29 more
-------------------------------------------------------------------------

When I run "ps -aef | grep gpl", I get this output:
-------------------------------------------------------------------------
alex 2267 1 1 22:04 pts/1 00:00:04 /usr/local/hadoop/jdk1.6.0_21/bin/java -Xmx200m -Dcom.sun.management.jmxremote -..............................................
/usr/local/hadoop/hadoop-0.20.2/bin/../conf:/usr/local/hadoop/jdk1.6.0_21/lib/tools.jar:/usr/local/hadoop/hadoop-0.20.2/bin/..:
/usr/local/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-cli-1.2.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-codec-1.3.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-el-1.0.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-.-net-1.4.1.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/core-3.1.1.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/hadoop-gpl-compression-0.2.0-dev.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/hsqldb-1.8.0.10.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jasper-compiler-5.5.12.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jasper-runtime-5.5.12.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jets3t-0.6.1.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jetty-6.1.14.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jetty-.......................................-log4j12-1.4.3.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/xmlenc-0.52.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-2.1.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-api-2.1.jar
org.apache.hadoop.hdfs.server.namenode.NameNode
-------------------------------------------------------------------------

As you can see, both jars (hadoop-0.20.2-core.jar and hadoop-gpl-compression-0.2.0-dev.jar) are on the NameNode's classpath, yet jobs apparently cannot reference them. Before this I had also tried installing hadoop-lzo (http://github.com/kevinweil/hadoop-lzo) and got the same errors; maybe hadoop-lzo only works with Hadoop 0.20, not 0.20.1/0.20.2, I don't know. After a month I still haven't solved this, and it's killing me. I'm posting all my configuration files below. Would you please help me dig the problem out? Thank you.
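In case it helps with the digging, here is one more sketch of mine (hypothetical code; CodecConfProbe is a made-up name) that reproduces in isolation the exact call failing in the stack traces, using the same Configuration the jobs load. My thinking is that printing the raw property value between brackets would also expose any stray whitespace that might have crept into my core-site.xml entry:
-------------------------------------------------------------------------
// CodecConfProbe.java -- my own sketch. Loads the same configuration the
// jobs see and repeats the call that fails inside TextInputFormat.configure().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecConfProbe {
    public static void main(String[] args) {
        Configuration conf = new Configuration(); // reads core-site.xml etc.
        // Brackets make any leading/trailing whitespace in the value visible.
        System.out.println("io.compression.codecs = ["
                + conf.get("io.compression.codecs") + "]");
        // Same call as in the traces; throws IllegalArgumentException if any
        // configured codec class cannot be found.
        System.out.println(CompressionCodecFactory.getCodecClasses(conf));
    }
}
-------------------------------------------------------------------------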
core-site.xml
-------------------------------------------------------------------------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://AlexLuya:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/alex/tmp</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec
    </value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
-------------------------------------------------------------------------

mapreduce.xml
-------------------------------------------------------------------------
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>AlexLuya:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/alex/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
-------------------------------------------------------------------------

hadoop-env.sh
-------------------------------------------------------------------------
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
export JAVA_HOME=/usr/local/hadoop/jdk1.6.0_21

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=200

# Extra Java runtime options.  Empty by default.
#export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts.
# $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
#export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10
-------------------------------------------------------------------------
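One last thought: HADOOP_CLASSPATH is still commented out in my hadoop-env.sh above, so maybe the job and task JVMs see a different classpath than the NameNode does in the ps output. To compare, I could run a final probe (again my own made-up class, EnvProbe) through the hadoop script:
-------------------------------------------------------------------------
// EnvProbe.java -- my own sketch. Prints the classpath and native library
// path of the JVM it runs in, for comparison with the ps output above.
public class EnvProbe {
    public static void main(String[] args) {
        System.out.println("java.class.path   = "
                + System.getProperty("java.class.path"));
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
    }
}
-------------------------------------------------------------------------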