Hi,

Every time after starting the cluster, I first run

    hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'

and it succeeds. But when I then run

    nutch crawl url -dir crawl -depth 3

I get these errors:
-------------------------------------------------------------------------
10/08/07 22:53:30 INFO crawl.Crawl: crawl started in: crawl
.....................................................................
10/08/07 22:53:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
.....................................................................
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
.....................................................................
        ... 9 more
Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
.....................................................................
        ... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
.....................................................................
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 16 more
-------------------------------------------------------------------------
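The last "Caused by" puzzles me, because GzipCodec ships inside hadoop-core itself. To check whether the class is loadable at all from the Hadoop classpath, I put together this tiny throwaway probe (my own sketch; CodecProbe is just a name I made up, not anything from Hadoop or Nutch), which I plan to run via "hadoop CodecProbe" after compiling it:
-------------------------------------------------------------------------
// CodecProbe.java -- my own quick sketch, not part of Hadoop or Nutch.
// Tries to load each codec class by name and reports which are missing.
public class CodecProbe {
    public static void main(String[] args) {
        String[] codecs = {
            "org.apache.hadoop.io.compress.GzipCodec",
            "org.apache.hadoop.io.compress.DefaultCodec",
            "org.apache.hadoop.io.compress.BZip2Codec",
            "com.hadoop.compression.lzo.LzoCodec"
        };
        for (String name : codecs) {
            try {
                Class.forName(name);
                System.out.println("OK:      " + name);
            } catch (ClassNotFoundException e) {
                System.out.println("MISSING: " + name);
            }
        }
    }
}
-------------------------------------------------------------------------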
Either way, GzipCodec didn't get loaded successfully here. Maybe it isn't loaded by default; I don't know, but I think it should be. So I followed this FAQ, http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ, to install lzo, then ran "nutch crawl url -dir crawl -depth 3" again and got these errors:
-------------------------------------------------------------------------
10/08/07 22:40:41 INFO crawl.Crawl: crawl started in: crawl
.....................................................................
10/08/07 22:40:42 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
10/08/07 22:40:42 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
Exception in thread "main" java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
.....................................................................
        at org.apache.nutch.crawl.Injector.inject(Injector.java:211)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.....................................................................
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 9 more
Caused by: java.lang.IllegalArgumentException: Compression codec org.apache.hadoop.io.compress.GzipCodec not found.
.....................................................................
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
        ... 14 more
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.io.compress.GzipCodec
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
.....................................................................
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 16 more
-------------------------------------------------------------------------

Then I reran "hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'", and now it fails too:
-------------------------------------------------------------------------
java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:400)
.....................................................................
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
.....................................................................
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
        ... 22 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)
        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41)
        ... 27 more
Caused by: java.lang.ClassNotFoundException: com.hadoop.compression.lzo.LzoCodec
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
.....................................................................
        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 29 more
-------------------------------------------------------------------------

When I run "ps -aef | grep gpl", I get this output:
-------------------------------------------------------------------------
alex 2267 1 1 22:04 pts/1 00:00:04 /usr/local/hadoop/jdk1.6.0_21/bin/java -Xmx200m -Dcom.sun.management.jmxremote -..............................................
/usr/local/hadoop/hadoop-0.20.2/bin/../conf:/usr/local/hadoop/jdk1.6.0_21/lib/tools.jar:/usr/local/hadoop/hadoop-0.20.2/bin/..:
/usr/local/hadoop/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-cli-1.2.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-codec-1.3.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-el-1.0.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/commons-.-net-1.4.1.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/core-3.1.1.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/hadoop-gpl-compression-0.2.0-dev.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/hsqldb-1.8.0.10.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jasper-compiler-5.5.12.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jasper-runtime-5.5.12.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jets3t-0.6.1.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jetty-6.1.14.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jetty-.......................................-log4j12-1.4.3.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/xmlenc-0.52.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-2.1.jar:
/usr/local/hadoop/hadoop-0.20.2/bin/../lib/jsp-2.1/jsp-api-2.1.jar
org.apache.hadoop.hdfs.server.namenode.NameNode
-------------------------------------------------------------------------

As you can see, both jars (hadoop-0.20.2-core.jar and hadoop-gpl-compression-0.2.0-dev.jar) are on the NameNode's classpath, yet jobs apparently cannot reference them. Before this I had also tried installing hadoop-lzo (http://github.com/kevinweil/hadoop-lzo) and got the same errors; maybe hadoop-lzo only works with Hadoop 0.20, not 0.20.1/0.20.2, I don't know. After a month I still haven't solved this, and it's killing me. I'm posting all my configuration files below. Would you please help me dig the problem out? Thank you.
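In case it helps with the digging, here is one more sketch of mine (hypothetical code; CodecConfProbe is a made-up name) that reproduces in isolation the exact call failing in the stack traces, using the same Configuration the jobs load. My thinking is that printing the raw property value between brackets would also expose any stray whitespace that might have crept into my core-site.xml entry:
-------------------------------------------------------------------------
// CodecConfProbe.java -- my own sketch. Loads the same configuration the
// jobs see and repeats the call that fails inside TextInputFormat.configure().
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecConfProbe {
    public static void main(String[] args) {
        Configuration conf = new Configuration(); // reads core-site.xml etc.
        // Brackets make any leading/trailing whitespace in the value visible.
        System.out.println("io.compression.codecs = ["
                + conf.get("io.compression.codecs") + "]");
        // Same call as in the traces; throws IllegalArgumentException if any
        // configured codec class cannot be found.
        System.out.println(CompressionCodecFactory.getCodecClasses(conf));
    }
}
-------------------------------------------------------------------------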
core-site.xml
-------------------------------------------------------------------------
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://AlexLuya:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/alex/tmp</value>
  </property>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzoCodec
    </value>
  </property>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
-------------------------------------------------------------------------

mapreduce.xml
-------------------------------------------------------------------------
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>AlexLuya:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/alex/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress</name>
    <value>true</value>
  </property>
  <property>
    <name>mapreduce.map.output.compress.codec</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
  </property>
</configuration>
-------------------------------------------------------------------------

hadoop-env.sh
-------------------------------------------------------------------------
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
export JAVA_HOME=/usr/local/hadoop/jdk1.6.0_21

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
export HADOOP_HEAPSIZE=200

# Extra Java runtime options.  Empty by default.
#export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts.
# $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
#export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10
-------------------------------------------------------------------------
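One last thought: HADOOP_CLASSPATH is still commented out in my hadoop-env.sh above, so maybe the job and task JVMs see a different classpath than the NameNode does in the ps output. To compare, I could run a final probe (again my own made-up class, EnvProbe) through the hadoop script:
-------------------------------------------------------------------------
// EnvProbe.java -- my own sketch. Prints the classpath and native library
// path of the JVM it runs in, for comparison with the ps output above.
public class EnvProbe {
    public static void main(String[] args) {
        System.out.println("java.class.path   = "
                + System.getProperty("java.class.path"));
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));
    }
}
-------------------------------------------------------------------------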