Using Nutch 0.9 (Hadoop 0.17.1):

[EMAIL PROTECTED] working]$ bin/nutch readlinkdb /home/hadoop/crawl-20081201/crawldb -dump crawled_urls.txt
LinkDb dump: starting
LinkDb db: /home/hadoop/crawl-urls-20081201/crawldb
java.io.IOException: Type mismatch in value from map: expected org.apache.nutch.crawl.Inlinks, recieved org.apache.nutch.crawl.CrawlDatum
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
	at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
LinkDbReader: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
	at org.apache.nutch.crawl.LinkDbReader.processDumpJob(LinkDbReader.java:110)
	at org.apache.nutch.crawl.LinkDbReader.run(LinkDbReader.java:127)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.crawl.LinkDbReader.main(LinkDbReader.java:114)

This is the first time I've used readlinkdb; the rest of the crawling process works fine. I've searched JIRA and found no related bug.

I've also tried the latest Nutch trunk, but DFS doesn't work for me there:

[EMAIL PROTECTED] trunk]$ bin/hadoop dfs -ls
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.DistributedFileSystem
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:648)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1334)
	at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
	at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
	at org.apache.hadoop.fs.FsShell.run(FsShell.java:1698)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.hadoop.fs.FsShell.main(FsShell.java:1847)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.DistributedFileSystem
	at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
	at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:628)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:646)
	... 10 more

Should I file both bugs in JIRA?
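For comparison with the first trace: the dump job expected Inlinks values (what a LinkDb stores) but received CrawlDatum values (what a CrawlDb stores), and the readlinkdb command above was given a crawldb path. Below is a minimal sketch of how each database is usually dumped, assuming the default layout that bin/nutch crawl writes (crawldb and linkdb side by side under the crawl directory). The helper name dump_crawl_dbs, the NUTCH override and the output directory names are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Sketch only: dumps the LinkDb and the CrawlDb of one crawl directory.
# Assumes the default layout written by "bin/nutch crawl":
#   <crawl_dir>/crawldb, <crawl_dir>/linkdb, <crawl_dir>/segments
# dump_crawl_dbs and the NUTCH override are hypothetical; NUTCH defaults
# to bin/nutch, and NUTCH=echo turns the function into a dry run.
dump_crawl_dbs() {
    crawl_dir="$1"
    nutch="${NUTCH:-bin/nutch}"

    # readlinkdb expects the LinkDb, whose values are Inlinks.
    "$nutch" readlinkdb "$crawl_dir/linkdb" -dump linkdb_dump

    # The CrawlDb holds CrawlDatum values and is dumped with readdb.
    "$nutch" readdb "$crawl_dir/crawldb" -dump crawldb_dump
}

# Dry run: print the two commands instead of executing them.
NUTCH=echo dump_crawl_dbs /home/hadoop/crawl-20081201
```

With NUTCH=echo the function only prints the command lines, which makes it easy to check the paths before launching real MapReduce jobs.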