Using Nutch 0.9 (Hadoop 0.17.1):

[EMAIL PROTECTED] working]$ bin/nutch readlinkdb /home/hadoop/crawl-20081201/crawldb -dump crawled_urls.txt
LinkDb dump: starting
LinkDb db: /home/hadoop/crawl-urls-20081201/crawldb
java.io.IOException: Type mismatch in value from map: expected org.apache.nutch.crawl.Inlinks, recieved org.apache.nutch.crawl.CrawlDatum
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
        at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

LinkDbReader: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
        at org.apache.nutch.crawl.LinkDbReader.processDumpJob(LinkDbReader.java:110)
        at org.apache.nutch.crawl.LinkDbReader.run(LinkDbReader.java:127)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.LinkDbReader.main(LinkDbReader.java:114)

This is the first time I've used readlinkdb; the rest of the crawling
process works fine. I've searched JIRA and found no related bug.
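For reference, the trace shows the dump job running an IdentityMapper while declaring org.apache.nutch.crawl.Inlinks as its map output value class. An identity mapper forwards each input value unchanged, so every input record must already carry an Inlinks value. The path passed above is a crawldb directory, whose values are CrawlDatum, and that would trigger exactly this rejection. Below is a minimal sketch of that check, loosely modelled on the exact-class comparison in MapTask$MapOutputBuffer.collect(). The nested Inlinks and CrawlDatum classes are hypothetical empty stand-ins, not the real Nutch classes, and the code is not Hadoop's own.

```java
import java.io.IOException;

// Minimal sketch of a map-output value type check, loosely modelled on
// Hadoop's MapTask$MapOutputBuffer.collect(). Inlinks and CrawlDatum here are
// hypothetical empty stand-ins, NOT the real org.apache.nutch.crawl classes.
public class TypeMismatchSketch {

    static class Inlinks {}     // stand-in for a LinkDb value
    static class CrawlDatum {}  // stand-in for a CrawlDb value

    // Rejects any value whose runtime class differs from the declared one.
    static void collect(Object value, Class<?> declaredValueClass) throws IOException {
        if (value.getClass() != declaredValueClass) {
            throw new IOException("Type mismatch in value from map: expected "
                    + declaredValueClass.getName() + ", received "
                    + value.getClass().getName());
        }
        // A real collector would serialize the record into the output buffer here.
    }

    // An identity mapper forwards its input value unchanged, so the input
    // directory's value type must already match the declared output type.
    static boolean identityMapAccepts(Object inputValue, Class<?> declaredValueClass) {
        try {
            collect(inputValue, declaredValueClass);
            return true;
        } catch (IOException e) {
            System.out.println(e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Dump job declared with Inlinks as the map output value class.
        Class<?> declared = Inlinks.class;
        System.out.println("linkdb-style input accepted: "
                + identityMapAccepts(new Inlinks(), declared));
        System.out.println("crawldb-style input accepted: "
                + identityMapAccepts(new CrawlDatum(), declared));
    }
}
```

Under this reading, pointing readlinkdb at a linkdb directory rather than a crawldb would avoid the mismatch. That is an inference from the trace, not something confirmed here.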

I've also tried the latest Nutch trunk, but DFS doesn't work for me there:

[EMAIL PROTECTED] trunk]$ bin/hadoop dfs -ls

Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.DistributedFileSystem
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:648)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1334)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
        at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
        at org.apache.hadoop.fs.FsShell.run(FsShell.java:1698)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.fs.FsShell.main(FsShell.java:1847)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.DistributedFileSystem
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:628)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:646)
        ... 10 more
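For reference, this trace shows FileSystem.createFileSystem asking Configuration.getClass for an implementation class, and Class.forName then failing to load org.apache.hadoop.hdfs.DistributedFileSystem. One plausible cause, which is an assumption and has not been verified here, is a mismatch between the configured class name and the Hadoop jar actually on the classpath. For example, a configuration might name the org.apache.hadoop.hdfs package while only an older jar without it is loaded. The sketch below shows scheme-to-class lookup by reflection, assuming the "fs.<scheme>.impl" key convention used by Hadoop releases of this era. The Map is a hypothetical stand-in for Hadoop's Configuration, and "fs.demo.impl" is a made-up key.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of resolving a FileSystem implementation class from an
// "fs.<scheme>.impl" key via reflection. The Map is a hypothetical stand-in
// for Hadoop's Configuration; this is not Hadoop's own code.
public class FsImplLookupSketch {

    static Class<?> resolveImpl(Map<String, String> conf, String scheme) {
        String className = conf.get("fs." + scheme + ".impl");
        if (className == null) {
            throw new IllegalArgumentException("no implementation configured for scheme: " + scheme);
        }
        try {
            // Fails if the named class is not on the classpath.
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // Wrapped in a RuntimeException, mirroring the shape of the trace above.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Hypothetical entry pointing at a class that is always loadable.
        conf.put("fs.demo.impl", "java.util.ArrayList");
        // Entry naming the class from the trace; without a matching Hadoop jar
        // on the classpath, the lookup fails.
        conf.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");

        System.out.println("demo -> " + resolveImpl(conf, "demo").getName());
        try {
            resolveImpl(conf, "hdfs");
        } catch (RuntimeException e) {
            System.out.println("hdfs -> " + e.getCause());
        }
    }
}
```

If this reading is right, the configured class name and the jars on the classpath need to refer to the same Hadoop release.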

Should I file both bugs in JIRA?
