On Wed, Dec 3, 2008 at 8:29 PM, Doğacan Güney <[EMAIL PROTECTED]> wrote:
> On Wed, Dec 3, 2008 at 8:55 PM, brainstorm <[EMAIL PROTECTED]> wrote:
>> Using nutch 0.9 (hadoop 0.17.1):
>>
>> [EMAIL PROTECTED] working]$ bin/nutch readlinkdb
>> /home/hadoop/crawl-20081201/crawldb -dump crawled_urls.txt
>> LinkDb dump: starting
>> LinkDb db: /home/hadoop/crawl-urls-20081201/crawldb
>             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> It seems you are providing a crawldb as the argument. You should pass the linkdb.
Thanks a lot for the hint, but I cannot find a "linkdb" directory anywhere
on the HDFS :_/ Can you point me to where it should be?

>> java.io.IOException: Type mismatch in value from map: expected
>> org.apache.nutch.crawl.Inlinks, recieved
>> org.apache.nutch.crawl.CrawlDatum
>>         at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:427)
>>         at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:37)
>>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
>>         at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
>>
>> LinkDbReader: java.io.IOException: Job failed!
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1062)
>>         at org.apache.nutch.crawl.LinkDbReader.processDumpJob(LinkDbReader.java:110)
>>         at org.apache.nutch.crawl.LinkDbReader.run(LinkDbReader.java:127)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.nutch.crawl.LinkDbReader.main(LinkDbReader.java:114)
>>
>> This is the first time I have used readlinkdb; the rest of the crawling
>> process is working OK. I've searched JIRA and there is no related bug.
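[Editor's note: in a stock Nutch 0.9 setup the linkdb does not exist until you
run the invertlinks step; it is then created alongside the crawldb and segments.
A minimal sketch, where the directory paths are assumptions modeled on the crawl
directory shown earlier in this thread:]

```shell
# Build the linkdb from the fetched segments. This creates the
# /home/hadoop/crawl-20081201/linkdb directory (paths are assumptions).
bin/nutch invertlinks /home/hadoop/crawl-20081201/linkdb \
    -dir /home/hadoop/crawl-20081201/segments

# Then dump it with readlinkdb, passing the linkdb (not the crawldb):
bin/nutch readlinkdb /home/hadoop/crawl-20081201/linkdb -dump linkdb_dump
```

[Passing the crawldb instead of the linkdb is what produces the
"expected org.apache.nutch.crawl.Inlinks, recieved org.apache.nutch.crawl.CrawlDatum"
type mismatch quoted above, since the two databases store different value types.]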
>>
>> I've also tried the latest trunk Nutch, but DFS is not working for me:
>>
>> [EMAIL PROTECTED] trunk]$ bin/hadoop dfs -ls
>>
>> Exception in thread "main" java.lang.RuntimeException:
>> java.lang.ClassNotFoundException:
>> org.apache.hadoop.hdfs.DistributedFileSystem
>>         at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:648)
>>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1334)
>>         at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:56)
>>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1351)
>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:213)
>>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:118)
>>         at org.apache.hadoop.fs.FsShell.init(FsShell.java:88)
>>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:1698)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:1847)
>> Caused by: java.lang.ClassNotFoundException:
>> org.apache.hadoop.hdfs.DistributedFileSystem
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>         at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
>>         at java.lang.Class.forName0(Native Method)
>>         at java.lang.Class.forName(Class.java:247)
>>         at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:628)
>>         at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:646)
>>         ... 10 more
>>
>> Should I file both bugs in JIRA?
>>
>
> I am not sure about this one, but did you try "ant clean; ant"? It may be a
> version mismatch.
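[Editor's note: the version-mismatch theory is plausible here. In Hadoop 0.18+
the DFS classes moved from org.apache.hadoop.dfs to org.apache.hadoop.hdfs, so
an older Hadoop jar (e.g. 0.17.x) left on the classpath of a trunk build would
fail with exactly this ClassNotFoundException. A rough way to check, assuming
the jars live under the Nutch build's lib/ directory (the jar names and paths
below are assumptions, not taken from the thread):]

```shell
# List which Hadoop jars the build bundles; mixed versions are a red flag.
ls lib/hadoop-*.jar

# A 0.18+ hadoop-core jar should contain the class the error complains about:
unzip -l lib/hadoop-*-core.jar | grep 'hdfs/DistributedFileSystem'
```

[If the grep finds nothing, the bundled jar predates the package rename and
needs to match the Hadoop version the trunk code was compiled against.]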
Yes, I did ant clean && ant before trying the above command. I also tried to
upgrade the filesystem, unsuccessfully, and even created it from scratch:

https://issues.apache.org/jira/browse/HADOOP-1212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650556#action_12650556

>
> --
> Doğacan Güney
>