[ https://issues.apache.org/jira/browse/NUTCH-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062630#comment-13062630 ]
Markus Jelsma edited comment on NUTCH-1029 at 7/9/11 9:22 PM:
--------------------------------------------------------------

The assumption was correct. Here's a patch for 1.4 that disables the creation of the _SUCCESS file for the stat job. I haven't tested topN and dump jobs.

By the way: having a _SUCCESS file in the current crawl db will also throw errors for the -url job. Yesterday I copied over a crawldb from production HDFS and had to remove the file as well before reading it locally.

was (Author: markus17):
The assumption was correct. Here's a patch for 1.4 that disables the creation of the _SUCCESS file for the stat job. I haven't tested topN and dump jobs.

> Readdb throws EOFException
> --------------------------
>
>                 Key: NUTCH-1029
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1029
>             Project: Nutch
>          Issue Type: Bug
>          Components: linkdb
>    Affects Versions: 1.4
>         Environment: Hadoop 0.20.203.0
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Critical
>             Fix For: 1.4, 2.0
>
>         Attachments: NUTCH-1029-1.4-1.patch
>
>
> Readdb -stats on a crawldb with 1 record exits with an EOFException on Hadoop-0.20.203.0.
> {code}
> Exception in thread "main" java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:93)
>         at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:320)
>         at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:502)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
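
For anyone reading a copied crawldb locally, here is a rough sketch of the manual workaround described in the comment (getting the _SUCCESS marker out of the way before reading). It is not the attached patch and uses no Nutch or Hadoop API: the class CrawlDbMarkerFilter and the method listDataFiles are hypothetical names, and treating every entry whose name starts with "_" or "." as a non-data file is an assumption about the directory layout.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class CrawlDbMarkerFilter {

    // Hypothetical helper (not part of Nutch or Hadoop): returns the entries
    // of a local job output directory, skipping marker and hidden entries
    // such as _SUCCESS so that only part-* entries reach a reader.
    static List<Path> listDataFiles(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                // Assumption: names starting with "_" or "." carry no records.
                if (name.startsWith("_") || name.startsWith(".")) {
                    continue;
                }
                result.add(p);
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory that mimics a job output folder.
        Path dir = Files.createTempDirectory("crawldb-current");
        Files.createFile(dir.resolve("part-00000"));
        Files.createFile(dir.resolve("_SUCCESS"));

        for (Path p : listDataFiles(dir)) {
            System.out.println(p.getFileName());
        }
    }
}
```

The trace above suggests getReaders opens every entry in the output directory as a SequenceFile, so an empty _SUCCESS file would fail on the header read; filtering by name as sketched here, or deleting the marker as I did by hand, sidesteps that.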