[ https://issues.apache.org/jira/browse/NUTCH-1029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062489#comment-13062489 ]
Markus Jelsma commented on NUTCH-1029:
--------------------------------------

This error appears to be caused by the _SUCCESS file in the crawldb. Since MAPREDUCE-947, Hadoop writes this file after a job completes successfully. The crawldb reader attempts to read it as data, cannot, and throws the exception quoted below. The reader job writes and then reads back a temporary stat_tmp1234567 directory, and it is that read that seems to choke on the _SUCCESS file.

> Readdb throws EOFException
> --------------------------
>
>                 Key: NUTCH-1029
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1029
>             Project: Nutch
>          Issue Type: Bug
>          Components: linkdb
>    Affects Versions: 1.4
>         Environment: Hadoop 0.20.203.0
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.4, 2.0
>
>
> Readdb -stats on a crawldb with 1 record exits with an EOFException on Hadoop-0.20.203.0.
> {code}
> Exception in thread "main" java.io.EOFException
>         at java.io.DataInputStream.readFully(DataInputStream.java:180)
>         at java.io.DataInputStream.readFully(DataInputStream.java:152)
>         at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1450)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>         at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>         at org.apache.hadoop.mapred.SequenceFileOutputFormat.getReaders(SequenceFileOutputFormat.java:93)
>         at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:320)
>         at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:502)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> {code}
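
The failure described in the comment can be illustrated without Hadoop. The sketch below is an assumption about the general remedy, not the actual NUTCH-1029 patch: it lists a job output directory with plain java.nio and skips entries whose names start with "_" or ".", so that a marker such as _SUCCESS is never handed to a data reader. The class name OutputDirLister and both method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OutputDirLister {

    // Hypothetical helper: names starting with "_" or "." are treated as
    // job metadata (for example _SUCCESS), not as data files.
    static boolean isDataFile(Path p) {
        String name = p.getFileName().toString();
        return !name.startsWith("_") && !name.startsWith(".");
    }

    // Lists the entries of a job output directory, skipping marker files,
    // and returns them in sorted order.
    static List<Path> listDataFiles(Path outputDir) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(outputDir, OutputDirLister::isDataFile)) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

In Hadoop code the same idea would usually be expressed as an org.apache.hadoop.fs.PathFilter passed to FileSystem.listStatus, with a reader opened only for the paths that pass the filter. Whether that matches the fix eventually committed for this issue is not stated here.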