Dennis,

I was wondering whether this patch could fix my problem, which is, if not the same, very similar to this one. I am using Nutch 0.8.2-dev; I checked it out from SVN a while ago and have never updated it since. Before, I was able to crawl 10000 XML files with no errors whatsoever. These are the errors I get when fetching:

INFO parser.custom: Custom-parse: Parsing content file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf
07/02/12 22:09:16 INFO fetcher.Fetcher: fetch of file:/C:/TeamBinder/AddressBook/9100/(65)E110_ST A0 (1).pdf failed with: java.lang.NullPointerException
07/02/12 22:09:17 INFO mapred.LocalJobRunner: 0 pages, 0 errors, 0.0 pages/s, 0 kb/s,
07/02/12 22:09:17 FATAL fetcher.Fetcher: java.lang.NullPointerException
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:198)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:189)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.hadoop.mapred.MapTask$2.collect(MapTask.java:91)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:314)
07/02/12 22:09:17 FATAL fetcher.Fetcher: at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:232)
07/02/12 22:09:17 FATAL fetcher.Fetcher: fetcher caught:java.lang.NullPointerException

One of the problems is that my Hadoop version reads: hadoop-0.4.0-patched. I don't know whether that means I am running version 0.4.0, and I find it a little confusing. Once you can clarify that for me, I will be able to apply the patch to my version.

Best Regards,
Armel

-----Original Message-----
From: Dennis Kubes [mailto:[EMAIL PROTECTED]
Sent: 13 February 2007 21:09
To: [email protected]
Subject: Re: NPE in org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue

Actually I take it back. I don't think it is the same problem, but I do think it is the right solution.

Dennis Kubes

Dennis Kubes wrote:
> This has to do with HADOOP-964. Replace the jar files in your Nutch
> versions with the most recent versions from Hadoop. You will also need
> to apply the NUTCH-437 patch to get Nutch to work with the most recent
> changes to the Hadoop codebase.
>
> Dennis Kubes
>
> Gal Nitzan wrote:
>> Hi,
>>
>> Does anybody use Nutch trunk?
>>
>> I am running Nutch 0.9 and am unable to fetch.
>>
>> After 50-60K URLs I get an NPE in
>> org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue every time.
>>
>> I was wondering if anyone has a workaround, or whether something is
>> wrong with my setup.
>>
>> I have opened a new issue in JIRA,
>> http://issues.apache.org/jira/browse/hadoop-1008, for this.
>>
>> Any clue?
>>
>> Gal

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
