I've just upgraded to Nutch 0.9.0 and am getting a NullPointerException
during the fetch step of my crawl. I turned on verbose logging for Nutch
and Hadoop. Nutch completes a smaller crawl of our intranet without a
problem; the exception only shows up during the full crawl.
Here is the relevant section from the logs (sorry for any excess junk;
I wasn't sure how much would be helpful):
2007-05-10 18:47:25,862 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
2007-05-10 18:47:26,907 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
2007-05-10 18:47:28,290 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
fetching http://machine/file.txt
2007-05-10 18:47:28,613 INFO fetcher.Fetcher (Fetcher.java:run(134)) - fetching http://machine/file.txt
2007-05-10 18:47:29,299 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
2007-05-10 18:47:30,566 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
2007-05-10 18:47:32,530 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
2007-05-10 18:47:34,021 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
2007-05-10 18:47:35,287 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
2007-05-10 18:47:36,447 INFO mapred.LocalJobRunner (LocalJobRunner.java:progress(188)) - 117592 pages, 99 errors, 2.4 pages/s, 649 kb/s,
java.lang.NullPointerException
Exception in thread "main" java.io.IOException: Job failed!
2007-05-10 18:47:39,700 FATAL fetcher.Fetcher (?:invoke0(?)) - java.lang.NullPointerException
at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
2007-05-10 18:47:40,203 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
2007-05-10 18:47:40,205 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
2007-05-10 18:47:40,206 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
2007-05-10 18:47:40,206 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
2007-05-10 18:47:40,207 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-05-10 18:47:40,208 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
2007-05-10 18:47:40,208 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
fetcher caught:java.lang.NullPointerException
2007-05-10 18:47:40,255 FATAL fetcher.Fetcher (Fetcher.java:run(264)) - fetcher caught:java.lang.NullPointerException
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:470)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
java.lang.NullPointerException
2007-05-10 18:47:41,533 FATAL fetcher.Fetcher (?:invoke0(?)) - java.lang.NullPointerException
at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
2007-05-10 18:47:41,534 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
2007-05-10 18:47:41,534 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
2007-05-10 18:47:41,535 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
2007-05-10 18:47:41,535 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
2007-05-10 18:47:41,535 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-05-10 18:47:41,536 FATAL fetcher.Fetcher (?:invoke0(?)) - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
java.lang.NullPointerException
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
java.lang.NullPointerException
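My reading of the trace, offered as a guess rather than a diagnosis: the
NPE is thrown inside getPos() while a fetcher thread asks the record
reader for its next entry, which is what I would expect if the
underlying stream had already been closed when that thread called
next(). The sketch below only illustrates that general failure mode.
PositionedBuffer and everything in it are hypothetical names of mine,
not Hadoop code, and I have not confirmed that Hadoop's stream behaves
this way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClosedStreamSketch {

    /** Hypothetical wrapper that reports a read position (not a Hadoop class). */
    static class PositionedBuffer {
        private final int length;
        private InputStream in; // assumption: cleared to null on close()

        PositionedBuffer(byte[] data) {
            this.length = data.length;
            this.in = new ByteArrayInputStream(data);
        }

        int read() throws IOException {
            return in.read();
        }

        /** Position = total bytes minus bytes still unread; no null check on 'in'. */
        long getPos() throws IOException {
            return length - in.available();
        }

        void close() throws IOException {
            in.close();
            in = null;
        }
    }

    /** Reads one byte, closes the buffer, then asks for the position again. */
    static String positionAfterClose() {
        try {
            PositionedBuffer buf = new PositionedBuffer(new byte[] {1, 2, 3});
            buf.read();
            buf.getPos();   // fine while the stream is open
            buf.close();
            buf.getPos();   // dereferences the nulled field
            return "no exception";
        } catch (NullPointerException e) {
            return "NullPointerException";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(positionAfterClose());
    }
}
```

If that guess is right, the thing to look for would be whatever closes
the input while fetcher threads are still running, which might also
explain why only the long full crawl is affected.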
If anyone has any ideas on what I'm doing wrong, please let me know.
Thanks so much.
Jeff
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general