This is what I get in the crawl log after setting the logger to DEBUG. I'm starting the crawl with the following command:

nutch crawl urls -dir crawl_test -depth 8 >& crawl_test.log &


fetch of http://www.lib.ncsu.edu/congbibs/house/100hdgst2.html failed with: java.lang.NullPointerException
java.lang.NullPointerException
    at java.lang.System.arraycopy(Native Method)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:822)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:739)
    at java.io.DataOutputStream.writeByte(DataOutputStream.java:136)
    at org.apache.hadoop.io.WritableUtils.writeVLong(WritableUtils.java:290)
    at org.apache.hadoop.io.WritableUtils.writeVInt(WritableUtils.java:270)
    at org.apache.hadoop.io.Text.write(Text.java:281)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:605)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:786)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:662)
fetcher caught:java.lang.NullPointerException
Attempting to finish item from unknown queue: org.apache.nutch.fetcher.fetcher$fetchi...@1d57c7
-finishing thread FetcherThread, activeThreads=13
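
The top frame is System.arraycopy failing on a null source array, so it looks like some byte buffer handed to the map output collector is null. A minimal standalone snippet (my own illustration, not Nutch code) that reproduces that exact exception:

// Illustration only: System.arraycopy throws java.lang.NullPointerException
// when its source array is null, matching the top frame of the trace above.
public class ArraycopyNpeDemo {
    public static void main(String[] args) {
        byte[] src = null;           // stands in for a record buffer that was never filled
        byte[] dst = new byte[16];
        System.arraycopy(src, 0, dst, 0, 8); // throws java.lang.NullPointerException
    }
}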


Tim


On May 4, 2009, at 4:09 AM, Andrzej Bialecki wrote:

tsmori wrote:
I'm having an interesting problem that I think revolves around the interplay of a few settings, and I'm not really clear on how they affect the crawl.
Currently I have:
content.limit = -1
fetcher.threads = 1000
fetcher.threads.per.host = 100
indexer.max.tokens = 750000
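
(For reference, here is a minimal sketch of applying these settings programmatically. It assumes the shorthand above maps to the standard Nutch property names http.content.limit, fetcher.threads.fetch, fetcher.threads.per.host, and indexer.max.tokens, which would normally be set in conf/nutch-site.xml.)

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.util.NutchConfiguration;

public class CrawlSettingsSketch {
    public static void main(String[] args) {
        // Loads nutch-default.xml / nutch-site.xml, then overrides in code.
        Configuration conf = NutchConfiguration.create();
        conf.setInt("http.content.limit", -1);        // -1 = do not truncate fetched content
        conf.setInt("fetcher.threads.fetch", 1000);   // total fetcher threads
        conf.setInt("fetcher.threads.per.host", 100); // concurrent threads per host
        conf.setInt("indexer.max.tokens", 750000);    // max tokens indexed per field
    }
}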
I also increased the JAVA_HEAP space to account for the additional tokens. I'm not getting any out-of-memory errors, so that part should be okay. The problem is that with the content limit set high, or not set at all (I have tried other values), I get fetch errors with NullPointerExceptions on one set of files (HTML files); these are fairly large HTML files, but none over 1 MB. If I set the content limit to a reasonable amount, say 5 MB, the NullPointerExceptions go away, but I get a lot of truncation errors on a different group of files (PDF files, all over 5 MB).

Could you please copy the full stack trace, including line numbers?


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Timothy S. Mori
Systems Librarian for Enterprise Operations
IT Department
North Carolina State University Libraries
Campus Box 7111
Raleigh, NC 27695-7111
919.515.6182 (phone)



