I've seen this issue when the fetcher attempts to retrieve huge files
(>100MB).  Fetching files this large takes so long that the individual
fetcher thread appears hung to the thread monitoring the fetcher status.
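
To illustrate why a slow download and a truly hung thread look the same to
the monitor, here is a minimal sketch of a time-based check.  It is not the
actual Fetcher code; the class name, method name, and the 5-minute timeout
are all assumptions made up for the example.

```java
final class HungThreadSketch {

    /**
     * Hypothetical check: a thread "looks hung" when its last request
     * started more than timeoutMillis ago, whether or not bytes are
     * still arriving.
     */
    static boolean looksHung(long lastRequestStartMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - lastRequestStartMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long timeoutMillis = 5 * 60 * 1000L;   // assumed timeout, not a Nutch default
        long requestStart = 0L;                // request began at t = 0
        long now = 20 * 60 * 1000L;            // 20 minutes into a large download

        // The download may still be making progress, but the check only sees elapsed time.
        System.out.println(looksHung(requestStart, now, timeoutMillis));
    }
}
```

A check like this cannot tell "still downloading" from "stuck", which is why
capping the size of what gets fetched matters.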

In my particular case, the only reason the fetcher was retrieving such a
huge file is that the content length limit is broken for the http-client
protocol plugin (it works in the regular http protocol plugin).  See
NUTCH-481.
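
For reference, this is roughly what a working content length limit is
expected to do: stop reading once the configured number of bytes has been
kept.  The sketch below is not taken from either plugin; readLimited and the
65536-byte limit are hypothetical names and values chosen for the example
(I believe the relevant Nutch setting is http.content.limit, but check your
own nutch-default.xml).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ContentLimitSketch {

    /**
     * Hypothetical helper: reads at most maxBytes from the stream and
     * returns what was read.  A negative maxBytes means "no limit".
     */
    static byte[] readLimited(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (maxBytes < 0 || out.size() < maxBytes) {
            int want = buf.length;
            if (maxBytes >= 0) {
                // Never ask for more than the remaining allowance.
                want = Math.min(want, maxBytes - out.size());
            }
            int n = in.read(buf, 0, want);
            if (n == -1) {
                break; // end of stream before the limit was reached
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] body = new byte[100000];  // stand-in for a large response body
        byte[] kept = readLimited(new ByteArrayInputStream(body), 65536);
        System.out.println(kept.length + " of " + body.length + " bytes kept");
    }
}
```

With a limit like this honored, a >100MB file would be cut off after the
first chunk instead of tying up a fetcher thread for the whole download.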


On 6/19/07, Sunnyvale Fl <[EMAIL PROTECTED]> wrote:

Hi all,

I upgraded from Nutch 0.8.1 to 0.9 and thought the hung thread issues were
fixed in this version (NUTCH-344), but I am still seeing the same problem.
I am only running 10 threads for a focused crawl, and have the default
fetcher.max.crawl.delay of 30.  Every time it hangs, it is followed by an
NPE.  Any clue?  Thanks!

2007-06-19 14:40:37,670 WARN  fetcher.Fetcher - Aborting with 3 hung threads.
2007-06-19 15:01:46,518 FATAL fetcher.Fetcher - java.lang.NullPointerException
2007-06-19 15:01:46,545 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:115)
2007-06-19 15:01:46,546 FATAL fetcher.Fetcher - fetcher caught: java.lang.NullPointerException

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
