Are hung threads normal?

I ran a crawl:
nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/86sites -threads 200 -depth 10 -topN 103103

It ran for a few hours, after which it seemed to hang:

fetching http://www.mediarights.org/film/the_rules_of_the_game.php
fetch of http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5 failed with: Http code=500, url=http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5
Aborting with 46 hung threads.
java.lang.NullPointerException
	at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
	at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
	at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
	at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
	at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
fetcher caught:java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
	at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
	at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
	at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
	at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
	at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
fetcher caught:java.lang.NullPointerException
java.lang.NullPointerException
	at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)

(lather, rinse, repeat)
...

and one final one:
java.lang.NullPointerException

After that it made no further progress (though I didn't wait long).

However, hadoop.log seemed to keep going:

2007-07-30 15:21:05,106 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - java.lang.NullPointerException
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2007-07-30 16:16:02,932 INFO  fetcher.Fetcher - Fetcher: done
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: starting
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: db: /usr/tmp/86sites/crawldb
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: segments: [/usr/tmp/86sites/segments/20070730124436]
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: true
2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: true
2007-07-30 16:16:02,993 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
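
For anyone reading the "Aborting with 46 hung threads." line: my understanding (an assumption, not checked against the Nutch source) is that the fetcher's main thread polls its worker threads and gives up once no new request has started within some timeout, leaving the stuck threads running. A minimal Java sketch of that kind of watchdog follows; `HungThreadWatchdog`, `waitForWorkers`, `startWorker` and the timeout values are all hypothetical and are not Nutch's API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical watchdog sketch; NOT Nutch's Fetcher code.
public class HungThreadWatchdog {
    // Number of worker threads that have not finished yet.
    static final AtomicInteger activeThreads = new AtomicInteger();
    // Wall-clock time (ms) at which any worker last started a request.
    static final AtomicLong lastRequestStart = new AtomicLong(System.currentTimeMillis());

    /**
     * Waits for workers to finish. If no worker has started a request for
     * longer than timeoutMs, gives up and returns how many are still active.
     */
    static int waitForWorkers(long timeoutMs, long pollMs) throws InterruptedException {
        while (activeThreads.get() > 0) {
            Thread.sleep(pollMs);
            long idle = System.currentTimeMillis() - lastRequestStart.get();
            if (idle > timeoutMs) {
                int hung = activeThreads.get();
                System.out.println("Aborting with " + hung + " hung threads.");
                return hung;
            }
        }
        return 0;
    }

    /** Starts a worker whose simulated "fetch" takes workMs milliseconds. */
    static void startWorker(long workMs) {
        activeThreads.incrementAndGet();
        Thread t = new Thread(() -> {
            try {
                lastRequestStart.set(System.currentTimeMillis());
                Thread.sleep(workMs); // stands in for a network request
            } catch (InterruptedException ignored) {
                // interrupted: just fall through and count the worker as done
            } finally {
                activeThreads.decrementAndGet();
            }
        });
        t.setDaemon(true); // an abandoned worker will not keep the JVM alive
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        startWorker(50);     // finishes quickly
        startWorker(10_000); // "hangs" well past the timeout
        int hung = waitForWorkers(500, 50);
        System.out.println("gave up with " + hung + " worker(s) still running");
    }
}
```

Running `main` starts one quick worker and one that sleeps for 10 seconds; with a 500 ms timeout the watchdog gives up on the slow one. The abandoned worker keeps running in the background, which is one way other threads could keep logging errors after an abort message.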

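The NullPointerExceptions all come from `FetcherThread.run` calling `MapTask$1.next`, which asks the record reader for its progress and ends in `FSDataInputStream$Buffer.getPos`. One guess (unconfirmed) is that the input had already been closed when the remaining threads asked for their next record. The sketch below reproduces that pattern in isolation with a hypothetical `SharedReader` class standing in for the shared reader; it is not Hadoop's `SequenceFileRecordReader`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ClosedReaderDemo {

    /** Hypothetical stand-in for a record reader shared by fetcher threads. */
    static class SharedReader {
        private Iterator<String> in; // set to null by close(), like a closed stream

        SharedReader(List<String> records) { this.in = records.iterator(); }

        /** Unguarded: throws NullPointerException if called after close(). */
        synchronized String unsafeNext() {
            return in.hasNext() ? in.next() : null;
        }

        /** Guarded: returns null at end of input or after close(). */
        synchronized String next() {
            if (in == null || !in.hasNext()) return null;
            return in.next();
        }

        synchronized void close() { in = null; }
    }

    /** Starts workers that drain the reader, closes it early, returns records read. */
    static int runWorkers(int workers, int records, long closeAfterMs) throws InterruptedException {
        List<String> data = new ArrayList<>();
        for (int i = 0; i < records; i++) data.add("url-" + i);
        SharedReader reader = new SharedReader(data);
        AtomicInteger read = new AtomicInteger();

        List<Thread> threads = new ArrayList<>();
        for (int w = 0; w < workers; w++) {
            Thread t = new Thread(() -> {
                // Each worker stops cleanly once next() returns null.
                while (reader.next() != null) {
                    read.incrementAndGet();
                    try { Thread.sleep(10); } catch (InterruptedException e) { return; }
                }
            });
            threads.add(t);
            t.start();
        }
        Thread.sleep(closeAfterMs);
        reader.close(); // the owner gives up while workers are still reading
        for (Thread t : threads) t.join();
        return read.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SharedReader r = new SharedReader(List.of("a"));
        r.close();
        try {
            r.unsafeNext();
        } catch (NullPointerException e) {
            System.out.println("unguarded next() after close threw " + e.getClass().getSimpleName());
        }
        System.out.println("guarded next() after close returned " + r.next());
        System.out.println("records read before close: " + runWorkers(4, 1000, 50));
    }
}
```

`unsafeNext()` fails once `close()` has nulled the underlying iterator, while `next()` treats a closed reader like end of input, so the worker threads in `runWorkers` simply exit.
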
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
