Are hung threads natural?
I ran a crawl:
nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/86sites -threads 200 -depth 10 -topN 103103
It ran for a few hours, after which I noticed that it seemed to be hung:
fetching http://www.mediarights.org/film/the_rules_of_the_game.php
fetch of http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5 failed with: Http code=500, url=http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5
Aborting with 46 hung threads.
java.lang.NullPointerException
at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
fetcher caught:java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
fetcher caught:java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
(lather, rinse, repeat ...)
and one final:
java.lang.NullPointerException
Then it didn't progress any further (though I didn't wait long), although hadoop.log seemed to keep going:
2007-07-30 15:21:05,106 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - java.lang.NullPointerException
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
2007-07-30 16:16:02,932 INFO fetcher.Fetcher - Fetcher: done
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: starting
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: db:
/usr/tmp/86sites/crawldb
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: segments:
[/usr/tmp/86sites/segments/20070730124436]
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: additions
allowed: true
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: URL normalizing:
true
2007-07-30 16:16:02,947 INFO crawl.CrawlDb - CrawlDb update: URL filtering:
true
2007-07-30 16:16:02,993 INFO crawl.CrawlDb - CrawlDb update: Merging segment
data into db.
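
For anyone checking a similar log, here is a minimal Python sketch for finding the longest silent stretch between timestamped entries, such as the gap between the last NullPointerException at 15:21:16 and "Fetcher: done" at 16:16:02 above. It assumes the line layout seen in this hadoop.log excerpt (timestamp, comma, milliseconds, then the level); the names parse_ts and longest_gap are made up for illustration and are not part of Nutch or Hadoop.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp as it appears in the excerpt, e.g. "2007-07-30 15:21:16,218 ".
# Assumption: every real log entry starts this way; wrapped continuation
# lines do not match and are skipped.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) ")

def parse_ts(line):
    """Return the datetime at the start of a log line, or None if absent."""
    m = TS.match(line)
    if m is None:
        return None
    base = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    return base.replace(microsecond=int(m.group(2)) * 1000)

def longest_gap(lines):
    """Return (gap, start, end) for the largest interval between
    consecutive timestamped lines, or None with fewer than two of them."""
    best_gap = timedelta(0)
    best = None
    prev = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue
        if prev is not None and ts - prev >= best_gap:
            best_gap = ts - prev
            best = (best_gap, prev, ts)
        prev = ts
    return best

# Lines copied from the excerpt above, including one wrapped continuation.
sample = [
    "2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - fetcher",
    "caught:java.lang.NullPointerException",
    "2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - java.lang.NullPointerException",
    "2007-07-30 16:16:02,932 INFO fetcher.Fetcher - Fetcher: done",
]

gap, start, end = longest_gap(sample)
print("longest gap:", gap, "from", start, "to", end)
```

To run it against a real file, pass an open file object instead of the sample list, for example longest_gap(open("hadoop.log")), with the path adjusted to wherever the log lives.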