Your thread count is higher than your internet bandwidth can support, which is why you end up with content == null or contentType == null.
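
If that is the case, lowering the fetcher thread count usually helps. A minimal sketch, assuming the bottleneck really is bandwidth: override the defaults in conf/nutch-site.xml (the property names come from nutch-default.xml; the values below are only illustrative):

<!-- conf/nutch-site.xml: overrides applied on top of nutch-default.xml -->
<property>
  <name>fetcher.threads.fetch</name>
  <!-- total fetcher threads; 200 may be more than one connection can feed -->
  <value>40</value>
</property>
<property>
  <name>fetcher.threads.per.host</name>
  <!-- concurrent requests allowed against a single host -->
  <value>2</value>
</property>

If I remember correctly, the -threads argument of the crawl command overrides fetcher.threads.fetch, so simply re-running with a value smaller than 200 should have the same effect.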

2007/7/31, Kai_testing Middleton <[EMAIL PROTECTED]>:
>
> Are hung threads natural?
>
> I ran a crawl:
> nohup time nutch crawl /usr/tmp/urls.txt -dir /usr/tmp/86sites -threads 200 -depth 10 -topN 103103
>
> it ran for a few hours, after which I noticed that it seemed hung:
>
> fetching http://www.mediarights.org/film/the_rules_of_the_game.php
> fetch of http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5 failed
> with: Http code=500, url=http://www.hollywood.com/MyHollywood/AddRating/2/3612623/1.5
> Aborting with 46 hung threads.
> java.lang.NullPointerException
> at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
> at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
> at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
> at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
> at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
> at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
> fetcher caught:java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
> at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
> at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
> at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
> at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
> at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
> at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
> fetcher caught:java.lang.NullPointerException
> java.lang.NullPointerException
> at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
>
> lather, rinse, repeat
> .
> .
> .
> one final:
> java.lang.NullPointerException
>
>
> then it didn't progress (though I didn't wait long).
>
> though hadoop.log seemed to keep going:
>
> 2007-07-30 15:21:05,106 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
> 2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
> 2007-07-30 15:21:05,107 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - java.lang.NullPointerException
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream$Buffer.getPos(FSDataInputStream.java:87)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.fs.FSDataInputStream.getPos(FSDataInputStream.java:125)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.io.SequenceFile$Reader.getPosition(SequenceFile.java:1736)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.SequenceFileRecordReader.getProgress(SequenceFileRecordReader.java:108)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.getProgress(MapTask.java:165)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:155)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:116)
> 2007-07-30 15:21:16,218 FATAL fetcher.Fetcher - fetcher caught:java.lang.NullPointerException
> 2007-07-30 16:16:02,932 INFO  fetcher.Fetcher - Fetcher: done
> 2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: starting
> 2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: db: /usr/tmp/86sites/crawldb
> 2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: segments: [/usr/tmp/86sites/segments/20070730124436]
> 2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: additions allowed: true
> 2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: URL normalizing: true
> 2007-07-30 16:16:02,947 INFO  crawl.CrawlDb - CrawlDb update: URL filtering: true
> 2007-07-30 16:16:02,993 INFO  crawl.CrawlDb - CrawlDb update: Merging segment data into db.
>
>
>
>
>
>
>




-- 
********************************************************
Le Quoc Anh
Tel: 0912643289
http://quocanh263.googlepages.com/wedding
4/268 Le Trong Tan, Hanoi, Vietnam
********************************************************