Never mind the spam. The problem was that the property value for the
agent name was empty.

<property>
  <name>http.agent.name</name>
  <value>TestNutch</value>
  <description>Spareagent
  </description>
</property>
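
For anyone who hits the same NullPointerException, here is a rough sketch of how one could check a config for an empty agent name before crawling. It assumes the usual Hadoop-style layout (property elements under a configuration root, typically in conf/nutch-site.xml); the agent_name helper and the EMPTY/FIXED samples are made up for illustration and are not part of Nutch.

```python
import xml.etree.ElementTree as ET


def agent_name(config_xml):
    """Return the stripped http.agent.name value, or None if the property is absent."""
    root = ET.fromstring(config_xml)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "http.agent.name":
            # findtext gives "" for an empty <value></value> and None if <value> is missing
            return (prop.findtext("value") or "").strip()
    return None


# Hypothetical sample configs: one with the empty value, one with it set.
EMPTY = """<configuration>
<property>
  <name>http.agent.name</name>
  <value></value>
</property>
</configuration>"""

FIXED = EMPTY.replace("<value></value>", "<value>TestNutch</value>")

for label, xml_text in (("empty", EMPTY), ("fixed", FIXED)):
    name = agent_name(xml_text)
    print(label, "->", "OK" if name else "http.agent.name is missing or empty")
```

To run it against a real file, read the file's contents and pass the text to agent_name; a None or empty result means the property still needs a value.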


kkfromus wrote:
> 
> I am a new user trying to configure Nutch and I am running into some issues.
> I would appreciate it if someone could help.
> 
> I am running Nutch under Cygwin and trying to crawl the web site given in
> the tutorial.
> 
> I have a urls directory and, under it, a url.text file with the entry
> http://msn.com
> I modified crawl-urlfilter.txt to use the right domain:
> +^http://([a-z0-9]*\.)*msn.com/
> 
> When I run Nutch using ./nutch crawl urls -dir c:/nutch/crawl -depth 5
> -topN 50, it runs, but during the fetch I see an error:
> 
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> 
> I looked at the readdb stats, but they say there is only one page.
> 
> I searched for msn through the Tomcat search page and got no results.
> Can someone please help?
> 
> Thanks
> Kiran
> 
> 
> Here is the full set of logs
> $ ./nutch crawl urls -dir c:/nutch/crawl -depth 5 -topN 50
> crawl started in: c:/nutch/crawl
> rootUrlDir = urls
> threads = 10
> depth = 5
> topN = 50
> Injector: starting
> Injector: crawlDb: c:/nutch/crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212902
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212902
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212902
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212912
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212912
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212912
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212920
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212920
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212920
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212928
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212928
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212928
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212936
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212936
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212936
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> LinkDb: starting
> LinkDb: linkdb: c:/nutch/crawl/linkdb
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212902
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212912
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212920
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212928
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212936
> LinkDb: done
> Indexer: starting
> Indexer: linkdb: c:/nutch/crawl/linkdb
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212902
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212912
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212920
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212928
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212936
> Optimizing index.
> Indexer: done
> Dedup: starting
> Dedup: adding indexes in: c:/nutch/crawl/indexes
> Dedup: done
> Adding c:/nutch/crawl/indexes/part-00000
> crawl finished: c:/nutch/crawl

-- 
View this message in context: 
http://www.nabble.com/Nutch-0.8.1-issue-with-fetch-tf3425056.html#a9546898
Sent from the Nutch - User mailing list archive at Nabble.com.

