I am a new user trying to configure Nutch and I am running into some issues. I would appreciate it if someone could help.
I am running Nutch under Cygwin and trying to crawl the web site given in the tutorial. I have a urls directory containing a file url.text with the entry http://msn.com, and I modified crawl-urlfilter.txt to use the right domain:

+^http://([a-z0-9]*\.)*msn.com/

When I run Nutch using

./nutch crawl urls -dir c:/nutch/crawl -depth 5 -topN 50

it runs, but during the fetch I see an error:

fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done

I looked at the readdb stats, but they show only one page. I also searched for "msn" through the Tomcat search page and got no results. Can someone please help?

Thanks,
Kiran

Here is the full set of logs:

$ ./nutch crawl urls -dir c:/nutch/crawl -depth 5 -topN 50
crawl started in: c:/nutch/crawl
rootUrlDir = urls
threads = 10
depth = 5
topN = 50
Injector: starting
Injector: crawlDb: c:/nutch/crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212902
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212902
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212902
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212912
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212912
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212912
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212920
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212920
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212920
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212928
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212928
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212928
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212936
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212936
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212936
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: c:/nutch/crawl/linkdb
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212902
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212912
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212920
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212928
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212936
LinkDb: done
Indexer: starting
Indexer: linkdb: c:/nutch/crawl/linkdb
Indexer: adding segment: c:/nutch/crawl/segments/20070318212902
Indexer: adding segment: c:/nutch/crawl/segments/20070318212912
Indexer: adding segment: c:/nutch/crawl/segments/20070318212920
Indexer: adding segment: c:/nutch/crawl/segments/20070318212928
Indexer: adding segment: c:/nutch/crawl/segments/20070318212936
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: c:/nutch/crawl/indexes
Dedup: done
Adding c:/nutch/crawl/indexes/part-00000
crawl finished: c:/nutch/crawl

--
View this message in context: http://www.nabble.com/Nutch-0.8.1-issue-with-fetch-tf3425056.html#a9546446
Sent from the Nutch - User mailing list archive at Nabble.com.
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
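One thing worth ruling out is the URL filter rule from crawl-urlfilter.txt. The Java sketch below tests that accept rule against a few URLs outside of Nutch. It assumes the regex URL filter evaluates each rule with java.util.regex using find() semantics; that is an assumption about Nutch 0.8.1 internals, not something shown in the logs above. The class name UrlFilterCheck is hypothetical and not part of Nutch. Note that the rule requires a "/" right after msn.com, so the seed exactly as typed (http://msn.com) would only pass if the URL normalizer adds the trailing slash first. Since http://www.msn.com/ reaches the fetcher in every round of the log, the filter is probably not what triggers the NullPointerException, but this makes it easy to confirm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlFilterCheck {
    // Accept rule copied from crawl-urlfilter.txt, minus the leading '+'.
    // The unescaped '.' in "msn.com" matches any character; "msn\\.com" would be stricter.
    static final Pattern RULE = Pattern.compile("^http://([a-z0-9]*\\.)*msn.com/");

    // Assumption: the filter accepts a URL if the pattern is found anywhere in it.
    static boolean accepts(String url) {
        Matcher m = RULE.matcher(url);
        return m.find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://msn.com",          // seed as typed, no trailing slash
            "http://msn.com/",
            "http://www.msn.com/",     // URL seen in the fetch log
            "http://www.example.com/"  // a host the rule should reject
        };
        for (String url : urls) {
            System.out.println(url + " -> " + (accepts(url) ? "accepted" : "rejected"));
        }
    }
}
```

Compiling and running it with javac UrlFilterCheck.java && java UrlFilterCheck prints one accepted/rejected line per URL, so the rule can be adjusted and rechecked without rerunning the whole crawl.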
