First, your problem is not very well defined, but my feeling is that the NullPointerException occurs because the crawler is not able to crawl the pages. So first check whether the URL you mentioned is correct, and then, in crawl-urlfilter.txt, try giving +^http://www.msn.com as the whole path.
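One way to sanity-check a filter line before re-running the crawl is to try it against the URLs by hand. The sketch below is only an approximation, not Nutch's own filter code: it assumes that rules are tried in order, that the first pattern found anywhere in the URL decides, and that a URL matching no rule is rejected. The names load_rules and accepts are made up for this example.

```python
import re


def load_rules(text):
    """Parse crawl-urlfilter.txt style lines into (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Assumed semantics: the first rule whose pattern is found in the URL decides;
    a URL that matches no rule is rejected."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False


# The rule from the original post and the one suggested in the reply.
ORIGINAL = r"+^http://([a-z0-9]*\.)*msn.com/"
SUGGESTED = r"+^http://www.msn.com"

if __name__ == "__main__":
    urls = ("http://msn.com", "http://msn.com/", "http://www.msn.com/")
    for name, rule in (("original", ORIGINAL), ("suggested", SUGGESTED)):
        rules = load_rules(rule)
        for url in urls:
            verdict = "accept" if accepts(rules, url) else "reject"
            print(f"{name:9} {url:22} {verdict}")
```

Under these assumptions both rules accept http://www.msn.com/, so if the fetch still fails with the suggested rule, the filter is probably not the cause and the agent settings are the next thing to look at.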
Also, let me know whether you have done the settings for agent and robots in nutch-default.xml.

Thanks

kkfromus wrote:
>
> Iam a new user trying to configure nutch . Iam running into some issues .
> I will appreciate if some one can help
>
> Iam running nutch under cygwin . Iam trying to crawl the web site given in
> the tutorial
>
> i have a urls director and under that a url.text which has the entry
> http://msn.com
> I modified crawl-urlfilter.txt to use the right domain
> +^http://([a-z0-9]*\.)*msn.com/
>
> When I run nutch using ./nutch crawl urls -dir c:/nutch/crawl -depth 5
> -topN 50 , it runs . But during fetch i see an error
>
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
>
> I looked at the readb stats but it says there is only one page
>
> I looked through the tomcat search page and searched for msn . No results
> . Can some one please help
>
> Thanks
> Kiran
>
>
> Here is the full set of logs
> $ ./nutch crawl urls -dir c:/nutch/crawl -depth 5 -topN 50
> crawl started in: c:/nutch/crawl
> rootUrlDir = urls
> threads = 10
> depth = 5
> topN = 50
> Injector: starting
> Injector: crawlDb: c:/nutch/crawl/crawldb
> Injector: urlDir: urls
> Injector: Converting injected urls to crawl db entries.
> Injector: Merging injected urls into crawl db.
> Injector: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212902
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212902
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212902
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212912
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212912
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212912
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212920
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212920
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212920
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212928
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212928
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212928
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> Generator: starting
> Generator: segment: c:/nutch/crawl/segments/20070318212936
> Generator: Selecting best-scoring urls due for fetch.
> Generator: Partitioning selected urls by host, for politeness.
> Generator: done.
> Fetcher: starting
> Fetcher: segment: c:/nutch/crawl/segments/20070318212936
> Fetcher: threads: 10
> fetching http://www.msn.com/
> fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
> Fetcher: done
> CrawlDb update: starting
> CrawlDb update: db: c:/nutch/crawl/crawldb
> CrawlDb update: segment: c:/nutch/crawl/segments/20070318212936
> CrawlDb update: Merging segment data into db.
> CrawlDb update: done
> LinkDb: starting
> LinkDb: linkdb: c:/nutch/crawl/linkdb
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212902
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212912
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212920
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212928
> LinkDb: adding segment: c:/nutch/crawl/segments/20070318212936
> LinkDb: done
> Indexer: starting
> Indexer: linkdb: c:/nutch/crawl/linkdb
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212902
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212912
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212920
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212928
> Indexer: adding segment: c:/nutch/crawl/segments/20070318212936
> Optimizing index.
> Indexer: done
> Dedup: starting
> Dedup: adding indexes in: c:/nutch/crawl/indexes
> Dedup: done
> Adding c:/nutch/crawl/indexes/part-00000
> crawl finished: c:/nutch/crawl
>

--
View this message in context: http://www.nabble.com/Nutch-0.8.1-issue-with-fetch-tf3425056.html#a9571555
Sent from the Nutch - User mailing list archive at Nabble.com.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
