I am a new user trying to configure Nutch, and I am running into some issues.
I would appreciate it if someone could help.

I am running Nutch under Cygwin, and I am trying to crawl the web site given
in the tutorial.

I have a urls directory and, under it, a url.text file containing the entry
http://msn.com
I modified crawl-urlfilter.txt to use the right domain:
+^http://([a-z0-9]*\.)*msn.com/
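
As a quick sanity check on the filter line above, here is a minimal sketch that runs the same pattern against a few URLs using plain java.util.regex. The class name UrlFilterCheck and the method accepts are hypothetical helpers, not part of Nutch, and the sketch assumes the filter only needs the pattern to be found in the URL (find() rather than a full-string match). The leading "+" is the include marker in crawl-urlfilter.txt and is not part of the regex.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: tests URLs against the
// include pattern copied from crawl-urlfilter.txt (without the "+").
class UrlFilterCheck {
    private static final Pattern INCLUDE =
        Pattern.compile("^http://([a-z0-9]*\\.)*msn.com/");

    // Assumption: a URL is accepted if the pattern is found in it.
    static boolean accepts(String url) {
        return INCLUDE.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://msn.com",          // seed exactly as written, no trailing slash
            "http://msn.com/",
            "http://www.msn.com/",
            "http://www.example.com/"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + accepts(url));
        }
    }
}
```

With this pattern, http://msn.com/ and http://www.msn.com/ are accepted, while the seed without a trailing slash is not, because the regex requires a "/" after the host. This only checks the regex itself. It does not show whether Nutch normalizes the seed URL before filtering, and it says nothing about the cause of the NullPointerException.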

When I run Nutch using ./nutch crawl urls -dir c:/nutch/crawl -depth 5 -topN
50, it runs, but during the fetch I see an error:

fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done

I looked at the readdb stats, but they say there is only one page.

I searched for msn through the Tomcat search page and got no results.
Can someone please help?

Thanks
Kiran


Here is the full set of logs:
$ ./nutch crawl urls -dir c:/nutch/crawl -depth 5 -topN 50
crawl started in: c:/nutch/crawl
rootUrlDir = urls
threads = 10
depth = 5
topN = 50
Injector: starting
Injector: crawlDb: c:/nutch/crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212902
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212902
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212902
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212912
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212912
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212912
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212920
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212920
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212920
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212928
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212928
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212928
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting
Generator: segment: c:/nutch/crawl/segments/20070318212936
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: c:/nutch/crawl/segments/20070318212936
Fetcher: threads: 10
fetching http://www.msn.com/
fetch of http://www.msn.com/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: c:/nutch/crawl/crawldb
CrawlDb update: segment: c:/nutch/crawl/segments/20070318212936
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: c:/nutch/crawl/linkdb
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212902
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212912
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212920
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212928
LinkDb: adding segment: c:/nutch/crawl/segments/20070318212936
LinkDb: done
Indexer: starting
Indexer: linkdb: c:/nutch/crawl/linkdb
Indexer: adding segment: c:/nutch/crawl/segments/20070318212902
Indexer: adding segment: c:/nutch/crawl/segments/20070318212912
Indexer: adding segment: c:/nutch/crawl/segments/20070318212920
Indexer: adding segment: c:/nutch/crawl/segments/20070318212928
Indexer: adding segment: c:/nutch/crawl/segments/20070318212936
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: c:/nutch/crawl/indexes
Dedup: done
Adding c:/nutch/crawl/indexes/part-00000
crawl finished: c:/nutch/crawl

-- 
View this message in context: 
http://www.nabble.com/Nutch-0.8.1-issue-with-fetch-tf3425056.html#a9546446
Sent from the Nutch - User mailing list archive at Nabble.com.

