Hi everybody,
I have been working with Nutch for several days but cannot get it to work. Below
is the output from a run of nutch crawl. I have no idea why the fetcher fails
with a NullPointerException. I have searched but found no answer for
this failure. Can anyone help me?
Thanks for reading.
I’m using Solaris 10 on SPARC, running on a Sun Fire V440. Here’s my configuration:
The URL directory (/export/home/nutch/urls) has 2 files:
Netmode: contains: http://netmode.vietnamnet.vn
Localhost: contains: http://localhost:8080
The crawl-urlfilter.txt contains this:
# accept hosts in MY.DOMAIN.NAME
+^http://netmode.vietnamnet.vn
+^http://localhost:8080/nutch
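A rough way to check URLs against these rules, as a sketch only. The check_url helper below is hypothetical (it is not part of Nutch) and uses grep -E as a stand-in for the Java regular expressions the filter uses. It assumes the usual first-match-wins reading of the file: a + rule accepts, a - rule rejects, and a URL that matches no rule is dropped. The scratch file path is also an assumption, and the script needs a POSIX shell and grep (on Solaris 10 that would be /usr/xpg4/bin/sh and /usr/xpg4/bin/grep rather than the defaults).

```shell
#!/bin/sh
# check_url URL RULES_FILE
# Hypothetical helper, not part of Nutch. Prints "accept" or "reject".
check_url() {
    url=$1
    rules=$2
    while IFS= read -r rule; do
        case $rule in
            +*) verdict=accept ;;
            -*) verdict=reject ;;
            *)  continue ;;        # comments, blank lines, anything else
        esac
        pattern=${rule#?}          # strip the leading + or -
        # The first rule whose pattern matches decides the outcome.
        if printf '%s\n' "$url" | grep -Eq -e "$pattern"; then
            echo "$verdict"
            return 0
        fi
    done < "$rules"
    echo reject                    # no rule matched
}

# Rules copied from the filter shown above, written to a scratch file
# (the scratch path is an assumption made for this example).
sample=${TMPDIR:-/tmp}/crawl-urlfilter-sample.$$
cat > "$sample" <<'EOF'
# accept hosts in MY.DOMAIN.NAME
+^http://netmode.vietnamnet.vn
+^http://localhost:8080/nutch
EOF

# Check the two URLs that appear in the fetcher output.
for u in http://netmode.vietnamnet.vn/ http://localhost:8080/nutch; do
    printf '%s %s\n' "$(check_url "$u" "$sample")" "$u"
done
rm -f "$sample"
```

With these rules both URLs print accept, so if this approximation holds, the filter is probably not what makes the fetch fail.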
When I run the crawl with this shell script:
crawls=/export/home/nutch/crawls
urldir=/export/home/nutch/urls
# remove the results of any previous crawl
rm -r "$crawls"
nutch crawl "$urldir" -dir "$crawls" -depth 1
It shows:
crawl started in: /export/home/nutch/crawls
rootUrlDir = /export/home/nutch/urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: /export/home/nutch/crawls/crawldb
Injector: urlDir: /export/home/nutch/urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: /export/home/nutch/crawls/segments/20070110144113
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: /export/home/nutch/crawls/segments/20070110144113
Fetcher: threads: 10
fetching http://localhost:8080/nutch
fetching http://netmode.vietnamnet.vn/
fetch of http://localhost:8080/nutch failed with: java.lang.NullPointerException
fetch of http://netmode.vietnamnet.vn/ failed with: java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: /export/home/nutch/crawls/crawldb
CrawlDb update: segment: /export/home/nutch/crawls/segments/20070110144113
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: /export/home/nutch/crawls/linkdb
LinkDb: adding segment: /export/home/nutch/crawls/segments/20070110144113
LinkDb: done
Indexer: starting
Indexer: linkdb: /export/home/nutch/crawls/linkdb
Indexer: adding segment: /export/home/nutch/crawls/segments/20070110144113
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: /export/home/nutch/crawls/indexes
Dedup: done
Adding /export/home/nutch/crawls/indexes/part-00000
crawl finished: /export/home/nutch/crawls
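
The console output above gives only the exception name, not a stack trace. Below is a sketch for pulling the lines around the exception out of a log file. It assumes this Nutch version writes a log such as logs/hadoop.log under the install directory; that path and the NUTCH_HOME variable are assumptions, not something shown in the output above. show_trace is a hypothetical helper. On Solaris the awk call would need nawk or /usr/xpg4/bin/awk, since the old /usr/bin/awk lacks -v.

```shell
#!/bin/sh
# show_trace LOGFILE TEXT [N]
# Hypothetical helper: prints every line containing TEXT together with the
# N lines that follow it (default 15), which is where a Java stack trace
# would normally appear.
show_trace() {
    awk -v pat="$2" -v n="${3:-15}" '
        index($0, pat) { left = n + 1 }    # start or restart the window
        left > 0       { print; left-- }   # print while inside the window
    ' "$1"
}

# Assumed log location; adjust to wherever this installation writes its log.
log=${NUTCH_HOME:-.}/logs/hadoop.log
if [ -r "$log" ]; then
    show_trace "$log" NullPointerException 15
fi
```

If the log holds a full trace, the first "at ..." line under the exception would show which class threw it.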
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general