NullPointerException
--------------------
Key: NUTCH-428
URL: https://issues.apache.org/jira/browse/NUTCH-428
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 0.8.1
Environment: Windows XP
Reporter: Piyush
I am using the NUTCH.Bat script provided in one of the threads (I am not using
Cygwin). Whenever I try to fetch the item, the fetch fails with a
"NullPointerException".
I have a URL directory that contains a urls.txt file. The file has a single
entry: http://www.winzip.com/land_about.htm
I have updated crawl-urlfilter.txt with the rule +^http://www.winzip.com/
Are there any other settings I am missing? Any help is greatly appreciated.
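To rule out the filter rule itself, the crawl-urlfilter.txt entry can be checked
outside Nutch. The following is only a minimal sketch, not Nutch's own code: the
class UrlFilterSketch and its methods are hypothetical, and it assumes that rules
are signed regular expressions ('+' accept, '-' reject), that the first matching
rule decides, and that a URL matching no rule is rejected. The "-." catch-all
rule in main is also an assumption added for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a signed-regex URL filter; not part of Nutch.
public class UrlFilterSketch {

    // One rule: a sign ('+' accept, '-' reject) plus a compiled regex.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Parse filter lines, skipping blank lines and '#' comments.
    static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("#")) {
                continue;
            }
            char sign = t.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + t);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(t.substring(1))));
        }
        return rules;
    }

    // Assumed semantics: the first rule whose regex is found in the URL decides;
    // a URL that matches no rule is rejected.
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) {
                return r.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(Arrays.asList("+^http://www.winzip.com/", "-."));
        System.out.println(accepts(rules, "http://www.winzip.com/land_about.htm")); // true
        System.out.println(accepts(rules, "http://example.org/"));                  // false
    }
}
```

Under these assumptions the seed URL passes the rule, which would suggest the
NullPointerException is not caused by the filter entry.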
The command I used to start the crawl is:
nutch crawl urls -dir crawlResults -depth 1
Here is my log:
crawl started in: crawlResult
rootUrlDir = urls
threads = 10
depth = 1
Injector: starting
Injector: crawlDb: crawlResult/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: crawlResult/segments/20070110085314
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawlResult/segments/20070110085314
Fetcher: threads: 10
fetching http://www.winzip.com/land_about.htm
fetch of http://www.winzip.com/land_about.htm failed with:
java.lang.NullPointerException
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawlResult/crawldb
CrawlDb update: segment: crawlResult/segments/20070110085314
CrawlDb update: Merging segment data into db.
CrawlDb update: done
LinkDb: starting
LinkDb: linkdb: crawlResult/linkdb
LinkDb: adding segment: crawlResult/segments/20070110085314
LinkDb: done
Indexer: starting
Indexer: linkdb: crawlResult/linkdb
Indexer: adding segment: crawlResult/segments/20070110085314
Optimizing index.
Indexer: done
Dedup: starting
Dedup: adding indexes in: crawlResult/indexes
Dedup: done
Adding crawlResult/indexes/part-00000
crawl finished: crawlResult