Hello,
I'm new to Nutch. I've downloaded the latest version (0.8.1) and I'm using
WinXP. I followed the instructions in the tutorial
(http://lucene.apache.org/nutch/tutorial8.html) but I'm having problems
crawling a small intranet site. Here are my steps:
$ bin/nutch crawl testa -dir test4 -depth 3 -topN 50 >& crawl.log
-- Output of crawl looks fine
$ more crawl.log
crawl started in: test4
rootUrlDir = testa
threads = 10
depth = 3
topN = 50
Injector: starting
Injector: crawlDb: test4/crawldb
Injector: urlDir: testa
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: test4/segments/20060927105913
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: test4/segments/20060927105913
Fetcher: threads: 10
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: test4/crawldb
CrawlDb update: segment: test4/segments/20060927105913
-- Checking the output
$ bin/nutch readdb test4 -url
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 2
at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:445)
$ bin/nutch readdb test4 -stats
CrawlDb statistics start: test4
Exception in thread "main" java.io.IOException: Input directory
c:/ir/nutch-0.8
1/test4/current in local is invalid.
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:274)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:327)
at org.apache.nutch.crawl.CrawlDbReader.processStatJob(CrawlDbReader.java:259)
at org.apache.nutch.crawl.CrawlDbReader.main(CrawlDbReader.java:440)
-- also tried checking the integrity of the crawl ...
$ bin/nutch org.apache.nutch.searcher.NutchBean apache
Total hits: 0
What is wrong? Thanks for any help.
--Omar
--
View this message in context:
http://www.nabble.com/Problems-with-Nutch-0.8.1-tf2346446.html#a6532322
Sent from the Nutch - User mailing list archive at Nabble.com.