Hello,
I have just started with Nutch. My problem: I do not understand why URLs
are not fetched. My simple test, with one start URL, no filters, and a
few adjusted configuration settings, is shown below:
fetcher.server.delay: 2.0
fetcher.verbose: true
db.ignore.internal.links: false
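In case the format matters, here is a sketch of how these three overrides would look as property entries, assuming they are set in conf/nutch-site.xml (the file location is an assumption; everything else is left at its default):

```xml
<?xml version="1.0"?>
<!-- Assumed location: conf/nutch-site.xml; only the three overrides listed above -->
<configuration>
  <property>
    <name>fetcher.server.delay</name>
    <value>2.0</value>
  </property>
  <property>
    <name>fetcher.verbose</name>
    <value>true</value>
  </property>
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
</configuration>
```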
Start URL: http://www.rwth-aachen.de
Crawl parameters:
depth=6
threads=30
adddays=0
topN=15
[nu...@d-1 search]$ bin/nutch readdb crawl/crawldb -stats
CrawlDb statistics start: crawl/crawldb
Statistics for CrawlDb: crawl/crawldb
TOTAL urls: 156
retry 0: 156
min score: 0.0
avg score: 0.03282051
max score: 1.208
status 1 (db_unfetched): 149
status 2 (db_fetched): 5
status 4 (db_redir_temp): 1
status 5 (db_redir_perm): 1
CrawlDb statistics: done
My question: why are 149 of the 156 URLs not fetched at all?
Thanks in advance
Jochen