Hello,

I have just started with Nutch. My problem: I do not understand why URLs are not fetched. My simple trial, with one start URL, no filters, and a few adjusted configuration settings, is shown below.

fetcher.server.delay: 2.0
fetcher.verbose: true
db.ignore.internal.links: false
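
For reference, this is roughly how the three overrides above would appear in conf/nutch-site.xml. It is only a sketch: it assumes the usual property layout of that file, and the actual file may contain other entries that are not shown here.

```xml
<?xml version="1.0"?>
<!-- Sketch of conf/nutch-site.xml with only the three overrides listed above. -->
<configuration>
  <property>
    <!-- seconds to wait between successive requests to the same server -->
    <name>fetcher.server.delay</name>
    <value>2.0</value>
  </property>
  <property>
    <!-- more detailed fetcher logging -->
    <name>fetcher.verbose</name>
    <value>true</value>
  </property>
  <property>
    <!-- false: links pointing to the same host are kept, not ignored -->
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>
</configuration>
```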

http://www.rwth-aachen.de
depth=6
threads=30
adddays=0
topN=15

[nu...@d-1 search]$ bin/nutch readdb crawl/crawldb -stats
CrawlDb statistics start: crawl/crawldb
Statistics for CrawlDb: crawl/crawldb
TOTAL urls:     156
retry 0:        156
min score:      0.0
avg score:      0.03282051
max score:      1.208
status 1 (db_unfetched):        149
status 2 (db_fetched):  5
status 4 (db_redir_temp):       1
status 5 (db_redir_perm):       1
CrawlDb statistics: done

Question: why are 149 of the 156 URLs not fetched at all?

Thanks in advance
Jochen


