Out of the box, only simple URLs (no special characters like "?", etc.) are crawled. So make sure you remove such filters: comment out the following rule in crawl-urlfilter.txt:

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
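For reference, after that edit the relevant part of conf/crawl-urlfilter.txt would look roughly like this. This is only a sketch: the accept pattern below is my assumption based on your seed URL http://www.rwth-aachen.de (the default file has MY.DOMAIN.NAME there), so adjust it to the domains you actually want to crawl:

# skip URLs containing certain characters as probable queries, etc.
# -[?*!@=]

# accept hosts in the seed domain (replace MY.DOMAIN.NAME in the default file)
+^http://([a-z0-9]*\.)*rwth-aachen.de/

# skip everything else
-.

With the character rule commented out, URLs containing "?", "=", etc. are no longer rejected before fetching.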
What do your URLs look like?

2009/6/26 Jochen Witte <[email protected]>

> Hello,
>
> I just started with Nutch. My problem: I do not understand why URLs are not
> fetched. My simple trial with one start URL, without any filters and with some
> adjusted configuration, can be seen below.
>
> fetcher.server.delay: 2.0
> fetcher.verbose: true
> db.ignore.internal.links: false
>
> http://www.rwth-aachen.de
> depth=6
> threads=30
> adddays=0
> topN=15
>
> [nu...@d-1 search]$ bin/nutch readdb crawl/crawldb -stats
> CrawlDb statistics start: crawl/crawldb
> Statistics for CrawlDb: crawl/crawldb
> TOTAL urls: 156
> retry 0: 156
> min score: 0.0
> avg score: 0.03282051
> max score: 1.208
> status 1 (db_unfetched): 149
> status 2 (db_fetched): 5
> status 4 (db_redir_temp): 1
> status 5 (db_redir_perm): 1
> CrawlDb statistics: done
>
> Question: why are 149 of the 156 URLs not fetched at all?
>
> Thanks in advance
> Jochen

--
-MilleBii-
