Out of the box, only simple URLs (no special characters like "?" etc.) are
crawled, so make sure to remove such filters by commenting out the following
rule in crawl-urlfilter.txt:

# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
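
For example, assuming the stock conf/crawl-urlfilter.txt that ships with Nutch,
disabling that rule just means prefixing it with "#":

# skip URLs containing certain characters as probable queries, etc.
# -[?*!@=]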

What do your URLs look like?
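
If the filter is not the problem, you can also dump the crawldb to see exactly
which URLs are sitting in db_unfetched (crawl/dump below is just an arbitrary
output directory, pick any path you like):

bin/nutch readdb crawl/crawldb -dump crawl/dump

Then look through the dump for entries with status db_unfetched and check
whether those URLs share a pattern that one of your filters would reject.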

2009/6/26 Jochen Witte <[email protected]>

> Hello,
>
> I am just starting with Nutch. My problem: I do not understand why URLs are
> not fetched. My simple trial with one start URL, without any filters and with
> some adjusted configuration, can be seen below:
>
> fetcher.server.delay: 2.0
> fetcher.verbose: true
> db.ignore.internal.links: false
>
> http://www.rwth-aachen.de
> depth=6
> threads=30
> adddays=0
> topN=15
>
> [nu...@d-1 search]$ bin/nutch readdb crawl/crawldb -stats
> CrawlDb statistics start: crawl/crawldb
> Statistics for CrawlDb: crawl/crawldb
> TOTAL urls:     156
> retry 0:        156
> min score:      0.0
> avg score:      0.03282051
> max score:      1.208
> status 1 (db_unfetched):        149
> status 2 (db_fetched):  5
> status 4 (db_redir_temp):       1
> status 5 (db_redir_perm):       1
> CrawlDb statistics: done
>
> Question: why are 149 of the 156 URLs not fetched at all?
>
> Thanks in advance
> Jochen
>
>
>
>


-- 
-MilleBii-
