Hi, I'm testing Nutch and so far everything works fine (OK, it took some hours of reading and testing, testing, testing, but that's normal). I have a noob question: I need to crawl only websites within one ccTLD.
In crawl-urlfilter.txt, should I write this:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*.ch/

or this:

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*ch/

The difference is the dot before the "ch" ccTLD. In other words, does the dot before the closing bracket already separate the ccTLD from the domain name (or the root domain from a subdomain), or should I add another one as in the first example?

In the installation guide I can see:

+^http://([a-z0-9]*\.)*apache.org/

Does this crawl every subdomain of apache.org (xxx.apache.org), or does it crawl apache.org itself?

Many thanks for any help,
Mauro
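One way to see what each of the patterns above accepts is to run them against a few sample URLs. The following is a minimal sketch in Python, not Nutch itself: it assumes the filter applies each pattern as an ordinary regular-expression search on the URL string, and that Python's `re` module treats these particular constructs the same way Java's regex engine does. The sample URLs and the helper name `accepts` are made up for illustration.

```python
import re

# Patterns copied from the post, without the leading "+" that marks an accept rule.
WITH_DOT = r"^http://([a-z0-9]*\.)*.ch/"      # extra unescaped "." before "ch"
WITHOUT_DOT = r"^http://([a-z0-9]*\.)*ch/"    # no extra dot
APACHE = r"^http://([a-z0-9]*\.)*apache.org/"  # example from the installation guide


def accepts(pattern, url):
    """Return True if the pattern is found in the URL (search semantics, an assumption)."""
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    # Hypothetical sample URLs, chosen only to exercise the patterns.
    samples = [
        "http://example.ch/",
        "http://www.example.ch/index.html",
        "http://example.com/",
        "http://apache.org/",
        "http://lucene.apache.org/",
    ]
    for name, pattern in [("with dot", WITH_DOT),
                          ("without dot", WITHOUT_DOT),
                          ("apache.org", APACHE)]:
        for url in samples:
            print(f"{name:12} {url:36} {accepts(pattern, url)}")
```

Under these assumptions, the variant without the extra dot accepts both `http://example.ch/` and `http://www.example.ch/index.html`, because the `\.` inside the group already supplies the dot in front of `ch`. The variant with the extra dot rejects both, since the unescaped `.` demands one more arbitrary character before `ch`. The apache.org pattern accepts `http://apache.org/` as well as `http://lucene.apache.org/`, because `*` lets the subdomain group repeat zero times. Note that the character class `[a-z0-9]` contains no hyphen, so a host such as `my-site.ch` would not be accepted by either `.ch` variant.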