I tried Susam Pal's recrawl script and wondered why
URL filtering no longer works.
http://wiki.apache.org/nutch/Crawl

The mystery turned out to be this:
only Crawl.java adds crawl-tool.xml to the NutchConfiguration:

Configuration conf = NutchConfiguration.create();
conf.addResource("crawl-tool.xml");

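Fetcher.java and all the other tools that filter outlinks do not
add this resource, so when they are run individually (as the recrawl
script does) they never see the settings from crawl-tool.xml.
This really confused me, and it took me some time to figure out.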
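
To illustrate why the missing addResource call matters, here is a small self-contained model of the layering. It does not use the Nutch or Hadoop classes; plain java.util.Properties stands in for the configuration, and the class and method names are made up for this sketch. The property name urlfilter.regex.file and the two file names are my assumption about what the default configuration and crawl-tool.xml define, so please check them against your own conf directory.

```java
import java.util.Properties;

// Simplified model only: java.util.Properties stands in for the real
// configuration object. Names and values here are assumptions.
public class ConfigLayering {

    // Stand-in for the settings every tool loads by default (assumed values).
    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("urlfilter.regex.file", "regex-urlfilter.txt");
        return p;
    }

    // Stand-in for the overrides contained in crawl-tool.xml (assumed values).
    static Properties crawlTool() {
        Properties p = new Properties();
        p.setProperty("urlfilter.regex.file", "crawl-urlfilter.txt");
        return p;
    }

    // A resource added later overrides values from resources loaded earlier.
    static String effectiveFilterFile(boolean addCrawlTool) {
        Properties conf = new Properties();
        conf.putAll(defaults());
        if (addCrawlTool) {
            conf.putAll(crawlTool());
        }
        return conf.getProperty("urlfilter.regex.file");
    }

    public static void main(String[] args) {
        // Models Crawl.java, which adds the extra resource.
        System.out.println("with crawl-tool.xml:    " + effectiveFilterFile(true));
        // Models Fetcher.java and the other tools, which do not.
        System.out.println("without crawl-tool.xml: " + effectiveFilterFile(false));
    }
}
```

If that assumption holds, the individual tools end up reading a different filter file than the one-step crawl does, which would explain why the rules I had edited were ignored.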

Regards,
Reinhard






