I tried Susam Pal's recrawl script (http://wiki.apache.org/nutch/Crawl) and wondered why URL filtering no longer worked.
The explanation is that only Crawl.java adds crawl-tool.xml to the NutchConfiguration:
Configuration conf = NutchConfiguration.create();
conf.addResource("crawl-tool.xml");
Fetcher.java and all the other tools that filter outlinks do not add this resource, so they never see the settings in crawl-tool.xml.
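To illustrate the effect, here is a minimal sketch that models the layering with plain java.util.Properties instead of the real Configuration class. It assumes that crawl-tool.xml overrides urlfilter.regex.file to point at crawl-urlfilter.txt, while the default value is regex-urlfilter.txt. That is how I remember the stock config files, so please check your own conf directory. The class and method names are made up for this example.

```java
import java.util.Properties;

public class ConfigLayeringSketch {

    // Merge layers in order; a later layer overrides an earlier one.
    public static Properties layer(Properties... layers) {
        Properties merged = new Properties();
        for (Properties p : layers) {
            merged.putAll(p);
        }
        return merged;
    }

    // Stand-in for the default config (assumed default value).
    public static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("urlfilter.regex.file", "regex-urlfilter.txt");
        return p;
    }

    // Stand-in for crawl-tool.xml (assumed override).
    public static Properties crawlTool() {
        Properties p = new Properties();
        p.setProperty("urlfilter.regex.file", "crawl-urlfilter.txt");
        return p;
    }

    // Which filter file a tool ends up with, depending on whether
    // it adds the crawl-tool layer the way Crawl.java does.
    public static String filterFileFor(boolean addsCrawlTool) {
        Properties conf = addsCrawlTool
                ? layer(defaults(), crawlTool())
                : layer(defaults());
        return conf.getProperty("urlfilter.regex.file");
    }

    public static void main(String[] args) {
        System.out.println("with crawl-tool layer:    " + filterFileFor(true));
        System.out.println("without crawl-tool layer: " + filterFileFor(false));
    }
}
```

If that assumption holds, a recrawl script that calls the individual tools would filter with a different rules file than the one-step crawl command does, unless the script's tools are given the same settings.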
This really confused me, and it took me some time to figure out.
regards
reinhard
