> I forgot to add: it is also slightly incompatible with Perl regex (and
> consequently with Java regex), I don't remember the details, they are
> somewhere in the docs, but the incompatibility is caused by some rarely
> used operators being not implemented... so I guess we could live with it.

I have made some quick tests with regex-urlfilter. The major problem is
that dk.brics.automaton doesn't use the Perl syntax. For instance, it
doesn't support the boundary matchers ^ and $ (which are used in Nutch).

Of course, the regexps can easily be rewritten. But how would we deal
with this if we switch to dk.brics.automaton?

* Change the syntax used in Nutch?
* Convert Perl syntax to dk.brics.automaton syntax?

I will run some benchmarks based on regexp-urlfilter to evaluate whether
integrating dk.brics.automaton into Nutch is worthwhile, given the
changes it would require in the regexp syntax.

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/