> I forgot to add: it is also slightly incompatible with Perl regex (and
> consequently with Java regex), I don't remember the details, they are
> somewhere in the docs, but the incompatibility is caused by some rarely
> used operators being not implemented... so I guess we could live with it.

I have run some quick tests with regex-urlfilter...
The major problem is that it doesn't use the Perl syntax...
For instance, it doesn't support the boundary matchers ^ and $ (which are
used in Nutch).
Of course, the regexps can easily be rewritten. But how will we deal with
this if we switch to dk.brics.automaton?
* Change the syntax used in Nutch?
* Convert Perl syntax to dk.brics.automaton?
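
To give an idea of what the second option could look like for the boundary
matchers only, here is a minimal sketch. It assumes that the automaton always
matches the whole input string, so an explicit ^ or $ is simply dropped and a
missing one is replaced by ".*". The class and method names (AnchorConverter,
toWholeMatch) are hypothetical and are not part of Nutch or
dk.brics.automaton. Anchors inside a top-level alternation (e.g. "^a|b$") and
any other syntax differences are not handled.

```java
public class AnchorConverter {

    // Hypothetical helper: rewrites a Perl-style pattern that relies on
    // "find" semantics plus ^ / $ anchors into an equivalent pattern that
    // is meant to be matched against the whole input string.
    static String toWholeMatch(String perlRegex) {
        String body = perlRegex;
        String prefix = ".*";   // no leading ^ : anything may precede
        String suffix = ".*";   // no trailing $ : anything may follow

        if (body.startsWith("^")) {
            body = body.substring(1);
            prefix = "";
        }
        if (body.endsWith("$") && !isEscaped(body, body.length() - 1)) {
            body = body.substring(0, body.length() - 1);
            suffix = "";
        }
        return prefix + body + suffix;
    }

    // True if the character at 'index' is preceded by an odd number of
    // backslashes, i.e. it is a literal character and not an operator.
    static boolean isEscaped(String s, int index) {
        int backslashes = 0;
        for (int i = index - 1; i >= 0 && s.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        System.out.println(toWholeMatch("^http://"));        // http://.*
        System.out.println(toWholeMatch("\\.(gif|jpg)$"));   // .*\.(gif|jpg)
        System.out.println(toWholeMatch("foo"));             // .*foo.*
    }
}
```

With something like this, the existing filter rules could stay in Perl syntax
and be converted when they are loaded, at the cost of maintaining the
converter.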

I will run some benchmarks based on regexp-urlfilter to evaluate whether
integrating dk.brics.automaton into Nutch is worthwhile, given the changes
it would require in the regexp syntax.

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
