Re: [Haskell-cafe] Efficient parallel regular expressions

wren ng thornton Thu, 06 Nov 2008 00:48:25 -0800

Mitchell, Neil wrote:

> Hi Martijn,

> It's not that tricky if you do a regular expression state machine
> yourself, but that's probably a bit too much work. One way to get
> a speed up might be to take the regular expressions a,b,c,d and
> generate a regex a+b+c+d, and one a+b. You can then check any string
> s against a+b+c+d, if that matches check a+b, if that matches check
> a. At each stage you eliminate half the regular expressions, which
> means a match will take log n, where n is the number of regular
> expressions.
>
> This assumes the underlying regular expression engine constructs a
> finite state machine, making it O(m) where m is the length of the
> string to match.
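
The halving scheme quoted above could be sketched roughly as follows. This is only an illustration: `findMatch` and `prefixOracle` are hypothetical names, and the oracle `[p] -> Bool` stands in for "the combined regex for this group of patterns matches the input". A real version would precompile one combined regex per group instead of recomputing anything per call.

```haskell
import Data.List (isPrefixOf)

-- | Find one matching pattern with roughly log2 n oracle calls.
-- ASSUMPTION: the oracle answers "does any pattern in this group match?",
-- i.e. it behaves like the alternation of the group's regexes.
findMatch :: ([p] -> Bool) -> [p] -> Maybe p
findMatch anyOf ps
  | null ps || not (anyOf ps) = Nothing
  | otherwise                 = Just (go ps)
  where
    -- Invariant: the group passed to 'go' is non-empty and the oracle
    -- holds for it, so a failing left half implies a matching right half.
    go [p] = p
    go qs  =
      let (l, r) = splitAt (length qs `div` 2) qs
      in  if anyOf l then go l else go r

-- | Toy oracle for demonstration only: literal prefix tests stand in
-- for a compiled alternation regex.
prefixOracle :: String -> [String] -> Bool
prefixOracle s = any (`isPrefixOf` s)
```

For example, `findMatch (prefixOracle "bazaar") ["foo", "bar", "baz", "qux"]` evaluates to `Just "baz"` after three oracle calls: the whole group, then `["foo", "bar"]`, then `["baz"]`.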

If you're implementing the machine yourself, you can implement it as a
trie automaton which has some "value" associated with each final state.
That is, rather than just accepting or rejecting, when the automaton
accepts it returns the value associated with the particular final state
that it accepted by. This is a trivial extension on DFA/NFA
implementations and is perfectly suited to the problem of combining
multiple linear tries (sic: regexes) into a single machine. The
minimization algorithms are a bit more complex than for DFA/NFAs, but
still fairly straightforward if you're only doing prefix merging.


And this gets rid of the O(log n) factor.
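
A minimal sketch of a value-carrying automaton, assuming the patterns are plain literal strings rather than full regexes (a simplification; a real regex version would need a proper DFA construction whose final states are tagged). The names `Trie`, `insert`, `fromList` and `match` are made up for illustration and use only `Data.Map` from containers.

```haskell
import qualified Data.Map.Strict as M
import Data.List (foldl')

-- | One automaton state: a value if the state is final, plus
-- outgoing transitions keyed by the next input character.
data Trie v = Trie
  { accept :: Maybe v
  , edges  :: M.Map Char (Trie v)
  }

-- | The automaton with a single non-final state and no transitions.
empty :: Trie v
empty = Trie Nothing M.empty

-- | Add one literal pattern and its value. Patterns with a common
-- prefix share the states for that prefix (prefix merging).
insert :: String -> v -> Trie v -> Trie v
insert []     v t = t { accept = Just v }
insert (c:cs) v t =
  let child = M.findWithDefault empty c (edges t)
  in  t { edges = M.insert c (insert cs v child) (edges t) }

-- | Combine all patterns into a single machine.
fromList :: [(String, v)] -> Trie v
fromList = foldl' (\t (k, v) -> insert k v t) empty

-- | Run the machine over the whole input: one map lookup per
-- character, however many patterns were combined. On acceptance,
-- return the value attached to the final state that was reached.
match :: Trie v -> String -> Maybe v
match t []     = accept t
match t (c:cs) = M.lookup c (edges t) >>= \t' -> match t' cs
```

With `t = fromList [("cat", 1), ("car", 2), ("dog", 3)]`, `match t "car"` gives `Just 2` and `match t "ca"` gives `Nothing`. The returned value says which pattern matched, so no second pass is needed to find out.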

--
Live well,
~wren
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
