Re: [Haskell-cafe] Efficient parallel regular expressions

Martijn van Steenbergen Wed, 05 Nov 2008 05:57:08 -0800

Hello everyone,

Thank you all for your comments! Those are some very useful ideas.

I think I'll try Roger's (sent privately) and ChrisK's suggestion first: using the match groups. I'm not sure if the match groups inside the individual regexes will cause much trouble, but we'll see. I imagine I'll have to count parentheses, except when one is preceded by a \, except when that \ follows another \, etc. There are probably other situations where a () doesn't count as a group, perhaps when it's followed by a * or +. I'll look into that.
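The group-counting rules above can be sketched in Haskell, assuming the combined regex is built as (r1)|(r2)|... and the caller then checks which wrapper group took part in the match. This is a minimal sketch under stated assumptions, not a full regex parser. A backslash escapes the next character (so \\( is a literal backslash followed by a real group). Parentheses inside a bracket expression [...] are literal. Every "(?" opening (PCRE non-capturing groups and lookarounds) is treated as non-capturing, which is wrong for PCRE named groups. Note that a group followed by * or + still captures in both POSIX and PCRE regexes. Back-references such as \1 inside an individual regex would also need renumbering, which this sketch does not do. The names countGroups, combine and whichMatched are hypothetical. The submatch list (Nothing for a group that took no part in the match) is an assumed representation, not any particular library's API.

```haskell
import Data.List (findIndex, intercalate)

-- Count capturing groups in a regex source string (simplified rules).
countGroups :: String -> Int
countGroups = go
  where
    go []               = 0
    go ('\\' : _ : cs)  = go cs             -- escaped char, e.g. \( or \\
    go ('[' : cs)       = go (skipClass cs) -- parens in [...] are literal
    go ('(' : '?' : cs) = go cs             -- assumed non-capturing: (?:...), lookarounds
    go ('(' : cs)       = 1 + go cs         -- a capturing group, quantified or not
    go (_ : cs)         = go cs

    -- A ']' directly after '[' or '[^' is a literal member of the class.
    skipClass ('^' : ']' : cs) = skipBody cs
    skipClass ('^' : cs)       = skipBody cs
    skipClass (']' : cs)       = skipBody cs
    skipClass cs               = skipBody cs

    -- Drop everything up to and including the closing ']'.
    skipBody []              = []
    skipBody ('\\' : _ : cs) = skipBody cs  -- assumption: PCRE-style escapes
    skipBody ('[' : c : cs)
      | c `elem` ":.=" = skipBody (closeSpecial c cs)  -- e.g. [:alpha:]
    skipBody (']' : cs)      = cs
    skipBody (_ : cs)        = skipBody cs

    -- Skip past the ":]" (or ".]", "=]") that ends a POSIX class name.
    closeSpecial c (x : ']' : rest) | x == c = rest
    closeSpecial c (_ : rest)               = closeSpecial c rest
    closeSpecial _ []                       = []

-- Wrap each regex in its own group and join them with '|'. Also return
-- the 1-based group number each wrapper receives in the combined regex.
combine :: [String] -> (String, [Int])
combine rs = (intercalate "|" (map wrap rs), init (scanl step 1 rs))
  where
    wrap r = "(" ++ r ++ ")"
    -- The next wrapper comes after this wrapper and all groups inside it.
    step i r = i + 1 + countGroups r

-- Given the submatches of the combined regex (group 1 first; Nothing for a
-- group that took no part in the match), find which original regex fired.
whichMatched :: [Int] -> [Maybe String] -> Maybe Int
whichMatched indices groups = findIndex participated indices
  where
    participated g = case drop (g - 1) groups of
      (Just _ : _) -> True
      _            -> False
```

For example, combine ["a(b)", "c", "(d)(e)"] yields the pattern (a(b))|(c)|((d)(e)) with wrapper groups 1, 3 and 4. In a single match only one wrapper group takes part; which alternative wins when several could match depends on the engine's rules (POSIX leftmost-longest versus Perl-style leftmost-first).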

If that doesn't work out, I'll go for Neil's (from an algorithmic POV, beautiful) suggestion.

While I understand that some of you suggest I use Parsec (or some other mature parser library), I'm pretty sure that's not what I want here. The patterns will almost always be very simple, and regular expressions offer an extremely concise way of expressing when a hook should fire. Forcing the user to use full parsers would cause the programs to become much more verbose. Still, Yogurt is flexible enough to allow the user to use Parsec if he or she so chooses.


Thanks again,

Martijn.



Mitchell, Neil wrote:

Hi Martijn,

It's not that tricky if you do a regular expression state machine
yourself, but that's probably a bit too much work. One way to get a
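speed-up might be to take the regular expressions a, b, c, d and
generate a regex a+b+c+d (their union, written a|b|c|d in most regex
syntaxes), and one a+b. You can then check any string s against
a+b+c+d; if that matches, check a+b; if that matches, check a, and
otherwise b. If a+b fails, the match must lie in c+d, so you descend
into that half instead. At each stage you eliminate half the regular
expressions, which means finding a match takes about log n checks,
where n is the number of regular expressions.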

This assumes the underlying regular expression engine constructs a
finite state machine, making each check O(m), where m is the length of
the string to match, and the whole lookup O(m log n).

Thanks

Neil
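Neil's halving scheme can be sketched in Haskell. To keep the sketch self-contained, it uses a tiny regex type matched with Brzozowski derivatives as a stand-in for a real engine; this is an assumption, it matches whole strings only, and it makes no O(m) guarantee. Each tree node stores the union of every regex beneath it, playing the role of a+b+c+d and a+b. The search tests the root union once, then at each node tests only the left half: if that fails, the right half must match because its parent did. That is about 1 + log2 n regex runs. All names (Re, Tree, build, firstMatch, str) are hypothetical.

```haskell
import Data.List (foldl')

-- A tiny regular-expression type (hypothetical stand-in for a real engine).
data Re = Empty | Eps | Chr Char | Alt Re Re | Seq Re Re | Star Re

-- Does the regex accept the empty string?
nullable :: Re -> Bool
nullable Empty     = False
nullable Eps       = True
nullable (Chr _)   = False
nullable (Alt a b) = nullable a || nullable b
nullable (Seq a b) = nullable a && nullable b
nullable (Star _)  = True

-- Smart constructors that prune dead branches so terms stay small.
alt :: Re -> Re -> Re
alt Empty b = b
alt a Empty = a
alt a b     = Alt a b

sq :: Re -> Re -> Re
sq Empty _ = Empty
sq _ Empty = Empty
sq Eps b   = b
sq a Eps   = a
sq a b     = Seq a b

-- Brzozowski derivative: what remains to match after consuming c.
deriv :: Char -> Re -> Re
deriv _ Empty     = Empty
deriv _ Eps       = Empty
deriv c (Chr d)   = if c == d then Eps else Empty
deriv c (Alt a b) = alt (deriv c a) (deriv c b)
deriv c (Seq a b)
  | nullable a    = alt (sq (deriv c a) b) (deriv c b)
  | otherwise     = sq (deriv c a) b
deriv c (Star a)  = sq (deriv c a) (Star a)

-- Whole-string match: differentiate by every character, then check nullable.
matches :: Re -> String -> Bool
matches r = nullable . foldl' (flip deriv) r

-- A literal string as a regex.
str :: String -> Re
str = foldr (sq . Chr) Eps

-- Each node caches the union of all regexes beneath it.
data Tree = Leaf Int Re | Node Re Tree Tree

unionOf :: Tree -> Re
unionOf (Leaf _ r)   = r
unionOf (Node u _ _) = u

-- Build a balanced tree over (index, regex) pairs.
build :: [(Int, Re)] -> Maybe Tree
build [] = Nothing
build xs = Just (go xs)
  where
    go [(i, r)] = Leaf i r
    go ys       =
      let (ls, rs) = splitAt (length ys `div` 2) ys
          l = go ls
          r = go rs
      in Node (Alt (unionOf l) (unionOf r)) l r

-- Test the whole union once, then halve: if the left union fails,
-- the right one must match, so it needs no separate test.
firstMatch :: Tree -> String -> Maybe Int
firstMatch t s
  | matches (unionOf t) s = Just (descend t)
  | otherwise             = Nothing
  where
    descend (Leaf i _) = i
    descend (Node _ l r)
      | matches (unionOf l) s = descend l
      | otherwise             = descend r
```

Because the left half is always tried first, firstMatch returns the lowest-numbered matching regex when several match. With an automaton-based engine that runs in O(m) per check, the lookup costs O(m log n).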

_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe
