[Haskell-cafe] Re: Fastest regex package?

Eugene Kirpichov Thu, 05 Feb 2009 07:54:10 -0800

I was wrong about BoyerMoore: it works about 2x faster than PCRE for
this example, so seems like it does the trick (although I suspect that
one could do even faster, so the question remains).


2009/2/5 Eugene Kirpichov <ekirpic...@gmail.com>:
> Hello.
>
> What is the fastest regex package of the multitude of packages present
> on Hackage (namely, PCRE, PCRE-light, DFA, TDFA and many more)?
>
> My benchmark (parsing a huge logfile with a regex like "GET
> /foo.xml.*fooid=([0-9]++).*barid=([0-9]++)") shows that plain PCRE is
> the fastest one (I tried PCRE, PCRE-light and TDFA; DFA can't do
> capturing groups at all, TDFA was abysmally slow (about 20x slower
> than PCRE), and it doesn't support ++), but maybe have I missed any
> blazing-fast package?
>
> Even PCRE seems somewhat too slow to me - matching this regex eats 50%
> of the program time; considering that the only things it does is
> ungzipping the log and searching for the regex, this seems like far
> too much, the regex is not that complex to my mind. I tried to replace
> this regex by Data.ByteString.Search.KnuthMorrisPratt and BoyerMoore
> from stringsearch, but they work even twice slower.
>
> I also noticed that PCRE does about 2x fewer allocations than
> PCRE-Light, but is not much faster.
> However, PCRE, strangely, has a non-pure interface, I had to use it
> with unsafePerformIO.
>
> It's a pity I can't put these findings on haskellwiki to
> http://www.haskell.org/haskellwiki/Regular_expressions since I don't
> have an account and new accounts can't be created right now. Could
> anyone put them there?
>
> All in all, my question remains: what is the fastest way to do this
> kind of parsing on a lazy bytestring?
>
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] Re: Fastest regex package?

Reply via email to