I was wrong about BoyerMoore: it works about 2x faster than PCRE for this example, so seems like it does the trick (although I suspect that one could do even faster, so the question remains).
2009/2/5 Eugene Kirpichov <ekirpic...@gmail.com>: > Hello. > > What is the fastest regex package of the multitude of packages present > on Hackage (namely, PCRE, PCRE-light, DFA, TDFA and many more)? > > My benchmark (parsing a huge logfile with a regex like "GET > /foo.xml.*fooid=([0-9]++).*barid=([0-9]++)") shows that plain PCRE is > the fastest one (I tried PCRE, PCRE-light and TDFA; DFA can't do > capturing groups at all, TDFA was abysmally slow (about 20x slower > than PCRE), and it doesn't support ++), but maybe have I missed any > blazing-fast package? > > Even PCRE seems somewhat too slow to me - matching this regex eats 50% > of the program time; considering that the only things it does is > ungzipping the log and searching for the regex, this seems like far > too much, the regex is not that complex to my mind. I tried to replace > this regex by Data.ByteString.Search.KnuthMorrisPratt and BoyerMoore > from stringsearch, but they work even twice slower. > > I also noticed that PCRE does about 2x fewer allocations than > PCRE-Light, but is not much faster. > However, PCRE, strangely, has a non-pure interface, I had to use it > with unsafePerformIO. > > It's a pity I can't put these findings on haskellwiki to > http://www.haskell.org/haskellwiki/Regular_expressions since I don't > have an account and new accounts can't be created right now. Could > anyone put them there? > > All in all, my question remains: what is the fastest way to do this > kind of parsing on a lazy bytestring? > _______________________________________________ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe