At 9:56 AM -0400 9/2/04, Felix Gallo wrote:
Dan writes:
 [...]
 Yes, and some of the initial list already has ops to do those bits,
 though I fully plan on evil cheating versions for some extra speed.

If I recall correctly, someone with the best intentions attempted to write a clear, object-oriented (but still C/C++ based) regex engine to replace the existing highly hairy code. As a result it missed fitting in the cache, and ended up being like two orders of magnitude slower.

Not arguing for premature optimization, but anyone thinking about
regex might want to look ahead (get it?) to the end game where
regex pretty much has to be within .2x of its current speed.

I don't think we're going to be able to manage doing our matches in 20% of the time of the current regex engine. That's a bit ambitious, even for me. :)


FWIW, we did some work on this a few years back. (And yes, the fact that I can say that with a straight face scares me) We managed to run only about 25% slower than the perl 5 regex engine when using the basic string ops. This was pre-ICU, though, so the numbers may well be significantly worse now.

Anyway, that's the whole point of this exercise. I want to know what primitive operations the regex engine will need. (Or, rather, I've a pretty good idea and I want to make sure nothing is missed) When I know I plan on taking all the best accepted standards and practices of software engineering, throwing them out, and cheating like hell with those string primitives. If this means having several alternate class matching schemes (since bitmaps are likely fastest for small sets like 7/8 bit ASCII and some sort of trie fastest for large ones for the asian sets and Unicode) then fine. If this means being able to pin a single string in a fixed location to avoid a single pointer indirection on each access, fine. If it means knowing that since the input is US-ASCII or Latin-x characters are 8 bits and there are no combining characters so we drop to direct byte access, fine.

I'm not so much resigned to cheating as looking forward to it with relish. (And a bit of catsup, with a side of cole slaw)
--
Dan


--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk

Reply via email to