[re cc'ing p5p]

demerphq wrote:
> On Wed, 16 Feb 2005 16:30:47 +0100, Rafael Garcia-Suarez
> > I was waiting for Hugo's advice on this patch.
> > Also, it would be nice to have something that works with /i, if that can
> > be done reliably.
>
> It's done. I hope reliably too, but there is this annoying segfault
> with SA that I need to figure out. While it does appear to be related
> to my code, I'm a bit pushed to come up with an explanation for why
> slight changes to code around the regex could change the segfault
> behaviour. (Note I can't get SA to segfault when I add debug text, and
> SA is the only thing I've tested against that has problems, and the
> code that has problems is evaled closures (possibly inside of a fork
> on win32), so it could be almost anything.)

That's what I was alluding to.

> To be honest I would really love it if someone with stronger perl guts
> and stronger C skills than me could give the code a good once over.
> Even if whoever does it knows nothing about the regex engine's
> particulars, a sanity check would be really appreciated.

/me has to look at it more thoroughly then :)

> > > I'd love to see this get into perl, even if just as an option enabled
> > > through a "use" pragma. (in my opinion, if your regexp will benefit
> > > from a trie, you will know that in advance.)
> >
> > or a new end-of-regexp switch. But if it works well I don't see why
> > this shouldn't be optimized by default (although avoiding slowdowns
> > in some cases would even be better -- do they affect regexp compilation
> > or execution ?)
>
> Ok, in comp sci terms the cost of constructing the trie is
> proportional to the number of characters in the trie. So IMO the cost
> is negligible. In what is the common case (a short list of short
> words of relatively few unique characters) you probably wouldn't even
> notice the slowdown, as the overall match time would improve more than
> the construct time degrades.

Cool.

> In human terms: there is a slight slowdown of compilation that is
> more than made up for by the execution time differences. IMO most times
> the optimisation kicks in, the size of the alternation being
> optimized will be so small as to have negligible construction costs.
> The only time I could see the construction costs being an issue would
> be with things like:
>
> "a"=~/a|very|extremely|long|list|of|words|that|could|match|but|the|first|one|will|always|match|so|the|regex|is|a|bit|of|a|waste|anyway/;
>
> The trouble is the only way we can "opt-out" of the optimisation
> sensibly is if we know what string we are matching against at the
> compile time of the regex; since the regex engine has no access to
> such knowledge we can't really do this.
>
> Anyway, the results I show are that even simple subpatterns benefit
> from conversion to a trie, and it happens automatically, so I don't see
> the need for a switch except possibly to force the disabling of the
> optimisation for some reason. I'm not really sure how that would work
> yet tho.

Agreeing.

> I really hope that some of you guys try the patch out. Some hands-on
> feedback would be appreciated, especially as I'm on Win32 and have no
> real access to *nix right now.

At least on Linux, all tests pass.
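
For anyone following the construction-cost argument quoted above, here is a
minimal sketch in plain Perl of why building a trie from a list of literal
alternatives costs time proportional to the total number of characters. It is
only an illustration under assumptions: build_trie and matches_at_start are
hypothetical helper names, the nested-hash layout is invented for this
example, and none of it reflects how the patch does things in C inside the
regex compiler.

```perl
use strict;
use warnings;

# Hypothetical helper: build a trie from literal alternatives.
# Each character of each word is visited exactly once, so the work
# is proportional to the total number of characters in the list.
sub build_trie {
    my @words = @_;
    my %root;
    my $steps = 0;    # characters visited, to make the linear cost visible
    for my $word (@words) {
        my $node = \%root;
        for my $ch (split //, $word) {
            $node = $node->{next}{$ch} ||= {};    # reuse shared prefixes
            $steps++;
        }
        $node->{accept} = 1;    # some alternative ends at this node
    }
    return (\%root, $steps);
}

# Hypothetical helper: does any alternative match at the start of $str?
# The string is walked one character at a time, whatever the list size.
sub matches_at_start {
    my ($trie, $str) = @_;
    my $node = $trie;
    for my $ch (split //, $str) {
        my $next = $node->{next} or return 0;
        $node = $next->{$ch}     or return 0;
        return 1 if $node->{accept};
    }
    return 0;
}

my ($trie, $steps) = build_trie(qw(a very extremely long list of words));
print "characters visited while building: $steps\n";
print "matches 'a': ", matches_at_start($trie, "a"), "\n";
```

The sketch only answers whether some alternative matches at one position. A
real implementation also has to preserve Perl's leftmost-alternative
semantics, which this example deliberately ignores.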
