[re cc'ing p5p]
demerphq wrote:
> On Wed, 16 Feb 2005 16:30:47 +0100, Rafael Garcia-Suarez
> > I was waiting for Hugo's advice on this patch.
> > Also, it would be nice to have something that works with /i, if that can
> > be done reliably.
> 
> It's done. I hope reliably too, but there is this annoying segfault
> with SA that I need to figure out. While it does appear to be related
> to my code, I'm a bit pushed to come up with an explanation for why
> slight changes to code around the regex could change the segfault
> behaviour. (Note I can't get SA to segfault when I add debug text, and
> SA is the only thing I've tested against that has problems, and the
> code that has problems is evaled closures (possibly inside of a fork
> on win32), so it could be almost anything.)

That's what I was alluding to.

> To be honest I would really love it if someone with stronger perl guts
> and stronger C skills than me could give the code a good once-over.
> Even if whoever does it knows nothing about the regex engine's
> particulars, a sanity check would be really appreciated.

/me has to look at it more thoroughly then :)

> > > I'd love to see this get into perl, even if just as an option enabled
> > > through a "use" pragma.   (in my opinion, if your regexp will benefit
> > > from a trie, you will know that in advance.)
> > 
> > or a new end-of-regexp switch. But if it works well I don't see why
> > this shouldn't be optimized by default (although avoiding slowdowns
> > in some cases would even be better -- do they affect regexp compilation
> > or execution ?)
> 
> Ok, in comp sci terms the cost of constructing the trie is
> proportional to the number of characters in the trie. So IMO the cost
> is negligible. In what is the common case (a short list of short
> words of relatively few unique characters) you probably wouldn't even
> notice the slowdown, as the overall match time would improve more than
> the construct time degrades.

Cool.
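
To make the cost argument concrete, here is a minimal sketch in plain
Perl. It is not the patch's C code, and build_trie and trie_prefix_match
are hypothetical names used only for illustration. It builds a trie as
nested hashes and counts the characters visited: each character of each
alternative is touched once, so construction is linear in the total
length of the alternatives.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the patch): build a trie as nested hashes.
# Returns the root node and the number of characters visited, which is
# the construction cost being discussed.
sub build_trie {
    my @words = @_;
    my %root;
    my $steps = 0;
    for my $word (@words) {
        my $node = \%root;
        for my $ch (split //, $word) {
            $node = $node->{$ch} ||= {};    # reuse a shared prefix or add a node
            $steps++;
        }
        $node->{''} = 1;    # the empty-string key marks the end of a word
    }
    return (\%root, $steps);
}

# Hypothetical helper: true if $str starts with any word stored in the trie,
# roughly what /^(?:a|very|...)/ would decide.
sub trie_prefix_match {
    my ($root, $str) = @_;
    my $node = $root;
    for my $ch (split //, $str) {
        return 1 if $node->{''};             # a complete word has been consumed
        $node = $node->{$ch} or return 0;    # no branch for this character
    }
    return $node->{''} ? 1 : 0;
}

my ($trie, $steps) = build_trie(qw(a very extremely long list of words));
print "construction steps: $steps\n";
print "matches 'lists': ", trie_prefix_match($trie, 'lists'), "\n";
```

Matching then walks one character at a time instead of retrying each
alternative in turn, which is where the execution-time gain would come
from.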

> In human terms: there is a slight slowdown of compilation that is
> more than made up for by the execution time differences. IMO, most times
> the optimisation kicks in, the alternation being optimized will be so
> small that its construction cost is negligible.
> The only time I could see the construction costs being an issue would
> be with things like:
> 
>   "a"=~/a|very|extremely|long|list|of|words|that|could|match|but|the|first|one|will|always|match|so|the|regex|is|a|bit|of|a|waste|anyway/;
> 
> The trouble is that the only way we can sensibly "opt out" of the
> optimisation is if we know what string we are matching against at the
> compile time of the regex. Since the regex engine has no access to
> such knowledge, we can't really do this.
> 
> Anyway, the results I show are that even simple subpatterns benefit
> from conversion to a trie, and it happens automatically, so I don't see
> the need for a switch except possibly to force the disabling of the
> optimisation for some reason. I'm not really sure how that would work
> yet, though.

Agreed.

> I really hope that some of you guys try the patch out. Some hands-on
> feedback would be appreciated, especially as I'm on Win32 and have no
> real access to *nix right now.

At least on Linux, all tests pass.
