> Orton, Yves wrote:
> [...]
> > <shameless plug>
> > But David and the other Regexp authors need to update their code to take
> > advantage of 5.9.2 and later innate TRIE optimisation. They still have
> > room for optimising the patterns that they build but they will need to
> > build fairly different looking patterns to really harness the TRIE regop.
> >
> > </shameless plug>
>
> No, I've been following the threads on p5p. I've been looking hard at
> the stuff I do, and the patterns I generate come from little patterns
> that all tend to feature lots of metacharacters (otherwise I'd be doing
> hash lookups or index()), correct me if I'm wrong, such patterns don't
> benefit from your trie optimisations. E.g., what happens with
>
> FROM MRS\. [A-Z]+ [A-Z]+
> FROM MRS [A-Z]+ [A-Z]+
> FROM MR [A-Z]+ [A-Z]+
> FROM MR\. [A-Z]+ [A-Z]+
> FROM: MRS\. [A-Z]+ [A-Z]+
> FROM: MRS [A-Z]+ [A-Z]+
> FROM: MR [A-Z]+ [A-Z]+
> FROM: MR\. [A-Z]+ [A-Z]+
>
> (actual patterns lifted from Nigerian spam). R::A produces
>
> FROM:? MRS?\.? [A-Z]+ [A-Z]+
>
> Instead of the whole mess or'ed together. I'm seriously
> lacking time to benchmark the differences.
I'll see what I can do.
Also, I think this is a perfectly reasonable output. But what about when you add TO: variants to the list? Or a different header field? You would then want to end up with
/(FROM|TO):? MRS?\.? [A-Z]+ [A-Z]+/
Which would allow the TRIE optimisation, although it's likely that fully expanding the first part would be faster, as fewer regops would need to be executed, which in itself speeds things up.
Which is what I was trying to get at (although I expressed myself poorly). There is still room for Perl-side regex optimisation; it just needs to be made aware of the TRIE support now built into Perl, and possibly the A-C support if it gets applied.
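To make the shapes discussed above concrete, here is a rough sketch, not a benchmark. It builds the whole or'ed list (the eight FROM patterns plus hypothetical TO variants), the factored pattern, and one possible reading of "full expansion of the first part", then checks that all three accept the same input. The names %shape and verdict(), the anchoring with ^ and $, and the sample strings are illustration only; they are not from Regexp::Assemble or the TRIE patch.

```perl
use strict;
use warnings;

# The whole mess or'ed together: the eight FROM patterns quoted above,
# plus the hypothetical TO variants.
my $tail = ' [A-Z]+ [A-Z]+';
my @alternatives;
for my $field ('FROM', 'TO') {
    for my $colon ('', ':') {
        for my $title ('MR', 'MRS') {
            for my $dot ('', '\.') {
                push @alternatives,
                    $field . $colon . ' ' . $title . $dot . $tail;
            }
        }
    }
}
my $ored = join '|', @alternatives;

my %shape = (
    # every variant spelled out
    ored     => qr/^(?:$ored)$/,
    # the factored form, with the header field as an alternation
    factored => qr/^(?:FROM|TO):? MRS?\.? [A-Z]+ [A-Z]+$/,
    # assumption: "full expansion of the first part" read as
    # literal-only branches at the front
    expanded => qr/^(?:FROM:|FROM|TO:|TO) MRS?\.? [A-Z]+ [A-Z]+$/,
);

# Returns 1 if every shape matches $text, 0 if none does,
# and -1 if the shapes disagree.
sub verdict {
    my ($text) = @_;
    my $hits  = grep { $text =~ $_ } values %shape;
    my $total = scalar keys %shape;
    return $hits == $total ? 1 : $hits == 0 ? 0 : -1;
}

print verdict('FROM: MRS. JANE DOE'), "\n";
print verdict('CC: MR JOHN SMITH'), "\n";
```

On a perl with the TRIE support, compiling each pattern under `use re 'debug'` should show whether a TRIE regop was actually generated for it. Which shape runs fastest still has to be measured.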
Yves
