On Wed, 2002-09-04 at 00:22, Aaron Sherman wrote:

> Then, why is there a C<+>? Why not make it C<|>?
> 
>       $foo = rx/ <<a>|<b>|[cde]>|f /

This brings to mind a few big things that have been batting around in my
head about user-defined rules for a while now.... These things fall out
nicely from A5, I think, but correct me if there's some reason I'm wrong
on that.

It would be nice to be able to flag a rule as being either a pure
character class or a generic rule. At the very least this lets the
compiler issue clearer errors, perhaps earlier. Something like:

    rule abc :cc { <[abc]> }
    
Perhaps the engine could even try to coerce non-character classes so
that this would work (not that this simple example would happen, but if
you're working with a rule-chain it might be useful):
    
    rule abc :cc { a | b | c }
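
To make the coercion idea concrete, here is a minimal sketch in Python rather than Perl 6. It assumes the rule body has already been parsed into a list of literal alternatives; the function name C<coerce_to_char_class> and that input shape are made up for illustration and are not anything A5 specifies:

```python
def coerce_to_char_class(alternatives):
    """Fold an alternation of single literal characters into a character class.

    `alternatives` is a hypothetical parsed form of a rule body such as
    `a | b | c`, given here as a list of literal strings.
    Raises ValueError if any branch is not exactly one character, which is
    where a compiler could report a clear, early error for a :cc rule.
    """
    chars = set()
    for alt in alternatives:
        if len(alt) != 1:
            raise ValueError(
                f"{alt!r} is not a single character; "
                "rule cannot be used as a character class"
            )
        chars.add(alt)
    return frozenset(chars)


# `a | b | c` coerces to the same set of characters as <[abc]>
abc = coerce_to_char_class(["a", "b", "c"])
```

A branch such as C<ab> cannot be coerced, so the sketch raises an error instead of guessing.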

Along that line, inline closures can do quite a bit, but it would be
nice if they could be used as counts (instead of the more painful
process of controlling backtracking via commit). Let's assume that C<<
<={...}> >> is used this way. Here's an example of its use:

    /<[\x0d\x0a]><={ .count == 1 || (.count == 2 && .atom eq "\x0d\x0a") }>/

Now that's a very expensive way to say C</\x0d\x0a|\x0d|\x0a/>, but a
much more complicated count might make it worthwhile. I'm assuming the
following things:

C<.count> is a method on the state object that would return the number
of repetitions of the preceding atom that have been tried.

C<.atom> is the preceding atom as it would appear if backtracking
stopped now.

The return value of a count closure is boolean.

So, for example here are some translations of existing operators:

    +   <={.count > 0}>
    *   <={1}>
    *?  <={1}>?
    <8> <={.count == 8}>        # No optimization possible!
    ?   <={.count < 2}>
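
The table above can be sketched as a toy matcher, written in Python rather than Perl 6. Everything here is an assumption made for illustration: the C<State> class, the C<match_counted> function, and the restriction of the preceding atom to a single character class. It finds the longest run of the atom, then asks the count closure about each candidate count, longest first when greedy and shortest first otherwise:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class State:
    count: int  # repetitions of the preceding atom tried so far
    atom: str   # text those repetitions cover if backtracking stopped now


def match_counted(char_class: str,
                  accept: Callable[[State], bool],
                  text: str,
                  pos: int = 0,
                  greedy: bool = True) -> Optional[int]:
    """Return the end offset of an accepted repetition count, or None."""
    # Find the longest run of characters from the class starting at pos.
    end = pos
    while end < len(text) and text[end] in char_class:
        end += 1
    longest = end - pos
    # Greedy tries the longest count first; non-greedy the shortest first.
    counts = range(longest, -1, -1) if greedy else range(0, longest + 1)
    for n in counts:
        if accept(State(n, text[pos:pos + n])):
            return pos + n
    return None


# Count closures corresponding to the table of existing operators.
def plus(s: State) -> bool:        # +    <={.count > 0}>
    return s.count > 0

def star(s: State) -> bool:        # *    <={1}>
    return True

def optional(s: State) -> bool:    # ?    <={.count < 2}>
    return s.count < 2

def exactly_8(s: State) -> bool:   # <8>  <={.count == 8}>
    return s.count == 8

def newline(s: State) -> bool:     # the \x0d\x0a example
    return s.count == 1 or (s.count == 2 and s.atom == "\r\n")
```

Calling C<match_counted("\r\n", newline, text)> plays the role of the C<\x0d\x0a> example, and passing C<greedy=False> with C<star> corresponds to C<*?>. Because the closure is opaque, the sketch has to test candidate counts one at a time, which is the "no optimization possible" cost noted for C<< <8> >>.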

Again, it would be nice to be able to flag these to the compiler in a
rule:

    rule thrice :count { <={.count < 4}> }
    / a<thrice>? /

Note that the C<?> would cause the thrice count-rule to be matched
non-greedily because the regex parser knows that it's a count, not a
generic rule.

