On Wed, 2002-09-04 at 00:22, Aaron Sherman wrote:
> Then, why is there a C<+>? Why not make it C<|>?
>
> $foo = rx/ <<a>|<b>|[cde]>|f /
This brings to mind a few big things that have been batting around in my
head about user-defined rules for a while now.... These things fall out
nicely from A5, I think, but correct me if there's some reason I'm wrong
on that.
It would be nice to be able to flag a rule as being either a pure
character class or a generic rule. At the very least this lets the
compiler issue clearer errors, perhaps earlier. Something like:
rule abc :cc { <[abc]> }
Perhaps the engine could even try to coerce non-character classes so
that this would work (not that this simple example would happen, but if
you're working with a rule-chain it might be useful):
rule abc :cc { a | b | c }
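To make the coercion idea concrete, here is a toy sketch in Python. It is not how any real engine would do it. The function name coerce_to_char_class is made up for illustration, and it assumes the rule body is nothing more than single characters separated by C<|>:

```python
def coerce_to_char_class(alternation: str) -> frozenset:
    """Coerce a body like 'a | b | c' into the character set {a, b, c}.

    Hypothetical helper: raises ValueError as soon as one branch is not
    a single character, which is where a compiler could report that the
    rule cannot be flagged as a character class.
    """
    branches = [branch.strip() for branch in alternation.split("|")]
    for branch in branches:
        if len(branch) != 1:
            raise ValueError(f"branch {branch!r} is not a single character")
    return frozenset(branches)


abc = coerce_to_char_class("a | b | c")
```

A body such as "a | bc" would be rejected here, which is the kind of earlier, clearer error the flag is meant to allow.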
Along that line, inline closures can do quite a bit, but it would be
nice if they could be used as counts (instead of the more painful
process of controlling backtracking via commit). Let's assume that
C<< <={...}> >> is used this way. Here's an example of its use:
/<[\x0d\x0a]><={ .count == 1 || (.count == 2 && .atom eq "\x0d\x0a") }>/
Now that's a very expensive way to say C</\x0d\x0a|\x0d|\x0a/>, but a
much more complicated count might make it worthwhile. I'm assuming the
following things:
C<.count> is a method on the state object that returns the number of
repetitions of the preceding atom that have been tried so far.
C<.atom> is the preceding atom as it would appear if backtracking
stopped now.
The return value of a count closure is boolean.
So, for example here are some translations of existing operators:
+     <={.count > 0}>
*     <={1}>
*?    <={1}>?
<8>   <={.count == 8}>   # No optimization possible!
?     <={.count < 2}>
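As a rough model of those semantics, here is a Python sketch under my own assumptions. State and match_counted are hypothetical names. The atom is restricted to a character class, and the rest of the pattern is ignored, so "backtracking" just means trying the next candidate count. Greedy tries the largest count first and non-greedy the smallest:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class State:
    count: int  # repetitions of the preceding atom tried (models .count)
    atom: str   # text those repetitions cover (models .atom)


def match_counted(chars: str, accept: Callable[[State], bool],
                  text: str, pos: int = 0, greedy: bool = True) -> Optional[str]:
    """Match repetitions of a character class, letting `accept` pick the count."""
    # Find the longest run of class members starting at pos.
    end = pos
    while end < len(text) and text[end] in chars:
        end += 1
    longest = end - pos
    # Greedy walks counts downward, non-greedy walks them upward.
    counts = range(longest, -1, -1) if greedy else range(0, longest + 1)
    for n in counts:
        state = State(count=n, atom=text[pos:pos + n])
        if accept(state):
            return state.atom
    return None  # no count satisfied the closure


plus = lambda s: s.count > 0         # +
star = lambda s: True                # *  (pass greedy=False for *?)
exactly_8 = lambda s: s.count == 8   # <8>
optional = lambda s: s.count < 2     # ?
newline = lambda s: s.count == 1 or (s.count == 2 and s.atom == "\r\n")
```

Because the closure is opaque, the matcher has to test candidate counts one at a time, which is why the C<< <8> >> translation cannot be optimized.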
Again, it would be nice to be able to flag these to the compiler in a
rule:
rule thrice :count { <={.count < 4}> }
/ a<thrice>? /
Note that the C<?> would cause the thrice count-rule to be matched
non-greedily because the regex parser knows that it's a count, not a
generic rule.