On Wed, 2002-09-04 at 00:22, Aaron Sherman wrote:
> Then, why is there a C<+>? Why not make it C<|>?
>
>     $foo = rx/ <<a>|<b>|[cde]>|f /

This brings to mind a few big things that have been batting around in my
head about user-defined rules for a while now. These things fall out
nicely from A5, I think, but correct me if there's some reason I'm wrong
on that.

It would be nice to be able to flag a rule as being either a pure
character class or a generic rule. At the very least this lets the
compiler issue clearer errors, perhaps earlier. Something like:

    rule abc :cc { <[abc]> }

Perhaps the engine could even try to coerce non-character classes so
that this would work (not that this simple example would happen, but if
you're working with a rule-chain it might be useful):

    rule abc :cc { a | b | c }

Along that line, inline closures can do quite a bit, but it would be
nice if they could be used as counts (instead of the more painful
process of controlling backtracking via commit). Let's assume that
C<< <={...}> >> is used this way. Here's an example of its use:

    /<[\x0d\x0a]><={ .count == 1 || (.count == 2 && .atom eq "\x0d\x0a") }>/

Now that's a very expensive way to say C</\x0d\x0a|\x0d|\x0a/>, but a
much more complicated count might make it worthwhile.

I'm assuming the following things:

  * C<.count> is a method on the state object that returns the number of
    repetitions of the preceding atom that have been tried.
  * C<.atom> is the text matched by the preceding atom as it would
    appear if backtracking stopped now.
  * The return value of a count closure is boolean.

So, for example, here are some translations of existing operators:

    +     <={.count > 0}>
    *     <={1}>
    *?    <={1}>?
    <8>   <={.count == 8}>    # No optimization possible!
    ?     <={.count < 2}>

Again, it would be nice to be able to flag these to the compiler in a
rule:

    rule thrice :count { <={.count < 4}> }
    / a<thrice>? /

Note that the C<?> would cause the thrice count-rule to be matched
non-greedily because the regex parser knows that it's a count, not a
generic rule.
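
To make the count-closure idea a bit more concrete, here is a rough model of it in Python (not Perl 6, and not anything A5 specifies). It only handles a single character-class atom followed by a count closure, and every name in it (CountState, match_counted, accept, greedy) is made up for illustration. I'm assuming a greedy count tries the longest repetition first and backs off until the closure returns true, and that a non-greedy one starts from zero and grows.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CountState:
    """Hypothetical stand-in for the state object seen by a count closure."""
    count: int  # repetitions of the preceding atom tried so far
    atom: str   # text those repetitions would match if backtracking stopped now


def match_counted(text: str, pos: int, char_class: str,
                  accept: Callable[[CountState], bool],
                  greedy: bool = True) -> Optional[int]:
    """Match characters from char_class at pos, as many times as accept allows.

    Returns the end position of the match, or None if no count is accepted.
    """
    # Longest possible run of class characters starting at pos.
    max_run = 0
    while pos + max_run < len(text) and text[pos + max_run] in char_class:
        max_run += 1

    # Greedy: longest first, backing off. Non-greedy: shortest first.
    counts = range(max_run, -1, -1) if greedy else range(0, max_run + 1)
    for n in counts:
        if accept(CountState(count=n, atom=text[pos:pos + n])):
            return pos + n
    return None


# The newline example: one CR or LF, or exactly the pair CR LF.
def newline(s: CountState) -> bool:
    return s.count == 1 or (s.count == 2 and s.atom == "\r\n")


# Counterparts of some of the operator translations.
def plus(s: CountState) -> bool:      # +
    return s.count > 0


def star(s: CountState) -> bool:      # *
    return True


def thrice(s: CountState) -> bool:    # the thrice count-rule
    return s.count < 4


print(match_counted("\r\nabc", 0, "\r\n", newline))        # 2
print(match_counted("\n\nabc", 0, "\r\n", newline))        # 1
print(match_counted("aaaaa", 0, "a", thrice))              # 3
print(match_counted("aaaaa", 0, "a", thrice, greedy=False))  # 0
```

In this model the closure is called once per candidate count, which is the same reason the C<< <8> >> translation can't be optimized: the engine has no way to know which counts the closure will accept without asking.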