> 1. Sub-rules and backtracking
>
> > <name(expr)> # call rule, passing Perl args
> > { .name(expr) } # same thing.
>
> > <name pat> # call rule, passing regex arg
> > { .name(/pat/) } # same thing.
>
> Considering Perl can't sanely know how to backtrack into a closure, wouldn't
> { .name(expr) } be equivalent to <name(expr)>: instead? (note the colon)
Nope. <name(expr)>: is equivalent to { .name(expr) }: (both with the
colon). It does know how to backtrack into a closure: it skips right
past it (or throws an exception through it; I'm not sure which) and
tries again. Hypotheticals make this work properly.
> It seems to me that for a rule to be able to backtrack, you would need to
> pass a closure as an arg that represents the rest of the match: the rule
> matches, calls the closure, and if the closure returns, tries to backtrack
> and calls it again, or returns if all possibilities are exhausted.
Sounds like continuation-passing style. Yes, you can backtrack
through code with continuation-passing style. Continuations have yet
to be introduced into the language.
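To make the continuation-passing idea concrete, here is a minimal sketch in Python (not Perl 6). It assumes each matcher takes the text, a position, and a continuation `k` standing for "the rest of the match"; the names `lit`, `seq`, `alt`, `star` and `match` are hypothetical. A matcher that can succeed in several ways tries `k` once per candidate, so backtracking is just `k` returning `None` and the matcher moving on to its next option. It illustrates the technique only, not how Perl 6 rules are actually implemented.

```python
from typing import Callable, Optional

# A continuation takes the position reached so far and returns the final end
# position on overall success, or None to signal failure ("backtrack").
Cont = Callable[[int], Optional[int]]
Matcher = Callable[[str, int, Cont], Optional[int]]


def lit(s: str) -> Matcher:
    """Match the literal string s, then hand the new position to the continuation."""
    def m(text: str, pos: int, k: Cont) -> Optional[int]:
        return k(pos + len(s)) if text.startswith(s, pos) else None
    return m


def seq(*ms: Matcher) -> Matcher:
    """Match each matcher in turn; the rest of the sequence is the continuation."""
    def m(text: str, pos: int, k: Cont) -> Optional[int]:
        if not ms:
            return k(pos)
        return ms[0](text, pos, lambda p: seq(*ms[1:])(text, p, k))
    return m


def alt(*ms: Matcher) -> Matcher:
    """Try each alternative; if the rest of the match fails, try the next one."""
    def m(text: str, pos: int, k: Cont) -> Optional[int]:
        for sub in ms:
            result = sub(text, pos, k)
            if result is not None:
                return result
        return None
    return m


def star(sub: Matcher) -> Matcher:
    """Greedy repetition: try one more iteration first, then fall back to stopping."""
    def m(text: str, pos: int, k: Cont) -> Optional[int]:
        def after_one(p: int) -> Optional[int]:
            if p == pos:              # refuse empty iterations to avoid looping forever
                return None
            return m(text, p, k)      # try to repeat again from the new position
        result = sub(text, pos, after_one)
        if result is not None:
            return result
        return k(pos)                 # backtracking lands here: stop repeating
    return m


def match(pattern: Matcher, text: str) -> Optional[int]:
    """Match anchored at position 0; return the end position, or None on failure."""
    return pattern(text, 0, lambda p: p)


pattern = seq(star(lit("a")), lit("ab"))
print(match(pattern, "aaab"))  # 4
```

Here `star` first consumes all three `a`s; when `lit("ab")` then fails, control returns into `star`, which retries with one fewer `a`, and the overall match succeeds.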
> Related to this: what is the prototype for rules (in case you want to
> manually write or invoke them)?
rule somerule($0) {}
If the rule takes arguments, put them at the end of the signature.
Invoke rules just like subs.
(Just realized something: you can't stub out a rule with {...}, because
inside a rule, ... means "match any character three times".)
> 3. Negated assertions
>
> > any assertion that begins with ! is simply negated.
>
> > \P{prop} <!prop>
> > (?!...) <!before ...> # negative lookahead
> > [^[:alpha:]] <-alpha>
>
> Considering <prop> means "matches a character with property prop", it
> seems to me <!prop> would mean the ZERO-WIDTH assertion "does not match a
> character with property prop", rather than "match a character without
> property prop".
Right. It has to be. There is no way to implement it in a
sufficiently general way otherwise.
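The difference between the two readings can be illustrated with Python's `re` module, used here only as a stand-in because it supports both forms: `(?!\d)` plays the role of the zero-width reading of `<!prop>`, and `[^\d]` the "match one character without the property" reading. They differ in what they consume, and they disagree outright at the end of the string.

```python
import re

# Zero-width: (?!\d) only checks that the next character is not a digit;
# it consumes nothing, so the match has length 0.
lookahead = re.match(r"(?!\d)", "a1")
# Consuming: [^\d] matches one non-digit character and advances past it.
consuming = re.match(r"[^\d]", "a1")

print(lookahead.span())   # (0, 0)
print(consuming.span())   # (0, 1)

# At end of string there is no character left to consume,
# but "not followed by a digit" is still true.
print(re.match(r"(?!\d)", "") is not None)   # True
print(re.match(r"[^\d]", "") is not None)    # False
```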
> 5. Character class semantics
>
> > predefined character classes are just considered intrinsic grammar rules
>
> This means you can place arbitrary rules inside a character class. What
> if the rule has a width unequal to 1 or even variable-width? I can think
> of a few possibilities:
>
> a. Require subrules inside a character class to have a fixed width of 1
> char. (requires a run-time check since the rule might be redefined... ick)
>
> b. Rules inside a character class are ORed together, an inverted subrule
> is interpreted as [ <!before <subrule>> . ]
>
> c. The whole character class is a zero-width assertion followed by the
> traversal of a single char.
>
> My personal preference is (c), which also means \N is equivalent to <-\n>
Yikes. Good questions. Recall that Unicode is sort of like
multi-character matching, so it might be possible to allow
<<anyrule><anyother>>. That might be a way to specify the parallel
matching of those two rules. It's quite possible that I'm wrong, though.
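Option (c) from the list above can be sketched with Python's `re` as a stand-in (the Perl 6 syntax differs): a negated class becomes a negative lookahead for the excluded subrule followed by a single `.` in DOTALL mode. The helper `negated_class` is hypothetical. For a one-character exclusion this agrees with an ordinary negated class, which is the analogue of `\N` being equivalent to `<-\n>`; for a wider subrule, the assertion inspects several characters but the class still consumes exactly one.

```python
import re


def negated_class(excluded: str) -> re.Pattern:
    """Hypothetical helper: 'assert `excluded` does not start here, then take one char'.

    `excluded` is a regex fragment; re.DOTALL makes "." match any character,
    including newline, so only the assertion decides what is rejected.
    """
    return re.compile(rf"(?!{excluded}).", re.DOTALL)


not_newline = negated_class(r"\n")

# For a width-1 subrule this agrees with an ordinary negated class.
for s in ["x", "\n", "7"]:
    assert bool(not_newline.fullmatch(s)) == bool(re.fullmatch(r"[^\n]", s))

# A wider "subrule" still works: the assertion looks at two characters,
# but each match of the class consumes exactly one.
not_ab = negated_class("ab")
print([m.group() for m in not_ab.finditer("xaby")])  # ['x', 'b', 'y']
```

The `a` is rejected only because `ab` starts at its position; the `b` that follows is accepted, since `by` does not start with `ab`.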
> 6. Null pattern
>
> > That won't work because it'll look for the :wfoo modifier. However, there
> > are several ways to get the effect you want:
> > /[:w()foo bar]/
> > /[:w[]foo bar]/
>
> Tsk tsk Larry, those look like null patterns to me :-)
>
> While I'm on the subject... why not allow <> as the match-always assertion?
> It might conflict with Huffman coding, but I certainly don't think <>
> could ethically mean anything other than this. And <!> would of course be
> the match-never assertion.
You could always use <(1)> and <(0)>, which are more SWIMmy :)
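For comparison with another regex dialect: Python's `re` has no code assertions like `<(1)>`, but its closest analogues show the same two behaviors. An empty group `(?:)` succeeds at every position without consuming anything, and an empty negative lookahead `(?!)` fails everywhere, because the empty pattern inside it always matches.

```python
import re

always = re.compile(r"(?:)")   # match-always: succeeds everywhere, consumes nothing
never = re.compile(r"(?!)")    # match-never: the empty lookahead always fails

text = "abc"
print([m.start() for m in always.finditer(text)])  # [0, 1, 2, 3]
print(never.search(text))                          # None

# Appended to a pattern, they act like "true" and "false" conditions.
print(re.fullmatch(r"abc(?:)", text) is not None)  # True
print(re.fullmatch(r"abc(?!)", text) is not None)  # False
```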
> 7. The :: operator
>
> > :: # fail all |'s when backtracking
>
> > If you backtrack across it, it fails all the way out of the current
> > list of alternatives.
>
> This suggests that if you do:
> [ foo [ bar :: ]? | foo ( \w+ ) ]
> then if it backtracks over the ::, it will break out of the outermost [],
> since the innermost isn't a list of alternatives.
>
> Or does it simply break out of the innermost group, meaning the
> descriptions were just chosen a bit poorly?
I think it's the latter: it breaks out of the innermost group. That
makes sense, since a list of alternatives is delimited either by
brackets or by the rule boundaries.
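The "innermost group" reading can be modelled in a small continuation-passing matcher written in Python. This is a sketch under stated assumptions, not Perl 6's implementation: the names `Group`, `cut`, `lit`, `seq` and `CutFailure` are hypothetical, each bracketed group owns the `::` written directly inside it, and crossing a `::` while backtracking raises an exception that only its owning group catches. The sketch assumes a group is not re-entered recursively while one of its own cuts is pending.

```python
from typing import Callable, List, Optional

Cont = Callable[[int], Optional[int]]
Matcher = Callable[[str, int, Cont], Optional[int]]


class CutFailure(Exception):
    """Signals that backtracking crossed a '::' belonging to `group`."""
    def __init__(self, group: "Group") -> None:
        super().__init__()
        self.group = group


def lit(s: str) -> Matcher:
    def m(text: str, pos: int, k: Cont) -> Optional[int]:
        return k(pos + len(s)) if text.startswith(s, pos) else None
    return m


def seq(*ms: Matcher) -> Matcher:
    def m(text: str, pos: int, k: Cont) -> Optional[int]:
        if not ms:
            return k(pos)
        return ms[0](text, pos, lambda p: seq(*ms[1:])(text, p, k))
    return m


class Group:
    """A bracketed list of alternatives that owns any '::' written inside it."""
    def __init__(self) -> None:
        self.branches: List[Matcher] = []

    def cut(self) -> Matcher:
        """'::': succeeds going forward; if the rest then fails, fail the whole group."""
        def m(text: str, pos: int, k: Cont) -> Optional[int]:
            result = k(pos)
            if result is None:
                raise CutFailure(self)
            return result
        return m

    def __call__(self, text: str, pos: int, k: Cont) -> Optional[int]:
        for branch in self.branches:
            try:
                result = branch(text, pos, k)
            except CutFailure as e:
                if e.group is self:
                    return None      # skip this group's remaining alternatives
                raise                # the cut belongs to an enclosing group
            if result is not None:
                return result
        return None


def match(pattern: Matcher, text: str) -> Optional[int]:
    return pattern(text, 0, lambda p: p)


# [ a :: b | a c ]
g = Group()
g.branches = [seq(lit("a"), g.cut(), lit("b")), seq(lit("a"), lit("c"))]

# [ x [ a :: b | a c ] | x a c ]
outer = Group()
outer.branches = [seq(lit("x"), g), seq(lit("x"), lit("a"), lit("c"))]

print(match(g, "ab"))       # 2
print(match(g, "ac"))       # None: crossing '::' skips the "a c" branch
print(match(outer, "xac"))  # 3: only the inner group failed; outer tries "x a c"
```

Under the other reading, where `::` escapes to the outermost group, the last call would fail as well.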
> That's it for now, I think... maybe I'll find more later :)
These were stumpers. Thanks! :)
Luke