OK, I've recently spent some intimate time with Apocalypse 5 and it has left me with a few issues and questions.

If any of this has already been discussed, I'd appreciate some links (I've searched Google Groups but haven't found anything applicable).


1. Sub-rules and backtracking


   <name(expr)>          # call rule, passing Perl args
   { .name(expr) }       # same thing.

   <name pat>            # call rule, passing regex arg
   { .name(/pat/) }      # same thing.

Considering Perl can't sanely know how to backtrack into a closure, wouldn't { .name(expr) } be equivalent to <name(expr)>: instead? (Note the colon.)


It seems to me that for a rule to be able to backtrack, you would need to pass it a closure argument that represents the rest of the match: the rule matches, then calls the closure; if the closure returns, the rule backtracks and calls it again, or returns itself once all possibilities are exhausted.
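
A minimal sketch of that continuation-passing idea, in Python rather than Perl 6. The names lit, alt, seq, star and matches are made up for illustration and say nothing about how Perl 6 rules will actually be implemented. Each rule takes the input, a position, and a closure for the rest of the match; a false return from the closure is what triggers the next attempt.

```python
# Hypothetical sketch: a rule is a function (s, pos, cont) -> bool, where
# cont(new_pos) represents "the rest of the match".

def lit(text):
    # Match a literal string, then hand control to the continuation.
    def rule(s, pos, cont):
        return s.startswith(text, pos) and cont(pos + len(text))
    return rule

def alt(*rules):
    # Try each alternative in turn; a failing continuation moves on to the next.
    def rule(s, pos, cont):
        return any(r(s, pos, cont) for r in rules)
    return rule

def seq(first, second):
    # The continuation of `first` is "match `second`, then the original rest".
    def rule(s, pos, cont):
        return first(s, pos, lambda p: second(s, p, cont))
    return rule

def star(inner):
    # Greedy repetition: try one more iteration first, then fall back to
    # calling the continuation at the current position.
    def rule(s, pos, cont):
        return inner(s, pos, lambda p: p > pos and rule(s, p, cont)) or cont(pos)
    return rule

def matches(rule, s):
    # Top-level continuation: succeed only if the whole input was consumed.
    return rule(s, 0, lambda p: p == len(s))

greedy_backtracks = matches(seq(star(lit("a")), lit("ab")), "aaab")
alt_backtracks = matches(seq(alt(lit("a"), lit("ab")), lit("c")), "abc")
no_match = matches(seq(alt(lit("a"), lit("ab")), lit("c")), "abd")
```

In this model a rule never needs to be re-entered from outside: all backtracking happens because a continuation returned false, which is why an opaque closure in the middle of a pattern is a problem.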

Or will a rule store all of its state in hypothetical variables? It seems to me that would make backtracking into closures even more problematic, but maybe I'm just missing something...

Related to this: what is the prototype for rules (in case you want to write or invoke them manually)?


2. Rules with custom parsing


As mentioned in a previous Apocalypse, the \L, \U, and \Q sequences no longer
use \E to terminate--they now require bracketing characters of some sort.

(much later)
In addition to normal subrules, we allow some funny looking method names like:
    rule \a { ... }

Can I conclude from this that you can use "is parsed" on a rule so that it can grab the bracketed expression that follows it?



3. Negated assertions


any assertion that begins with ! is simply negated.

    \P{prop}            <!prop>
    (?!...)             <!before ...>   # negative lookahead
    [^[:alpha:]]        <-alpha>

Considering <prop> means "matches a character with property prop", it seems to me <!prop> would mean the ZERO-WIDTH assertion "does not match a character with property prop", rather than "match a character without property prop".


Shouldn't it be <-prop> instead? (see also point 5)
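
To make the distinction concrete, here it is in Python's re syntax (not Perl 6), with a digit standing in for "prop". This only illustrates the two readings; it is not a claim about what <!prop> will end up meaning.

```python
import re

# Zero-width negative assertion: succeeds without consuming anything.
zero_width = re.match(r"(?!\d)", "x")
# Negated class: consumes exactly one character that is not a digit.
one_char = re.match(r"[^\d]", "x")

zero_width_len = zero_width.end()   # number of characters consumed
one_char_len = one_char.end()

# At end of input the assertion still holds, but there is no character
# left for the negated class to consume.
at_end_assert = re.fullmatch(r"a(?!\d)", "a") is not None
at_end_class = re.fullmatch(r"a[^\d]", "a") is not None
```

The two only coincide when something after the assertion consumes the character anyway.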


4. Character class syntax


predefined character classes are just considered intrinsic grammar rules

    [[:alpha:][:digit:]] <<alpha><digit>>
                        <[_]+<alpha>+<digit>-<Swedish>>

Can I conclude from this that the + to add character classes is optional? What about <-<foo><bar>>: is that the inversion of <<foo><bar>>? (I do hope so.) But <-<foo>+<bar>> will be the inversion of <<foo>-<bar>>, right?
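
The set identities behind these questions can be checked with plain Python sets. This assumes character classes behave as sets over a finite alphabet (ASCII here), and foo and bar are arbitrary stand-ins; whether Perl 6 will parse the syntax this way is exactly what I'm asking.

```python
import string

# Assumed finite alphabet, so that inversion is well defined.
UNIVERSE = {chr(i) for i in range(128)}

def invert(cls):
    # Complement of a character class relative to the alphabet.
    return UNIVERSE - cls

foo = set(string.ascii_letters)
bar = set("aeiouAEIOU") | set(string.digits)

# Reading <-<foo><bar>> as the inversion of <<foo><bar>>:
# not (foo or bar) == (not foo) and (not bar)
union_inverted = invert(foo | bar) == invert(foo) & invert(bar)

# Reading <-<foo>+<bar>> as "(not foo) plus bar", which is the
# inversion of <<foo>-<bar>>: not (foo minus bar) == (not foo) or bar
difference_inverted = invert(foo - bar) == invert(foo) | bar
```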


Also, what exactly is allowed inside a character class? Apparently character sets like [a-z_] and subrules like <alpha>. What can I put into a set? Single characters and ranges, obviously; but what about interpolated variables? I assume I also can't put \w inside [] anymore since it's a subrule, so [\w.;] would become <\w[.;]>?


5. Character class semantics


predefined character classes are just considered intrinsic grammar rules

This means you can place arbitrary rules inside a character class. What if the rule has a width unequal to 1 or even variable-width? I can think of a few possibilities:


a. Require subrules inside a character class to have a fixed width of 1 char. (requires a run-time check since the rule might be redefined.. ick)

b. Rules inside a character class are ORed together; an inverted subrule is interpreted as [ <!before <subrule>> . ]

c. The whole character class is a zero-width assertion followed by the traversal of a single char.

My personal preference is (c), which also means \N is equivalent to <-\n>
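
Option (c) can be approximated in Python's re syntax as a negative lookahead followed by a dot. This is only a sketch of the proposed semantics; "ab" is an arbitrary multi-character stand-in for a subrule.

```python
import re

text = "ab\ncab"

# <-\n> under option (c): assert "no newline here", then take one char.
assert_then_dot = re.findall(r"(?!\n).", text, flags=re.DOTALL)
# The traditional spelling of \N as a negated class.
plain_negated = re.findall(r"[^\n]", text)

# The same reading with a subrule wider than one character: take any
# single character at which "ab" does not start.
not_ab = re.findall(r"(?!ab).", "abcab", flags=re.DOTALL)
```

With a one-character subrule the two spellings agree; with a wider subrule the class still consumes exactly one character, which is the point of (c).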


6. Null pattern


That won't work because it'll look for the :wfoo modifier. However, there
are several ways to get the effect you want:
/[:w()foo bar]/ /[:w[]foo bar]/

Tsk tsk Larry, those look like null patterns to me :-)


While I'm on the subject... why not allow <> as the match-always assertion? It might conflict with Huffman encoding, but I certainly don't think <> could ethically mean anything other than this. And <!> would of course be the match-never assertion.
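
For comparison, Python's re module already has both behaviours, spelled as an empty group and an empty negative lookahead. This is an analogy only; the <> and <!> spellings are my suggestion, not existing syntax.

```python
import re

always = re.compile(r"(?:)")   # empty group: matches the empty string anywhere
never = re.compile(r"(?!)")    # empty negative lookahead: can never succeed

always_span = always.match("abc").span()
never_on_text = never.search("abc")
never_on_empty = never.search("")
```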


7. The :: operator


:: # fail all |'s when backtracking

If you backtrack across it, it fails all the way out of the current
list of alternatives.

This suggests that if you do:
[ foo [ bar :: ]? | foo ( \w+ ) ]
then backtracking over the :: will break out of the outermost [], since the innermost isn't a list of alternatives.


Or does it simply break out of the innermost group, and are the descriptions chosen a bit poorly?


That's it for now I think.. maybe I'll find more later :)



-- Matthijs van Duin -- May the Forth be with you!
