A quick note before I begin: the stuff in this email isn't just an implementation detail - it impacts the language too, that's why I'm posting it here. Should I cross-post to perl6-internals ? (I'm not really familiar with that list yet)


I've recently spent thought on backtracking into rules, and the exact nature of hypothetical variables etc. The two systems for backtracking into a subrule that are described below are the best I can think of right now, but maybe I'm completely missing something and is something much simpler possible - in which case I'd love to hear about it ofcourse. :-)



My main questions are:


Is there a simpler system I'm overlooking?
Which of the two systems would you prefer if speed isn't the issue?
Which system is likely to run faster on parrot?

and maybe also:
What is the current plan?

although I got the impression earlier that there isn't any yet for invoking subrules :-)


Anyway, I will use the following grammar for examples: rule foo { a } rule bar { a+ } rule quux { ab } rule test { [ <foo> | <bar> ] <quux> }


==== Mechanism 1 -- Continuations ====


Continuations can be used to reset the "state of the world" to the previous backtracking point.

Various ways can be imagined to pass around the continuation. I picked one that seems fairly clean to me and doesn't create any more continuations than strictly necessary.

One thing I'll need to explain is what 'let' means in this system: it makes sure the variable is restored if a continuation is invoked that was created before the variable was hypothesized. This generalization means you can hypothesize any variable you can temporize.


OK, let's look at how the rule 'test' could be implemented using continuations. Note that I'm not paying attention to optimization at this point.


method test (Continuation ?&backtrack is rw) {
        &backtrack or mark(&backtrack) or return;
        $_ = .new;
        if (mark(&backtrack)) {
                let $.{foo} = .foo(&backtrack);
        } elsif (mark(&backtrack)) {
                let $.{bar} = .bar(&backtrack);
        } else {
                backtrack;
        }
        let $.{quux} = .quux(&backtrack);
        return $_;
}

where mark is a utility sub defined as something like:

sub mark (Continuation &backtrack is rw) {
        my &cc = callcc { $_ }  or return 0;
        let &backtrack = &cc.assuming(undef);
        return 1;
}


Let's see how this would match on 'aaab':


0. A new state object is created (Ignore the first line, it's for later)

1. The first mark hypothesizes &backtrack.

2. $.{foo} is hypothesized. foo matches 'a', and since it doesn't do any backtracking, it leaves &backtrack alone.

3. $.{quux} is hypothesized.  quux fails, it calls backtrack:
 a. $.{quux} is de-hypothesized.
 b. $.{foo} is de-hypothesized.
 c. &backtrack is de-hypothesized.
 d. inside the first mark undef is returned from callcc.  mark returns false.

4. The second mark hypothesizes &backtrack.

5. $.{bar} is hypothesized. bar matches 'aaa' and hypothesizes &backtrack

6. $.{quux} is hypothesized.  quux fails, it calls backtrack:
 a. $.{quux} is de-hypothesized
 b. inside bar, it backtracks to match 'aa'.  bar returns again

7. $.{quux} is hypothesized. quux matches, leaves &backtrack alone.

Note that the &backtrack remains hypothesized after test completes. Let's say test is followed by <fail> and see that happens:

8. fail calls backtrack
 a. $.{quux} is de-hypothesized
 b. inside bar, it backtracks to match 'a'.  bar returns again

9. $.{quux} is hypothesized.  quux fails, it calls backtrack:
 a. $.{quux} is de-hypothesized
 b. inside bar, &backtrack is de-hypothesized. bar calls backtrack
 c. $.{bar} is de-hypothesized
 d. &backtrack is de-hypothesized
 e. inside the second mark undef is returned from callcc.  mark returns false.

10. backtrack is called, causing backtracking into whatever was before test.

This only leaves the issue of how to deal with the top-level, where the continuation is omitted. The magical first line will in that case create a continuation which will simply return from the rule if the match fails. If the top-level match succeeds then the &backtrack variable disappears into thin air, and with it all backtracking information (the continuations and de-hypotheticalization info).

Note that the user can ofcourse choose to retain the backtracking info, and use it later to cause backtracking into the match after it has completed:

if Grammar.rule($string, my &backtrack) {
   ...
   if i_dont_like_this_match {
       backtrack;     # try another
   }
}


==== Mechanism 2 -- Callbacks ====


The second mechanism is a bit more mundane. The idea is that every rule will get passed a closure that's called to match whatever comes next. If the "whatever comes next" fails, the rule can backtrack and call the closure again.

This time 'let' is exactly the same as 'temp' except hypothetical variables are only allowed inside the dynamic scope of a subroutine with a special trait (I've called it "hypothetical" here), and they're restored only if that sub is left *normally*, not via an exception.

method test ($result is rw, Code &continue0) is hypothetical {
        given (let $result = .new) {
                my &continue1 = {
                        .quux($.{quux}, &continue0);
                }
                my &continue2 = {
                        .foo($.{foo}, &continue1);
                        .bar($.{bar}, &continue1);
                }
                continue2;
        }
}

You might say this is exactly opposite to using continuations: the passed argument is used when the match is *successful*, and a return is a failure. Other things become opposite as well: alternations can be put right under each other while it's concatenation that needs extra work by chaining closures, in opposite order!

Note that the optimizer might see that &continue2 isn't needed as variable in this case, however it would still need it if the [ <foo> | <bar> ] were preceded by something, so I'm showing it here in non-optimized form. (continue1 can also be set to &.quux.assuming($.{quux}, &continue0), but I don't know if that's faster)

Let's see how this would match on 'aaab':

1. enter hypothetical scope of test
2. hypothesize the result as a new state object
3. prepare the chain of closures
4. enter hypothetical scope of foo, hypothesize $.{foo}
5. foo: match failed, exit hypothetical scope (restore $.{foo})
6. enter hypothetical scope of bar, hypothesize $.{bar}
7. bar: match 'aaa', do callback to &continue1
8. enter hypothetical scope of quux, hypothesize $.{quux}
9. quux: match failed, exit hypothetical scope (restore $.{quux})
10. bar: match 'aa', do callback to &continue1
11. enter hypothetical scope of quux, hypothesize $.{quux}
12. quux: match 'ab', do callback to &continue0

Let's again look at what happens if <test> is followed by <fail>:

14. bar: &continue1 failed, match 'a', do callback to &continue1
15. enter hypothetical scope of quux, hypothesize $.{quux}
16. quux: match failed, exit hypothetical scope (restore $.{quux})
17. bar: &continue1 failed, exit hypothetical scope (restore $.{bar})
18. continue2 failed, exit hypothetical scope of test (restore $result)

But what if the match succeeds? The top-level requires much more work here: the final &continue needs to throw some "match succeeded" exception which needs to be catched by the top-level match and cause it to return the state object, with all hypothetical vars intact ofcourse.

Having code that does the match cause explicit backtracking is possible here too, but not as easy: it'll be necessary to become part of the call chain, so effectively you'll need a temporary rule to do it. I certainly can't think of any clean syntax to accomplish it.


==== Let's wrap it up


The solution using continuations has more inherent beauty: the rule- methods have a logical structure that you can even explain without needing to understand continuations. The inter-rule calling conventions are also cleaner. The top-level handling is so trivial a single statement at the beginning of reach rule can handle it, because of which { .subrule } works inside rules without having to do anything ugly behind the scenes (which *will* be necessary in the callback-system).

The callback system requires the rule-methods to have a weird internal structure, which makes the rule compiler harder to write, and also makes rules harder to debug (considering how large and involved grammars can be, I expect that "debugging a rule" in perl 6 will be a much more common occurrance than "debugging a regex" is in perl 5). It also requires special handling at the start and completion of the match.

The callback system also uses heaps of closures.. that might negatively affect speed.

Finally, the continuation system also gives 'let' interesting semantics which may be useful outside of rules.


Basically, the continuation system has only one big drawback: it uses continuations. I really have no idea how efficient those will be in parrot. If using them makes rules significantly slower than speed will probably have to win from cleanness and the callback system should be used.


Or, as I mentioned at the top, maybe I'm just thinking way too complex and overlooking a simple and obvious system for backtracking into subrules.

So, comments? (see also the questions at the top of the email)

--
Matthijs van Duin  --  May the Forth be with you!

Reply via email to