Peter,
I settled on a system very similar to what you've done.
As in your parser, literal strings and characters do not appear in my
parse tree:
mulexp <- A #\* B -> {(lambda (x y) (* x y))}
After our conversation, I added support for including literals using
the unquote operator, which allows a literal to be included in the
parse tree:
mulexp <- A ,#\* B
-> {(lambda (x op y) ((eval (string->symbol (string op))) x y))}
I have also added support for the quasiquote operator, which works
similarly to yours, by not modifying the parse tree:
rule <- A B `C
-> {(lambda (x y) (string-append x y))} ; returns "ab"
A <- ,"a"
B <- ,"b"
C <- ,"c"
This is the original question and feature I was curious about, when
I started this thread.
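
The behaviour of the two operators above can be sketched outside of
genturfahi. What follows is a minimal illustration in Python, not the
code genturfahi generates. The names lit, keep, drop, and seq are
hypothetical, and the sketch assumes each parser returns the new input
position together with the list of values it contributes to the parse
tree:

```python
from typing import Callable, List, Optional, Tuple

# A parser returns None on failure, else (new_position, contributed_values).
Result = Optional[Tuple[int, List[object]]]
Parser = Callable[[str, int], Result]


def lit(s: str) -> Parser:
    """Plain literal: matches and advances, contributes nothing."""
    def parse(text: str, pos: int) -> Result:
        return (pos + len(s), []) if text.startswith(s, pos) else None
    return parse


def keep(s: str) -> Parser:
    """Stand-in for ,"s": matches and contributes the literal itself."""
    def parse(text: str, pos: int) -> Result:
        return (pos + len(s), [s]) if text.startswith(s, pos) else None
    return parse


def drop(p: Parser) -> Parser:
    """Stand-in for `p: matches p and advances, but discards its output."""
    def parse(text: str, pos: int) -> Result:
        r = p(text, pos)
        return None if r is None else (r[0], [])
    return parse


def seq(*parsers: Parser, action: Callable[..., object]) -> Parser:
    """Run parsers in order and pass only the contributed values to action."""
    def parse(text: str, pos: int) -> Result:
        values: List[object] = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            pos, out = r
            values.extend(out)
        return pos, [action(*values)]
    return parse


A, B, C = keep("a"), keep("b"), keep("c")

# rule <- A B `C : C is matched and consumed, but the action sees only x and y.
rule = seq(A, B, drop(C), action=lambda x, y: x + y)

# A plain literal between A and B is likewise invisible to the action.
star_rule = seq(A, lit("*"), B, action=lambda x, y: x + y)
```

With these definitions, rule("abc", 0) consumes all three characters
while the action receives only "a" and "b".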
Because of the , operator, though, I've made quasiquote work everywhere,
so I also support the following expression, which I don't think works
the same way in your parser:
rule <- A B C
-> {(lambda (x y) (string-append x y))} ; also returns "ab"
A <- ,"a"
B <- ,"b"
C <- `,"c" ; a no-op, here for example.
C still doesn't place any material into the parse tree, but now that
holds everywhere C appears, rather than only where a rule quasiquotes it.
I've used this in two places: ignoring (but matching) whitespace between
tokens and ignoring (but matching) the end of file token.
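
As a rough illustration of that use, here is a small Python sketch in
which whitespace and end of file are made silent where they are
defined, so every sequence that mentions them ignores their output.
The helper names token, eof, and seq are hypothetical, the grammar is
a simplified version of the mulexp example from earlier in the thread,
and none of this is genturfahi's own code:

```python
import re
from typing import Callable, List, Optional, Tuple

Result = Optional[Tuple[int, List[object]]]
Parser = Callable[[str, int], Result]


def token(pattern: str, keep: bool = True) -> Parser:
    """Match a regex at pos; contribute the matched text only if keep is True."""
    rx = re.compile(pattern)

    def parse(text: str, pos: int) -> Result:
        m = rx.match(text, pos)
        if m is None:
            return None
        return m.end(), ([m.group()] if keep else [])
    return parse


def eof(text: str, pos: int) -> Result:
    """Succeed only at end of input; never contributes to the parse tree."""
    return (pos, []) if pos == len(text) else None


def seq(*parsers: Parser, action: Callable[..., object]) -> Parser:
    """Run parsers in order and hand only the contributed values to action."""
    def parse(text: str, pos: int) -> Result:
        values: List[object] = []
        for p in parsers:
            r = p(text, pos)
            if r is None:
                return None
            pos, out = r
            values.extend(out)
        return pos, [action(*values)]
    return parse


digit = token(r"[0-9]")
ws = token(r"[ \t\n]*", keep=False)   # silent at its definition
star = token(r"\*", keep=False)       # literals are silent by default

# Seven items are matched, but the action only ever sees the two digits.
mulexp = seq(digit, ws, star, ws, digit, ws, eof,
             action=lambda x, y: int(x) * int(y))
```

Because ws and eof are silent at their definitions, no rule that uses
them needs a marker at the point of use.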
Here is my packrat parser, written in the PEG-like language the
parser recognizes:
https://bugs.call-cc.org/browser/release/4/genturfahi/trunk/genturfahi.peg
And here is the code this parser generates for itself:
https://bugs.call-cc.org/browser/release/4/genturfahi/trunk/bootstrap.scm
Thank you Peter!
-Alan
On Fri, Dec 10, 2010 at 09:53:42AM +1300, Peter Cashin wrote:
> Hi Alan
> I have been working on grammar rules I'm calling PBNF, for Parser-BNF,
> that can be automatically executed as a parser. The PEG operators are a
> subset of the PBNF operators, but to fully automate a grammar I need to
> define the implicit syntax tree that the grammar rules specify.
> Your issue comes up all the time in that context: my approach is to have a
> literal 'x' match without producing a syntax tree node (you can always add
> a rule if you do want it in the syntax tree). Rules that generate leaf
> nodes are designated in the grammar; they are terminal rules, if you like,
> so they generate a literal match (but no internal syntax sub-tree). But
> sometimes you want to reference a rule without generating a syntax
> tree node, and for that I have used the `x operator: the ` prefix is a
> sort of quote, and it's unobtrusive in the grammar.
> If you want to take a look you will find it all at:
> [1]http://github.com/spinachtree/gist
> Maybe other people have different solutions; I'd like to know.
> Cheers,
> Peter.
>
> On Fri, Dec 10, 2010 at 9:01 AM, Alan Post
> <[2][email protected]> wrote:
>
> I'm working on my PEG parser, in particular the interface between
> the parse tree and the code one can attach to productions that
> are executed on a successful parse.
>
> I've arranged for the two predicate operations, & and !, to not add
> any output to the parse tree. That means that the following
> production:
>
> rule <- &a !b "c"
>
> produces the same parse tree as:
>
> rule <- "c"
>
> Internally, this means that I recognize that the sequence operator
> (which contains the productions '&a', '!b', and '"c"' in this
> example) is being called with predicates in every position but one,
> and rather than returning a list containing that single element,
> I return just the single element.
>
> As I've been doing this, I've found that I want a new operator similar
> to '&'. '&' matches the production it is attached to, but it does not
> advance the position of the input buffer.
>
> I'd like an operator that matches the production it is attached to,
> advances the input buffer, but doesn't add anything to the parse
> tree.
>
> Here's an example:
>
> mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))}
>
> The mulexp production is a sequence of four other rules, but only
> two of them are needed by the associated code. It would be nice
> if I could write the code as it is above, rather than, say,
> this:
>
> (lambda (x op y EOF) (* x y))
>
> That version has to account for all the rules in the sequence while
> really only caring about two of them. Here is the example rewritten
> with '^' expressing "match the rule, advance the input, but don't
> modify the parse tree":
>
> mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))}
>
> Before I go inventing syntax for this use case, will you tell me if
> this is already being done with other parsers? Have any of you had
> this problem and already solved it, and if so, what approach did you
> take?
>
> -Alan
> --
> .i ko djuno fi le do sevzi
>
> _______________________________________________
> PEG mailing list
> [3][email protected]
> [4]https://lists.csail.mit.edu/mailman/listinfo/peg
>
> References
>
> Visible links
> 1. http://github.com/spinachtree/gist
> 2. mailto:[email protected]
> 3. mailto:[email protected]
> 4. https://lists.csail.mit.edu/mailman/listinfo/peg
--
.i ko djuno fi le do sevzi