Will you explain in this example what you do with the tree object
afterwards?
In the example I gave, the production is converted to a number.
I think in your example, you're showing how to convert a parse tree
to a syntax tree, and I assume you later operator over the syntax
tree?
Thank you,
-Alan
On Sat, Dec 11, 2010 at 10:46:40AM +0100, Ondřej Bílka wrote:
> exp <- digit '*' digit EOF -> {(lambda (x y) (* x y))}
> This looks as step backwards
> How it is different from old yacc
> exp <- digit '*' digit EOF -> {$$ = (* $1 $3)}
>
> My approach is different. I made tree structure explicit in rule by binding as
> in example
>
> tree = '(' number:value tree?:left tree?:rigth ')' result(Tree)
>
> result(Tree) creates Tree object and sets value,left,rigth fields to
> corresponding values
>
> On Fri, Dec 10, 2010 at 01:47:59PM +1300, Peter Cashin wrote:
> > Mathias:
> > A good plan to separate out the:
> > - grammar operators
> > - parse tree building
> > - parser action code
> > I agree with you, but in my case I want to have the grammar language
> > totally independent of the implementation programming language.
> > So the grammar has no parser action code, although this can be added
> > later
> > in a particular programming language implementation. Parboiled has much
> > closer integration with the programming language, which has advantages,
> > but I really want grammar specifications that are totally independent.
> > So now for the grammar operators, and parse tree building annotations.
> > For
> > many years I kept these separate: tree building annotations in the rule
> > head (ie they annotate the rule name) or definition syntax (= for
> > interior
> > nodes, : for leaf nodes or terminals). The right hand side of the rule
> > had
> > the grammar expression body withs grammar operations only. One more head
> > notation allows pruning the tree.
> > This works fine, and maybe I should have stuck to that separation, but I
> > introduced the `x prefix notation because I found that it was an
> > advantage
> > to be able to see the children nodes that would be generated by looking
> > at
> > the parent rule alone, and not having to refer to the children rule
> > definitions to see how they are annotated.
> > I have found that about half of many grammars turn out to be leaf nodes,
> > so the ":" rule definition notation has a huge payoff in tree size and
> > performance.
> > Because of this the `x rule is not vital, but over time I found it was
> > convenient, and a good trade-off for me.
> > Cheers,
> > Peter.
> >
> > On Fri, Dec 10, 2010 at 11:44 AM, Mathias <[1][email protected]>
> > wrote:
> >
> > Over the course of developing parboiled (PEG parsing engine for Java
> > and
> > Scala using internals DSLs for grammar specification) I found that it's
> > best to clearly separate the following three things:
> > - grammar operators
> > - parse tree building
> > - parser action code
> >
> > The operators, together with the rules they are applied to, define the
> > language your grammar recognizes. Nothing more, nothing less.
> > Whether your engine builds a parse tree for input of that language or
> > whether it doesn't should likely not be specified at the operator
> > level,
> > but it some other way.
> > In "parboiled for Java" operators and rules are modeled as method calls
> > whereas parse tree building is controlled via annotations on those
> > methods. You can selectively enable or disable the creation of parse
> > tree nodes per grammar rule, which allows you to tweak parse tree
> > building exactly to your needs.
> >
> > The interface between the parsing engine and custom parser action code
> > is the thing that has changed the most over the course of parboileds
> > life cycle so far.
> > Initially parser action code had to access the parse tree nodes in
> > order
> > to get to matched input. Creating a custom object structure during the
> > parsing run (e.g. an AST) was done by decorating the parse tree. So the
> > parse tree was the "work bench" of the parsing process.
> > Over time this heavy centering around the parse tree turned out to have
> > two problems:
> >
> > 1. Low Performance.
> > Always having to create a parse tree even when all you care about is
> > the
> > AST is just wasteful. Especially since the parse tree can contain a
> > huge
> > number of nodes for larger inputs (sometimes more nodes than input
> > characters).
> >
> > 2. Less room for automatic optimizations.
> > The structure of the parse tree is dictated by the structure of the
> > grammar rules. If your parser action code is built under the assumption
> > that the parse tree has a given structure there is little leeway for
> > the
> > parsing engine to apply automatic rule optimizations. Decoupling the
> > parser action code from the parse tree opens up the possibility to
> > apply
> > all kinds of automatic grammar tweaking before running the parser the
> > first time. The engine might decide to completely change the rule
> > structure, as long as all changes do not change the recognized language
> > and are transparent to the parser action code.
> >
> > Currently parboiled implements the interface between the parsing engine
> > and the action code in the following way:
> > 1. Actions can appear anywhere in a rule.
> > 2. Actions can access the matched input text of the sub rule
> > immediately
> > preceding the action but not of any other rule (so there is no sub rule
> > labeling required).
> > 3. For working with custom objects (e.g. AST nodes) the engine provides
> > a "Value Stack", which is a simple stack structure that serves as a
> > fast
> > work bench for a parsing run. Actions can push objects onto this stack,
> > pop them off, swap them around, and so on.
> >
> > This solution completely decouples the parse tree from everything else.
> > You can enable or disable parse tree building without any effect on the
> > rest of the parser. There is no need for addressing sub rules in action
> > expressions and given a somewhat efficient value stack implementation
> > the whole thing is quite fast. Additionally, in "parboiled for Scala",
> > the action code with its manipulations of the value stack can be
> > statically type-checked at compile time, which is a huge plus.
> >
> > In case you are interested in more details or broader explanations, the
> > parboiled documentation is quite complete.
> >
> > Cheers,
> > Mathias
> >
> > ---
> > [2][email protected]
> > [3]http://www.parboiled.org
> > On 09.12.2010, at 21:01, Alan Post wrote:
> >
> > > I'm working on my PEG parser, in particular the interface between
> > > the parse tree and the code one can attach to productions that
> > > are executed on a successful parse.
> > >
> > > I've arranged for the two predicate operations, & and !, to not add
> > > any output to the parse tree. �That means that the following
> > > production:
> > >
> > > �rule <- &a !b "c"
> > >
> > > Produces the same parse tree as:
> > >
> > > �rule <- "c"
> > >
> > > Internally, this means that I recognize that the sequence operator
> > > (which contains the productions '&a', '!b', and '"c"' in this
> > > example) is being called with predicates in every position but one,
> > > and rather than returning a list containing that single element,
> > > I return just the single element.
> > >
> > > As I've been doing this, I've found that I want a new operator
> > similar
> > > to '&'. �'&' matches the production it is attached to, but it does
> > not
> > > advance the position of the input buffer.
> > >
> > > I'd like an operator that matches the production it is attached to,
> > > advances the input buffer, but doesn't add anything to the parse
> > > tree.
> > >
> > > Here's an example:
> > >
> > > �mulexp <- digit '*' digit EOF -> {(lambda (x y) (* x y))}
> > >
> > > the mulexp production is a sequence of four other rules, but only
> > > two of them are needed by the associated code. �It would be nice
> > > if I could write the code rule like it is above, rather than say
> > > this:
> > >
> > > �(lambda (x op y EOF) (* x y))
> > >
> > > Having to account for all the rules in the sequence, but really
> > > only caring about two of them. �Here is the example rewritten
> > > with '^' expressing "match the rule, advance the input, but don't
> > > modify the parse tree":
> > >
> > > �mulexp <- digit ^'*' digit ^EOF -> {(lambda (x y) (* x y))}
> > >
> > > Before I go inventing syntax for this use case, will you tell me if
> > > this is already being done with other parsers? �Have any of you had
> > > this problem and already solved it, and if so, what approach did you
> > > take?
> > >
> > > -Alan
> > > --
> > > .i ko djuno fi le do sevzi
> > >
> > > _______________________________________________
> > > PEG mailing list
> > > [4][email protected]
> > > [5]https://lists.csail.mit.edu/mailman/listinfo/peg
> >
> > _______________________________________________
> > PEG mailing list
> > [6][email protected]
> > [7]https://lists.csail.mit.edu/mailman/listinfo/peg
> >
> > References
> >
> > Visible links
> > 1. mailto:[email protected]
> > 2. mailto:[email protected]
> > 3. http://www.parboiled.org/
> > 4. mailto:[email protected]
> > 5. https://lists.csail.mit.edu/mailman/listinfo/peg
> > 6. mailto:[email protected]
> > 7. https://lists.csail.mit.edu/mailman/listinfo/peg
>
> > _______________________________________________
> > PEG mailing list
> > [email protected]
> > https://lists.csail.mit.edu/mailman/listinfo/peg
>
>
> --
>
> Plasma conduit breach
>
> _______________________________________________
> PEG mailing list
> [email protected]
> https://lists.csail.mit.edu/mailman/listinfo/peg
--
.i ko djuno fi le do sevzi
_______________________________________________
PEG mailing list
[email protected]
https://lists.csail.mit.edu/mailman/listinfo/peg