Re: Lexer and parser generators using CTFE

Andrei Alexandrescu Wed, 29 Feb 2012 08:20:30 -0800

On 2/28/12 8:58 AM, Hisayuki Mima wrote:

Admittedly, I haven't written the documentation for my library, ctpg, yet.
(But a few examples are available here:
https://github.com/youkei/ctpg/tree/master/example)

Nice! I have a few comments. I should say that they entail a fair amount of work.

The current setup makes things very difficult for a common case: generating a parse tree. The user would need to insert a lot of custom code to create the appropriate nodes, but that's very easy for the parser because it already has the components. The parser should have a "build AST" option, in which case it should build all nodes without much effort from the user. That's what ANTLR does, and it has a special operator "^" to indicate where the root of the tree should be (http://www.antlr2.org/doc/trees.html).


So here's what I think your examples should look like:

The syntax could be changed a bit: it should be less like D and more like PEG. The "$" is implicit and shouldn't be needed. The ";" is a useful terminator for large productions spanning more than one line, so let's keep it. I don't understand the use of "!". For example, the PEG for expression parsing at http://en.wikipedia.org/wiki/Parsing_expression_grammar is:


Value   ← [0-9]+ / '(' Expr ')'
Product ← Value (('*' / '/') Value)*
Sum     ← Product (('+' / '-') Product)*
Expr    ← Sum

but your grammar has:

int primary = !"(" addExp !")" / intLit_p;

whereas it seems to me it should be

int primary = "(" addExp ")" / intLit_p;

No?

Here's what I think a parser that also builds an AST would look like:

mixin generateParsers!(
  Options.createAST,
  q{
    root    <- addExp;
    addExp  <- mulExp (("+" / "-")^ addExp)?;
    mulExp  <- primary (("*" / "/")^ mulExp)?;
    primary <- "(" addExp ")"% / intLit_p;
  }
);

I used the following notation: a node suffixed with "^" becomes the root of the produced AST, and a node suffixed with "%" will be dropped altogether (that's because ")" is not needed in the AST after the expression has been parsed). With only three characters I informed the parser what the shape of the AST will be.

At this point calling parse!root("1+2*3") would return an object of type ASTNode!"addExp", which in turn has two children, lhs and rhs. The lhs node has type ASTNode!"intLit_p" which has inside a value "1". The rhs node has type ASTNode!"mulExp", and that in turn has two children nodes lhs and rhs, both of type ASTNode!"intLit_p", each with its own value ("2" and "3", respectively). All these nodes are dynamically created by the parsing process and inherit a common type, e.g. ASTNode!"" or some type ASTRootNode.
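
To make that shape concrete, here is a minimal hand-written sketch in D of the tree such a parser might produce for "1+2*3". It is not ctpg's API and not generated code: the Node class, the show helper, and the three parsing functions are hypothetical names made up for illustration, a single class stands in for the ASTNode!"..." family, and dropping both parentheses (the grammar above only marks ")") is an assumption.

```d
import std.ascii : isDigit;
import std.exception : enforce;
import std.stdio : writeln;

// Hypothetical stand-in for the ASTNode!"..." types described above.
class Node
{
    string rule;   // grammar rule that produced this node
    string text;   // operator or literal text
    Node lhs, rhs; // both null for leaves

    this(string rule, string text, Node lhs = null, Node rhs = null)
    {
        this.rule = rule;
        this.text = text;
        this.lhs = lhs;
        this.rhs = rhs;
    }
}

// Renders a tree as an S-expression, e.g. (+ 1 (* 2 3)).
string show(Node n)
{
    if (n.lhs is null) return n.text;
    return "(" ~ n.text ~ " " ~ show(n.lhs) ~ " " ~ show(n.rhs) ~ ")";
}

// addExp <- mulExp (("+" / "-")^ addExp)?   -- the operator becomes the root
Node addExp(ref string s)
{
    auto lhs = mulExp(s);
    if (s.length > 0 && (s[0] == '+' || s[0] == '-'))
    {
        auto op = s[0 .. 1];
        s = s[1 .. $];
        return new Node("addExp", op, lhs, addExp(s));
    }
    return lhs;
}

// mulExp <- primary (("*" / "/")^ mulExp)?
Node mulExp(ref string s)
{
    auto lhs = primary(s);
    if (s.length > 0 && (s[0] == '*' || s[0] == '/'))
    {
        auto op = s[0 .. 1];
        s = s[1 .. $];
        return new Node("mulExp", op, lhs, mulExp(s));
    }
    return lhs;
}

// primary <- "(" addExp ")"% / intLit_p   -- parentheses leave no node behind
Node primary(ref string s)
{
    if (s.length > 0 && s[0] == '(')
    {
        s = s[1 .. $];
        auto inner = addExp(s);
        enforce(s.length > 0 && s[0] == ')', "expected ')'");
        s = s[1 .. $];
        return inner;
    }
    size_t n = 0;
    while (n < s.length && isDigit(s[n])) ++n;
    enforce(n > 0, "expected an integer literal");
    auto leaf = new Node("intLit_p", s[0 .. n]);
    s = s[n .. $];
    return leaf;
}

void main()
{
    string input = "1+2*3";
    auto root = addExp(input);
    writeln(root.rule, ": ", show(root)); // prints "addExp: (+ 1 (* 2 3))"
}
```

Because the grammar is right-recursive, this sketch groups "1-2-3" as (- 1 (- 2 3)); a real generator would have to decide how to handle associativity.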

So, I'd like to describe it here.

To be honest, ctpg is inspired by one module of Scala's standard
library, Parser Combinators.
One of the differences between Parser Combinators and ctpg is that
Parser Combinators generates parsers at run time, but ctpg generates
parsers at compile time by the power of CTFE and mixin.

A parser generated by ctpg is a recursive descent parser, so it does
lexical analysis and parsing at the same time.

Oh, I forgot this property of PEGs - integrated lexing and parsing. So no need for a separate lexer!
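
As a rough illustration of what integrated lexing and parsing means in practice, the following D sketch hand-writes a recursive descent parser for the Wikipedia PEG quoted earlier. There is no token stream: each rule consumes characters directly and skips whitespace itself. The function names are made up, nothing here is ctpg output, and returning an int from each rule is only a choice to keep the sketch short.

```d
import std.ascii : isDigit, isWhite;
import std.exception : enforce;
import std.stdio : writeln;

// The only "lexing" step, and it lives inside the parser.
void skipWhite(ref string s)
{
    while (s.length > 0 && isWhite(s[0])) s = s[1 .. $];
}

// Value <- [0-9]+ / '(' Expr ')'
int value(ref string s)
{
    skipWhite(s);
    enforce(s.length > 0, "unexpected end of input");
    if (s[0] == '(')
    {
        s = s[1 .. $];
        immutable v = sum(s);
        skipWhite(s);
        enforce(s.length > 0 && s[0] == ')', "expected ')'");
        s = s[1 .. $];
        return v;
    }
    enforce(isDigit(s[0]), "expected a digit");
    int n = 0;
    while (s.length > 0 && isDigit(s[0]))
    {
        n = n * 10 + (s[0] - '0'); // digits are converted as they are read
        s = s[1 .. $];
    }
    return n;
}

// Product <- Value (('*' / '/') Value)*
int product(ref string s)
{
    int acc = value(s);
    for (;;)
    {
        skipWhite(s);
        if (s.length > 0 && s[0] == '*')
        {
            s = s[1 .. $];
            acc *= value(s);
        }
        else if (s.length > 0 && s[0] == '/')
        {
            s = s[1 .. $];
            immutable d = value(s);
            enforce(d != 0, "division by zero");
            acc /= d;
        }
        else return acc;
    }
}

// Sum <- Product (('+' / '-') Product)*
int sum(ref string s)
{
    int acc = product(s);
    for (;;)
    {
        skipWhite(s);
        if (s.length > 0 && s[0] == '+')
        {
            s = s[1 .. $];
            acc += product(s);
        }
        else if (s.length > 0 && s[0] == '-')
        {
            s = s[1 .. $];
            acc -= product(s);
        }
        else return acc;
    }
}

// Expr <- Sum, plus a check that the whole input was consumed.
int expr(string s)
{
    immutable v = sum(s);
    skipWhite(s);
    enforce(s.length == 0, "trailing input");
    return v;
}

void main()
{
    writeln(expr("1 + 2 * 3")); // prints 7
    writeln(expr("(1+2)*3"));   // prints 9
}
```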

And the input types it can accept are string, wstring, dstring, and any
ForwardRange whose element type is char, wchar or dchar.

Great. I think we should actually define the notion of BufferedRange. I'll get to that topic later.

So, dividing lexical analysis and parsing into two processes is
difficult in ctpg.


Yes, sorry I forgot about that!

Importantly, a parser generated by ctpg is statically typed, as one of
the examples, "parsing simple arithmetic expression", shows.
Generally, a parser generated by another tool that accepts tokens returns
an abstract syntax tree, but in the example it returns the evaluated value.
In other words, it does lexical analysis, parsing and (type) conversion
at the same time.
If you simply want an abstract syntax tree, it may be a little painful to
use ctpg.

That's all I'd like to say about ctpg here.

Why would it be difficult to integrate AST creation with parsing? I'm not sure I understand this. My understanding is that you should be able to add a couple of simple operators ("^" and "%" as described above) to inform AST creation, and you're done!



Thanks,

Andrei
