On 2/28/12 8:58 AM, Hisayuki Mima wrote:
Certainly, I don't write yet the documentation of my library, ctpg.
(But a few examples available here:
https://github.com/youkei/ctpg/tree/master/example)

Nice! I have a few comments. I should say that they entail a fair amount of work.

The current setup makes things very difficult for a common case - generating a parse tree. The user would need to insert a lot of custom code to create the appropriate nodes, but that's very easy for the parser because it already has the components. The parser should have a "build AST" option, in which case it should build all nodes without much effort from the user. That's what ANTLR does, and it has a special operator "^" to indicate where the root of the tree should be (http://www.antlr2.org/doc/trees.html).

So here's what I think your examples should look like:

The syntax could be changed a bit - it should be less like D and more like PEG. The "$" is implicit and shouldn't be needed. The ";" is a useful terminator for large productions spanning more than one line, so let's keep it. I don't understand the use of "!", for example the PEG for expression parsing at http://en.wikipedia.org/wiki/Parsing_expression_grammar is:

Value   ← [0-9]+ / '(' Expr ')'
Product ← Value (('*' / '/') Value)*
Sum     ← Product (('+' / '-') Product)*
Expr    ← Sum

but your grammar has:

int primary = !"(" addExp !")" / intLit_p;

whereas it seems to me it should be

int primary = "(" addExp ")" / intLit_p;

No?

Here's how I think a parser that also builds an AST looks like:

mixin generateParsers!(
  Options.createAST,
  q{
    root    <- addExp;
    addExp  <- mulExp (("+" / "-")^ addExp)?
    mulExp  <- primary (("*" / "/")^ mulExp)?
    primary <- "(" addExp ")"% / intLit_p;
  }
);

I used the following notation: a node suffixed with "^" becomes the root of the produced AST, and a node suffixed with "%" will be dropped altogether (that's because ")" is not needed in the AST after the expression has been parsed). With only three characters I informed the parser what the shape of the AST will be.

At this point calling parse!root("1+2*3") would return an object of type ASTNode!"addExp", which in turn has two children, lhs and rhs. The lhs node has type ASTNode!"intLit_p" which has inside a value "1". The rhs node has type ASTNode!"mulExp", and that in turn has two children nodes lhs and rhs, both of type ASTNode!"intLit_p", each with its own value ("2" and "3", respectively). All these nodes are dynamically created by the parsing process and inherit a common type, e.g. ASTNode!"" or some type ASTRootNode.

So, I'd like to describe it here.

To be honest, ctpg is inspired by one module of Scala's standard
library, Parser Combinators.
One of the differences between Parser Combinators and ctpg is that
Parser Combinators generates parsers in run-time, but ctpg generates
parsers in compile-time by the power of CTFE and mixin.

A parser generated by ctpg is a recursive descent parser, so it does
lexical analysis and parsing at a time.

Oh, I forgot this property of PEGs - integrated lexing and parsing. So no need for a separate lexer!

And the type of input which it
can accept is string, wstring, dstring and ForwardRange whose element
type is char, wchar or dchar.

Great. I think we should actually define the notion of BufferedRange. I'll get to that topic later.

So, dividing lexical analysis and parsing into two processes is
difficult in ctpg.

Yes, sorry I forgot about that!

Importantly, a parser generated by ctpg is statically typed as one of
the examples, "parsing simple arithmetic expression" shows.
Generally a parser generated by other tool and accepting tokens returns
the abstract syntax tree, but it return the evaluated value in the example.
In other words, it does lexical analysis, parsing and (type) converting
at a time.
If you want simply abstract syntax tree, it may be a little pain to use
ctpg.

That's all I'd like to say about ctpg here.

Why would it be difficult to integrate AST creation with parsing? I'm not sure I understand this. My understanding is that you should be able to add a couple of simple operators ("^" and "%" as described above) to inform AST creation, and you're done!


Thanks,

Andrei

Reply via email to