Re: Lexer and parser generators using CTFE

Hisayuki Mima Sun, 04 Mar 2012 06:35:32 -0800

(2012/03/01 1:19), Andrei Alexandrescu wrote:

On 2/28/12 8:58 AM, Hisayuki Mima wrote:

Certainly, I don't write yet the documentation of my library, ctpg.
(But a few examples available here:
https://github.com/youkei/ctpg/tree/master/example)


Nice! I have a few comments. I should say that they entail a fair amount
of work.

The current setup makes things very difficult for a common case -
generating a parse tree. The user would need to insert a lot of custom
code to create the appropriate nodes, but that's very easy for the
parser because it already has the components. The parser should have a
"build AST" option, in which case it should build all nodes without much
effort from the user. That's what ANTLR does, and it has a special
operator "^" to indicate where the root of the tree should be
(http://www.antlr2.org/doc/trees.html).

So here's what I think your examples should look like:

The syntax could be changed a bit - it should be less like D and more
like PEG. The "$" is implicit and shouldn't be needed. The ";" is a
useful terminator for large productions spanning more than one line, so
let's keep it. I don't understand the use of "!", for example the PEG
for expression parsing at
http://en.wikipedia.org/wiki/Parsing_expression_grammar is:

Value ← [0-9]+ / '(' Expr ')'
Product ← Value (('*' / '/') Value)*
Sum ← Product (('+' / '-') Product)*
Expr ← Sum

but your grammar has:

int primary = !"(" addExp !")" / intLit_p;

whereas it seems to me it should be

int primary = "(" addExp ")" / intLit_p;

No?

Here's how I think a parser that also builds an AST looks like:

mixin generateParsers!(
Options.createAST,
q{
root <- addExp;
addExp <- mulExp (("+" / "-")^ addExp)?
mulExp <- primary (("*" / "/")^ mulExp)?
primary <- "(" addExp ")"% / intLit_p;
}
);

I used the following notation: a node suffixed with "^" becomes the root
of the produced AST, and a node suffixed with "%" will be dropped
altogether (that's because ")" is not needed in the AST after the
expression has been parsed). With only three characters I informed the
parser what the shape of the AST will be.

At this point calling parse!root("1+2*3") would return an object of type
ASTNode!"addExp", which in turn has two children, lhs and rhs. The lhs
node has type ASTNode!"intLit_p" which has inside a value "1". The rhs
node has type ASTNode!"mulExp", and that in turn has two children nodes
lhs and rhs, both of type ASTNode!"intLit_p", each with its own value
("2" and "3", respectively). All these nodes are dynamically created by
the parsing process and inherit a common type, e.g. ASTNode!"" or some
type ASTRootNode.

So, I'd like to describe it here.

To be honest, ctpg is inspired by one module of Scala's standard
library, Parser Combinators.
One of the differences between Parser Combinators and ctpg is that
Parser Combinators generates parsers in run-time, but ctpg generates
parsers in compile-time by the power of CTFE and mixin.

A parser generated by ctpg is a recursive descent parser, so it does
lexical analysis and parsing at a time.


Oh, I forgot this property of PEGs - integrated lexing and parsing. So
no need for a separate lexer!

And the type of input which it
can accept is string, wstring, dstring and ForwardRange whose element
type is char, wchar or dchar.


Great. I think we should actually define the notion of BufferedRange.
I'll get to that topic later.

So, dividing lexical analysis and parsing into two processes is
difficult in ctpg.


Yes, sorry I forgot about that!

Importantly, a parser generated by ctpg is statically typed as one of
the examples, "parsing simple arithmetic expression" shows.
Generally a parser generated by other tool and accepting tokens returns
the abstract syntax tree, but it return the evaluated value in the
example.
In other words, it does lexical analysis, parsing and (type) converting
at a time.
If you want simply abstract syntax tree, it may be a little pain to use
ctpg.

That's all I'd like to say about ctpg here.


Why would it be difficult to integrate AST creation with parsing? I'm
not sure I understand this. My understanding is that you should be able
to add a couple of simple operators ("^" and "%" as described above) to
inform AST creation, and you're done!


Thanks,

Andrei


Certainly, "built AST" option is very interesting.

I don't know whether building AST is a common case or not because I'munfamiliar with syntax analysis.

But I'd like to complete ctpg as D version of boost::spirit::qi first.

Currently, ctpg can be used probably like boost::spirit::qi. (Both dotyped syntax analysis.)There are some points where ctpg is superior to boost::spirit::qi. Forexample, The source code written by using ctpg is simple because it islike PEG.In addition, I'm planning to improve error messages by making excellentuse of #line and __LINE__.I'd like to write the library which is functionally sameboost::spirit::qi and written in D style.Hence, I'd like to make implementing "built AST" option i.e. untypedsyntax analysis a low priority.

What is your idea?

P.S. prefix operator "!" is same as postfix operator "%" which you said.

The types which parsers on both sides of "/" operator have should becompatible.The type of "(" addExp ")" is Tuple!(string, int, string) and the typeof intLit_p is int. They are not compatible.The type of !"(" addExp !")" is int. So the types of !"(" addExp !")"and intLit_p are compatible.


Thanks,
Hisayuki Mima

Re: Lexer and parser generators using CTFE

Reply via email to