Re: Lexer and parser generators using CTFE

Dmitry Olshansky Wed, 29 Feb 2012 01:50:28 -0800

On 29.02.2012 0:19, H. S. Teoh wrote:

On Tue, Feb 28, 2012 at 11:52:52PM +0400, Dmitry Olshansky wrote:

On 28.02.2012 22:46, Martin Nowak wrote:

On Tue, 28 Feb 2012 08:59:21 +0100, Andrei Alexandrescu
<[email protected]>  wrote:

[...]

We need to have a easy-to-use, complete, seamless, and efficient
lexer-parser generator combo in Phobos, pronto. The lexer itself
could use a character-level PEG or a classic automaton, and emit
tokens for consumption by a parser generator. The two should work in
perfect tandem (no need for glue code). At the end of the day,
defining a complete lexer+parser combo for a language should be just
a few lines longer than the textual representation of the grammar
itself.

[...]

To be truly successful it should have some distinct properties like:
- being faster or identical to handwritten stuff we already have
(like e.g. std.json ), reminds us of recent INI parser proposal,
should be an easy pick for general parser.


We probably have to use string mixins to achieve this.

OTOH, a generic parser generator is unlikely to be able to handle
special case optimizations such as operator-precedence parsers
(http://en.wikipedia.org/wiki/Operator-precedence_parser), because
operator precedence is only implied by grammar rules, and to optimize
for this case requires explicitly specifying each operator's precedence.
There is some price to be paid for generality, unfortunately.

Well I see no problem in defining a separate OperatorGrammarconstructor, that will take base Unit production and a bunch ofoperators with arity, plus brackets with their priority is plenty.Semantic then actions are trivially defined for each of operatorsrespectively.

Someway of stacking grammars is bonus points for the project. As Andreistarted with lexer+parser, I see no problem with it beinglexer+parser(+parser)*.

- be flexible, the result need not be a predefined AST tree nodes,
Hisayuki Mima's parser shows some nice movement in this direction


In my mind, D's delegates are ideal for this sort of thing. Each grammar
rule will have an associated callback. We can provide default callbacks
that generates AST nodes, but the user can override them with his own
delegates to do whatever they want (evaluate expressions/commands on the
fly, build only AST subtrees of interest, etc.).

Yup delegates and mixins are the tools. More over lambdas shine here,less clutter + type deduction.

- have reasonable compile times and memory consumption (though it will
only improve over time)


This will be good motivation to improve dmd's CTFE implementation. ;-)


[...]

Error reporting is the weak spot, still no good ideas on solving that.


This is not specific to D, though, right?

Such a framework relying on mixins is bound to produce awful errormessages at compile-time, unless explicitly stuffed up to watterlinewith some kind of static assert(__traits(compiles,...),"Error x + info");



[...]

I do not agree that it's the right moment to write a parser though.
It hits the first of phobos two biggest shortcomings, the lack of a
good I/O system and the missing Allocators.  Any parser written now
will either risk to not play nice with ranges or has to come up with
it's own buffering again.


All good points. There is prototype of interactive regex matcher
that works directly on stream (buried in std.regex), it even passed
dry-run unittests back then. Though I had to postpone till I/O is
sorted out. I really loved Steven's design with it's easy access to
buffer and well thought out primitives, hope it will come about
sometime soon.

[...]

Are you talking about std.io? I glanced over that code briefly recently;
it looks very promising. Hope it will make it into Phobos soon. We
*need* to integrate streams, ranges, and the stdio writeln, etc., stuff.
The current situation to an outsider looks quite bad (streams not
compatible with stdio, no easy way to substitute string buffer for
files, etc.).


Aye, that's it. Not to mention DIP9, though that's another topic in itself.


We also need automatic UTF encoding detection for all input
streams/files/ranges/whatever. Including mismatching endianness (i.e.
auto byte-swapping).  In a recent personal project to write a D lexer
(as an exercise to help me learn D better), I ended up having to roll my
own input stream abstraction in order to handle automatic encoding
detection. Which is quite poorly written, I've to admit. Something like
this needs to be standardized in Phobos with a well-written
implementation.


T



--
Dmitry Olshansky

Re: Lexer and parser generators using CTFE

Reply via email to