Re: D parsing

Chad Joan Wed, 06 Nov 2013 00:11:03 -0800

On Tuesday, 5 November 2013 at 14:54:34 UTC, Dmitry Olshansky wrote:

> I was also toying with the idea of exposing a Builder interface for std.regex. But IMHO push/pop are better designed out implicitly:
> auto re = atom('x').star(charClass(unicode.Letter), atom('y')).build();
> ... and letting the nesting be explicit.
>
> Is the same as:
> auto re = regex(`x(?:\p{L}y)*`);
> Aimed at apps/libs that build regular expressions anyway and have no need for a textual parser.
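As a rough illustration of the nested style, here is a minimal D sketch that assumes the builder does nothing more than accumulate regex source text and hands it to std.regex's `regex()` at `build()` time. `Node`, `atom`, `star` and `charClass` are hypothetical names modelled on the snippet above, not an existing std.regex API, and this `charClass` takes a Unicode property name string instead of `unicode.Letter`.

```d
import std.algorithm.searching : canFind;
import std.conv : to;
import std.regex : matchFirst, regex, Regex;
import std.stdio : writeln;

/// Hypothetical builder node: it only accumulates the regex source text.
struct Node
{
    string pattern;

    /// This node followed by zero or more repetitions of `seq`,
    /// rendered with a non-capturing group: this(?:seq)*
    Node star(Node[] seq...)
    {
        string inner;
        foreach (n; seq)
            inner ~= n.pattern;
        return Node(pattern ~ "(?:" ~ inner ~ ")*");
    }

    /// Hand the finished source text to std.regex for compilation.
    Regex!char build()
    {
        return regex(pattern);
    }
}

/// A single literal character, escaped if it is a regex metacharacter.
Node atom(dchar c)
{
    string s = c.to!string;
    return Node(`\^$.|?*+()[]{}`.canFind(c) ? `\` ~ s : s);
}

/// A Unicode property class such as "L" (letters), written \p{...}.
/// Hypothetical stand-in for charClass(unicode.Letter).
Node charClass(string property)
{
    return Node(`\p{` ~ property ~ `}`);
}

void main()
{
    auto node = atom('x').star(charClass("L"), atom('y'));
    // Same source text as the hand-written regex(`x(?:\p{L}y)*`).
    assert(node.pattern == `x(?:\p{L}y)*`);

    auto re = node.build();
    assert(matchFirst("xAyby", re).hit == "xAyby");
    writeln(node.pattern); // prints x(?:\p{L}y)*
}
```

The nesting of the calls is the nesting of the groups, so an unbalanced group simply cannot be written; that is the static verification mentioned in the reply.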

Interesting. I like how it induces some amount of static verification, though I worry that it could harm procedural generation of grammars. It would be difficult, for instance, to use that API to do the equivalent of pushing an atom in one function and popping it in another.

I wonder if we are at different levels of abstraction. The example you give seems like it requires the API to remember, in a structured way, all of the information presented by the call-chaining and call-nesting. I might implement something like that with a stateful "builder" object under the hood. However, the example you give seems like it is closer to what a regex engine would morph an expression into, thus making it a higher abstraction.
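To make the push/pop contrast concrete, here is a minimal D sketch of the stateful alternative, assuming a builder that keeps open groups on an explicit stack and emits regex source text. `StackBuilder`, `lit`, `push`, `popStar` and the two helper functions are hypothetical names; the point is only that the push and its matching pop can sit in different functions while still producing the same pattern as the nested-call version.

```d
import std.stdio : writeln;

/// Hypothetical stateful builder: push() opens a group, popStar() closes it.
/// Open groups live on an explicit stack, so a push and its matching pop
/// may happen in different functions.
final class StackBuilder
{
    private string[] stack;

    this()
    {
        stack = [""]; // bottom entry holds the top-level sequence
    }

    /// Append raw regex source to the innermost open group.
    void lit(string source)
    {
        stack[$ - 1] ~= source;
    }

    /// Open a new group.
    void push()
    {
        stack ~= "";
    }

    /// Close the innermost group and append it, starred, to its parent.
    void popStar()
    {
        assert(stack.length > 1, "popStar() without a matching push()");
        string inner = stack[$ - 1];
        stack = stack[0 .. $ - 1];
        stack[$ - 1] ~= "(?:" ~ inner ~ ")*";
    }

    /// The finished source text; only valid once every push() was popped.
    string pattern() const
    {
        assert(stack.length == 1, "unbalanced push()/popStar()");
        return stack[0];
    }
}

// The push and the pop deliberately live in separate functions.
void openRepetition(StackBuilder b) { b.push(); }
void closeRepetition(StackBuilder b) { b.popStar(); }

void main()
{
    auto b = new StackBuilder;
    b.lit("x");
    openRepetition(b);
    b.lit(`\p{L}`);
    b.lit("y");
    closeRepetition(b);
    assert(b.pattern == `x(?:\p{L}y)*`);
    writeln(b.pattern); // prints x(?:\p{L}y)*
}
```

The trade-off is visible in the asserts: balance is only checked at run time, which is the static verification the nested style gives up in exchange for procedural freedom.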

>> That snippet would create a parser that recognizes the grammar 'x' ( 'y'? ).
>> The current fledgling implementation creates this parser:
>> http://pastebin.com/MgSqWXE2
>> Of course, no one would be expected to write grammars like that. It would be the job of tools like Pegged or std.regex to package it up in nice syntax that is easy to use.
> I thought to provide some building blocks for that with the new std.uni. Not quite everything I wanted, but now at least there is one set of wheels less to reinvent.

I haven't looked at std.uni earnestly yet, but if it succeeds at making that unicode/utf jungle manageable, then I will be incredibly thankful.

[snip]
>> Another fun thought: PEGs can have look-behind that includes any regular elements without any additional algorithmic complexity. Just take all of the look-behinds in the grammar, mash them together into one big regular expression using regular alternation (|), and then have the resulting automaton consume input in lock-step with the PEG parser. Whenever the PEG parser needs to do a look-behind, it just checks whether the companion automaton is in a matching state for the capture it needs.
> Sounds quite slow to do it "just in case". Also, complete DFAs tend to be, mm, quite big.

I was envisioning it being done lazily and strictly as needed, if I even got around to doing it at all.
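A minimal D sketch of the companion-automaton idea, under two loud assumptions: every look-behind is a plain literal string, and the automaton is simulated lazily as a set of NFA states for `.*(p0|p1|...)` rather than compiled into a complete DFA, which sidesteps the DFA size concern at the cost of tracking several partial matches per step. `LookbehindTracker`, `step` and `matches` are hypothetical names; the parser would call `step` for each character it consumes and `matches(i)` when it needs look-behind `i`.

```d
import std.stdio : writeln;

/// Hypothetical companion automaton for literal look-behinds.
/// It simulates the NFA for `.*(p0|p1|...)` one character at a time.
struct LookbehindTracker
{
    /// One live NFA thread: `off` characters of patterns[pat] matched so far.
    static struct State
    {
        size_t pat;
        size_t off;
    }

    string[] patterns;  // look-behind literals collected from the grammar
    State[] active;     // partial matches still alive after the last step
    bool[] accepting;   // accepting[i]: consumed input currently ends with patterns[i]

    this(string[] pats)
    {
        foreach (p; pats)
            assert(p.length > 0, "empty look-behinds are not handled here");
        patterns = pats;
        accepting = new bool[pats.length];
    }

    /// Consume one character, in lock-step with the PEG parser.
    void step(char c)
    {
        State[] next;
        accepting[] = false;

        void advance(State s)
        {
            if (patterns[s.pat][s.off] != c)
                return;                           // this thread dies
            if (s.off + 1 == patterns[s.pat].length)
                accepting[s.pat] = true;          // a whole look-behind just ended here
            else
                next ~= State(s.pat, s.off + 1);  // keep waiting for the rest
        }

        foreach (s; active)
            advance(s);
        // The implicit `.*` prefix: any look-behind may start at any position.
        foreach (i; 0 .. patterns.length)
            advance(State(i, 0));
        active = next;
    }

    /// What the parser asks when it needs look-behind number `i`.
    bool matches(size_t i) const
    {
        return accepting[i];
    }
}

void main()
{
    auto lb = LookbehindTracker(["=", "!="]);
    foreach (c; "a!=")
        lb.step(c);
    assert(lb.matches(0) && lb.matches(1)); // input so far ends with both "=" and "!="
    lb.step('b');
    assert(!lb.matches(0) && !lb.matches(1));
    writeln("ok");
}
```

With genuinely regular look-behinds the states would come from a compiled NFA instead of string offsets, but the lock-step `step` loop and the `matches` query would keep roughly this shape.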

> What ANTLR does is a similar technique: a regular lookahead to resolve ambiguity in the grammar (implicitly). A lot like LL(k), but with unlimited length (so-called LL(*)). Of course, it generates LL(k) disambiguation where possible, then LL(*), and failing that, the usual backtracking.


Neat.
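A loose D illustration of that idea, not how ANTLR itself is implemented: before committing to a grammar alternative, run a regular lookahead over the remaining input that may scan arbitrarily far (unlike a fixed LL(k) window), and fall back to backtracking only when no lookahead decides. The two-alternative `statement` rule, the character-level lookahead patterns and the `predict` function are all hypothetical; ANTLR derives its lookahead automata from the grammar itself and works on tokens rather than characters.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Alternatives of a hypothetical `statement` rule.
enum Alt { declaration, assignment, backtrack }

/// Choose an alternative with a regular lookahead that may scan as far as
/// needed; `backtrack` means the lookahead could not decide.
Alt predict(string input)
{
    // declaration: Type name followed by ';' or '='
    auto declLookahead = regex(`^\s*[A-Za-z_]\w*\s+[A-Za-z_]\w*\s*[;=]`);
    // assignment: a.b.c... = (any number of member accesses), but not '=='
    auto assignLookahead = regex(`^\s*[A-Za-z_]\w*(\s*\.\s*[A-Za-z_]\w*)*\s*=(?!=)`);

    if (!matchFirst(input, declLookahead).empty)
        return Alt.declaration;
    if (!matchFirst(input, assignLookahead).empty)
        return Alt.assignment;
    return Alt.backtrack;
}

void main()
{
    assert(predict("int x = 1;") == Alt.declaration);
    assert(predict("a.b.c.d = 2;") == Alt.assignment);
    assert(predict("f(1);") == Alt.backtrack);
    writeln(predict("a.b.c.d = 2;")); // prints assignment
}
```

The member-access chain in the second pattern is what no fixed k could cover: `a.b.c.d = 2;` is only recognizable as an assignment once the `=` is reached, however long the chain is.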

>> *sigh*, I feel like I could write a paper on this stuff if I were in grad school right now. Alas, I am stuck doing 50-60 hours a week of soul-sucking business programming.
> I heard Sociomantic is hiring D programmers for coding some awesome stuff, you may as well apply :)


Tempting.

And it seems even Facebook is joining the market now, which is news to me.

>> Well, then again, my understanding is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as inane as the business programming.
> Speaking from my limited experience, at times it's like that.


Yay, someone's observations corroborate mine!
...
Crap, someone's observations corroborate mine :(

;)

>> I worry that the greater threat to good AST manipulation tools in D is a lack of free time, and not the DMD bugs as much.
> Good for you, I guess; my developments in the related area are still blocked :(


Well, hopefully you've got the wind at your back now.
