On Tuesday, 5 November 2013 at 14:54:34 UTC, Dmitry Olshansky wrote:

I was also toying with the idea of exposing Builder interface for std.regex. But push/pop IMHO are better be implicitly designed-out:

auto re = atom('x').star(charClass(unicode.Letter),atom('y')).build();

... and letting the nesting be explicit.

Is the same as:
auto re = regex(`x(?:\p{L}y)*`);

Aimed for apps/libs that build regular expressions anyway and have no need in textual parser.


Interesting. I like how it induces some amount of static verification, though I worry that it could harm procedural generation of grammars. It would be difficult, for instance, to use that API to do the equivalent of pushing an atom in one function and popping it in another.

I wonder if we are at different levels of abstraction. The example you give seems like it requires the API to remember, in a structured way, all of the information presented by the call-chaining and call-nesting. I might implement something like that with a stateful "builder" object under the hood. However, the example you give seems like it is closer to what a regex engine would morph an expression into, thus making it a higher abstraction.

That snippet would create a parser that recognizes the grammar 'x' (
'y'? ).
The current fledgling implementation creates this parser:
http://pastebin.com/MgSqWXE2

Of course, no one would be expected to write grammars like that. It would be the job of tools like Pegged or std.regex to package it up in
nice syntax that is easy to use.

I thought to provide some building blocks for that with new std.uni. Not quite everything I wanted, but now at least there is one set of wheels less to reinvent.


I haven't looked at std.uni earnestly yet, but if it succeeds at making that unicode/utf jungle manageable, then I will be incredibly thankful.

[snip]

Another fun thought: PEGs can have look-behind that includes any regular elements without any additional algorithmic complexity. Just take all of the look-behinds in the grammar, mash them together into one big regular-expression using regular alternation (|), and then have the resulting automaton consume in lock-step with the PEG parser. Whenever the PEG parser needs to do a lookbehind, it just checks to see if the companion automaton is in a matching state for the capture it needs.

Sounds quite slow to do it "just in case". Also complete DFAs tend to be mm quite big.


I was envisioning it being done lazily and strictly as-needed, if I even got around to doing it at all.

What ANTLR does is similar technique - a regular lookahead to resolve ambiguity in the grammar (implicitly). A lot like LL(k) but with unlimited length (so called LL(*)). Of course, it generates LL(k) disambiguation where possible, then LL(*), failing that the usual backtracking.


Neat.


*sigh*, I feel like I could write a paper on this stuff if I were in grad school right now. Alas, I am stuck doing 50-60 hours a week of
soul-sucking business programming.

I heard Sociomantic is hiring D programmers for coding some awesome stuff, you may as well apply :)


Tempting.

And it seems even Facebook is joining the market now, which is news to me.

Well, then again, my understanding
is that even though I can think of things that seem like they would make interesting topics for publishable papers, reality would have the profs conscript me to do completely different things that are possibly just as
inane as the business programming.

Speaking for my limited experience - at times it's like that.


Yay, someone's observations corroborate mine!
...
Crap, someone observations corroborate mine :(

;)

I worry that the greater threat to good AST manipulation tools in D is a
lack of free time, and not the DMD bugs as much.

Good for you I guess, my developments in related area are blocked still :(


Well, hopefully you've got the wind at your back now.

Reply via email to