On Tuesday, 5 November 2013 at 14:54:34 UTC, Dmitry Olshansky
wrote:
I was also toying with the idea of exposing Builder interface
for std.regex. But push/pop IMHO are better be implicitly
designed-out:
auto re =
atom('x').star(charClass(unicode.Letter),atom('y')).build();
... and letting the nesting be explicit.
Is the same as:
auto re = regex(`x(?:\p{L}y)*`);
Aimed for apps/libs that build regular expressions anyway and
have no need in textual parser.
Interesting. I like how it induces some amount of static
verification, though I worry that it could harm procedural
generation of grammars. It would be difficult, for instance, to
use that API to do the equivalent of pushing an atom in one
function and popping it in another.
I wonder if we are at different levels of abstraction. The
example you give seems like it requires the API to remember, in a
structured way, all of the information presented by the
call-chaining and call-nesting. I might implement something like
that with a stateful "builder" object under the hood. However,
the example you give seems like it is closer to what a regex
engine would morph an expression into, thus making it a higher
abstraction.
That snippet would create a parser that recognizes the grammar
'x' (
'y'? ).
The current fledgling implementation creates this parser:
http://pastebin.com/MgSqWXE2
Of course, no one would be expected to write grammars like
that. It
would be the job of tools like Pegged or std.regex to package
it up in
nice syntax that is easy to use.
I thought to provide some building blocks for that with new
std.uni. Not quite everything I wanted, but now at least there
is one set of wheels less to reinvent.
I haven't looked at std.uni earnestly yet, but if it succeeds at
making that unicode/utf jungle manageable, then I will be
incredibly thankful.
[snip]
Another fun thought: PEGs can have look-behind that includes
any regular
elements without any additional algorithmic complexity. Just
take all of
the look-behinds in the grammar, mash them together into one
big
regular-expression using regular alternation (|), and then
have the
resulting automaton consume in lock-step with the PEG parser.
Whenever
the PEG parser needs to do a lookbehind, it just checks to see
if the
companion automaton is in a matching state for the capture it
needs.
Sounds quite slow to do it "just in case". Also complete DFAs
tend to be mm quite big.
I was envisioning it being done lazily and strictly as-needed, if
I even got around to doing it at all.
What ANTLR does is similar technique - a regular lookahead to
resolve ambiguity in the grammar (implicitly). A lot like LL(k)
but with unlimited length (so called LL(*)). Of course, it
generates LL(k) disambiguation where possible, then LL(*),
failing that the usual backtracking.
Neat.
*sigh*, I feel like I could write a paper on this stuff if I
were in
grad school right now. Alas, I am stuck doing 50-60 hours a
week of
soul-sucking business programming.
I heard Sociomantic is hiring D programmers for coding some
awesome stuff, you may as well apply :)
Tempting.
And it seems even Facebook is joining the market now, which is
news to me.
Well, then again, my understanding
is that even though I can think of things that seem like they
would make
interesting topics for publishable papers, reality would have
the profs
conscript me to do completely different things that are
possibly just as
inane as the business programming.
Speaking for my limited experience - at times it's like that.
Yay, someone's observations corroborate mine!
...
Crap, someone observations corroborate mine :(
;)
I worry that the greater threat to good AST manipulation tools
in D is a
lack of free time, and not the DMD bugs as much.
Good for you I guess, my developments in related area are
blocked still :(
Well, hopefully you've got the wind at your back now.