Nick Sabalausky wrote:
"Walter Bright" <newshou...@digitalmars.com> wrote in message
news:ia59si$1r0...@digitalmars.com...
Consider a string literal, say "abc\"def". With Goldie's method, I infer
this string has to be scanned twice: once to find its limits, and a second
time to convert it to the actual string.
Yea, that is true. With that string in the input, the value given to the
user code will be:
assert(tokenObtainedFromGoldie.toString() == q{"abc\"def"});
That's a consequence of the grammar being separated from the lexing/parsing
implementation.
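So in practice the user code ends up doing that second scan itself, with
something roughly like this (a hypothetical helper just for illustration;
'unescapeStringLiteral' isn't anything Goldie provides):

import std.array : appender;

// Hypothetical second pass over the raw lexeme that Goldie hands back:
// strip the quotes and decode the escapes into the actual string value.
string unescapeStringLiteral(string rawLexeme)
{
    auto inner = rawLexeme[1 .. $-1];  // drop the surrounding quotes
    auto result = appender!string();
    for (size_t i = 0; i < inner.length; i++)
    {
        if (inner[i] == '\\' && i + 1 < inner.length)
        {
            i++;
            switch (inner[i])
            {
                case 'n':  result.put('\n'); break;
                case 't':  result.put('\t'); break;
                default:   result.put(inner[i]); break;  // \" , \\ , etc.
            }
        }
        else
            result.put(inner[i]);
    }
    return result.data;
}

unittest
{
    assert(unescapeStringLiteral(q{"abc\"def"}) == `abc"def`);
}

In other words, every character of the literal gets walked a second time
after the DFA has already passed over it.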
You're right that that does seem less than ideal, although I'm not sure how
to remedy it without losing the independence between the grammar and the
lexing/parsing implementation that is the main point of the GOLD-based style.
But there's something I don't quite understand about the approach you're
suggesting: it sounds like a terminal gets progressively converted into its
final form *as* it's still in the process of being recognized by the DFA.
Which means you don't know *what* you're supposed to be converting it into
*while* you're converting it. Which means you have to speculatively convert
it into every type of token the current DFA state could possibly be on its
way towards accepting (and the DFA would also need a record of the possible
terminals for each state), then throw the result away if it turns out to be
a different terminal. Is this correct? If so, is there generally enough
lexical difference between the terminals that need such treatment to make up
for the extra processing in situations closer to the worst case (that is,
compared to Goldie's current approach)?
Probably that's why I don't use lexer generators. Building lexers is the
simplest part of building a compiler, and I've always been motivated by trying
to make it as fast as possible.
To specifically answer your question, yes, in the lexers I make, you know you're
parsing a string, so you process it as you parse it.
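For example, a hand-written scanner in that style might look roughly like
the following. This is only a sketch with made-up names ('Tok',
'lexStringLiteral'), not dmd's actual code:

import std.array : appender;

enum TokType { stringLiteral /* , ... */ }

struct Tok
{
    TokType type;
    string value;   // already unescaped
}

// The unescaped value is built while the characters are being consumed,
// so the source is only walked once.
Tok lexStringLiteral(ref const(char)[] src)
{
    assert(src.length && src[0] == '"');
    src = src[1 .. $];                     // skip the opening quote

    auto value = appender!string();
    while (src.length && src[0] != '"')
    {
        if (src[0] == '\\' && src.length > 1)
        {
            switch (src[1])
            {
                case 'n':  value.put('\n'); break;
                case 't':  value.put('\t'); break;
                default:   value.put(src[1]); break;
            }
            src = src[2 .. $];
        }
        else
        {
            value.put(src[0]);
            src = src[1 .. $];
        }
    }
    if (src.length)
        src = src[1 .. $];                 // skip the closing quote

    return Tok(TokType.stringLiteral, value.data);
}

The token's value comes out already unescaped, so nothing ever has to walk
the literal again.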
If all of that is so, then what would be your thoughts on this approach?
Suppose Goldie had a way to associate an optional "simultaneous/lockstep
conversion" with a type of terminal. For instance:

myLanguage.associateConversion("StringLiteral", new StringLiteralConverter());
Then, 'StringLiteralConverter' would be something that could be either
user-provided or offered by Goldie (both ways would be supported). It would
be some sort of class or something that had three basic functions:
class StringLiteralConverter : ITerminalConverter
{
    void process(dchar c) {...}

    // Or maybe this, to make it possible to minimize allocations
    // in certain circumstances by utilizing slices:
    void process(dchar c, size_t indexIntoSource, string fullOriginalSource) {...}

    Variant emit() {...}
    void clear() {...}
}
Each state in the lexer's DFA would know which terminals it could possibly
be processing. For each of those terminals that has an associated converter,
the lexer would call 'process()'. If a terminal is accepted, 'emit()' is
called to get the final result (and maybe do any needed finalization first),
and then 'clear()' is called on all converters that had been used.
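For what it's worth, a string-literal converter under that scheme might
come out something like this (purely hypothetical, since neither
'ITerminalConverter' nor 'associateConversion' exists in Goldie at the
moment; it only assumes the interface sketched above):

import std.variant : Variant;

// Mirrors the interface sketched above (hypothetical, not part of Goldie).
interface ITerminalConverter
{
    void process(dchar c);
    Variant emit();
    void clear();
}

class StringLiteralConverter : ITerminalConverter
{
    private string value;
    private bool escaping = false;
    private bool sawOpeningQuote = false;

    // Called by the lexer for each character the DFA consumes while this
    // terminal is still a candidate; escapes are decoded as they arrive.
    void process(dchar c)
    {
        if (!sawOpeningQuote)
        {
            sawOpeningQuote = true;     // skip the opening quote
            return;
        }
        if (escaping)
        {
            switch (c)
            {
                case 'n': value ~= '\n'; break;
                case 't': value ~= '\t'; break;
                default:  value ~= c;    break;  // \" , \\ , etc.
            }
            escaping = false;
        }
        else if (c == '\\')
            escaping = true;
        else if (c != '"')              // closing quote isn't part of the value
            value ~= c;
    }

    // Called when the terminal is accepted.
    Variant emit() { return Variant(value); }

    // Called after emit(), or when this candidate is discarded.
    void clear()
    {
        value = null;
        escaping = false;
        sawOpeningQuote = false;
    }
}

The point being that each character would only get visited once, with the
cost shifted to running converters for candidates that end up rejected.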
This feature would preclude the use of the actual "GOLD Parser Builder"
program, but since I'm writing a tool to handle that functionality anyway,
I'm not too concerned about that.
Do you think that would work? Would its benefits be killed by the overhead
introduced? If so, could that overhead be sufficiently reduced without
scrapping the general idea?
I don't know. I'd have to study the issue for a while. I suggest taking a look
at dmd's lexer and comparing. I'm not sure what Spirit's approach to this is.
What Goldie will be compared against is Spirit. Spirit is a reasonably
successful add-on to C++. Goldie doesn't have to do things the same way as
Spirit (expression templates - ugh), but it should be as easy to use and
at least as powerful.
Understood.