12-Sep-2013 19:39, Timon Gehr wrote:
On 09/11/2013 08:49 PM, Walter Bright wrote:
4. When naming tokens, calling .. 'slice' gives it a syntactic/semantic
name rather than a token name. This would be awkward if .. took on new
meanings in D. Calling it 'dotdot' would be clearer. Ditto for the rest.
As an example where this is done better, '*' is called 'star' rather
than 'dereference'.

FWIW, I use Tok!"..", i.e. a "UDL" for specifying kinds of tokens when
interfacing with the parser. Some other kinds of tokens get a canonical
representation, e.g. Tok!"i" is the kind of identifier tokens, Tok!"0"
is the kind of signed integer literal tokens, etc.

I like this.
Not only does this avoid colliding with keywords; I also imagine it could
be incredibly convenient to get back the symbolic representation of a
token (when a token is used as a parameter to an AST node, say
BinaryExpr!(Tok!"+")). And truth be told, we all know what tokens look
like in symbolic form, so learning a pack of names for them feels
pointless.
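Roughly, I imagine the Tok!"..." idea working something like the sketch
below (only Tok and BinaryExpr appear in the post above; TokenKind and
the concrete members are made up for illustration):

enum TokenKind { plus, dotdot, identifier, intLiteral /* ... */ }

// Maps the symbolic spelling of a token to its kind at compile time.
template Tok(string s)
{
    static if (s == "+")       enum Tok = TokenKind.plus;
    else static if (s == "..") enum Tok = TokenKind.dotdot;
    else static if (s == "i")  enum Tok = TokenKind.identifier; // canonical form
    else static if (s == "0")  enum Tok = TokenKind.intLiteral; // canonical form
    else static assert(0, "unknown token spelling: " ~ s);
}

class Expression { }

// An AST node parameterized by the token kind of its operator,
// used as e.g. BinaryExpr!(Tok!"+").
class BinaryExpr(TokenKind op) : Expression
{
    Expression left, right;
}

Usage then reads like BinaryExpr!(Tok!"+") for an addition node, and the
symbolic spelling doubles as documentation.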

6. No clue how lookahead works with this.

E.g. use a CircularBuffer adapter range. I have an implementation
currently coupled with my own lexer implementation. If there is
interest, I could factor it out.

Lookahead is realized as follows in the parser:

(assume 'code' is the circular buffer range.)

auto saveState(){ muteerr++; return code.pushAnchor(); }
// saves the state and mutes all error messages until the state is restored

void restoreState(Anchor state){ muteerr--; code.popAnchor(state); }

The 'Anchor' is a trivial wrapper around a size_t. The circular buffer
grows automatically to keep around tokens still reachable by an anchor.
(The range only needs small constant space besides the buffer to support
this functionality, though it is unable to detect usage errors.)


This approach is typically more efficient than using a free list on
contemporary architectures.


This ^^ is how. In fact, std.d.lexer internally does a similar thing
with non-random-access ranges of bytes.
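For reference, a rough sketch of such an anchored lookahead range
(simplified to a plain growing array rather than a true circular buffer;
only pushAnchor/popAnchor/Anchor match the description above, the rest
of the names are invented):

struct Token { string text; /* kind, location, ... */ }

struct Anchor { size_t pos; }

// Wraps any input range of tokens and buffers enough of them to allow
// rewinding to any live anchor.
struct BufferedTokens(R)
{
    R source;        // the underlying (non-random-access) token range
    Token[] buf;     // history kept while at least one anchor is alive
    size_t cur;      // index of the current token within buf
    size_t anchors;  // number of outstanding anchors

    bool empty() { return cur == buf.length && source.empty; }

    Token front()
    {
        if (cur == buf.length) { buf ~= source.front; source.popFront(); }
        return buf[cur];
    }

    void popFront()
    {
        front();     // make sure the current token is buffered
        ++cur;
        // Nothing can rewind past here, so old history may be dropped.
        if (anchors == 0) { buf = buf[cur .. $]; cur = 0; }
    }

    Anchor pushAnchor() { ++anchors; return Anchor(cur); }

    void popAnchor(Anchor a) { --anchors; cur = a.pos; } // rewind
}

saveState/restoreState above would then map directly onto
pushAnchor/popAnchor plus the error-muting counter.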


--
Dmitry Olshansky
