On Friday, 11 May 2012 at 09:08:24 UTC, Jacob Carlborg wrote:
On 2012-05-11 10:58, Roman D. Boiko wrote:
Each token contains:
* start index (position in the original encoding, 0 corresponds
to the first code unit after BOM),
* token value encoded as UTF-8 string,
* token kind (e.g., token.kind = TokenKind.Float),
* optionally, an enum with annotations (e.g., token.annotations =
FloatAnnotation.Hex | FloatAnnotation.Real)
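The token layout above could be sketched roughly as follows. This is a minimal Python illustration, not the actual D implementation; the names Token, TokenKind and FloatAnnotation mirror the examples in this post, but the real fields and types may differ.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto

class TokenKind(Enum):        # illustrative subset of kinds
    Identifier = auto()
    Integer = auto()
    Float = auto()

class FloatAnnotation(Flag):  # combinable annotations, as in Hex | Real
    NoAnnotation = 0
    Hex = auto()
    Real = auto()

@dataclass
class Token:
    start: int    # index in the original encoding; 0 = first code unit after BOM
    value: str    # token text re-encoded as UTF-8
    kind: TokenKind
    annotations: FloatAnnotation = FloatAnnotation.NoAnnotation

# e.g. a hex real literal such as 0x1.8p3
tok = Token(start=12, value="0x1.8p3", kind=TokenKind.Float,
            annotations=FloatAnnotation.Hex | FloatAnnotation.Real)
```

Note that the token carries only the start index, not a line/column pair, which keeps the instance small.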
What about line and column information?
The indices of the first code unit of each line are stored inside
the lexer, and a function will compute a Location (line number,
column number, file specification) for any index. This way the
size of a Token instance is reduced to a minimum. It is assumed
that a Location can be computed on demand and is not needed
frequently, so the column is calculated by a reverse walk to the
previous end of line, etc. It will be possible to calculate
Locations either taking into account special token sequences
(e.g., #line 3 "ab/c.d") or discarding them.
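The line/column lookup described above can be sketched like this. This is a hedged illustration, not the project's code: it uses a binary search over the stored line-start indices to find the line (instead of the literal reverse walk), then the column falls out as the distance back to that line's start.

```python
from bisect import bisect_right

def location(line_starts, index):
    """Map a code-unit index to a 1-based (line, column) pair.

    line_starts is a sorted list of the indices of the first code
    unit of each line, as kept inside the lexer. Nothing is stored
    per token; the Location is computed on demand.
    """
    line = bisect_right(line_starts, index)     # 1-based line number
    column = index - line_starts[line - 1] + 1  # 1-based column
    return line, column

# "ab\ncdef\ng" -> lines start at indices 0, 3, 8
starts = [0, 3, 8]
loc = location(starts, 5)  # index of 'e' on the second line
```

Storing only line starts makes the per-token cost zero and the per-query cost O(log n) in the number of lines, which matches the assumption that Locations are needed infrequently.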
* Does it convert numerical literals and similar to their
actual values
It is planned to add a post-processor for that as part of the
parser; please see README.md for more details.
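A separate conversion pass might look something like the following sketch (hypothetical shapes throughout; tokens are represented here as plain (kind, text) pairs, which is not the actual token type). It walks the token stream once and builds a side table of converted values, leaving the tokens themselves untouched.

```python
def convert_literals(tokens):
    """Post-lexing pass: turn numeric-literal text into native values.

    Returns a dict keyed by token index. Keeping values in a side
    table instead of inside each token keeps Token small and lets
    callers skip this pass entirely when values are not needed.
    """
    values = {}
    for i, (kind, text) in enumerate(tokens):
        if kind == "Integer":
            values[i] = int(text, 0)   # base 0 accepts 0x.., 0o.., decimal
        elif kind == "Float":
            values[i] = float(text)
        # all other token kinds are left alone
    return values

toks = [("Identifier", "x"), ("Integer", "0x1F"), ("Float", "2.5")]
vals = convert_literals(toks)
```

(Real D literals have suffixes and underscores that this toy conversion does not handle.)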
Isn't that a job for the lexer?
That might be done in the lexer for efficiency reasons (to avoid
lexing the token value again). But separating this into a
dedicated post-processing phase leads to a much cleaner design
(IMO), and it also suits uses where such values are not needed. I
also don't think performance would improve, given the ratio of
literals to total tokens and the need to store additional
information per token if conversion is done in the lexer. I will
elaborate on that later.
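The "pay only when you ask" trade-off argued for here can be made concrete with a small memoizing wrapper (again an illustrative Python sketch with hypothetical names, not the project's API): no value storage is added per token, and a literal's text is re-examined at most once, only if some consumer actually requests the value.

```python
class LiteralValues:
    """On-demand, memoized literal conversion over a token stream.

    Tokens are (kind, text) pairs for illustration. Consumers that
    never need literal values pay nothing; others pay one conversion
    per literal they touch.
    """
    def __init__(self, tokens):
        self._tokens = tokens
        self._cache = {}   # token index -> converted value

    def value(self, i):
        if i not in self._cache:
            kind, text = self._tokens[i]
            if kind == "Integer":
                self._cache[i] = int(text, 0)
            elif kind == "Float":
                self._cache[i] = float(text)
            else:
                raise ValueError("token %d is not a numeric literal" % i)
        return self._cache[i]

lits = LiteralValues([("Integer", "42"), ("Identifier", "x")])
v = lits.value(0)   # converted lazily on first request
```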