On Wednesday, August 01, 2012 07:12:30 Christophe Travert wrote:
> "Jonathan M Davis", in message (digitalmars.D:173860), wrote:
> > struct Token
> > {
> >     TokenType type;
> >     string str;
> >     LiteralValue value;
> >     SourcePos pos;
> > }
> > 
> > struct SourcePos
> > {
> >     size_t line;
> >     size_t col;
> >     size_t tabWidth = 8;
> > }
> 
> The occurrence of tabWidth surprises me.
> What is col supposed to be? An index (code unit), a character number
> (code point), or an estimate of where the character is supposed to be
> printed on the line, given the provided tabWidth?

col counts code points. tabWidth is the number of code points that '\t' is 
considered to be. That's it. So, in theory, an editor should be able to use it 
to indicate where on the line the token starts.

If the code using the lexer wants to treat tabs as having the width of a 
single code point, then all it has to do is pass in a SourcePos with a 
tabWidth of 1. But if the lexer doesn't have a way to count tabs differently, 
then there's no way for the code using the lexer to account for the tabs 
without going back and lexing the whitespace itself. And counting tabs as a 
different width than everything else is so common that it seemed prudent to 
add it. 
Given that Phobos doesn't support graphemes and that ranges use code points, a 
code point is the closest to a character that the lexer would be counting, and 
it just makes sense to count code points.
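
To make the counting concrete, here's a rough sketch of what I mean. The 
advance function is purely hypothetical and not part of the proposed lexer, 
and starting line and col at 0 is an assumption made just for this example:

```d
import std.stdio : writeln;

// Same layout as the SourcePos quoted above.
struct SourcePos
{
    size_t line;
    size_t col;
    size_t tabWidth = 8;
}

// Hypothetical helper: advances pos past every code point in text.
// Iterating a string as dchar decodes the UTF-8, so each iteration is one
// code point rather than one code unit.
SourcePos advance(SourcePos pos, string text)
{
    foreach (dchar c; text)
    {
        if (c == '\n')
        {
            ++pos.line;
            pos.col = 0;
        }
        else if (c == '\t')
            pos.col += pos.tabWidth; // a tab counts as tabWidth code points
        else
            ++pos.col;
    }
    return pos;
}

void main()
{
    // A tab, "int", a space, and U+00E9 (two code units, one code point).
    writeln(advance(SourcePos(0, 0, 8), "\tint \u00E9").col); // 13
    writeln(advance(SourcePos(0, 0, 1), "\tint \u00E9").col); // 6
}
```

With a tabWidth of 1, col is simply the number of code points since the start 
of the line.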

Now, the one thing that might be a problem with treating tabs as a fixed width 
is that it's not uncommon to treat tabs as being up to a particular width 
rather than having a fixed width. In that case, if there are other characters 
before a tab, then the tab's width is the max tab width minus the number of 
characters since the start of that tab width. e.g. if tabs had a max width of 
8, then

\t123\t

could have the first tab with a width of 8 and the second as only having a 
width of 5. But that's too complicated to deal with in the lexer IMHO.
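
For reference, code using the lexer could do that calculation itself along 
these lines. Both functions are hypothetical names made up for this sketch, 
and it assumes zero-based columns and text without newlines:

```d
import std.stdio : writeln;

// Hypothetical helper: width of a tab that starts at zero-based column col
// when the max tab width is maxWidth.
size_t tabStopWidth(size_t col, size_t maxWidth)
{
    return maxWidth - col % maxWidth;
}

// Hypothetical helper: zero-based column reached after text, counting each
// non-tab code point as 1.
size_t columnAfter(string text, size_t maxWidth)
{
    size_t col = 0;
    foreach (dchar c; text)
        col += (c == '\t') ? tabStopWidth(col, maxWidth) : 1;
    return col;
}

void main()
{
    writeln(tabStopWidth(0, 8));         // 8: first tab in \t123\t
    writeln(tabStopWidth(11, 8));        // 5: second tab, after "123"
    writeln(columnAfter("\t123\t", 8));  // 16
}
```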

Maybe the tab width isn't worth having in SourcePos and it will ultimately be 
removed, but it struck me as a useful feature, so I added it.

- Jonathan M Davis
