On Wednesday, August 01, 2012 22:33:12 Walter Bright wrote:
> The lexer must use char or it will not be acceptable as anything but a toy
> for performance reasons.
Avoiding decoding can be done with strings and operating on ranges of dchar,
so you'd be operating almost entirely on ASCII. Are you saying that there's a
performance issue aside from decoding?

> Somebody has to convert the input files into dchars, and then back into
> chars. That blows for performance. Think billions and billions of
> characters going through, not just a few random strings.

Why is there any converting to dchar going on here? I don't see why any would
be necessary. If you're reading in a file as a string or char[] (as would be
typical), then you're operating on a string, and the only time that any
decoding will be necessary is when you actually need to operate on a Unicode
character, which is very rare in D's grammar (see the sketch below). It's
only when operating on something _other_ than a string that you'd have to
actually deal with dchars.

> > Hmmm. Well, I'd still argue that that's a parser thing. Pretty much
> > nothing else will care about it. At most, it should be an optional
> > feature of the lexer. But it certainly could be added that way.
>
> I hate to say "trust me on this", but if you don't, have a look at dmd's
> lexer and how it handles identifiers, then look at dmd's symbol table.

My point is that it's the sort of thing that _only_ a parser would care
about. So, unless it _needs_ to be in the lexer for some reason, it
shouldn't be.

- Jonathan M Davis
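
A rough sketch of the approach described above (stay on char for the ASCII
fast path and decode only when a non-ASCII code unit actually turns up). This
is not code from the thread or from dmd; the helper name lexIdentifierStart
is made up for this example:

// Minimal sketch, assuming the lexer works directly on UTF-8 code units.
// Only when a code unit >= 0x80 appears (e.g. in an identifier) does it
// fall back to std.utf.decode for that one character.
import std.uni : isAlpha;
import std.utf : decode;

bool lexIdentifierStart(const(char)[] src, ref size_t i)
{
    char c = src[i];
    if (c < 0x80)  // ASCII fast path: no decoding at all
    {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_')
        {
            ++i;
            return true;
        }
        return false;
    }
    // Rare case: a non-ASCII code unit, so decode just this one character.
    size_t j = i;
    dchar d = decode(src, j);
    if (isAlpha(d))
    {
        i = j;
        return true;
    }
    return false;
}

void main()
{
    size_t i = 0;
    assert(lexIdentifierStart("foo", i) && i == 1);
    i = 0;
    assert(lexIdentifierStart("λx", i) && i == 2); // 'λ' is 2 UTF-8 code units
    i = 0;
    assert(!lexIdentifierStart("1x", i));
}

The point of the sketch is that the input stays char[] throughout; a dchar is
materialized only for the occasional non-ASCII character, so there is no
whole-file conversion to dchar and back.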