On 8/1/2012 11:56 PM, Jonathan M Davis wrote:
Another thing that I should point out is that a range of UTF-8 or UTF-16
wouldn't work with many range-based functions at all. Most of std.algorithm
and its ilk would be completely useless. Range-based functions operate on a
range's elements, so operating on a range of code units would mean operating on
code units, which is going to be _wrong_ almost all the time. Think about what
would happen if you used a function like map or filter on a range of code
units. The resultant range would be completely invalid as far as Unicode goes.
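
For instance (a minimal illustrative sketch, not code from the thread; a ubyte[] stands in for a non-decoding range of UTF-8 code units, since a D string used as a range decodes to dchar):

    import std.algorithm : filter;
    import std.array : array;

    void main()
    {
        // "aé" as raw UTF-8 code units; 'é' is the two-unit sequence 0xC3 0xA9.
        ubyte[] units = [0x61, 0xC3, 0xA9];

        // filter sees individual code units, not characters, so it can drop
        // half of a multibyte sequence.
        auto broken = units.filter!(u => u != 0xC3).array;

        // broken is now [0x61, 0xA9]; the lone 0xA9 is a continuation byte
        // with no lead byte, so the result is not valid UTF-8.
        assert(broken.length == 2 && broken[1] == 0xA9);
    }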

My experience writing fast string-based code that works on UTF-8 and correctly handles multibyte characters is that such code is entirely possible and practical, and it is faster.
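
Here is a rough sketch of that style of code (illustrative only, not DMD's or std.d.lexer's actual implementation; skipIdentifier is a made-up helper, and std.uni.isAlpha merely stands in for D's real identifier rules):

    import std.uni : isAlpha;
    import std.utf : decode;

    // Every token-significant character in D source is ASCII, so the scanner
    // works on code units and only decodes when a lead byte >= 0x80 shows up
    // (here, inside an identifier).
    size_t skipIdentifier(string src, size_t i)
    {
        while (i < src.length)
        {
            immutable char c = src[i];
            if (c < 0x80)
            {
                if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                    (c >= '0' && c <= '9') || c == '_')
                {
                    ++i;          // fast path: single ASCII code unit
                    continue;
                }
                break;            // ASCII non-identifier character ends the token
            }
            size_t next = i;
            immutable dchar d = decode(src, next);  // slow path: multibyte sequence
            if (!isAlpha(d))
                break;
            i = next;
        }
        return i;
    }

UTF-8 continuation bytes (0x80-0xBF) can never collide with ASCII token characters, so the fast path stays correct for multibyte identifiers without decoding anything it doesn't have to.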

The lexer MUST MUST MUST be FAST FAST FAST, or it will not be useful. If it isn't fast, serious users will eschew it and cook up their own. You'll have a nice, pretty, useless toy of a std.d.lexer.

I think there's some serious underestimation of how critical this is.



Range-based functions need to be operating on _characters_. Technically, not
even code points get us there, so it's _still_ buggy. It's just a _lot_
closer to being correct and works 99.99+% of the time.

Multi-code-point characters (i.e. graphemes) are quite irrelevant to the correctness of a D lexer.
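
For example (an illustrative sketch, not actual lexer code, with escape sequences left out to keep it short): the contents of a string literal can be sliced through as raw bytes, so a grapheme built from several code points, such as 'e' followed by a combining U+0301, needs no special handling.

    // The contents of a string literal are sliced as raw UTF-8 and passed
    // through byte-for-byte; the lexer never needs to know about graphemes.
    string lexStringLiteral(string src, ref size_t i)
    {
        assert(i < src.length && src[i] == '"');
        immutable start = ++i;              // skip the opening quote
        while (i < src.length && src[i] != '"')
            ++i;                            // code units >= 0x80 never equal '"'
        string contents = src[start .. i];
        if (i < src.length)
            ++i;                            // skip the closing quote
        return contents;
    }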


If we want to be able to operate on ranges of UTF-8 or UTF-16, we need to add
a concept of variable-length encoded ranges so that it's possible to treat
them as both their encoding and whatever they represent (e.g. code point or
grapheme in the case of ranges of code units).

No, this is not necessary.
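
A D string already offers both views without any new range concept (illustrative sketch only): std.string.representation exposes the code units, and a foreach with a dchar loop variable decodes code points on the fly.

    import std.string : representation;

    void main()
    {
        string s = "étage";

        // Code-unit view: the raw UTF-8 bytes, handy for a lexer's fast path.
        foreach (ubyte u; s.representation)
        {
            // ... byte-level work ...
        }

        // Code-point view: foreach with a dchar loop variable decodes as it goes.
        foreach (dchar c; s)
        {
            // ... code-point-level work ...
        }
    }

Both views look at the same immutable(char)[] buffer; no wrapper range type is involved.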
