On Monday 22 November 2010 16:45:43 Andrei Alexandrescu wrote:
> On 11/22/10 5:59 PM, foobar wrote:
> > Canonical example: DNA.
> > I shouldn't need to write a special function to print it since it IS a
> > string. I shouldn't need to cast it in order to do operations on it like
> > sort, find, etc.
>
> I think it's best to encode DNA strings as sequences of ubyte. UTF
> routines will work slower on them than functions for ubyte.
>
> > D's [w|D|]char types make no sense since they are NOT characters and the
> > concept doesn't fit for unicode since as someone else wrote, there are
> > different levels of abstractions in unicode (code point, code unit,
> > grapheme). Naming matters and having a cat called dog (char is actually
> > code unit) is a source of bugs.
>
> I think the names are fine. It doesn't take much learning to understand
> that char, wchar, and dchar are UTF-8, UTF-16, and UTF-32 code units
> respectively. I mean it would be odd if they were something else.

The problem with char is that so many people are used to thinking of char as a character rather than a code unit. Once you get past that, though, it's fine. I think that it's very well thought out as it is. It just takes some getting used to. Unfortunately, though, it seems that thinking of a char as a UTF-8 code unit and _never_ dealing with it as a character is hard for a lot of people to adjust to.

- Jonathan M Davis
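
To make the code unit vs. character distinction concrete, here is a minimal sketch in D. The helper names codeUnits and codePoints are made up for illustration and are not part of Phobos; the only library call assumed is std.range.walkLength, which walks a string as a range of decoded dchar.

```d
import std.range : walkLength;

// Hypothetical helper: number of UTF-8 code units (what .length reports).
size_t codeUnits(string s)
{
    return s.length;
}

// Hypothetical helper: number of code points, obtained by decoding the string.
size_t codePoints(string s)
{
    return s.walkLength;
}

void main()
{
    // U+00E9 is a single code point that takes two UTF-8 code units.
    string s = "h\u00E9llo";

    assert(codeUnits(s) == 6);   // char counts code units
    assert(codePoints(s) == 5);  // decoding yields code points

    // Indexing a string yields one code unit, not a character:
    // s[1] is only the first byte of the two-byte sequence for U+00E9.
    assert(s[1] == 0xC3);

    // Asking foreach for dchar decodes whole code points instead.
    size_t n;
    foreach (dchar c; s)
        ++n;
    assert(n == 5);

    // The same text as UTF-16 and UTF-32 code units.
    wstring w = "h\u00E9llo"w;
    dstring d = "h\u00E9llo"d;
    assert(w.length == 5);
    assert(d.length == 5);
}
```

Note that even a dchar is a code point rather than a grapheme, so a combining sequence would still count as more than one element here.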