On 11/22/10 5:59 PM, foobar wrote:
> Canonical example: DNA. I shouldn't need to write a special function to print it since it IS a string. I shouldn't need to cast it in order to do operations on it like sort, find, etc.
I think it's best to encode DNA strings as sequences of ubyte. UTF routines decode as they go, so they will work slower on such data than functions that operate on plain ubyte.
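A minimal sketch of the ubyte approach, assuming current Phobos (`std.string.representation`, `std.algorithm.sort`, `std.algorithm.canFind`). The helper `sortedBases` is hypothetical and exists only for illustration. The point is that sorting and searching operate on plain bytes with no UTF decoding.

```d
import std.algorithm : canFind, sort;
import std.string : representation;

/// Hypothetical helper: returns a sorted copy of a DNA sequence held as raw bytes.
ubyte[] sortedBases(const(ubyte)[] dna)
{
    ubyte[] copy = dna.dup;
    sort(copy); // plain byte comparisons; no UTF decoding involved
    return copy;
}

void main()
{
    import std.stdio : writeln;

    // Store the sequence as ubyte[] rather than as a UTF-8 string.
    ubyte[] dna = "GATTACA".dup.representation;

    // Reinterpret the bytes as chars only when printing.
    writeln(cast(const(char)[]) sortedBases(dna)); // prints AAACGTT
    writeln(dna.canFind(cast(ubyte) 'C'));          // prints true
}
```

Printing still needs a cast back to chars, which is the trade-off foobar objects to; in exchange, every algorithm sees a random-access range of bytes.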
> D's [w|d|]char types make no sense since they are NOT characters, and the concept doesn't fit Unicode since, as someone else wrote, there are different levels of abstraction in Unicode (code point, code unit, grapheme). Naming matters, and having a cat called dog (char is actually a code unit) is a source of bugs.
I think the names are fine. It doesn't take much learning to understand that char, wchar, and dchar are UTF-8, UTF-16, and UTF-32 code units, respectively. I mean, it would be odd if they were something else.
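To make the code-unit reading concrete, here is a small sketch (assuming current D and Phobos, including its auto-decoding of narrow strings) that stores one code point, U+1D11E, in each string type. `.length` counts code units, while `walkLength` decodes to code points. The constant names are illustrative only.

```d
import std.range : walkLength;
import std.stdio : writeln;

// U+1D11E MUSICAL SYMBOL G CLEF: a single code point outside the BMP.
enum string  clef8  = "\U0001D11E";
enum wstring clef16 = "\U0001D11E"w;
enum dstring clef32 = "\U0001D11E"d;

void main()
{
    // .length counts code units, not characters.
    writeln(clef8.length);  // 4 UTF-8 code units (char)
    writeln(clef16.length); // 2 UTF-16 code units (wchar): a surrogate pair
    writeln(clef32.length); // 1 UTF-32 code unit (dchar)

    // Phobos treats narrow strings as ranges of dchar (auto-decoding),
    // so walkLength counts code points for all three.
    writeln(clef8.walkLength, " ", clef16.walkLength, " ", clef32.walkLength); // 1 1 1
}
```

Graphemes sit one level above this; a code point is not always what a user perceives as a single character.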
Andrei