On Mon, 24 Oct 2011 19:49:43 -0400, Simen Kjaeraas
<simen.kja...@gmail.com> wrote:
> On Mon, 24 Oct 2011 21:41:57 +0200, Steven Schveighoffer
> <schvei...@yahoo.com> wrote:
>> Plus, a combining character (such as an umlaut or accent) is part of a
>> character, but may be a separate code point.
> If this is correct (and it is), then decoding to dchar is simply not
> enough.
> You seem to advocate decoding to graphemes, which is a whole different
> matter.
I am advocating that. And it's a matter of perception. D can say "we
only support code-point decoding", but what that means to a user is "we
don't support language as you know it." Sure, code points are a part of
Unicode, but it takes that extra piece (graphemes) to make it actually
usable to people who require Unicode.
Even in English, fiancé has an accent. To say D supports Unicode, and
then fail a simple search on a file which contains a certain *valid*
encoding of that word (e followed by the combining acute accent U+0301,
rather than the single code point U+00E9), is disingenuous to say the
least.
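To make the failure concrete, here is a minimal sketch in Python (not D,
purely for illustration). The names precomposed, decomposed, text and
contains_normalized are hypothetical, made up for this example. It shows
a plain code-point search missing the word, and a search over
NFC-normalized text finding it. Normalization is only one piece of the
puzzle and is not the same thing as full grapheme-aware matching.

```python
import unicodedata

# Two valid Unicode encodings of the same word.
precomposed = "fianc\u00e9"   # é as the single code point U+00E9
decomposed = "fiance\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT


def contains_normalized(haystack: str, needle: str) -> bool:
    """Substring search after normalizing both sides to NFC.

    Hypothetical helper for this example: it handles canonically
    equivalent encodings, but it is not a full grapheme-aware search.
    """
    return (unicodedata.normalize("NFC", needle)
            in unicodedata.normalize("NFC", haystack))


# A "file" that happens to store the word in decomposed form.
text = "My " + decomposed + " is here."

# The two spellings look identical but differ in code-point count.
print(len(precomposed), len(decomposed))

# Naive code-point search: misses the word.
print(precomposed in text)

# Search over normalized text: finds it.
print(contains_normalized(text, precomposed))
```

The user typed the same word both times; only the search that looks
past raw code points agrees with them.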
D needs a fully Unicode-aware string type. I advocate D should use it as
the default string type, but it needs one whether it's the default or not
in order to say it supports Unicode.
-Steve