On Mon, 24 Oct 2011 19:49:43 -0400, Simen Kjaeraas <simen.kja...@gmail.com> wrote:

On Mon, 24 Oct 2011 21:41:57 +0200, Steven Schveighoffer <schvei...@yahoo.com> wrote:

Plus, a combining character (such as an umlaut or accent) is part of a
character, but may be a separate code point.

If this is correct (and it is), then decoding to dchar is simply not enough. You seem to advocate decoding to graphemes, which is a whole different matter.

I am advocating that. And it's a matter of perception. D can say "we only support code-point decoding" and what that means to a user is, "we don't support language as you know it." Sure, code-point decoding is a part of Unicode, but it takes that extra piece (grapheme awareness) to make it actually usable to people who require Unicode.

Even in English, fiancé has an accent. To say D supports Unicode, but then won't do a simple search on a file which contains a certain *valid* encoding of that word, is disingenuous to say the least.
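To make the search problem concrete, here is a minimal sketch. It is written in Python only because its standard library ships a normalization function; nothing here is D or Phobos API, and the helper names naive_contains and normalized_contains are made up for illustration. It shows two valid encodings of the same word that a code-point comparison treats as different:

```python
import unicodedata

# Two valid encodings of the same word.
composed = "fianc\u00e9"     # e-acute as one precomposed code point (U+00E9)
decomposed = "fiance\u0301"  # plain 'e' followed by U+0301 COMBINING ACUTE ACCENT


def naive_contains(haystack, needle):
    """Code-point-level search: compares code point sequences as-is."""
    return needle in haystack


def normalized_contains(haystack, needle):
    """Search after bringing both sides to the same normal form (NFC).

    Hypothetical helper; normalization is one way to make the two
    encodings compare equal, not a full grapheme-aware string type.
    """
    def nfc(s):
        return unicodedata.normalize("NFC", s)

    return nfc(needle) in nfc(haystack)


text = "my " + decomposed
print(len(composed), len(decomposed))       # code point counts differ
print(naive_contains(text, composed))       # misses the word
print(normalized_contains(text, composed))  # finds it
```

Normalizing both sides is only a partial answer: it makes these two spellings match, but it still does not let you index or slice by what a reader sees as one character.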

D needs a fully Unicode-aware string type. I advocate D should use it as the default string type, but it needs one whether it's the default or not in order to say it supports Unicode.

-Steve
