On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote:
> On topic, I think D's implicit default decode to dchar is
> *infinity* times better than C++'s char-based strings. While
> imperfect in terms of grapheme, it was still a design decision
> made of win.
> I'd be tempted to not ask "how do we back out", but rather,
> "how can we take this further"? I'd love to ditch the whole
> "char"/"dchar" thing altogether, and work with graphemes. But
> that would be massive involvement.
Why do you think it is better?
Let's be clear here: if you are searching/iterating/comparing by
code point then your program is either not correct, or no better
than doing so by code unit. Graphemes don't really fix this
either.
I think this is the main confusion: the belief that iterating by
code point has utility.
If you care about normalization, then none of iterating by code
unit, by code point, or by grapheme is correct (except in certain
language subsets).
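To make that concrete, here is a minimal Python sketch (the point is language-agnostic; the string literals are illustrative). The same user-perceived character "é" has two canonical encodings, and naive comparison at the code-unit or code-point level reports them unequal unless you normalize first:

```python
import unicodedata

# Two representations of the same user-perceived character "é":
nfc = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE
nfd = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# They differ as code units and as code points, so comparison at
# either level says they are different strings.
assert nfc != nfd
assert (len(nfc), len(nfd)) == (1, 2)

# Only after normalizing both sides to a common form do they match.
assert unicodedata.normalize("NFC", nfd) == nfc
```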
If you don't care about normalization, then iterating by code unit
is just as good as iterating by code point, and you avoid the need
to specialise everywhere in Phobos.
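A sketch of why this holds, assuming well-formed UTF-8 (the example strings are hypothetical): UTF-8 is self-synchronizing, so a byte-level search for a valid UTF-8 needle can only match at a code-point boundary, and therefore finds exactly the same matches as a code-point-level search:

```python
haystack = "naïve café"
needle = "café"

# Code-point search: Python's str.find indexes by code point.
cp_index = haystack.find(needle)

# Code-unit search: the same search over the raw UTF-8 bytes.
hb = haystack.encode("utf-8")
nb = needle.encode("utf-8")
byte_index = hb.find(nb)

# The byte-level match starts on a code-point boundary; decoding the
# prefix maps it back to the same code-point index.
assert len(hb[:byte_index].decode("utf-8")) == cp_index
```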
AFAIK, there is only one exception: code like s.all!(c => c ==
'é'). But as Vladimir correctly points out: (a) even by code point,
this is still broken in the face of normalization, and (b) are
there any real applications that search a string for a specific
non-ASCII character?
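Point (a) can be demonstrated directly. Below is a Python model of the D expression s.all!(c => c == 'é') -- "does every decoded code point equal U+00E9?" -- since the decode-to-code-point semantics are the same; the three-character test string is illustrative:

```python
import unicodedata

# Model of s.all!(c => c == 'é') over decoded code points.
s_nfc = "\u00e9" * 3                          # "ééé", precomposed
s_nfd = unicodedata.normalize("NFD", s_nfc)   # same text, decomposed

assert all(c == "\u00e9" for c in s_nfc)      # holds on NFC input
assert not all(c == "\u00e9" for c in s_nfd)  # breaks on NFD: 'e' + accent
```

The predicate gives different answers for two strings that render identically, which is exactly the normalization problem.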
To those that think the status quo is better, can you give an
example of a real-life use case that demonstrates this?
I do think it's probably too late to change this, but I think
there is value in at least getting everyone on the same page.