On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote:
> On topic, I think D's implicit default decoding to dchar is *infinitely* better than C++'s char-based strings. While imperfect in terms of graphemes, it was still a design decision made of win.

> I'd be tempted not to ask "how do we back out", but rather "how can we take this further"? I'd love to ditch the whole "char"/"dchar" thing altogether and work with graphemes. But that would be a massive undertaking.

Why do you think it is better?

Let's be clear here: if you are searching, iterating, or comparing by code point, then your program is either not correct, or it is no better than one that does the same by code unit. Graphemes don't really fix this either.
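
For concreteness, here is a minimal D sketch of what I mean (assuming current Phobos; std.uni.byGrapheme may be newer than this thread):

import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string nfc = "café";        // 'é' precomposed: U+00E9
    string nfd = "cafe\u0301";  // 'e' + combining acute: U+0065 U+0301

    // By code point, two spellings of the same text disagree on
    // length and compare unequal -- no better than code units.
    assert(nfc.walkLength == 4);
    assert(nfd.walkLength == 5);
    assert(nfc != nfd);

    // Graphemes fix the counting, but the strings still compare
    // unequal until they are normalized.
    assert(nfc.byGrapheme.walkLength == 4);
    assert(nfd.byGrapheme.walkLength == 4);
}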

I think this is the main confusion: the belief that iterating by code point has utility.

If you care about normalization, then code units, code points, and graphemes are all equally incorrect (except in certain language subsets): two canonically equivalent strings can differ at every one of those levels.
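
A small sketch of that, assuming std.uni.normalize:

import std.uni : NFC, normalize;

void main()
{
    string nfd = "cafe\u0301";  // decomposed
    string nfc = "café";        // precomposed

    // Canonically equivalent text, yet unequal at the code unit,
    // code point, and grapheme level alike...
    assert(nfd != nfc);

    // ...until both sides are brought to a common form.
    assert(normalize!NFC(nfd) == nfc);
}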

If you don't care about normalization, then working by code unit is just as good as working by code point, and it doesn't require specialising everywhere in Phobos.
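
For example (a sketch assuming std.string.representation to get at the raw UTF-8):

import std.algorithm : canFind;
import std.string : representation;

void main()
{
    string s = "日本語のテキスト";

    // UTF-8 is self-synchronizing: a lead byte never looks like a
    // continuation byte, so a code unit (ubyte) search finds exactly
    // the same matches as a decoded code point search.
    assert(s.canFind("本語"));                                // by code point
    assert(s.representation.canFind("本語".representation));  // by code unit
}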

AFAIK, there is only one exception: stuff like s.any!(c => c == 'é'). But as Vladimir correctly points out, (a) by code point this is still broken in the face of normalization, and (b) are there any real applications that search a string for a specific non-ASCII character?
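
To illustrate (a), a sketch using a decomposed string:

import std.algorithm : any;

void main()
{
    string s = "cafe\u0301";  // "café" with 'é' in decomposed (NFD) form

    // The code point search misses: no single code point in the
    // string equals U+00E9, even though the text plainly contains
    // an 'é' as far as any reader is concerned.
    assert(!s.any!(c => c == 'é'));
}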

To those who think the status quo is better: can you give an example of a real-life use case that demonstrates it?

I do think it's probably too late to change this, but there is value in at least getting everyone on the same page.
