On 3/8/14, 12:38 PM, Vladimir Panteleev wrote:
On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu wrote:
1. All algorithms would by default operate on strings at char/wchar
level (i.e. code unit). That would cause the usual issues and
confusions I was aware of from C++. Certain algorithms would require
specialization and/or the user using byDchar for correctness.
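
For concreteness, a minimal sketch of the code unit / code point distinction and the opt-in decoding adapter. It uses today's std.utf names (byCodeUnit alongside the byDchar mentioned above); byCodeUnit is an assumption on my part here:

import std.range : walkLength;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "café";                    // the 'é' is two UTF-8 code units
    assert(s.length == 5);                // .length always counts code units
    assert(s.byCodeUnit.walkLength == 5); // code-unit-level iteration
    assert(s.byDchar.walkLength == 4);    // opt-in decoding to code points
}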

As previously discussed, "correctness" here is conditional. I would not
use that word; it is another extreme.

Agreed.

From experience with C++ I knew (1) had a bad track record, and (2)
"generically conservative, specialize for speed" was a successful
pattern.
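
To illustrate the pattern (a hedged sketch; countSpaces is a made-up example, not anything in Phobos): a conservative generic overload, plus a narrow-string specialization that skips decoding because looking for an ASCII byte in valid UTF-8 is safe at the code-unit level.

import std.range.primitives : empty, front, popFront, isInputRange;
import std.traits : isSomeString;

// Conservative generic version: walks the range element by element
// (for built-in strings this would mean decoding to dchar).
size_t countSpaces(R)(R r)
    if (isInputRange!R && !isSomeString!R)
{
    size_t n;
    for (; !r.empty; r.popFront())
        if (r.front == ' ') ++n;
    return n;
}

// Specialization for narrow strings: ' ' is ASCII, and ASCII bytes never
// occur inside a multi-byte UTF-8 sequence, so raw code units suffice.
size_t countSpaces(const(char)[] s)
{
    size_t n;
    foreach (c; s)        // iterates code units, no decoding
        if (c == ' ') ++n;
    return n;
}

void main()
{
    assert(countSpaces("a b c") == 2);
    assert(countSpaces([1, 2, 3]) == 0);
}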

What would you have chosen given that context?

Ideally, we would have the Unicode algorithms in the standard library
from day 1, and advocated their use throughout the documentation.

It's not late to do a lot of that.
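
A fair amount already exists in std.uni, for instance grapheme iteration and normalization. A minimal sketch, assuming the current std.uni names:

import std.range : walkLength;
import std.uni : byGrapheme, normalize;

void main()
{
    string s = "e\u0301";                 // 'e' + combining acute, renders as "é"
    assert(s.walkLength == 2);            // code points
    assert(s.byGrapheme.walkLength == 1); // user-perceived characters
    assert(normalize(s) == "\u00E9");     // NFC composes it into one code point
}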

I'm inclined to say that the correct approach is to
state that algorithms operate explicitly on a T.sizeof basis, and that if
the data contained in a particular range has some multi-element encoding,
then separate, specialized routines should be used when the T.sizeof
behavior will not produce the desired result.
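
A sketch of how that split would play out (byCodeUnit as the explicit code-unit view is my assumption): substring search is already correct on raw code units, while anything character-oriented goes through the separate, decoding-aware routines.

import std.algorithm.searching : canFind;
import std.utf : byCodeUnit;

void main()
{
    auto hay = "naïve résumé".byCodeUnit;

    // Substring search is safe on raw code units: a well-formed UTF-8
    // sequence never matches starting in the middle of another sequence.
    assert(hay.canFind("résumé".byCodeUnit));

    // Counting characters, case mapping, grapheme segmentation, etc. would
    // need the specialized routines instead of the T.sizeof behavior.
}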

That sounds quite like C++ plus ICU. It doesn't strike me as the
gold standard for Unicode integration.

Why not? Because it sounds like D needs exactly that. Plus its amazing
slicing and range capabilities, of course.

Pretty much everyone using ICU hates it.

So the problem to me is that we're stuck not fixing something that's
horribly broken just because it's broken in a way that people presumably
now expect.

Clearly I'm being subjective here, but again, I'd find it hard to be
convinced that we have something horribly broken, given the evidence I've
gathered inside and outside Facebook.

Have you or anyone you personally know tried to process text in D
containing a writing system such as Sanskrit's?

No. Point being?
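
For what it's worth, the point is that scripts like Devanagari (which Sanskrit is commonly written in) build one user-perceived character out of several code points, so per-dchar processing splits them. A minimal sketch, assuming std.uni's byGrapheme implements the UAX #29 extended grapheme cluster rules (including the SpacingMark rule):

import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // Devanagari letter NA (U+0928) + vowel sign I (U+093F):
    // one user-perceived character, two code points, six UTF-8 code units.
    string s = "\u0928\u093F";
    assert(s.length == 6);                // code units
    assert(s.walkLength == 2);            // code points (dchar level)
    assert(s.byGrapheme.walkLength == 1); // graphemes
}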

I'd personally like to see this fixed and I think the new behavior is
preferable overall, but I do share Andrei's concern that such a big
change might hurt the language anyway.

I've said this once and I'm saying it again: the best way to convert
this discussion into something useful is to devise ideas for useful
non-breaking additions.

I disagree. As I've argued, I believe that currently most uses of dchar
in an application are incorrect, and ultimately a time bomb for proper
internationalization support. We need to apply the same procedure we use
for any language construct deemed a poor decision: put it through a
deprecation cycle and fix it.
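
One small example of the kind of time bomb I mean (a hedged sketch; the combining-mark input is just an illustration): reversing "by character" at the dchar level silently corrupts combining sequences.

import std.algorithm.mutation : reverse;

void main()
{
    // "noe" + U+0308 COMBINING DIAERESIS + "l" renders as "noël".
    dchar[] s = "noe\u0308l"d.dup;
    reverse(s);                       // code-point-level reversal
    assert(s == "l\u0308eon"d);       // the diaeresis now sits on the 'l' -- wrong
}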

I think the risks of that are too large, and it's quite unclear that it solves a real problem. "Slightly better Unicode support" is hardly a good justification.


Andrei
