On 3/8/14, 1:13 PM, Vladimir Panteleev wrote:
On Saturday, 8 March 2014 at 20:50:49 UTC, Andrei Alexandrescu wrote:
On 3/8/14, 12:38 PM, Vladimir Panteleev wrote:
On Saturday, 8 March 2014 at 20:05:36 UTC, Andrei Alexandrescu wrote:
That sounds quite like C++ plus ICU. It doesn't strike me as the
golden standard for Unicode integration.

Why not? Because it sounds like D needs exactly that. Plus its amazing
slicing and range capabilities, of course.

Pretty much everyone using ICU hates it.

I admit I never used it personally.

Time to do due diligence :o).

I just thought that what you described
implied "D implementations of relevant Unicode algorithms, adapted to D
style (range interface)". Is there more to this than the limitations of
C++ or the implementers' design choices?

Have you or anyone you personally know tried to process text in D
containing a writing system such as Sanskrit's?

No. Point being?

Point being, we don't have solid data to conclude whether D's current
approach is actually good enough for such cases as you claim.

My only claim is that recognizing and iterating strings by code point is better than doing things by the octet.
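
For readers following along, the distinction between octets, code points, and user-perceived characters can be sketched as below. This is an illustration only, not code from the thread: the helper names `codeUnits`, `codePoints`, and `graphemes` are hypothetical, and the sketch assumes Phobos' default behaviour of decoding narrow strings when they are used as ranges.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: number of UTF-8 code units (octets) in the string.
size_t codeUnits(string s) { return s.length; }

// Hypothetical helper: number of code points; range primitives decode
// narrow strings, so walkLength counts dchar elements.
size_t codePoints(string s) { return s.walkLength; }

// Hypothetical helper: number of grapheme clusters (user-perceived characters).
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // "noel" written with 'e' followed by U+0308 COMBINING DIAERESIS
    string s = "noe\u0308l";
    assert(codeUnits(s) == 6);   // U+0308 occupies two octets in UTF-8
    assert(codePoints(s) == 5);  // the combining mark is its own code point
    assert(graphemes(s) == 4);   // 'e' plus the mark form one grapheme
}
```

The three counts differ for the same string, which is why iterating by code point is closer to the text than iterating by octet, yet still not the same as iterating by what a reader would call a character.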

We do have one post in this thread:
http://forum.dlang.org/post/jlgfkxlrhlzdpwkps...@forum.dlang.org

I think the risks are too large for that,

For what? We have not discussed a possible plan yet. Are you referring
to Walter Bright's proposal?

Any plan to inflict a large breaking change for strings incurs a risk. To add insult to injury, the improvement brought about by the change is debatable.

and it's quite unclear this is solving a problem. "Slightly better
Unicode support" is hardly a good justification.

What this will solve:

1. Eliminating dangerous constructs, such as s.countUntil and s.indexOf
both returning integers, yet possibly having different values in
circumstances that the developer may not foresee.

I disagree there's any danger. They deal in code points, end of story.
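
The disagreement in point 1 can be made concrete with a short sketch. It is an illustration rather than code from either poster: the wrapper names `offsetInCodePoints` and `offsetInCodeUnits` are hypothetical, while `countUntil` (std.algorithm) and `indexOf` (std.string) are the Phobos functions named above.

```d
import std.algorithm : countUntil;
import std.string : indexOf;

// Hypothetical wrapper: offset of c measured in code points
// (the number of elements popped from the range before the match).
ptrdiff_t offsetInCodePoints(string s, dchar c) { return s.countUntil(c); }

// Hypothetical wrapper: offset of c measured in UTF-8 code units,
// which is what string indexing and slicing expect.
ptrdiff_t offsetInCodeUnits(string s, dchar c) { return s.indexOf(c); }

void main()
{
    string s = "\u00E9x"; // U+00E9 (e with acute) followed by 'x'
    assert(offsetInCodePoints(s, 'x') == 1);
    assert(offsetInCodeUnits(s, 'x') == 2);
    assert(s[offsetInCodeUnits(s, 'x') .. $] == "x");
    // s[offsetInCodePoints(s, 'x') .. $] would start inside the two-byte
    // encoding of U+00E9, so that slice would not be valid UTF-8.
}
```

Both results have an integer type and agree on pure ASCII input, so mixing them up only shows once non-ASCII text appears. Whether that counts as a danger or as documented behaviour is exactly what the two posters dispute.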

2. Very high complexity of implementations (the ElementEncodingType
problem previously mentioned).

I disagree with "very high". Besides, if you want to do Unicode you gotta crack some eggs.

3. Hidden, difficult-to-detect performance problems. The reason why this
thread was started. I've had to deal with them in several places myself.

I disagree with "hidden, difficult to detect". Also, I'd add that I'd rather not have hidden, difficult to detect correctness problems.
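
As a rough sketch of the trade-off in point 3 (an illustration, not a measurement and not code from the thread): the same search can run over decoded code points or over raw code units. The function names `findDecoded` and `findRaw` are hypothetical, and the raw variant assumes an ASCII needle, since only then is a byte-wise match guaranteed to be a whole character.

```d
import std.algorithm : countUntil;
import std.string : representation;

// Hypothetical helper: searches the string as a range of code points,
// so each element is decoded before it is compared.
// The result is an offset in code points.
ptrdiff_t findDecoded(string s, char needle)
{
    return s.countUntil(needle);
}

// Hypothetical helper: searches the raw UTF-8 code units without decoding.
// Assumption: needle is ASCII; ASCII bytes never occur inside a
// multi-byte UTF-8 sequence. The result is an offset in code units.
ptrdiff_t findRaw(string s, char needle)
{
    assert(needle < 0x80, "ASCII needle expected");
    return s.representation.countUntil(cast(ubyte) needle);
}

void main()
{
    string s = "\u00E9x";
    assert(findDecoded(s, 'x') == 1); // counted in code points
    assert(findRaw(s, 'x') == 2);     // counted in code units
}
```

The two calls look nearly identical at the call site, which is the sense in which the decoding cost can go unnoticed. The raw variant avoids that cost but hands correctness responsibility back to the caller, which is the concern raised in the reply.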

4. Encourage D programmers to write Unicode-capable code that is correct
in the full sense of the word.

I disagree we are presently discouraging them. I do agree a change would make certain things clearer. But not nearly enough to make up for the breakage.

I think the above list has enough weight to merit at least considering
*some* breaking changes.

I think a better approach is to figure out what to add.


Andrei
