On Sunday, 9 March 2014 at 03:26:40 UTC, Andrei Alexandrescu wrote:
And it's not like people aren't talking. In contrast, D has been (and often rightly) criticized in the past for things like floating point performance and garbage collection. No evidence we are having an acute performance problem with UTF strings.

The size of this thread is one factor. But I see your point - I agree that is evidently not one of D's more glaring current problems. I hope I never alluded to that not being the case. That doesn't mean the problem doesn't exist at all, though.

If UTF decoding was explicit, the problem would stand out. I don't think
this is a valid argument.

Yours? Indeed it isn't, if what you want is to iterate by code unit (= meaningless for all but ASCII strings) by default.

I don't understand this argument. Iterating by code unit is not meaningless if you don't want to extract meaning from each unit iteration. For example, if you're parsing JSON or XML, you only care about the syntax characters, which are all ASCII. And there is no confusion of "what exactly are we counting here".
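To make the JSON point concrete, here is a minimal sketch of my own (not code from Phobos or from anyone in this thread). The function name countStructural is made up for illustration, and it deliberately ignores whether a character sits inside a string literal, which a real parser would have to track. It relies only on the fact that `foreach (char c; s)` walks a string by code unit without decoding.

```d
import std.stdio : writeln;

/// Hypothetical helper: counts JSON structural characters by walking
/// UTF-8 code units. It does not track string-literal context.
size_t countStructural(string s)
{
    size_t n = 0;
    // Asking for `char` makes foreach iterate code units; nothing is decoded.
    foreach (char c; s)
    {
        switch (c)
        {
            case '{', '}', '[', ']', ':', ',':
                ++n;
                break;
            default:
                break;
        }
    }
    return n;
}

void main()
{
    // Every code unit of a multi-byte UTF-8 sequence is >= 0x80, so it can
    // never be mistaken for one of the ASCII syntax characters above.
    string json = `{"имя": "Владимир", "tags": ["ü", "日本"]}`;
    writeln(countStructural(json));
}
```

The non-ASCII payload passes through untouched, and the only thing being counted is code units that match ASCII syntax, so the "what exactly are we counting" question does not come up.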

This was debated... people should not be looking at individual code
points, unless they really know what they're doing.

Should they be looking at code units instead?

No. They should only be looking at substrings.

Unless they're e.g. parsing a computer language (regardless of whether it contains international text data), as above.

We are going in circles. People should have very good reasons for
looking at individual graphemes as well.

And it's good we have increasing support for graphemes. I don't think they should be the default.

I don't think so either. Did I somehow imply that?

What is an objective summary? Those who want to inflict massive breakage are not even done arguing we have a better design.

From my POV, I could say I see consensus, with just you defending a decision you made a while ago :) But I'd prefer a constructive discussion.

Anyway, I don't want to "inflict massive breakage" either. I want the amount of breakage to be a justified cost of fixing a mistake and permanently improving the language's design going forward.

Here's what I have so far, BTW:
http://wiki.dlang.org/Element_type_of_string_ranges
I'll have to review it in the morning. Or rather, afternoon, given that it's 6 AM here.

I'm afraid burden of proof is on you.

Why? I'm not saying that if you can't produce an example of breakage then your arguments are invalid. Rather, concrete examples give us a concrete problem to work with. I'm not trying to put any "burden of proof" on anyone.

That's great. Yes, we're exchanging jabs right now which is not our best use of time. Also in the interest of time, please understand you'd need to show the second coming if you want to break backward compatibility. Additions are a much better path.

Even a teensy-weensy breakage? :)

Far as I'm concerned every breakage of string processing is unacceptable or at least very undesirable.

In all seriousness, at this point I'm worried that you will defend the status quo even if the breakage turns out to be minimal. Instead of dealing in absolutes, advantages and disadvantages should be weighed against one another (even with the breaking-backwards-compatibility penalty being very high).

Unit. s.byChar.front is a (possibly ref, possibly qualified) char.

So... does byChar for wstrings do the same thing as byWchar? And what if you want to iterate a wstring by char? Wouldn't it be better to have byChar/byWchar/byDchar be a range of char/wchar/dchar regardless of the string type, and have byCodeUnit which iterates by the code unit type?
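For illustration, here is a sketch of the split I have in mind. It assumes a Phobos release in which std.utf provides byCodeUnit, byChar, byWchar and byDchar; those postdate this thread, so take this as how the idea can look rather than as the design under discussion. The helper names (nativeUnits, utf8Units, utf16Units, codePoints) are made up for the example.

```d
import std.range : ElementType, walkLength;
import std.traits : Unqual;
import std.utf : byChar, byCodeUnit, byDchar, byWchar;

// byCodeUnit follows the string's own encoding ...
static assert(is(Unqual!(ElementType!(typeof(wstring.init.byCodeUnit))) == wchar));
// ... while byChar yields char regardless of the source string type.
static assert(is(Unqual!(ElementType!(typeof(wstring.init.byChar))) == char));

/// Hypothetical helpers: element counts for each way of iterating a wstring.
size_t nativeUnits(wstring ws) { return ws.byCodeUnit.walkLength; }
size_t utf8Units(wstring ws)   { return ws.byChar.walkLength; }
size_t utf16Units(wstring ws)  { return ws.byWchar.walkLength; }
size_t codePoints(wstring ws)  { return ws.byDchar.walkLength; }

void main()
{
    // U+00E9 is 1 wchar / 2 chars; U+1F600 is 2 wchars / 4 chars.
    wstring ws = "\u00E9\U0001F600"w;
    assert(nativeUnits(ws) == 3);
    assert(utf8Units(ws) == 6);
    assert(utf16Units(ws) == 3);
    assert(codePoints(ws) == 2);
}
```

With this arrangement the name always tells you the element type, and byCodeUnit is the only one whose element type depends on the string you hand it.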
