On 3/8/14, 6:14 PM, Vladimir Panteleev wrote:
On Sunday, 9 March 2014 at 01:23:27 UTC, Andrei Alexandrescu wrote:
On 3/8/14, 4:42 PM, Vladimir Panteleev wrote:
My point there is that there's no useless or duplicated code that
would be thrown away. A better design would indeed make for better
modular separation - would be great if the string-related
optimizations in std.algorithm went elsewhere. They wouldn't disappear.

Why? Isn't the whole issue that std.range presents strings as dchar
ranges, and std.algorithm needs to detect dchar ranges and then treat
them as char arrays? As opposed to std.algorithm just detecting arrays
and treating them all as arrays (which it should be doing now anyway)?
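
(For concreteness, a minimal sketch of the behavior being discussed,
assuming the current std.range design: char arrays are presented to range
code as ranges of decoded dchar, while the array primitives still see code
units, and std.algorithm regains array speed by special-casing narrow
strings internally.)

import std.range;   // ElementType, walkLength, front
import std.stdio;

void main()
{
    string s = "héllo";                              // 'é' is 2 UTF-8 code units
    static assert(is(ElementType!string == dchar));  // range primitives auto-decode
    writeln(s.length);      // 6: the array view counts code units
    writeln(s.walkLength);  // 5: the range view counts decoded code points
    writeln(s.front);       // 'h', yielded as a dchar
}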

That's scaffolding, not actual executable code.

Why? You can only find out that an algorithm is slower than it needs to
be via either profiling (at which point you're wondering why the @#$%
the thing is so slow) or feeding it invalid UTF. If you had made a
different choice for Unicode in D, this problem would not exist at all.
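
(Concretely, the invalid-UTF failure mode looks like this; a small sketch
assuming current Phobos behavior, where the auto-decoding front throws
std.utf.UTFException while code-unit iteration never decodes at all.)

import std.exception : assertThrown;
import std.range;
import std.utf : UTFException;

void main()
{
    char[] bad = ['\xFF', 'a', 'b'];       // 0xFF is never a valid UTF-8 code unit
    assert(bad.length == 3);               // the code-unit (array) view doesn't care
    size_t n;
    foreach (char c; bad) ++n;             // iterating by code unit: no decoding, no throw
    assert(n == 3);
    assertThrown!UTFException(bad.front);  // the decoded range view throws immediately
}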

Disagree.

Could you please elaborate? This is the second uninformative reply to
this argument.

What can I say? The answer is obvious; it's not hard for me to figure. The performance of D's UTF strings has never been a mystery to me. From where I stand, all this "hidden, difficult-to-detect performance problems" drama is just posturing. We'd do well to weed it out of the discussion.

No myriad of bug reports saying "D strings are awfully slow" on Bugzilla.

No long threads asking "Why are D strings so slow?" on Stack Overflow.

No trolling on Reddit or Hacker News: "D? Just look at their strings. How could anyone think that's a good idea lol."

And it's not like people aren't talking. By contrast, D has been (and often rightly) criticized in the past for things like floating-point performance and garbage collection. There is no evidence we have an acute performance problem with UTF strings.

Sure there are, and you yourself illustrated a misuse of the APIs.

If UTF decoding were explicit, the problem would stand out. I don't think
this is a valid argument.

Yours? Indeed it isn't, if what you want is to iterate by code unit (which is meaningless for all but ASCII strings) by default.

My point is: code point is better than code unit.

This was debated... people should not be looking at individual code
points, unless they really know what they're doing.

Should they be looking at code units instead?

Grapheme is better than code point but a lot slower.

We are going in circles. People should have very good reasons for
looking at individual graphemes as well.

And it's good we have increasing support for graphemes. I don't think they should be the default.
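
(To put numbers on the three granularities; an illustrative sketch where
'ë' is written in its combining form, i.e. 'e' followed by U+0308.)

import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "noe\u0308l";               // n, o, e + combining diaeresis, l
    assert(s.length == 6);                 // UTF-8 code units
    assert(s.walkLength == 5);             // code points: the default decoded view
    assert(s.byGrapheme.walkLength == 4);  // graphemes: what a user perceives
}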

It seems we're quite in a sweet spot here wrt performance/correctness.

This does not seem like an objective summary of this thread's arguments
so far.

What would be an objective summary? Those who want to inflict massive breakage aren't even done arguing that we'd have a better design.

I guess I'll get working on that wiki page to organize the arguments.
This discussion is starting to feel like a quicksand roundabout.

That's great. Yes, we're exchanging jabs right now, which is not the best use of our time. Also, in the interest of time, please understand you'd need to show the second coming if you want to break backward compatibility. Additions are a much better path.

With what has been put forward so far, that's not even close to
justifying a breaking change. If that great better design is just going
back to code-unit iteration, the change will not happen while I work
on D. It is possible, however, that a much better idea will come forward,
and I'd look forward to that.

Actually, could you post some examples of real-world code that would be
broken by a hypothetical sudden switch? I think I would be hard-pressed
to find any in my own code, but I'd need to check to know for sure.

I'm afraid the burden of proof is on you. As far as I'm concerned, every breakage of string processing is unacceptable, or at least very undesirable.

2. Add byChar that returns a random-access range iterating a string by
character. Add byWchar that does on-the-fly transcoding to UTF16. Add
byDchar that accepts any range of char and does decoding. And such
stuff. Then whenever one wants to go through a string by code point
can just use str.byChar.

This is confusing. Did you mean to say that byChar iterates a string by
code unit (not character / code point)?

Unit. s.byChar.front is a (possibly ref, possibly qualified) char.
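
(To make the shape of that concrete: a hypothetical sketch of such an
adapter. The name byChar is taken from the proposal above; the body here
is just an illustration, and the proposal's random-access support and the
byWchar/byDchar variants are omitted.)

struct ByChar
{
    string s;
    @property bool empty() const { return s.length == 0; }
    @property char front() const { return s[0]; }   // a code unit, never decoded
    void popFront() { s = s[1 .. $]; }
}

auto byChar(string s) { return ByChar(s); }

void main()
{
    import std.algorithm : count;
    assert("héllo".byChar.count == 6);       // 6 code units (vs. 5 code points)
    assert("héllo".byChar.count('l') == 2);  // no decoding happens along the way
}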


Andrei
