Re: Major performance problem with std.array.front()

Vladimir Panteleev Sat, 08 Mar 2014 18:16:18 -0800

On Sunday, 9 March 2014 at 01:23:27 UTC, Andrei Alexandrescu wrote:

On 3/8/14, 4:42 PM, Vladimir Panteleev wrote:
On Saturday, 8 March 2014 at 23:59:15 UTC, Andrei Alexandrescu wrote:
My only claim is that recognizing and iterating strings by code point
is better than doing things by the octet.
Considering or disregarding the disadvantages of this choice?
Doing my best to weigh everything with the right measures.

I think it would be good to get a comparison of the two approaches, and list the arguments presented so far. I'll look into starting a Wiki page.

Okay, though when you opened with "devastating" I was hoping for nothing short of death and dismemberment.

In proportion. To the best of my knowledge, no one here writes software for military or industrial robots in D. Security issues rank as the worst kind of bugs in software on my scale.

Anyhow the fix is obvious per this brief tutorial: http://www.youtube.com/watch?v=hkDD03yeLnU


I don't get it.

I'm quite sure that std.range and std.algorithm will lose a LOT of
weight if they were fixed to not treat strings specially.
I'm not so sure. Most of the string-specific optimizations simply detect certain string cases and forward them to array algorithms that need be written anyway. You would, indeed, save a fair amount of isSomeString conditionals and stuff (thus simplifying on scaffolding), but probably not a lot of code. That's not useless work - it'd go somewhere in any design.


One way to find out.

Besides if you want to do Unicode you gotta crack some eggs.
No, I can't see how this justifies the choice. An explicit decoding range would have simplified things greatly while offering much of the
same advantages.
My point there is that there's no useless or duplicated code that would be thrown away. A better design would indeed make for better modular separation - would be great if the string-related optimizations in std.algorithm went elsewhere. They wouldn't disappear.

Why? Isn't the whole issue that std.range presents strings as dchar ranges, and std.algorithm needs to detect dchar ranges and then treat them as char arrays? As opposed to std.algorithm just detecting arrays and treating them all as arrays (which it should be doing now anyway)?
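
To make the array/range split concrete, here is a minimal sketch, assuming a Phobos version in which the range primitives auto-decode narrow strings (the behaviour under discussion in this thread). The helper names codeUnits and codePoints are hypothetical and exist only for illustration.

```d
import std.range : ElementType, front, walkLength;

// Viewed as an array, a string is a sequence of UTF-8 code units (char).
static assert(is(typeof(string.init[0]) == immutable(char)));

// Viewed through the range primitives, the same string yields dchar.
static assert(is(ElementType!string == dchar));
static assert(is(typeof(string.init.front) == dchar));

// Hypothetical helper: array view, O(1), counts code units.
size_t codeUnits(string s) { return s.length; }

// Hypothetical helper: range view, O(n), decodes while counting code points.
size_t codePoints(string s) { return s.walkLength; }

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units
    assert(codeUnits(s) == 6);
    assert(codePoints(s) == 5);
}
```

The two static views disagree on the element type, which is why generic code that wants array speed has to special-case strings.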

3. Hidden, difficult-to-detect performance problems. The reason why this thread was started. I've had to deal with them in several places myself.
I disagree with "hidden, difficult to detect".
Why? You can only find out that an algorithm is slower than it needs to be via either profiling (at which point you're wondering why the @#$% the thing is so slow), or feeding it invalid UTF. If you had made a different choice for Unicode in D, this problem would not exist altogether.
Disagree.

Could you please elaborate? This is the second uninformative reply to this argument.

Except we already do. Arguments have already been presented in this thread that demonstrate correctness problems with the current approach. I don't think that these can stand up to the problems that the simpler
by-char iteration approach would have.
Sure there are, and you yourself illustrated a misuse of the APIs.

If UTF decoding was explicit, the problem would stand out. I don't think this is a valid argument.

My point is: code point is better than code unit

This was debated... people should not be looking at individual code points, unless they really know what they're doing.

Grapheme is better than code point but a lot slower.

We are going in circles. People should have very good reasons for looking at individual graphemes as well.
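
For reference, the three levels being argued over (code unit, code point, grapheme) give three different answers for the same string. The following is a small sketch, assuming std.uni.byGrapheme is available; graphemeCount is a hypothetical helper name.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helper: counts user-perceived characters (grapheme clusters).
size_t graphemeCount(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string s = "e\u0301";
    assert(s.length == 3);         // UTF-8 code units
    assert(s.walkLength == 2);     // code points (auto-decoded)
    assert(graphemeCount(s) == 1); // graphemes
}
```

None of the three counts is wrong; each answers a different question, so the caller has to know which one the task needs.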

It seems we're quite in a sweet spot here wrt performance/correctness.

This does not seem like an objective summary of this thread's arguments so far.

I guess I'll get working on that wiki page to organize the arguments. This discussion is starting to feel like a quicksand roundabout.

With what has been put forward so far, that's not even close to justifying a breaking change. If that great better design is just get back to code unit iteration, the change will not happen while I work on D. It is possible, however, that a much better idea comes forward, and I'd be looking forward to such.

Actually, could you post some examples of real-world code that would be broken by a hypothetical sudden switch? I think I would be hard-pressed to find some in my own code, but I'd need to check for sure to find out.

2. Add byChar that returns a random-access range iterating a string by character. Add byWchar that does on-the-fly transcoding to UTF16. Add byDchar that accepts any range of char and does decoding. And such stuff. Then whenever one wants to go through a string by code point can just use str.byChar.

This is confusing. Did you mean to say that byChar iterates a string by code unit (not character / code point)?
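
As a rough illustration of what explicit iteration could look like, the sketch below assumes a recent Phobos in which std.utf provides byCodeUnit, byWchar and byDchar. It is not a statement of what the quoted proposal would have specified, and the helper names utf8Units, utf16Units and decodedPoints are hypothetical.

```d
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar, byWchar;

// Hypothetical helper: iterate UTF-8 code units, no decoding.
size_t utf8Units(string s) { return s.byCodeUnit.walkLength; }

// Hypothetical helper: transcode to UTF-16 on the fly and count wchar units.
size_t utf16Units(string s) { return s.byWchar.walkLength; }

// Hypothetical helper: decode explicitly and count code points.
size_t decodedPoints(string s) { return s.byDchar.walkLength; }

void main()
{
    // 'a' plus U+1F600, which lies outside the Basic Multilingual Plane.
    string s = "a\U0001F600";
    assert(utf8Units(s) == 5);     // 1 + 4 UTF-8 code units
    assert(utf16Units(s) == 3);    // 1 + a surrogate pair
    assert(decodedPoints(s) == 2); // two code points
}
```

With adapters like these, the cost of decoding is visible at the call site instead of being implied by the element type of the string.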
