On Tuesday, 8 March 2016 at 14:14:25 UTC, ixid wrote:
Since I posted this thread I've learned std.algorithm.sum is 4 times slower than a naive loop sum. Even if this is for reasons of accuracy this is exactly what I am talking about- this is a hidden iceberg of terrible performance that will reflect poorly on D. That's so slow the function needs a health warning.

I seen a few cases while exploring D. Not all fully researched (apologies for that), but since there appears to be interest in identification I'll list them.

* Lower-casing strings (likely upper-casing), and some character type checks.

Routines like to toLower and asLowerCase call functions that work for all unicode characters. I was able to create much faster versions by checking if the character was ascii, then calling either the ascii version or the more general version. Same is true for a few routines like isNumber. Some have the ascii check optimization built in, but not all.

If this optimization is added, it might also be useful to add a few common combinations (or a generic solution, if that's feasible). For example, to check if a character is alpha-numeric, one currently ORs two tests from the standard library, isAlpha and isNumber. Putting in an ascii optimization check requires putting it before doing the OR, rather than inside the tests being ORed.

* Large associative arrays

When associative arrays get beyond about 10 million entries performance starts to decline. I believe this is due to resizing the arrays. It's worse with strings as keys than integers as keys. Having a way to reserve capacity may help under some circumstances.

* Associative arrays - Converting keys to immutable versions for lookup

Associative arrays want immutable values as keys. Far as I can tell, immutable values are also required when performing a lookup, even if a new entry won't be stored. A couple apps I've written walk through large lists of text values, naturally available as char[] because they are read from input streams. To test presence in an associative array, it's necessary to copy them to immutable strings first. I haven't fully researched this one, but my experience is that copying from char[] to string becomes a meaningful cost.

On the surface, this appears to be an optimization opportunity, to create the immutable strings only when actually storing a new value.

--Jon

Reply via email to