24-Aug-2014 16:24, Andrei Alexandrescu пишет:
On 8/23/14, 6:39 PM, H. S. Teoh via Digitalmars-d wrote:
On Sat, Aug 23, 2014 at 06:06:37PM -0700, Andrei Alexandrescu via
Digitalmars-d wrote:
Currently char[], wchar[], dchar[] and qualified variants fulfill the
requirements of isSomeString. Also, char[], wchar[] and qualified
variants fulfill the requirements of isNarrowString.

Various algorithms in Phobos test for these traits to optimize away
UTF decoding where unnecessary.

I'm thinking of relaxing the definitions to all types that fulfill the
following requirements:

* are random access ranges
* element type is some character
* offer .ptr as a @system property that offers a pointer to the first
character

This would allow us to generalize the notion of string and offer
optimizations for user-defined, not only built-in, strings. Thoughts?
[...]

Recently there has been another heated debate on Github about whether
ranges should auto-decode at all:

    https://github.com/D-Programming-Language/phobos/pull/2423

Jonathan is about to write up a DIP for phasing out auto-decoding. Given
that, I think we need to decide which direction to take, lest we waste
energy on something that will be going away soon.

I've read the thread now, thanks. I think that discussion has lost
perspective. There is talk about how decoding is slow, yet nobody (like,
literally not one soul) looked at measuring and optimizing decoding.
Everybody just assumes it's unacceptably slow; the irony is, it is, but
mostly because it's not engineered for speed.

I'd be damned, if I didn't measure. Do not assume things if there is little actual data.

Take a peek at my tests (yeah, rough stuff but it does the job).
https://github.com/DmitryOlshansky/gsoc-bench-2012

Let's count number of codepoints in a file (loading it into memory before measuring stuff):
===================
     decode-only [arwiki],    368312 hits,   3.16312, 205.58 Mb/s
            noop [arwiki],    599328 hits,   0.67872, 958.11 Mb/s
====================
     decode-only [ruwiki],    523414 hits,   4.54544, 206.23 Mb/s
            noop [ruwiki],    884374 hits,    0.7898, 1186.92 Mb/s
====================
     decode-only [dewiki],   1773464 hits,   5.63744, 318.28 Mb/s
            noop [dewiki],   1607454 hits,    2.1322, 841.52 Mb/s

With that I can say that decode *is* engineered for speed.
It processes gets about 200-300 millions of codepoints per second for me. Compared with baseline of 840 Mbytes/sec of testing if each byte is > 0x20.

With LDC both numbers are about 2 times better but the picture stays the same.

As you can see there is a significant gap there and decoding is not yet memory-bound. It's still a cost that hinders top-performance code.

Look e.g. at
https://github.com/D-Programming-Language/phobos/blob/master/std/utf.d#L1074.
That's a memory allocation each and every time decodeImpl is called. I'm
not kidding. Take a look at http://goo.gl/p5pl3D vs.

You are completely wrong on this count. I'm horrified to learn that even D experts are unaware of how `enum` works or you'd got to be kidding me.

enum doesn't allocate but serves as `copy-paste` token effectively putting said literal everywhere it is used.

Now look at usage pattern:
https://github.com/D-Programming-Language/phobos/blob/master/std/utf.d#L1173

(where 'i' is constant from static foreach)

Compiler sees it as:
[(1 << 7) - 1, (1 << 11) - 1, (1 << 16) - 1, (1 << 21) - 1][0]

and const-folds it accordingly.

> I'd like RCString to benefit of the optimizations for built-in strings,
> and that's why I was looking at relaxing isSomeString/isNarrowString.
>

On this I agree, isSomeString is an ugly half-assed hack if we can't have other kinds of strings.

Also consider Array!char which in fact should be pretty close to RCString.

I'm of opinion that there are 2 ways out of 'decode by default sucks':

a) Make it finally generic and clearly state what a DecodableRange is. Stick with that everywhere.
b) To hell with that and just remove implicit decoding.

Else it really-really sucks.

--
Dmitry Olshansky

Reply via email to