On Mon, May 16, 2016 at 12:31:04AM +0000, Jack Stouffer via Digitalmars-d wrote:
> On Sunday, 15 May 2016 at 23:10:38 UTC, Jon D wrote:
> >Given the importance of performance in the auto-decoding topic, it
> >seems reasonable to quantify it. I took a stab at this. It would of
> >course be prudent to have others conduct similar analysis rather than
> >rely on my numbers alone.
> 
> Here is another benchmark (see the above comment for the code to apply
> the patch to) that measures the iteration time difference:
> http://forum.dlang.org/post/ndj6dm$a6c$1...@digitalmars.com
> 
> The result is a 756% slowdown

I decided to do my own benchmarking too. Here's the code:

        /**
         * Simple-minded benchmark for measuring performance degradation
         * caused by autodecoding.
         */
        
        import std.typecons : Flag, Yes, No;
        
        size_t countNewlines(Flag!"autodecode" autodecode)(const(char)[] input)
        {
            size_t count = 0;
        
            static if (autodecode)
            {
                // foreach with a dchar loop variable makes the compiler
                // decode the UTF-8 input one code point at a time.
                foreach (dchar ch; input)
                {
                    if (ch == '\n') count++;
                }
            }
            else // !autodecode
            {
                // byCodeUnit iterates over the raw UTF-8 code units,
                // bypassing autodecoding.
                import std.utf : byCodeUnit;
                foreach (char ch; input.byCodeUnit)
                {
                    if (ch == '\n') count++;
                }
            }
            return count;
        }
        
        void main(string[] args)
        {
            import std.datetime : benchmark;
            import std.file : read;
            import std.stdio : writeln, writefln;
        
            string input = (args.length >= 2) ? args[1]
                : "/usr/src/d/phobos/std/datetime.d";
        
            uint n = 50;    // number of benchmark iterations
            auto data = cast(char[]) read(input);   // raw file contents
            writefln("Input: %s (%d bytes)", input, data.length);
            size_t count;
        
            writeln("With autodecoding:");
            auto result = benchmark!({
                count = countNewlines!(Yes.autodecode)(data);
            })(n);
            writefln("Newlines: %d  Time: %s msecs", count, result[0].msecs);
        
            writeln("Without autodecoding:");
            result = benchmark!({
                count = countNewlines!(No.autodecode)(data);
            })(n);
            writefln("Newlines: %d  Time: %s msecs", count, result[0].msecs);
        }
        
        // vim:set sw=4 ts=4 et:

Just for fun, I decided to use std/datetime.d, one of the largest
modules in Phobos, as a test case.

For comparison, I compiled with dmd (latest git head) and gdc 5.3.1. The
compile commands were:

        dmd -O -inline bench.d -ofbench.dmd
        gdc -O3 bench.d -o bench.gdc
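
The resulting binaries are then run as follows (the input file argument
is optional; without it, the path hardcoded in main is used):

        ./bench.dmd /usr/src/d/phobos/std/datetime.d
        ./bench.gdc /usr/src/d/phobos/std/datetime.d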

Here are the results from bench.dmd:

        Input: /usr/src/d/phobos/std/datetime.d (1464089 bytes)
        With autodecoding:
        Newlines: 35398  Time: 331 msecs
        Without autodecoding:
        Newlines: 35398  Time: 254 msecs

And the results from bench.gdc:

        Input: /usr/src/d/phobos/std/datetime.d (1464089 bytes)
        With autodecoding:
        Newlines: 35398  Time: 253 msecs
        Without autodecoding:
        Newlines: 35398  Time: 25 msecs

These results are pretty typical across multiple runs. The bench.dmd
runs vary by about 20 msecs, while the bench.gdc runs vary only by
about 1-2 msecs.

So for bench.dmd, autodecoding adds about 30% overhead to the running
time, whereas for bench.gdc, autodecoding costs an order-of-magnitude
increase in running time.
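
(To spell out the arithmetic from the numbers above: 331/254 ≈ 1.30,
i.e. roughly 30% more time under dmd, while 253/25 ≈ 10.1 under gdc.)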

As an interesting aside, compiling with dmd without -O -inline makes
the non-autodecoding case consistently *slower* than the autodecoding
case. Apparently the running time is then dominated by the cost of
calling the non-inlined range primitives on byCodeUnit, whereas a
manual for-loop over the array of chars produces results similar to
the -O -inline case.  I find this interesting, because it shows that
the cost of autodecoding is relatively small compared to the cost of
unoptimized range primitives.  Nevertheless, autodecoding does make a
big difference once the range primitives are properly optimized.  It
is especially telling in the case of gdc that, given a superior
optimizer, the non-autodecoding case can be made an order of magnitude
faster, whereas the autodecoding case is presumably complex enough to
defeat the optimizer.
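
To make that concrete, here is a rough sketch of the two loop shapes
involved (the function names are mine, not part of the benchmark
above). foreach over a range like byCodeUnit is lowered by the
compiler to calls to the empty/front/popFront primitives, which remain
real function calls per code unit unless the optimizer inlines them;
the manual loop touches the array directly:

        // Roughly what `foreach (char ch; input.byCodeUnit)` lowers to:
        // three range-primitive calls per code unit.
        size_t countNewlinesLowered(const(char)[] input)
        {
            import std.utf : byCodeUnit;
            size_t count = 0;
            auto r = input.byCodeUnit;
            for (; !r.empty; r.popFront())
            {
                if (r.front == '\n') count++;
            }
            return count;
        }

        // The manual for-loop over the char array: no range primitives
        // at all, so it stays fast even without -O -inline.
        size_t countNewlinesManual(const(char)[] input)
        {
            size_t count = 0;
            foreach (i; 0 .. input.length)
            {
                if (input[i] == '\n') count++;
            }
            return count;
        }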


T

-- 
Democracy: The triumph of popularity over principle. -- C.Bond
