On 3/2/20 7:32 PM, aliak wrote:
On Monday, 2 March 2020 at 23:27:22 UTC, Steven Schveighoffer wrote:

What I think is happening is that it determines nobody is using the result, and the function is pure, so it doesn't bother calling that function (probably not even the lambda, and then probably removes the loop completely).

I'm assuming for some reason, the binary search is not flagged pure, so it's not being skipped.

Apparently you're right: https://github.com/dlang/phobos/blob/5e13653a6eb55c1188396ae064717a1a03fd7483/std/range/package.d#L11107

That's not definitive. Note that a template member or member of a struct template can be *inferred* to be pure.

It's also entirely possible for the function to be pure, but the compiler decides for another reason not to elide the whole thing. Optimization isn't ever guaranteed.
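To illustrate the inference point, here is a minimal sketch. The names `twice` and `callsTwice` are made up for this example and are not taken from Phobos. A template function with no `pure` annotation can still be called from a function explicitly marked `pure`, because the compiler infers the attribute when it instantiates the template. So the absence of `pure` in the source of a template does not tell you whether the instantiated function is pure.

```d
// Hypothetical example; neither function comes from Phobos.

// No `pure` annotation: because this is a template, the compiler
// infers attributes such as `pure` from the body at instantiation.
T twice(T)(T x)
{
    return x + x;
}

// Explicitly pure. This compiles only if twice!int was inferred pure.
int callsTwice(int x) pure
{
    return twice(x);
}

// Evaluated at compile time, so it holds without running anything.
static assert(callsTwice(21) == 42);

void main()
{
    assert(callsTwice(21) == 42);
}
```

Whether the optimizer then elides an unused call to such a function is a separate question, and one this sketch does not try to demonstrate.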

If I change it to this to ensure side effects:

bool makeImpure; // TLS variable outside of main

...

    auto results = benchmark!(
        () => makeImpure = r1.canFind(max),
        () => makeImpure = r2.contains(max),
        () => makeImpure = r3.canFind(max),
    )(5_000);

writefln("%(%s\n%)", results); // modified to help with the comma confusion

I now get:
4 secs, 428 ms, and 3 hnsecs
221 μs and 9 hnsecs
4 secs, 49 ms, 982 μs, and 5 hnsecs

More like what I expected!

Ahhhh damn! And here I was thinking that branch prediction made a HUGE difference! Ok, I'm taking my tail and slowly moving away now :) Let us never speak of this again.

LOL, I'm sure this will come up again ;) The forums are full of confusing benchmarks where LDC has elided the whole thing being tested. It's amazing at optimizing. Sometimes, too amazing.

On 3/2/20 6:46 PM, H. S. Teoh wrote:
> To prevent the optimizer from eliding "useless" code, you need to do
> something with the return value that isn't trivial (assigning to a
> variable that doesn't get used afterwards is "trivial", so that's not
> enough). The easiest way is to print the result: the optimizer cannot
> elide I/O.

Yeah, well, that means you are also benchmarking the I/O (which would dwarf the other pieces being tested).

I think assigning the result to a global fits the bill pretty well, but obviously only works when you're not inside a pure function.
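A self-contained sketch of the global-sink approach might look like the following. The data setup, the name `sink`, and the helper `runSearches` are assumptions made for illustration; the original `r1`/`r2`/`r3` setup is not shown in this thread.

```d
import std.algorithm.searching : canFind;
import std.array : array;
import std.datetime.stopwatch : benchmark;
import std.range : assumeSorted, iota;
import std.stdio : writefln;

// Module-level variables are thread-local by default in D. Storing each
// result here gives the benchmarked call an observable side effect.
bool sink;

// Hypothetical helper: times a linear search against a binary search.
auto runSearches(uint iterations)
{
    // Assumed data: a sorted array of 100_000 ints, searching for the last one.
    auto data = iota(0, 100_000).array;
    auto sorted = data.assumeSorted;
    immutable needle = data[$ - 1];

    return benchmark!(
        () { sink = data.canFind(needle); },    // linear search
        () { sink = sorted.contains(needle); }, // binary search
    )(iterations);
}

void main()
{
    auto results = runSearches(1_000);
    writefln("%(%s\n%)", results); // one duration per line
}
```

Writing to a global worked in the run above, but it is not a guarantee: an optimizer is still free to simplify the loop in other ways, so checking the generated code is the only way to be sure what was actually measured.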

-Steve
