On Wed, Jul 09, 2014 at 12:53:08PM -0700, H. S. Teoh via Digitalmars-d-learn 
wrote:
[...]
> (with gdc -O3 -funittest:)
> 
>       non-branching compare(signed,unsigned): 516 msecs
>       branching compare(signed,unsigned): 1209 msecs
>       non-branching compare(unsigned,signed): 453 msecs
>       branching compare(unsigned,signed): 756 msecs
>       Optimizer-thwarting value: 0
> 
> (Ignore the last lines of each output; that's just a way to prevent gdc
> -O3 from being over-eager and optimizing out the entire test so that
> everything returns 0 msecs.)
[...]

Argh. I just looked at the disassembly, and unfortunately we have to
discard the test results for gdc: gdc -O3 has apparently turned on
auto-*vectorising* optimizations, so the non-branching implementation
only runs so fast because multiple calls are being executed in parallel
in the xmm* registers!

While this is certainly an impressive feat for gdc's optimizer, it
unfortunately also means the above benchmark doesn't reflect the actual
performance of standalone int/uint comparisons. :-(
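
One possible way to keep the vectorizer out of the measurement, as a
rough sketch only: make each call's inputs depend on the previous
call's result, so the calls form a serial chain that can't simply be
packed side by side into SIMD lanes. Everything named below (compare,
chainedLoop, the mixing constants) is a hypothetical stand-in made up
for illustration, not the code that produced the numbers above.

```d
// Hypothetical stand-in for the comparison under test (NOT the code
// benchmarked in this thread): returns -1, 0 or 1 for x < y, x == y,
// x > y respectively.
int compare(int x, uint y) pure nothrow @safe
{
    if (x < 0)
        return -1;              // a negative int is below every uint
    immutable ux = cast(uint) x;
    return (ux > y) - (ux < y); // bools promote to int in arithmetic
}

// Benchmark body with a loop-carried dependency: the next value of x
// is computed from the previous comparison result, so iteration N+1
// cannot start before iteration N has finished.
int chainedLoop(uint iterations) pure nothrow @safe
{
    int x = -3;
    uint y = 5;
    int acc = 0;
    foreach (i; 0 .. iterations)
    {
        immutable r = compare(x, y);
        acc += r;
        // Arbitrary mixing constants, chosen only to vary the inputs;
        // integer overflow wraps around in D.
        x = x * 1103515245 + 12345 + r;
        y = y * 1664525 + 1013904223;
    }
    return acc; // return the sum so the loop isn't optimized away
}

unittest
{
    assert(compare(-1, 0) == -1);
    assert(compare(3, 3) == 0);
    assert(compare(7, 2) == 1);
    assert(compare(0, uint.max) == -1);
}
```

Note that a chained loop like this measures the latency of one
comparison rather than throughput, and the mixing arithmetic adds its
own overhead, so it is only useful for comparing two implementations
under the same harness. Another option might be GCC's
-fno-tree-vectorize switch, which gdc would presumably accept since it
shares the GCC backend, though that is an assumption rather than
something tested here.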


T

-- 
I see that you JS got Bach.
