On Wed, Jul 09, 2014 at 12:53:08PM -0700, H. S. Teoh via Digitalmars-d-learn wrote:
[...]
> (with gdc -O3 -funittest:)
>
> non-branching compare(signed,unsigned): 516 msecs
> branching compare(signed,unsigned): 1209 msecs
> non-branching compare(unsigned,signed): 453 msecs
> branching compare(unsigned,signed): 756 msecs
> Optimizer-thwarting value: 0
>
> (Ignore the last lines of each output; that's just a way to prevent gdc
> -O3 from being over-eager and optimizing out the entire test so that
> everything returns 0 msecs.)
[...]

Argh. I just looked at the disassembly, and unfortunately, we have to
discard the test results for gdc: gdc -O3 has apparently turned on
auto-*vectorising* optimizations, so the non-branching implementation
only runs so fast because multiple calls are being run in parallel in
the xmm* registers!

While this is certainly an impressive feat for gdc's optimizer, it
unfortunately also means the above benchmark doesn't reflect the actual
performance of standalone int/uint comparisons. :-(


T

-- 
I see that you JS got Bach.
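
P.S. One possible way to keep each comparison a standalone call when
re-running a benchmark like this is to forbid inlining of the function
under test, so the loop around it cannot be vectorised. The sketch below
is in C rather than D and relies on the GCC/Clang `noinline` attribute.
The names `compare_int_uint`, `checksum_compares` and `time_compares_ms`
are made up for illustration, and the comparison body is a stand-in, not
the code that produced the numbers above.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for the function under test, NOT the benchmarked
   implementation. Returns a negative, zero or positive value. */
__attribute__((noinline))   /* GCC/Clang extension: keep every call a real call */
int compare_int_uint(int x, unsigned y)
{
    if (x < 0)
        return -1;          /* a negative int is below every unsigned value */
    unsigned ux = (unsigned)x;
    return (ux > y) - (ux < y);
}

/* Calls the comparison n times with varying arguments and returns the sum
   of the results, so the optimizer cannot simply discard the calls. */
long checksum_compares(int n)
{
    assert(n >= 0);
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += compare_int_uint(i - n / 2, (unsigned)i / 2u);
    return sum;
}

/* Times n calls using CPU time from clock(); stores the checksum through
   the pointer and returns the elapsed time in milliseconds. */
double time_compares_ms(int n, long *checksum)
{
    clock_t start = clock();
    *checksum = checksum_compares(n);
    return 1000.0 * (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_compares_ms` from a `main` and printing both the time and
the checksum plays the same role as the "Optimizer-thwarting value" line
in the quoted output. Forcing a real call adds call overhead to every
iteration, so the absolute numbers would not be comparable with the ones
quoted above; it is still worth checking the disassembly to confirm what
the compiler actually emitted.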
