> On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley <a...@redhat.com> wrote:
> Is it faster?  Have you measured it?  Is it so much faster that it's critical 
> for your
> application?

Well, I couldn't really leave this be: I did a little bit of
benchmarking using my company's proprietary benchmarking library,
which I'll try to get open-sourced. It follows Intel's
recommendations for using RDTSCP/CPUID etc., and I've also spent some
time looking at Agner Fog's techniques. I believe it to be pretty
accurate, to within a clock cycle or two.

On my laptop (Core i5 M520) the volatile and non-volatile increments
are so fast as to be within the noise - 1-2 clock cycles. So that
certainly lends support to your theory, Andrew, that it's probably not
worth the effort (other than offending my aesthetic sensibilities!).
Obviously this doesn't really take into account the extra i-cache
pressure.

As a comparison, the "lock xaddl" versions come out at 18 cycles.
Obviously this is also pretty much "free" by any reasonable metric,
but it's hard to measure the impact of the bus lock on other
processors' memory accesses in a highly multi-threaded environment.
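
For reference, this is roughly the instruction-level difference being
measured, sketched in C with the GCC/Clang __atomic built-in rather
than the Java code from the thread:

#include <stdio.h>

static long plain_counter;
static long atomic_counter;

int main(void)
{
    /* plain increment: a single add, the 1-2 cycle case above */
    plain_counter++;

    /* fetch-and-add whose old value is used, so GCC/Clang emit
       "lock xaddq" - the locked case above */
    long old = __atomic_fetch_add(&atomic_counter, 1, __ATOMIC_SEQ_CST);

    printf("plain=%ld old=%ld\n", plain_counter, old);
    return 0;
}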

For completeness I also tried it on a few other machines:
X5670:      0-2 clocks for normal, 28 for lock xadd
E5-2667 v2: as above, 27 clocks for lock xadd
E5-2667 v3: as above, 15 clocks for lock xadd

On Sat, Dec 27, 2014 at 11:57 AM, Andrew Haley <a...@redhat.com> wrote:
> Well, in this case you now know: it's a bug!  But one that it's
> fairly hard to care deeply about, although it might get fixed now.

Understood completely! Thanks again,

Matt
