------- Comment #2 from pluto at agmk dot net 2007-03-14 19:05 -------
(In reply to comment #1)
> ifcvt could do this. But is cmpxchgq really faster with its atomicity
> guarantee?
Only `lock; cmpxchg' has an atomicity guarantee on SMP.

> They are all vector-path instructions, a compare - cmov sequence looks
> faster (8 cycle latency vs. 10 and also with less constraints on register
> allocation). Even the code we emit now:
>
> emit_cmpxchg:
> .LFB2:
>         movq    (%rdi), %rax
>         cmpq    %rsi, %rax
>         je      .L6
>         rep ; ret
>         .p2align 4,,7
> .L6:
>         movq    %rdx, (%rdi)
>         ret
>
> could be faster dependent on branch probability.

Yes, it could be faster, but for -Os we could emit small branchless code:

        movq    %rsi, %rax
        cmpxchgq %rdx, (%rdi)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31170
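For reference, a minimal C sketch of the pattern under discussion. The testcase itself is not shown in this comment, so the signature of emit_cmpxchg and the int64_t types are assumptions reconstructed from the quoted assembly. emit_cmpxchg_asm is a hypothetical helper that spells out the proposed two-instruction -Os sequence with GCC extended asm; it has no lock prefix, so it makes no atomicity claim on SMP.

```c
#include <stdint.h>

/* Assumed source pattern: store `desired` only if *p equals `expected`.
 * This is what the branchy code quoted above (cmpq / je / movq) computes. */
void emit_cmpxchg(int64_t *p, int64_t expected, int64_t desired)
{
    if (*p == expected)
        *p = desired;
}

/* Hypothetical hand-written form of the proposed branchless sequence:
 *     movq     %rsi, %rax      (done by the "+a" constraint)
 *     cmpxchgq %rdx, (%rdi)
 * No lock prefix, so this is not atomic with respect to other CPUs. */
void emit_cmpxchg_asm(int64_t *p, int64_t expected, int64_t desired)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile("cmpxchgq %2, %1"
                     : "+a"(expected), "+m"(*p) /* rax in/out, memory in/out */
                     : "r"(desired)             /* value to store on a match */
                     : "cc");                   /* cmpxchg modifies the flags */
#else
    /* Fallback for non-x86-64 targets: same observable result. */
    emit_cmpxchg(p, expected, desired);
#endif
}
```

Both functions leave *p unchanged when it differs from `expected` and store `desired` when it matches, so they are interchangeable in single-threaded code.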