On 30 September 2011 18:01, H.J. Lu <hjl.to...@gmail.com> wrote:
> You may want to look a look at:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50583
>
> ARM may have the same problem.
OK, although to be honest this patch only stretches the same structures to 64-bit; any major change in semantics is a separate issue. But thanks for pointing it out.

Hmm, I think what's produced is correct; however, the manual's description is inconsistent:

  These builtins perform the operation suggested by the name, and
  returns the value that had previously been in memory. That is,

    { tmp = *ptr; *ptr OP= value; return tmp; }

The ARM code (see below) does a single load inside a loop with a guarded store. This guarantees that the value returned is the value that "had previously been in memory" directly prior to the atomic operation. However, it does mean that it doesn't perform the pair of accesses implied by 'tmp = *ptr; *ptr OP= value'.

On ARM, for fetch_and_add we get the following (this is pre-my-patch and 32-bit; my patch doesn't change the structure except for the position of that last label):

      mov   r3, r0
      dmb   sy
  .LSYT6:
      ldrex r0, [r3]
      add   r2, r0, r1
      strex r0, r2, [r3]
      teq   r0, #0
      bne   .LSYT6
      sub   r0, r2, r1
      dmb   sy

That seems like the correct semantics to me; if not, what am I missing? Was the intention of the example really to cause two loads, and if so, why?

For add_and_fetch we get:

      dmb   sy
  .LSYT6:
      ldrex r0, [r3]
      add   r0, r0, r1
      strex r2, r0, [r3]
      teq   r2, #0
      bne   .LSYT6
      dmb   sy

i.e. the value returned is always the value that goes into the guarded store, and is hence always the value that's stored.

Dave
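As a rough illustration only (this is not the code GCC generates), the two ldrex/strex loops above can be modelled in C11, assuming a compiler with <stdatomic.h>. Here atomic_compare_exchange_weak stands in for the ldrex/strex pair, and model_fetch_and_add / model_add_and_fetch are hypothetical names. Each attempt does one load and one conditional store; the store only succeeds if memory still holds the loaded value, so the result reflects the contents immediately before the store, with no second load.

```c
#include <stdatomic.h>

/* Hypothetical model of the fetch_and_add loop. A weak compare-exchange,
   like strex, may fail spuriously, so we retry. On failure it refreshes
   'old' with the current contents of *ptr: one fresh load per attempt. */
unsigned model_fetch_and_add(_Atomic unsigned *ptr, unsigned value)
{
    unsigned old = atomic_load(ptr);       /* ldrex r0, [r3]     */
    unsigned desired;
    do {
        desired = old + value;             /* add   r2, r0, r1   */
    } while (!atomic_compare_exchange_weak(ptr, &old, desired));
    return desired - value;                /* sub   r0, r2, r1: recovers old */
}

/* Hypothetical model of the add_and_fetch loop: the result is exactly
   the value passed to the successful (guarded) store. */
unsigned model_add_and_fetch(_Atomic unsigned *ptr, unsigned value)
{
    unsigned old = atomic_load(ptr);       /* ldrex r0, [r3]     */
    unsigned desired;
    do {
        desired = old + value;             /* add   r0, r0, r1   */
    } while (!atomic_compare_exchange_weak(ptr, &old, desired));
    return desired;
}
```

For example, with *ptr == 5, model_fetch_and_add(ptr, 3) returns 5 and leaves 8 in memory, and a following model_add_and_fetch(ptr, 2) returns 10; GCC's __sync_fetch_and_add and __sync_add_and_fetch give the same results. unsigned is used so that the desired - value subtraction (mirroring sub r0, r2, r1) wraps instead of overflowing.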