https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103066

--- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to H.J. Lu from comment #7)
> Instead of generating:
> 
>       movl    f(%rip), %eax
> .L2:
>       movd    %eax, %xmm0
>       addss   .LC0(%rip), %xmm0
>       movd    %xmm0, %edx
>       lock cmpxchgl   %edx, f(%rip)
>       jne     .L2
>       ret
> 
> we want
> 
>       movl    f(%rip), %eax
> .L2:
>       movd    %eax, %xmm0
>       addss   .LC0(%rip), %xmm0
>       movd    %xmm0, %edx
>       cmpl    f(%rip), %eax
>       jne     .L2
>       lock cmpxchgl   %edx, f(%rip)
>       jne     .L2
>       ret

No, certainly not.  The initial movl, or the value that a failed lock cmpxchgl
leaves in %eax, already holds the right value unless the atomic memory is
extremely contended, so you don't want to add a non-atomic comparison in
between.  Not to mention that the way you've written it totally breaks it:
if the memory is not equal to the expected value, you need to load the current
value into %eax, but the cmpl/jne path jumps back to .L2 without doing so.
With the above code, if f is modified by another thread between the initial
movl f(%rip), %eax and the cmpl f(%rip), %eax, and never after that, it will
loop forever.
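For reference, a minimal C11 sketch of the correct retry pattern, assuming GCC or Clang with <stdatomic.h>. The function name atomic_add_float and the variable f with its initial value are illustrative only, not taken from this bug. The point is that atomic_compare_exchange_weak, like lock cmpxchgl, writes the current memory contents back into `expected` when it fails, so the next iteration never works from a stale value:

```c
#include <stdatomic.h>

/* Illustrative shared value (hypothetical; not from the bug report).
   A 4-byte _Atomic float is lock-free on x86-64. */
_Atomic float f = 2.0f;

/* Atomically add x to *p and return the value that was replaced.
   If the compare-exchange fails, it stores the value it actually found
   in *p into `expected` (on x86 this is cmpxchg leaving the current
   memory value in %eax), so the loop recomputes from fresh data with no
   extra non-atomic re-read of *p. */
float atomic_add_float(_Atomic float *p, float x)
{
    float expected = atomic_load_explicit(p, memory_order_relaxed);
    float desired;
    do {
        desired = expected + x;
    } while (!atomic_compare_exchange_weak(p, &expected, desired));
    return expected;  /* on success, expected still holds the old value */
}
```

On x86-64, GCC typically turns this into a lock cmpxchgl retry loop broadly like the first sequence quoted above, though exact code generation varies with compiler version and flags.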
I believe what the above paper is talking about should be addressed by users of
these intrinsics, if they care and if it is beneficial, e.g. depending on extra
information about how contended the lock is.  In OpenMP, for instance, one can
use the omp_sync_hint_* constants in a hint clause to say whether the lock is
contended, uncontended, speculative, non-speculative, or unknown
(omp_sync_hint_none).
