On 14/07/2013 04:53, Liu Ping Fan wrote:
> Refcnt's atomic inc/dec ops are frequent, and their idiom needs no seq_cst
> ordering. So, to get better performance, it is worth adopting the _relaxed
> rather than the _seq_cst memory model for them.
>
> We resort to gcc builtins. If gcc supports the C11 memory model, the
> __atomic_* builtins are used, otherwise the __sync_* builtins.
>
> Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com>
No, not at all. :(

First of all, I'd like to understand how you benchmarked this. For inc/dec,
relaxed vs. seq_cst has no effect except on PPC and ARM, and if the refcount
ops are frequent enough, I strongly suspect cacheline bouncing has a bigger
effect than the memory barriers.

Second, it is wrong, because you need a further read memory barrier when you
are removing the last reference.

Third, it makes the API completely non-orthogonal, and "tend to be
exceptions" is not a justification. The justification here could be, rather
than performance, having to remember how to use atomic_fetch_dec correctly
on the unref side. I don't really buy that, but if you really care, do
something like

  #define atomic_ref(ptr, field) \
      __atomic_fetch_add(&((ptr)->field), 1, __ATOMIC_RELAXED)

  #define atomic_unref(ptr, field, releasefn) (                           \
      __atomic_fetch_add(&((ptr)->field), -1, __ATOMIC_RELEASE) == 1      \
      ? (__atomic_thread_fence(__ATOMIC_ACQUIRE), (releasefn)(ptr), true) \
      : false)

i.e. define a new interface similar to kref_get/kref_put and, since you are
at it, optimize it.

Paolo

> ---
> v2:
>    update atomics.txt
>    fix the dependency MACRO
> ---
>  docs/atomics.txt      |  4 ++++
>  include/qemu/atomic.h | 14 ++++++++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/docs/atomics.txt b/docs/atomics.txt
> index 6f2997b..c8ad31d 100644
> --- a/docs/atomics.txt
> +++ b/docs/atomics.txt
> @@ -45,6 +45,10 @@ consistency*, where "the result of any execution is the same as if the
>  operations of all the processors were executed in some sequential order,
>  and the operations of each individual processor appear in this sequence
>  in the order specified by its program".
> +Note that atomic_inc/_dec and atomic_fetch_inc/_dec tend to be exceptions.
> +Once gcc provides atomic_* builtins, they adopt *relaxed* memory order.
> +The reason is that these pairs are used by the refcnt's ops which is frequent
> +and has more effect on performance.
>
>  qemu/atomic.h provides the following set of atomic read-modify-write
>  operations:
>
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 0aa8913..e8353d2 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -183,8 +183,15 @@
>  #endif
>
>  /* Provide shorter names for GCC atomic builtins. */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED)
> +#define atomic_fetch_dec(ptr)  __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED)
> +#else
> +/* close to C11 memory_order_seq_cst */
>  #define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
>  #define atomic_fetch_dec(ptr)  __sync_fetch_and_add(ptr, -1)
> +#endif
>  #define atomic_fetch_add       __sync_fetch_and_add
>  #define atomic_fetch_sub       __sync_fetch_and_sub
>  #define atomic_fetch_and       __sync_fetch_and_and
> @@ -192,8 +199,15 @@
>  #define atomic_cmpxchg         __sync_val_compare_and_swap
>
>  /* And even shorter names that return void. */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_inc(ptr)        ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED))
> +#define atomic_dec(ptr)        ((void) __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED))
> +#else
> +/* close to C11 memory_order_seq_cst */
>  #define atomic_inc(ptr)        ((void) __sync_fetch_and_add(ptr, 1))
>  #define atomic_dec(ptr)        ((void) __sync_fetch_and_add(ptr, -1))
> +#endif
>  #define atomic_add(ptr, n)     ((void) __sync_fetch_and_add(ptr, n))
>  #define atomic_sub(ptr, n)     ((void) __sync_fetch_and_sub(ptr, n))
>  #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
>