On 14/07/2013 04:53, Liu Ping Fan wrote:
> Refcnt's atomic inc/dec ops are frequent, and their idiom needs no seq_cst
> ordering. So, to get better performance, it is worth adopting the _relaxed
> rather than the _seq_cst memory model for them.
>
> We resort to gcc builtins. If gcc supports the C11 memory model, the
> __atomic_* builtins are used, otherwise the __sync_* builtins.
>
> Signed-off-by: Liu Ping Fan <pingf...@linux.vnet.ibm.com>
No, not at all. :(

First of all, I'd like to understand how you benchmarked this. For inc/dec,
relaxed vs. seq_cst has no effect except on PPC and ARM, and if the refcount
ops are frequent enough, I strongly suspect cacheline bouncing has a bigger
effect than the memory barriers.

Second, it is wrong, because you need a further read memory barrier when you
are removing the last reference.

Third, it makes the API completely non-orthogonal, and "tend to be
exceptions" is not a justification. The justification here could be, rather
than performance, having to remember how to use atomic_fetch_dec correctly
on the unref side. I don't really buy that, but if you really care, do
something like

  #define atomic_ref(ptr, field) \
      __atomic_fetch_add(&((ptr)->field), 1, __ATOMIC_RELAXED)

  #define atomic_unref(ptr, field, releasefn) (                           \
      __atomic_fetch_add(&((ptr)->field), -1, __ATOMIC_RELEASE) == 1      \
      ? (__atomic_thread_fence(__ATOMIC_ACQUIRE), (releasefn)(ptr), true) \
      : false)

i.e. define a new interface similar to kref_get/kref_put and, since you are
at it, optimize it.

Paolo

> ---
> v2:
>    update atomics.txt
>    fix the dependency MACRO
> ---
>  docs/atomics.txt      |  4 ++++
>  include/qemu/atomic.h | 14 ++++++++++++++
>  2 files changed, 18 insertions(+)
>
> diff --git a/docs/atomics.txt b/docs/atomics.txt
> index 6f2997b..c8ad31d 100644
> --- a/docs/atomics.txt
> +++ b/docs/atomics.txt
> @@ -45,6 +45,10 @@ consistency*, where "the result of any execution is the same as if the
>  operations of all the processors were executed in some sequential order,
>  and the operations of each individual processor appear in this sequence
>  in the order specified by its program".
> +Note that atomic_inc/_dec and atomic_fetch_inc/_dec tend to be exceptions.
> +Once gcc provides atomic_* builtins, they adopt *relaxed* memory order.
> +The reason is that these pairs are used by the refcnt's ops which is frequent
> +and has more effect on performance.
>
>  qemu/atomic.h provides the following set of atomic read-modify-write
>  operations:
>
> diff --git a/include/qemu/atomic.h b/include/qemu/atomic.h
> index 0aa8913..e8353d2 100644
> --- a/include/qemu/atomic.h
> +++ b/include/qemu/atomic.h
> @@ -183,8 +183,15 @@
>  #endif
>
>  /* Provide shorter names for GCC atomic builtins. */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_fetch_inc(ptr)  __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED)
> +#define atomic_fetch_dec(ptr)  __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED)
> +#else
> +/* close to C11 memory_order_seq_cst */
>  #define atomic_fetch_inc(ptr)  __sync_fetch_and_add(ptr, 1)
>  #define atomic_fetch_dec(ptr)  __sync_fetch_and_add(ptr, -1)
> +#endif
>  #define atomic_fetch_add       __sync_fetch_and_add
>  #define atomic_fetch_sub       __sync_fetch_and_sub
>  #define atomic_fetch_and       __sync_fetch_and_and
> @@ -192,8 +199,15 @@
>  #define atomic_cmpxchg         __sync_val_compare_and_swap
>
>  /* And even shorter names that return void. */
> +#ifdef __ATOMIC_RELAXED
> +/* C11 memory_order_relaxed */
> +#define atomic_inc(ptr)        ((void) __atomic_fetch_add(ptr, 1, __ATOMIC_RELAXED))
> +#define atomic_dec(ptr)        ((void) __atomic_fetch_add(ptr, -1, __ATOMIC_RELAXED))
> +#else
> +/* close to C11 memory_order_seq_cst */
>  #define atomic_inc(ptr)        ((void) __sync_fetch_and_add(ptr, 1))
>  #define atomic_dec(ptr)        ((void) __sync_fetch_and_add(ptr, -1))
> +#endif
>  #define atomic_add(ptr, n)     ((void) __sync_fetch_and_add(ptr, n))
>  #define atomic_sub(ptr, n)     ((void) __sync_fetch_and_sub(ptr, n))
>  #define atomic_and(ptr, n)     ((void) __sync_fetch_and_and(ptr, n))
>