Memory accesses that precede a call to odp_barrier_wait() in program order cannot be reordered to after the call. Similarly, memory accesses that follow a call to odp_barrier_wait() in program order cannot be reordered to before the call.
The current implementation of barriers uses sequentially consistent fences on either side of odp_barrier_wait(). The correct memory ordering for barriers is release upon entering odp_barrier_wait(), to prevent earlier accesses from being reordered to after the barrier, and acquire upon exiting odp_barrier_wait(), to prevent later accesses from being reordered to before the barrier. The performance difference is negligible on weakly ordered architectures such as ARM, so the main point of this change is correctness.

Signed-off-by: Brian Brooks <brian.bro...@arm.com>
---
 platform/linux-generic/odp_barrier.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/platform/linux-generic/odp_barrier.c b/platform/linux-generic/odp_barrier.c
index 5eb354de..f70bdbf8 100644
--- a/platform/linux-generic/odp_barrier.c
+++ b/platform/linux-generic/odp_barrier.c
@@ -34,7 +34,7 @@ void odp_barrier_wait(odp_barrier_t *barrier)
 	uint32_t count;
 	int wasless;
 
-	odp_mb_full();
+	odp_mb_release();
 	count = odp_atomic_fetch_inc_u32(&barrier->bar);
 	wasless = count < barrier->count;
 
@@ -48,5 +48,5 @@ void odp_barrier_wait(odp_barrier_t *barrier)
 		odp_cpu_pause();
 	}
 
-	odp_mb_full();
+	odp_mb_acquire();
 }
-- 
2.14.1