Memory accesses that precede a call to odp_barrier_wait() in program
order cannot be reordered to after the call. Similarly, memory accesses
that follow a call to odp_barrier_wait() in program order cannot be
reordered to before the call.

The current implementation of barriers uses sequentially consistent
fences on either side of odp_barrier_wait().

The correct memory ordering for barriers is release upon entering
odp_barrier_wait(), to prevent reordering to after the barrier, and
acquire upon exiting odp_barrier_wait(), to prevent reordering to
before the barrier.

The performance difference is negligible on weakly ordered
architectures such as ARM, so the main point of this change is
correctness rather than speed.

Signed-off-by: Brian Brooks <brian.bro...@arm.com>
---
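For reference, a minimal standalone sketch of the release-on-entry,
acquire-on-exit ordering described in the commit message. It is not the
ODP implementation: demo_barrier_t, demo_barrier_wait() and demo_run()
are hypothetical names, the barrier is single-use, and C11
atomic_thread_fence() stands in for odp_mb_release() and
odp_mb_acquire() on the assumption that those map to release and
acquire fences.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_THREADS 4 /* hypothetical thread count for the demo */

/* Hypothetical single-use barrier; not ODP's odp_barrier_t. */
typedef struct {
	atomic_uint arrived; /* threads that have entered so far */
	unsigned int count;  /* threads expected at the barrier */
} demo_barrier_t;

static demo_barrier_t barrier = { .count = NUM_THREADS };
static int data[NUM_THREADS]; /* plain, non-atomic shared data */
static int ok[NUM_THREADS];   /* per-thread result flags */

static void demo_barrier_wait(demo_barrier_t *b)
{
	/* Release: earlier accesses cannot move to after the barrier. */
	atomic_thread_fence(memory_order_release);

	atomic_fetch_add_explicit(&b->arrived, 1, memory_order_relaxed);
	while (atomic_load_explicit(&b->arrived, memory_order_relaxed) <
	       b->count)
		; /* spin; real code would pause the CPU here */

	/* Acquire: later accesses cannot move to before the barrier. */
	atomic_thread_fence(memory_order_acquire);
}

static void *worker(void *arg)
{
	int id = (int)(intptr_t)arg;
	int sum = 0;

	data[id] = id + 1; /* plain write before the barrier */
	demo_barrier_wait(&barrier);

	/* Plain reads after the barrier must see every thread's write. */
	for (int i = 0; i < NUM_THREADS; i++)
		sum += data[i];
	ok[id] = (sum == NUM_THREADS * (NUM_THREADS + 1) / 2);
	return NULL;
}

/* Returns the number of threads that saw all pre-barrier writes. */
int demo_run(void)
{
	pthread_t t[NUM_THREADS];
	int passed = 0;

	/* Reset state so the single-use barrier can be exercised again. */
	atomic_store(&barrier.arrived, 0);
	for (int i = 0; i < NUM_THREADS; i++)
		data[i] = ok[i] = 0;

	for (int i = 0; i < NUM_THREADS; i++) {
		int rc = pthread_create(&t[i], NULL, worker,
					(void *)(intptr_t)i);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < NUM_THREADS; i++) {
		pthread_join(t[i], NULL);
		passed += ok[i];
	}
	return passed;
}
```

Calling demo_run() from a test program built with -pthread should
return NUM_THREADS. The relaxed counter operations are sufficient here
because the two fences carry the ordering: each thread's release fence
pairs with every other thread's acquire fence through the shared
counter.
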
 platform/linux-generic/odp_barrier.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/platform/linux-generic/odp_barrier.c b/platform/linux-generic/odp_barrier.c
index 5eb354de..f70bdbf8 100644
--- a/platform/linux-generic/odp_barrier.c
+++ b/platform/linux-generic/odp_barrier.c
@@ -34,7 +34,7 @@ void odp_barrier_wait(odp_barrier_t *barrier)
        uint32_t count;
        int wasless;
 
-       odp_mb_full();
+       odp_mb_release();
 
        count   = odp_atomic_fetch_inc_u32(&barrier->bar);
        wasless = count < barrier->count;
@@ -48,5 +48,5 @@ void odp_barrier_wait(odp_barrier_t *barrier)
                        odp_cpu_pause();
        }
 
-       odp_mb_full();
+       odp_mb_acquire();
 }
-- 
2.14.1
