Using just-sufficient barriers matters a great deal for performance. Insufficient barriers cause correctness issues, while barriers stronger than required, especially in the fast path, are a performance killer.
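As a rough illustration of "just sufficient" ordering, loosely modeled on the idea behind the multi-packet RQ buffer refcnt patch, the sketch below uses C11 atomics with the weakest ordering each operation needs. The `struct buf`, `buf_ref()` and `buf_unref()` names are hypothetical and this is not the mlx5 code: taking a reference is relaxed, and only the final release pays for an acquire fence before the buffer may be recycled.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical buffer descriptor, for illustration only (not the mlx5 struct). */
struct buf {
    atomic_uint_fast16_t refcnt;
};

/* Taking an extra reference: no other memory accesses need to be ordered
 * against the increment, so relaxed ordering is sufficient. */
static inline void buf_ref(struct buf *b)
{
    atomic_fetch_add_explicit(&b->refcnt, 1, memory_order_relaxed);
}

/* Dropping a reference: release ordering makes this thread's earlier writes
 * to the buffer visible before the count drops. Only the thread that drops
 * the last reference needs an acquire fence before it reuses the buffer.
 * Returns true when the caller held the last reference. */
static inline bool buf_unref(struct buf *b)
{
    if (atomic_fetch_sub_explicit(&b->refcnt, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}
```

Using sequentially consistent operations everywhere would also be correct, but on weakly ordered CPUs such as aarch64 it can emit full barriers on every increment in the fast path, which is exactly the overhead described above.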
In joint preliminary testing between Arm and Ampere, a performance improvement of 8%~13% was measured.

Gavin Hu (5):
  net/mlx5: relax the barrier for UAR write
  net/mlx5: use cio barrier before the BF WQE
  net/mlx5: add missing barrier
  net/mlx5: add descriptive comment for a barrier
  net/mlx5: non-cacheable mapping defaulted for aarch64

Phil Yang (1):
  net/mlx5: relaxed ordering for multi-packet RQ buffer refcnt

 drivers/net/mlx5/mlx5_rxq.c  |  5 +++--
 drivers/net/mlx5/mlx5_rxtx.c | 16 +++++++++-------
 drivers/net/mlx5/mlx5_rxtx.h | 11 ++++++++---
 drivers/net/mlx5/mlx5_txq.c  |  4 ++++
 4 files changed, 24 insertions(+), 12 deletions(-)

-- 
2.17.1

