On replenishing Rx buffers for vectorized Rx, mbuf->buf_addr isn't needed
to be accessed as it is static and easily calculated from the mbuf address.
Accessing the mbuf content causes unnecessary load stall and it is worsened
on ARM.

Fixes: 545b884b1da3 ("net/mlx5: fix buffer address posting in SSE Rx")
Cc: sta...@dpdk.org

Signed-off-by: Yongseok Koh <ys...@mellanox.com>
---
 drivers/net/mlx5/mlx5_rxtx_vec.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h
index fda7004e2d..ced5547307 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
@@ -102,8 +102,12 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq, 
uint16_t n)
                return;
        }
        for (i = 0; i < n; ++i) {
-               wq[i].addr = rte_cpu_to_be_64((uintptr_t)elts[i]->buf_addr +
-                                             RTE_PKTMBUF_HEADROOM);
+               uintptr_t buf_addr =
+                       (uintptr_t)elts[i] + sizeof(struct rte_mbuf) +
+                       rte_pktmbuf_priv_size(rxq->mp) + RTE_PKTMBUF_HEADROOM;
+
+               assert(buf_addr == (uintptr_t)elts[i]->buf_addr);
+               wq[i].addr = rte_cpu_to_be_64(buf_addr);
                /* If there's only one MR, no need to replace LKey in WQE. */
                if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1))
                        wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
-- 
2.11.0

Reply via email to