On 25/03/2019 19:13, Yongseok Koh wrote:
> When replenishing mbufs on Rx, buffer address (mbuf->buf_addr) should be
> loaded. non-x86 processors (mostly RISC such as ARM and Power) are more
> vulnerable to load stall. For x86, reducing the number of instructions
> seems to matter most.
>
> For x86, this is simply a load but for other architectures, it is
> calculated from the address of mbuf structure by rte_mbuf_buf_addr()
> without having to load the first cacheline of the mbuf.
>

Hi Yongseok,

> Fixes: 12d468a62bc1 ("net/mlx5: fix instruction hotspot on replenishing Rx
> buffer")

A similar backport was just added into 18.11.1-RC2, should it be
reverted? I'm not keen to put another fix for it in for 18.11.1 at this
stage, I think it can be part of 18.11.2. WDYT?

thanks,
Kevin.

> Cc: sta...@dpdk.org
>
> Signed-off-by: Yongseok Koh <ys...@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx_vec.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h
> b/drivers/net/mlx5/mlx5_rxtx_vec.h
> index 5df8e291e6..4220b08dd2 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
> @@ -102,9 +102,21 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq,
> uint16_t n)
>  		return;
>  	}
>  	for (i = 0; i < n; ++i) {
> -		void *buf_addr = rte_mbuf_buf_addr(elts[i], rxq->mp);
> +		void *buf_addr;
>
> +		/*
> +		 * Load the virtual address for Rx WQE. non-x86 processors
> +		 * (mostly RISC such as ARM and Power) are more vulnerable to
> +		 * load stall. For x86, reducing the number of instructions
> +		 * seems to matter most.
> +		 */
> +#ifdef RTE_ARCH_X86_64
> +		buf_addr = elts[i]->buf_addr;
> +		assert(buf_addr == rte_mbuf_buf_addr(elts[i], rxq->mp));
> +#else
> +		buf_addr = rte_mbuf_buf_addr(elts[i], rxq->mp);
>  		assert(buf_addr == elts[i]->buf_addr);
> +#endif
>  		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
>  					      RTE_PKTMBUF_HEADROOM);
>  		/* If there's only one MR, no need to replace LKey in WQE. */
>
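
For readers following the quoted patch, here is a rough standalone sketch of the two ways of obtaining the buffer address that the #ifdef chooses between. It does not use DPDK: struct toy_mbuf, struct toy_pool and every toy_* function are hypothetical stand-ins, the 128-byte header assumes a 64-bit target, and the compiler macro __x86_64__ is used here in place of the build-time RTE_ARCH_X86_64. The layout (header, then private area, then data room) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_mbuf (not the real layout). */
struct toy_mbuf {
	void *buf_addr;   /* stored buffer address, at the start of the header */
	uint8_t pad[120]; /* pads the header to 128 bytes on a 64-bit target */
};

/* Hypothetical stand-in for the mempool: only the private-area size. */
struct toy_pool {
	uint16_t priv_size; /* bytes between the header and the data room */
};

/* Computed path: pointer arithmetic, no load from the mbuf itself. */
static inline void *
toy_buf_addr_computed(struct toy_mbuf *mb, const struct toy_pool *mp)
{
	return (char *)mb + sizeof(*mb) + mp->priv_size;
}

/* Loaded path: one load from the mbuf header. */
static inline void *
toy_buf_addr_loaded(const struct toy_mbuf *mb)
{
	return mb->buf_addr;
}

/* Select a path per architecture, in the spirit of the patch's #ifdef. */
static inline void *
toy_buf_addr(struct toy_mbuf *mb, const struct toy_pool *mp)
{
#if defined(__x86_64__)
	(void)mp;
	return toy_buf_addr_loaded(mb);
#else
	return toy_buf_addr_computed(mb, mp);
#endif
}

/* Allocate one element laid out as header | private area | data room. */
static struct toy_mbuf *
toy_mbuf_alloc(const struct toy_pool *mp, size_t data_room)
{
	struct toy_mbuf *mb = malloc(sizeof(*mb) + mp->priv_size + data_room);

	if (mb != NULL)
		mb->buf_addr = (char *)mb + sizeof(*mb) + mp->priv_size;
	return mb;
}
```

Both paths are expected to return the same pointer for a buffer that has not been re-attached elsewhere, which is what the paired assert() calls in the patch check; the only difference is whether the mbuf header has to be read to get it.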