On 25/03/2019 19:13, Yongseok Koh wrote:
> When replenishing mbufs on Rx, the buffer address (mbuf->buf_addr) must be
> obtained. Non-x86 processors (mostly RISC such as ARM and Power) are more
> vulnerable to load stalls. For x86, reducing the number of instructions
> seems to matter most.
> 
> For x86, this is simply a load, but for other architectures it is
> calculated from the address of the mbuf structure by rte_mbuf_buf_addr(),
> without having to load the first cache line of the mbuf.
> 
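The two ways of obtaining the buffer address described above can be sketched in plain C. This is a minimal illustration under stated assumptions, not DPDK code: `struct fake_mbuf`, `struct fake_pool`, `buf_addr_by_load()`, `buf_addr_by_calc()` and `paths_agree()` are hypothetical names, and the layout (data buffer placed directly after the mbuf header and its private area in one mempool element) is the assumption that rte_mbuf_buf_addr() relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for struct rte_mbuf. */
struct fake_mbuf {
	void *buf_addr;   /* pointer stored when the pool is initialised */
	uint16_t data_off;
	/* ... remaining metadata omitted ... */
};

/* Hypothetical stand-in for a pktmbuf pool: only the private-area size. */
struct fake_pool {
	size_t priv_size;
};

/* x86 path in the patch: one load from the mbuf's first cache line. */
static void *buf_addr_by_load(const struct fake_mbuf *mb)
{
	return mb->buf_addr;
}

/* Non-x86 path: pure address arithmetic. Assumes the data buffer sits
 * right after the mbuf header and its private area in the same element.
 * Nothing is read from *mb, so no load of its cache line is required. */
static void *buf_addr_by_calc(struct fake_mbuf *mb, const struct fake_pool *mp)
{
	return (char *)mb + sizeof(*mb) + mp->priv_size;
}

/* Build one element with that layout and check that both paths agree,
 * as the assert() calls in the patch do. Returns 1 on agreement. */
static int paths_agree(size_t priv_size, size_t data_room)
{
	struct fake_pool mp = { .priv_size = priv_size };
	struct fake_mbuf *mb = malloc(sizeof(*mb) + priv_size + data_room);
	int ok;

	if (mb == NULL)
		return 0;
	/* Mimic pool initialisation: store the computed buffer address. */
	mb->buf_addr = (char *)mb + sizeof(*mb) + priv_size;
	mb->data_off = 0;
	ok = buf_addr_by_load(mb) == buf_addr_by_calc(mb, &mp);
	free(mb);
	return ok;
}
```

Calling `paths_agree(64, 2048)` should return 1. The trade-off the patch makes is visible here: the load path is a single instruction but depends on the mbuf's cache line, while the arithmetic path needs a few more instructions but touches only the pool's private-area size.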

Hi Yongseok,

> Fixes: 12d468a62bc1 ("net/mlx5: fix instruction hotspot on replenishing Rx buffer")

A similar backport was just added to 18.11.1-RC2. Should it be
reverted? I'm not keen to put another fix for it into 18.11.1 at this
stage; I think it can be part of 18.11.2. WDYT?

thanks,
Kevin.

> Cc: sta...@dpdk.org
> 
> Signed-off-by: Yongseok Koh <ys...@mellanox.com>
> ---
>  drivers/net/mlx5/mlx5_rxtx_vec.h | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h b/drivers/net/mlx5/mlx5_rxtx_vec.h
> index 5df8e291e6..4220b08dd2 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
> @@ -102,9 +102,21 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq, uint16_t n)
>               return;
>       }
>       for (i = 0; i < n; ++i) {
> -             void *buf_addr = rte_mbuf_buf_addr(elts[i], rxq->mp);
> +             void *buf_addr;
>  
> +             /*
> +              * Load the virtual address for Rx WQE. non-x86 processors
> +              * (mostly RISC such as ARM and Power) are more vulnerable to
> +              * load stall. For x86, reducing the number of instructions
> +              * seems to matter most.
> +              */
> +#ifdef RTE_ARCH_X86_64
> +             buf_addr = elts[i]->buf_addr;
> +             assert(buf_addr == rte_mbuf_buf_addr(elts[i], rxq->mp));
> +#else
> +             buf_addr = rte_mbuf_buf_addr(elts[i], rxq->mp);
>               assert(buf_addr == elts[i]->buf_addr);
> +#endif
>               wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
>                                             RTE_PKTMBUF_HEADROOM);
>               /* If there's only one MR, no need to replace LKey in WQE. */
> 
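For readers less familiar with the WQE line at the end of the hunk, the value written to `wq[i].addr` is the start of packet data (buffer address plus headroom) converted to big-endian byte order for the device. The sketch below is a hedged, portable illustration: `sketch_cpu_to_be_64()`, `sketch_wqe_addr()` and `SKETCH_HEADROOM` are hypothetical stand-ins for rte_cpu_to_be_64() and RTE_PKTMBUF_HEADROOM (128 in DPDK's default build configuration), not the driver's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RTE_PKTMBUF_HEADROOM (default build: 128). */
#define SKETCH_HEADROOM 128u

/* Portable host-to-big-endian conversion standing in for
 * rte_cpu_to_be_64(): write the most significant byte first, then
 * reinterpret those bytes as a host-order uint64_t. */
static uint64_t sketch_cpu_to_be_64(uint64_t x)
{
	unsigned char b[8];
	uint64_t out;
	int i;

	for (i = 0; i < 8; i++)
		b[i] = (unsigned char)(x >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Address placed in an Rx WQE: buffer start plus headroom, i.e. where
 * the NIC should write packet data, in big-endian byte order. */
static uint64_t sketch_wqe_addr(const void *buf_addr)
{
	return sketch_cpu_to_be_64((uint64_t)(uintptr_t)buf_addr + SKETCH_HEADROOM);
}
```

On a little-endian host such as x86 the conversion reverses the bytes; on a big-endian host it leaves the value unchanged. Either way the bytes in memory end up most significant first, which is what the device reads.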
