> On Mar 27, 2019, at 4:51 AM, Kevin Traynor <ktray...@redhat.com> wrote:
>
> On 25/03/2019 19:13, Yongseok Koh wrote:
>> When replenishing mbufs on Rx, buffer address (mbuf->buf_addr) should be
>> loaded. non-x86 processors (mostly RISC such as ARM and Power) are more
>> vulnerable to load stall. For x86, reducing the number of instructions
>> seems to matter most.
>>
>> For x86, this is simply a load but for other architectures, it is
>> calculated from the address of mbuf structure by rte_mbuf_buf_addr()
>> without having to load the first cacheline of the mbuf.
>>
>
> Hi Yongseok,
>
>> Fixes: 12d468a62bc1 ("net/mlx5: fix instruction hotspot on replenishing Rx
>> buffer")
>
> A similar backport was just added into 18.11.1-RC2, should it be
> reverted? I'm not keen to put another fix for it in for 18.11.1 at this
> stage, I think it can be part of 18.11.2. WDYT?

I spoke with Kevin and we decided to drop the old fix. I have also dropped it
from 17.11.6-rc1. This new fix will be merged into 18.11.2. I'll merge it into
17.11.6 (or 17.11.7) if it is merged into master.

Thanks,
Yongseok

>> Cc: sta...@dpdk.org
>>
>> Signed-off-by: Yongseok Koh <ys...@mellanox.com>
>> ---
>>  drivers/net/mlx5/mlx5_rxtx_vec.h | 14 +++++++++++++-
>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.h
>> b/drivers/net/mlx5/mlx5_rxtx_vec.h
>> index 5df8e291e6..4220b08dd2 100644
>> --- a/drivers/net/mlx5/mlx5_rxtx_vec.h
>> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.h
>> @@ -102,9 +102,21 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq,
>> uint16_t n)
>>  		return;
>>  	}
>>  	for (i = 0; i < n; ++i) {
>> -		void *buf_addr = rte_mbuf_buf_addr(elts[i], rxq->mp);
>> +		void *buf_addr;
>>
>> +		/*
>> +		 * Load the virtual address for Rx WQE. non-x86 processors
>> +		 * (mostly RISC such as ARM and Power) are more vulnerable to
>> +		 * load stall. For x86, reducing the number of instructions
>> +		 * seems to matter most.
>> +		 */
>> +#ifdef RTE_ARCH_X86_64
>> +		buf_addr = elts[i]->buf_addr;
>> +		assert(buf_addr == rte_mbuf_buf_addr(elts[i], rxq->mp));
>> +#else
>> +		buf_addr = rte_mbuf_buf_addr(elts[i], rxq->mp);
>>  		assert(buf_addr == elts[i]->buf_addr);
>> +#endif
>>  		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
>>  					      RTE_PKTMBUF_HEADROOM);
>>  		/* If there's only one MR, no need to replace LKey in WQE. */
>>
>
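
For anyone following the backport discussion, here is a rough standalone
sketch of the two ways of getting the buffer address that the patch chooses
between. It does not use DPDK: sketch_mbuf, sketch_pool and the sketch_*
helpers are hypothetical stand-ins, and it assumes each pool element is laid
out as [mbuf header][private area][data buffer], which is what lets the
address be derived from the mbuf pointer instead of being read from the mbuf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct rte_mbuf (only what the sketch needs). */
struct sketch_mbuf {
	void *buf_addr;     /* stored pointer to the data buffer */
	uint64_t other[15]; /* placeholder for the remaining metadata */
};

/* Hypothetical stand-in for the mempool: just the private area size. */
struct sketch_pool {
	size_t priv_size;
};

/*
 * Derive the buffer address from the mbuf pointer itself. Nothing is
 * read from *m, so the mbuf's first cacheline is not touched; only the
 * pool-wide priv_size is read.
 */
static inline void *
sketch_buf_addr_calc(struct sketch_mbuf *m, const struct sketch_pool *mp)
{
	return (char *)m + sizeof(*m) + mp->priv_size;
}

/* Per-architecture choice, mirroring the shape of the #ifdef in the patch. */
static inline void *
sketch_buf_addr(struct sketch_mbuf *m, const struct sketch_pool *mp)
{
#if defined(__x86_64__)
	(void)mp;
	return m->buf_addr; /* a single load */
#else
	return sketch_buf_addr_calc(m, mp); /* arithmetic on the pointer */
#endif
}

/* Allocate one element with the assumed contiguous layout. */
static struct sketch_mbuf *
sketch_mbuf_alloc(const struct sketch_pool *mp, size_t buf_len)
{
	struct sketch_mbuf *m = malloc(sizeof(*m) + mp->priv_size + buf_len);

	if (m == NULL)
		return NULL;
	m->buf_addr = (char *)m + sizeof(*m) + mp->priv_size;
	return m;
}
```

Both paths should return the same pointer as long as the layout assumption
holds, which is what the assert() in each branch of the patch checks in debug
builds.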