> -----Original Message-----
> From: Slava Ovsiienko <[email protected]>
> Sent: Friday, July 2, 2021 3:06 PM
> To: Ruifeng Wang <[email protected]>; Raslan Darawsheh
> <[email protected]>; Matan Azrad <[email protected]>; Shahaf Shuler
> <[email protected]>
> Cc: [email protected]; [email protected]; nd <[email protected]>; Honnappa
> Nagarahalli <[email protected]>
> Subject: RE: [PATCH 2/2] net/mlx5: reduce unnecessary memory access
>
> Hi, Ruifeng
>
> Could we go further and implement loop inside the conditional?
> Like this:
> if (mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1) {
> 	for (i = 0; i < n; ++i) {
> 		void *buf_addr = elts[i]->buf_addr;
>
> 		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
> 					      RTE_PKTMBUF_HEADROOM);
> 		wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
> 	}
> } else {
> 	for (i = 0; i < n; ++i) {
> 		void *buf_addr = elts[i]->buf_addr;
>
> 		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
> 					      RTE_PKTMBUF_HEADROOM);
> 	}
> }
> What do you think?

Agree. Loop inside the conditional should be more efficient.
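A minimal, self-contained sketch of the suggestion above, which is loop unswitching: the invariant is tested once and each branch runs its own specialized loop, so the single-MR path has no per-iteration branch at all. Everything here is hypothetical: `struct seg`, `HEADROOM`, `lookup_lkey()` and the two `fill_*` functions only stand in for `struct mlx5_wqe_data_seg`, `RTE_PKTMBUF_HEADROOM` and `mlx5_rx_mb2mr()`. The big-endian conversion done by `rte_cpu_to_be_64()` is omitted. `fill_checked()` mirrors the patch (length hoisted, branch still inside the loop) and `fill_unswitched()` mirrors the suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a WQE data segment (not the mlx5 layout). */
struct seg {
	uint64_t addr;
	uint32_t lkey;
};

/* Assumption: plays the role of RTE_PKTMBUF_HEADROOM. */
#define HEADROOM 128u

/* Counts calls to the hypothetical LKey lookup. */
static unsigned int lookups;

/* Hypothetical lookup standing in for mlx5_rx_mb2mr(). */
static uint32_t lookup_lkey(uint64_t buf_addr)
{
	lookups++;
	return (uint32_t)(buf_addr >> 12); /* arbitrary, deterministic key */
}

/* Patch shape: invariant hoisted, but still tested on every iteration. */
static void fill_checked(struct seg *wq, const uint64_t *bufs, size_t n,
			 uint16_t btree_len)
{
	for (size_t i = 0; i < n; ++i) {
		wq[i].addr = bufs[i] + HEADROOM;
		if (btree_len > 1)
			wq[i].lkey = lookup_lkey(bufs[i]);
	}
}

/* Suggested shape: test once, then run a branch-free loop body. */
static void fill_unswitched(struct seg *wq, const uint64_t *bufs, size_t n,
			    uint16_t btree_len)
{
	if (btree_len > 1) {
		for (size_t i = 0; i < n; ++i) {
			wq[i].addr = bufs[i] + HEADROOM;
			wq[i].lkey = lookup_lkey(bufs[i]);
		}
	} else {
		/* Single MR: the LKey is left untouched, only addr is set. */
		for (size_t i = 0; i < n; ++i)
			wq[i].addr = bufs[i] + HEADROOM;
	}
}
```

Both versions write the same descriptors; unswitching trades code size (two loop bodies) for removing the branch. Some compilers can do this on their own (GCC has `-funswitch-loops`), but only when they can prove the condition is loop-invariant, so any gain still has to be measured per architecture.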
> Also, we should check the performance on other archs is not affected.

I will also test on the x86 platform I have.

>
> With best regards,
> Slava
>
> > -----Original Message-----
> > From: Ruifeng Wang <[email protected]>
> > Sent: Tuesday, June 1, 2021 11:31
> > To: Raslan Darawsheh <[email protected]>; Matan Azrad
> > <[email protected]>; Shahaf Shuler <[email protected]>; Slava
> > Ovsiienko <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected];
> > [email protected]; Ruifeng Wang <[email protected]>
> > Subject: [PATCH 2/2] net/mlx5: reduce unnecessary memory access
> >
> > MR btree len is a constant during Rx replenish.
> > Moved retrieve of the value out of loop to reduce data loads.
> > Slight performance uplift was measured on N1SDP.
> >
> > Signed-off-by: Ruifeng Wang <[email protected]>
> > ---
> >  drivers/net/mlx5/mlx5_rxtx_vec.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
> > index d5af2d91ff..fc7e2a7f41 100644
> > --- a/drivers/net/mlx5/mlx5_rxtx_vec.c
> > +++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
> > @@ -95,6 +95,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
> >  	volatile struct mlx5_wqe_data_seg *wq =
> >  		&((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[elts_idx];
> >  	unsigned int i;
> > +	uint16_t btree_len;
> >
> >  	if (n >= rxq->rq_repl_thresh) {
> >  		MLX5_ASSERT(n >= MLX5_VPMD_RXQ_RPLNSH_THRESH(q_n));
> > @@ -106,6 +107,8 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
> >  			rxq->stats.rx_nombuf += n;
> >  			return;
> >  		}
> > +
> > +		btree_len = mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh);
> >  		for (i = 0; i < n; ++i) {
> >  			void *buf_addr;
> >
> > @@ -119,8 +122,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
> >  			wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
> >  						      RTE_PKTMBUF_HEADROOM);
> >  			/* If there's a single MR, no need to replace LKey. */
> > -			if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh)
> > -				     > 1))
> > +			if (unlikely(btree_len > 1))
> >  				wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
> >  		}
> >  		rxq->rq_ci += n;
> > --
> > 2.25.1
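The patch applies loop-invariant hoisting by hand: the btree length is read once into `btree_len` instead of on every iteration. A compiler may not do this automatically when it cannot prove the value is unchanged across the loop, for example when the loop body may call a function it cannot see into. The sketch below is hypothetical: `struct mr_cache`, `cache_len()` and the two `count_lookups_*` functions only mimic the shape of the replenish loop. It counts how many times the length is read in each version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the MR cache (rxq->mr_ctrl.cache_bh). */
struct mr_cache {
	uint16_t len;
};

/* Counts how often the cache length is read. */
static unsigned int len_reads;

/* Hypothetical accessor playing the role of mlx5_mr_btree_len(). */
static uint16_t cache_len(const struct mr_cache *c)
{
	len_reads++;
	return c->len;
}

/* Before: the length is re-read on every iteration of the loop. */
static size_t count_lookups_reload(const struct mr_cache *c, size_t n)
{
	size_t lookups = 0;

	for (size_t i = 0; i < n; ++i)
		if (cache_len(c) > 1)
			lookups++; /* an LKey lookup would happen here */
	return lookups;
}

/* After: the loop-invariant length is read once into a local. */
static size_t count_lookups_hoisted(const struct mr_cache *c, size_t n)
{
	uint16_t btree_len = cache_len(c);
	size_t lookups = 0;

	for (size_t i = 0; i < n; ++i)
		if (btree_len > 1)
			lookups++; /* an LKey lookup would happen here */
	return lookups;
}
```

With `n` descriptors the first version reads the length `n` times and the second reads it once, while both make the same decisions. That equivalence holds only under the assumption stated in the commit message, namely that the length does not change during a single replenish.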

