Hi Vladislav, > -----Original Message----- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vladyslav Buslov > Sent: Monday, October 10, 2016 6:06 PM > To: Wu, Jingjing <jingjing.wu at intel.com>; Yigit, Ferruh <ferruh.yigit at > intel.com>; Zhang, Helin <helin.zhang at intel.com> > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch > instructions for bulk rx > > > -----Original Message----- > > From: Wu, Jingjing [mailto:jingjing.wu at intel.com] > > Sent: Monday, October 10, 2016 4:26 PM > > To: Yigit, Ferruh; Vladyslav Buslov; Zhang, Helin > > Cc: dev at dpdk.org > > Subject: RE: [dpdk-dev] [PATCH] net/i40e: add additional prefetch > > instructions for bulk rx > > > > > > > > > -----Original Message----- > > > From: Yigit, Ferruh > > > Sent: Wednesday, September 14, 2016 9:25 PM > > > To: Vladyslav Buslov <vladyslav.buslov at harmonicinc.com>; Zhang, Helin > > > <helin.zhang at intel.com>; Wu, Jingjing <jingjing.wu at intel.com> > > > Cc: dev at dpdk.org > > > Subject: Re: [dpdk-dev] [PATCH] net/i40e: add additional prefetch > > > instructions for bulk rx > > > > > > On 7/14/2016 6:27 PM, Vladyslav Buslov wrote: > > > > Added prefetch of first packet payload cacheline in > > > > i40e_rx_scan_hw_ring Added prefetch of second mbuf cacheline in > > > > i40e_rx_alloc_bufs > > > > > > > > Signed-off-by: Vladyslav Buslov <vladyslav.buslov at harmonicinc.com> > > > > --- > > > > drivers/net/i40e/i40e_rxtx.c | 7 +++++-- > > > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/net/i40e/i40e_rxtx.c > > > > b/drivers/net/i40e/i40e_rxtx.c index d3cfb98..e493fb4 100644 > > > > --- a/drivers/net/i40e/i40e_rxtx.c > > > > +++ b/drivers/net/i40e/i40e_rxtx.c > > > > @@ -1003,6 +1003,7 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue > > *rxq) > > > > /* Translate descriptor info to mbuf parameters */ > > > > for (j = 0; j < nb_dd; j++) { > > > > mb = rxep[j].mbuf; > > > > + rte_prefetch0(RTE_PTR_ADD(mb->buf_addr, > > > RTE_PKTMBUF_HEADROOM)); > > > > Why did prefetch here? I think if application need to deal with packet, it > > is > > more suitable to put it in application. > > > > > > qword1 = rte_le_to_cpu_64(\ > > > > rxdp[j].wb.qword1.status_error_len); > > > > pkt_len = ((qword1 & > > > I40E_RXD_QW1_LENGTH_PBUF_MASK) >> > > > > @@ -1086,9 +1087,11 @@ i40e_rx_alloc_bufs(struct i40e_rx_queue > > *rxq) > > > > > > > > rxdp = &rxq->rx_ring[alloc_idx]; > > > > for (i = 0; i < rxq->rx_free_thresh; i++) { > > > > - if (likely(i < (rxq->rx_free_thresh - 1))) > > > > + if (likely(i < (rxq->rx_free_thresh - 1))) { > > > > /* Prefetch next mbuf */ > > > > - rte_prefetch0(rxep[i + 1].mbuf); > > > > + rte_prefetch0(&rxep[i + 1].mbuf->cacheline0); > > > > + rte_prefetch0(&rxep[i + 1].mbuf->cacheline1);
I think there are rte_mbuf_prefetch_part1/part2 defined in rte_mbuf.h, specially for that case. > > > > + } > > Agree with this change. And when I test it by testpmd with iofwd, no > > performance increase is observed but minor decrease. > > Can you share will us when it will benefit the performance in your scenario > > ? > > > > > > Thanks > > Jingjing > > Hello Jingjing, > > Thanks for code review. > > My use case: We have simple distributor thread that receives packets from > port and distributes them among worker threads according to > VLAN and MAC address hash. > > While working on performance optimization we determined that most of CPU > usage of this thread is in DPDK. > As and optimization we decided to switch to rx burst alloc function, however > that caused additional performance degradation compared to > scatter rx mode. > In profiler two major culprits were: > 1. Access to packet data Eth header in application code. (cache miss) > 2. Setting next packet descriptor field to NULL in DPDK i40e_rx_alloc_bufs > code. (this field is in second descriptor cache line that was not > prefetched) I wonder what will happen if we'll remove any prefetches here? Would it make things better or worse (and by how much)? > After applying my fixes performance improved compared to scatter rx mode. > > I assumed that prefetch of first cache line of packet data belongs to DPDK > because it is done in scatter rx mode. (in > i40e_recv_scattered_pkts) > It can be moved to application side but IMO it is better to be consistent > across all rx modes. I would agree with Jingjing here, probably PMD should avoid to prefetch packet's data. Konstantin