Hello,

Recently I tried to use the bulk RX function to reduce the CPU usage of 
rte_eth_rx_burst.
However, application performance with i40e_recv_pkts_bulk_alloc was 
significantly worse than with i40e_recv_pkts (3M fewer PPS, 0.5 IPC on the 
receiving core).

A quick investigation revealed two problems:
 - The first payload cacheline is prefetched in i40e_recv_pkts but not in 
i40e_recv_pkts_bulk_alloc.
 - Only the first cacheline of the next mbuf is prefetched during mbuf init in 
i40e_rx_alloc_bufs. This causes a cache miss when the 'next' field in mbuf 
cacheline1 is set to NULL.
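
For illustration, here is a minimal standalone sketch of the two prefetches 
described above. It is not the patch itself and does not use DPDK: struct 
toy_mbuf, toy_prefetch, toy_init_bufs, toy_prefetch_payload and the TOY_* 
constants are hypothetical names, the 64-byte cacheline and 128-byte headroom 
are assumptions, and __builtin_prefetch is the GCC/Clang builtin standing in 
for the driver's prefetch helper.

```c
#include <assert.h>   /* static_assert */
#include <stddef.h>   /* offsetof, size_t, NULL */
#include <stdint.h>

#define TOY_CACHE_LINE 64   /* assumed cacheline size */
#define TOY_HEADROOM   128  /* assumed headroom before packet data */

/* Hypothetical two-cacheline buffer header; NOT the real struct rte_mbuf. */
struct toy_mbuf {
    /* cacheline0: fields written on every buffer init */
    _Alignas(TOY_CACHE_LINE) void *buf_addr;
    uint16_t data_off;
    uint16_t refcnt;
    uint16_t nb_segs;
    uint16_t port;
    /* cacheline1: begins with the chain pointer */
    _Alignas(TOY_CACHE_LINE) struct toy_mbuf *next;
};

static_assert(offsetof(struct toy_mbuf, next) == TOY_CACHE_LINE,
              "'next' must live in the second cacheline for this sketch");

/* Hint the CPU to pull the line containing p into cache (read, keep). */
static inline void toy_prefetch(const void *p)
{
    __builtin_prefetch(p, 0, 3);
}

/*
 * Re-initialise a batch of freshly allocated buffers. While buffer i is
 * being written, BOTH cachelines of buffer i+1 are prefetched, so the
 * later store to 'next' does not stall on a miss.
 */
static inline void toy_init_bufs(struct toy_mbuf **bufs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            toy_prefetch(bufs[i + 1]);         /* cacheline0 */
            toy_prefetch(&bufs[i + 1]->next);  /* cacheline1 */
        }
        struct toy_mbuf *mb = bufs[i];
        mb->refcnt = 1;
        mb->nb_segs = 1;
        mb->data_off = TOY_HEADROOM;
        mb->next = NULL;                       /* touches cacheline1 */
    }
}

/* Receive side: prefetch the first cacheline of packet payload. */
static inline void toy_prefetch_payload(const struct toy_mbuf *mb)
{
    toy_prefetch((const char *)mb->buf_addr + mb->data_off);
}
```

A prefetch is only a hint, so the sketch behaves identically with or without 
it; any benefit shows up only in cache-miss counts and cycles, which have to 
be measured on the target hardware.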

Fixing these two small issues significantly reduced the CPU time spent in 
rte_eth_rx_burst and improved PPS compared to both the original 
i40e_recv_pkts_bulk_alloc and i40e_recv_pkts.

Regards,

Vladyslav Buslov (1):
  net/i40e: add additional prefetch instructions for bulk rx

 drivers/net/i40e/i40e_rxtx.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

-- 
2.8.3
