Without this prefetch, OVS-DPDK performance drops significantly: 13.3 Mpps instead of 14 Mpps when forwarding 64-byte packets. Although OVS prefetches the next packet's header before it starts processing the current one, that prefetch does not complete in time. Adding the prefetch here also aligns the vector receive path with the behaviour of the other receive functions.

Signed-off-by: Zoltan Kiss <zoltan.kiss at linaro.org>
---
diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
index cf25a53..51299fa 100644
--- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c
+++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c
@@ -502,6 +502,15 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		_mm_storeu_si128((void *)&rx_pkts[pos]->rx_descriptor_fields1,
 				pkt_mb1);
 
+		rte_packet_prefetch((char*)(rx_pkts[pos]->buf_addr) +
+				    RTE_PKTMBUF_HEADROOM);
+		rte_packet_prefetch((char*)(rx_pkts[pos + 1]->buf_addr) +
+				    RTE_PKTMBUF_HEADROOM);
+		rte_packet_prefetch((char*)(rx_pkts[pos + 2]->buf_addr) +
+				    RTE_PKTMBUF_HEADROOM);
+		rte_packet_prefetch((char*)(rx_pkts[pos + 3]->buf_addr) +
+				    RTE_PKTMBUF_HEADROOM);
+
 		/* C.4 calc avaialbe number of desc */
 		var = __builtin_popcountll(_mm_cvtsi128_si64(staterr));
 		nb_pkts_recd += var;