On Fri, Apr 10, 2020 at 01:05:08AM +0200, Andrzej Ostruszka wrote:
> On 4/5/20 10:56 AM, jer...@marvell.com wrote:
> > From: Nithin Dabilpuram <ndabilpu...@marvell.com>
> >
> > Add source rte_node ethdev_rx process function and register
> > it. This node is a source node that will be called periodically
> > and when called, performs rte_eth_rx_burst() on a specific
> > (port, queue) pair and enqueue them as stream of objects to
> > next node.
> >
> > Signed-off-by: Nithin Dabilpuram <ndabilpu...@marvell.com>
> > Signed-off-by: Pavan Nikhilesh <pbhagavat...@marvell.com>
> > Signed-off-by: Kiran Kumar K <kirankum...@marvell.com>
> [...]
> > +/* Callback for soft ptype parsing */
> > +static uint16_t
> > +eth_pkt_parse_cb(uint16_t port, uint16_t queue, struct rte_mbuf **mbufs,
> > +		 uint16_t nb_pkts, uint16_t max_pkts, void *user_param)
> > +{
> > +	struct rte_mbuf *mbuf0, *mbuf1, *mbuf2, *mbuf3;
> > +	struct rte_ether_hdr *eth_hdr;
> > +	uint16_t etype, n_left;
> > +	struct rte_mbuf **pkts;
> > +
> > +	RTE_SET_USED(port);
> > +	RTE_SET_USED(queue);
> > +	RTE_SET_USED(max_pkts);
> > +	RTE_SET_USED(user_param);
> > +
> > +	pkts = mbufs;
> > +	n_left = nb_pkts;
> > +	while (n_left >= 12) {
> > +
> > +		/* Prefetch next-next mbufs */
> > +		rte_prefetch0(pkts[8]);
> > +		rte_prefetch0(pkts[9]);
> > +		rte_prefetch0(pkts[10]);
> > +		rte_prefetch0(pkts[11]);
> > +
> > +		/* Prefetch next mbuf data */
> > +		rte_prefetch0(
> > +			rte_pktmbuf_mtod(pkts[4], struct rte_ether_hdr *));
> > +		rte_prefetch0(
> > +			rte_pktmbuf_mtod(pkts[5], struct rte_ether_hdr *));
> > +		rte_prefetch0(
> > +			rte_pktmbuf_mtod(pkts[6], struct rte_ether_hdr *));
> > +		rte_prefetch0(
> > +			rte_pktmbuf_mtod(pkts[7], struct rte_ether_hdr *));
>
> I know this is software fallback only (and not likely to be used) but is
> this aggressive prefetching always beneficial?  I guess you tested this
> on octeon and it works, but if this is supposed to be standard RX node
> then maybe this is not always good?
>
> On the other hand if other platforms find that detrimental they can
> submit some improvements later :)

I tested it on octeon just now: the prefetch-based while loop gives a 6%
gain over the non-prefetch while loop. Yes, it is not intended to be used
normally; I just followed the principle of prefetching ahead of use. It
could be changed later if needed for other platforms, or split into
platform-dependent implementations like the lookup node.

>
> > +
> > +		mbuf0 = pkts[0];
> > +		mbuf1 = pkts[1];
> > +		mbuf2 = pkts[2];
> > +		mbuf3 = pkts[3];
> > +		pkts += 4;
> > +		n_left -= 4;
> > +
> > +		/* Extract ptype of mbuf0 */
> > +		eth_hdr = rte_pktmbuf_mtod(mbuf0, struct rte_ether_hdr *);
> > +		etype = eth_hdr->ether_type;
> > +		mbuf0->packet_type = l3_ptype(etype, 0);
> > +
> > +		/* Extract ptype of mbuf1 */
> > +		eth_hdr = rte_pktmbuf_mtod(mbuf1, struct rte_ether_hdr *);
> > +		etype = eth_hdr->ether_type;
> > +		mbuf1->packet_type = l3_ptype(etype, 0);
> > +
> > +		/* Extract ptype of mbuf2 */
> > +		eth_hdr = rte_pktmbuf_mtod(mbuf2, struct rte_ether_hdr *);
> > +		etype = eth_hdr->ether_type;
> > +		mbuf2->packet_type = l3_ptype(etype, 0);
> > +
> > +		/* Extract ptype of mbuf3 */
> > +		eth_hdr = rte_pktmbuf_mtod(mbuf3, struct rte_ether_hdr *);
> > +		etype = eth_hdr->ether_type;
> > +		mbuf3->packet_type = l3_ptype(etype, 0);
> > +	}
> > +
> > +	while (n_left > 0) {
> > +		mbuf0 = pkts[0];
> > +
> > +		pkts += 1;
> > +		n_left -= 1;
> > +
> > +		/* Extract ptype of mbuf0 */
> > +		eth_hdr = rte_pktmbuf_mtod(mbuf0, struct rte_ether_hdr *);
> > +		etype = eth_hdr->ether_type;
> > +		mbuf0->packet_type = l3_ptype(etype, 0);
> > +	}
> > +
> > +	return nb_pkts;
> > +}
> [...]
>
> With regards
> Andrzej Ostruszka
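
For anyone who wants to repeat the prefetch vs. non-prefetch comparison on
another platform, here is a rough standalone sketch of the two loop shapes.
It is not the patch code: struct pkt and classify() are hypothetical
stand-ins for rte_mbuf and l3_ptype(), the ether type is kept in host byte
order for simplicity, and __builtin_prefetch() (a GCC/Clang builtin) is
used in place of rte_prefetch0(). It uses a single level of lookahead
rather than the two levels in the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for an mbuf; this is not DPDK's struct rte_mbuf. */
struct pkt {
	uint16_t ether_type;  /* kept in host byte order here for simplicity */
	uint32_t packet_type; /* filled in by the parse functions below */
};

/* Hypothetical classifier playing the role of l3_ptype() in the patch. */
static inline uint32_t
classify(uint16_t etype)
{
	if (etype == 0x0800) /* IPv4 */
		return 1;
	if (etype == 0x86DD) /* IPv6 */
		return 2;
	return 0;
}

/* Baseline: one packet per iteration, no prefetching. */
static uint16_t
parse_plain(struct pkt **pkts, uint16_t nb_pkts)
{
	uint16_t i;

	for (i = 0; i < nb_pkts; i++)
		pkts[i]->packet_type = classify(pkts[i]->ether_type);

	return nb_pkts;
}

/* Prefetch-ahead: handle four packets while prefetching the next four. */
static uint16_t
parse_prefetch(struct pkt **pkts, uint16_t nb_pkts)
{
	uint16_t n_left = nb_pkts;
	int i;

	while (n_left >= 8) {
		/* Hint only (GCC/Clang builtin); it never changes results */
		for (i = 4; i < 8; i++)
			__builtin_prefetch(pkts[i]);

		for (i = 0; i < 4; i++)
			pkts[i]->packet_type = classify(pkts[i]->ether_type);

		pkts += 4;
		n_left -= 4;
	}

	/* Tail: fewer than eight packets left, so no lookahead */
	while (n_left > 0) {
		pkts[0]->packet_type = classify(pkts[0]->ether_type);
		pkts += 1;
		n_left -= 1;
	}

	return nb_pkts;
}
```

Both functions produce identical packet_type values, so timing them over
the same large array of packets is enough to see whether the lookahead
helps or hurts on a given CPU.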