[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
> -Original Message- > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Bruce Richardson > Sent: Tuesday, May 03, 2016 11:20 AM > To: Ivan Boule > Cc: Jerin Jacob; dev at dpdk.org; De Lara Guarch, Pablo > Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in > macswap loop > > On Tue, May 03, 2016 at 11:50:31AM +0200, Ivan Boule wrote: > > On 05/03/2016 11:45 AM, Bruce Richardson wrote: > > >On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote: > > >>prefetch the next packet data address in advance in macswap loop > > >>for performance improvement. > > >> > > >>... > > >> for (i = 0; i < nb_rx; i++) { > > >>+ if (likely(i < nb_rx - 1)) > > >>+ rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], > > >>+void *)); > > > > > >At least on IA platforms, there is no issue with prefetching beyond the > > >end of > > >the array, since it's only a hint to the cpu. If this is true for other > > >platforms, > > >then I suggest we just drop the conditional and just always prefetch. > > > > This is an interesting point. > > Bruce, are you suggesting that prefetching at an invalid [virtual] address > > won't trigger a CPU exception? > > > > Yep. For example, adding "rte_prefetch0(NULL)" at the start of main in testpmd > causes no ill effects when running the app. > One correction - while on IA prefetch(inval_addr) wouldn't cause any functional problems, it still might cause DTLB miss and can be a source of noticeable performance degradation. So it is better to avoid such constructions for performance critical code. Konstantin
[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
On Mon, May 02, 2016 at 05:48:02PM +, De Lara Guarch, Pablo wrote: > Hi Jerin, > > > -Original Message- > > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com] > > Sent: Monday, May 02, 2016 1:00 PM > > To: dev at dpdk.org > > Cc: De Lara Guarch, Pablo; Jerin Jacob > > Subject: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in > > macswap loop > > > > prefetch the next packet data address in advance in macswap loop > > for performance improvement. > > > > Signed-off-by: Jerin Jacob > > --- > > app/test-pmd/macswap.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c > > index 154889d..c10f4b5 100644 > > --- a/app/test-pmd/macswap.c > > +++ b/app/test-pmd/macswap.c > > @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs) > > if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ) > > ol_flags |= PKT_TX_QINQ_PKT; > > for (i = 0; i < nb_rx; i++) { > > + if (likely(i < nb_rx - 1)) > > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], > > + void *)); > > mb = pkts_burst[i]; > > eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *); > > > > -- > > 2.1.0 > > This looks good. Could you also add it in the other forwarding modes (the > ones that make changes in the packets)? OK Pablo. I will add the similar logic in the following forwarding modes in testpmd. macswap macfwd macfwd-retry csumonly icmpecho /Jerin > > Thanks, > Pablo
[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
On 05/03/2016 11:45 AM, Bruce Richardson wrote: > On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote: >> prefetch the next packet data address in advance in macswap loop >> for performance improvement. >> >> ... >> for (i = 0; i < nb_rx; i++) { >> +if (likely(i < nb_rx - 1)) >> +rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], >> + void *)); > > At least on IA platforms, there is no issue with prefetching beyond the end of > the array, since it's only a hint to the cpu. If this is true for other > platforms, > then I suggest we just drop the conditional and just always prefetch. This is an interesting point. Bruce, are you suggesting that prefetching at an invalid [virtual] address won't trigger a CPU exception? Ivan > > /Bruce > >> mb = pkts_burst[i]; >> eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *);
[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
On Tue, May 03, 2016 at 11:50:31AM +0200, Ivan Boule wrote: > On 05/03/2016 11:45 AM, Bruce Richardson wrote: > >On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote: > >>prefetch the next packet data address in advance in macswap loop > >>for performance improvement. > >> > >>... > >>for (i = 0; i < nb_rx; i++) { > >>+ if (likely(i < nb_rx - 1)) > >>+ rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], > >>+ void *)); > > > >At least on IA platforms, there is no issue with prefetching beyond the end > >of > >the array, since it's only a hint to the cpu. If this is true for other > >platforms, > >then I suggest we just drop the conditional and just always prefetch. > > This is an interesting point. > Bruce, are you suggesting that prefetching at an invalid [virtual] address > won't trigger a CPU exception? > Yep. For example, adding "rte_prefetch0(NULL)" at the start of main in testpmd causes no ill effects when running the app. /Bruce
[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
On Tue, May 03, 2016 at 10:48:34AM +0100, De Lara Guarch, Pablo wrote: > > > > -Original Message- > > From: Richardson, Bruce > > Sent: Tuesday, May 03, 2016 10:45 AM > > To: Jerin Jacob > > Cc: dev at dpdk.org; De Lara Guarch, Pablo > > Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in > > macswap loop > > > > On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote: > > > prefetch the next packet data address in advance in macswap loop > > > for performance improvement. > > > > > > Signed-off-by: Jerin Jacob > > > --- > > > app/test-pmd/macswap.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c > > > index 154889d..c10f4b5 100644 > > > --- a/app/test-pmd/macswap.c > > > +++ b/app/test-pmd/macswap.c > > > @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs) > > > if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ) > > > ol_flags |= PKT_TX_QINQ_PKT; > > > for (i = 0; i < nb_rx; i++) { > > > + if (likely(i < nb_rx - 1)) > > > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], > > > +void *)); > > > > At least on IA platforms, there is no issue with prefetching beyond the end > > of > > the array, since it's only a hint to the cpu. If this is true for other > > platforms, > > then I suggest we just drop the conditional and just always prefetch. > > That's what I thought when I saw this patch, but the problem is that the > prefetch is not for pkts_burst, > but for rte_pktmbuf_mtod(pkts_burst...), so there must be a limit in nb_rx - > 2, or there will be a seg fault. > > Pablo Good point, Pablo, I missed that subtlety here. /Bruce > > > > /Bruce > > > > > mb = pkts_burst[i]; > > > eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *); > > > > > > -- > > > 2.1.0 > > >
[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote: > prefetch the next packet data address in advance in macswap loop > for performance improvement. > > Signed-off-by: Jerin Jacob > --- > app/test-pmd/macswap.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c > index 154889d..c10f4b5 100644 > --- a/app/test-pmd/macswap.c > +++ b/app/test-pmd/macswap.c > @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs) > if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ) > ol_flags |= PKT_TX_QINQ_PKT; > for (i = 0; i < nb_rx; i++) { > + if (likely(i < nb_rx - 1)) > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], > +void *)); At least on IA platforms, there is no issue with prefetching beyond the end of the array, since it's only a hint to the cpu. If this is true for other platforms, then I suggest we just drop the conditional and just always prefetch. /Bruce > mb = pkts_burst[i]; > eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *); > > -- > 2.1.0 >
[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
> -Original Message- > From: Richardson, Bruce > Sent: Tuesday, May 03, 2016 10:45 AM > To: Jerin Jacob > Cc: dev at dpdk.org; De Lara Guarch, Pablo > Subject: Re: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in > macswap loop > > On Mon, May 02, 2016 at 05:29:37PM +0530, Jerin Jacob wrote: > > prefetch the next packet data address in advance in macswap loop > > for performance improvement. > > > > Signed-off-by: Jerin Jacob > > --- > > app/test-pmd/macswap.c | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c > > index 154889d..c10f4b5 100644 > > --- a/app/test-pmd/macswap.c > > +++ b/app/test-pmd/macswap.c > > @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs) > > if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ) > > ol_flags |= PKT_TX_QINQ_PKT; > > for (i = 0; i < nb_rx; i++) { > > + if (likely(i < nb_rx - 1)) > > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], > > + void *)); > > At least on IA platforms, there is no issue with prefetching beyond the end of > the array, since it's only a hint to the cpu. If this is true for other > platforms, > then I suggest we just drop the conditional and just always prefetch. That's what I thought when I saw this patch, but the problem is that the prefetch is not for pkts_burst, but for rte_pktmbuf_mtod(pkts_burst...), so there must be a limit in nb_rx - 2, or there will be a seg fault. Pablo > > /Bruce > > > mb = pkts_burst[i]; > > eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *); > > > > -- > > 2.1.0 > >
[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
Hi Jerin, > -Original Message- > From: Jerin Jacob [mailto:jerin.jacob at caviumnetworks.com] > Sent: Monday, May 02, 2016 1:00 PM > To: dev at dpdk.org > Cc: De Lara Guarch, Pablo; Jerin Jacob > Subject: [dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in > macswap loop > > prefetch the next packet data address in advance in macswap loop > for performance improvement. > > Signed-off-by: Jerin Jacob > --- > app/test-pmd/macswap.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c > index 154889d..c10f4b5 100644 > --- a/app/test-pmd/macswap.c > +++ b/app/test-pmd/macswap.c > @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs) > if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ) > ol_flags |= PKT_TX_QINQ_PKT; > for (i = 0; i < nb_rx; i++) { > + if (likely(i < nb_rx - 1)) > + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], > +void *)); > mb = pkts_burst[i]; > eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *); > > -- > 2.1.0 This looks good. Could you also add it in the other forwarding modes (the ones that make changes in the packets)? Thanks, Pablo
[dpdk-dev] [PATCH] app/testpmd: add packet data prefetch in macswap loop
prefetch the next packet data address in advance in macswap loop for performance improvement. Signed-off-by: Jerin Jacob --- app/test-pmd/macswap.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c index 154889d..c10f4b5 100644 --- a/app/test-pmd/macswap.c +++ b/app/test-pmd/macswap.c @@ -113,6 +113,9 @@ pkt_burst_mac_swap(struct fwd_stream *fs) if (txp->tx_ol_flags & TESTPMD_TX_OFFLOAD_INSERT_QINQ) ol_flags |= PKT_TX_QINQ_PKT; for (i = 0; i < nb_rx; i++) { + if (likely(i < nb_rx - 1)) + rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i + 1], + void *)); mb = pkts_burst[i]; eth_hdr = rte_pktmbuf_mtod(mb, struct ether_hdr *); -- 2.1.0