Hi Jerin,

> -----Original Message-----
> From: Jerin Jacob <[email protected]>
> Sent: Monday, March 9, 2020 3:36 PM
> To: Gavin Hu <[email protected]>
> Cc: dpdk-dev <[email protected]>; nd <[email protected]>; David Marchand
> <[email protected]>; [email protected]; [email protected];
> Ye, Xiaolong <[email protected]>; Honnappa Nagarahalli
> <[email protected]>; Ruifeng Wang
> <[email protected]>; Phil Yang <[email protected]>; Joyce Kong
> <[email protected]>; Steve Capper <[email protected]>
> Subject: Re: [dpdk-dev] [PATCH v1 3/3] net/i40e: auto-vectorization to speed
> up Tx free
>
> On Sat, Mar 7, 2020 at 8:34 PM Gavin Hu <[email protected]> wrote:
> >
> > Hi Jerin,
> >
> > > -----Original Message-----
> > > From: Jerin Jacob <[email protected]>
> > > Sent: Friday, March 6, 2020 3:45 PM
> > > To: Gavin Hu <[email protected]>
> > > Cc: dpdk-dev <[email protected]>; nd <[email protected]>; David Marchand
> > > <[email protected]>; [email protected];
> > > [email protected]; Ye, Xiaolong <[email protected]>; Honnappa
> > > Nagarahalli <[email protected]>; Ruifeng Wang
> > > <[email protected]>; Phil Yang <[email protected]>; Joyce Kong
> > > <[email protected]>; Steve Capper <[email protected]>
> > > Subject: Re: [dpdk-dev] [PATCH v1 3/3] net/i40e: auto-vectorization to
> > > speed up Tx free
> > >
> > > On Fri, Mar 6, 2020 at 10:35 AM Gavin Hu <[email protected]> wrote:
> > > >
> > > > Tx mbuf free is a hotspot for i40e on aarch64, as there are no
> > > > inter-loop dependencies, it is safe to enable auto-vectorization
> > > > to speed up.
> > > >
> > > > This patch showed 2~3% performance lift on ThunderX2 and no degradation
> > > > on Arm N1SDP. The test case is single core RFC2544 zero-loss test.
> > > >
> > > > Signed-off-by: Gavin Hu <[email protected]>
> > > > Reviewed-by: Steve Capper <[email protected]>
> > > > ---
> > > >  drivers/net/i40e/i40e_rxtx_vec_common.h | 5 +++++
> > > >  1 file changed, 5 insertions(+)
> > > >
> > > > diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > index 0e6ffa007..fc0fa45d4 100644
> > > > --- a/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
> > > > @@ -98,6 +98,11 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
> > > >         if (likely(m != NULL)) {
> > > >                 free[0] = m;
> > > >                 nb_free = 1;
> > > > +#if defined(__clang__)
> > > > +#pragma clang loop vectorize(assume_safety)
> > > > +#elif defined(__GNUC__)
> > > > +#pragma GCC ivdep
> > > > +#endif
> > >
> > > IMO, It is better to abstract the compiler features (above compiler
> > > feature and __restrict__) as macros in
> > > rte_common.h or so. It will help to support other compilers(ICC or
> > > Windows) and enable them to have "changes" in one place.
> >
> > How about defining RTE_LOOP_AUTO_VECTORIZATION in the rte_common.h?
>
> Other compiler stuff in rte_common.h are starting with __rte in small
> letter(__rte_packed, __rte_unused) etc.
> I think, a better name would be __rte_loop_auto_vectorize or so.
> No strong opinion for the name though.
>
> # Probably it is worth checking and add performance result of x86
> testing in git commit as well as it
> is common code.

Okay, I will do it.

> >
> > #if defined(__clang__)
> > define RTE_LOOP_AUTO_VECTORIZATION \
> > #pragma clang loop vectorize(assume_safety)
> > #elif defined(__GNUC__)
> > define RTE_LOOP_AUTO_VECTORIZATION \
> > #pragma GCC ivdep
> > #else
> > define RTE_LOOP_AUTO_VECTORIZATION
> > #endif
> > If you agree, I will submit a v2. Thanks for your comments!
> > /Gavin
> > >
> > >
> > >
> > > >                 for (i = 1; i < n; i++) {
> > > >                         m = rte_pktmbuf_prefree_seg(txep[i].mbuf);
> > > >                         if (likely(m != NULL)) {
> > > > --
> > > > 2.17.1
> > > >
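
On the macro form proposed in the quoted text: a #pragma directive placed
inside a #define body is not processed as a directive when the macro expands,
so the loop hint would not take effect in that form. One way to get the same
result is the C99 _Pragma operator. What follows is only a minimal sketch under
that assumption, not the v2 patch. The macro name is the one proposed in this
thread (not a confirmed DPDK API), and add_offset() is a hypothetical helper
that exists only to show the macro in front of a loop with no inter-iteration
dependencies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical macro, named as proposed in this thread.
 * _Pragma("...") is the operator form of #pragma and may appear in a macro body.
 * clang also defines __GNUC__, so it has to be tested first.
 */
#if defined(__clang__)
#define RTE_LOOP_AUTO_VECTORIZATION \
	_Pragma("clang loop vectorize(assume_safety)")
#elif defined(__GNUC__)
#define RTE_LOOP_AUTO_VECTORIZATION \
	_Pragma("GCC ivdep")
#else
#define RTE_LOOP_AUTO_VECTORIZATION /* no hint for other compilers */
#endif

/*
 * Illustrative helper, not taken from the patch: each iteration touches only
 * element i, so telling the compiler to ignore assumed dependencies is safe
 * as long as the caller passes non-overlapping buffers.
 */
static void
add_offset(uint32_t *dst, const uint32_t *src, size_t n, uint32_t offset)
{
	size_t i;

	assert(dst != NULL && src != NULL);

	/* The hint must sit immediately before the loop it applies to. */
	RTE_LOOP_AUTO_VECTORIZATION
	for (i = 0; i < n; i++)
		dst[i] = src[i] + offset;
}
```

Whether the loop is actually vectorized still depends on the compiler and the
optimization level; GCC's -fopt-info-vec and clang's -Rpass=loop-vectorize
report the decision.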

