> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Monday, April 4, 2016 21:05
> To: Kulasek, TomaszX <tomaszx.kulasek at intel.com>
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
>
>
>
> > -----Original Message-----
> > From: Kulasek, TomaszX
> > Sent: Monday, April 04, 2016 5:20 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> >
> > Hi Konstantin,
> >
> > > -----Original Message-----
> > > From: Ananyev, Konstantin
> > > Sent: Monday, April 4, 2016 17:35
> > > To: Kulasek, TomaszX <tomaszx.kulasek at intel.com>
> > > Cc: dev at dpdk.org
> > > Subject: RE: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> > >
> > > Hi Tomasz,
> > >
> > > > -----Original Message-----
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tomasz Kulasek
> > > > Sent: Monday, April 04, 2016 3:45 PM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] [PATCH] examples/l3fwd: fix segfault with gcc 5.x
> > > >
> > > > It seems that with gcc >= 5.x and -O2/-O3, optimization breaks the packet
> > > > grouping algorithm.
> > > >
> > > > When the last-packet pointer "lp" and the "pnum->u64" buffer point to the
> > > > same memory buffer, high optimization levels can cause unpredictable
> > > > results. It seems that the assignment of precalculated group sizes may
> > > > interfere with the initialization of the new group size when lp points
> > > > to a value inside the current group that shouldn't be changed.
> > > >
> > > > With gcc >= 5.x and optimization enabled, we cannot be sure which
> > > > assignment will be done first, so the group size can be counted incorrectly.
> > > >
> > > > This patch eliminates the overlap between the assignment of the initial
> > > > group size (lp[0] = 1) and the precalculated group sizes when
> > > > gptbl[v].idx < 4.
> > > >
> > > > Fixes: 94c54b4158d5 ("examples/l3fwd: rework exact-match")
> > > >
> > > > Signed-off-by: Tomasz Kulasek <tomaszx.kulasek at intel.com>
> > > > ---
> > > >  examples/l3fwd/l3fwd_sse.h | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/examples/l3fwd/l3fwd_sse.h b/examples/l3fwd/l3fwd_sse.h
> > > > index f9cf50a..1afa1f0 100644
> > > > --- a/examples/l3fwd/l3fwd_sse.h
> > > > +++ b/examples/l3fwd/l3fwd_sse.h
> > > > @@ -283,9 +283,9 @@ port_groupx4(uint16_t pn[FWDSTEP + 1], uint16_t *lp, __m128i dp1, __m128i dp2)
> > > >
> > > >      /* if dest port value has changed. */
> > > >      if (v != GRPMSK) {
> > > > -        lp = pnum->u16 + gptbl[v].idx;
> > > > -        lp[0] = 1;
> > > >          pnum->u64 = gptbl[v].pnum;
> > > > +        pnum->u16[FWDSTEP] = 1;
> > >
> > > Hmm, but FWDSTEP and gptbl[v].idx are not always equal.
> > > Actually, could you explain a bit more: what exactly does gcc 5.x
> > > reorder, and how can it be reproduced?
> > > I.e., what sequence of input packets triggers the error?
> > > Konstantin
> > >
> > > > +        lp = pnum->u16 + gptbl[v].idx;
> > > >      }
> > > >
> > > >      return lp;
> > > > --
> > > > 1.7.9.5
> >
> > E.g., for this case, when the group changes:
> >
> > {
> >     /* 0xb: a == b, b == c, c != d, d == e */
> >     .pnum = UINT64_C(0x0002000100020003),
> >     .idx = 3,
> >     .lpv = 2,
> > },
> >
> > We expect:
> >
> > pnum->u16 = { 3, 2, 1, 2, x }
> > lp = pnum->u16 + 3;
> > // lp[0] should be 2
> >
> > but with gcc 5.2,
> >
> > lp = pnum->u16 + gptbl[v].idx;
> > lp[0] = 1;
> > pnum->u64 = gptbl[v].pnum;
> >
> > gives, for some reason, lp[0] == 1, even though pnum->u16[3] == 2.
> >
> > As a result, the group is shorter than it should be, and the application
> > fails when it tries to send the next group with a wrong length.
> >
> > We should set lp[0] = 1 only when needed (gptbl[v].idx == 4), which is
> > why I set pnum->u16[4] = 1. I always set it, to avoid adding a condition.
> > For idx < 4 we don't need to set lp[0].
> >
> > The problem is that both pointers operate on the same memory buffer, and it
> > seems that gcc optimization produces this (wrong) order:
> >
> > lp = pnum->u16 + gptbl[v].idx;
> > pnum->u64 = gptbl[v].pnum;
> > lp[0] = 1;
> >
> > instead of the expected one:
> >
> > lp = pnum->u16 + gptbl[v].idx;
> > lp[0] = 1;
> > pnum->u64 = gptbl[v].pnum;
> >
> > This issue occurs with gcc 5.x, and the application seems to fail for the
> > patterns where gptbl[v].idx < 4.
>
>
> Thanks for the explanation, Tomasz.
> So it reordered:
> lp[0] = 1;
> pnum->u64 = gptbl[v].pnum;
> correct?
> My first thought was to insert an rte_compiler_barrier() between these two
> lines, but actually your approach looks cleaner.
> Konstantin
Yes.