> -----Original Message----- > From: Slava Ovsiienko <viachesl...@nvidia.com> > Sent: Wednesday, June 29, 2022 3:55 PM > To: Ruifeng Wang <ruifeng.w...@arm.com>; Ali Alnubani > <alia...@nvidia.com>; Matan Azrad <ma...@nvidia.com> > Cc: dev@dpdk.org; Honnappa Nagarahalli > <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; nd <n...@arm.com>; > nd <n...@arm.com> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON vector > path > > Hi, Ruifeng > > > -----Original Message----- > > From: Ruifeng Wang <ruifeng.w...@arm.com> > > Sent: Monday, June 27, 2022 14:08 > > To: Slava Ovsiienko <viachesl...@nvidia.com>; Ali Alnubani > > <alia...@nvidia.com>; Matan Azrad <ma...@nvidia.com> > > Cc: dev@dpdk.org; Honnappa Nagarahalli > <honnappa.nagaraha...@arm.com>; > > sta...@dpdk.org; nd <n...@arm.com>; nd <n...@arm.com> > > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON > > vector path > > > > > -----Original Message----- > > > From: Slava Ovsiienko <viachesl...@nvidia.com> > > > Sent: Monday, June 20, 2022 1:38 PM > > > To: Ali Alnubani <alia...@nvidia.com>; Ruifeng Wang > > > <ruifeng.w...@arm.com>; Matan Azrad <ma...@nvidia.com> > > > Cc: dev@dpdk.org; Honnappa Nagarahalli > > > <honnappa.nagaraha...@arm.com>; sta...@dpdk.org; nd <n...@arm.com> > > > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in > > > NEON vector path > > > > > > Hi, Ruifeng > > > > Hi Slava, > > > > Thanks for your review. > > > > > > My apologies for review delay. > > > > Apologies too. I was on something else. > > > > > As far I understand the hypothetical problem scenario is: > > > - CPU core reorders reading of qwords of 16B vector > > > - core reads the second 8B of CQE (old CQE values) > > > - CQE update > > > - core reads the first 8B of CQE (new CQE values) > > > > Yes, This is the problem. > > > > > > How the re-reading of CQEs can resolve the issue? > > > This wrong scenario might happen on the second read and we would run > > > into the same issue. > > > > Here we are trying to ordering reading of a 16B vector (8B with op_own > > - high, and 8B without op_own - low). > > The first read will load 16B. The second read will load and update low > > 8B (no op_own). > OK, I got the point, thank you for the explanations. > Can we avoid the first reading of low 8B (no containing CQE owning field)? > > I mean to update this part to read only upper 8Bs: > /* B.0 (CQE 3) load a block having op_own. */ > c3 = vld1q_u64((uint64_t *)(p3 + 48)); > /* B.0 (CQE 2) load a block having op_own. */ > c2 = vld1q_u64((uint64_t *)(p2 + 48)); > /* B.0 (CQE 1) load a block having op_own. */ > c1 = vld1q_u64((uint64_t *)(p1 + 48)); > /* B.0 (CQE 0) load a block having op_own. */ > c0 = vld1q_u64((uint64_t *)(p0 + 48)); > /* Synchronize for loading the rest of blocks. */ > rte_io_rmb(); > > Because lower 8Bs will be overlapped with the second read (in your patch) > and barrier ensures the correct order.
Hi Slava, Yes, your suggestion is valid. Actually, I tried that approach: load higher 8B + barrier + load lower 8B + combine the two 8Bs into a vector. It also has no observable performance impact but generates more instructions compared to the current patch (the 'combine' operation). So I followed current approach. Thanks. > > > With best regards, > Slava