Hi, Ruifeng

> -----Original Message-----
> From: Ruifeng Wang <ruifeng.w...@arm.com>
> Sent: Monday, June 27, 2022 14:08
> To: Slava Ovsiienko <viachesl...@nvidia.com>; Ali Alnubani
> <alia...@nvidia.com>; Matan Azrad <ma...@nvidia.com>
> Cc: dev@dpdk.org; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
> sta...@dpdk.org; nd <n...@arm.com>; nd <n...@arm.com>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> vector path
>
> > -----Original Message-----
> > From: Slava Ovsiienko <viachesl...@nvidia.com>
> > Sent: Monday, June 20, 2022 1:38 PM
> > To: Ali Alnubani <alia...@nvidia.com>; Ruifeng Wang
> > <ruifeng.w...@arm.com>; Matan Azrad <ma...@nvidia.com>
> > Cc: dev@dpdk.org; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>;
> > sta...@dpdk.org; nd <n...@arm.com>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > Hi, Ruifeng
>
> Hi Slava,
> Thanks for your review.
>
> > My apologies for the review delay.
>
> Apologies too. I was on something else.
>
> > As far as I understand, the hypothetical problem scenario is:
> > - the CPU core reorders the reading of the qwords of the 16B vector
> > - the core reads the second 8B of the CQE (old CQE values)
> > - the CQE is updated
> > - the core reads the first 8B of the CQE (new CQE values)
>
> Yes, this is the problem.
>
> > How can the re-reading of CQEs resolve the issue?
> > This wrong scenario might happen on the second read and we would run
> > into the same issue.
>
> Here we are trying to order the reading of a 16B vector (8B with op_own -
> high, and 8B without op_own - low).
> The first read will load 16B. The second read will load and update the
> low 8B (no op_own).

OK, I got the point, thank you for the explanation.
Can we avoid the first reading of the low 8B (not containing the CQE owning field)?
I mean to update this part to read only the upper 8Bs:

  /* B.0 (CQE 3) load a block having op_own. */
  c3 = vld1q_u64((uint64_t *)(p3 + 48));
  /* B.0 (CQE 2) load a block having op_own. */
  c2 = vld1q_u64((uint64_t *)(p2 + 48));
  /* B.0 (CQE 1) load a block having op_own. */
  c1 = vld1q_u64((uint64_t *)(p1 + 48));
  /* B.0 (CQE 0) load a block having op_own. */
  c0 = vld1q_u64((uint64_t *)(p0 + 48));
  /* Synchronize for loading the rest of blocks. */
  rte_io_rmb();

because the lower 8Bs will be overlapped by the second read (in your patch)
and the barrier ensures the correct order.

With best regards,
Slava