Hi, Ruifeng
> -----Original Message-----
> From: Ruifeng Wang <[email protected]>
> Sent: Monday, June 27, 2022 14:08
> To: Slava Ovsiienko <[email protected]>; Ali Alnubani
> <[email protected]>; Matan Azrad <[email protected]>
> Cc: [email protected]; Honnappa Nagarahalli <[email protected]>;
> [email protected]; nd <[email protected]>; nd <[email protected]>
> Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> vector path
>
> > -----Original Message-----
> > From: Slava Ovsiienko <[email protected]>
> > Sent: Monday, June 20, 2022 1:38 PM
> > To: Ali Alnubani <[email protected]>; Ruifeng Wang
> > <[email protected]>; Matan Azrad <[email protected]>
> > Cc: [email protected]; Honnappa Nagarahalli <[email protected]>;
> > [email protected]; nd <[email protected]>
> > Subject: RE: [PATCH] net/mlx5: fix risk in Rx descriptor read in NEON
> > vector path
> >
> > Hi, Ruifeng
>
> Hi Slava,
>
> Thanks for your review.
> >
> > My apologies for review delay.
>
> Apologies too. I was on something else.
>
> > As far I understand the hypothetical problem scenario is:
> > - CPU core reorders reading of qwords of 16B vector
> > - core reads the second 8B of CQE (old CQE values)
> > - CQE update
> > - core reads the first 8B of CQE (new CQE values)
>
> Yes, This is the problem.
> >
> > How the re-reading of CQEs can resolve the issue?
> > This wrong scenario might happen on the second read and we would run
> > into the same issue.
>
> Here we are trying to ordering reading of a 16B vector (8B with op_own -
> high, and 8B without op_own - low).
> The first read will load 16B. The second read will load and update low
> 8B (no op_own).
OK, I got the point, thank you for the explanations.
Can we avoid the first reading of low 8B (no containing CQE owning field)?
I mean to update this part to read only upper 8Bs:
/* B.0 (CQE 3) load a block having op_own. */
c3 = vld1q_u64((uint64_t *)(p3 + 48));
/* B.0 (CQE 2) load a block having op_own. */
c2 = vld1q_u64((uint64_t *)(p2 + 48));
/* B.0 (CQE 1) load a block having op_own. */
c1 = vld1q_u64((uint64_t *)(p1 + 48));
/* B.0 (CQE 0) load a block having op_own. */
c0 = vld1q_u64((uint64_t *)(p0 + 48));
/* Synchronize for loading the rest of blocks. */
rte_io_rmb();
Because lower 8Bs will be overlapped with the second read (in your patch)
and barrier ensures the correct order.
With best regards,
Slava