On 9/26/2024 at 6:12 PM, Linus Torvalds wrote:
> On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser
> <jonas.oberhau...@huaweicloud.com> wrote:
>>
>> No, the issue introduced by the compiler optimization (or by your
>> original patch) is that the CPU can speculatively load from the first
>> pointer as soon as it has completed the load of that pointer:
>
> You mean the compiler can do it.

What I mean is that if we only use rcu_dereference for the second load (and neither some form of compiler barrier nor an acquire load), then the compiler can transform the second program from my previous e-mail into the first one. The second program would be correct if mapped 1:1 to hardware, because hardware ensures the ordering based on the address dependency; the first one is incorrect.

In particular, the compiler can change

 if (node == node2) t = *node2;

into

 if (node == node2) t = *node;

and then the CPU can speculatively read *node before knowing the value of node2.

The compiler can also speculatively read *node in this case, but that is not what I meant.

The code in Mathieu's original patch is already like the latter one and is broken even if the compiler does not do any optimizations.
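To make the two forms above concrete, here is a minimal userspace sketch. The names read_if_same and read_if_same_transformed, the struct layout, and the READ_ONCE stand-in are assumptions for illustration and are not taken from the patch. Both functions return the same result when run single-threaded, which is why the compiler is allowed to turn the first into the second; the difference is only which pointer the final load depends on.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE(); an assumption for this
 * sketch, relying on the GCC/Clang __typeof__ extension. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct node {
	int value;
};

/* As written: the dereference goes through node2, so the final load has an
 * address dependency on the second pointer load. */
int read_if_same(struct node **slot, struct node *node, int fallback)
{
	struct node *node2 = READ_ONCE(*slot);

	if (node == node2)
		return node2->value;
	return fallback;
}

/* What the compiler may emit instead: inside the branch node == node2, so it
 * can dereference node. The final load then no longer depends on the second
 * pointer load, only on the first. */
int read_if_same_transformed(struct node **slot, struct node *node, int fallback)
{
	struct node *node2 = READ_ONCE(*slot);

	if (node == node2)
		return node->value;
	return fallback;
}
```

Nothing in this sketch can show the reordering itself; that depends on the CPU and on a concurrent writer. It only shows why the two forms are indistinguishable to the compiler.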


> The inline asm has no impact on what
> the CPU does. The conditional isn't a barrier for the actual hardware.
> But once the compiler doesn't try to do it, the data dependency on the
> address does end up being an ordering constraint on the hardware too

Exactly. The inline asm would prevent the compiler from doing the transformation though, which would mean that the address dependency appears in the final compiler output.
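For illustration only, one way an inline asm can keep the compiler from learning that the two pointers are equal is to pass copies of both through an empty asm before comparing them. This is a sketch under assumptions: HIDE_VAR is modeled on the kernel's OPTIMIZER_HIDE_VAR(), hidden_ptr_eq is a hypothetical name, and the code is GCC/Clang specific.

```c
#include <stdbool.h>

/* Empty asm that takes var as input and output in the same register. The
 * compiler must treat the output as an unknown value. Modeled on the
 * kernel's OPTIMIZER_HIDE_VAR(); an assumption for this sketch. */
#define HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))

/* Hypothetical helper: compares two pointers without telling the compiler
 * anything about the caller's original variables. */
static inline bool hidden_ptr_eq(const volatile void *a, const volatile void *b)
{
	HIDE_VAR(a);
	HIDE_VAR(b);
	return a == b;
}
```

A caller would write `if (hidden_ptr_eq(node, node2)) t = *node2;`. The compiler only learns that the hidden copies are equal, so it has no basis for replacing node2 with node in the dereference.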

> Just use a barrier.  Or make sure to use the proper ordered memory
> accesses when possible.
>
> Don't use an inline asm for the compare - we
> don't even have anything insane like that as a portable helper, and we
> shouldn't have it.

I'm glad you say that :))

I would also just use a barrier before returning the pointer.

Boqun seems to be unhappy with a barrier though, because it would theoretically also forbid unrelated optimizations. But I have not seen any evidence that there are any unrelated optimizations going on in the first place that would be forbidden by this.
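For comparison, the "proper ordered memory accesses" option could look roughly like the following in userspace C11, where an acquire load stands in for the kernel's smp_load_acquire(). The name recheck_acquire and the struct are assumptions for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
	int value;
};

/* Hypothetical helper: re-reads the slot with acquire semantics and returns
 * the re-read pointer if it still matches node, NULL otherwise. */
struct node *recheck_acquire(struct node *_Atomic *slot, struct node *node)
{
	struct node *node2 = atomic_load_explicit(slot, memory_order_acquire);

	return node == node2 ? node2 : NULL;
}
```

Because the second load is an acquire, later loads through the returned pointer are ordered after it no matter which of the two equal pointers the compiler ends up dereferencing, so the result does not rely on the address dependency surviving compilation.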

Have fun,
  jonas

