On Fri, Sep 27, 2024 at 03:20:40AM +0200, Mathieu Desnoyers wrote:
> On 2024-09-26 18:12, Linus Torvalds wrote:
> > On Thu, 26 Sept 2024 at 08:54, Jonas Oberhauser
> > <jonas.oberhau...@huaweicloud.com> wrote:
> > > 
> > > No, the issue introduced by the compiler optimization (or by your
> > > original patch) is that the CPU can speculatively load from the first
> > > pointer as soon as it has completed the load of that pointer:
> > 
> > You mean the compiler can do it. The inline asm has no impact on what
> > the CPU does. The conditional isn't a barrier for the actual hardware.
> > But once the compiler doesn't try to do it, the data dependency on the
> > address does end up being an ordering constraint on the hardware too
> > (I'm happy to say that I haven't heard from the crazies that want
> > value prediction in a long time).
> > 
> > Just use a barrier.  Or make sure to use the proper ordered memory
> > accesses when possible. Don't use an inline asm for the compare - we
> > don't even have anything insane like that as a portable helper, and we
> > shouldn't have it.
> 
> How does the compiler barrier help in any way here ?
> 
> I am concerned about the compiler SSA GVN (Global Value Numbering)
> optimizations, and I don't think a compiler barrier solves anything.
> (or I'm missing something obvious)

I think you're right, a compiler barrier doesn't help here:

        head = READ_ONCE(p);
        smp_mb();
        WRITE_ONCE(*slot, head);

        ptr = READ_ONCE(p);
        if (ptr != head) {
                ...
        } else {
                barrier();
                return ptr;
        }

Compilers can replace 'ptr' with 'head' because of the equality, and
even putting barrier() here cannot prevent the compiler from rewriting
the else branch into:

        else {
                barrier();
                return head;
        }

because that's just using a different register, unrelated to memory
accesses.
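
For anyone who wants to inspect the generated code, here is a minimal
standalone sketch of the pattern, assuming GCC or Clang with GNU
extensions. The macros are simplified local stand-ins rather than the
kernel's definitions, and hide_ptr(), acquire_plain() and
acquire_hidden() are hypothetical names used only for illustration. The
second function compares copies that went through an empty inline asm,
so the compiler has no equality between 'ptr' and 'head' to exploit.

```c
#include <stddef.h>

/* Simplified local stand-ins for the kernel macros (assumptions). */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define barrier()          __asm__ __volatile__("" ::: "memory")
#define smp_mb()           __atomic_thread_fence(__ATOMIC_SEQ_CST)

struct node { int val; };

/*
 * Hypothetical helper: pass the pointer through an empty asm so the
 * optimizer can no longer relate the result to the original value.
 */
static inline struct node *hide_ptr(struct node *p)
{
	__asm__ ("" : "+r" (p));
	return p;
}

/* The pattern from this mail: the compiler may return 'head' here. */
struct node *acquire_plain(struct node **pp, struct node **slot)
{
	struct node *head = READ_ONCE(*pp);

	smp_mb();
	WRITE_ONCE(*slot, head);

	struct node *ptr = READ_ONCE(*pp);
	if (ptr != head)
		return NULL;	/* pointer changed, caller retries */
	barrier();		/* does not stop the ptr -> head substitution */
	return ptr;
}

/* Same logic, but the comparison only sees the hidden copies. */
struct node *acquire_hidden(struct node **pp, struct node **slot)
{
	struct node *head = READ_ONCE(*pp);

	smp_mb();
	WRITE_ONCE(*slot, head);

	struct node *ptr = READ_ONCE(*pp);
	if (hide_ptr(ptr) != hide_ptr(head))
		return NULL;	/* pointer changed, caller retries */
	return ptr;		/* no known equality lets 'head' replace 'ptr' */
}
```

Building this with something like "gcc -O2 -S" and comparing which
register each function returns shows what a particular compiler does.
The sketch only addresses the compiler side; it makes no claim about
hardware ordering.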

Jonas, am I missing something subtle? Or is this different from what you
proposed?

Regards,
Boqun

> 
> I was concerned about the suggestion from Jonas to use "node2"
> rather than "node" after the equality check as a way to ensure
> the intended register is used to return the pointer, because after
> the SSA GVN optimization pass, AFAIU this won't help in any way.
> I have a set of examples below that show gcc using the result of the
> first load, and clang using the result of the second load (on
> both x86-64 and aarch64). Likewise when a load-acquire is used as the
> second load, which I find odd. Hopefully mixing this optimization
> from gcc with speculation still abides by the memory model.
> 
> Only the asm goto approach ensures that gcc uses the result from
> the second load.
> 
[...]
