On 2024-09-27 18:44, Linus Torvalds wrote:
On Thu, 26 Sept 2024 at 18:38, Boqun Feng <boqun.f...@gmail.com> wrote:

Note that ADDRESS_EQ() only hide first parameter, so this should be 
ADDRESS_EQ(b, a).

Yeah, please stop making things unnecessarily complicated.

Just use a barrier(). Please. Stop these stupid games until you can
show why it matters.

The barrier() is ineffective at fixing the issue.
It does not prevent the compiler CSE from losing the
address dependency:

int fct_2_volatile_barriers(void)
{
    int *a, *b;

    do {
        a = READ_ONCE(p);
        asm volatile ("" : : : "memory");
        b = READ_ONCE(p);
    } while (a != b);
    asm volatile ("" : : : "memory");  <----- where you would like your barrier
    return *b;
}

With gcc 14.2 (arm64):

fct_2_volatile_barriers:
        adrp    x0, .LANCHOR0
        add     x0, x0, :lo12:.LANCHOR0
.L2:
        ldr     x1, [x0]    <------ x1 populated by first load.
        ldr     x2, [x0]
        cmp     x1, x2
        bne     .L2
        ldr     w0, [x1]    <------ x1 is used for access which should depend 
on b.
        ret

On weakly-ordered architectures, this lets CPU speculation
use the result from the first load to speculate
"ldr w0, [x1]" before "ldr x2, [x0]".
Based on the RCU documentation, the control dependency
does not prevent the CPU from speculating loads.

There are a few ways to fix this:

- Compile everything with -fno-cse-follow-jumps.
- Make the compiler unaware of the relationship between the
  address equality and address-dependent use of b. This can
  be done either using ADDRESS_EQ() or asm goto.

I prefer ADDRESS_EQ() because it is arch-agnostic. I don't
care that it adds one more register movement instruction.

And by "why it matters" I mean "major difference in code generation",
not some "it uses one more register and has to spill" kind of small
detail.

Why it matters is because gcc generates code that does not
preserve address dependency of the second READ_ONCE().

At this point, I'm not even convinced the whole hazard pointer
approach makes sense. And you're not helping by making it more
complicated than it needs to be.

I'm preparing a small series that aims to show how a minimal
hazard pointer implementation can help improve common scenarios:

- Guarantee object existence on pointer dereference to increment
  a reference count:
  - replace locking used for that purpose in drivers (e.g. usb),
  - replace RCU + inc_not_zero pattern,
- rtmutex: I suspect we can improve situations where locks need
  to be taken in reverse dependency chain order by guaranteeing
  existence of first and second locks in traversal order,
  allowing them to be locked in the correct order (which is
  reverse from traversal order) rather than try-lock+retry on
  nested lock. This can be done with hazard pointers without
  requiring object reclaim to be delayed by an RCU grace period.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com


Reply via email to