On Tue, May 27, 2025 at 5:01 AM Akihiko Odaki <akihiko.od...@daynix.com> wrote:
> I'd like to submit it with "[PATCH v4 05/11] qemu-thread: Avoid futex
> abstraction for non-Linux" because it aligns the implementations of
> Linux and non-Linux versions to rely on a store-release of EV_SET in
> qemu_event_set().

Ok, I see what you mean - you would like the xchg to be an
xchg_release essentially.

There is actually one case in which skipping the xchg has an effect.
If you have the following:

- one side does

  s.foo = 1;
  qemu_event_set(&s.ev);

- the other side never reaches the qemu_event_reset(&s.ev)

then skipping the xchg might allow the cacheline for ev to remain
shared. This is unlikely to *make* a difference, though it does
*exist* as a difference, so I will review the patch, but I really
prefer to place it last.  It's safer to take a known-working
algorithm, apply it to all OSes (or at least Linux and Windows), and
only then refine it. It also makes my queue shorter.
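
To make the difference concrete, here is a rough sketch of the two
shapes of the set operation using plain C11 atomics. It is only an
illustration: SketchEvent and the sketch_* functions are made-up names,
not the qemu-thread code, the wake-up is reduced to a boolean return
value, and the exact barriers are my assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state values for the sketch. */
enum { EV_SET = 0, EV_FREE = 1, EV_BUSY = -1 };

typedef struct {
    atomic_int value;
} SketchEvent;

static void sketch_event_init(SketchEvent *ev, int initial)
{
    assert(ev != NULL);
    atomic_init(&ev->value, initial);
}

/*
 * Variant A: read first and skip the xchg when the event is already
 * set.  In that case nothing is written, so the cacheline holding ev
 * can stay in the shared state.  Returns true if a waiter would have
 * to be woken.
 */
static bool sketch_event_set_checked(SketchEvent *ev)
{
    /* Full barrier so the earlier stores are ordered before the read. */
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&ev->value, memory_order_relaxed) != EV_SET) {
        int old = atomic_exchange(&ev->value, EV_SET);
        return old == EV_BUSY;
    }
    return false;
}

/*
 * Variant B: always do the xchg, with release semantics only.  Every
 * call writes to ev, even if it was already EV_SET.
 */
static bool sketch_event_set_unconditional(SketchEvent *ev)
{
    int old = atomic_exchange_explicit(&ev->value, EV_SET,
                                       memory_order_release);
    return old == EV_BUSY;
}
```

With variant A, a second call on an already-set event returns false
without touching the value; with variant B the same call still performs
a write, which is the only observable difference in the scenario above.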

> > Do you think it's incorrect?  I'll wait for your answer before sending
> > out the actual pull request.
>
> It's correct, but I don't think it's worthwhile.
>
> This code path is only used by platforms without a futex wrapper.
> Currently we only have one for Linux and this series adds one for
> Windows, but FreeBSD[1] and OpenBSD[2] have their own futex. macOS also
> gained one with version 14.4.[3] We can add wrappers for them too if
> their performance really matters.
> So the only platforms listed in docs/about/build-platforms.rst that
> require the non-futex version are macOS older than 14.4 and NetBSD.
> macOS older than 14.4 will not be supported after June 5 since macOS 14
> was released June 5, 2023 and docs/about/build-platforms.rst says:
>
> There are too few relevant platforms to justify the effort potentially
> needed for quality assurance.

Ok, nice.  So it's really just NetBSD in the end.

> Moreover, qemu_event_reset() is often followed by qemu_event_wait() or
> other barriers so probably relaxing ordering here does not affect the
> overall ordering constraint (and performance) much.

Understood.  For me it wasn't really about performance, but more about
understanding exactly which reorderings can happen and what
synchronizes with what. Load-acquire/store-release are simpler to
understand in that respect, especially since this use of condvar,
without the mutex in reset, is different from everything else that
I've ever seen.
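
As a minimal illustration of the store-release/load-acquire pairing I
have in mind (a sketch with C11 atomics only; struct sketch_state and
the sketch_* helpers are hypothetical names, and there is no blocking
or wake-up here), the two sides would look roughly like this, with each
function meant to run on a different thread:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values for the sketch. */
enum { MP_SET = 0, MP_FREE = 1 };

struct sketch_state {
    int foo;        /* plain data, published through ev */
    atomic_int ev;  /* stand-in for the event value */
};

static void sketch_init(struct sketch_state *s)
{
    assert(s != NULL);
    s->foo = 0;
    atomic_init(&s->ev, MP_FREE);
}

/* Setting side: the equivalent of "s.foo = 1; qemu_event_set(&s.ev);". */
static void sketch_publish(struct sketch_state *s)
{
    s->foo = 1;
    /* Store-release: the write to foo cannot move after this store. */
    atomic_store_explicit(&s->ev, MP_SET, memory_order_release);
}

/* Waiting side, reduced to a non-blocking poll. */
static bool sketch_try_consume(struct sketch_state *s, int *foo_out)
{
    /* Load-acquire: if it observes MP_SET, it synchronizes with the
     * store-release above, so the read of foo below sees 1. */
    if (atomic_load_explicit(&s->ev, memory_order_acquire) != MP_SET) {
        return false;
    }
    *foo_out = s->foo;
    return true;
}
```

The point is that there is exactly one edge to reason about: the
acquire load that reads MP_SET synchronizes with the release store that
wrote it.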

Paolo

