On Tue, May 27, 2025 at 5:01 AM Akihiko Odaki <akihiko.od...@daynix.com> wrote:
> I'd like to submit it with "[PATCH v4 05/11] qemu-thread: Avoid futex
> abstraction for non-Linux" because it aligns the implementations of
> Linux and non-Linux versions to rely on a store-release of EV_SET in
> qemu_event_set().

Ok, I see what you mean - you would like the xchg to be an xchg_release
essentially.

There is actually one case in which skipping the xchg has an effect.
If you have the following:

- one side does s.foo = 1; qemu_event_set(&s.ev);

- the other side never reaches the qemu_event_reset(&s.ev)

then skipping the xchg might allow the cacheline for ev to remain
shared. This is unlikely to *make* a difference, though it does *exist*
as a difference, so I will review the patch, but I really prefer to
place it last. It's safer to take a known-working algorithm, apply it
to all OSes (or at least Linux and Windows), and only then you refine
it. It also makes my queue shorter.

> > Do you think it's incorrect? I'll wait for your answer before sending
> > out the actual pull request.
>
> It's correct, but I don't think it's worthwhile.
>
> This code path is only used by platforms without a futex wrapper.
> Currently we only have one for Linux and this series adds one for
> Windows, but FreeBSD[1] and OpenBSD[2] have their own futex. macOS also
> gained one with version 14.4.[3] We can add wrappers for them too if
> their performance really matters.
> So the only platforms listed in docs/about/build-platforms.rst that
> require the non-futex version are macOS older than 14.4 and NetBSD.
> macOS older than 14.4 will not be supported after June 5 since macOS 14
> was released June 5, 2023 and docs/about/build-platforms.rst says:
>
> There are too few relevant platforms to justify the effort potentially
> needed for quality assurance.

Ok, nice. So it's really just NetBSD in the end.

> Moreover, qemu_event_reset() is often followed by qemu_event_wait() or
> other barriers so probably relaxing ordering here does not affect the
> overall ordering constraint (and performance) much.

Understood. For me it wasn't really about performance, but more about
understanding exactly which reorderings can happen and what
synchronizes with what. Load-acquire/store-release are simpler to
understand in that respect, especially since this use of condvar,
without the mutex in reset, is different from everything else that
I've ever seen.

Paolo