[PATCH] libstdc++: Future-proof C++20 atomic wait/notify

Jonathan Wakely Sat, 15 Nov 2025 16:55:16 -0800

This will allow us to extend atomic waiting functions to support a
possible future 64-bit version of futex, as well as supporting
futex-like wait/wake primitives on other targets (e.g. macOS has
os_sync_wait_on_address and FreeBSD has _umtx_op).


Before this change, the decision of whether to do a proxy wait or to
wait on the atomic variable itself was made in the header at
compile-time, which makes it an ABI property that would not have been
possible to change later.  That would have meant that
std::atomic<uint64_t> would always have to do a proxy wait even if Linux
gains support for 64-bit futex2(2) calls at some point in the future.
The disadvantage of proxy waits is that several distinct atomic objects
can share the same proxy state, leading to contention between threads
even when they are not waiting on the same atomic object, similar to
false sharing. It also result in spurious wake-ups because doing a
notify on an atomic object that uses a proxy wait will wake all waiters
sharing the proxy.

For types that are known to definitely not need a proxy wait (e.g. int
on Linux) the header can still choose a more efficient path at
compile-time. But for other types, the decision of whether to do a proxy
wait is deferred to runtime, inside the library internals. This will
make it possible for future versions of libstdc++.so to extend the set
of types which don't need to use proxy waits, without ABI changes.

The way the change works is to stop using the __proxy_wait flag that was
set by the inline code in the headers. Instead the __wait_args struct
has an extra pointer member which the library internals populate with
either the address of the atomic object or the _M_ver counter in the
proxy state. There is also a new _M_obj_size member which stores the
size of the atomic object, so that the library can decide whether a
proxy is needed. So for example if linux gains 64-bit futex support then
the library can decide not to use a proxy when _M_obj_size == 8.
Finally, the _M_old member of the __wait_args struct is changed to
uint64_t so that it has room to store 64-bit values, not just whatever
size the __platform_wait_t type is (which is a 32-bit int on Linux).
Similarly, the _M_val member of __wait_result_type changes to uint64_t
too.

libstdc++-v3/ChangeLog:

        * config/abi/pre/gnu.ver: Adjust exports.
        * include/bits/atomic_timed_wait.h (_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT):
        Do not define this macro.
        (__atomic_wait_address_until_v, __atomic_wait_address_for_v):
        Guard assertions with #ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT.
        * include/bits/atomic_wait.h (__platform_wait_uses_type):
        Different separately for platforms with and without platform
        wait.
        (_GLIBCXX_HAVE_PLATFORM_WAIT): Do not define this macro.
        (_GLIBCXX_UNKNOWN_PLATFORM_WAIT): Define new macro.
        (__wait_value_type): New typedef.
        (__wait_result_type): Change _M_val to __wait_value_type.
        (__wait_args_base::_M_old): Change to __wait_args_base.
        (__wait_args_base::_M_obg, __wait_args_base::_M_obj_size): New
        data members.
        (__wait_args::__wait_args): Set _M_obj and _M_obj_size on
        construction.
        (__wait_args::_M_setup_wait): Change void* parameter to deduced
        type. Use _S_bit_cast instead of __builtin_bit_cast.
        (__wait_args::_M_load_proxy_wait_val): Remove function, replace
        with ...
        (__wait_args::_M_setup_wait_impl): New function.
        (__wait_args::_S_bit_cast): Wrapper for __builtin_bit_cast which
        also supports conversion from 32-bit values.
        (__wait_args::_S_flags_for): Do not set __proxy_wait flag.
        (__atomic_wait_address_v): Guard assertions with #ifdef
        _GLIBCXX_UNKNOWN_PLATFORM_WAIT.
        * src/c++20/atomic.cc (_GLIBCXX_HAVE_PLATFORM_WAIT): Define here
        instead of in header. Check _GLIBCXX_HAVE_PLATFORM_WAIT instead
        of _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT.
        (__spin_impl): Adjust for 64-bit __wait_args_base::_M_old.
        (use_proxy_wait): New function.
        (__wait_args::_M_load_proxy_wait_val): Replace with ...
        (__wait_args::_M_setup_wait_impl): New function. Call
        use_proxy_wait to decide at runtime whether to wait on the
        pointer directly instead of using a proxy. If a proxy is needed,
        set _M_obj to point to its _M_ver member. Adjust for change to
        type of _M_old.
        (__wait_impl): Wait on _M_obj unconditionally.
        (__notify_impl): Call use_proxy_wait to decide whether to notify
        on the address parameter or a proxy
        (__spin_until_impl): Adjust for change to type of _M_val.
        (__wait_until_impl): Wait on _M_obj unconditionally.
---

Tested x86_64-linux, powerpc64le-linux, sparc-solaris.

I think this is an imporant change which I unfortunately didn't think of
until recently.

This changes the exports from the shared library, but we're still in
stage 1 so I think that should be allowed (albeit unfortunate). Nobody
should be expecting GCC 16 to be stable yet.

The __proxy_wait enumerator is now unused and could be removed. The
__abi_version enumerator could also be bumped to indicate the
incompatibility with earlier snapshots of GCC 16, but I don't think that
is needed. We could in theory keep the old symbol export
(__wait_args::_M_load_proxy_wait) and make it trap/abort if called, but
I'd prefer to just remove it and cause dynamic linker errors instead.

There's a TODO in the header about which types should be allowed to take
the optimized paths (see the __waitable concept). For types where that's
true, if the size matches a futex then we'll use a futex, even if it's
actually an enum or floating-point type (or pointer on 32-bit targets).
I'm not sure if that's safe.


 libstdc++-v3/config/abi/pre/gnu.ver           |   3 +-
 libstdc++-v3/include/bits/atomic_timed_wait.h |  12 +-
 libstdc++-v3/include/bits/atomic_wait.h       | 109 +++++++++-----
 libstdc++-v3/src/c++20/atomic.cc              | 140 +++++++++++-------
 4 files changed, 166 insertions(+), 98 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 2e48241d51f9..3c2bd4921730 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -2553,7 +2553,8 @@ GLIBCXX_3.4.35 {
     _ZNSt8__detail11__wait_implEPKvRNS_16__wait_args_baseE;
     _ZNSt8__detail13__notify_implEPKvbRKNS_16__wait_args_baseE;
     
_ZNSt8__detail17__wait_until_implEPKvRNS_16__wait_args_baseERKNSt6chrono8durationI[lx]St5ratioIL[lx]1EL[lx]1000000000EEEE;
-    _ZNSt8__detail11__wait_args22_M_load_proxy_wait_valEPKv;
+    _ZNSt8__detail11__wait_args18_M_setup_wait_implEPKv;
+    _ZNSt8__detail11__wait_args20_M_setup_notify_implEPKv;
 
     # std::chrono::gps_clock::now, tai_clock::now
     _ZNSt6chrono9gps_clock3nowEv;
diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h 
b/libstdc++-v3/include/bits/atomic_timed_wait.h
index 30f7ff616840..918a267d10eb 100644
--- a/libstdc++-v3/include/bits/atomic_timed_wait.h
+++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
@@ -75,14 +75,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
          return chrono::ceil<__w_dur>(__atime);
       }
 
-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-#define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
-#else
-// define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement __platform_wait_until
-// if there is a more efficient primitive supported by the platform
-// (e.g. __ulock_wait) which is better than pthread_cond_clockwait.
-#endif // ! HAVE_LINUX_FUTEX
-
     __wait_result_type
     __wait_until_impl(const void* __addr, __wait_args_base& __args,
                      const __wait_clock_t::duration& __atime);
@@ -156,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
                                  const chrono::time_point<_Clock, _Dur>& 
__atime,
                                  bool __bare_wait = false) noexcept
     {
-#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
       __glibcxx_assert(false); // This function can't be used for proxy wait.
 #endif
       __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
@@ -208,7 +200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
                                const chrono::duration<_Rep, _Period>& __rtime,
                                bool __bare_wait = false) noexcept
     {
-#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
       __glibcxx_assert(false); // This function can't be used for proxy wait.
 #endif
       __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
diff --git a/libstdc++-v3/include/bits/atomic_wait.h 
b/libstdc++-v3/include/bits/atomic_wait.h
index 95151479c120..49369419d6a6 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -45,35 +45,34 @@
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
+#if defined _GLIBCXX_HAVE_LINUX_FUTEX
   namespace __detail
   {
-#ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
     using __platform_wait_t = int;
     inline constexpr size_t __platform_wait_alignment = 4;
+  }
+  template<typename _Tp>
+    inline constexpr bool __platform_wait_uses_type
+      = is_scalar_v<_Tp> && sizeof(_Tp) == sizeof(int) && alignof(_Tp) >= 4;
 #else
+# define _GLIBCXX_UNKNOWN_PLATFORM_WAIT 1
 // define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
 // and __platform_notify() if there is a more efficient primitive supported
 // by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
 // a mutex/condvar based wait.
+  namespace __detail
+  {
 # if ATOMIC_LONG_LOCK_FREE == 2
     using __platform_wait_t = unsigned long;
 # else
     using __platform_wait_t = unsigned int;
 # endif
     inline constexpr size_t __platform_wait_alignment
-      = __alignof__(__platform_wait_t);
-#endif
+      = sizeof(__platform_wait_t) < __alignof__(__platform_wait_t)
+         ? __alignof__(__platform_wait_t) : sizeof(__platform_wait_t);
   } // namespace __detail
-
-  template<typename _Tp>
-    inline constexpr bool __platform_wait_uses_type
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-      = is_scalar_v<_Tp>
-       && ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t))
-       && (alignof(_Tp) >= __detail::__platform_wait_alignment));
-#else
-      = false;
+  template<typename>
+    inline constexpr bool __platform_wait_uses_type = false;
 #endif
 
   namespace __detail
@@ -105,10 +104,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0;
       }
 
-    // lightweight std::optional<__platform_wait_t>
+    // TODO: this needs to be false for types with padding, e.g. __int20.
+    // TODO: should this be true only for integral, enum, and pointer types?
+    template<typename _Tp>
+      concept __waitable
+       = is_scalar_v<_Tp> && (sizeof(_Tp) <= sizeof(__UINT64_TYPE__));
+
+    // Storage for up to 64 bits of value, should be considered opaque bits.
+    using __wait_value_type = __UINT64_TYPE__;
+
+    // lightweight std::optional<__wait_value_type>
     struct __wait_result_type
     {
-      __platform_wait_t _M_val;
+      __wait_value_type _M_val;
       unsigned char _M_has_val : 1; // _M_val value was loaded before return.
       unsigned char _M_timeout : 1; // Waiting function ended with timeout.
       unsigned char _M_unused : 6;  // padding
@@ -143,8 +151,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       __wait_flags _M_flags;
       int _M_order = __ATOMIC_ACQUIRE;
-      __platform_wait_t _M_old = 0;
+      __wait_value_type _M_old{};
       void* _M_wait_state = nullptr;
+      const void* _M_obj = nullptr;  // The address of the object to wait on.
+      unsigned char _M_obj_size = 0; // The size of that object.
 
       // Test whether _M_flags & __flags is non-zero.
       bool
@@ -162,36 +172,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
        explicit
        __wait_args(const _Tp* __addr, bool __bare_wait = false) noexcept
        : __wait_args_base{ _S_flags_for(__addr, __bare_wait) }
-       { }
+       {
+         _M_obj = __addr; // Might be replaced by _M_setup_wait
+         if constexpr (__waitable<_Tp>)
+           // __wait_impl might be able to wait directly on __addr
+           // instead of using a proxy, depending on its size.
+           _M_obj_size = sizeof(_Tp);
+       }
 
       __wait_args(const __platform_wait_t* __addr, __platform_wait_t __old,
                  int __order, bool __bare_wait = false) noexcept
-      : __wait_args_base{ _S_flags_for(__addr, __bare_wait), __order, __old }
-      { }
+      : __wait_args(__addr, __bare_wait)
+      {
+       _M_order = __order;
+       _M_old = __old;
+      }
 
       __wait_args(const __wait_args&) noexcept = default;
       __wait_args& operator=(const __wait_args&) noexcept = default;
 
-      template<typename _ValFn,
-              typename _Tp = decay_t<decltype(std::declval<_ValFn&>()())>>
+      template<typename _Tp, typename _ValFn>
        _Tp
-       _M_setup_wait(const void* __addr, _ValFn __vfn,
+       _M_setup_wait(const _Tp* __addr, _ValFn __vfn,
                      __wait_result_type __res = {})
        {
+         static_assert(is_same_v<_Tp, decay_t<decltype(__vfn())>>);
+
          if constexpr (__platform_wait_uses_type<_Tp>)
            {
-             // If the wait is not proxied, the value we check when waiting
-             // is the value of the atomic variable itself.
+             // If we know for certain that this type can be waited on
+             // efficiently using something like a futex syscall,
+             // then we can avoid the overhead of _M_setup_wait_impl
+             // and just load the value and store it into _M_old.
 
-             if (__res._M_has_val) // The previous wait loaded a recent value.
+             if (__res._M_has_val) // A previous wait loaded a recent value.
                {
                  _M_old = __res._M_val;
-                 return __builtin_bit_cast(_Tp, __res._M_val);
+                 return _S_bit_cast<_Tp>(_M_old);
                }
              else // Load the value from __vfn
                {
-                 _Tp __val = __vfn();
-                 _M_old = __builtin_bit_cast(__platform_wait_t, __val);
+                 auto __val = __vfn();
+                 _M_old = _S_bit_cast<__wait_value_type>(__val);
                  return __val;
                }
            }
@@ -199,16 +221,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
            {
              if (__res._M_has_val) // The previous wait loaded a recent value.
                _M_old = __res._M_val;
-             else // Load _M_ver from the proxy (must happen before __vfn()).
-               _M_load_proxy_wait_val(__addr);
+             else // Let the library decide how to setup the wait.
+               {
+                 // Set _M_obj to the address to be waited on (either __addr
+                 // or a proxy) and load its current value into _M_old.
+                 _M_setup_wait_impl(__addr);
+               }
              return __vfn();
            }
        }
 
     private:
-      // Populates _M_wait_state and _M_old from the proxy for __addr.
+      template<typename _To, typename _From>
+       [[__gnu__::__always_inline__]]
+       static _To
+       _S_bit_cast(_From __from)
+       {
+         if constexpr (sizeof(_To) == sizeof(_From))
+           return __builtin_bit_cast(_To, __from);
+         else if constexpr (sizeof(_From) == sizeof(__UINT32_TYPE__))
+           return __builtin_bit_cast(__UINT32_TYPE__, __from);
+         else
+           __builtin_unreachable();
+       }
+
+      // Populates _M_wait_state and _M_old appropriately for __addr.
       void
-      _M_load_proxy_wait_val(const void* __addr);
+      _M_setup_wait_impl(const void* __addr);
 
       template<typename _Tp>
        static constexpr __wait_flags
@@ -218,8 +257,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
          __wait_flags __res = __abi_version | __do_spin;
          if (!__bare_wait)
            __res |= __track_contention;
-         if constexpr (!__platform_wait_uses_type<_Tp>)
-           __res |= __proxy_wait;
+         // if constexpr (!__platform_wait_uses_type<_Tp>)
+         // __res |= __proxy_wait;
          return __res;
        }
     };
@@ -255,7 +294,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
                          __detail::__platform_wait_t __old,
                          int __order, bool __bare_wait = false)
   {
-#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
+#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT
     __glibcxx_assert(false); // This function can't be used for proxy wait.
 #endif
     __detail::__wait_args __args{ __addr, __old, __order, __bare_wait };
diff --git a/libstdc++-v3/src/c++20/atomic.cc b/libstdc++-v3/src/c++20/atomic.cc
index e280045b619d..d5fbd10ed11f 100644
--- a/libstdc++-v3/src/c++20/atomic.cc
+++ b/libstdc++-v3/src/c++20/atomic.cc
@@ -38,14 +38,7 @@
 # include <syscall.h>
 # include <bits/functexcept.h>
 # include <sys/time.h>
-#endif
-
-#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
-# ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
-// __waitable_state assumes that we consistently use the same implementation
-// (i.e. futex vs mutex+condvar) for timed and untimed waiting.
-#  error "This configuration is not currently supported"
-# endif
+# define _GLIBCXX_HAVE_PLATFORM_WAIT 1
 #endif
 
 #pragma GCC diagnostic ignored "-Wmissing-field-initializers"
@@ -107,7 +100,7 @@ namespace
     // Count of threads blocked waiting on this state.
     alignas(_S_align) __platform_wait_t _M_waiters = 0;
 
-#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
     mutex _M_mtx;
 
     // This type meets the Cpp17BasicLockable requirements.
@@ -123,7 +116,7 @@ namespace
     // use this for waiting and notifying functions instead.
     alignas(_S_align) __platform_wait_t _M_ver = 0;
 
-#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
     __condvar _M_cv;
 #endif
 
@@ -216,17 +209,19 @@ namespace
   __spin_impl(const __platform_wait_t* __addr, const __wait_args_base& __args)
   {
     __platform_wait_t __val{};
+    __wait_value_type wval;
     for (auto __i = 0; __i < __atomic_spin_count; ++__i)
       {
        __atomic_load(__addr, &__val, __args._M_order);
-       if (__val != __args._M_old)
-         return { ._M_val = __val, ._M_has_val = true, ._M_timeout = false };
+       wval = static_cast<__wait_value_type>(__val);
+       if (wval != __args._M_old)
+         return { ._M_val = wval, ._M_has_val = true, ._M_timeout = false };
        if (__i < __atomic_spin_count_relax)
          __thread_relax();
        else
          __thread_yield();
       }
-    return { ._M_val = __val, ._M_has_val = true, ._M_timeout = true };
+    return { ._M_val = wval, ._M_has_val = true, ._M_timeout = true };
   }
 
   inline __waitable_state*
@@ -237,32 +232,58 @@ namespace
     return static_cast<__waitable_state*>(args._M_wait_state);
   }
 
+  [[gnu::always_inline]]
+  inline bool
+  use_proxy_wait(const __wait_args_base& args)
+  {
+    if constexpr (__platform_wait_uses_type<uint32_t>)
+      if (args._M_obj_size == sizeof(uint32_t))
+       return false;
+
+    if constexpr (__platform_wait_uses_type<uint64_t>)
+      if (args._M_obj_size == sizeof(uint64_t))
+       return false;
+
+    // Use proxy wait for everything else:
+    return true;
+  }
+
 } // namespace
 
-// Called for a proxy wait
+// Set _M_wait_state if using proxy wait, or caller wants contention tracking.
+// Set _M_obj to &_M_wait_state->_M_ver if using proxy wait.
+// Load the current value from _M_obj and store in _M_val.
 void
-__wait_args::_M_load_proxy_wait_val(const void* addr)
+__wait_args::_M_setup_wait_impl(const void* addr)
 {
-  // __glibcxx_assert( *this & __wait_flags::__proxy_wait );
+  if (!use_proxy_wait(*this))
+    {
+      // We can wait on this address directly.
+      __glibcxx_assert(_M_obj == addr);
 
-  // We always need a waitable state for proxy waits.
+      int val;
+      __atomic_load(static_cast<const int*>(addr), &val, __ATOMIC_ACQUIRE);
+      _M_old = val;
+
+      return;
+    }
+
+  // This will be a proxy wait, so get a waitable state.
   auto state = set_wait_state(addr, *this);
 
+  // The address we will wait on is the version count of the waitable state:
+  _M_obj = &state->_M_ver;
+
   // Read the value of the _M_ver counter.
-  __atomic_load(&state->_M_ver, &_M_old, __ATOMIC_ACQUIRE);
+  __platform_wait_t __old;
+  __atomic_load(&state->_M_ver, &__old, __ATOMIC_ACQUIRE);
+  _M_old = __old;
 }
 
 __wait_result_type
-__wait_impl(const void* __addr, __wait_args_base& __args)
+__wait_impl([[maybe_unused]] const void* __addr, __wait_args_base& __args)
 {
-  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
-
-  const __platform_wait_t* __wait_addr;
-
-  if (__args & __wait_flags::__proxy_wait)
-    __wait_addr = &__state->_M_ver;
-  else
-    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
+  auto* __wait_addr = static_cast<const __platform_wait_t*>(__args._M_obj);
 
   if (__args & __wait_flags::__do_spin)
     {
@@ -286,6 +307,7 @@ __wait_impl(const void* __addr, __wait_args_base& __args)
   __atomic_load(__wait_addr, &__val, __args._M_order);
   if (__val == __args._M_old)
     {
+      auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
       __state->_M_cv.wait(__state->_M_mtx);
       return { ._M_val = __val, ._M_has_val = false, ._M_timeout = false };
     }
@@ -294,24 +316,40 @@ __wait_impl(const void* __addr, __wait_args_base& __args)
 }
 
 void
-__notify_impl(const void* __addr, [[maybe_unused]] bool __all,
+__notify_impl([[maybe_unused]] const void* __addr, [[maybe_unused]] bool __all,
              const __wait_args_base& __args)
 {
-  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
-  if (!__state)
-    __state = &__waitable_state::_S_state_for(__addr);
+  const bool __track_contention = __args & __wait_flags::__track_contention;
+  const bool proxy_wait = use_proxy_wait(__args);
 
-  [[maybe_unused]] const __platform_wait_t* __wait_addr;
+  [[maybe_unused]] auto* __wait_addr
+    = static_cast<const __platform_wait_t*>(__addr);
+
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
+  // Check whether it would be a non-proxy wait for this object.
+  // This condition must match the one in _M_setup_wait_impl to ensure that
+  // the address used for the notify matches the one used for the wait.
+  if (!proxy_wait)
+    {
+      if (__track_contention)
+       if (!__waitable_state::_S_state_for(__addr)._M_waiting())
+         return;
+
+      __platform_notify(__wait_addr, __all);
+      return;
+    }
+#endif
+
+  // Either a proxy wait or we don't have platform wait/wake primitives.
+
+  auto __state = &__waitable_state::_S_state_for(__addr);
 
   // Lock mutex so that proxied waiters cannot race with incrementing _M_ver
   // and see the old value, then sleep after the increment and notify_all().
   lock_guard __l{ *__state };
 
-  if (__args & __wait_flags::__proxy_wait)
+  if (proxy_wait)
     {
-      // Waiting for *__addr is actually done on the proxy's _M_ver.
-      __wait_addr = &__state->_M_ver;
-
       // Increment _M_ver so that waiting threads see something changed.
       // This has to be atomic because the load in _M_load_proxy_wait_val
       // is done without the mutex locked.
@@ -322,11 +360,11 @@ __notify_impl(const void* __addr, [[maybe_unused]] bool 
__all,
       // they can re-evaluate their conditions to see if they should
       // stop waiting or should wait again.
       __all = true;
-    }
-  else // Use the atomic variable's own address.
-    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
 
-  if (__args & __wait_flags::__track_contention)
+      __wait_addr = &__state->_M_ver;
+    }
+
+  if (__track_contention)
     {
       if (!__state->_M_waiting())
        return;
@@ -366,7 +404,7 @@ __platform_wait_until(const __platform_wait_t* __addr,
 }
 #endif // HAVE_LINUX_FUTEX
 
-#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT
 bool
 __cond_wait_until(__condvar& __cv, mutex& __mx,
                  const __wait_clock_t::time_point& __atime)
@@ -381,7 +419,7 @@ __cond_wait_until(__condvar& __cv, mutex& __mx,
     __cv.wait_until(__mx, __ts);
   return __wait_clock_t::now() < __atime;
 }
-#endif // ! HAVE_PLATFORM_TIMED_WAIT
+#endif // ! HAVE_PLATFORM_WAIT
 
 // Unlike __spin_impl, does not always return _M_has_val == true.
 // If the deadline has already passed then no fresh value is loaded.
@@ -414,7 +452,9 @@ __spin_until_impl(const __platform_wait_t* __addr,
            return __res;
        }
 
-      __atomic_load(__addr, &__res._M_val, __args._M_order);
+      __platform_wait_t val;
+      __atomic_load(__addr, &val, __args._M_order);
+      __res._M_val = val;
       __res._M_has_val = true;
       if (__res._M_val != __args._M_old)
        {
@@ -428,16 +468,11 @@ __spin_until_impl(const __platform_wait_t* __addr,
 } // namespace
 
 __wait_result_type
-__wait_until_impl(const void* __addr, __wait_args_base& __args,
+__wait_until_impl([[maybe_unused]] const void* __addr, __wait_args_base& 
__args,
                  const __wait_clock_t::duration& __time)
 {
   const __wait_clock_t::time_point __atime(__time);
-  auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
-  const __platform_wait_t* __wait_addr;
-  if (__args & __wait_flags::__proxy_wait)
-    __wait_addr = &__state->_M_ver;
-  else
-    __wait_addr = static_cast<const __platform_wait_t*>(__addr);
+  auto* __wait_addr = static_cast<const __platform_wait_t*>(__args._M_obj);
 
   if (__args & __wait_flags::__do_spin)
     {
@@ -448,9 +483,9 @@ __wait_until_impl(const void* __addr, __wait_args_base& 
__args,
        return __res;
     }
 
-#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT
+#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
   if (__args & __wait_flags::__track_contention)
-    set_wait_state(__addr, __args);
+    set_wait_state(__addr, __args); // scoped_wait needs a __waitable_state
   scoped_wait s(__args);
   bool timeout = !__platform_wait_until(__wait_addr, __args._M_old, __atime);
   return { ._M_val = __args._M_old, ._M_has_val = false, ._M_timeout = timeout 
};
@@ -460,6 +495,7 @@ __wait_until_impl(const void* __addr, __wait_args_base& 
__args,
   __atomic_load(__wait_addr, &__val, __args._M_order);
   if (__val == __args._M_old)
     {
+      auto __state = static_cast<__waitable_state*>(__args._M_wait_state);
       bool timeout = !__cond_wait_until(__state->_M_cv, __state->_M_mtx, 
__atime);
       return { ._M_val = __val, ._M_has_val = false, ._M_timeout = timeout };
     }
-- 
2.51.1

[PATCH] libstdc++: Future-proof C++20 atomic wait/notify

Reply via email to