On Fri, 9 Jan 2026 at 12:44, Jonathan Wakely <[email protected]> wrote:
>
> A failed assertion was observed with std::atomic<bool>::wait when the
> loop in __atomic_wait_address is entered and calls _M_setup_wait a
> second time, after waking from __wait_impl. When the first call to
> _M_setup_wait makes a call to _M_setup_proxy_wait that function decides
> that a proxy wait is needed for an object of type bool, and it updates
> the _M_obj and _M_obj_size members to refer to the futex in the proxy
> state, instead of referring to the bool object itself. The next time
> _M_setup_wait is called it calls _M_setup_proxy_wait again but now it
> sees _M_obj_size == sizeof(futex) and so this time decides a proxy wait
> is *not* needed, and then fails the __glibcxx_assert(_M_obj == addr)
> check.
>
> The problem is that _M_setup_proxy_wait wasn't correctly handling the
> case where it's called a second time, after the decision to use a proxy
> wait has already been made. That can be fixed in _M_setup_proxy_wait by
> checking if _M_obj != addr, which implies that a proxy wait has already
> been set up by a previous call. In that case, _M_setup_proxy_wait should
> only update _M_old to the latest value of the proxy _M_ver.
>
> This change means that _M_setup_proxy_wait is safe to call repeatedly
> for a proxy wait, and will only update _M_wait_state, _M_obj, and
> _M_obj_size on the first call. On the second and subsequent calls, those
> variables are already correctly set for the proxy wait so don't need to
> be set again.
>
> For non-proxy waits, calling _M_setup_proxy_wait more than once is safe,
> but pessimizes performance. The caller shouldn't make a second call to
> _M_setup_proxy_wait because we don't need to check again if a proxy wait
> should be used (the answer won't change) and we don't need to load a
> value from the proxy _M_ver.
>
> However, it was difficult to detect the case of a non-proxy wait,
> because _M_setup_wait doesn't know if it's being called the first time
> (when _M_setup_proxy_wait is called to make the initial decision) or a
> subsequent time (in which case _M_obj == addr implies a non-proxy wait
> was already decided on). As a result, _M_setup_proxy_wait was being used
> every time to see if it's a proxy wait. We can resolve this by splitting
> the _M_setup_wait function into _M_setup_wait and _M_on_wake, where the
> former is only called once to do the initial setup and the latter is
> called after __wait_impl returns, to prepare to check the predicate and
> possibly wait again. The new _M_on_wake function can avoid unnecessary
> calls to _M_setup_proxy_wait by checking _M_obj == addr to identify a
> non-proxy wait.
>
> The three callers of _M_setup_wait are updated to use _M_on_wake instead
> of _M_setup_wait after waking from a waiting function. This change
> revealed a latent performance bug in __atomic_wait_address_for which was
> not passing __res to _M_setup_wait, so a new value was always loaded
> even when __res._M_has_val was true. By splitting _M_on_wake out of
> _M_setup_wait this problem became more obvious, because we no longer
> have _M_setup_wait doing two different jobs, depending on whether it was
> passed the optional third argument or not.
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_timed_wait.h (__atomic_wait_address_until):
> Use _M_on_wake instead of _M_setup_wait after waking.
> (__atomic_wait_address_for): Likewise.
> * include/bits/atomic_wait.h (__atomic_wait_address): Likewise.
> (__wait_args::_M_setup_wait): Remove third parameter and move
> code to update _M_old to ...
> (__wait_args::_M_on_wake): New member function to update _M_old
> after waking, only calling _M_setup_proxy_wait if needed.
> (__wait_args::_M_store): New member function to update _M_old
> from a value, for non-proxy waits.
I forgot to add the changelog for atomic.cc:
* src/c++20/atomic.cc (__wait_args::_M_setup_proxy_wait): If
_M_obj is not addr, only load a new value and return true.
> ---
>
> v2: Address Tomasz's review. We still need to call _M_setup_proxy_wait
> more than once for proxy waits, to load _M_old. So make it safe to call
> repeatedly, only loading _M_old after the first call. Split
> _M_setup_wait into two functions, for before and after the first wait,
> allowing us to avoid redundant calls to _M_setup_proxy_wait when the
> first call returned false.
>
> Tested x86_64-linux and x86_64-freebsd14.
>
> libstdc++-v3/include/bits/atomic_timed_wait.h | 4 +-
> libstdc++-v3/include/bits/atomic_wait.h | 82 +++++++++++++------
> libstdc++-v3/src/c++20/atomic.cc | 36 ++++----
> 3 files changed, 78 insertions(+), 44 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h
> b/libstdc++-v3/include/bits/atomic_timed_wait.h
> index bca43e3222dd..c195cefc6b85 100644
> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h
> @@ -140,7 +140,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> auto __res = __detail::__wait_until(__addr, __args, __atime);
> if (__res._M_timeout)
> return false; // C++26 will also return last observed __val
> - __val = __args._M_setup_wait(__addr, __vfn, __res);
> + __val = __args._M_on_wake(__addr, __vfn, __res);
> }
> return true; // C++26 will also return last observed __val
> }
> @@ -192,7 +192,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> auto __res = __detail::__wait_for(__addr, __args, __rtime);
> if (__res._M_timeout)
> return false; // C++26 will also return last observed __val
> - __val = __args._M_setup_wait(__addr, __vfn);
> + __val = __args._M_on_wake(__addr, __vfn, __res);
> }
> return true; // C++26 will also return last observed __val
> }
> diff --git a/libstdc++-v3/include/bits/atomic_wait.h
> b/libstdc++-v3/include/bits/atomic_wait.h
> index b6240a95370f..8511e003abca 100644
> --- a/libstdc++-v3/include/bits/atomic_wait.h
> +++ b/libstdc++-v3/include/bits/atomic_wait.h
> @@ -205,43 +205,72 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
> template<typename _Tp, typename _ValFn>
> _Tp
> - _M_setup_wait(const _Tp* __addr, _ValFn __vfn,
> - __wait_result_type __res = {})
> + _M_setup_wait(const _Tp* __addr, _ValFn __vfn)
> {
> static_assert(is_same_v<_Tp, decay_t<decltype(__vfn())>>);
>
> - if (__res._M_has_val) // A previous wait loaded a recent value.
> - {
> - _M_old = __res._M_val;
> - if constexpr (!__platform_wait_uses_type<_Tp>)
> - {
> - // __res._M_val might be the value of a proxy wait object,
> - // not the value of *__addr. Call __vfn() to get new value.
> - return __vfn();
> - }
> - // Not a proxy wait, so the value in __res._M_val was loaded
> - // from *__addr and we don't need to call __vfn().
> - else if constexpr (sizeof(_Tp) == sizeof(__UINT32_TYPE__))
> - return __builtin_bit_cast(_Tp, (__UINT32_TYPE__)_M_old);
> - else if constexpr (sizeof(_Tp) == sizeof(__UINT64_TYPE__))
> - return __builtin_bit_cast(_Tp, (__UINT64_TYPE__)_M_old);
> - else
> - static_assert(false); // Unsupported size
> - }
> -
> if constexpr (!__platform_wait_uses_type<_Tp>)
> if (_M_setup_proxy_wait(__addr))
> {
> // We will use a proxy wait for this object.
> - // The library has set _M_obj and _M_obj_size and _M_old.
> + // The library has set _M_wait_state, _M_obj, _M_obj_size,
> + // and _M_old.
> // Call __vfn to load the current value from *__addr
> // (which must happen after the call to _M_setup_proxy_wait).
> return __vfn();
> }
>
> // We will use a futex-like operation to wait on this object,
> - // and so can just load the value and store it into _M_old.
> - auto __val = __vfn();
> + // so just load the value, store it into _M_old, and return it.
> + return _M_store(__vfn());
> + }
> +
> + // Called after a wait returns, to prepare to wait again.
> + template<typename _Tp, typename _ValFn>
> + _Tp
> + _M_on_wake(const _Tp* __addr, _ValFn __vfn, __wait_result_type __res)
> + {
> + if constexpr (!__platform_wait_uses_type<_Tp>) // maybe a proxy wait
> + if (_M_obj != __addr) // definitely a proxy wait
> + {
> + if (__res._M_has_val)
> + // Previous wait loaded a recent value from the proxy.
> + _M_old = __res._M_val;
> + else // Load a new value from the proxy and store in _M_old.
> + (void) _M_setup_proxy_wait(nullptr);
> + // Read the current value of *__addr
> + return __vfn();
> + }
> +
> + if (__res._M_has_val) // The previous wait loaded a recent value.
> + {
> + _M_old = __res._M_val;
> +
> + // Not a proxy wait, so the value in __res._M_val was loaded
> + // from *__addr and we don't need to call __vfn().
> + if constexpr (sizeof(_Tp) == sizeof(__UINT64_TYPE__))
> + return __builtin_bit_cast(_Tp, (__UINT64_TYPE__)_M_old);
> + else if constexpr (sizeof(_Tp) == sizeof(__UINT32_TYPE__))
> + return __builtin_bit_cast(_Tp, (__UINT32_TYPE__)_M_old);
> + else if constexpr (sizeof(_Tp) == sizeof(__UINT16_TYPE__))
> + return __builtin_bit_cast(_Tp, (__UINT16_TYPE__)_M_old);
> + else if constexpr (sizeof(_Tp) == sizeof(__UINT8_TYPE__))
> + return __builtin_bit_cast(_Tp, (__UINT8_TYPE__)_M_old);
> + else // Should be a proxy wait for this size!
> + __glibcxx_assert(false);
> + }
> + else
> + return _M_store(__vfn());
> + }
> +
> + private:
> + // Store __val in _M_old.
> + // pre: This must be a non-proxy wait.
> + template<typename _Tp>
> + [[__gnu__::__always_inline__]]
> + _Tp
> + _M_store(_Tp __val)
> + {
> // We have to consider various sizes, because a future libstdc++.so
> // might enable non-proxy waits for additional sizes.
> if constexpr (sizeof(_Tp) == sizeof(__UINT64_TYPE__))
> @@ -252,12 +281,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> _M_old = __builtin_bit_cast(__UINT16_TYPE__, __val);
> else if constexpr (sizeof(_Tp) == sizeof(__UINT8_TYPE__))
> _M_old = __builtin_bit_cast(__UINT8_TYPE__, __val);
> - else // _M_setup_proxy_wait should have returned true for this type!
> + else // Should be a proxy wait for this size!
> __glibcxx_assert(false);
> return __val;
> }
>
> - private:
> // Returns true if a proxy wait will be used for __addr, false
> otherwise.
> // If true, _M_wait_state, _M_obj, _M_obj_size, and _M_old are set.
> // If false, data members are unchanged.
> @@ -297,7 +325,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> while (!__pred(__val))
> {
> auto __res = __detail::__wait_impl(__addr, __args);
> - __val = __args._M_setup_wait(__addr, __vfn, __res);
> + __val = __args._M_on_wake(__addr, __vfn, __res);
> }
> // C++26 will return __val
> }
> diff --git a/libstdc++-v3/src/c++20/atomic.cc
> b/libstdc++-v3/src/c++20/atomic.cc
> index 2d5842df30c3..cbff5e9ba1a6 100644
> --- a/libstdc++-v3/src/c++20/atomic.cc
> +++ b/libstdc++-v3/src/c++20/atomic.cc
> @@ -311,26 +311,32 @@ namespace
>
> // Return false (and don't change any data members) if we can do a non-proxy
> // wait for the object of size `_M_obj_size` at address `addr`.
> -// Otherwise, the object at addr needs to use a proxy wait. Set
> _M_wait_state,
> -// set _M_obj and _M_obj_size to refer to the _M_wait_state->_M_ver proxy,
> -// load the current value from _M_obj and store it in _M_old, then return
> true.
> +// Otherwise, we will use a proxy wait. If addr == _M_obj this is the initial
> +// setup of the proxy wait, so set _M_wait_state to the proxy state for addr,
> +// and set _M_obj and _M_obj_size to refer to the _M_wait_state->_M_ver
> proxy.
> +// For both the initial setup of a proxy wait and for subsequent calls to
> this
> +// function for proxy waits, we load the current value from _M_obj (the
> proxy)
> +// and store it in _M_old, then return true.
> bool
> __wait_args::_M_setup_proxy_wait(const void* addr)
> {
> - if (!use_proxy_wait(*this, addr)) // We can wait on this address directly.
> + __waitable_state* state = nullptr;
> +
> + if (addr == _M_obj)
> {
> - // Ensure the caller set _M_obj correctly, as that's what we'll wait
> on:
> - __glibcxx_assert(_M_obj == addr);
> - return false;
> + if (!use_proxy_wait(*this, addr)) // We can wait on this address
> directly.
> + return false;
> +
> + // This will be a proxy wait, so get a waitable state.
> + state = set_wait_state(addr, *this);
> +
> + // The address we will wait on is the version count of the waitable
> state:
> + _M_obj = &state->_M_ver;
> + // __wait_impl and __wait_until_impl need to know this size:
> + _M_obj_size = sizeof(state->_M_ver);
> }
> -
> - // This will be a proxy wait, so get a waitable state.
> - auto state = set_wait_state(addr, *this);
> -
> - // The address we will wait on is the version count of the waitable state:
> - _M_obj = &state->_M_ver;
> - // __wait_impl and __wait_until_impl need to know this size:
> - _M_obj_size = sizeof(state->_M_ver);
> + else // This is not the first call to this function, already a proxy wait.
> + state = static_cast<__waitable_state*>(_M_wait_state);
>
> // Read the value of the _M_ver counter.
> _M_old = __atomic_load_n(&state->_M_ver, __ATOMIC_ACQUIRE);
> --
> 2.52.0
>