On Mon, Nov 17, 2025 at 10:10 AM Tomasz Kaminski <[email protected]> wrote:
> > > On Sun, Nov 16, 2025 at 1:56 AM Jonathan Wakely <[email protected]> > wrote: > >> This will allow us to extend atomic waiting functions to support a >> possible future 64-bit version of futex, as well as supporting >> futex-like wait/wake primitives on other targets (e.g. macOS has >> os_sync_wait_on_address and FreeBSD has _umtx_op). >> >> Before this change, the decision of whether to do a proxy wait or to >> wait on the atomic variable itself was made in the header at >> compile-time, which makes it an ABI property that would not have been >> possible to change later. That would have meant that >> std::atomic<uint64_t> would always have to do a proxy wait even if Linux >> gains support for 64-bit futex2(2) calls at some point in the future. >> The disadvantage of proxy waits is that several distinct atomic objects >> can share the same proxy state, leading to contention between threads >> even when they are not waiting on the same atomic object, similar to >> false sharing. It also results in spurious wake-ups because doing a >> notify on an atomic object that uses a proxy wait will wake all waiters >> sharing the proxy. >> >> For types that are known to definitely not need a proxy wait (e.g. int >> on Linux) the header can still choose a more efficient path at >> compile-time. But for other types, the decision of whether to do a proxy >> wait is deferred to runtime, inside the library internals. This will >> make it possible for future versions of libstdc++.so to extend the set >> of types which don't need to use proxy waits, without ABI changes. >> >> The way the change works is to stop using the __proxy_wait flag that was >> set by the inline code in the headers. Instead the __wait_args struct >> has an extra pointer member which the library internals populate with >> either the address of the atomic object or the _M_ver counter in the >> proxy state. There is also a new _M_obj_size member which stores the >> size of the atomic object, so that the library can decide whether a >> proxy is needed. So for example if Linux gains 64-bit futex support then >> the library can decide not to use a proxy when _M_obj_size == 8. >> Finally, the _M_old member of the __wait_args struct is changed to >> uint64_t so that it has room to store 64-bit values, not just whatever >> size the __platform_wait_t type is (which is a 32-bit int on Linux). >> Similarly, the _M_val member of __wait_result_type changes to uint64_t >> too. >> >> libstdc++-v3/ChangeLog: >> >> * config/abi/pre/gnu.ver: Adjust exports. >> * include/bits/atomic_timed_wait.h >> (_GLIBCXX_HAVE_PLATFORM_TIMED_WAIT): >> Do not define this macro. >> (__atomic_wait_address_until_v, __atomic_wait_address_for_v): >> Guard assertions with #ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT. >> * include/bits/atomic_wait.h (__platform_wait_uses_type): >> Define separately for platforms with and without platform >> wait. >> (_GLIBCXX_HAVE_PLATFORM_WAIT): Do not define this macro. >> (_GLIBCXX_UNKNOWN_PLATFORM_WAIT): Define new macro. >> (__wait_value_type): New typedef. >> (__wait_result_type): Change _M_val to __wait_value_type. >> (__wait_args_base::_M_old): Change type to __wait_value_type. >> (__wait_args_base::_M_obj, __wait_args_base::_M_obj_size): New >> data members. >> (__wait_args::__wait_args): Set _M_obj and _M_obj_size on >> construction. >> (__wait_args::_M_setup_wait): Change void* parameter to deduced >> type. Use _S_bit_cast instead of __builtin_bit_cast. >> (__wait_args::_M_load_proxy_wait_val): Remove function, replace >> with ... 
>> (__wait_args::_M_setup_wait_impl): New function. >> (__wait_args::_S_bit_cast): Wrapper for __builtin_bit_cast which >> also supports conversion from 32-bit values. >> (__wait_args::_S_flags_for): Do not set __proxy_wait flag. >> (__atomic_wait_address_v): Guard assertions with #ifdef >> _GLIBCXX_UNKNOWN_PLATFORM_WAIT. >> * src/c++20/atomic.cc (_GLIBCXX_HAVE_PLATFORM_WAIT): Define here >> instead of in header. Check _GLIBCXX_HAVE_PLATFORM_WAIT instead >> of _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT. >> (__spin_impl): Adjust for 64-bit __wait_args_base::_M_old. >> (use_proxy_wait): New function. >> (__wait_args::_M_load_proxy_wait_val): Replace with ... >> (__wait_args::_M_setup_wait_impl): New function. Call >> use_proxy_wait to decide at runtime whether to wait on the >> pointer directly instead of using a proxy. If a proxy is needed, >> set _M_obj to point to its _M_ver member. Adjust for change to >> type of _M_old. >> (__wait_impl): Wait on _M_obj unconditionally. >> (__notify_impl): Call use_proxy_wait to decide whether to notify >> on the address parameter or a proxy. >> (__spin_until_impl): Adjust for change to type of _M_val. >> (__wait_until_impl): Wait on _M_obj unconditionally. >> --- >> >> Tested x86_64-linux, powerpc64le-linux, sparc-solaris. >> > A lot of comments below. > >> >> I think this is an important change which I unfortunately didn't think of >> until recently. >> >> This changes the exports from the shared library, but we're still in >> stage 1 so I think that should be allowed (albeit unfortunate). Nobody >> should be expecting GCC 16 to be stable yet. >> >> The __proxy_wait enumerator is now unused and could be removed. The >> __abi_version enumerator could also be bumped to indicate the >> incompatibility with earlier snapshots of GCC 16, but I don't think that >> is needed. We could in theory keep the old symbol export >> (__wait_args::_M_load_proxy_wait_val) and make it trap/abort if called, but >> I'd prefer to just remove it and cause dynamic linker errors instead. >> >> There's a TODO in the header about which types should be allowed to take >> the optimized paths (see the __waitable concept). For types where that's >> true, if the size matches a futex then we'll use a futex, even if it's >> actually an enum or floating-point type (or pointer on 32-bit targets). >> I'm not sure if that's safe. 
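To check my understanding of the runtime decision described above, here is a minimal, self-contained sketch (the struct and function names are illustrative stand-ins, not the real __wait_args_base members): the header only records the object's address and size, and the library alone decides whether that size can be waited on directly, so extending the non-proxy set later is a change confined to libstdc++.so.

  #include <cstddef>
  #include <cstdint>

  // Illustrative stand-ins for the new __wait_args_base members.
  struct wait_args
  {
    const void* obj = nullptr;  // address to wait on: the atomic object, or a proxy's _M_ver
    std::size_t obj_size = 0;   // sizeof the atomic object, 0 if not known to be waitable
  };

  // Today only 4-byte objects map onto a Linux futex; if a 64-bit futex2
  // call appears later, only this function inside the library changes.
  inline bool needs_proxy(const wait_args& a)
  {
    if (a.obj_size == sizeof(std::uint32_t))
      return false;                    // wait on the atomic object itself
    // if (a.obj_size == sizeof(std::uint64_t)) return false;  // hypothetical futex2
    return true;                       // fall back to a shared proxy state
  }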
>> >> >> libstdc++-v3/config/abi/pre/gnu.ver | 3 +- >> libstdc++-v3/include/bits/atomic_timed_wait.h | 12 +- >> libstdc++-v3/include/bits/atomic_wait.h | 109 +++++++++----- >> libstdc++-v3/src/c++20/atomic.cc | 140 +++++++++++------- >> 4 files changed, 166 insertions(+), 98 deletions(-) >> >> diff --git a/libstdc++-v3/config/abi/pre/gnu.ver >> b/libstdc++-v3/config/abi/pre/gnu.ver >> index 2e48241d51f9..3c2bd4921730 100644 >> --- a/libstdc++-v3/config/abi/pre/gnu.ver >> +++ b/libstdc++-v3/config/abi/pre/gnu.ver >> @@ -2553,7 +2553,8 @@ GLIBCXX_3.4.35 { >> _ZNSt8__detail11__wait_implEPKvRNS_16__wait_args_baseE; >> _ZNSt8__detail13__notify_implEPKvbRKNS_16__wait_args_baseE; >> >> >> _ZNSt8__detail17__wait_until_implEPKvRNS_16__wait_args_baseERKNSt6chrono8durationI[lx]St5ratioIL[lx]1EL[lx]1000000000EEEE; >> - _ZNSt8__detail11__wait_args22_M_load_proxy_wait_valEPKv; >> + _ZNSt8__detail11__wait_args18_M_setup_wait_implEPKv; >> + _ZNSt8__detail11__wait_args20_M_setup_notify_implEPKv; >> >> # std::chrono::gps_clock::now, tai_clock::now >> _ZNSt6chrono9gps_clock3nowEv; >> diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h >> b/libstdc++-v3/include/bits/atomic_timed_wait.h >> index 30f7ff616840..918a267d10eb 100644 >> --- a/libstdc++-v3/include/bits/atomic_timed_wait.h >> +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h >> @@ -75,14 +75,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> return chrono::ceil<__w_dur>(__atime); >> } >> >> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX >> -#define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >> -#else >> -// define _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT and implement >> __platform_wait_until >> -// if there is a more efficient primitive supported by the platform >> -// (e.g. __ulock_wait) which is better than pthread_cond_clockwait. >> -#endif // ! HAVE_LINUX_FUTEX >> - >> __wait_result_type >> __wait_until_impl(const void* __addr, __wait_args_base& __args, >> const __wait_clock_t::duration& __atime); >> @@ -156,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> const chrono::time_point<_Clock, _Dur>& >> __atime, >> bool __bare_wait = false) noexcept >> { >> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT >> __glibcxx_assert(false); // This function can't be used for proxy >> wait. >> #endif >> __detail::__wait_args __args{ __addr, __old, __order, __bare_wait >> }; >> @@ -208,7 +200,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> const chrono::duration<_Rep, _Period>& >> __rtime, >> bool __bare_wait = false) noexcept >> { >> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT >> > This name really reads strange, and sounds like something with "TODO". > I think _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT was just OK name, even > if it was not used directly. > >> __glibcxx_assert(false); // This function can't be used for proxy >> wait. 
>> #endif >> __detail::__wait_args __args{ __addr, __old, __order, __bare_wait >> }; >> diff --git a/libstdc++-v3/include/bits/atomic_wait.h >> b/libstdc++-v3/include/bits/atomic_wait.h >> index 95151479c120..49369419d6a6 100644 >> --- a/libstdc++-v3/include/bits/atomic_wait.h >> +++ b/libstdc++-v3/include/bits/atomic_wait.h >> @@ -45,35 +45,34 @@ >> namespace std _GLIBCXX_VISIBILITY(default) >> { >> _GLIBCXX_BEGIN_NAMESPACE_VERSION >> +#if defined _GLIBCXX_HAVE_LINUX_FUTEX >> namespace __detail >> { >> -#ifdef _GLIBCXX_HAVE_LINUX_FUTEX >> -#define _GLIBCXX_HAVE_PLATFORM_WAIT 1 >> using __platform_wait_t = int; >> inline constexpr size_t __platform_wait_alignment = 4; >> + } >> + template<typename _Tp> >> + inline constexpr bool __platform_wait_uses_type >> + = is_scalar_v<_Tp> && sizeof(_Tp) == sizeof(int) && alignof(_Tp) >> >= 4; >> #else >> +# define _GLIBCXX_UNKNOWN_PLATFORM_WAIT 1 >> // define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait() >> // and __platform_notify() if there is a more efficient primitive >> supported >> // by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better >> than >> // a mutex/condvar based wait. >> + namespace __detail >> + { >> # if ATOMIC_LONG_LOCK_FREE == 2 >> using __platform_wait_t = unsigned long; >> # else >> using __platform_wait_t = unsigned int; >> # endif >> inline constexpr size_t __platform_wait_alignment >> - = __alignof__(__platform_wait_t); >> -#endif >> + = sizeof(__platform_wait_t) < __alignof__(__platform_wait_t) >> + ? __alignof__(__platform_wait_t) : sizeof(__platform_wait_t); >> } // namespace __detail >> - >> - template<typename _Tp> >> - inline constexpr bool __platform_wait_uses_type >> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT >> - = is_scalar_v<_Tp> >> - && ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t)) >> - && (alignof(_Tp) >= __detail::__platform_wait_alignment)); >> -#else >> - = false; >> + template<typename> >> + inline constexpr bool __platform_wait_uses_type = false; >> #endif >> >> namespace __detail >> @@ -105,10 +104,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> return __builtin_memcmp(&__a, &__b, sizeof(_Tp)) == 0; >> } >> >> - // lightweight std::optional<__platform_wait_t> >> + // TODO: this needs to be false for types with padding, e.g. __int20. >> > I do not understand why this needs to be required. This function is used > only via atomic > or atomic_ref. For atomic, we can guarantee that the type has padding > bytes cleared. > >> + // TODO: should this be true only for integral, enum, and pointer >> types? >> > What I think is missing here is alignment. I assume that any platform wait > may use > bits that are clear due to the alignment of the platform wait type for some > internal state. > Or are we going to check is_sufficiently_aligned in the cc file, and use a different kind of > wait depending on the object? > > But I think we can later safely extend or change what is waitable (except > extending it past 8 bytes), > since if we start setting _M_obj_size to non-zero, the implementation may use a platform wait. > > So I would go with the safe option: integral, pointer, or enum types. That would > also give > us a no-padding guarantee, I assume? > >> + template<typename _Tp> >> + concept __waitable >> + = is_scalar_v<_Tp> && (sizeof(_Tp) <= sizeof(__UINT64_TYPE__)); >> + >> + // Storage for up to 64 bits of value, should be considered opaque >> bits. 
>> + using __wait_value_type = __UINT64_TYPE__; >> + >> + // lightweight std::optional<__wait_value_type> >> struct __wait_result_type >> { >> - __platform_wait_t _M_val; >> + __wait_value_type _M_val; >> unsigned char _M_has_val : 1; // _M_val value was loaded before >> return. >> unsigned char _M_timeout : 1; // Waiting function ended with >> timeout. >> unsigned char _M_unused : 6; // padding >> @@ -143,8 +151,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> { >> __wait_flags _M_flags; >> int _M_order = __ATOMIC_ACQUIRE; >> - __platform_wait_t _M_old = 0; >> + __wait_value_type _M_old{}; >> void* _M_wait_state = nullptr; >> + const void* _M_obj = nullptr; // The address of the object to >> wait on. >> + unsigned char _M_obj_size = 0; // The size of that object. >> >> // Test whether _M_flags & __flags is non-zero. >> bool >> @@ -162,36 +172,48 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> explicit >> __wait_args(const _Tp* __addr, bool __bare_wait = false) noexcept >> : __wait_args_base{ _S_flags_for(__addr, __bare_wait) } >> - { } >> + { >> + _M_obj = __addr; // Might be replaced by _M_setup_wait >> + if constexpr (__waitable<_Tp>) >> + // __wait_impl might be able to wait directly on __addr >> + // instead of using a proxy, depending on its size. >> + _M_obj_size = sizeof(_Tp); >> + } >> >> __wait_args(const __platform_wait_t* __addr, __platform_wait_t >> __old, >> int __order, bool __bare_wait = false) noexcept >> - : __wait_args_base{ _S_flags_for(__addr, __bare_wait), __order, >> __old } >> - { } >> + : __wait_args(__addr, __bare_wait) >> + { >> + _M_order = __order; >> + _M_old = __old; >> + } >> >> __wait_args(const __wait_args&) noexcept = default; >> __wait_args& operator=(const __wait_args&) noexcept = default; >> >> - template<typename _ValFn, >> - typename _Tp = >> decay_t<decltype(std::declval<_ValFn&>()())>> >> + template<typename _Tp, typename _ValFn> >> _Tp >> - _M_setup_wait(const void* __addr, _ValFn __vfn, >> + _M_setup_wait(const _Tp* __addr, _ValFn __vfn, >> __wait_result_type __res = {}) >> { >> + static_assert(is_same_v<_Tp, decay_t<decltype(__vfn())>>); >> + >> if constexpr (__platform_wait_uses_type<_Tp>) >> { >> - // If the wait is not proxied, the value we check when >> waiting >> - // is the value of the atomic variable itself. >> + // If we know for certain that this type can be waited on >> + // efficiently using something like a futex syscall, >> + // then we can avoid the overhead of _M_setup_wait_impl >> + // and just load the value and store it into _M_old. >> >> - if (__res._M_has_val) // The previous wait loaded a recent >> value. >> + if (__res._M_has_val) // A previous wait loaded a recent >> value. >> { >> _M_old = __res._M_val; >> - return __builtin_bit_cast(_Tp, __res._M_val); >> + return _S_bit_cast<_Tp>(_M_old); >> > I am not sure I understand which branch of _S_bit_cast this would use: > we have neither sizeof(_To) == sizeof(_From) (i.e. 4 vs 8) nor > sizeof(_From) == sizeof(__UINT32_TYPE__). > I would much prefer if this were done as: > return __builtin_bit_cast(_Tp, static_cast<__UINT32_TYPE__>(_M_old)); > >> } >> else // Load the value from __vfn >> { >> - _Tp __val = __vfn(); >> - _M_old = __builtin_bit_cast(__platform_wait_t, __val); >> + auto __val = __vfn(); >> + _M_old = _S_bit_cast<__wait_value_type>(__val); >> > And here: > _M_old = __builtin_bit_cast(__UINT32_TYPE__, __val); > However, instead of __UINT32_TYPE__, we should use > make_unsigned<__platform_wait_t>. 
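If I read the suggestion right, the intent is roughly the sketch below (stand-in names, assuming the non-proxy path where the object has the same size as the futex word): truncate the stored 64-bit value to the platform wait width first, then bit_cast to the user's type, with make_unsigned keeping it correct if the platform wait type ever changes width.

  #include <bit>
  #include <cstdint>
  #include <type_traits>

  using platform_wait_t = int;            // Linux futex word (assumed here)
  using wait_value_type = std::uint64_t;  // opaque 64-bit storage for _M_old

  // Recover the user's value from the low bits of the stored 64-bit word.
  template<typename T>
  T recover_old(wait_value_type old)
  {
    using U = std::make_unsigned_t<platform_wait_t>;
    static_assert(sizeof(T) == sizeof(U), "non-proxy waitable types only");
    return std::bit_cast<T>(static_cast<U>(old));
  }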
> >> return __val; >> } >> } >> @@ -199,16 +221,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> { >> if (__res._M_has_val) // The previous wait loaded a recent >> value. >> _M_old = __res._M_val; >> - else // Load _M_ver from the proxy (must happen before >> __vfn()). >> - _M_load_proxy_wait_val(__addr); >> + else // Let the library decide how to setup the wait. >> + { >> + // Set _M_obj to the address to be waited on (either >> __addr >> + // or a proxy) and load its current value into _M_old. >> + _M_setup_wait_impl(__addr); >> + } >> return __vfn(); >> } >> } >> >> private: >> - // Populates _M_wait_state and _M_old from the proxy for __addr. >> + template<typename _To, typename _From> >> + [[__gnu__::__always_inline__]] >> + static _To >> + _S_bit_cast(_From __from) >> + { >> + if constexpr (sizeof(_To) == sizeof(_From)) >> + return __builtin_bit_cast(_To, __from); >> + else if constexpr (sizeof(_From) == sizeof(__UINT32_TYPE__)) >> + return __builtin_bit_cast(__UINT32_TYPE__, __from); >> + else >> + __builtin_unreachable(); >> + } >> + >> + // Populates _M_wait_state and _M_old appropriately for __addr. >> void >> - _M_load_proxy_wait_val(const void* __addr); >> + _M_setup_wait_impl(const void* __addr); >> >> template<typename _Tp> >> static constexpr __wait_flags >> @@ -218,8 +257,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> __wait_flags __res = __abi_version | __do_spin; >> if (!__bare_wait) >> __res |= __track_contention; >> - if constexpr (!__platform_wait_uses_type<_Tp>) >> - __res |= __proxy_wait; >> + // if constexpr (!__platform_wait_uses_type<_Tp>) >> + // __res |= __proxy_wait; >> > Remove this simply. > >> return __res; >> } >> }; >> @@ -255,7 +294,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >> __detail::__platform_wait_t __old, >> int __order, bool __bare_wait = false) >> { >> -#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT >> +#ifdef _GLIBCXX_UNKNOWN_PLATFORM_WAIT >> __glibcxx_assert(false); // This function can't be used for proxy >> wait. >> #endif >> __detail::__wait_args __args{ __addr, __old, __order, __bare_wait }; >> diff --git a/libstdc++-v3/src/c++20/atomic.cc >> b/libstdc++-v3/src/c++20/atomic.cc >> index e280045b619d..d5fbd10ed11f 100644 >> --- a/libstdc++-v3/src/c++20/atomic.cc >> +++ b/libstdc++-v3/src/c++20/atomic.cc >> @@ -38,14 +38,7 @@ >> # include <syscall.h> >> # include <bits/functexcept.h> >> # include <sys/time.h> >> -#endif >> - >> -#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT >> -# ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >> -// __waitable_state assumes that we consistently use the same >> implementation >> -// (i.e. futex vs mutex+condvar) for timed and untimed waiting. >> -# error "This configuration is not currently supported" >> -# endif >> +# define _GLIBCXX_HAVE_PLATFORM_WAIT 1 >> #endif >> >> #pragma GCC diagnostic ignored "-Wmissing-field-initializers" >> @@ -107,7 +100,7 @@ namespace >> // Count of threads blocked waiting on this state. >> alignas(_S_align) __platform_wait_t _M_waiters = 0; >> >> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT >> mutex _M_mtx; >> >> // This type meets the Cpp17BasicLockable requirements. >> @@ -123,7 +116,7 @@ namespace >> // use this for waiting and notifying functions instead. 
>> alignas(_S_align) __platform_wait_t _M_ver = 0; >> >> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT >> __condvar _M_cv; >> #endif >> >> @@ -216,17 +209,19 @@ namespace >> __spin_impl(const __platform_wait_t* __addr, const __wait_args_base& >> __args) >> { >> __platform_wait_t __val{}; >> + __wait_value_type wval; >> for (auto __i = 0; __i < __atomic_spin_count; ++__i) >> { >> __atomic_load(__addr, &__val, __args._M_order); >> - if (__val != __args._M_old) >> - return { ._M_val = __val, ._M_has_val = true, ._M_timeout = >> false }; >> + wval = static_cast<__wait_value_type>(__val); >> + if (wval != __args._M_old) >> + return { ._M_val = wval, ._M_has_val = true, ._M_timeout = >> false }; >> if (__i < __atomic_spin_count_relax) >> __thread_relax(); >> else >> __thread_yield(); >> } >> - return { ._M_val = __val, ._M_has_val = true, ._M_timeout = true }; >> + return { ._M_val = wval, ._M_has_val = true, ._M_timeout = true }; >> } >> >> inline __waitable_state* >> @@ -237,32 +232,58 @@ namespace >> return static_cast<__waitable_state*>(args._M_wait_state); >> } >> >> + [[gnu::always_inline]] >> + inline bool >> + use_proxy_wait(const __wait_args_base& args) >> > This needs to take the address that we are waiting on and check if it's > sufficiently aligned > to __platform_wait_alignment. > However, I do not think we should check uint64_t at all, as we do not have > such a platform > wait yet, and we will not change __platform_wait_uses_type in that case. >> >> + { >> + if constexpr (__platform_wait_uses_type<uint32_t>) >> + if (args._M_obj_size == sizeof(uint32_t)) >> + return false; >> + >> + if constexpr (__platform_wait_uses_type<uint64_t>) >> + if (args._M_obj_size == sizeof(uint64_t)) >> + return false; >> + >> + // Use proxy wait for everything else: >> + return true; >> + } >> + >> } // namespace >> >> -// Called for a proxy wait >> +// Set _M_wait_state if using proxy wait, or caller wants contention >> tracking. >> +// Set _M_obj to &_M_wait_state->_M_ver if using proxy wait. >> +// Load the current value from _M_obj and store in _M_val. >> void >> -__wait_args::_M_load_proxy_wait_val(const void* addr) >> +__wait_args::_M_setup_wait_impl(const void* addr) >> > I would use _M_obj_size to indicate whether a proxy wait is used for the actual > wait. > This would not work well with notify, so I think it is fine as is, except that we need to check alignment. > { >> - // __glibcxx_assert( *this & __wait_flags::__proxy_wait ); >> + if (!use_proxy_wait(*this)) > > Do: _M_obj_size > 0 && !use_proxy_wait(*this) > >> + { >> + // We can wait on this address directly. >> + __glibcxx_assert(_M_obj == addr); >> > This function is the one that is expected to set _M_obj, so it seems incorrect > to assert it here. > I missed the fact that this is set in the constructor. > >> - // We always need a waitable state for proxy waits. >> + int val; >> + __atomic_load(static_cast<const int*>(addr), &val, >> __ATOMIC_ACQUIRE); >> + _M_old = val; >> + >> + return; >> + } >> + >> + // This will be a proxy wait, so get a waitable state. >> auto state = set_wait_state(addr, *this); >> >> + // The address we will wait on is the version count of the waitable >> state: >> + _M_obj = &state->_M_ver; >> > + >> // Read the value of the _M_ver counter. 
>> - __atomic_load(&state->_M_ver, &_M_old, __ATOMIC_ACQUIRE); >> + __platform_wait_t __old; >> + __atomic_load(&state->_M_ver, &__old, __ATOMIC_ACQUIRE); >> > Nit pick, but I think this could be just: > __platform_wait_t __old = __atomic_load_n(&state->_M_ver, > __ATOMIC_ACQUIRE); > >> + _M_old = __old; >> > Or just: > _M_old = __atomic_load_n(&state->_M_ver, __ATOMIC_ACQUIRE); > >> } >> >> __wait_result_type >> -__wait_impl(const void* __addr, __wait_args_base& __args) >> +__wait_impl([[maybe_unused]] const void* __addr, __wait_args_base& >> __args) >> { >> - auto __state = static_cast<__waitable_state*>(__args._M_wait_state); >> - >> - const __platform_wait_t* __wait_addr; >> - >> - if (__args & __wait_flags::__proxy_wait) >> - __wait_addr = &__state->_M_ver; >> - else >> - __wait_addr = static_cast<const __platform_wait_t*>(__addr); >> + auto* __wait_addr = static_cast<const >> __platform_wait_t*>(__args._M_obj); >> >> if (__args & __wait_flags::__do_spin) >> { >> @@ -286,6 +307,7 @@ __wait_impl(const void* __addr, __wait_args_base& >> __args) >> __atomic_load(__wait_addr, &__val, __args._M_order); >> if (__val == __args._M_old) >> { >> + auto __state = >> static_cast<__waitable_state*>(__args._M_wait_state); >> __state->_M_cv.wait(__state->_M_mtx); >> return { ._M_val = __val, ._M_has_val = false, ._M_timeout = false >> }; >> } >> @@ -294,24 +316,40 @@ __wait_impl(const void* __addr, __wait_args_base& >> __args) >> } >> >> void >> -__notify_impl(const void* __addr, [[maybe_unused]] bool __all, >> +__notify_impl([[maybe_unused]] const void* __addr, [[maybe_unused]] bool >> __all, >> const __wait_args_base& __args) >> { >> - auto __state = static_cast<__waitable_state*>(__args._M_wait_state); >> - if (!__state) >> - __state = &__waitable_state::_S_state_for(__addr); >> + const bool __track_contention = __args & >> __wait_flags::__track_contention; >> + const bool proxy_wait = use_proxy_wait(__args); >> > And here we could just check if _M_obj_size == 0, as this would indication > of proxy wait. > >> >> - [[maybe_unused]] const __platform_wait_t* __wait_addr; >> + [[maybe_unused]] auto* __wait_addr >> + = static_cast<const __platform_wait_t*>(__addr); >> + >> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT >> + // Check whether it would be a non-proxy wait for this object. >> + // This condition must match the one in _M_setup_wait_impl to ensure >> that >> + // the address used for the notify matches the one used for the wait. >> + if (!proxy_wait) >> + { >> + if (__track_contention) >> + if (!__waitable_state::_S_state_for(__addr)._M_waiting()) >> + return; >> + >> + __platform_notify(__wait_addr, __all); >> + return; >> + } >> +#endif >> + >> + // Either a proxy wait or we don't have platform wait/wake primitives. >> + >> + auto __state = &__waitable_state::_S_state_for(__addr); >> >> // Lock mutex so that proxied waiters cannot race with incrementing >> _M_ver >> // and see the old value, then sleep after the increment and >> notify_all(). >> lock_guard __l{ *__state }; >> >> - if (__args & __wait_flags::__proxy_wait) >> + if (proxy_wait) >> { >> - // Waiting for *__addr is actually done on the proxy's _M_ver. >> - __wait_addr = &__state->_M_ver; >> - >> // Increment _M_ver so that waiting threads see something changed. >> // This has to be atomic because the load in _M_load_proxy_wait_val >> // is done without the mutex locked. 
>> @@ -322,11 +360,11 @@ __notify_impl(const void* __addr, [[maybe_unused]] >> bool __all, >> // they can re-evaluate their conditions to see if they should >> // stop waiting or should wait again. >> __all = true; >> - } >> - else // Use the atomic variable's own address. >> - __wait_addr = static_cast<const __platform_wait_t*>(__addr); >> >> - if (__args & __wait_flags::__track_contention) >> + __wait_addr = &__state->_M_ver; >> + } >> + >> + if (__track_contention) >> { >> if (!__state->_M_waiting()) >> return; >> @@ -366,7 +404,7 @@ __platform_wait_until(const __platform_wait_t* __addr, >> } >> #endif // HAVE_LINUX_FUTEX >> >> -#ifndef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >> +#ifndef _GLIBCXX_HAVE_PLATFORM_WAIT >> bool >> __cond_wait_until(__condvar& __cv, mutex& __mx, >> const __wait_clock_t::time_point& __atime) >> @@ -381,7 +419,7 @@ __cond_wait_until(__condvar& __cv, mutex& __mx, >> __cv.wait_until(__mx, __ts); >> return __wait_clock_t::now() < __atime; >> } >> -#endif // ! HAVE_PLATFORM_TIMED_WAIT >> +#endif // ! HAVE_PLATFORM_WAIT >> >> // Unlike __spin_impl, does not always return _M_has_val == true. >> // If the deadline has already passed then no fresh value is loaded. >> @@ -414,7 +452,9 @@ __spin_until_impl(const __platform_wait_t* __addr, >> return __res; >> } >> >> - __atomic_load(__addr, &__res._M_val, __args._M_order); >> + __platform_wait_t val; >> + __atomic_load(__addr, &val, __args._M_order); >> > Same comment on __atomic_load_n here. > >> + __res._M_val = val; >> __res._M_has_val = true; >> if (__res._M_val != __args._M_old) >> { >> @@ -428,16 +468,11 @@ __spin_until_impl(const __platform_wait_t* __addr, >> } // namespace >> >> __wait_result_type >> -__wait_until_impl(const void* __addr, __wait_args_base& __args, >> +__wait_until_impl([[maybe_unused]] const void* __addr, __wait_args_base& >> __args, >> const __wait_clock_t::duration& __time) >> { >> const __wait_clock_t::time_point __atime(__time); >> - auto __state = static_cast<__waitable_state*>(__args._M_wait_state); >> - const __platform_wait_t* __wait_addr; >> - if (__args & __wait_flags::__proxy_wait) >> - __wait_addr = &__state->_M_ver; >> - else >> - __wait_addr = static_cast<const __platform_wait_t*>(__addr); >> + auto* __wait_addr = static_cast<const >> __platform_wait_t*>(__args._M_obj); >> >> if (__args & __wait_flags::__do_spin) >> { >> @@ -448,9 +483,9 @@ __wait_until_impl(const void* __addr, >> __wait_args_base& __args, >> return __res; >> } >> >> -#ifdef _GLIBCXX_HAVE_PLATFORM_TIMED_WAIT >> +#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT >> if (__args & __wait_flags::__track_contention) >> - set_wait_state(__addr, __args); >> + set_wait_state(__addr, __args); // scoped_wait needs a >> __waitable_state >> scoped_wait s(__args); >> bool timeout = !__platform_wait_until(__wait_addr, __args._M_old, >> __atime); >> return { ._M_val = __args._M_old, ._M_has_val = false, ._M_timeout = >> timeout }; >> @@ -460,6 +495,7 @@ __wait_until_impl(const void* __addr, >> __wait_args_base& __args, >> __atomic_load(__wait_addr, &__val, __args._M_order); >> if (__val == __args._M_old) >> { >> + auto __state = >> static_cast<__waitable_state*>(__args._M_wait_state); >> bool timeout = !__cond_wait_until(__state->_M_cv, __state->_M_mtx, >> __atime); >> return { ._M_val = __val, ._M_has_val = false, ._M_timeout = >> timeout }; >> } >> -- >> 2.51.1 >> >>
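On the alignment point raised above: if use_proxy_wait also took the address, the check could look roughly like the sketch below (an illustration under the assumption of the Linux 4-byte futex, not the patch itself). Whatever form it takes, the same condition has to be mirrored on the notify side so that the wait and the notify agree on which address is used.

  #include <cstddef>
  #include <cstdint>

  constexpr std::size_t platform_wait_alignment = 4;  // Linux futex word alignment

  // Only take the futex path when the object is aligned for the platform
  // wait type; otherwise fall back to a proxy wait.
  inline bool sufficiently_aligned(const void* addr)
  {
    return reinterpret_cast<std::uintptr_t>(addr) % platform_wait_alignment == 0;
  }

  inline bool needs_proxy(const void* addr, unsigned char obj_size)
  {
    return obj_size != sizeof(std::uint32_t) || !sufficiently_aligned(addr);
  }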
