As noted in Bug 122878 comment 2, the _M_try_acquire_for implementation
doesn't reduce the remaining timeout each time it returns from an atomic
waiting function. This means that it can wait longer than requested, or
even loop forever. If there is a spurious wake from the timed waiting
function (__wait_until_impl) it will return indicating no timeout
occurred, which means the caller will check the value and potentially
sleep again. If spurious wakes happen every time, it will just keep
sleeping in a loop forever. This is observed to actually happen on
FreeBSD 14.0-STABLE where pthread_cond_timedwait gets a spurious wake
and so never times out.

The solution in this commit is to replace the implementation of
_M_try_acquire_for with a call to _M_try_acquire_until, converting the
relative timeout to an absolute timeout against the steady clock.

As noted in comment 4 of the PR, this requires some changes to
_M_try_acquire which was relying on an implementation detail of
_M_try_acquire_for, namely that waiting for a zero duration will just
check if the value changes in a spinloop, without actually sleeping.
This behaviour is desirable for _M_try_acquire so that it can handle
short-lived contention without failing immediately. To preserve that
behaviour of _M_try_acquire it is changed to do its own loop and to call
__atomic_wait_address_for directly to do the spinloop.

libstdc++-v3/ChangeLog:

        PR libstdc++/122878
        * include/bits/semaphore_base.h (_M_try_acquire): Replace
        _M_try_acquire_for call with explicit loop and call to
        __atomic_wait_address_for.
        (_M_try_acquire_for): Replace loop with call to
        _M_try_acquire_until.

Co-authored-by: Tomasz KamiƄski <[email protected]>
---

Tested x86_64-linux and x86_64-freebsd14

 libstdc++-v3/include/bits/semaphore_base.h | 28 ++++++++++++++--------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/bits/semaphore_base.h 
b/libstdc++-v3/include/bits/semaphore_base.h
index 5a1e0a416249..679b43e4cb1c 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -100,8 +100,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
     {
       // The fastest implementation of this function is just _M_do_try_acquire
       // but that can fail under contention even when _M_count > 0.
-      // Using _M_try_acquire_for(0ns) will retry a few times in a loop.
-      return _M_try_acquire_for(__detail::__wait_clock_t::duration{});
+
+      auto __vfn = [this]{ return _M_get_current(); };
+      _Available __is_available{__vfn()};
+
+      // Retry the compare exchange a few times in case of contention:
+      for (int __i = 0; __i < 10 && __is_available(); ++__i)
+       if (_M_do_try_acquire(__is_available._M_val))
+         return true;
+
+      // Spinloop to see if it becomes available.
+      constexpr auto __zero = __detail::__wait_clock_t::duration{};
+      if (std::__atomic_wait_address_for(&_M_counter, __is_available,
+                                        __vfn, __zero, true))
+       return false; // timed out
+
+      // Should be available, try once more to acquire it:
+      return _M_do_try_acquire(__is_available._M_val);
     }
 
     template<typename _Clock, typename _Duration>
@@ -122,14 +137,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
       bool
       _M_try_acquire_for(const chrono::duration<_Rep, _Period>& __rtime) 
noexcept
       {
-       auto __vfn = [this]{ return _M_get_current(); };
-       _Available __is_available{__vfn()};
-       while (!_M_do_try_acquire(__is_available._M_val))
-         if (!__is_available())
-           if (!std::__atomic_wait_address_for(&_M_counter, __is_available,
-                                               __vfn, __rtime, true))
-             return false; // timed out
-       return true;
+       return _M_try_acquire_until(__detail::__wait_clock_t::now() + __rtime);
       }
 
     _GLIBCXX_ALWAYS_INLINE ptrdiff_t
-- 
2.52.0

Reply via email to