[Bug libstdc++/101037] C++20 std::atomic_flag deadlock on Windows

pexu--- via Gcc-bugs Sat, 15 Jan 2022 20:08:28 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101037


Pekka S <p...@gcc-bugzilla.mail.kapsi.fi> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |p...@gcc-bugzilla.mail.kaps
                   |                            |i.fi

--- Comment #3 from Pekka S <p...@gcc-bugzilla.mail.kapsi.fi> ---
Created attachment 52203
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52203&action=edit
Ensure _M_ver and _M_cv/_M_mtx are synchronized

Hi.

I suspect this error is not specific to w64-mingw32 but is present on all
platforms that do not define _GLIBCXX_HAVE_PLATFORM_WAIT.

If I am not mistaken, the problem lies within __waiter_pool::_M_do_wait, where
_M_ver and _M_cv/_M_mtx are not properly synchronized:

libstdc++-v3/include/bits/atomic_wait.h:
      void
      _M_do_wait(const __platform_wait_t* __addr, __platform_wait_t __old) noexcept
      {
#ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
        __platform_wait(__addr, __old);
#else
        __platform_wait_t __val;
        __atomic_load(__addr, &__val, __ATOMIC_RELAXED);
        if (__val == __old)
          {
            lock_guard<mutex> __l(_M_mtx);
            _M_cv.wait(_M_mtx);
          }
#endif // __GLIBCXX_HAVE_PLATFORM_WAIT
      }

In this case _GLIBCXX_HAVE_PLATFORM_WAIT is not defined and __addr will point
to __waiter_pool_base::_M_ver (I do not know why it is called ver).  _M_ver is
used to indicate if a thread has been notified.  __old is the value of _M_ver
when the conditional wait was entered at __waiter::_M_do_wait_v or
__waiter::_M_do_wait.

(__waiter and __waiter_pool are different classes; __waiter::_M_do_wait calls
__waiter_pool::_M_do_wait unless the predicate on which to wait passes.)

          if (_M_laundered())
            {
              __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL);
              __all = true;
            }

Each time a thread is notified, _M_ver is incremented by one.  _M_laundered
simply checks whether _M_addr points to _M_ver (of __waiter_base::_M_w).

This is where the problem is:  A waiting thread might observe a (soon-to-be)
stale value which is equal to __old, because the value of _M_ver might change
between __atomic_load and _M_cv.wait.  In other words, _M_cv.notify_all (note
that notify_one never actually happens since __all is forced to true) might
occur before the waiter has actually blocked in _M_cv.wait, and the
notification is then lost.
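
The interleaving can be replayed step by step in a small single-threaded
model.  This is only a sketch: ToyPool, sleepers and lost_wakeup_demo are
hypothetical names, not libstdc++ code, and the model assumes that a notify
wakes only threads that are already blocked.

```cpp
#include <cassert>

// Hypothetical single-threaded model; not libstdc++ code.
struct ToyPool {
  unsigned ver = 0;   // stands in for _M_ver
  int sleepers = 0;   // threads currently blocked in the condition variable

  // Bump the version and wake only the threads that are already blocked.
  void notify_all() {
    ++ver;
    sleepers = 0;
  }
};

// Replays the bad interleaving step by step; returns true if the waiter
// ends up blocked even though ver no longer equals old.
bool lost_wakeup_demo() {
  ToyPool pool;
  const unsigned old = pool.ver;  // value captured when the wait was entered
  const unsigned val = pool.ver;  // waiter: relaxed load, mutex not held
  assert(val == old);             // waiter: check passes, decides to block
  pool.notify_all();              // notifier: runs inside the window
  ++pool.sleepers;                // waiter: only now blocks in wait()
  return pool.sleepers == 1 && pool.ver != old;
}
```

In this model the waiter ends up asleep with no further notify pending, which
is the deadlock seen by the user.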

As __waiter_pool::_M_do_wait can only observe _M_ver (and not, e.g., the value
of the atomic_flag on which the user expects the wait to happen), it must not
wait based on stale information.  It absolutely must not miss any
notification, or it will not be able to return to __waiter::_M_do_wait, which
does the predicate-level checking.

My suggestion is to move __l(_M_mtx) outside the `if (__val == __old)' block
and to modify _M_ver only while the _M_mtx mutex is held.  Since
_M_cv.wait(_M_mtx) releases the mutex while the thread is actually waiting,
this should "just work".
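
A minimal sketch of that locking discipline, using the standard library
directly.  This is not the attached patch: ToyWaiterPool, current and
run_demo are hypothetical names, and std::condition_variable with
std::unique_lock is assumed in place of the internal _M_cv/_M_mtx types.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the waiter pool; not libstdc++ code.
struct ToyWaiterPool {
  std::mutex mtx;
  std::condition_variable cv;
  unsigned ver = 0;  // stands in for _M_ver; only touched while mtx is held

  unsigned current() {
    std::lock_guard<std::mutex> l(mtx);
    return ver;
  }

  // The lock is taken before the comparison, so ver cannot change between
  // the check and the point where cv.wait() releases the mutex and blocks.
  // A spurious wakeup simply returns to the caller, which is expected to
  // re-check its own predicate.
  void wait(unsigned old) {
    std::unique_lock<std::mutex> l(mtx);
    if (ver == old)
      cv.wait(l);
  }

  // ver is modified only under the mutex; then all waiters are woken.
  void notify() {
    {
      std::lock_guard<std::mutex> l(mtx);
      ++ver;
    }
    cv.notify_all();
  }
};

// One waiter, one notifier; returns the final version once the waiter is done.
unsigned run_demo() {
  ToyWaiterPool pool;
  const unsigned old = pool.current();
  std::thread waiter([&] { pool.wait(old); });
  pool.notify();
  waiter.join();  // returns whichever thread reaches the mutex first
  assert(pool.current() != old);
  return pool.current();
}
```

Whichever thread takes the mutex first, the waiter either sees the new value
and returns immediately, or is already blocked when the notify arrives.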

The attached patch does that and a bit more:  _M_laundered is moved from
__waiter_base to __waiter_pool_base since it must be able to access _M_mtx.
_M_enter_notify is added along with the _M_enter, _M_leave and
__detail::__atomic_load_value helpers.  _M_mtx and _M_cv are marked mutable so
that __waiter_pool::_M_do_wait / _M_do_spin can be made const.  Instead of
__addr it uses _M_ver (and adds an _M_laundered(__addr) assertion) to clearly
communicate that the non-_GLIBCXX_HAVE_PLATFORM_WAIT code is specialized.
Also, const and/or noexcept were added where due.
