Ref: https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636805.html, https://gcc.gnu.org/pipermail/gcc-patches/2023-November/636804.html

I found a couple of bugs in this patch set.

#1: In atomic_wait.h, we have __wait_flags defined to include:

       __do_spin = 4,
       __spin_only = 8 | __do_spin, // implies __do_spin

So __spin_only is defined as two bits, which breaks when we test `__args & __wait_flags::__spin_only` in __wait_impl(). The test evaluates true even when we have only set __do_spin (bit 2) without bit 3, which is the default at least on Linux and which is supposed to request a limited number of spins before calling futex() and sleeping. You can observe this by seeing that a thread blocked on std::counting_semaphore::acquire() consumes 100% CPU indefinitely.

There is another instance in atomic_timed_wait.h.

Probably __spin_only should just be 8, and any configuration that wants it should be responsible for manually setting __do_spin as well.


#2: __atomic_wait_address() does not populate __args._M_old before passing to __wait_impl(), so it remains at its default value 0 (set by the constructor). As a result, `__wait_impl` is effectively always waiting on the value 0 (i.e. until `*__addr != 0`) rather than whatever value is actually wanted. This is of course wrong, and can deadlock under the right race condition.

To fix, we need something like `__args._M_old = __val;` inside the loop in __atomic_wait_address(), so that we always wait on the exact value that the predicate __pred() rejected. Again, there are similar instances in atomic_timed_wait.h.

(In the current libstdc++ implementation, the predicate function loads the value itself and doesn't return it, so we wait instead on a possibly different value that was loaded somewhere else in _S_do_spin(). That is also wrong, because the latter value might have been acceptable to the predicate, and if so then waiting on it may deadlock. A classic TOCTOU. This is Bug #104928, reported in March 2022 and still unfixed: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104928. The patch proposed here would fix it, though.)

--
Nate Eldredge
n...@thatsmathematics.com

Reply via email to