On Fri, 18 Jul 2025 at 19:03, Jonathan Wakely <jwak...@redhat.com> wrote:
>
> This adds a check when incrementing the shared count and the weak count,
> and will trap if either overflows. This also doubles the effective range
> of the counts for most 64-bit targets.
>
> The counter type, _Atomic_word, is usually a signed 32-bit int (except
> on Solaris v9 where it is a signed 64-bit long). The return type of
> std::shared_ptr::use_count() is long. For targets where long is wider
> than _Atomic_word (most 64-bit targets) we can treat the _Atomic_word
> reference counts as unsigned and allow them to wrap around from their
> most positive value to their most negative value without any problems.
> The logic that operates on the counts only cares whether they are zero
> or non-zero, and never performs relational comparisons. The atomic
> fetch_add operations on integers are required by the standard to behave
> like unsigned types, so that overflow is well-defined:
>
> "the result is as if the object value and parameters were converted to
> their corresponding unsigned types, the computation performed on those
> types, and the result converted back to the signed type."
>
> So if we allow the counts to wrap around to negative values, all we need
> to do is cast the value to make_unsigned_t<_Atomic_word> before
> returning it as long from the use_count() function.
>
> In practice even exceeding INT_MAX is extremely unlikely, as it would
> require billions of shared_ptr or weak_ptr objects to have been
> constructed and never destroyed. However, if that happens we now have
> twice the range before the count returns to zero and causes problems.
>
> This change also adds checks to the increment operations to detect when
> incrementing the maximum value, and will now trap if the counter would
> be incremented past its maximum. The maximum value is the value at which
> incrementing it produces an invalid use_count(). So that is either the
> maximum positive value of _Atomic_word, or, for targets where we now
> allow the counters to wrap around to negative values, the "maximum"
> value is -1, because that is the value at which one more increment
> overflows the counter to zero.
>
> libstdc++-v3/ChangeLog:
>
>         PR libstdc++/71945
>         * include/bits/shared_ptr_base.h (_Sp_counted_base::_S_chk):
>         Trap if a reference count cannot be incremented any higher.
>         (_Sp_counted_base::_M_add_ref_copy): Use _S_chk.
>         (_Sp_counted_base::_M_add_weak_ref): Likewise.
>         (_Sp_counted_base<...>::_M_add_ref_lock_nothrow): Likewise.
>         (_Sp_counted_base::_M_use_count): Cast _M_use_count to unsigned
>         before returning as long.
> ---
>
> This improves safety for std::shared_ptr and std::weak_ptr, removing a
> possible source of undefined behaviour (although only in extreme cases
> where the reference count somehow exceeds INT_MAX).
>
> However, in a very simple benchmark I see a significant impact on
> performance, which is not entirely surprising given that this patch adds
> a test to every increment. Maybe we could change the [[__unlikely__]]
> attributes on those branches to use __builtin_expect_with_probability
> instead, with a very, very small probability?
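Just to restate the scheme above in code form, here is a rough,
illustrative sketch of the two ideas: trapping when an increment would
overflow, and reading the count back through the unsigned type. The
names max_count, add_ref and use_count_of are mine, not the ones used
in the patch, and this is not the patch's actual code:

#include <limits>
#include <type_traits>
#include <ext/atomicity.h> // _Atomic_word, __gnu_cxx::__exchange_and_add_dispatch

namespace sketch
{
  // The last value that can be incremented and still yield a valid
  // use_count(). When long is wider than _Atomic_word the counter is
  // allowed to wrap to negative values, so the limit is -1 (all bits
  // set); otherwise it is the most positive _Atomic_word value.
  constexpr _Atomic_word max_count
    = sizeof(long) > sizeof(_Atomic_word)
        ? _Atomic_word(-1)
        : std::numeric_limits<_Atomic_word>::max();

  inline void add_ref(_Atomic_word* count)
  {
    // __exchange_and_add_dispatch returns the previous value; if that
    // was already the maximum, this increment has overflowed, so trap.
    if (__gnu_cxx::__exchange_and_add_dispatch(count, 1) == max_count)
      __builtin_trap();
  }

  inline long use_count_of(_Atomic_word count)
  {
    // Convert through the unsigned type so a wrapped (negative) counter
    // still reads back as the correct positive value of long.
    using uword = std::make_unsigned<_Atomic_word>::type;
    return static_cast<long>(static_cast<uword>(count));
  }
}

If the cost of the extra branch matters, the comparison in add_ref could
be wrapped in __builtin_expect_with_probability with a tiny probability,
as suggested at the end of the patch description, rather than relying on
[[__unlikely__]].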
I've tried some other benchmarks and I'm not seeing any measurable
difference from the counter overflow checks. The first benchmark I
tried, where the checks did make it slower, was just timing the
execution of the test for PR71945 (which I forgot to add to the patch):

// { dg-do run { target c++11 } }
// { dg-require-effective-target int32plus }

#include <memory>
#include <climits>
#include <testsuite_hooks.h>

bool destroyed = false;

struct Obj { ~Obj() { destroyed = true; } };

int main()
{
  // Skip the test where the counter is as wide as long, or where the
  // single-threaded lock policy is in use.
  if (sizeof(_Atomic_word) >= sizeof(long)
        || __gnu_cxx::__default_lock_policy == __gnu_cxx::_S_single)
    return 0;

  using Ptr = std::shared_ptr<Obj>;
  Ptr p(new Obj);

  // Construct copies of p into raw storage and never destroy them, so
  // every iteration bumps the use count by one.
  alignas(Ptr) unsigned char buf[sizeof(Ptr)];
  long count = p.use_count();
  while (count < (long)UINT_MAX)
  {
    new (buf) Ptr(p);
    ++count;
    VERIFY( p.use_count() == count );
  }
  p.reset();
  VERIFY( ! destroyed );
}

All this does is create billions of shared_ptr objects, increasing the
ref count up to its maximum value, and assert on each iteration that
use_count() is correct. The total execution time of this test with the
overflow checks is about 60% slower than without them. But if I remove
the VERIFY from the loop (just checking once after the loop) then the
times with and without the checks are much closer, perhaps 0-5% slower
with the checks, and that could just be noise. So maybe the overhead of
the checks in real code isn't going to be significant.
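For reference, the variant I compared against just hoists the check out
of the loop, i.e. the loop in the test above becomes something like this
(sketch only, not a test I'm proposing to commit):

  // Same setup as the test above, but only verify the count once after
  // the loop, so the loop body is just the increment.
  long count = p.use_count();
  while (count < (long)UINT_MAX)
  {
    new (buf) Ptr(p);
    ++count;
  }
  VERIFY( p.use_count() == count );
  p.reset();
  VERIFY( ! destroyed );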