On Fri, Jul 18, 2025 at 8:03 PM Jonathan Wakely <jwak...@redhat.com> wrote:
> This adds a check when incrementing the shared count and the weak
> count, and will trap if either overflows. This also doubles the
> effective range of the counts for most 64-bit targets.
>
> The counter type, _Atomic_word, is usually a signed 32-bit int
> (except on Solaris v9 where it is a signed 64-bit long). The return
> type of std::shared_ptr::use_count() is long. For targets where long
> is wider than _Atomic_word (most 64-bit targets) we can treat the
> _Atomic_word reference counts as unsigned and allow them to wrap
> around from their most positive value to their most negative value
> without any problems.

I wonder whether we could return std::numeric_limits<long>::max() from
use_count() if the make_unsigned_t<_Atomic_word> value does not fit in
a long? The reported value would then stop increasing, but the only
count value that is really meaningful to callers is zero anyway.
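Roughly this sort of thing is what I have in mind, as an untested
sketch (using __gnu_cxx::__int_traits like the rest of the patch,
rather than <limits>):

    long
    _M_get_use_count() const noexcept
    {
      using _Up = typename make_unsigned<_Atomic_word>::type;
      _Up __val = (_Up) __atomic_load_n(&_M_use_count, __ATOMIC_RELAXED);
      // If the unsigned value is not representable as a non-negative
      // long, saturate at the largest value use_count() can report.
      if (__val > (unsigned long) __gnu_cxx::__int_traits<long>::__max)
        return __gnu_cxx::__int_traits<long>::__max;
      return (long) __val;
    }

That only changes behaviour on targets where the counter is as wide as
long; everywhere else the comparison is never true and should be folded
away by the compiler.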
> The logic that operates on the counts only cares if they are zero or
> non-zero, and never performs relational comparisons. The atomic
> fetch_add operations on integers are required by the standard to
> behave like unsigned types, so that overflow is well-defined:
>
> "the result is as if the object value and parameters were converted
> to their corresponding unsigned types, the computation performed on
> those types, and the result converted back to the signed type."
>
> So if we allow the counts to wrap around to negative values, all we
> need to do is cast the value to make_unsigned_t<_Atomic_word> before
> returning it as long from the use_count() function.
>
> In practice even exceeding INT_MAX is extremely unlikely, as it would
> require billions of shared_ptr or weak_ptr objects to have been
> constructed and never destroyed. However, if that happens we now have
> twice the range before the count returns to zero and causes problems.
>
> This change also adds checks to the increment operations to detect
> when incrementing the maximum value, and will now trap if the counter
> would be incremented past its maximum. The maximum value is the value
> at which incrementing it produces an invalid use_count(). So that is
> either the maximum positive value of _Atomic_word, or for targets
> where we now allow the counters to wrap around to negative values,
> the "maximum" value is -1, because that is the value at which one
> more increment overflows the counter to zero.
>
> libstdc++-v3/ChangeLog:
>
>         PR libstdc++/71945
>         * include/bits/shared_ptr_base.h (_Sp_counted_base::_S_chk):
>         Trap if a reference count cannot be incremented any higher.
>         (_Sp_counted_base::_M_add_ref_copy): Use _S_chk.
>         (_Sp_counted_base::_M_add_weak_ref): Likewise.
>         (_Sp_counted_base<...>::_M_add_ref_lock_nothrow): Likewise.
>         (_Sp_counted_base::_M_get_use_count): Cast _M_use_count to
>         unsigned before returning as long.
> ---
>
> This improves safety for std::shared_ptr and std::weak_ptr, removing
> a possible source of undefined behaviour (although only in extreme
> cases where the reference count somehow exceeds INT_MAX).
>
> However, in a very simple benchmark I see a significant impact on
> performance, which is not entirely surprising given that this patch
> adds a test to every increment. Maybe we could change [[__unlikely__]]
> on those branches to use __builtin_expect_with_probability instead,
> with a very, very, very small probability?
>
> I like the idea of trapping on these overflows, but we need to
> benchmark carefully and decide if it's worth the overhead, or if the
> overhead can be reduced. Maybe this patch should be split into two:
> one which adds the unsigned cast in use_count(), which doubles the
> usable range of reference counts without affecting codegen, and a
> second patch which adds the overflow detection and the trap.
>
> Either way (as the complete patch, or just casting to unsigned so
> that negative values can be used), it depends on the PR 121148 fix to
> avoid UB from signed overflow in the atomics.
>
> Tested x86_64-linux.
>
>  libstdc++-v3/include/bits/shared_ptr_base.h | 62 +++++++++++++++++++--
>  1 file changed, 58 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
> index fb868e7afc36..ccc7cef3cb5d 100644
> --- a/libstdc++-v3/include/bits/shared_ptr_base.h
> +++ b/libstdc++-v3/include/bits/shared_ptr_base.h
> @@ -148,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        // Increment the use count (used when the count is greater than zero).
>        void
>        _M_add_ref_copy()
> -      { __gnu_cxx::__atomic_add_dispatch(&_M_use_count, 1); }
> +      { _S_chk(__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1)); }
>
>        // Increment the use count if it is non-zero, throw otherwise.
>        void
> @@ -200,7 +200,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        // Increment the weak count.
>        void
>        _M_weak_add_ref() noexcept
> -      { __gnu_cxx::__atomic_add_dispatch(&_M_weak_count, 1); }
> +      {
> +        // _M_weak_count can always use negative values (except for
> +        // _S_single) because only _M_use_count can be observed.
> +        // See _S_chk for details.
> +        constexpr _Atomic_word __max = _Lp != _S_single
> +          ? -1 : __gnu_cxx::__int_traits<_Atomic_word>::__max;
> +
> +        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count, 1) == __max)
> +          [[__unlikely__]] __builtin_trap();
> +      }
>
>        // Decrement the weak count.
>        void
> @@ -224,15 +232,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        long
>        _M_get_use_count() const noexcept
>        {
> +        // If long is wider than _Atomic_word then we can treat
> +        // _Atomic_word as unsigned, and so double its usable range.
> +        // If the widths are the same then casting to unsigned and
> +        // then to long is a no-op.
> +        using _Up = typename make_unsigned<_Atomic_word>::type;
> +
>          // No memory barrier is used here so there is no synchronization
>          // with other threads.
> -        return __atomic_load_n(&_M_use_count, __ATOMIC_RELAXED);
> +        return (_Up) __atomic_load_n(&_M_use_count, __ATOMIC_RELAXED);
>        }
>
>      private:
>        _Sp_counted_base(_Sp_counted_base const&) = delete;
>        _Sp_counted_base& operator=(_Sp_counted_base const&) = delete;
>
> +      // Called when incrementing _M_use_count to cause a trap on
> +      // overflow. This should be passed the value of the counter
> +      // before the increment.
> +      static void
> +      _S_chk(_Atomic_word __count)
> +      {
> +        // __max is the maximum allowed value for the shared reference
> +        // count. All valid reference count values need to fit into
> +        // [0,LONG_MAX) because users can observe the count via
> +        // shared_ptr::use_count().
> +        //
> +        // When long is wider than _Atomic_word, _M_use_count can go
> +        // negative and the cast in _Sp_counted_base::_M_get_use_count()
> +        // will turn it into a positive value suitable for returning to
> +        // users. The implementation only cares whether _M_use_count
> +        // reaches zero after a decrement, so negative values are not a
> +        // problem internally. So when possible, use -1 for __max
> +        // (incrementing past that would overflow _M_use_count to 0,
> +        // which means an empty shared_ptr).
> +        //
> +        // When long is not wider than _Atomic_word, negative counts
> +        // would not fit in [0,LONG_MAX) after casting to unsigned, so
> +        // use_count() would return invalid negative values, which is
> +        // not allowed. So __max is just the type's maximum positive
> +        // value.
> +        //
> +        // The _S_single policy cannot use negative counts, because it
> +        // uses non-atomic increments with undefined behaviour on
> +        // signed overflow.
> +        constexpr _Atomic_word __max
> +          = sizeof(long) > sizeof(_Atomic_word) && _Lp != _S_single
> +          ? -1 : __gnu_cxx::__int_traits<_Atomic_word>::__max;
> +
> +        if (__count == __max) [[__unlikely__]]
> +          __builtin_trap();
> +      }
> +
>        _Atomic_word  _M_use_count;     // #shared
>        _Atomic_word  _M_weak_count;    // #weak + (#shared != 0)
>      };
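For what it's worth, the unsigned-cast trick in _M_get_use_count()
above boils down to this (a standalone illustration, assuming the
common case of a 32-bit _Atomic_word and a 64-bit long):

    int __count = -2;                   // counter wrapped past INT_MAX
    unsigned __u = (unsigned) __count;  // 4294967294
    long __use = __u;                   // still 4294967294 as a long

so a counter that has wrapped to a negative value still reports a
sensible positive use count.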
> @@ -244,6 +289,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      {
>        if (_M_use_count == 0)
>          return false;
> +      _S_chk(_M_use_count);
>        ++_M_use_count;
>        return true;
>      }
> @@ -254,8 +300,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      _M_add_ref_lock_nothrow() noexcept
>      {
>        __gnu_cxx::__scoped_lock sentry(*this);
> -      if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1) == 0)
> +      if (auto __c = __gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1))
> +        _S_chk(__c);
> +      else
>          {
> +          // Count was zero, so we cannot lock it to get a shared_ptr.
> +          // Reset to zero. This isn't racy, because there are no
> +          // shared_ptr objects using this count and any other weak_ptr
> +          // objects using it must call this function to modify
> +          // _M_use_count, so would be synchronized by the mutex.
>            _M_use_count = 0;
>            return false;
>          }
> @@ -279,6 +332,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        while (!__atomic_compare_exchange_n(&_M_use_count, &__count, __count + 1,
>                                            true, __ATOMIC_ACQ_REL,
>                                            __ATOMIC_RELAXED));
> +      _S_chk(__count);
>        return true;
>      }
>
> --
> 2.50.1
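On the [[__unlikely__]] question above: the alternative would
presumably look something like this inside _S_chk (a sketch only; the
1e-9 probability is an arbitrary stand-in for "very, very small", and
whether it actually beats [[__unlikely__]] in codegen needs measuring):

    if (__builtin_expect_with_probability(__count == __max, true, 1e-9))
      __builtin_trap();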