On Fri, Jul 18, 2025 at 8:03 PM Jonathan Wakely <jwak...@redhat.com> wrote:

> This adds a check when incrementing the shared count and the weak count,
> and will trap if either overflows. This also doubles the effective range
> of the counts for most 64-bit targets.
>
> The counter type, _Atomic_word, is usually a signed 32-bit int (except
> on Solaris v9 where it is a signed 64-bit long). The return type of
> std::shared_ptr::use_count() is long. For targets where long is wider
> than _Atomic_word (most 64-bit targets) we can treat the _Atomic_word
> reference counts as unsigned and allow them to wrap around from their
> most positive value to their most negative value without any problems.
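
Not part of the patch, but to illustrate the round-trip that makes this
work, assuming _Atomic_word is int on an LP64 target:

#include <cstdio>
#include <type_traits>

int main()
{
  int count = -2; // a counter that has wrapped past INT_MAX
  // The cast use_count() would do: the negative value becomes a large
  // positive one that still fits in a 64-bit long.
  long observed = static_cast<std::make_unsigned_t<int>>(count);
  std::printf("%ld\n", observed); // prints 4294967294 on LP64
}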
>
I wonder whether we could return std::numeric_limits<long>::max() from
use_count() if the make_unsigned_t<_Atomic_word> value does not fit in a
long? The reported value would then stop increasing, but the only value
that is really usable is zero anyway.
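
Something like this untested sketch, with Word standing in for
_Atomic_word on a target where long is not wider (the helper name is
illustrative only):

#include <limits>
#include <type_traits>

using Word = long; // stand-in for _Atomic_word where long is not wider

long use_count_saturating(const Word& count) noexcept
{
  using U = std::make_unsigned_t<Word>;
  U v = (U) __atomic_load_n(&count, __ATOMIC_RELAXED);
  constexpr U lmax = (U) std::numeric_limits<long>::max();
  // Saturate instead of letting the unsigned value convert back to a
  // negative long.
  return v > lmax ? std::numeric_limits<long>::max() : (long) v;
}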

> The logic that operates on the counts only cares if they are zero or
> non-zero, and never performs relational comparisons. The atomic
> fetch_add operations on integers are required by the standard to behave
> like unsigned types, so that overflow is well-defined:
>
>   "the result is as if the object value and parameters were converted to
>   their corresponding unsigned types, the computation performed on those
>   types, and the result converted back to the signed type."
>
> So if we allow the counts to wrap around to negative values, all we need
> to do is cast the value to make_unsigned_t<_Atomic_word> before
> returning it as long from the use_count() function.
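
A minimal standalone demonstration of that guarantee with the GCC
builtins (not from the patch):

#include <climits>
#include <cstdio>

int main()
{
  int count = INT_MAX;
  // Well-defined: the addition behaves as if done on unsigned int, so
  // the counter wraps from INT_MAX to INT_MIN instead of invoking UB.
  int old = __atomic_fetch_add(&count, 1, __ATOMIC_RELAXED);
  std::printf("%d -> %d\n", old, count); // 2147483647 -> -2147483648
}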
>
> In practice even exceeding INT_MAX is extremely unlikely, as it would
> require billions of shared_ptr or weak_ptr objects to have been
> constructed and never destroyed. However, if that happens we now have
> twice the range before the count returns to zero and causes problems.
>
> This change also adds checks to the increment operations, and will now
> trap if a counter would be incremented past its maximum value. The
> maximum is the value at which one more increment produces an invalid
> use_count(). So that is either the maximum positive value of
> _Atomic_word, or, for targets where we now allow the counters to wrap
> around to negative values, -1, because that is the value at which one
> more increment overflows the counter to zero.
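
Restating that selection as standalone code (illustrative only,
assuming _Atomic_word is int and a policy other than _S_single):

#include <limits>

constexpr int max_count =
    sizeof(long) > sizeof(int)
      ? -1 // one more increment would wrap the counter to 0
      : std::numeric_limits<int>::max(); // count must stay non-negative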
>
> libstdc++-v3/ChangeLog:
>
>         PR libstdc++/71945
>         * include/bits/shared_ptr_base.h (_Sp_counted_base::_S_chk):
>         Trap if a reference count cannot be incremented any higher.
>         (_Sp_counted_base::_M_add_ref_copy): Use _S_chk.
>         (_Sp_counted_base::_M_weak_add_ref): Likewise.
>         (_Sp_counted_base<...>::_M_add_ref_lock_nothrow): Likewise.
>         (_Sp_counted_base::_M_get_use_count): Cast _M_use_count to unsigned
>         before returning as long.
> ---
>
> This improves safety for std::shared_ptr and std::weak_ptr, removing a
> possible source of undefined behaviour (although only in extreme cases
> where the reference count somehow exceeds INT_MAX).
>
> However, in a very simple benchmark I see a significant performance
> impact, which is not entirely surprising given that this patch adds a
> test to every increment. Maybe we could change [[__unlikely__]] on those
> branches to use __builtin_expect_with_probability instead, with a very,
> very small probability?
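
Untested idea of what that could look like, with an arbitrarily chosen
probability value:

static void
check(int count, int max)
{
  // Tell the compiler the condition is true with vanishingly small
  // probability, rather than merely "unlikely".
  if (__builtin_expect_with_probability(count == max, 1, 1e-12))
    __builtin_trap();
}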
>
> I like the idea of trapping on these overflows, but we need to benchmark
> carefully and decide if it's worth the overhead, or if the overhead can
> be reduced. Maybe this patch should be split into two, one which adds
> the unsigned cast in use_count(), which doubles the usable range of
> reference counts without affecting codegen, and a second patch which
> adds the overflow detection and the trap.
>
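(For reference, a minimal benchmark along these lines might look like
the following; this is my sketch, not necessarily the one behind the
numbers mentioned above:)

#include <chrono>
#include <cstdio>
#include <memory>

int main()
{
  auto sp = std::make_shared<int>(0);
  auto t0 = std::chrono::steady_clock::now();
  for (int i = 0; i < 100000000; ++i)
    {
      auto copy = sp; // one atomic increment and one decrement
      asm volatile ("" :: "g" (copy.get())); // keep the copy observable
    }
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - t0).count();
  std::printf("%lld ms\n", (long long) ms);
}
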
> Either way (as the complete patch, or just casting to unsigned so that
> negative values can be used), it depends on the PR 121148 fix to avoid
> UB from signed overflow in the atomics.
>
> Tested x86_64-linux.
>
>  libstdc++-v3/include/bits/shared_ptr_base.h | 62 +++++++++++++++++++--
>  1 file changed, 58 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
> index fb868e7afc36..ccc7cef3cb5d 100644
> --- a/libstdc++-v3/include/bits/shared_ptr_base.h
> +++ b/libstdc++-v3/include/bits/shared_ptr_base.h
> @@ -148,7 +148,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        // Increment the use count (used when the count is greater than zero).
>        void
>        _M_add_ref_copy()
> -      { __gnu_cxx::__atomic_add_dispatch(&_M_use_count, 1); }
> +      { _S_chk(__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1)); }
>
>        // Increment the use count if it is non-zero, throw otherwise.
>        void
> @@ -200,7 +200,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        // Increment the weak count.
>        void
>        _M_weak_add_ref() noexcept
> -      { __gnu_cxx::__atomic_add_dispatch(&_M_weak_count, 1); }
> +      {
> +       // _M_weak_count can always use negative values (except for _S_single)
> +       // because only _M_use_count can be observed. See _S_chk for details.
> +       constexpr _Atomic_word __max = _Lp != _S_single
> +             ? -1 : __gnu_cxx::__int_traits<_Atomic_word>::__max;
> +
> +       if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count, 1) == __max)
> +         [[__unlikely__]] __builtin_trap();
> +      }
>
>        // Decrement the weak count.
>        void
> @@ -224,15 +232,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        long
>        _M_get_use_count() const noexcept
>        {
> +       // If long is wider than _Atomic_word then we can treat _Atomic_word
> +       // as unsigned, and so double its usable range. If the widths are the
> +       // same then casting to unsigned and then to long is a no-op.
> +       using _Up = typename make_unsigned<_Atomic_word>::type;
> +
>          // No memory barrier is used here so there is no synchronization
>          // with other threads.
> -        return __atomic_load_n(&_M_use_count, __ATOMIC_RELAXED);
> +       return (_Up) __atomic_load_n(&_M_use_count, __ATOMIC_RELAXED);
>        }
>
>      private:
>        _Sp_counted_base(_Sp_counted_base const&) = delete;
>        _Sp_counted_base& operator=(_Sp_counted_base const&) = delete;
>
> +      // Called when incrementing _M_use_count to cause a trap on overflow.
> +      // This should be passed the value of the counter before the increment.
> +      static void
> +      _S_chk(_Atomic_word __count)
> +      {
> +       // __max is the maximum allowed value for the shared reference count.
> +       // All valid reference count values need to fit into [0,LONG_MAX)
> +       // because users can observe the count via shared_ptr::use_count().
> +       //
> +       // When long is wider than _Atomic_word, _M_use_count can go negative
> +       // and the cast in _Sp_counted_base::_M_get_use_count() turns it into
> +       // a positive value suitable for returning to users. The implementation
> +       // only cares whether _M_use_count reaches zero after a decrement,
> +       // so negative values are not a problem internally.
> +       // So when possible, use -1 for __max (incrementing past that would
> +       // overflow _M_use_count to 0, which means an empty shared_ptr).
> +       //
> +       // When long is not wider than _Atomic_word, negative counts would
> +       // not fit in [0,LONG_MAX) after casting to unsigned, so use_count()
> +       // would return invalid negative values, which is not allowed.
> +       // So __max is just the type's maximum positive value.
> +       //
> +       // The _S_single policy cannot use negative counts, because it uses
> +       // non-atomic increments with undefined behaviour on signed
> overflow.
> +       constexpr _Atomic_word __max
> +         = sizeof(long) > sizeof(_Atomic_word) && _Lp != _S_single
> +             ? -1 : __gnu_cxx::__int_traits<_Atomic_word>::__max;
> +
> +       if (__count == __max) [[__unlikely__]]
> +         __builtin_trap();
> +      }
> +
>        _Atomic_word  _M_use_count;     // #shared
>        _Atomic_word  _M_weak_count;    // #weak + (#shared != 0)
>      };
> @@ -244,6 +289,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      {
>        if (_M_use_count == 0)
>         return false;
> +      _S_chk(_M_use_count);
>        ++_M_use_count;
>        return true;
>      }
> @@ -254,8 +300,15 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>      _M_add_ref_lock_nothrow() noexcept
>      {
>        __gnu_cxx::__scoped_lock sentry(*this);
> -      if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1) == 0)
> +      if (auto __c = __gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1))
> +       _S_chk(__c);
> +      else
>         {
> +         // Count was zero, so we cannot lock it to get a shared_ptr.
> +         // Reset to zero. This isn't racy, because there are no shared_ptr
> +         // objects using this count and any other weak_ptr objects using it
> +         // must call this function to modify _M_use_count, so would be
> +         // synchronized by the mutex.
>           _M_use_count = 0;
>           return false;
>         }
> @@ -279,6 +332,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>        while (!__atomic_compare_exchange_n(&_M_use_count, &__count, __count + 1,
>                                           true, __ATOMIC_ACQ_REL,
>                                           __ATOMIC_RELAXED));
> +      _S_chk(__count);
>        return true;
>      }
>
> --
> 2.50.1
>
>
