On Fri, 18 Jul 2025 at 19:03, Jonathan Wakely <jwak...@redhat.com> wrote:
>
> This adds a check when incrementing the shared count and the weak
> count, and will trap if either overflows. This also doubles the
> effective range of the counts for most 64-bit targets.
>
> The counter type, _Atomic_word, is usually a signed 32-bit int (except
> on Solaris v9 where it is a signed 64-bit long). The return type of
> std::shared_ptr::use_count() is long. For targets where long is wider
> than _Atomic_word (most 64-bit targets) we can treat the _Atomic_word
> reference counts as unsigned and allow them to wrap around from their
> most positive value to their most negative value without any problems.
> The logic that operates on the counts only cares if they are zero or
> non-zero, and never performs relational comparisons. The atomic
> fetch_add operations on integers are required by the standard to behave
> like unsigned types, so that overflow is well-defined:
>
>   "the result is as if the object value and parameters were converted to
>   their corresponding unsigned types, the computation performed on those
>   types, and the result converted back to the signed type."
>
> So if we allow the counts to wrap around to negative values, all we need
> to do is cast the value to make_unsigned_t<_Atomic_word> before
> returning it as long from the use_count() function.
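>
> To illustrate (a simplified sketch, not the actual library code; the
> typedef and the helper name are mine), the cast amounts to:
>
>   #include <type_traits>
>
>   using _Atomic_word = int;  // typical definition on most targets
>
>   // Return the possibly-wrapped counter as a non-negative long.
>   long as_use_count(_Atomic_word count)
>   {
>     // A counter that wrapped to -1 is reported as 4294967295,
>     // which fits in a 64-bit long.
>     return static_cast<std::make_unsigned_t<_Atomic_word>>(count);
>   }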
>
> In practice even exceeding INT_MAX is extremely unlikely, as it would
> require billions of shared_ptr or weak_ptr objects to have been
> constructed and never destroyed. However, if that happens we now have
> twice the range before the count returns to zero and causes problems.
>
> This change also adds checks to the increment operations so that they
> now trap instead of incrementing a counter past its maximum. The
> maximum is the value at which one more increment would produce an
> invalid use_count(): either the most positive value of _Atomic_word,
> or, for targets where we now allow the counters to wrap around to
> negative values, -1, because incrementing -1 overflows the counter to
> zero.
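>
> Roughly, the check before each increment amounts to this (a sketch of
> the idea, not the exact code in _S_chk):
>
>   // __max is the most positive _Atomic_word value, or -1 on targets
>   // where the counter is allowed to wrap to negative values.
>   if (__count == __max) [[__unlikely__]]
>     __builtin_trap();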
>
> libstdc++-v3/ChangeLog:
>
>         PR libstdc++/71945
>         * include/bits/shared_ptr_base.h (_Sp_counted_base::_S_chk):
>         Trap if a reference count cannot be incremented any higher.
>         (_Sp_counted_base::_M_add_ref_copy): Use _S_chk.
>         (_Sp_counted_base::_M_add_weak_ref): Likewise.
>         (_Sp_counted_base<...>::_M_add_ref_lock_nothrow): Likewise.
>         (_Sp_counted_base::_M_get_use_count): Cast _M_use_count to unsigned
>         before returning as long.
> ---
>
> This improves safety for std::shared_ptr and std::weak_ptr, removing a
> possible source of undefined behaviour (although only in extreme cases
> where the reference count somehow exceeds INT_MAX).
>
> However, in a very simple benchmark I see a significant impact on
> performance, which is not entirely surprising given that this patch
> adds a test to every increment. Maybe we could change [[__unlikely__]]
> on those branches to use __builtin_expect_with_probability instead,
> with a very, very small probability?
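>
> Something like this (untested; GCC's builtin takes the expression,
> the expected value, and the probability that they compare equal):
>
>   if (__builtin_expect_with_probability(__count == __max, true, 1e-9))
>     __builtin_trap();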

I've tried some other benchmarks and I'm not seeing any difference
from the counter overflow checks.

The first benchmark I tried, where the checks did make it slower, was
simply timing the test for PR71945 (which I forgot to add to the
patch):

// { dg-do run { target c++11 } }
// { dg-require-effective-target int32plus }

#include <memory>
#include <climits>
#include <testsuite_hooks.h>

bool destroyed = false;

struct Obj
{
  ~Obj() { destroyed = true; }
};

int main()
{
  // A plain if rather than if constexpr, so the test compiles as C++11.
  if (sizeof(_Atomic_word) >= sizeof(long)
      || __gnu_cxx::__default_lock_policy == __gnu_cxx::_S_single)
    return 0;

  using Ptr = std::shared_ptr<Obj>;

  Ptr p(new Obj);
  alignas(Ptr) unsigned char buf[sizeof(Ptr)];
  long count = p.use_count();
  // Create copies of p (placement-new into buf, deliberately never
  // destroyed) until the use count reaches its maximum value.
  while (count < (long)(UINT_MAX))
  {
    new (buf) Ptr(p);
    ++count;
    VERIFY( p.use_count() == count );
  }
  p.reset();
  VERIFY( ! destroyed );
}


All this does is create billions of shared_ptr objects, increasing the
ref count up to its maximum value and asserting each time that
use_count() is correct. The total execution time of this test with the
overflow checks is about 60% slower than without them. But if I remove
the VERIFY from the loop (just checking once after the loop) then the
times with and without the checks are much closer: perhaps 0-5% slower
with the checks, which could just be noise.
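
For reference, the variant with the per-iteration check hoisted out of
the loop is just:

  while (count < (long)(UINT_MAX))
  {
    new (buf) Ptr(p);
    ++count;
  }
  VERIFY( p.use_count() == count );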

So maybe the overhead of the checks in real code isn't going to be significant.
