On Tue, Jan 27, 2026 at 7:55 AM Jonathan Wakely <[email protected]> wrote:
>
> On Tue, 27 Jan 2026 at 07:29, Jay Feldblum <[email protected]> wrote:
> >
> > Only some operations on std::shared_ptr and std::weak_ptr must
> > synchronize with each other, namely, destructions and calls to member
> > reset(). In the atomic implementation, synchronization is accomplished
> > with release/acquire memory-ordering operations on refcount decrements.
> > Copies, which involve refcount increments, do not synchronize.
> >
> > On TSO architectures like x86-64, there is little observable difference.
> > But on architectures with weaker memory models like aarch64, the
> > compiler may emit faster instruction sequences for relaxed atomic
> > increments than for ones carrying acquire/release ordering.
>
> Thanks for working on this. I think it's a useful optimization,
> although it's too late to make the cut for GCC 16 now.

C'est la vie.
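
As an aside, for anyone following along: here is a minimal standalone
sketch of the ordering rules the commit message describes, written with
plain std::atomic rather than the libstdc++ internals (the names are
mine):

  #include <atomic>

  std::atomic<int> use_count{1}; // stand-in for _M_use_count

  void add_ref_copy()
  {
    // A copy publishes nothing new: the object was already visible
    // to this thread, so the increment needs no ordering. On aarch64
    // with LSE this can compile to a plain ldadd rather than the
    // ldaddal that acq_rel requires.
    use_count.fetch_add(1, std::memory_order_relaxed);
  }

  void release_ref()
  {
    // Decrements keep acq_rel so that the last owner's destruction
    // happens-after every other owner's final use of the object.
    if (use_count.fetch_sub(1, std::memory_order_acq_rel) == 1)
      { /* destroy the managed object */ }
  }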

>
> >
> >         PR libstdc++/111589
> >
> > libstdc++-v3/ChangeLog:
> >
> >         PR libstdc++/111589
> >         * include/bits/shared_ptr_base.h (_Sp_counted_base::_M_add_ref_copy):
> >         Use __exchange_and_add_relaxed_dispatch instead of
> >         __exchange_and_add_dispatch.
> >         (_Sp_counted_base::_M_weak_add_ref): Likewise.
> >         (_Sp_counted_base::_M_add_ref_lock_nothrow): Use __ATOMIC_RELAXED
> >         instead of __ATOMIC_ACQ_REL for the success ordering.
> >         * include/ext/atomicity.h (__exchange_and_add_relaxed): New function.
> >         (__atomic_add_relaxed): New function.
> >         (__exchange_and_add_relaxed_dispatch): New function.
> >         (__atomic_add_relaxed_dispatch): New function.
> > ---
> >  libstdc++-v3/include/bits/shared_ptr_base.h | 11 ++++--
> >  libstdc++-v3/include/ext/atomicity.h        | 43 +++++++++++++++++++++
> >  2 files changed, 51 insertions(+), 3 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h
> > index 88e0f4d58c6..1aa0d4faa85 100644
> > --- a/libstdc++-v3/include/bits/shared_ptr_base.h
> > +++ b/libstdc++-v3/include/bits/shared_ptr_base.h
> > @@ -148,7 +148,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >        // Increment the use count (used when the count is greater than zero).
> >        void
> >        _M_add_ref_copy()
> > -      { _S_chk(__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, 1)); }
> > +      {
> > +       _S_chk(__gnu_cxx::__exchange_and_add_relaxed_dispatch(&_M_use_count,
> > +                                                             1));
> > +      }
> >
> >        // Increment the use count if it is non-zero, throw otherwise.
> >        void
> > @@ -204,7 +207,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >         // _M_weak_count can always use negative values because it cannot be
> >         // observed by users (unlike _M_use_count). See _S_chk for details.
> >         constexpr _Atomic_word __max = -1;
> > -       if (__gnu_cxx::__exchange_and_add_dispatch(&_M_weak_count, 1) == __max)
> > +       auto __r
> > +         = __gnu_cxx::__exchange_and_add_relaxed_dispatch(&_M_weak_count, 1);
> > +       if (__r == __max)
> >           [[__unlikely__]] __builtin_trap();
> >        }
> >
> > @@ -358,7 +363,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >           // long as it's not changed meanwhile.
> >         }
> >       while (!__atomic_compare_exchange_n(&_M_use_count, &__count, __count + 1,
> > -                                         true, __ATOMIC_ACQ_REL,
> > +                                         true, __ATOMIC_RELAXED,
>
> Can this really be relaxed?
>
> I guess it already has no ordering guarantees for the case where we
> return false, and when we successfully increment a non-zero count this
> is equivalent to _M_add_ref_copy(), so relaxed must be fine. OK, I'm
> happy.
>

That's how I see it too.
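
Spelled out with plain std::atomic rather than the libstdc++ internals
(names are mine), the pattern we are both describing is roughly:

  #include <atomic>

  std::atomic<int> use_count{0}; // stand-in for _M_use_count

  bool try_add_ref()
  {
    int count = use_count.load(std::memory_order_relaxed);
    do
      {
        if (count == 0)
          return false; // already expired; nothing to lock
      }
    // On success this is just a refcount increment, equivalent to
    // the plain copy case, so relaxed suffices; on failure we only
    // retry with the freshly loaded value, so relaxed is fine too.
    while (!use_count.compare_exchange_weak(count, count + 1,
                                            std::memory_order_relaxed,
                                            std::memory_order_relaxed));
    return true;
  }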

>
> >                                           __ATOMIC_RELAXED));
> >        _S_chk(__count);
> >        return true;
> > diff --git a/libstdc++-v3/include/ext/atomicity.h b/libstdc++-v3/include/ext/atomicity.h
> > index ecbe376a687..6ecda307990 100644
> > --- a/libstdc++-v3/include/ext/atomicity.h
> > +++ b/libstdc++-v3/include/ext/atomicity.h
> > @@ -74,12 +74,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >    __attribute__((__always_inline__))
> >    __atomic_add(volatile _Atomic_word* __mem, int __val)
> >    { __atomic_fetch_add(__mem, __val, __ATOMIC_ACQ_REL); }
> > +
> > +  inline _Atomic_word
> > +  __attribute__((__always_inline__))
> > +  __exchange_and_add_relaxed(volatile _Atomic_word* __mem, int __val)
> > +  { return __atomic_fetch_add(__mem, __val, __ATOMIC_RELAXED); }
> > +
> > +  inline void
> > +  __attribute__((__always_inline__))
> > +  __atomic_add_relaxed(volatile _Atomic_word* __mem, int __val)
> > +  { __atomic_fetch_add(__mem, __val, __ATOMIC_RELAXED); }
>
> Do we need __atomic_add_relaxed? You don't seem to use it above.
>
> >  #else // Defined in config/cpu/.../atomicity.h
> >    _Atomic_word
> >    __exchange_and_add(volatile _Atomic_word*, int) _GLIBCXX_NOTHROW;
> >
> >    void
> >    __atomic_add(volatile _Atomic_word*, int) _GLIBCXX_NOTHROW;
> > +
> > +  // Fallback to acq_rel when builtins are not available.
> > +  inline _Atomic_word
> > +  __attribute__((__always_inline__))
> > +  __exchange_and_add_relaxed(volatile _Atomic_word* __mem, int __val)
> > +  { return __exchange_and_add(__mem, __val); }
> > +
> > +  inline void
> > +  __attribute__((__always_inline__))
> > +  __atomic_add_relaxed(volatile _Atomic_word* __mem, int __val)
> > +  { __atomic_add(__mem, __val); }
> >  #endif
> >
> >  #if __cplusplus < 201103L
> > @@ -154,6 +175,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >        __atomic_add(__mem, __val);
> >    }
> >
> > +  // Relaxed variants for reference count increments where acquire/release
> > +  // semantics are not required. Decrements still need stronger ordering.
> > +  inline _Atomic_word
> > +  __attribute__ ((__always_inline__))
> > +  __exchange_and_add_relaxed_dispatch(_Atomic_word* __mem, int __val)
> > +  {
> > +    if (__is_single_threaded())
> > +      return __exchange_and_add_single(__mem, __val);
> > +    else
> > +      return __exchange_and_add_relaxed(__mem, __val);
> > +  }
>
> I wonder if we really need the new dispatch function, or should just
> use __exchange_and_add_relaxed directly in the shared_ptr code.
>
> The point of the dispatch functions is to use unsynchronized
> operations when we know the program is single-threaded, to avoid the
> additional overhead of atomics with acq_rel ordering. But for some
> targets, a relaxed atomic is almost as cheap as the non-atomic op
> anyway. At what point does the cost of checking __is_single_threaded()
> outweigh the cost of a relaxed RMW? It certainly increases code size.
>
> It depends on the processor, and for some arches I expect it's worth
> keeping the dispatch. But maybe not on all arches.

Interesting point. I took a look at __is_single_threaded and
__gthread_active_p. The cost of the dispatch looks low across all
implementations: a hot path of just an unordered load, compare, and
branch, plus a larger cold path and the single-threaded code. So there
is some performance overhead, but not much, along with some unnecessary
code size. But whether to remove the dispatch unconditionally, or under
what conditions to remove it, seems like a larger question than I can
answer in this patch.

Nevertheless, across all processors, this patch either reduces CPU cost
or leaves it unchanged, and it leaves code size unaffected (except that
it permits compiler optimizations, otherwise forbidden around
memory-ordering operations, that could in principle grow the code).

So this patch is an improvement over the current state, even if there is
room for further improvement later on by adjusting the dispatch.
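
For reference, the rough shape of the hot path I mean, as a simplified
model rather than the actual atomicity.h code (names are mine):

  #include <atomic>

  bool threads_active; // model of the flag __is_single_threaded reads

  inline int
  exchange_and_add_relaxed_dispatch_model(std::atomic<int>* mem, int val)
  {
    if (!threads_active)              // hot path: plain load + branch
      {
        // Single-threaded: no atomic RMW needed at all.
        int old = mem->load(std::memory_order_relaxed);
        mem->store(old + val, std::memory_order_relaxed);
        return old;
      }
    return mem->fetch_add(val, std::memory_order_relaxed);
  }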

>
>
> > +
> > +  inline void
> > +  __attribute__ ((__always_inline__))
> > +  __atomic_add_relaxed_dispatch(_Atomic_word* __mem, int __val)
> > +  {
> > +    if (__is_single_threaded())
> > +      __atomic_add_single(__mem, __val);
> > +    else
> > +      __atomic_add_relaxed(__mem, __val);
> > +  }
> > +
> >  _GLIBCXX_END_NAMESPACE_VERSION
> >  } // namespace
> >
> > --
> > 2.47.3
> >
>
