Re: [PATCH v2] libstdc++: Improve std::lock algorithm

Jonathan Wakely via Gcc-patches Tue, 22 Jun 2021 08:21:06 -0700

On Tue, 22 Jun 2021 at 14:21, Matthias Kretz wrote:

> On Tuesday, 22 June 2021 14:51:26 CEST Jonathan Wakely wrote:
> > With your suggestion
> > to also drop std::tuple the number of parameters decides which
> > function we call. And we don't instantiate std::tuple. And we can also
> > get rid of the __try_to_lock function, which was only used to deduce
> > the lock type rather than use tuple_element to get it. That's much
> > nicer.
>
> 👍
>
> > > How about optimizing a likely common case where all lockables have the
> > > same
> > > type? In that case we don't require recursion and can manage stack
> usage
> > > much
> > > simpler:
> > The stack usage is bounded by the number of mutexes being locked,
> > which is unlikely to get large, but we can do that.
>
> Right. I meant simpler, because it takes a while of staring at the
> recursive
> implementation to understand how it works. :)
>
> > We can do it for try_lock too:
> >
> >   template<typename _L1, typename _L2, typename... _L3>
> >     int
> >     try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> >     {
> > #if __cplusplus >= 201703L
> >       if constexpr (is_same_v<_L1, _L2>
> >                     && (is_same_v<_L1, _L3> && ...))
> >         {
> >           constexpr int _Np = 2 + sizeof...(_L3);
> >           unique_lock<_L1> __locks[_Np] = {
> >               {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3,
> > try_to_lock}... };
>
> This does a try_lock on all lockabes even if any of them fails. I think
> that's
> not only more expensive but also non-conforming. I think you need to defer
> locking and then loop from beginning to end to break the loop on the first
> unsuccessful try_lock.
>


Oops, good point. I'll add a test for that too. Here's the fixed code:

    template<typename _L0, typename... _Lockables>
      inline int
      __try_lock_impl(_L0& __l0, _Lockables&... __lockables)
      {
#if __cplusplus >= 201703L
        if constexpr ((is_same_v<_L0, _Lockables> && ...))
          {
            constexpr int _Np = 1 + sizeof...(_Lockables);
            unique_lock<_L0> __locks[_Np] = {
                {__l0, defer_lock}, {__lockables, defer_lock}...
            };
            for (int __i = 0; __i < _Np; ++__i)
              if (!__locks[__i].try_lock())
                {
                  const int __failed = __i;
                  while (__i--)
                    __locks[__i].unlock();
                  return __i;
                }
            for (auto& __l : __locks)
              __l.release();
            return -1;
          }
        else
#endif




> >           for (int __i = 0; __i < _Np; ++__i)
> >             if (!__locks[__i])
> >               return __i;
> >           for (auto& __l : __locks)
> >             __l.release();
> >           return -1;
> >         }
> >       else
> > #endif
> >       return __detail::__try_lock_impl(__l1, __l2, __l3...);
> >     }
> >
> > > if constexpr ((is_same_v<_L0, _L1> && ...))
> > > {
> > > constexpr int _Np = 1 + sizeof...(_L1);
> > > std::array<unique_lock<_L0>, _Np> __locks = {
> > > {__l0, defer_lock}, {__l1, defer_lock}...
> > > };
> > > int __first = 0;
> > > do {
> > > __locks[__first].lock();
> > > for (int __j = 1; __j < _Np; ++__j)
> > > {
> > > const int __idx = (__first + __j) % _Np;
> > > if (!__locks[__idx].try_lock())
> > > {
> > > for (int __k = __idx; __k != __first;
> > > __k = __k == 1 ? _Np : __k - 1)
> > > __locks[__k - 1].unlock();
> >
> > This loop doesn't work if any try_lock fails when first==0, because
> > the loop termination condition is never reached.
>
> Uh yes. Which is the same reason why the __j loop doesn't start from
> __first +
> 1.
>
> > I find this a bit easier to understand than the loop above, and
> > correct (I think):
> >
> >   for (int __k = __j; __k != 0; --__k)
> >     __locks[(__first + __k - 1) % _Np].unlock();
>
> Yes, if only we had a wrapping integer type that wraps at an arbitrary N.
> Like
> unsigned int but with parameter, like:
>
>   for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
>     __locks[__k - 1].unlock();
>
> This is the loop I wanted to write, except --__k is simpler to write and
> __k -
> 1 would also wrap around to _Np - 1 for __k == 0. But if this is the only
> place it's not important enough to abstract.
>

We might be able to use __wrapping_uint in std::seed_seq::generate too, and
maybe some other places in <random>. But we can add that later if we decide
it's worth it.


> > [...]
> > @@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >     *  @post All arguments are locked.
> >     *
> >     *  All arguments are locked via a sequence of calls to lock(),
> > try_lock() -   *  and unlock().  If the call exits via an exception any
> > locks that were -   *  obtained will be released.
> > +   *  and unlock().  If this function exits via an exception any locks
> that
> > +   *  were obtained will be released.
> >     */
> >    template<typename _L1, typename _L2, typename... _L3>
> >      void
> >      lock(_L1& __l1, _L2& __l2, _L3&... __l3)
> >      {
> > -      int __i = 0;
> > -      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> > +#if __cplusplus >= 201703L
>
> I also considered moving it down here. Makes sense unless you want to call
> __detail::__lock_impl from other functions. And if we want to make it work
> for
> pre-C++11 we could do
>
>   using __homogeneous
>     = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
>   int __i = 0;
>   __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);
>
>
We don't need tag dispatching, we could just do:

if _GLIBCXX17_CONSTEXPR (homogeneous::value)
 ...
else
 ...

because both branches are valid for the homogeneous case, i.e. we aren't
using if-constexpr to avoid invalid instantiations. So for pre-C++17 it
would still compile both branches (and instantiate the recursive function
templates) but would use the iterative code at run time.

But given that the default -std option is gnu++17 now, I'm OK with the
iterative version only being used for C++17.

Re: [PATCH v2] libstdc++: Improve std::lock algorithm

Reply via email to