Re: [PATCH v2] libstdc++: Improve std::lock algorithm

Matthias Kretz Tue, 22 Jun 2021 06:26:13 -0700

On Tuesday, 22 June 2021 14:51:26 CEST Jonathan Wakely wrote:
> With your suggestion
> to also drop std::tuple the number of parameters decides which
> function we call. And we don't instantiate std::tuple. And we can also
> get rid of the __try_to_lock function, which was only used to deduce
> the lock type rather than use tuple_element to get it. That's much
> nicer.


👍

> > How about optimizing a likely common case where all lockables have the
> > same
> > type? In that case we don't require recursion and can manage stack usage
> > much
> > simpler:
> The stack usage is bounded by the number of mutexes being locked,
> which is unlikely to get large, but we can do that.

Right. I meant simpler, because it takes a while of staring at the recursive 
implementation to understand how it works. :)

> We can do it for try_lock too:
> 
>   template<typename _L1, typename _L2, typename... _L3>
>     int
>     try_lock(_L1& __l1, _L2& __l2, _L3&... __l3)
>     {
> #if __cplusplus >= 201703L
>       if constexpr (is_same_v<_L1, _L2>
>                     && (is_same_v<_L1, _L3> && ...))
>         {
>           constexpr int _Np = 2 + sizeof...(_L3);
>           unique_lock<_L1> __locks[_Np] = {
>               {__l1, try_to_lock}, {__l2, try_to_lock}, {__l3,
> try_to_lock}... };

This does a try_lock on all lockabes even if any of them fails. I think that's 
not only more expensive but also non-conforming. I think you need to defer 
locking and then loop from beginning to end to break the loop on the first 
unsuccessful try_lock.

>           for (int __i = 0; __i < _Np; ++__i)
>             if (!__locks[__i])
>               return __i;
>           for (auto& __l : __locks)
>             __l.release();
>           return -1;
>         }
>       else
> #endif
>       return __detail::__try_lock_impl(__l1, __l2, __l3...);
>     }
> 
> > if constexpr ((is_same_v<_L0, _L1> && ...))
> > {
> > constexpr int _Np = 1 + sizeof...(_L1);
> > std::array<unique_lock<_L0>, _Np> __locks = {
> > {__l0, defer_lock}, {__l1, defer_lock}...
> > };
> > int __first = 0;
> > do {
> > __locks[__first].lock();
> > for (int __j = 1; __j < _Np; ++__j)
> > {
> > const int __idx = (__first + __j) % _Np;
> > if (!__locks[__idx].try_lock())
> > {
> > for (int __k = __idx; __k != __first;
> > __k = __k == 1 ? _Np : __k - 1)
> > __locks[__k - 1].unlock();
> 
> This loop doesn't work if any try_lock fails when first==0, because
> the loop termination condition is never reached.

Uh yes. Which is the same reason why the __j loop doesn't start from __first + 
1.

> I find this a bit easier to understand than the loop above, and
> correct (I think):
> 
>   for (int __k = __j; __k != 0; --__k)
>     __locks[(__first + __k - 1) % _Np].unlock();

Yes, if only we had a wrapping integer type that wraps at an arbitrary N. Like 
unsigned int but with parameter, like:

  for (__wrapping_uint<_Np> __k = __idx; __k != __first; --__k)
    __locks[__k - 1].unlock();

This is the loop I wanted to write, except --__k is simpler to write and __k - 
1 would also wrap around to _Np - 1 for __k == 0. But if this is the only 
place it's not important enough to abstract.

> [...]
> @@ -620,15 +632,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>     *  @post All arguments are locked.
>     *
>     *  All arguments are locked via a sequence of calls to lock(),
> try_lock() -   *  and unlock().  If the call exits via an exception any
> locks that were -   *  obtained will be released.
> +   *  and unlock().  If this function exits via an exception any locks that
> +   *  were obtained will be released.
>     */
>    template<typename _L1, typename _L2, typename... _L3>
>      void
>      lock(_L1& __l1, _L2& __l2, _L3&... __l3)
>      {
> -      int __i = 0;
> -      __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> +#if __cplusplus >= 201703L

I also considered moving it down here. Makes sense unless you want to call 
__detail::__lock_impl from other functions. And if we want to make it work for 
pre-C++11 we could do

  using __homogeneous
    = __and_<is_same<_L1, _L2>, is_same<_L1, _L3>...>;
  int __i = 0;
  __detail::__lock_impl(__homogeneous(), __i, 0, __l1, __l2, __l3...);


-Matthias

> +      if constexpr (is_same_v<_L1, _L2> && (is_same_v<_L1, _L3> && ...))
> +       {
> +         constexpr int _Np = 2 + sizeof...(_L3);
> +         unique_lock<_L1> __locks[] = {
> +             {__l1, defer_lock}, {__l2, defer_lock}, {__l3, defer_lock}...
> +         };
> +         int __first = 0;
> +         do {
> +           __locks[__first].lock();
> +           for (int __j = 1; __j < _Np; ++__j)
> +             {
> +               const int __idx = (__first + __j) % _Np;
> +               if (!__locks[__idx].try_lock())
> +                 {
> +                   for (int __k = __j; __k != 0; --__k)
> +                     __locks[(__first + __k - 1) % _Np].unlock();
> +                   __first = __idx;
> +                   break;
> +                 }
> +             }
> +         } while (!__locks[__first]);
> +
> +         for (auto& __l : __locks)
> +           __l.release();
> +       }
> +      else
> +#endif
> +       {
> +         int __i = 0;
> +         __detail::__lock_impl(__i, 0, __l1, __l2, __l3...);
> +       }
>      }
> 
>  #if __cplusplus >= 201703L


-- 
──────────────────────────────────────────────────────────────────────────
 Dr. Matthias Kretz                           https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research               https://gsi.de
 std::experimental::simd              https://github.com/VcDevel/std-simd
──────────────────────────────────────────────────────────────────────────

Re: [PATCH v2] libstdc++: Improve std::lock algorithm

Reply via email to