On Mon, Jul 28, 2025 at 10:13 AM Luc Grosheintz <luc.groshei...@gmail.com>
wrote:

>
>
> On 7/28/25 08:32, Tomasz Kaminski wrote:
> > In the __fwd_prod, __rev_prod I would also add the case for all-dynamic
> > extents, so we do not instantiate an array with all 1, i.e. we would
> have:
> >   constexpr size_t __rank = _Extents::rank();
> > +       size_t __sta_prod = _RevProd<_Extents>::_S_value[__r];
> > +       return __extents_prod(__exts, __sta_prod, __r + 1, __rank);
> > if constexpr (__rank == 1)
> >     return 1;
> > else if constexpr (__rank == 2)
> >    return __i == 0 ? _M_extents.extent(1) : 1;
> > else if constexpr (Extents::rank_dynamic() == __rank)
> >    return __extents_prod(__exts, __1u, __r + 1, __rank);
> > else
> >    {
> >       size_t __sta_prod = _RevProd<_Extents>::_S_value[__r];
> >       return __extents_prod(__exts, __sta_prod, __r + 1, __rank);
> >    }
>
> True, purely dynamic extents, is another interesting case to optimize,
> I suspect there's more potential in that case. For example the
> indirection E[i] := D[k[i]] is not needed because k[i] == i.
>
I am not concerned about  optimization that much, just in avoiding creating
storage for arrays with all 1. This two suggestion are independent, so I
will
still add:
if constexpr (Extents::rank_dynamic() == __rank)
   return __extents_prod(__exts, __1u, __r + 1, __rank);
else
  {
     size_t __sta_prod = _RevProd<_Extents>::_S_value[__r];
     return __extents_prod(__exts, __sta_prod, __r + 1, __rank);
   }
// So the __RevProd is no longer instantiated.

>
> Therefore, since the series already had several commits, I chose
> to leave it for later.
>
> >
> >
> > On Mon, Jul 28, 2025 at 8:02 AM Tomasz Kaminski <tkami...@redhat.com>
> wrote:
> >
> >>
> >>
> >> On Sun, Jul 27, 2025 at 2:47 PM Luc Grosheintz <
> luc.groshei...@gmail.com>
> >> wrote:
> >>
> >>> The methods layout_{left,right}::mapping::stride are defined
> >>> as
> >>>
> >>>    \prod_{i = 0}^r E[i]
> >>>    \prod_{i = r+1}^n E[i]
> >>>
> >>> This is computed as the product of a pre-comupted static product and
> the
> >>> product of the required dynamic extents.
> >>>
> >>> Disassembly shows that even for low-rank extents, i.e. rank == 1 and
> >>> rank == 2, with at least one  dynamic extent, the generated code loads
> >>> two values; and then runs the loop over at most one element, e.g.
> >>>
> >>>   220:  48 8b 0c f5 00 00 00   mov    rcx,QWORD PTR [rsi*8+0x0]
> >>>   227:  00
> >>>   228:  48 8b 04 f5 00 00 00   mov    rax,QWORD PTR [rsi*8+0x0]
> >>>   22f:  00
> >>>   230:  48 c1 e1 02            shl    rcx,0x2
> >>>   234:  74 1a                  je     250 <stride_left_d5+0x30>
> >>>   236:  48 01 f9               add    rcx,rdi
> >>>   239:  0f 1f 80 00 00 00 00   nop    DWORD PTR [rax+0x0]
> >>>   240:  48 63 17               movsxd rdx,DWORD PTR [rdi]
> >>>   243:  48 83 c7 04            add    rdi,0x4
> >>>   247:  48 0f af c2            imul   rax,rdx
> >>>   24b:  48 39 f9               cmp    rcx,rdi
> >>>   24e:  75 f0                  jne    240 <stride_left_d5+0x20>
> >>>   250:  c3                     ret
> >>>
> >>> If there's no dynamic extents, it simply loads the precomputed product
> >>> of static extents.
> >>>
> >>> For rank == 1 the answer is constant `1`; for rank == 2 it's either 1
> or
> >>> extents.extent(k), with k == 0 for layout_left and k == 1 for
> >>> layout_right.
> >>>
> >>> Consider,
> >>>
> >>>    using Ed = std::extents<int, dyn>;
> >>>    int stride_left_d(const std::layout_left::mapping<Ed>& m, size_t r)
> >>>    { return m.stride(r); }
> >>>
> >>>    using E3d = std::extents<int, 3, dyn>;
> >>>    int stride_left_3d(const std::layout_left::mapping<E3d>& m, size_t
> r)
> >>>    { return m.stride(r); }
> >>>
> >>>    using Ed5 = std::extents<int, dyn, 5>;
> >>>    int stride_left_d5(const std::layout_left::mapping<Ed5>& m, size_t
> r)
> >>>    { return m.stride(r); }
> >>>
> >>> The optimized code for these three cases is:
> >>>
> >>>    0000000000000060 <stride_left_d>:
> >>>    60:  b8 01 00 00 00         mov    eax,0x1
> >>>    65:  c3                     ret
> >>>
> >>>    0000000000000090 <stride_left_3d>:
> >>>    90:  48 83 fe 01            cmp    rsi,0x1
> >>>    94:  19 c0                  sbb    eax,eax
> >>>    96:  83 e0 fe               and    eax,0xfffffffe
> >>>    99:  83 c0 03               add    eax,0x3
> >>>    9c:  c3                     ret
> >>>
> >>>    00000000000000a0 <stride_left_d5>:
> >>>    a0:  b8 01 00 00 00         mov    eax,0x1
> >>>    a5:  48 85 f6               test   rsi,rsi
> >>>    a8:  74 02                  je     ac <stride_left_d5+0xc>
> >>>    aa:  8b 07                  mov    eax,DWORD PTR [rdi]
> >>>    ac:  c3                     ret
> >>>
> >>> For rank == 1 it simply returns 1 (as expected). For rank == 2, it
> >>> either implements a branchless formula, or conditionally loads one
> >>> value. In all cases involving a dynamic extent this seems like it's
> >>> always doing clearly less work, both in terms of computation and loads.
> >>>
> >>> For rank == 2, it trades loading one value for a branchless sequence of
> >>> four instructions that don't require loading any values.
> >>>
> >> I will put this optimization into the __fwd_prod and __rev_pord
> functions,
> >> so it will be applied for all uses. This will also avoid us creating
> this
> >> caching
> >> tables for such small ranks.
> >>
> >>>
> >>> libstdc++-v3/ChangeLog:
> >>>
> >>>          * include/std/mdspan (layout_left::mapping::stride): Optimize
> >>>          for rank <= 2.
> >>>          (layout_right::mapping::stride): Ditto.
> >>>
> >>> Signed-off-by: Luc Grosheintz <luc.groshei...@gmail.com>
> >>> ---
> >>>   libstdc++-v3/include/std/mdspan | 14 ++++++++++++--
> >>>   1 file changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/libstdc++-v3/include/std/mdspan
> >>> b/libstdc++-v3/include/std/mdspan
> >>> index 06ccf3e3827..f288af96cdb 100644
> >>> --- a/libstdc++-v3/include/std/mdspan
> >>> +++ b/libstdc++-v3/include/std/mdspan
> >>> @@ -652,7 +652,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>         requires (extents_type::rank() > 0)
> >>>         {
> >>>          __glibcxx_assert(__i < extents_type::rank());
> >>> -       return __mdspan::__fwd_prod(_M_extents, __i);
> >>> +       if constexpr (extents_type::rank() == 1)
> >>> +         return 1;
> >>> +       else if constexpr (extents_type::rank() == 2)
> >>> +         return __i == 0 ? 1 : _M_extents.extent(0);
> >>> +       else
> >>> +         return __mdspan::__fwd_prod(_M_extents, __i);
> >>>         }
> >>>
> >>>         template<typename _OExtents>
> >>> @@ -797,7 +802,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >>>         requires (extents_type::rank() > 0)
> >>>         {
> >>>          __glibcxx_assert(__i < extents_type::rank());
> >>> -       return __mdspan::__rev_prod(_M_extents, __i);
> >>> +       if constexpr (extents_type::rank() == 1)
> >>> +         return 1;
> >>> +       else if constexpr (extents_type::rank() == 2)
> >>> +         return __i == 0 ? _M_extents.extent(1) : 1;
> >>> +       else
> >>> +         return __mdspan::__rev_prod(_M_extents, __i);
> >>>         }
> >>>
> >>>         template<typename _OExtents>
> >>> --
> >>> 2.50.0
> >>>
> >>>
> >
>
>

Reply via email to