On Mon, Jul 28, 2025 at 10:13 AM Luc Grosheintz <luc.groshei...@gmail.com> wrote:
> > > On 7/28/25 08:32, Tomasz Kaminski wrote: > > In the __fwd_prod, __rev_prod I would also add the case for all-dynamic > > extents, so we do not instantiate an array with all 1, i.e. we would > have: > > constexpr size_t __rank = _Extents::rank(); > > + size_t __sta_prod = _RevProd<_Extents>::_S_value[__r]; > > + return __extents_prod(__exts, __sta_prod, __r + 1, __rank); > > if constexpr (__rank == 1) > > return 1; > > else if constexpr (__rank == 2) > > return __i == 0 ? _M_extents.extent(1) : 1; > > else if constexpr (Extents::rank_dynamic() == __rank) > > return __extents_prod(__exts, __1u, __r + 1, __rank); > > else > > { > > size_t __sta_prod = _RevProd<_Extents>::_S_value[__r]; > > return __extents_prod(__exts, __sta_prod, __r + 1, __rank); > > } > > True, purely dynamic extents, is another interesting case to optimize, > I suspect there's more potential in that case. For example the > indirection E[i] := D[k[i]] is not needed because k[i] == i. > I am not concerned about optimization that much, just in avoiding creating storage for arrays with all 1. This two suggestion are independent, so I will still add: if constexpr (Extents::rank_dynamic() == __rank) return __extents_prod(__exts, __1u, __r + 1, __rank); else { size_t __sta_prod = _RevProd<_Extents>::_S_value[__r]; return __extents_prod(__exts, __sta_prod, __r + 1, __rank); } // So the __RevProd is no longer instantiated. > > Therefore, since the series already had several commits, I chose > to leave it for later. > > > > > > > On Mon, Jul 28, 2025 at 8:02 AM Tomasz Kaminski <tkami...@redhat.com> > wrote: > > > >> > >> > >> On Sun, Jul 27, 2025 at 2:47 PM Luc Grosheintz < > luc.groshei...@gmail.com> > >> wrote: > >> > >>> The methods layout_{left,right}::mapping::stride are defined > >>> as > >>> > >>> \prod_{i = 0}^r E[i] > >>> \prod_{i = r+1}^n E[i] > >>> > >>> This is computed as the product of a pre-comupted static product and > the > >>> product of the required dynamic extents. > >>> > >>> Disassembly shows that even for low-rank extents, i.e. rank == 1 and > >>> rank == 2, with at least one dynamic extent, the generated code loads > >>> two values; and then runs the loop over at most one element, e.g. > >>> > >>> 220: 48 8b 0c f5 00 00 00 mov rcx,QWORD PTR [rsi*8+0x0] > >>> 227: 00 > >>> 228: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0] > >>> 22f: 00 > >>> 230: 48 c1 e1 02 shl rcx,0x2 > >>> 234: 74 1a je 250 <stride_left_d5+0x30> > >>> 236: 48 01 f9 add rcx,rdi > >>> 239: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] > >>> 240: 48 63 17 movsxd rdx,DWORD PTR [rdi] > >>> 243: 48 83 c7 04 add rdi,0x4 > >>> 247: 48 0f af c2 imul rax,rdx > >>> 24b: 48 39 f9 cmp rcx,rdi > >>> 24e: 75 f0 jne 240 <stride_left_d5+0x20> > >>> 250: c3 ret > >>> > >>> If there's no dynamic extents, it simply loads the precomputed product > >>> of static extents. > >>> > >>> For rank == 1 the answer is constant `1`; for rank == 2 it's either 1 > or > >>> extents.extent(k), with k == 0 for layout_left and k == 1 for > >>> layout_right. > >>> > >>> Consider, > >>> > >>> using Ed = std::extents<int, dyn>; > >>> int stride_left_d(const std::layout_left::mapping<Ed>& m, size_t r) > >>> { return m.stride(r); } > >>> > >>> using E3d = std::extents<int, 3, dyn>; > >>> int stride_left_3d(const std::layout_left::mapping<E3d>& m, size_t > r) > >>> { return m.stride(r); } > >>> > >>> using Ed5 = std::extents<int, dyn, 5>; > >>> int stride_left_d5(const std::layout_left::mapping<Ed5>& m, size_t > r) > >>> { return m.stride(r); } > >>> > >>> The optimized code for these three cases is: > >>> > >>> 0000000000000060 <stride_left_d>: > >>> 60: b8 01 00 00 00 mov eax,0x1 > >>> 65: c3 ret > >>> > >>> 0000000000000090 <stride_left_3d>: > >>> 90: 48 83 fe 01 cmp rsi,0x1 > >>> 94: 19 c0 sbb eax,eax > >>> 96: 83 e0 fe and eax,0xfffffffe > >>> 99: 83 c0 03 add eax,0x3 > >>> 9c: c3 ret > >>> > >>> 00000000000000a0 <stride_left_d5>: > >>> a0: b8 01 00 00 00 mov eax,0x1 > >>> a5: 48 85 f6 test rsi,rsi > >>> a8: 74 02 je ac <stride_left_d5+0xc> > >>> aa: 8b 07 mov eax,DWORD PTR [rdi] > >>> ac: c3 ret > >>> > >>> For rank == 1 it simply returns 1 (as expected). For rank == 2, it > >>> either implements a branchless formula, or conditionally loads one > >>> value. In all cases involving a dynamic extent this seems like it's > >>> always doing clearly less work, both in terms of computation and loads. > >>> > >>> For rank == 2, it trades loading one value for a branchless sequence of > >>> four instructions that don't require loading any values. > >>> > >> I will put this optimization into the __fwd_prod and __rev_pord > functions, > >> so it will be applied for all uses. This will also avoid us creating > this > >> caching > >> tables for such small ranks. > >> > >>> > >>> libstdc++-v3/ChangeLog: > >>> > >>> * include/std/mdspan (layout_left::mapping::stride): Optimize > >>> for rank <= 2. > >>> (layout_right::mapping::stride): Ditto. > >>> > >>> Signed-off-by: Luc Grosheintz <luc.groshei...@gmail.com> > >>> --- > >>> libstdc++-v3/include/std/mdspan | 14 ++++++++++++-- > >>> 1 file changed, 12 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/libstdc++-v3/include/std/mdspan > >>> b/libstdc++-v3/include/std/mdspan > >>> index 06ccf3e3827..f288af96cdb 100644 > >>> --- a/libstdc++-v3/include/std/mdspan > >>> +++ b/libstdc++-v3/include/std/mdspan > >>> @@ -652,7 +652,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >>> requires (extents_type::rank() > 0) > >>> { > >>> __glibcxx_assert(__i < extents_type::rank()); > >>> - return __mdspan::__fwd_prod(_M_extents, __i); > >>> + if constexpr (extents_type::rank() == 1) > >>> + return 1; > >>> + else if constexpr (extents_type::rank() == 2) > >>> + return __i == 0 ? 1 : _M_extents.extent(0); > >>> + else > >>> + return __mdspan::__fwd_prod(_M_extents, __i); > >>> } > >>> > >>> template<typename _OExtents> > >>> @@ -797,7 +802,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > >>> requires (extents_type::rank() > 0) > >>> { > >>> __glibcxx_assert(__i < extents_type::rank()); > >>> - return __mdspan::__rev_prod(_M_extents, __i); > >>> + if constexpr (extents_type::rank() == 1) > >>> + return 1; > >>> + else if constexpr (extents_type::rank() == 2) > >>> + return __i == 0 ? _M_extents.extent(1) : 1; > >>> + else > >>> + return __mdspan::__rev_prod(_M_extents, __i); > >>> } > >>> > >>> template<typename _OExtents> > >>> -- > >>> 2.50.0 > >>> > >>> > > > >