On Sun, Jul 27, 2025 at 2:56 PM Luc Grosheintz <luc.groshei...@gmail.com> wrote:
> An interesting case to consider is: > > bool same11(const std::extents<int, dyn, 2, 3>& e1, > const std::extents<int, dyn, dyn, 3>& e2) > { return e1 == e2; } > > Which has the following properties: > > - There's no mismatching static extents, preventing any > short-circuiting. > > - There's a comparison between dynamic and static extents. > > - There's one trivial comparison: ... && 3 == 3. > > Let E[i] denote the array of static extents, D[k] denote the array of > dynamic extents and k[i] be the index of the i-th extent in D. > (Naturally, k[i] is only meaningful if i is a dynamic extent). > > The previous implementation results in assembly that's more or less a > literal translation of: > > for (i = 0; i < 3; ++i) > e1 = E1[i] == -1 ? D1[k1[i]] : E1[i]; > e2 = E2[i] == -1 ? D2[k2[i]] : E2[i]; > if e1 != e2: > return false > return true; > > While the proposed method results in assembly for > > if(D1[0] == D2[0]) return false; > return 2 == D2[1]; > > i.e. > > 110: 8b 17 mov edx,DWORD PTR [rdi] > 112: 31 c0 xor eax,eax > 114: 39 16 cmp DWORD PTR [rsi],edx > 116: 74 08 je 120 <same11+0x10> > 118: c3 ret > 119: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] > 120: 83 7e 04 02 cmp DWORD PTR [rsi+0x4],0x2 > 124: 0f 94 c0 sete al > 127: c3 ret > > It has the following nice properties: > > - It eliminated the indirection D[k[i]], because k[i] is known at > compile time. Saving us a comparison E[i] == -1 and conditionally > loading k[i]. > > - It eliminated the trivial condition 3 == 3. > > The result is code that only loads the required values and performs > exactly the number of comparisons needed by the algorithm. It also > results in smaller object files. Therefore, this seems like a sensible > change. We've check several other examples, including fully statically > determined cases and high-rank examples. The example given above > illustrates the other cases well. > > The constexpr condition: > > if constexpr (!_S_is_compatible_extents<...>) > return false; > > is no longer needed, because the optimizer correctly handles this case. > However, it's retained for clarity/certainty. > > libstdc++-v3/ChangeLog: > > * include/std/mdspan (extents::operator==): Replace loop with > pack expansion. > > Signed-off-by: Luc Grosheintz <luc.groshei...@gmail.com> > --- > LGTM. I was for a bit thinking about replacing cmp_equal(__self.extent(_Counts), __other.extent(_Counts)), that would check if either extent is dynamic in if constexpr, but it seems that your solution makes an optimizer capable of producing the same effects. That's really cool. > libstdc++-v3/include/std/mdspan | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/libstdc++-v3/include/std/mdspan > b/libstdc++-v3/include/std/mdspan > index f288af96cdb..5b47c95d2c6 100644 > --- a/libstdc++-v3/include/std/mdspan > +++ b/libstdc++-v3/include/std/mdspan > @@ -349,10 +349,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > return false; > else > { > - for (size_t __i = 0; __i < __self.rank(); ++__i) > - if (!cmp_equal(__self.extent(__i), __other.extent(__i))) > - return false; > - return true; > + auto __impl = [&__self, &__other]<size_t... _Counts>( > + index_sequence<_Counts...>) > + { return (cmp_equal(__self.extent(_Counts), > + __other.extent(_Counts)) && ...); }; > + return __impl(make_index_sequence<__self.rank()>()); > } > } > > -- > 2.50.0 > >