On Sun, Jul 27, 2025 at 2:56 PM Luc Grosheintz <luc.groshei...@gmail.com>
wrote:

> An interesting case to consider is:
>
>   bool same11(const std::extents<int, dyn,   2, 3>& e1,
>               const std::extents<int, dyn, dyn, 3>& e2)
>   { return e1 == e2; }
>
> Which has the following properties:
>
>   - There's no mismatching static extents, preventing any
>     short-circuiting.
>
>   - There's a comparison between dynamic and static extents.
>
>   - There's one trivial comparison: ... && 3 == 3.
>
> Let E[i] denote the array of static extents, D[k] denote the array of
> dynamic extents and k[i] be the index of the i-th extent in D.
> (Naturally, k[i] is only meaningful if i is a dynamic extent).
>
> The previous implementation results in assembly that's more or less a
> literal translation of:
>
>   for (i = 0; i < 3; ++i)
>     e1 = E1[i] == -1 ? D1[k1[i]] : E1[i];
>     e2 = E2[i] == -1 ? D2[k2[i]] : E2[i];
>     if e1 != e2:
>       return false
>   return true;
>
> While the proposed method results in assembly for
>
>   if(D1[0] == D2[0]) return false;
>   return 2 == D2[1];
>
> i.e.
>
>   110:  8b 17                  mov    edx,DWORD PTR [rdi]
>   112:  31 c0                  xor    eax,eax
>   114:  39 16                  cmp    DWORD PTR [rsi],edx
>   116:  74 08                  je     120 <same11+0x10>
>   118:  c3                     ret
>   119:  0f 1f 80 00 00 00 00   nop    DWORD PTR [rax+0x0]
>   120:  83 7e 04 02            cmp    DWORD PTR [rsi+0x4],0x2
>   124:  0f 94 c0               sete   al
>   127:  c3                     ret
>
> It has the following nice properties:
>
>   - It eliminated the indirection D[k[i]], because k[i] is known at
>     compile time. Saving us a comparison E[i] == -1 and conditionally
>     loading k[i].
>
>   - It eliminated the trivial condition 3 == 3.
>
> The result is code that only loads the required values and performs
> exactly the number of comparisons needed by the algorithm. It also
> results in smaller object files. Therefore, this seems like a sensible
> change. We've check several other examples, including fully statically
> determined cases and high-rank examples. The example given above
> illustrates the other cases well.
>
> The constexpr condition:
>
>   if constexpr (!_S_is_compatible_extents<...>)
>     return false;
>
> is no longer needed, because the optimizer correctly handles this case.
> However, it's retained for clarity/certainty.
>
> libstdc++-v3/ChangeLog:
>
>         * include/std/mdspan (extents::operator==): Replace loop with
>         pack expansion.
>
> Signed-off-by: Luc Grosheintz <luc.groshei...@gmail.com>
> ---
>
LGTM.

I was for a bit thinking about replacing cmp_equal(__self.extent(_Counts),
__other.extent(_Counts)),
that would check if either extent is dynamic in if constexpr, but it seems
that your solution makes an optimizer
capable of producing the same effects. That's really cool.

>  libstdc++-v3/include/std/mdspan | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/mdspan
> b/libstdc++-v3/include/std/mdspan
> index f288af96cdb..5b47c95d2c6 100644
> --- a/libstdc++-v3/include/std/mdspan
> +++ b/libstdc++-v3/include/std/mdspan
> @@ -349,10 +349,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>             return false;
>           else
>             {
> -             for (size_t __i = 0; __i < __self.rank(); ++__i)
> -               if (!cmp_equal(__self.extent(__i), __other.extent(__i)))
> -                 return false;
> -             return true;
> +             auto __impl = [&__self, &__other]<size_t... _Counts>(
> +                 index_sequence<_Counts...>)
> +               { return (cmp_equal(__self.extent(_Counts),
> +                                   __other.extent(_Counts)) && ...); };
> +             return __impl(make_index_sequence<__self.rank()>());
>             }
>         }
>
> --
> 2.50.0
>
>

Reply via email to