[Bug middle-end/112824] Stack spills and vector splitting with vector builtins

elrodc at gmail dot com via Gcc-bugs Sun, 03 Dec 2023 05:29:23 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112824


--- Comment #2 from Chris Elrod <elrodc at gmail dot com> ---
https://godbolt.org/z/3648aMTz8

Perhaps a simpler diff is toggling the pragma below: with the pragma commented
out the problem reproduces, but codegen becomes good with it.

template<typename T, ptrdiff_t N>
constexpr auto operator*(OuterDualUA2<T,N> a, OuterDualUA2<T,N> b)->OuterDualUA2<T,N>{
  //return {a.value*b.value,a.value*b.p[0]+b.value*a.p[0],a.value*b.p[1]+b.value*a.p[1]};
  OuterDualUA2<T,N> c;
  c.value = a.value*b.value;
#pragma GCC unroll 16
  for (ptrdiff_t i = 0; i < 2; ++i)
    c.p[i] = a.value*b.p[i] + b.value*a.p[i];
  //c.p[0] = a.value*b.p[0] + b.value*a.p[0];
  //c.p[1] = a.value*b.p[1] + b.value*a.p[1];
  return c;
}
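For reference, here is a self-contained sketch of the two formulations being compared: the pragma-unrolled loop and the manual unroll that the commented-out lines describe. The real definition of OuterDualUA2 is only in the Godbolt link, so the struct below, the 4-double vector-extension element type vd4, and the helpers mul_loop, mul_manual, eq and demo are all hypothetical. The sketch only checks that both forms compute the same product-rule result. It does not reproduce the codegen difference, which has to be inspected in the generated assembly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical element type: 4 doubles via the GCC/Clang vector extension.
typedef double vd4 __attribute__((vector_size(32)));

// Hypothetical stand-in for the reporter's OuterDualUA2 (real one not shown).
template <typename T, std::ptrdiff_t N>
struct OuterDualUA2 {
  T value;  // primal part
  T p[N];   // partial derivatives
};

// Loop form, with the pragma that reportedly fixes codegen.
template <typename T, std::ptrdiff_t N>
OuterDualUA2<T, N> mul_loop(OuterDualUA2<T, N> a, OuterDualUA2<T, N> b) {
  OuterDualUA2<T, N> c{};
  c.value = a.value * b.value;
#pragma GCC unroll 16
  for (std::ptrdiff_t i = 0; i < N; ++i)
    c.p[i] = a.value * b.p[i] + b.value * a.p[i];  // product rule
  return c;
}

// Manually unrolled form for N == 2, matching the commented-out lines.
template <typename T>
OuterDualUA2<T, 2> mul_manual(OuterDualUA2<T, 2> a, OuterDualUA2<T, 2> b) {
  OuterDualUA2<T, 2> c{};
  c.value = a.value * b.value;
  c.p[0] = a.value * b.p[0] + b.value * a.p[0];
  c.p[1] = a.value * b.p[1] + b.value * a.p[1];
  return c;
}

// Element-wise equality of two vd4 values (== on vectors yields a mask).
bool eq(vd4 x, vd4 y) {
  for (int k = 0; k < 4; ++k)
    if (x[k] != y[k]) return false;
  return true;
}

// Computes x*y with seeded partials both ways and checks they agree.
OuterDualUA2<vd4, 2> demo() {
  vd4 x = {1, 2, 3, 4}, two = {2, 2, 2, 2};
  vd4 one = {1, 1, 1, 1}, zero = {0, 0, 0, 0};
  OuterDualUA2<vd4, 2> a{}, b{};
  a.value = x;   a.p[0] = one;  a.p[1] = zero;  // a = x, seeds d/dx
  b.value = two; b.p[0] = zero; b.p[1] = one;   // b = y = 2, seeds d/dy
  OuterDualUA2<vd4, 2> l = mul_loop(a, b);
  OuterDualUA2<vd4, 2> m = mul_manual(a, b);
  assert(eq(l.value, m.value) && eq(l.p[0], m.p[0]) && eq(l.p[1], m.p[1]));
  return l;
}
```

In demo(), the product's value is x*2, its first partial is 2 (d/dx) and its second partial is x (d/dy). Both versions are semantically identical, which is the point of the report: only the generated code differs.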


Having to add pragmas everywhere in my actual codebase is not great. I
thought I had covered the important cases, but my non-minimal example still
gets unnecessary register splits and stack spills, so either I missed some
places or there is another issue.

Given that GCC unrolls the above code even without the pragma, it seems like a
definite bug that the pragma is needed for the resulting code generation to
actually be good.
Not knowing the compiler pipeline, my naive guess is that the pragma makes the
loop get unrolled earlier than the pass that unrolls it without the pragma, and
that some important analysis or optimization runs between those two points.
