https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119209
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
                 CC|                            |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2025-03-11
     Ever confirmed|0                           |1
--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
The issue is that the lane-combining pattern recognizers are restricted to
loop reductions because lane order isn't preserved (or even well-defined). The
decision to treat the SLP instance as a BB reduction is only made after that.
The fix is probably to apply the reduction restriction only during SLP
build and vectorizable_* checking.
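
For illustration only, the two reduction shapes being distinguished look
roughly like this in C. Both function names are hypothetical and this is
not the testcase from the PR; it is a sketch of a loop reduction versus a
straight-line (BB) reduction computing the same dot product.

```c
#include <assert.h>
#include <stddef.h>

/* Loop reduction: the accumulator is carried around the loop.  This is the
   shape the lane-combining patterns are currently restricted to.  */
int dot_loop(const signed char *a, const signed char *b, size_t n)
{
  assert(a != NULL && b != NULL);
  int sum = 0;
  for (size_t i = 0; i < n; ++i)
    sum += a[i] * b[i];
  return sum;
}

/* BB reduction: the same sum written as straight-line code.  There is no
   loop, so the reduction is only discovered when the SLP instance is
   classified, which per the comment above happens after pattern
   recognition.  */
int dot_bb16(const signed char *a, const signed char *b)
{
  assert(a != NULL && b != NULL);
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]
         + a[4] * b[4] + a[5] * b[5] + a[6] * b[6] + a[7] * b[7]
         + a[8] * b[8] + a[9] * b[9] + a[10] * b[10] + a[11] * b[11]
         + a[12] * b[12] + a[13] * b[13] + a[14] * b[14] + a[15] * b[15];
}
```

Both functions return the same value for 16-element inputs; only the
first one has the loop-carried form.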
Nailing down which lanes are combined for V16QI->V4SI for the optab would
also allow using dot_prod in non-reduction cases (when the V4SI intermediate
result isn't reduced to a single lane in the end). There's a related PR about
this, but IIRC for the SAD patterns.
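
A scalar sketch of why the lane order matters. The two groupings below are
hypothetical examples of how 16 byte lanes could be combined into 4 int
lanes; they are assumptions for illustration, not a statement of what any
target or the optab actually does. The per-lane results differ, while the
fully reduced sum does not, which is why the pattern is safe only when the
V4SI result is reduced to a single lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical grouping A: output lane i sums input lanes 4*i .. 4*i+3.  */
void dot_adjacent(const int8_t a[16], const int8_t b[16], int32_t out[4])
{
  for (int i = 0; i < 4; ++i) {
    out[i] = 0;
    for (int j = 0; j < 4; ++j)
      out[i] += a[4 * i + j] * b[4 * i + j];
  }
}

/* Hypothetical grouping B: output lane i sums input lanes i, i+4, i+8, i+12.  */
void dot_strided(const int8_t a[16], const int8_t b[16], int32_t out[4])
{
  for (int i = 0; i < 4; ++i) {
    out[i] = 0;
    for (int j = 0; j < 4; ++j)
      out[i] += a[i + 4 * j] * b[i + 4 * j];
  }
}

/* Final reduction of the V4SI-like intermediate to a single value.  */
int32_t reduce4(const int32_t v[4])
{
  return v[0] + v[1] + v[2] + v[3];
}

/* The total does not depend on the grouping, even though the individual
   lanes of the intermediate generally do.  */
void check_total_is_grouping_independent(const int8_t a[16],
                                         const int8_t b[16])
{
  int32_t x[4], y[4];
  dot_adjacent(a, b, x);
  dot_strided(a, b, y);
  assert(reduce4(x) == reduce4(y));
}
```

With a[i] = i + 1 and b[i] = 1, grouping A gives lanes {10, 26, 42, 58} and
grouping B gives {28, 32, 36, 40}; both reduce to 136. A non-reduction use
of the V4SI intermediate would observe the difference.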