https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org        |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The wrong-code part is fixed now, what remains is the inefficiency.

I don't think we currently cost the "excess" lanes in regular vectorized
operations but of course for open-coded fold-left reductions we should
likely account for possibly VF - 1 extra scalar ops (but in the "epilog"
even if that doesn't exist, since that only applies to the last vector
iteration). I fear it's not going to be enough to fend off vectorization
though.
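
To make the "VF - 1 extra scalar ops" figure concrete, here is a minimal sketch in plain C. It is not GCC's cost model or generated code: the function names are hypothetical, and it assumes a simple model in which every vector iteration of an open-coded fold-left reduction performs one in-order scalar add per lane, with inactive lanes in the last iteration fed -0.0 so that they do not change the sum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: scalar adds performed when every vector iteration
   processes all vf lanes, including inactive ones in the last iteration. */
size_t fold_left_lane_ops(size_t n, size_t vf)
{
    assert(vf > 0);
    size_t vector_iters = (n + vf - 1) / vf;   /* ceil(n / vf) */
    return vector_iters * vf;
}

/* Adds spent on excess lanes; at most vf - 1, and only in the last
   vector iteration. */
size_t fold_left_excess_ops(size_t n, size_t vf)
{
    return fold_left_lane_ops(n, vf) - n;
}

/* In-order (fold-left) sum that mimics the open-coded form: one scalar
   add per lane, strictly left to right.  Inactive lanes contribute -0.0
   (an assumption of this sketch), which leaves the accumulator unchanged
   under round-to-nearest.  *ops receives the number of adds executed. */
double fold_left_sum_padded(const double *a, size_t n, size_t vf, size_t *ops)
{
    assert(vf > 0);
    double acc = 0.0;
    *ops = 0;
    for (size_t i = 0; i < n; i += vf) {
        for (size_t lane = 0; lane < vf; ++lane) {
            double x = (i + lane < n) ? a[i + lane] : -0.0;
            acc += x;
            ++*ops;
        }
    }
    return acc;
}
```

With n = 5 and vf = 4 this model executes 8 adds for 5 elements, so 3 (that is, VF - 1) of them are excess, all in the final vector iteration. When n is a multiple of vf there are none, which is why the extra cost would be attributed once rather than per iteration.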