https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120687
--- Comment #7 from Robin Dapp <rdapp at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #6)
> (In reply to Richard Biener from comment #5)
> > The issue is that we somehow fail to SLP discover the reduction chain.  I
> > will have a look to check why.
>
> Reassoc improvements disturb the reduction chain layout.  I have a patch to
> fix this, but this of course only handles the case of -fwrapv or unsigned
> integer arithmetic because of integer overflow UB.  The vectorizer does not
> have a way to turn this into an unsigned reduction chain.  This should be
> "reasonably" easy to implement starting from the reduction detection and
> using the existing chain linearization and then forcing that into an
> unsigned reduction chain.  It's going to be a bit hacky.
>
> I'm going to fix the reassoc issue for now, which I think is a regression
> even.

Is this another case where

diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc
index 3c38f3d7a19..ea177495b1d 100644
--- a/gcc/tree-ssa-reassoc.cc
+++ b/gcc/tree-ssa-reassoc.cc
@@ -7201,7 +7201,7 @@ reassociate_bb (basic_block bb)
                 binary op are chosen wisely.  */
              int len = ops.length ();
              if (len >= 3
-                 && (!has_fma
+                 && ((!has_fma && !reassoc_insert_powi_p)
                      /* width > 1 means ranking ops results in better
                         parallelism.  Check current value to avoid
                         calling get_reassociation_width again.  */

helps or something else this time?
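
For readers who do not have the testcase at hand, here is a minimal sketch of what a reduction chain looks like and why the quoted remark about -fwrapv or unsigned arithmetic matters. It is not the testcase from this PR. The function names chain_sum and reassoc_sum and the group size of four are made up for illustration. Both functions use unsigned 32-bit arithmetic, where overflow wraps, so the additions can be reordered without changing the result. With signed int and no -fwrapv, overflow is undefined and that reordering is not generally valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduction chain: four statements per iteration all feed
   the same accumulator, in source order.  */
uint32_t
chain_sum (const uint32_t *a, size_t n_groups)
{
  assert (a != NULL || n_groups == 0);
  uint32_t sum = 0;
  for (size_t i = 0; i < n_groups; ++i)
    {
      sum += a[4 * i + 0];
      sum += a[4 * i + 1];
      sum += a[4 * i + 2];
      sum += a[4 * i + 3];
    }
  return sum;
}

/* The same reduction with the additions regrouped by hand, roughly the
   kind of reordering a reassociation pass may perform.  Unsigned
   arithmetic is modulo 2^32, so the result is unchanged.  */
uint32_t
reassoc_sum (const uint32_t *a, size_t n_groups)
{
  assert (a != NULL || n_groups == 0);
  uint32_t sum = 0;
  for (size_t i = 0; i < n_groups; ++i)
    {
      uint32_t lo = a[4 * i + 0] + a[4 * i + 1];
      uint32_t hi = a[4 * i + 2] + a[4 * i + 3];
      sum += lo + hi;
    }
  return sum;
}
```

Calling both functions on the same array, including inputs whose partial sums wrap around, should give identical results. That is the property that makes the unsigned form safe to reassociate and vectorize.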