https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
--- Comment #8 from rguenther at suse dot de <rguenther at suse dot de> ---
On Wed, 31 Jul 2019, crazylht at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579
>
> --- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
> Transform second loop as
>
> diff --git a/loop.c b/loop.c
> index feea9ea..81a3ea6 100644
> --- a/loop.c
> +++ b/loop.c
> @@ -9,6 +9,6 @@ loop (int k, double x)
>    for (i=0;i<6;i++)
>      r[i] = x * a[i + k];
>    for (i=0;i<6;i++)
> -    t+=r[5-i];
> +    t+=r[i]; -------- using ascending order, align with former loop.
>    return t;
>  }
> }
>
> Can avoid store forward stalls.
>
> Before loop transform:
>
> loop_avx256: 3710992
> loop       : 671995
> loop_avx128: 650882
>
> After loop transform:
>
> loop_avx256: 661386
> loop       : 652932
> loop_avx128: 568710

Since the loop is probably unrolled this would be a task for
reassociation which should try to make data dependences in a way
the scheduler can then order memory accesses in advancing order
without increasing register pressure (would also help using
pre/post-inc addressing modes on some targets).

Currently operand rank for memory accesses is determined by looking
at the rank of SSA uses (which there may be none) only.
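
For reference, the two forms of the loop discussed above can be written
out side by side. This is only a sketch: the function names
loop_descending and loop_ascending, and the size and contents of the
array a, are assumptions made for illustration, since the thread shows
only the diff hunk and not the full test case.

```c
/* Hypothetical stand-in for the array read by loop(); its real
   definition is not shown in the thread.  */
static const double a[12] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 };

/* Form before the transform: r[] is stored in ascending order and
   then read back in descending order.  */
double
loop_descending (int k, double x)
{
  double r[6], t = 0.0;
  int i;
  for (i = 0; i < 6; i++)
    r[i] = x * a[i + k];
  for (i = 0; i < 6; i++)
    t += r[5 - i];
  return t;
}

/* Form after the transform: the reads of r[] follow the same
   ascending order as the stores in the first loop.  */
double
loop_ascending (int k, double x)
{
  double r[6], t = 0.0;
  int i;
  for (i = 0; i < 6; i++)
    r[i] = x * a[i + k];
  for (i = 0; i < 6; i++)
    t += r[i];
  return t;
}
```

Note that the two forms add the elements in a different order, so in
general their floating-point results need not be bit-identical; doing
this reordering inside the compiler is a reassociation of the sum.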