https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
--- Comment #3 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 8 Jan 2021, jiangning.liu at amperecomputing dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98598
>
> --- Comment #2 from Jiangning Liu <jiangning.liu at amperecomputing dot com> ---
> Loop distribution can only handle very simple cases. If the inner loop has
> complicated control flow and other memory accesses with loop-carried
> dependences, it would be hard to handle. For example,
>
> int foo (int n, int m, A *pa) {
>   int sum = 0;
>
>   for (int i = 0; i < n; i++) {
>     for (int j = 0; j < m; j++) {
>       sum += pa[j].pb->pc->val;  // each value is repeatedly loaded "n" times
>       sum = sum % 7;
>     }
>     sum = sum % 13;
>   }
>
>   return sum;
> }
>
> Alternatively, we can detect "invariant" dependent memory loads for the nested
> loops with alias conflicts checked. If the outer loop is hot enough, we could
> have a chance to "hoist" them to create a cache.
>
> As for temp storage, is it GCC's rule of thumb not to introduce temp storage
> on the heap, or is it just that GCC doesn't have it yet and we want to have it?

It has to be done with care of course; cost modeling is difficult (we
need to have a good estimate of n and m, or need to version the whole
nest). That said, usually we attempt the reverse transform.

My personal opinion is that hinting the user to possibly refactor
his code (guided by profiling so as not to be too noisy) is much preferred
to the idea that the compiler can ever apply such a transform to the
loops where it matters and not to the loops where it is harmful.
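
For illustration only, here is a minimal sketch of what such a source-level
refactoring of the loop from comment #2 might look like when done by hand.
The definitions of A, B and C are not shown in the report, so the ones below
are assumptions, as is the name foo_cached. The sketch also assumes that
nothing in the loop nest stores to the loaded fields (true for this example)
and initializes sum to 0 so the result is defined. It is not a description of
anything GCC does.

```c
#include <stdlib.h>

/* Assumed type definitions; the bug report does not show them.  */
typedef struct C { int val; } C;
typedef struct B { C *pc; } B;
typedef struct A { B *pb; } A;

/* The loop nest from comment #2: each pa[j].pb->pc->val is reloaded
   through three dependent pointers on every outer iteration.  */
int foo (int n, int m, A *pa)
{
  int sum = 0;

  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      sum += pa[j].pb->pc->val;
      sum = sum % 7;
    }
    sum = sum % 13;
  }

  return sum;
}

/* Hand-refactored variant (hypothetical name): the dependent loads are
   done once into a temporary heap buffer, and the hot nest then reads
   the buffer sequentially.  */
int foo_cached (int n, int m, A *pa)
{
  if (n <= 0 || m <= 0)
    return 0;                   /* foo returns 0 in these cases too */

  int *cache = malloc ((size_t) m * sizeof *cache);
  if (!cache)
    return foo (n, m, pa);      /* no temp storage: use the original nest */

  /* One pass of pointer chasing instead of n passes.  */
  for (int j = 0; j < m; j++)
    cache[j] = pa[j].pb->pc->val;

  int sum = 0;
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < m; j++) {
      sum += cache[j];
      sum = sum % 7;
    }
    sum = sum % 13;
  }

  free (cache);
  return sum;
}
```

The cost concern is visible even here: for n == 1 the cached variant does
strictly more work (an allocation, an extra pass and extra stores), so it only
pays off when n is large enough and the pointer chasing is actually expensive,
which is what profiling would have to tell the user.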