>> gcc/ >> PR tree-optimization/98598 >> * Makefile.in (OBJS): Add tree-ssa-loop-mgo.o. >> * common.opt (-ftree-loop-mgo): New option. > > Just a quick comment - -ftree-loop-mgo is user-facing and it isn't really a > good > name. -floop-mgo would be better but still I'd have no idea what this would > do. > > I don't have a good suggestion here other than to expand it to > -floop-gather-memory (?!).
OK. Better than "mgo", this abbr. is only a term for development use. > The option documentation isn't informative either. > > From: > > outer-loop () > { > inner-loop (iter, iter_count) > { > Type1 v1 = LOAD (iter); > Type2 v2 = LOAD (v1); > Type3 v3 = LOAD (v2); > ... > iter = NEXT (iter); > } > } > > To: > > typedef struct cache_elem > { > bool init; > Type1 c_v1; > Type2 c_v2; > Type3 c_v3; > } cache_elem; > > cache_elem *cache_arr = calloc (iter_count, sizeof (cache_elem)); > > outer-loop () > { > size_t cache_idx = 0; > > inner-loop (iter, iter_count) > { > if (!cache_arr[cache_idx]->init) > { > v1 = LOAD (iter); > v2 = LOAD (v1); > v3 = LOAD (v2); > > cache_arr[cache_idx]->init = true; > cache_arr[cache_idx]->c_v1 = v1; > cache_arr[cache_idx]->c_v2 = v2; > cache_arr[cache_idx]->c_v3 = v3; > } > else > { > v1 = cache_arr[cache_idx]->c_v1; > v2 = cache_arr[cache_idx]->c_v2; > v3 = cache_arr[cache_idx]->c_v3; > } > ... > cache_idx++; > iter = NEXT (iter); > } > } > > free (cache_arr); > > This is a _very_ special transform. What it seems to do is > optimize the dependent loads for outer loop iteration n > 1 > by caching the result(s). If that's possible then you should > be able to distribute the outer loop to one doing the caching > and one using the cache. Then this transform would be more > like a tradidional array expansion of scalars? In some cases > also loop interchange could remove the need for the caching. > > Doing MGO as the very first loop pass thus looks bad, I think > MGO should be much later, for example after interchange. > I also think that MGO should work in concert with loop > distribution (which could have an improved cost model) > rather than being a separate pass. > > Your analysis phase looks quite expensive, building sth > like a on-the side representation very closely matching SSA. > It seems to work from PHI defs to uses, which looks backwards. Did not catch this point very clearly. Would you please detail it more? > You seem to roll your own dependence analysis code :/ Please > have a look at loop distribution. > > Also you build an actual structure type for reasons that escape > me rather than simply accessing the allocated storage at > appropriate offsets. > > I think simply calling 'calloc' isn't OK because you might need > aligned storage and because calloc might not be available. > Please at least use 'malloc' and make sure MALLOC_ABI_ALIGNMENT > is large enough for the data you want to place (or perform > dynamic re-alignment yourself). We probably want some generic > middle-end utility to obtain aligned allocated storage at some > point. > > As said above I think you want to re-do this transform as > a loop distribution transform. I think if caching works then > the loads should be distributable and the loop distribution > transform should be enhanced to expand the scalars to arrays. I checked code of loop distribution, and its trigger strategy seems to be very conservative, now only targets simple and regular index-based loop, and could not handle link-list traversal, which consists of a series of discrete memory accesses, and MGO would matter a lot. Additionally, for some complicate cases, we could not completely decompose MGO as two separate loops for "do caching" and "use caching" respectively. An example: for (i = 0; i < N; i++) { for (j = 0; j < i; j++) { Type1 v1 = LOAD_FN1 (j); Type2 v2 = LOAD_FN2 (v1); Type3 v3 = LOAD_FN3 (v2); ... condition = ... } if (condition) break; } We should not cache all loads (Totally N) in one step since some of them might be invalid after "condition" breaks loops. We have to mix up "do caching" and "use caching", and let them dynamically switched against "init" flag. But loop distribution does have some overlap on analysis and transformation with MGO, we will try to see if there is a way to unify them. Thanks, Feng