* Feng Xue: > To simplify explanation of this memory gathering optimization, pseudo > code on original nested loops is given as: > > outer-loop ( ) { /* data in inner loop are also invariant in outer loop. */ > ... > inner-loop (iter, iter_count) { /* inner loop to apply MGO */ > > Type1 v1 = LOAD_FN1 (iter); > Type2 v2 = LOAD_FN2 (v1); > Type3 v3 = LOAD_FN3 (v2); > ... > iter = NEXT_FN (iter); > } > > ... > } > > "LOAD_FNx()" is a conceptual function that translates its argument to a > result, and is composed of a sequence of operations in which there is > only one memory load. It is capable of representing simple memory > dereference like "iter->field", or more complicated expression like > "array[iter * 2 + cst].field". > > Likewise, "NEXT_FN()" is also a conceptual function which defines how an > iterator is advanced. It is able to describe simple integer iterator, > list iterator, or even pure-function-based iterator on advanced stuff > like hashset/tree. > > Then the code will be transformed to: > > typedef struct cache_elem { > bool init; > Type1 c_v1; > Type2 c_v2; > Type3 c_v3; > } cache_elem; > > cache_elem *cache_arr = calloc (iter_count, sizeof(cache_elem)); > > outer-loop () { > ... > size_t cache_idx = 0; > > inner-loop (iter, iter_count) { > > if (!cache_arr[cache_idx]->init) { > v1 = LOAD_FN1 (iter); > v2 = LOAD_FN2 (v1); > v3 = LOAD_FN3 (v2); > > cache_arr[cache_idx]->init = true; > cache_arr[cache_idx]->c_v1 = v1; > cache_arr[cache_idx]->c_v2 = v2; > cache_arr[cache_idx]->c_v3 = v3; > } > else { > v1 = cache_arr[cache_idx]->c_v1; > v2 = cache_arr[cache_idx]->c_v2; > v3 = cache_arr[cache_idx]->c_v3; > } > ... > cache_idx++; > iter = NEXT_FN (iter); > } > ... > } > > free (cache_arr);
Why do you need the init field? Isn't the array initialized sequentially? You need to keep the original loop and use it if calloc fails. The transformation is not valid in general because calloc is not an async-signal-safe function, and GCC shouldn't introduce such calls if they are not present in the original source code. For -fnon-call-exceptions, you need to introduce unwinding code that calls free. Can you change this optimization so that it can use a fixed-size buffer? This would avoid all issues around calling calloc. If iter_count can be very large, allocating that much extra memory might not be beneficial anyway. Thanks, Florian