* Feng Xue:

> To simplify explanation of this memory gathering optimization, pseudo
> code on original nested loops is given as:
>
>   outer-loop ( ) { /* data in inner loop are also invariant in outer loop.  */
>       ...
>       inner-loop (iter, iter_count) {  /* inner loop to apply MGO */
>
>           Type1 v1 = LOAD_FN1 (iter);
>           Type2 v2 = LOAD_FN2 (v1);
>           Type3 v3 = LOAD_FN3 (v2);
>           ... 
>           iter = NEXT_FN (iter);
>       }
>
>      ...
>   }
>
> "LOAD_FNx()" is a conceptual function that translates its argument to a
> result, and is composed of a sequence of operations in which there is
> only one memory load. It is capable of representing simple memory
> dereference like "iter->field", or more complicated expression like
> "array[iter * 2 + cst].field".
>
> Likewise, "NEXT_FN()" is also a conceptual function which defines how an
> iterator is advanced. It is able to describe simple integer iterator,
> list iterator, or even pure-function-based iterator on advanced stuff
> like hashset/tree.
>
> Then the code will be transformed to:
>
>   typedef struct cache_elem {
>       bool   init;
>       Type1  c_v1;
>       Type2  c_v2;
>       Type3  c_v3;
>   } cache_elem;
>
>   cache_elem *cache_arr = calloc (iter_count, sizeof(cache_elem));
>
>   outer-loop () {
>       ...
>       size_t cache_idx = 0;
>
>       inner-loop (iter, iter_count) {
>
>           if (!cache_arr[cache_idx]->init) {
>               v1 = LOAD_FN1 (iter);
>               v2 = LOAD_FN2 (v1);
>               v3 = LOAD_FN3 (v2);
>
>               cache_arr[cache_idx]->init = true;
>               cache_arr[cache_idx]->c_v1 = v1;
>               cache_arr[cache_idx]->c_v2 = v2;
>               cache_arr[cache_idx]->c_v3 = v3;
>           }
>           else {
>               v1 = cache_arr[cache_idx]->c_v1;
>               v2 = cache_arr[cache_idx]->c_v2;
>               v3 = cache_arr[cache_idx]->c_v3;
>           }
>           ...
>           cache_idx++;
>           iter = NEXT_FN (iter);
>       }
>       ...
>   }
>
>   free (cache_arr);

Why do you need the init field?  Isn't the array initialized
sequentially?

You need to keep the original loop and use it if calloc fails.  The
transformation is not valid in general because calloc is not an
async-signal-safe function, and GCC shouldn't introduce such calls if
they are not present in the original source code.  For
-fnon-call-exceptions, you need to introduce unwinding code that calls
free.

Can you change this optimization so that it can use a fixed-size buffer?
This would avoid all issues around calling calloc.  If iter_count can be
very large, allocating that much extra memory might not be beneficial
anyway.

Thanks,
Florian

Reply via email to