On Sat, 21 Nov 2015, Tom de Vries wrote: > On 13/11/15 12:39, Jakub Jelinek wrote: > > On Fri, Nov 13, 2015 at 12:29:51PM +0100, Richard Biener wrote: > > > > thanks for the explanation. Filed as PR68331 - '[meta-bug] fipa-pta > > > > issues'. > > > > > > > > Any feedback on the '#pragma GCC offload-alias=<none|pointer|all>' bit > > > > above? > > > > Is that sort of what you had in mind? > > > > > > Yes. Whether that makes sense is another question of course. You can > > > annotate memory references with MR_DEPENDENCE_BASE/CLIQUE yourself > > > as well if you know dependences without the users intervention. > > > > I really don't like even the GCC offload-alias, I just don't see anything > > special on the offload code. Not to mention that the same issue is already > > with other outlined functions, like OpenMP tasks or parallel regions, those > > aren't offloaded, yet they can suffer from worse alias/points-to analysis > > too. > > AFAIU there is one aspect that is different for offloaded code: the setup of > the data on the device. > > Consider this example: > ... > unsigned int a[N]; > unsigned int b[N]; > unsigned int c[N]; > > int > main (void) > { > ... > > #pragma acc kernels copyin (a) copyin (b) copyout (c) > { > for (COUNTERTYPE ii = 0; ii < N; ii++) > c[ii] = a[ii] + b[ii]; > } > > ... > ... > > At gimple level, we have: > ... > #pragma omp target oacc_kernels \ > map(force_from:c [len: 2097152]) \ > map(force_to:b [len: 2097152]) \ > map(force_to:a [len: 2097152]) > ... > > [ The meaning of the force_from/force_to mappings is given in > include/gomp-constants.h: > ... > /* Allocate. */ > GOMP_MAP_FORCE_ALLOC = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_ALLOC), > /* ..., and copy to device. */ > GOMP_MAP_FORCE_TO = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TO), > /* ..., and copy from device. */ > GOMP_MAP_FORCE_FROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_FROM), > /* ..., and copy to and from device. */ > GOMP_MAP_FORCE_TOFROM = (GOMP_MAP_FLAG_FORCE | GOMP_MAP_TOFROM), > ... ] > > So before calling the offloaded function, a separate alloc is done for a, b > and c, and the base pointers of the newly allocated objects are passed to the > offloaded function. > > This means we can mark those base pointers as restrict in the offloaded > function. > > Attached proof-of-concept patch implements that. > > > We simply have some compiler internal interface between the caller and > > callee of the outlined regions, each interface in between those has > > its own structure type used to communicate the info; > > we can attach attributes on the fields, or some flags to indicate some > > properties interesting from aliasing POV. > > We don't really need to perform > > full IPA-PTA, perhaps it would be enough to a) record somewhere in cgraph > > the relationship in between such callers and callees (for offloading regions > > we already have "omp target entrypoint" attribute on the callee and a > > singler caller), tell LTO if possible not to split those into different > > partitions if easily possible, and then just for these pairs perform > > aliasing/points-to analysis in the caller and the result record using > > cliques/special attributes/whatever to the callee side, so that the callee > > (outlined OpenMP/OpenACC/Cilk+ region) can then improve its alias analysis. > > As a start, is the approach of this patch OK?
Works for me but leaving to Jakub to review for correctness. Richard. > It will allow us to commit the oacc kernels patch series with the ability to > parallelize non-trivial testcases, and work on improving the alias bit after > that. > > Thanks, > - Tom > > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)