Hi,

this patch series implements the rework of the OpenACC "kernels" implementation that was announced at the GNU Tools Track of this year's Linux Plumbers Conference; see https://linuxplumbersconf.org/event/11/contributions/998/. The central step is the commit titled 'openacc: Use Graphite for dependence analysis in "kernels" regions'; its commit message contains further explanations.

Best regards,
Frederik

PS: The commit series also includes a backport from master, "00b98b6cac25 Add dg-final option-based target selectors", and two trivial unrelated commits, "fa558c2a6664 Fix gimple_debug_cfg declaration" and "35cdc94463fe Fix branch prediction dump message".

Andrew Stubbs (2):
  openacc: Add data optimization pass
  openacc: Add runtime alias checking for OpenACC kernels

Frederik Harwath (19):
  openacc: Move pass_oacc_device_lower after pass_graphite
  graphite: Extend SCoP detection dump output
  graphite: Rename isl_id_for_ssa_name
  graphite: Fix minor mistakes in comments
  Fix branch prediction dump message
  Move compute_alias_check_pairs to tree-data-ref.c
  graphite: Add runtime alias checking
  openacc: Use Graphite for dependence analysis in "kernels" regions
  openacc: Add "can_be_parallel" flag info to "graph" dumps
  openacc: Add further kernels tests
  openacc: Remove unused partitioning in "kernels" regions
  Add function for printing a single OMP_CLAUSE
  openacc: Warn about "independent" "kernels" loops with data-dependences
  openacc: Handle internal function calls in pass_lim
  openacc: Disable pass_pre on outlined functions analyzed by Graphite
  graphite: Tune parameters for OpenACC use
  graphite: Adjust scop loop-nest choice
  graphite: Accept loops without data references
  openacc: Adjust test expectations to new "kernels" handling

Sandra Loosemore (1):
  Fortran: delinearize multi-dimensional array accesses

 gcc/Makefile.in | 2 +
 gcc/cfgloop.c | 1 +
 gcc/cfgloop.h | 6 +
 gcc/cfgloopmanip.c | 1 +
 gcc/common.opt | 9 +
 gcc/config/nvptx/nvptx.c | 7 +
 gcc/doc/gimple.texi | 2 +
 gcc/doc/invoke.texi | 20 +-
 gcc/doc/passes.texi | 6 +-
 gcc/expr.c | 1 +
 gcc/flag-types.h | 1 +
 gcc/fortran/lang.opt | 4 +
 gcc/fortran/trans-array.c | 321 ++++--
 gcc/gimple-loop-interchange.cc | 2 +-
 gcc/gimple-pretty-print.c | 3 +
 gcc/gimple-walk.c | 15 +-
 gcc/gimple-walk.h | 6 +
 gcc/gimple.h | 7 +-
 gcc/gimplify.c | 13 +-
 gcc/graph.c | 35 +-
 gcc/graphite-dependences.c | 220 +++-
 gcc/graphite-isl-ast-to-gimple.c | 271 ++++-
 gcc/graphite-oacc.c | 689 ++++++++++++
 gcc/graphite-oacc.h | 55 +
 gcc/graphite-optimize-isl.c | 42 +-
 gcc/graphite-poly.c | 41 +-
 gcc/graphite-scop-detection.c | 654 +++++++++--
 gcc/graphite-sese-to-poly.c | 90 +-
 gcc/graphite.c | 120 +-
 gcc/graphite.h | 40 +-
 gcc/internal-fn.c | 2 +
 gcc/internal-fn.h | 4 +-
 gcc/omp-data-optimize.cc | 951 ++++++++++++++++
 gcc/omp-expand.c | 110 +-
 gcc/omp-general.c | 23 +-
 gcc/omp-general.h | 1 +
 gcc/omp-low.c | 321 +++++-
 gcc/omp-oacc-kernels-decompose.cc | 145 ++-
 gcc/omp-offload.c | 1001 +++++++++++++----
 gcc/omp-offload.h | 2 +
 gcc/params.opt | 5 +-
 gcc/passes.c | 42 +
 gcc/passes.def | 47 +-
 gcc/predict.c | 2 +-
 gcc/sese.c | 25 +-
 gcc/sese.h | 19 +
 gcc/testsuite/c-c++-common/goacc/acc-icf.c | 4 +-
 gcc/testsuite/c-c++-common/goacc/cache-3-1.c | 2 +-
 ...classify-kernels-unparallelized-graphite.c | 41 +
 ...lassify-kernels-unparallelized-parloops.c} | 12 +-
 .../c-c++-common/goacc/classify-kernels.c | 27 +-
 .../c-c++-common/goacc/classify-parallel.c | 8 +-
 .../c-c++-common/goacc/classify-routine.c | 8 +-
 .../c-c++-common/goacc/classify-serial.c | 12 +-
 .../device-lowering-debug-optimization.c | 29 +
 .../goacc/device-lowering-no-loops.c | 17 +
 .../goacc/device-lowering-no-optimization.c | 30 +
 .../c-c++-common/goacc/if-clause-2.c | 2 +-
 .../goacc/kernels-decompose-1-parloops.c | 125 ++
 .../c-c++-common/goacc/kernels-decompose-1.c | 31 +-
 .../c-c++-common/goacc/kernels-decompose-2.c | 2 +-
 .../goacc/kernels-decompose-ice-1.c | 5 +-
 .../goacc/kernels-decompose-ice-2.c | 3 +-
 .../goacc/kernels-loop-3-acc-loop.c | 2 +-
 .../c-c++-common/goacc/kernels-loop-3.c | 2 +-
 ...duction.c => kernels-reduction-parloops.c} | 0
 .../c-c++-common/goacc/loop-2-kernels.c | 20 +-
 .../c-c++-common/goacc/loop-auto-reductions.c | 22 +
 .../goacc/nested-reductions-2-parallel.c | 138 +++
 ...kernels-conditional-loop-independent_seq.c | 129 ---
 ...parallelism-1-kernels-loop-auto-parloops.c | 128 +++
 .../note-parallelism-1-kernels-loop-auto.c | 104 +-
 ...rallelism-1-kernels-loop-independent_seq.c | 19 +-
 .../goacc/note-parallelism-1-kernels-loops.c | 11 +-
 ...note-parallelism-1-kernels-straight-line.c | 11 +-
 ...e-parallelism-combined-kernels-loop-auto.c | 34 +-
 ...sm-combined-kernels-loop-independent_seq.c | 16 -
 ...kernels-conditional-loop-independent_seq.c | 38 +-
 .../note-parallelism-kernels-loop-auto.c | 100 +-
 ...parallelism-kernels-loop-independent_seq.c | 27 +-
 .../goacc/note-parallelism-kernels-loops-1.c | 61 +
 .../note-parallelism-kernels-loops-parloops.c | 53 +
 .../goacc/note-parallelism-kernels-loops.c | 39 +-
 .../c-c++-common/goacc/omp_data_optimize-1.c | 677 +++++++++++
 gcc/testsuite/c-c++-common/goacc/routine-1.c | 2 +-
 .../goacc/routine-level-of-parallelism-2.c | 2 -
 .../c-c++-common/goacc/routine-nohost-1.c | 4 +-
 gcc/testsuite/c-c++-common/unroll-1.c | 8 +-
 gcc/testsuite/c-c++-common/unroll-4.c | 4 +-
 .../g++.dg/goacc/omp_data_optimize-1.C | 169 +++
 .../gcc.dg/goacc/graphite-parameter-1.c | 21 +
 .../gcc.dg/goacc/graphite-parameter-2.c | 23 +
 .../gcc.dg/goacc/loop-processing-1.c | 7 +-
 .../gcc.dg/goacc/nested-function-1.c | 3 +-
 gcc/testsuite/gcc.dg/graphite/alias-1.c | 22 +
 gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c | 6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c | 6 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c | 6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c | 6 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/loop-38.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c | 2 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr21463.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr45427.c | 4 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c | 2 +-
 gcc/testsuite/gcc.dg/unroll-2.c | 2 +-
 gcc/testsuite/gcc.dg/unroll-3.c | 4 +-
 gcc/testsuite/gcc.dg/unroll-4.c | 4 +-
 gcc/testsuite/gcc.dg/unroll-5.c | 4 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-59.c | 2 +-
 gcc/testsuite/gcc.dg/vect/vect-profile-1.c | 2 +-
 gcc/testsuite/gfortran.dg/assumed_type_2.f90 | 6 +-
 ...assify-kernels-unparallelized-parloops.f95 | 44 +
 .../goacc/classify-kernels-unparallelized.f95 | 26 +-
 .../gfortran.dg/goacc/classify-kernels.f95 | 26 +-
 .../gfortran.dg/goacc/classify-parallel.f95 | 6 +-
 .../gfortran.dg/goacc/classify-routine.f95 | 8 +-
 .../gfortran.dg/goacc/classify-serial.f95 | 11 +-
 .../gfortran.dg/goacc/common-block-3.f90 | 14 +-
 .../gfortran.dg/goacc/gang-static.f95 | 14 +-
 .../gfortran.dg/goacc/kernels-conversion.f95 | 52 +
 .../goacc/kernels-decompose-1-parloops.f95 | 121 ++
 .../gfortran.dg/goacc/kernels-decompose-1.f95 | 183 ++-
 .../gfortran.dg/goacc/kernels-decompose-2.f95 | 112 +-
 .../goacc/kernels-decompose-parloops-2.f95 | 154 +++
 .../gfortran.dg/goacc/kernels-loop-2.f95 | 13 +-
 .../gfortran.dg/goacc/kernels-loop-data-2.f95 | 13 +-
 .../goacc/kernels-loop-data-parloops-2.f95 | 52 +
 .../gfortran.dg/goacc/kernels-loop-inner.f95 | 6 +-
 .../goacc/kernels-loop-parloops-2.f95 | 45 +
 .../goacc/kernels-loop-parloops.f95 | 39 +
 .../gfortran.dg/goacc/kernels-loop.f95 | 12 +-
 .../gfortran.dg/goacc/kernels-reductions.f90 | 37 +
 .../gfortran.dg/goacc/kernels-tree.f95 | 2 +-
 .../gfortran.dg/goacc/loop-2-kernels.f95 | 22 +-
 .../goacc/loop-auto-transfer-2.f90 | 45 +
 .../goacc/loop-auto-transfer-3.f90 | 95 ++
 .../goacc/loop-auto-transfer-4.f90 | 293 +++++
 .../gfortran.dg/goacc/nested-function-1.f90 | 2 +
 .../goacc/nested-reductions-2-parallel.f90 | 177 +++
 .../gfortran.dg/goacc/omp_data_optimize-1.f90 | 588 ++++++++++
 gcc/testsuite/gfortran.dg/goacc/pr72741.f90 | 8 +-
 .../goacc/private-explicit-kernels-1.f95 | 13 +-
 .../goacc/private-predetermined-kernels-1.f95 | 16 +-
 .../goacc/routine-module-mod-1.f90 | 2 +-
 gcc/testsuite/gfortran.dg/graphite/block-2.f | 9 +-
 .../gfortran.dg/graphite/block-3.f90 | 1 -
 .../gfortran.dg/graphite/block-4.f90 | 1 -
 gcc/testsuite/gfortran.dg/graphite/id-9.f | 2 +-
 .../gfortran.dg/inline_matmul_24.f90 | 2 +-
 gcc/testsuite/gfortran.dg/no_arg_check_2.f90 | 6 +-
 gcc/testsuite/gfortran.dg/pr32921.f | 2 +-
 gcc/testsuite/gfortran.dg/reassoc_4.f | 2 +-
 gcc/tree-chrec.c | 3 +
 gcc/tree-data-ref.c | 107 +-
 gcc/tree-data-ref.h | 3 +
 gcc/tree-loop-distribution.c | 87 --
 gcc/tree-parloops.c | 18 +-
 gcc/tree-pass.h | 3 +
 gcc/tree-pretty-print.c | 11 +
 gcc/tree-pretty-print.h | 1 +
 gcc/tree-scalar-evolution.c | 179 ++-
 gcc/tree-scalar-evolution.h | 3 +
 gcc/tree-ssa-dce.c | 14 +
 gcc/tree-ssa-loop-im.c | 58 +-
 gcc/tree-ssa-loop-ivcanon.c | 2 +
 gcc/tree-ssa-loop-manip.h | 2 +-
 gcc/tree-ssa-loop-niter.c | 6 +
 gcc/tree-ssa-loop.c | 110 ++
 gcc/tree-ssa-phiprop.c | 2 +
 gcc/tree-ssa-pre.c | 17 +
 .../acc_prof-kernels-1.c | 19 +-
 .../kernels-decompose-1.c | 7 +-
 .../libgomp.oacc-c-c++-common/parallel-dims.c | 34 +-
 .../libgomp.oacc-c-c++-common/pr84955-1.c | 1 -
 .../libgomp.oacc-c-c++-common/pr85381-2.c | 8 +-
 .../libgomp.oacc-c-c++-common/pr85381-3.c | 3 -
 .../libgomp.oacc-c-c++-common/pr85381-4.c | 4 +-
 .../libgomp.oacc-c-c++-common/pr85486-2.c | 2 +-
 .../libgomp.oacc-c-c++-common/pr85486-3.c | 2 +-
 .../libgomp.oacc-c-c++-common/pr85486.c | 2 +-
 .../runtime-alias-check-1.c | 79 ++
 .../runtime-alias-check-2.c | 90 ++
 .../vector-length-128-1.c | 3 +-
 .../vector-length-128-2.c | 3 +-
 .../vector-length-128-3.c | 3 +-
 .../vector-length-128-4.c | 3 +-
 .../vector-length-128-5.c | 3 +-
 .../vector-length-128-6.c | 3 +-
 .../vector-length-128-7.c | 3 +-
 .../gangprivate-attrib-1.f90 | 5 +-
 .../gangprivate-attrib-2.f90 | 3 +-
 .../kernels-acc-loop-reduction-2.f90 | 12 +-
 .../kernels-independent.f90 | 1 +
 .../libgomp.oacc-fortran/kernels-loop-1.f90 | 1 +
 .../libgomp.oacc-fortran/pr94358-1.f90 | 7 +-
 201 files changed, 9403 insertions(+), 1524 deletions(-)
 create mode 100644 gcc/graphite-oacc.c
 create mode 100644 gcc/graphite-oacc.h
 create mode 100644 gcc/omp-data-optimize.cc
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-graphite.c
 rename gcc/testsuite/c-c++-common/goacc/{classify-kernels-unparallelized.c => classify-kernels-unparallelized-parloops.c} (84%)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-1-parloops.c
 rename gcc/testsuite/c-c++-common/goacc/{kernels-reduction.c => kernels-reduction-parloops.c} (100%)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-reductions.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/omp_data_optimize-1.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/omp_data_optimize-1.C
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-1.c
 create mode 100644 gcc/testsuite/gcc.dg/goacc/graphite-parameter-2.c
 create mode 100644 gcc/testsuite/gcc.dg/graphite/alias-1.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-conversion.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-1-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-decompose-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-data-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops-2.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-loop-parloops.f95
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/kernels-reductions.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-2.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-3.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-transfer-4.f90
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/omp_data_optimize-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c

--
2.33.0

-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; Limited liability company (Gesellschaft mit beschränkter Haftung); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Commercial register: Registergericht München, HRB 106955