Re: OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]

2022-03-02 Thread Jakub Jelinek via Gcc-patches
On Tue, Mar 01, 2022 at 05:46:20PM +0100, Thomas Schwinge wrote:
> OK to proceed in this way?

With a suitable ChangeLog entry and one nit fixed yes.

> --- gcc/omp-low.cc
> +++ gcc/omp-low.cc
> @@ -188,7 +188,7 @@ struct omp_context
>  static splay_tree all_contexts;
>  static int taskreg_nesting_level;
>  static int target_nesting_level;
> -static bitmap task_shared_vars;
> +static bitmap make_addressable_vars;
>  static bitmap global_nonaddressable_vars;
>  static vec taskreg_contexts;
>  static vec task_cpyfns;
> @@ -572,9 +572,9 @@ use_pointer_for_field (tree decl, omp_context *shared_ctx)
>   /* Taking address of OUTER in lower_send_shared_vars
>  might need regimplification of everything that uses the
>  variable.  */
> - if (!task_shared_vars)
> -   task_shared_vars = BITMAP_ALLOC (NULL);
> - bitmap_set_bit (task_shared_vars, DECL_UID (outer));
> + if (!make_addressable_vars)
> +   make_addressable_vars = BITMAP_ALLOC (NULL);
> + bitmap_set_bit (make_addressable_vars, DECL_UID (outer));

Has the MUA replaced tabs with spaces?

> --- gcc/omp-oacc-kernels-decompose.cc
> +++ gcc/omp-oacc-kernels-decompose.cc
> @@ -845,7 +845,11 @@ maybe_build_inner_data_region (location_t loc, gimple 
> *body,
>   prev_mapped_var = v;
> 
>   /* See .  */
> - TREE_ADDRESSABLE (v) = 1;
> + if (!TREE_ADDRESSABLE (v))
> +   {
> + /* Request that OMP lowering make 'v' addressable.  */
> + OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE (new_clause) = 1;
> +   }

That is a single statement body, so shouldn't have {}s around it.

Jakub



Re: OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]

2022-03-01 Thread Thomas Schwinge
Hi!

Jakub, need your review/approval here, please:

On 2022-01-13T10:54:16+0100, I wrote:
> On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>>  - The "addressable" bit is set during the kernels conversion pass for
>>variables that have "create" (alloc) clauses created for them in the
>>synthesised outer data region (instead of in the front-end, etc.,
>>where it can't be done accurately). Such variables actually have
>>their address taken during transformations made in a later pass
>>(omp-low, I think), but there's a phase-ordering problem that means
>>the flag should be set earlier.
>
> The actual issue is a bit different, but yes, there is a problem.
> The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
> "ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
> didn't run into that with the OpenACC 'kernels' decomposition
> originally.)  I've pushed to master branch
> commit 9b32c1669aad5459dd053424f9967011348add83
> "OpenACC 'kernels' decomposition: Mark variables used in synthesized data
> clauses as addressable [PR100280]"

> ... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:
>
> 13073 else if (is_gimple_reg (var))
> 13074   {
> 13075 gcc_assert (offloaded);
> 13076 tree avar = create_tmp_var (TREE_TYPE 
> (var));
> 13077 mark_addressable (avar);
>
> ..., which (a) is only implemented for actualy *offloaded* regions (but not
> data regions), and (b) the subsequently synthesized code for writing to and
> later reading back from the temporary fundamentally conflicts with OpenACC
> 'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
> to make work, so let's just avoid this case.

> --- a/gcc/omp-oacc-kernels-decompose.cc
> +++ b/gcc/omp-oacc-kernels-decompose.cc
> @@ -793,7 +793,8 @@ make_data_region_try_statement (location_t loc, gimple 
> *body)
>
>  /* If INNER_BIND_VARS holds variables, build an OpenACC data region with
> location LOC containing BODY and having 'create (var)' clauses for each
> -   variable.  If INNER_CLEANUP is present, add a try-finally statement with
> +   variable (as a side effect, such variables also get TREE_ADDRESSABLE set).
> +   If INNER_CLEANUP is present, add a try-finally statement with
> this cleanup code in the finally block.  Return the new data region, or
> the original BODY if no data region was needed.  */
>
> @@ -842,6 +843,9 @@ maybe_build_inner_data_region (location_t loc, gimple 
> *body,
> inner_data_clauses = new_clause;
>
> prev_mapped_var = v;
> +
> +   /* See <https://gcc.gnu.org/PR100280>.  */
> +   TREE_ADDRESSABLE (v) = 1;
>   }
>  }

So, that's too simple.  ;-) ... and gives rise to workaround patches like
we have on the og11 development branch:
  - "Avoid introducing 'create' mapping clauses for loop index variables in 
kernels regions",
  - "Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings synchronously",
  - "Fix for is_gimple_reg vars to 'data kernels'"

We're after gimplification, and must not just set 'TREE_ADDRESSABLE',
because that may easily violate GIMPLE invariants, leading to ICEs later.
There are a few open PRs, which my following changes are addressing.  To
make "late" 'TREE_ADDRESSABLE' work, we have a precedent in OpenMP's
'gcc/omp-low.cc:task_shared_vars' handling, as Jakub had pointed to in
discussion of <https://gcc.gnu.org/PR102330>.  (PR102330 turned out to be
unrelated from the "late" 'TREE_ADDRESSABLE' problem here; I have a
different patch for it.)

I'm thus proposing to generalize 'gcc/omp-low.cc:task_shared_vars' into
'make_addressable_vars', plus new 'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE'
that we then may use instead of the 'TREE_ADDRESSABLE (v) = 1;' quoted
above (plus one or two additional ones to be introduced in later
patches), and wire that up in 'gcc/omp-low.cc:scan_sharing_clauses', for
'OMP_CLAUSE_MAP': set 'TREE_ADDRESSABLE' and put into
'make_addressable_vars' for later fix-up.

(In reply to Jakub Jelinek from comment #9)
> Whether you can use the same bitmap or need to add another bitmap next to
> task_shared_vars is something hard to guess without diving into it deeply.

Per my understanding of the code, the only place where I had doubts is
'gcc/omp-low.cc:finish_taskreg_scan', but I have convinced myself that
what this is doing is either a no-op in the
'OMP_CLAUSE_MAP_DECL_MAKE_ADDRESSABLE' case, or in fact necessary as the
original 'task_shared_vars' handling has been.  Either way: I couldn't
come up with a way (test ca

OpenACC 'kernels' decomposition: Mark variables used in synthesized data clauses as addressable [PR100280]

2022-01-13 Thread Thomas Schwinge
Hi!

On 2019-05-08T14:51:57+0100, Julian Brown  wrote:
>  - The "addressable" bit is set during the kernels conversion pass for
>variables that have "create" (alloc) clauses created for them in the
>synthesised outer data region (instead of in the front-end, etc.,
>where it can't be done accurately). Such variables actually have
>their address taken during transformations made in a later pass
>(omp-low, I think), but there's a phase-ordering problem that means
>the flag should be set earlier.

The actual issue is a bit different, but yes, there is a problem.
The related ICE has also been reported as <https://gcc.gnu.org/PR100280>
"ICE in lower_omp_target, at omp-low.c:12287".  (And I'm confused why we
didn't run into that with the OpenACC 'kernels' decomposition
originally.)  I've pushed to master branch
commit 9b32c1669aad5459dd053424f9967011348add83
"OpenACC 'kernels' decomposition: Mark variables used in synthesized data
clauses as addressable [PR100280]", see attached.


Grüße
 Thomas


-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
>From 9b32c1669aad5459dd053424f9967011348add83 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 16 Dec 2021 22:02:37 +0100
Subject: [PATCH] OpenACC 'kernels' decomposition: Mark variables used in
 synthesized data clauses as addressable [PR100280]

... as otherwise 'gcc/omp-low.c:lower_omp_target' has to create a temporary:

13073			else if (is_gimple_reg (var))
13074			  {
13075			gcc_assert (offloaded);
13076			tree avar = create_tmp_var (TREE_TYPE (var));
13077			mark_addressable (avar);

..., which (a) is only implemented for actualy *offloaded* regions (but not
data regions), and (b) the subsequently synthesized code for writing to and
later reading back from the temporary fundamentally conflicts with OpenACC
'async' (as used by OpenACC 'kernels' decomposition).  That's all not trivial
to make work, so let's just avoid this case.

	gcc/
	PR middle-end/100280
	* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
	Mark variables used in synthesized data clauses as addressable.
	gcc/testsuite/
	PR middle-end/100280
	* c-c++-common/goacc/kernels-decompose-pr100280-1.c: New.
	* c-c++-common/goacc/classify-kernels-parloops.c: Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
	Likewise.
	* c-c++-common/goacc/classify-kernels-unparallelized.c: Test
	'--param openacc-kernels=decompose'.
	* c-c++-common/goacc/classify-kernels.c: Likewise.
	* c-c++-common/goacc/kernels-decompose-2.c: Update.
	* c-c++-common/goacc/kernels-decompose-ice-1.c: Remove.
	* c-c++-common/goacc/kernels-decompose-ice-2.c: Likewise.
	* gfortran.dg/goacc/classify-kernels-parloops.f95: New.
	* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
	Likewise.
	* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Test
	'--param openacc-kernels=decompose'.
	* gfortran.dg/goacc/classify-kernels.f95: Likewise.
	libgomp/
	PR middle-end/100280
	* testsuite/libgomp.oacc-c-c++-common/declare-vla-kernels-decompose-ice-1.c:
	Update.
	* testsuite/libgomp.oacc-c-c++-common/f-asyncwait-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c:
	Likewise.

Suggested-by: Julian Brown 
---
 gcc/omp-oacc-kernels-decompose.cc |   6 +-
 .../goacc/classify-kernels-parloops.c |  41 +++
 ...classify-kernels-unparallelized-parloops.c |  45 +++
 .../goacc/classify-kernels-unparallelized.c   |   5 +-
 .../c-c++-common/goacc/classify-kernels.c |   5 +-
 .../c-c++-common/goacc/kernels-decompose-2.c  |  16 ++-
 .../goacc/kernels-decompose-ice-1.c   | 114 --
 .../goacc/kernels-decompose-ice-2.c   |  22 
 .../goacc/kernels-decompose-pr100280-1.c  |  19 +++
 .../goacc/classify-kernels-parloops.f95   |  43 +++
 ...assify-kernels-unparallelized-parloops.f95 |  47 
 .../goacc/classify-kernels-unparallelized.f95 |   5 +-
 .../gfortran.dg/goacc/classify-kernels.f95|   5 +-
 .../declare-vla-kernels-decompose-ice-1.c |   2 +-
 .../libgomp.oacc-c-c++-common/f-asyncwait-1.c |  53 
 .../kernels-decompose-1.c |   6 +-
 16 files changed, 264 insertions(+), 170 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-parloops.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized-parloops.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-1.c
 delete mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-ice-2.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-decompose-pr100280-1.c
 create mode 100644 gcc/testsui