Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-05-16 Thread Frederik Harwath via Gcc-patches

Hi Jakub,

On 15.05.23 12:19, Jakub Jelinek wrote:

On Fri, Mar 24, 2023 at 04:30:38PM +0100, Frederik Harwath wrote:

this patch series implements the OpenMP 5.1 "unroll" and "tile"
constructs.  It includes changes to the C,C++, and Fortran front end
for parsing the new constructs and a new middle-end
"omp_transform_loops" pass which implements the transformations in a
source language agnostic way.

I'm afraid we can't do it this way, at least not completely.

The OpenMP requirements and what is being discussed for further loop
transformations pretty much requires parts of it to be done as soon as possible.
My understanding is that that is where other implementations implement that
too and would also prefer GCC not to be the only implementation that takes
significantly different decision in that case from other implementations


The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two 
compilers (another one and GCC with this patch series) transformed the 
loops in the middle end after the handling of data sharing, one planned 
to do so. Yet another vendor had not yet decided where it will be 
implemented. Clang currently does everything in the front end, but it 
was mentioned that this might change in the future e.g. for code sharing 
with Flang. Implementing the loop transformations late could potentially
complicate the implementation of transformations which require 
adjustments of the data sharing clauses, but this is known and 
consequentially, no such transformations are planned for OpenMP 6.0. In 
particular, the "apply" clause therefore only permits loop-transforming 
constructs to be applied to the loops generated from other loop

transformations in TR11.


The normal loop constructs (OMP_FOR, OMP_SIMD, OMP_DISTRIBUTE, OMP_LOOP)
already need to know given their collapse/ordered how many loops they are
actually associated with and the loop transformation constructs can change
that.
So, I think we need to do the loop transformations in the FEs, that doesn't
mean we need to write everything 3 times, once for each frontend.
Already now, e.g. various stuff is shared between C and C++ FEs in c-family,
though how much can be shared between c-family and Fortran is to be
discovered.
Or at least partially, to the extent that we compute how many canonical
loops the loop transformations result in, what artificial iterators they
will use etc., so that during gimplification we can take all that into
account and then can do the actual transformations later.


The patches in this patch series already do compute how many canonical
loop nests result from the loop transformations in the front end.
This is necessary to represent the loop nest that is affected by the
loop transformations by a single OMP_FOR to meet the expectations
of all later OpenMP code transformations. This is also the major
reason why the loop transformations are represented by clauses
instead of representing them as  "OMP_UNROLL/OMP_TILE as
GENERIC constructs like OMP_FOR" as you suggest below. Since the
loop transformations may also appear on inner loops of a collapsed
loop nest (i.e. within the collapsed depth), representing the
transformation by OMP_FOR-like constructs would imply that a collapsed
loop nest would have to be broken apart into single loops. Perhaps this
could be handled somehow, but the collapsed loop nest would have to be
re-assembled to meet the expectations of e.g. gimplification.
The clause representation is also much better suited for the upcoming
OpenMP "apply" clause where the transformations will not appear
as directives in front of actual loops but inside of other clauses.
In fact, the loop transformation clauses in the implementation already
specify the level of a loop nest to which they apply and it could
be possible to re-use this handling for "apply".

My initial reaction also was to implement the loop transformations
as OMP_FOR-like constructs and the patch actually introduces an
OMP_LOOP_TRANS construct which is used to represent loops that
are not going to be associated with another OpenMP directive after
the transformation, e.g.

void foo () {
  #pragma omp tile sizes (4, 8, 16)
  for (int i = 0; i < 64; ++i)
  {
...
  }

}

You suggest to implement the loop transformations during gimplification.
I am not sure if gimplification is actually well-suited to implement the 
depth-first evaluation of the loop transformations. I also believe that 
gimplification already handles too many things which conceptually are 
not related to the translation to GIMPLE. Having a separate pass seems 
to be the right move to achieve a better separation of concerns. I think 
this will be even more important in the future as the size of the loop 
transformation implementation keeps growing. As you mention below, 
several new constructs are already planned.



For C, I think the lowering of loop transformation constructs or at least
determining w

Re: [PATCH 0/7] openmp: OpenMP 5.1 loop transformation directives

2023-05-17 Thread Frederik Harwath via Gcc-patches

Hi Jakub,

On 16.05.23 13:00, Jakub Jelinek wrote:

On Tue, May 16, 2023 at 11:45:16AM +0200, Frederik Harwath wrote:

The place where different compilers implement the loop transformations
was discussed in an OpenMP loop transformation meeting last year. Two
compilers (another one and GCC with this patch series) transformed 
the loops
in the middle end after the handling of data sharing, one planned to 
do so.
Yet another vendor had not yet decided where it will be implemented. 
Clang
currently does everything in the front end, but it was mentioned that 
this

might change in the future e.g. for code sharing with Flang. Implementing
the loop transformations late could potentially
complicate the implementation of transformations which require 
adjustments
of the data sharing clauses, but this is known and consequentially, 
no such

When already in the FE we determine how many canonical loops a particular
loop transformation creates, I think the primary changes I'd like to 
see is
really have OMP_UNROLL/OMP_TILE GENERIC statements (see below) and 
consider

where is the best spot to lower it. I believe for data sharing it is best
done during gimplification before the containing loops are handled, it is
already shared code among all the FEs, I think will make it easier to 
handle

data sharing right and gimplification is also where doacross processing is
done. While there is restriction that ordered clause is incompatible with
generated loops from tile construct, there isn't one for unroll (unless
"The ordered clause must not appear on a worksharing-loop directive if 
the associated loops

include the generated loops of a tile directive."
means unroll partial implicitly because partial unroll tiles the loop, but
it doesn't say it acts as if it was a tile construct), so we'd have to 
handle

#pragma omp for ordered(2)
for (int i = 0; i < 64; i++)
#pragma omp unroll partial(4)
for (int j = 0; j < 64; j++)
{
#pragma omp ordered depend (sink: i - 1, j - 2)
#pragma omp ordered depend (source)
}
and I think handling it after gimplification is going to be increasingly
harder. Of course another possibility is ask lang committee to clarify
unless it has been clarified already in 6.0 (but in TR11 it is not).


I do not really expect that we will have to handle this. Questions 
concerning

the correctness of code after applying loop transformations came up several
times since I have been following the design meetings and the result was
always either that nothing will be changed, because the loop transformations
are not expected to ensure the correctness of enclosing directives, or that
the use of the problematic construct in conjunction with loop 
transformations

will be forbidden. Concerning the use of "ordered" on transformed loops, the
latter approach was suggested for all transformations, cf. issue #3494 
in the
private OpenMP spec repository. I see that you have already asked for 
clarification

on unroll. I suppose this could also be fixed after gimplification with
reasonable effort. But let's just wait for the result of that discussion 
before we

continue worrying about this.


Also, I think creating temporaries is easier to be done during
gimplification than later.


This has not caused problems with the current approach.


Another option is as you implemented a separate pre-omp-lowering pass,
and another one would be do it in the omplower pass, which has actually
several subpasses internally, do it in the scan phase. Disadvantage of
a completely separate pass is that we have to walk the whole IL again,
while doing it in the scan phase means we avoid that cost. We already
do there similar transformations, scan_omp_simd transforms simd constructs
into if (...) simd else simt and then we process it with normal 
scan_omp_for

on what we've created. So, if you insist doing it after gimplification
perhaps for compatibility with other non-LLVM compilers, I'd prefer to
do it there rather than in a completely separate pass.


I see. This would be possible. My current approach is indeed rather
wasteful because the pass is not restricted to functions that actually
use loop transformations. I could add an attribute to such functions
that could be used to avoid the execution of the pass and hence
the gimple walk on functions that do not use transformations.


This is necessary to represent the loop nest that is affected by the
loop transformations by a single OMP_FOR to meet the expectations
of all later OpenMP code transformations. This is also the major
reason why the loop transformations are represented by clauses
instead of representing them as  "OMP_UNROLL/OMP_TILE as
GENERIC constructs like OMP_FOR" as you suggest below. Since the

I really don't see why. We try to represent what we see in the source
as OpenMP constructs as those constructs. We already have a precedent
with composite loop constructs, where for the combined constructs which
aren't innermost we temporarily use NULL 
OMP_FOR_{INIT,COND,INCR,ORIG_DECL

Re: [PATCH 1/7] openmp: Add Fortran support for "omp unroll" directive

2023-04-06 Thread Frederik Harwath via Gcc-patches

Hi Thomas,

On 01.04.23 10:42, Thomas Schwinge wrote:

... I see FAIL for x86_64-pc-linux-gnu '-m32' (thus, host, not
offloading), '-O0' (only):
   

[...]

 FAIL: libgomp.fortran/loop-transforms/unroll-1.f90   -O0  execution test

[...]

 FAIL: libgomp.fortran/loop-transforms/unroll-simd-1.f90   -O0  execution 
test



Thank you for reporting the failures! They are caused by mistakes in the 
test code, not the implementation. I have attached a patch which fixes 
the failures.


I have been able to reproduce the failures with -m32. With the patch 
they went away, even with 100 of repeated test executions ;-).



Best regards,

Frederik
From 3f471ed293d2e97198a65447d2f0d2bb69a2f305 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Thu, 6 Apr 2023 14:52:07 +0200
Subject: [PATCH] openmp: Fix loop transformation tests

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/loop-transforms/tile-2.f90: Add reduction clause.
	* testsuite/libgomp.fortran/loop-transforms/unroll-1.f90: Initialize var.
	* testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90: Add reduction
	and initialization.
---
 libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90   | 2 +-
 libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 | 2 ++
 .../libgomp.fortran/loop-transforms/unroll-simd-1.f90  | 3 ++-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
index 6aedbf4724f..a7cb5e7635d 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/tile-2.f90
@@ -69,7 +69,7 @@ module test_functions
 integer :: i,j
 
 sum = 0
-!$omp parallel do collapse(2)
+!$omp parallel do collapse(2) reduction(+:sum)
 !$omp tile sizes(6,10)
 do i = 1,10,3
do j = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
index f07aab898fa..b91ea275577 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-1.f90
@@ -8,6 +8,7 @@ module test_functions
 
 integer :: i,j
 
+sum = 0
 !$omp do
 do i = 1,10,3
!$omp unroll full
@@ -22,6 +23,7 @@ module test_functions
 
 integer :: i,j
 
+sum = 0
 !$omp parallel do reduction(+:sum)
 !$omp unroll partial(2)
 do i = 1,10,3
diff --git a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90 b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
index 5fb64ddd6fd..7a43458f0dd 100644
--- a/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
+++ b/libgomp/testsuite/libgomp.fortran/loop-transforms/unroll-simd-1.f90
@@ -9,7 +9,8 @@ module test_functions
 
 integer :: i,j
 
-!$omp simd
+sum = 0
+!$omp simd reduction(+:sum)
 do i = 1,10,3
!$omp unroll full
do j = 1,10,3
-- 
2.36.1



[PATCH] Docs, OpenMP: Small fixes to internal OMP_FOR doc

2023-04-19 Thread Frederik Harwath via Gcc-patches

Hi Sandra,
the OMP_FOR documentation says that the loop index variable
must be signed and it does not list "!=" in the allowed conditional
expressions. But there is nothing that would automatically cast an 
unsigned variable

to signed or that converts the "!=" as you can see from the dump
for this program:

int main ()
{
#pragma omp for
for (unsigned i = 0; i != 10; i++) {}
}

The 005t.gimple dump is:

int __GIMPLE ()
{
  int D_2064;

  {
    {
  unsigned int i;

  #pragma omp for private(i)
  for (i = 0u; i != 10u; i = i + 1u)
    }
  }
  D_2064 = 0;
  return D_2064;
}

(Strictly speaking, the OMP_FOR is represented as a gomp_for at this point,
but this does not really matter.)

Can I commit the patch?

Best regards,
Frederik
From 8af01114c295086526a67f56f6256fc945b1ccb5 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Wed, 19 Apr 2023 13:18:55 +0200
Subject: [PATCH] Docs, OpenMP: Small fixes to internal OMP_FOR doc.

gcc/ChangeLog:

	* doc/generic.texi (OpenMP): Add != to allowed
	conditions and state that vars can be unsigned.

	* tree.def (OMP_FOR): Likewise.
---
 gcc/doc/generic.texi | 4 ++--
 gcc/tree.def | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 2c14b7abce2..8b2882da4fe 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -2323,7 +2323,7 @@ Operand @code{OMP_FOR_INIT} is a vector containing iteration
 variable initializations of the form @code{VAR = N1}.
 
 Operand @code{OMP_FOR_COND} is vector containing loop
-conditional expressions of the form @code{VAR @{<,>,<=,>=@} N2}.
+conditional expressions of the form @code{VAR @{<,>,<=,>=,!=@} N2}.
 
 Operand @code{OMP_FOR_INCR} is a vector containing loop index
 increment expressions of the form @code{VAR @{+=,-=@} INCR}.
@@ -2349,7 +2349,7 @@ adjust their data-sharing attributes and diagnose errors.
 @code{OMP_FOR_ORIG_DECLS} is a vector field, with each element holding
 a list of @code{VAR_DECLS} for the corresponding collapse level.
 
-The loop index variable @code{VAR} must be a signed integer variable,
+The loop index variable @code{VAR} must be an integer variable,
 which is implicitly private to each thread.  For rectangular loops,
 the bounds @code{N1} and @code{N2} and the increment expression
 @code{INCR} are required to be loop-invariant integer expressions
diff --git a/gcc/tree.def b/gcc/tree.def
index ee02754354f..90ceeec0b51 100644
--- a/gcc/tree.def
+++ b/gcc/tree.def
@@ -1159,7 +1159,7 @@ DEFTREECODE (OMP_TASK, "omp_task", tcc_statement, 2)
variable initializations of the form VAR = N1.
 
Operand 3: OMP_FOR_COND is vector containing loop
-   conditional expressions of the form VAR {<,>,<=,>=} N2.
+   conditional expressions of the form VAR {<,>,<=,>=,!=} N2.
 
Operand 4: OMP_FOR_INCR is a vector containing loop index
increment expressions of the form VAR {+=,-=} INCR.
@@ -1185,7 +1185,7 @@ DEFTREECODE (OMP_TASK, "omp_task", tcc_statement, 2)
OMP_FOR_ORIG_DECLS is a vector field, with each element holding
a list of VAR_DECLS for the corresponding collapse level.
 
-   The loop index variable VAR must be a signed integer variable,
+   The loop index variable VAR must be an integer variable,
which is implicitly private to each thread.  For rectangular loops,
the bounds N1 and N2 and the increment expression
INCR are required to be loop-invariant integer expressions
-- 
2.36.1