Re: [PATCH] Assorted --disable-checking fixes [PR117249]

2024-10-25 Thread Thomas Schwinge
Hi Jakub!

Just one item, quickly:

On 2024-10-25T10:19:58+0200, Jakub Jelinek  wrote:
> We have currently 3 different definitions of gcc_assert macro, one used most
> of the time (unless --disable-checking) which evaluates the condition at
> runtime and also checks it at runtime, then one for --disable-checking GCC 4.5+
> which looks like
> ((void)(UNLIKELY (!(EXPR)) ? __builtin_unreachable (), 0 : 0))
> and a fallback one
> ((void)(0 && (EXPR)))
> Now, the last one actually doesn't evaluate any of the side-effects in the
> argument, just quiets up unused var/parameter warnings.
> I've tried to replace the middle definition with
> ({ [[assume (EXPR)]]; (void) 0; })
> for compilers which support assume attribute and statement expressions
> (surprisingly quite a few spots use gcc_assert inside of comma expressions),
> but ran into PR117287, so for now such a change isn't being proposed.
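The side-effect difference between the three definitions can be shown with a minimal C++ sketch. The macro and function names below are hypothetical stand-ins, not GCC's actual system.h definitions. `__builtin_expect` is used in place of GCC's UNLIKELY macro, and the sketch assumes GCC or Clang for the builtins.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-ins for the three gcc_assert variants described above.

// 1. Checking build: evaluate EXPR at runtime and abort if it is false.
#define ASSERT_CHECKING(EXPR) \
  ((void) (!(EXPR) ? (std::fprintf (stderr, "assertion failed\n"), \
                      std::abort (), 0) : 0))

// 2. --disable-checking with GCC 4.5+: EXPR is still evaluated, and a false
//    result is declared unreachable, so the optimizer may assume EXPR holds.
#define ASSERT_UNREACHABLE(EXPR) \
  ((void) (__builtin_expect (!(EXPR), 0) ? __builtin_unreachable (), 0 : 0))

// 3. Fallback: EXPR is type-checked but never evaluated; this only silences
//    unused variable/parameter warnings.
#define ASSERT_NOEVAL(EXPR) ((void) (0 && (EXPR)))

static int evaluations;

// An always-true assertion condition with a side effect.
static bool counted_true ()
{
  ++evaluations;
  return true;
}

// Each function returns how often its variant evaluated the condition.
int evals_checking ()    { evaluations = 0; ASSERT_CHECKING (counted_true ());    return evaluations; }
int evals_unreachable () { evaluations = 0; ASSERT_UNREACHABLE (counted_true ()); return evaluations; }
int evals_noeval ()      { evaluations = 0; ASSERT_NOEVAL (counted_true ());      return evaluations; }
```

The checking variant and the `__builtin_unreachable` variant both run the condition once. The fallback never runs it, so an assertion with side effects behaves differently under that definition.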

> [...] I've attempted to do
> x86_64-linux bootstrap with --disable-checking and gcc_assert changed to the
> ((void)(0 && (EXPR)))
> version when --disable-checking.  That version ran into spurious middle-end warnings
> ../../gcc/../include/libiberty.h:733:36: error: argument to ‘alloca’ is too large [-Werror=alloca-larger-than=]
> ../../gcc/tree-ssa-reassoc.cc:5659:20: note: in expansion of macro ‘XALLOCAVEC’
>   int op_num = ops.length ();
>   int op_normal_num = op_num;
>   gcc_assert (op_num > 0);
>   int stmt_num = op_num - 1;
>   gimple **stmts = XALLOCAVEC (gimple *, stmt_num);
> where we have gcc_assert exactly to work-around middle-end warnings.
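Here is a worked example of why the assertion matters in this snippet. The sketch assumes that XALLOCAVEC (T, N) expands to alloca (sizeof (T) * (N)), as in libiberty.h. If nothing tells the compiler that op_num > 0, then op_num == 0 is possible and stmt_num becomes -1. The int-to-size_t conversion then wraps the requested size to a value close to SIZE_MAX, which is the kind of request -Walloca-larger-than= flags. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Byte count requested by XALLOCAVEC (T, N), assuming the expansion
// alloca (sizeof (T) * (N)): the int count is converted to size_t first.
std::size_t xallocavec_bytes (std::size_t elem_size, int count)
{
  return elem_size * static_cast<std::size_t> (count);
}

// Mirrors the tree-ssa-reassoc.cc lines: stmt_num = op_num - 1.
// Without the knowledge that op_num > 0, op_num == 0 yields stmt_num == -1.
std::size_t reassoc_request (int op_num)
{
  int stmt_num = op_num - 1;
  return xallocavec_bytes (sizeof (void *), stmt_num);
}
```

With op_num == 3 the request is two pointers' worth of bytes. With op_num == 0 the unsigned multiplication wraps to a value just below SIZE_MAX.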

See last year's

"Fix false positive for -Walloc-size-larger-than, part II [PR79132]".


Grüße
 Thomas


> Guess I'd need to also disable -Werror for this experiment, which actually
> isn't a problem with unmodified system.h, because even for
> --disable-checking we use the __builtin_unreachable at least in
> stage2/stage3 and so the warnings aren't emitted, and even if it used
> [[assume ()]]; it would work too because in stage2/stage3 we could again
> rely on assume and statement expression support.


Use unique_ptr in more places in pretty_printer/diagnostics: 'gcc/config/gcn/mkoffload.cc' [PR116613] (was: [RFC/PATCH] Use unique_ptr in more places in pretty_printer/diagnostics [PR116613])

2024-10-25 Thread Thomas Schwinge
Hi!

On 2024-10-14T19:18:46-0400, David Malcolm  wrote:
> [...]
> Unfortunately we can't directly include  in our internal headers
> but instead any of our TUs that make use of std::unique_ptr must #define
> INCLUDE_MEMORY before including system.h.
>
> Hence the bulk of this patch is taken up with adding a define of
> INCLUDE_MEMORY to hundreds of source files: everything that includes
> diagnostic.h or pretty-print.h (and thus anything transitively such as
> includers of lto-wrapper.h, c-tree.h, cp-tree.h and rtl-ssa.h).
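The opt-in header pattern can be sketched in a single C++ translation unit. This is a simplified model, not GCC's actual headers. The commented sections only mark where system.h and diagnostic.h would contribute, and the #error text is copied from the GCN mkoffload build log. output_format_sketch and context_sketch are hypothetical types that stand in for the unique_ptr-based interfaces.

```cpp
// Must come first, before the stand-in for system.h, as the patches add
// to each mkoffload.cc.
#define INCLUDE_MEMORY

// ---- stand-in for system.h: standard headers are included only on request.
#ifdef INCLUDE_MEMORY
# include <memory>
#endif

// ---- stand-in for the guard at the top of diagnostic.h / pretty-print.h.
#ifndef INCLUDE_MEMORY
# error "You must define INCLUDE_MEMORY before including system.h to use diagnostic.h"
#endif

#include <cassert>
#include <utility>

// Hypothetical types using std::unique_ptr in an interface.
struct output_format_sketch
{
  virtual ~output_format_sketch () = default;
  virtual int id () const { return 1; }
};

struct context_sketch
{
  std::unique_ptr<output_format_sketch> fmt;

  // Takes ownership of the new format object.
  void set_output_format (std::unique_ptr<output_format_sketch> f)
  {
    fmt = std::move (f);
  }
};
```

If the `#define INCLUDE_MEMORY` line is removed, this translation unit fails with the #error. That mirrors the failure reported for 'gcc/config/gcn/mkoffload.cc'.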

> I've successfully built stage 1 on all configurations with this patch
> *without* Modula 2.

..., and without offloading configured -- which would bring a little bit
of extra code.  (Indeed offloading configurations aren't covered in
'contrib/config-list.mk', hmm...)

> Pushed to trunk as r15-4610-gbf43fe6aa966ea.

So you've got the nvptx 'mkoffload' adjusted:

> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc
> @@ -29,6 +29,7 @@
>  
>  #define IN_TARGET_CODE 1
>  
> +#define INCLUDE_MEMORY
>  #include "config.h"
>  #include "system.h"
>  #include "coretypes.h"

..., but we likewise need to adjust the GCN one; I've pushed to
trunk branch commit b3aa301db1b09b533b3635791a98d6bf906e9a15
"Use unique_ptr in more places in pretty_printer/diagnostics: 'gcc/config/gcn/mkoffload.cc' [PR116613]",
see attached.


Grüße
 Thomas


From b3aa301db1b09b533b3635791a98d6bf906e9a15 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 24 Oct 2024 20:56:21 +0200
Subject: [PATCH] Use unique_ptr in more places in pretty_printer/diagnostics:
 'gcc/config/gcn/mkoffload.cc' [PR116613]

After recent commit bf43fe6aa966eaf397ea3b8ebd6408d3d124e285
"Use unique_ptr in more places in pretty_printer/diagnostics [PR116613]":

[...]
In file included from ../../source-gcc/gcc/config/gcn/mkoffload.cc:31:0:
../../source-gcc/gcc/diagnostic.h:29:3: error: #error "You must define INCLUDE_MEMORY before including system.h to use diagnostic.h"
 # error "You must define INCLUDE_MEMORY before including system.h to use diagnostic.h"
   ^
In file included from ../../source-gcc/gcc/diagnostic.h:34:0,
 from ../../source-gcc/gcc/config/gcn/mkoffload.cc:31:
../../source-gcc/gcc/pretty-print.h:29:3: error: #error "You must define INCLUDE_MEMORY before including system.h to use pretty-print.h"
 # error "You must define INCLUDE_MEMORY before including system.h to use pretty-print.h"
   ^
In file included from ../../source-gcc/gcc/diagnostic.h:34:0,
 from ../../source-gcc/gcc/config/gcn/mkoffload.cc:31:
../../source-gcc/gcc/pretty-print.h:280:16: error: 'unique_ptr' in namespace 'std' does not name a template type
   virtual std::unique_ptr clone () const;
^
In file included from ../../source-gcc/gcc/config/gcn/mkoffload.cc:31:0:
../../source-gcc/gcc/diagnostic.h:585:32: error: 'std::unique_ptr' has not been declared
   void set_output_format (std::unique_ptr output_format);
^
[...]

	PR other/116613
	gcc/
	* config/gcn/mkoffload.cc: Add '#define INCLUDE_MEMORY'.
---
 gcc/config/gcn/mkoffload.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index c1d80aae59c..17a33421134 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -24,6 +24,7 @@
This is not a complete assembler.  We presume the source is well
formed from the compiler and can die horribly if it is not.  */
 
+#define INCLUDE_MEMORY
 #include "config.h"
 #include "system.h"
 #include "coretypes.h"
-- 
2.34.1



Re: Fortran test typebound_operator_7.f03 broken by non-Fortran commit. Confirm anyone?

2024-10-15 Thread Thomas Schwinge
> [...]->expr = build_call_vec (TREE_TYPE (fntype), se->expr, arglist);
>  
> +  if (is_builtin)
> +se->expr = update_builtin_function (se->expr, sym);
> +
>/* Allocatable scalar function results must be freed and nullified
>   after use. This necessitates the creation of a temporary to
>   hold the result to prevent duplicate calls.  */

..., however: 'conv_function_val' is not always called here, and
therefore 'is_builtin' not always initialized, giving rise to
PR117136 "[15 regression] ICE for gfortran.dg/typebound_operator_11.f90 since r15-4298-g3269a722b7a036".
Based on Harald's analysis and patch, I've pushed to trunk branch
commit fa90febea9801d4255bf6a1e9f0fd998629c3c7c
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Fix 'is_builtin' initialization",
see attached.
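The following C++ sketch shows the bug pattern and why the fix moves the initialization into the caller. Both functions are hypothetical simplifications of conv_function_val and gfc_conv_procedure_call. It assumes that, after the fix, the callee only ever sets the flag to true and that the base-object branch does not touch it.

```cpp
#include <cassert>

// Hypothetical stand-in for conv_function_val after the fix: it may report
// a builtin, but no longer resets the flag itself.
static void conv_function_val_sketch (bool *is_builtin, bool callee_is_builtin)
{
  if (callee_is_builtin)
    *is_builtin = true;
}

// Hypothetical stand-in for the call-generation step of
// gfc_conv_procedure_call.  Only one branch calls the helper, so the caller
// must initialize the flag.  Without the initialization, the other branch
// would leave it indeterminate before 'if (is_builtin)' reads it.
bool generate_call_sketch (bool has_base_object, bool callee_is_builtin)
{
  bool is_builtin;
  is_builtin = false;  // the fix: initialize before branching
  if (!has_base_object)
    conv_function_val_sketch (&is_builtin, callee_is_builtin);
  // else: the base-object path does not touch is_builtin.
  return is_builtin;
}
```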


Grüße
 Thomas


From fa90febea9801d4255bf6a1e9f0fd998629c3c7c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 15 Oct 2024 09:29:53 +0200
Subject: [PATCH] Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP'
 __builtin_is_initial_device: Fix 'is_builtin' initialization

Bug fix for commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device".

	PR fortran/82250
	PR fortran/82251
	PR fortran/117136
	gcc/fortran/
	* trans-expr.cc (gfc_conv_procedure_call): Initialize
	'is_builtin'.
	(conv_function_val): Clean up.

Co-authored-by: Harald Anlauf 
---
 gcc/fortran/trans-expr.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index b9f585d0d2f..569b92a48ab 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -4437,7 +4437,6 @@ conv_function_val (gfc_se * se, bool *is_builtin, gfc_symbol * sym,
 {
   tree tmp;
 
-  *is_builtin = false;
   if (gfc_is_proc_ptr_comp (expr))
 tmp = get_proc_ptr_comp (expr);
   else if (sym->attr.dummy)
@@ -8218,6 +8217,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
   arglist = retargs;
 
   /* Generate the actual call.  */
+  is_builtin = false;
   if (base_object == NULL_TREE)
 conv_function_val (se, &is_builtin, sym, expr, args);
   else
-- 
2.34.1



OpenACC 'nohost' clause: harmonize 'libgomp.oacc-{c-c++-common,fortran}/routine-nohost-1.*'

2024-10-14 Thread Thomas Schwinge
Hi!

On 2021-07-22T00:20:13+0200, I wrote:
> [...], I've now pushed "OpenACC 'nohost' clause" to
> master branch in commit a61f6afbee370785cf091fe46e2e022748528307, [...]

Via Tobias' recent commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device",
I remembered this thing from three years ago:

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c
> @@ -0,0 +1,63 @@
> +/* Test 'nohost' clause via 'acc_on_device'.
> +
> +   With optimizations disabled, we currently don't expect that 'acc_on_device' "evaluates at compile time to a constant".
> +   { dg-skip-if "TODO PR82391" { *-*-* } { "-O0" } }
> +*/
> +
> +/* { dg-additional-options "-fdump-tree-oaccdevlow" } */
> +
> +/* { dg-additional-options "-fno-inline" } for stable results regarding OpenACC 'routine'.  */
> +[...]

Here we do specify '-fno-inline'...

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90
> @@ -0,0 +1,63 @@
> +! Test 'nohost' clause via 'acc_on_device'.
> +
> +! { dg-do run }
> +
> +! With optimizations disabled, we currently don't expect that 'acc_on_device' "evaluates at compile time to a constant".
> +! { dg-skip-if "TODO PR82391" { *-*-* } { "-O0" } }
> +
> +! { dg-additional-options "-fdump-tree-oaccdevlow" }
> +
> +program main
> +[...]

..., but here we didn't.  To address that, I've pushed to trunk branch
commit de0320712d026a2d1eeb57aef277fa5a91808ac2 (HEAD, upstream/trunk)
"OpenACC 'nohost' clause: harmonize 'libgomp.oacc-{c-c++-common,fortran}/routine-nohost-1.*'",
see attached.


Grüße
 Thomas


From de0320712d026a2d1eeb57aef277fa5a91808ac2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 14 Oct 2024 14:38:13 +0200
Subject: [PATCH] OpenACC 'nohost' clause: harmonize
 'libgomp.oacc-{c-c++-common,fortran}/routine-nohost-1.*'

The test case 'libgomp.oacc-fortran/routine-nohost-1.f90' added in 2021
commit a61f6afbee370785cf091fe46e2e022748528307 "OpenACC 'nohost' clause" was
dependent on inlining being enabled, and otherwise ('-fno-inline') failed to
optimize/link:

/tmp/ccb2hsPd.o: In function `MAIN__._omp_fn.0':
routine-nohost-1.f90:(.text+0xf4): undefined reference to `fact_nohost_'

However, as of recent commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device",
we're now properly handling OpenACC/Fortran 'acc_on_device', and may specify
'-fno-inline', like done in 'libgomp.oacc-c-c++-common/routine-nohost-1.c'.

	libgomp/
	* testsuite/libgomp.oacc-fortran/routine-nohost-1.f90: Add
	'-fno-inline'.
---
 libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90 | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90
index b0537b8ff0b..e5f3e5740da 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/routine-nohost-1.f90
@@ -7,6 +7,8 @@
 
 ! { dg-additional-options "-fdump-tree-oaccloops" }
 
+! { dg-additional-options "-fno-inline" } for stable results regarding OpenACC 'routine'.
+
 program main
   use openacc
   implicit none
-- 
2.34.1



Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Revert 'gimple_fold_builtin_acc_on_device' change

2024-10-14 Thread Thomas Schwinge
Hi!

On 2024-10-14T10:23:56+0200, I wrote:
> On 2024-10-13T10:21:01+0200, Tobias Burnus  wrote:
>> Now pushed as r15-4298-g3269a722b7a036.

>>>>> * (new) For OpenACC, use a builtin for acc_on_device + actually do 
>>>>> compile-time optimization when offloading is not configured.
>
> No. 2.  This resolved
> PR82250 "Fortran OpenACC acc_on_device early folding", right?
> (..., which you recently had duplicated as
> PR116269 "[OpenACC] acc_on_device – compile-time optimization fails",
> right?)
>
> Please:
>
> git mv gfortran.dg/goacc/acc_on_device-2{-off,_-fno-openacc}.f95
>
> ..., and add a 's%-fno-openacc%-fno-builtin-acc_on_device' variant.
>
> Hmm, why can't 'gfortran.dg/goacc/acc_on_device-2.f95' be un-XFAILed?

>>>>> PS: The testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c 
>>>>> example is not completely clear to me; however, the new optimization 
>>>>> causes that without offloading enabled, the dump message is not 
>>>>> shown. I tried to understand it better with 
>>>>> -fno-builtin-acc_on_device, but that then caused link errors as the 
>>>>> device function wasn't optimizated away, leaving me puzzled. — At 
>>>>> the end, I just changed the dg-* and did not try to understand the 
>>>>> issue.
>
> Why then not wait for someone else to help look into that?  :-)

> On 2024-10-10T10:31:13+0200, Tobias Burnus  wrote:
>> Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device

>> Extend the code to also use the builtin acc_on_device with OpenACC,
>> which was previously only used in C/C++.  Additionally, fix folding
>> when offloading is not enabled.

I don't understand the latter part: what needs to be fixed?

>> gcc/ChangeLog:
>>
>>  * gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold
>>  when offloading is not configured.

We already did fold, didn't we?

>> --- a/gcc/gimple-fold.cc
>> +++ b/gcc/gimple-fold.cc
>> @@ -4190,7 +4190,7 @@ static bool
>>  gimple_fold_builtin_acc_on_device (gimple_stmt_iterator *gsi, tree arg0)
>>  {
>>/* Defer folding until we know which compiler we're in.  */
>> -  if (symtab->state != EXPANSION)
>> +  if (ENABLE_OFFLOADING && symtab->state != EXPANSION)
>>  return false;
>>  
>>unsigned val_host = GOMP_DEVICE_HOST;

That is, I don't understand the rationale for diverging GCC's (default)
'--disable-offload-targets' vs. '--enable-offload-targets=[...]'
configurations here?
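For reference, the two folding gates and the folded value can be modelled in a few lines of C++. This is a sketch under assumptions, not GCC code. The device numbers are assumed to follow libgomp's openacc.h. The host and device values are assumed to follow gimple_fold_builtin_acc_on_device: val_host = GOMP_DEVICE_HOST and val_dev = GOMP_DEVICE_NONE in the host compiler, and GOMP_DEVICE_NOT_HOST and the offload device in an offload compiler. All names are hypothetical.

```cpp
#include <cassert>

// Device-type values, assumed to match libgomp's openacc.h.
enum device_sketch
{
  dev_none = 0,
  dev_host = 2,
  dev_not_host = 4,
  dev_nvidia = 5,
  dev_radeon = 8
};

// Original gate: fold only at EXPANSION, "until we know which compiler
// we're in".
bool may_fold_before (bool at_expansion)
{
  return at_expansion;
}

// Gate after r15-4298: without offloading configured, fold at any time.
bool may_fold_after (bool enable_offloading, bool at_expansion)
{
  return !enable_offloading || at_expansion;
}

// Constant that acc_on_device (arg) folds to, per the assumed
// val_host/val_dev logic.
bool folded_acc_on_device (int arg, bool accel_compiler, int accel_device)
{
  int val_host = dev_host;
  int val_dev = dev_none;
  if (accel_compiler)
    {
      val_host = dev_not_host;
      val_dev = accel_device;
    }
  return arg == val_host || arg == val_dev;
}
```

Under these assumptions, deferring matters only when an offload compiler will later see the same IR. Folding on the host before EXPANSION would presumably bake the host answer into code that is also compiled for the device.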

>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c
>> @@ -36,8 +36,7 @@ static int fact_nohost(int n)
>>  
>>return fact(n);
>>  }
>> -/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine 'fact_nohost' has 'nohost' clause\.$} 1 oaccloops { target c } } }
>> -   { dg-final { scan-tree-dump-times {(?n)^OpenACC routine 'int fact_nohost\(int\)' has 'nohost' clause\.$} 1 oaccloops { target { c++ && { ! offloading_enabled } } } } }
>> +/* { dg-final { scan-tree-dump-times {(?n)^OpenACC routine 'fact_nohost' has 'nohost' clause\.$} 1 oaccloops { target { c && offloading_enabled } } } }
>> { dg-final { scan-tree-dump-times {(?n)^OpenACC routine 'fact_nohost\(int\)' has 'nohost' clause\.$} 1 oaccloops { target { c++ && offloading_enabled } } } }
>> TODO See PR101551 for 'offloading_enabled' differences.  */

OK to push the attached
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Revert 'gimple_fold_builtin_acc_on_device' change"?


Grüße
 Thomas


From d4cf1d795a70b35082ec33315efe9e49fa6b0cbf Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 14 Oct 2024 10:45:06 +0200
Subject: [PATCH] Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP'
 __builtin_is_initial_device: Revert 'gimple_fold_builtin_acc_on_device'
 change

The motivation of the 'gimple_fold_builtin_acc_on_device' change in
commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device"
is unclear, and it unnecessarily diverges GCC's (default)
'--disable-offload-targets' vs. '--enable-offload-targets=[...]'
configurations.

	PR testsuite/8

Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Harmonize 'libgomp.oacc-fortran/acc_on_device-1-*'

2024-10-14 Thread Thomas Schwinge
Hi!

On 2024-10-14T10:23:56+0200, I wrote:
> On 2024-10-13T10:21:01+0200, Tobias Burnus  wrote:
>> Now pushed as r15-4298-g3269a722b7a036.

>> --- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90

>> -! TODO: Have to disable the acc_on_device builtin for we want to test the
>> -! libgomp library function?  The command line option
>> -! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not for
>> -! Fortran.

Here, you've just removed the comment, whereas...

>> --- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
>> @@ -1,5 +1,5 @@
>>  ! { dg-do run }
>> -! { dg-additional-options "-cpp" }
>> +! { dg-additional-options "-cpp -fno-builtin-acc_on_device" }

>> -! TODO: Have to disable the acc_on_device builtin for we want to test
>> -! the libgomp library function?  The command line option
>> -! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
>> -! for Fortran.

... here, and...

>> --- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f
>> @@ -1,5 +1,5 @@
>>  ! { dg-do run }
>> -! { dg-additional-options "-cpp" }
>> +! { dg-additional-options "-cpp -fno-builtin-acc_on_device" }

>> -! TODO: Have to disable the acc_on_device builtin for we want to test
>> -! the libgomp library function?  The command line option
>> -! '-fno-builtin-acc_on_device' is valid for C/C++/ObjC/ObjC++ but not
>> -! for Fortran.

... here, you also specify '-fno-builtin-acc_on_device'.  This should be
done in the former, too, and an explanation should be added, like in
'libgomp.oacc-c-c++-common/acc_on_device-1.c'.  Pushed to trunk branch
commit 9f549d216c9716e787aaa38593bc9f83195b60ae
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Harmonize 'libgomp.oacc-fortran/acc_on_device-1-*'",
see attached.


Grüße
 Thomas


From 9f549d216c9716e787aaa38593bc9f83195b60ae Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 14 Oct 2024 10:34:34 +0200
Subject: [PATCH] Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP'
 __builtin_is_initial_device: Harmonize
 'libgomp.oacc-fortran/acc_on_device-1-*'

The test case 'libgomp.oacc-fortran/acc_on_device-1-1.f90' added in
commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device"
was missing '-fno-builtin-acc_on_device', and all
'libgomp.oacc-fortran/acc_on_device-1-*' need comments explaining why that
option is specified.

	PR testsuite/82250
	libgomp/
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Add
	'-fno-builtin-acc_on_device'.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Comment.
	* testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Comment.
---
 libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 | 3 +++
 libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f   | 5 -
 libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f   | 5 -
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90 b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
index 89748204f05..774c2b869e8 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90
@@ -1,6 +1,9 @@
 ! { dg-do run }
 ! { dg-additional-options "-cpp" }
 
+! Disable the acc_on_device builtin; we want to test the libgomp library function.
+! { dg-additional-options -fno-builtin-acc_on_device }
+
 ! { dg-additional-options "-fopt-info-all-omp" }
 ! { dg-additional-options "--param=openacc-privatization=noisy" }
 ! { dg-additional-options "-foffload=-fopt-info-all-omp" }
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
index e31e0fc715b..b57beac6f43 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
+++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f
@@ -1,5 +1,8 @@
 ! { dg-do run }
-! { dg-additional-options "-cpp -fno-builtin-acc_on_device" }
+! { dg-additional-options "-cpp" }
+
+! Disable the acc_on_device builtin; we want to test the libgomp library function.
+! { dg-additional-options -fno-builtin-acc_on_device }
 
 ! { dg-additional-options "-fopt-info-all-omp" }
 ! { dg-additional-options "--param=openac

Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Fix effective-target keyword in 'libgomp.oacc-fortran/acc_on_device-2.f90'

2024-10-14 Thread Thomas Schwinge
Hi!

On 2024-10-14T10:23:56+0200, I wrote:
> On 2024-10-13T10:21:01+0200, Tobias Burnus  wrote:
>> Now pushed as r15-4298-g3269a722b7a036.

>>>>> Tested on x86-64 without and with offloading configured, running 
>>>>> with nvptx offloading.

I see an UNRESOLVED:

+PASS: libgomp.oacc-fortran/acc_on_device-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O   scan-tree-dump-not optimized "acc_on_device"
+PASS: libgomp.oacc-fortran/acc_on_device-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O   scan-tree-dump-times gimple "acc_on_device" 1
+PASS: libgomp.oacc-fortran/acc_on_device-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O  (test for excess errors)
+PASS: libgomp.oacc-fortran/acc_on_device-2.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O   at line 37 (test for warnings, line 36)
+UNRESOLVED: libgomp.oacc-fortran/acc_on_device-2.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O   scan-nvptx-none-offload-tree-dump-not optimized "acc_on_device"
+PASS: libgomp.oacc-fortran/acc_on_device-2.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O   scan-tree-dump-not optimized "acc_on_device"
+PASS: libgomp.oacc-fortran/acc_on_device-2.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O   scan-tree-dump-times gimple "acc_on_device" 1
+PASS: libgomp.oacc-fortran/acc_on_device-2.f90 -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  (test for excess errors)

>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-2.f90
>> @@ -0,0 +1,40 @@
>> +! { dg-do link }
>> +
>> +! Check whether 'acc_on_device()' is properly compile-time optimized. */
>> +
>> +! { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" }
>> +! { dg-additional-options -foffload-options=-fdump-tree-optimized { target { offload_device_nvptx || offload_target_amdgcn } } }
>> +
>> +! { dg-final { scan-tree-dump-times "acc_on_device" 1 "gimple" } }
>> +
>> +! { dg-final { scan-tree-dump-not "acc_on_device" "optimized" } }
>> +
>> +! { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump-not "acc_on_device" "optimized" { target offload_target_amdgcn } } }
>> +! { dg-final { only_for_offload_target nvptx-none scan-offload-tree-dump-not "acc_on_device" "optimized" { target offload_target_nvptx } } }

Pushed to trunk branch commit c3774b2e2d7d00ad9f9f6fce10aa6bc872bd951f
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device: Fix effective-target keyword in 'libgomp.oacc-fortran/acc_on_device-2.f90'",
see attached.


Grüße
 Thomas


>> +
>> +
>> +module m
>> +   integer :: 
>> +   !$acc declare device_resident()
>> +contains
>> +  subroutine set_var
>> +!$acc routine
>> +use openacc
>> +implicit none (type, external)
>> +if (acc_on_device(acc_device_host)) then
>> +  xxxx = 1234
>> +else
>> +   = 4242
>> +end if
>> +  end
>> +end module m
>> +
>> +
>> +program main
>> +  use m
>> +  call set_var
>> +  !$acc serial
>> +! { dg-warning "using 'vector_length \\(32\\)', ignoring 1" "" { target openacc_nvidia_accel_selected } .-1 }
>> +call set_var
>> +  !$acc end serial
>> +end


From c3774b2e2d7d00ad9f9f6fce10aa6bc872bd951f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 14 Oct 2024 10:26:13 +0200
Subject: [PATCH] Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP'
 __builtin_is_initial_device: Fix effective-target keyword in
 'libgomp.oacc-fortran/acc_on_device-2.f90'

The test case 'libgomp.oacc-fortran/acc_on_device-2.f90' added in
commit 3269a722b7a03613e9c4e2862bc5088c4a17cc11
"Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device"
had a mismatch between dump file production and its scanning; the former needs
to use 'offload_target_nvptx' (like 'offload_target_amdgcn'), not
'offload_device_nvptx'.

	PR testsuite/82250
	libgomp/
	* testsuite/libgomp.oacc-fortran/acc_on_device-2.f90: Fix
	effective-target keyword.
---
 libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-2.f90 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/testsuite/libgomp.oacc-fortran/acc_on_device-2

Re: [Patch] Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device

2024-10-14 Thread Thomas Schwinge
Hi Tobias!

On 2024-10-13T10:21:01+0200, Tobias Burnus  wrote:
> Now pushed as r15-4298-g3269a722b7a036.

> Tobias Burnus wrote:
>> Anyone feeling like reviewing this patch?

Yes.  But please allow for more than 1 1/2 work days.

>> Tobias Burnus write:
>>> Tobias Burnus wrote:
>>>> Sometimes waiting a bit leads to better code …

>>>> Tobias Burnus wrote:
>>>>> ...
>>>>> [I guess, we eventually want to add support for more builtins. For
>>>>> instance, acc_on_device would be a candidate, but I could imagine
>>>>> some additional builtins.]

>>>> I have now implemented acc_on_device and I think the new fix-up
>>>> function is way is nicer.

Thanks for looking into this!

>>>> Thus, this patch does:

I wonder why you didn't make these several orthogonal changes into
several separate patches?

>>>> * (v1) Fix omp_is_initial_device → do only replace when used in
>>>> calls (and not when used as function pointer/actual to a dummy
>>>> function) + fix ICE due to integer(4) != logical(4) in the middle end.

No. 1.

>>>> * (new) For OpenACC, use a builtin for acc_on_device + actually do
>>>> compile-time optimization when offloading is not configured.

No. 2.  This resolved
PR82250 "Fortran OpenACC acc_on_device early folding", right?
(..., which you recently had duplicated as
PR116269 "[OpenACC] acc_on_device – compile-time optimization fails",
right?)

Please:

git mv gfortran.dg/goacc/acc_on_device-2{-off,_-fno-openacc}.f95

..., and add a 's%-fno-openacc%-fno-builtin-acc_on_device' variant.

Hmm, why can't 'gfortran.dg/goacc/acc_on_device-2.f95' be un-XFAILed?

>>>> * (new) libgomp.texi: Typo fixes accumulated, fix wording

No. 3.

>>>> and for
>>>> acc_on_device, add a note that compile-time folding may be done (and
>>>> how it can be disabled).

Into No. 2.

>>>> For OpenACC, I now mix compile time folding vs. runtime to ensure
>>>> that it works.

No. 4.

And this:

--- a/gcc/fortran/types.def
+++ b/gcc/fortran/types.def

-DEF_PRIMITIVE_TYPE (BT_BOOL,
-   (*lang_hooks.types.type_for_size) (BOOL_TYPE_SIZE, 1))
+DEF_PRIMITIVE_TYPE (BT_BOOL, boolean_type_node)

... is yet another unrelated change?  No. 5.

>>>> Tested on x86-64 without and with offloading configured, running
>>>> with nvptx offloading.

>>>> PS: The testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c
>>>> example is not completely clear to me; however, the new optimization
>>>> causes that without offloading enabled, the dump message is not
>>>> shown. I tried to understand it better with
>>>> -fno-builtin-acc_on_device, but that then caused link errors as the
>>>> device function wasn't optimizated away, leaving me puzzled. — At
>>>> the end, I just changed the dg-* and did not try to understand the
>>>> issue.

Why then not wait for someone else to help look into that?  :-)

On 2024-10-10T10:31:13+0200, Tobias Burnus  wrote:
> Fortran: Use OpenACC's acc_on_device builtin, fix OpenMP' __builtin_is_initial_device

Missing 'omp_' in '__builtin_[omp_]is_initial_device'.


I just received a SIGKID; to be continued in follow-on emails.


Grüße
 Thomas


> It turned out that 'if (omp_is_initial_device() .eqv. true)' gave an ICE
> due to comparing 'int' with 'logical(4)'. When digging deeper, it also
> turned out that when the procedure pointer is needed, the builtin cannot
> be used, either.  (Follow up to r15-2799-gf1bfba3a9b3f31 )

> Extend the code to also use the builtin acc_on_device with OpenACC,
> which was previously only used in C/C++.  Additionally, fix folding
> when offloading is not enabled.
>
> Fixes additionally the BT_BOOL data type, which was 'char'/integer(1)
> instead of bool, backing the booleaness; use bool_type_node as the rest
> of GCC.
>
> gcc/fortran/ChangeLog:
>
>   * gfortran.h (gfc_option_t): Add disable_acc_on_device.
>   * options.cc (gfc_handle_option): Handle -fno-builtin-acc_on_device.
>   * trans-decl.cc (gfc_get_extern_function_decl): Move
>   __builtin_omp_is_initial_device handling to ...
>   * trans-expr.cc (get_builtin_fn): ... this new function.
>   (conv_function_val): Call it.
>   (update_builtin_function): New.
>   (gfc_conv_procedure_call): Call it.
>   * types.def (BT_BOOL): Fix type by using bool_type_node.
>
> gcc/ChangeLog:
>
>   * gimple-fold.cc (gimple_fold_builtin_acc_on_device): Also fold
>   when offloading is not configured.
>
> libgomp/ChangeLog:
>
>   * libgomp.texi (TR13): Fix minor typos.
>   (omp_is_initial_device): Improve wording.
>   (acc_on_device): Note how to disable the builtin.
>   * testsuite/libgomp.oacc-fortran/acc_on_device-1-1.f90: Remove TODO.
>   * testsuite/libgomp.oacc-fortran/acc_on_device-1-2.f: Likewise.
>   Add -fno-builtin-acc_on_device.
>   * testsuite/libgomp.oacc-fortran/acc_on_device-1-3.f: Likewise.
>   * testsuite/libgomp.oacc-c-c++-common/routine-nohost-1.c: Update
>   dg- as !offloading_enabled now compile-time e

Re: [r15-4104 Regression] FAIL: gfortran.dg/gomp/allocate-static.f90 -Os (test for excess errors) on Linux/x86_64

2024-10-07 Thread Thomas Schwinge
Hi Tobias!

On 2024-10-07T17:07:05+0200, Tobias Burnus  wrote:
> haochen.jiang wrote:
>> On Linux/x86_64,
>> FAIL: gfortran.dg/gomp/allocate-static.f90   -O0  (test for excess errors)
>
> If anyone can reproduce this, I would be interested in the excess errors.

gfortran: fatal error: cannot read spec file 'libgomp.spec': No such file or directory

> On two machines – with and without offloading configured – I cannot 
> reproduce this neither with a bootsstrap nor non-bootstrap build, 
> neither with the testsuite nor under valgrind and also not with -m32 vs. 
> -m64.

Try again with build-tree (non-installed) testing.  ;-)

On 2024-10-07T10:47:56+0200, Tobias Burnus  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/gomp/allocate-static.f90
> @@ -0,0 +1,62 @@
> +! { dg-do run }

Implicit linking here.

I already was about to 'git mv' the file into
'libgomp/testsuite/libgomp.fortran/' -- but then realized that we
probably also should get rid of this local 'module omp_lib_kinds':

> +module omp_lib_kinds
> +  use iso_c_binding, only: c_int, c_intptr_t
> +  implicit none
> +  private :: c_int, c_intptr_t
> +  integer, parameter :: omp_allocator_handle_kind = c_intptr_t
> +
> +  integer (kind=omp_allocator_handle_kind), &
> + parameter :: omp_null_allocator = 0
> +  [...]
> +end module

..., right?

> +[...]


Grüße
 Thomas


nvptx: Disable effective-target 'freestanding' (was: [PATCH 3/9] nvptx: Re-enable test cases by removing effective target 'freestanding')

2024-10-07 Thread Thomas Schwinge
Hi!

On 2022-12-02T13:03:09+0100, I wrote:
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp

> -# Check to see if a target is "freestanding". This is as per the definition
> -# in Section 4 of C99 standard. Effectively, it is a target which supports no
> -# extra headers or libraries other than what is considered essential.
> -proc check_effective_target_freestanding { } {
> -if { [istarget nvptx-*-*] } {
> -   return 1
> -}
> -return 0
> -}

I have, for now, pushed a simpler variant of this to trunk branch in
commit 65c7616c251a6697134b2a3ac7fe6460d308d2ed
"nvptx: Disable effective-target 'freestanding'", see attached.


Grüße
 Thomas


>From 65c7616c251a6697134b2a3ac7fe6460d308d2ed Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 28 Nov 2022 13:49:06 +0100
Subject: [PATCH] nvptx: Disable effective-target 'freestanding'

After 2014's commit 157e859ffe3b5d43db1e19475711c1a3d21ab57a "remove picochip",
the effective-target 'freestanding' (later) was only ever used for nvptx.
However, the relevant I/O library functions have long been implemented in nvptx
newlib.

These test cases generally PASS, just a few need to get XFAILed; see
<https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/#system-calls>,
and then supposedly
<https://docs.nvidia.com/cuda/cuda-c-programming-guide/#formatted-output> for a
description of the non-standard PTX 'vprintf' return value:

> Unlike the C-standard 'printf()', which returns the number of characters
> printed, CUDA's 'printf()' returns the number of arguments parsed. If no
> arguments follow the format string, 0 is returned. If the format string is
> NULL, -1 is returned. If an internal error occurs, -2 is returned.

(I've tried a few variants to confirm that PTX 'vprintf' -- which supposedly is
underlying the CUDA 'printf' -- is what's implementing this behavior.)
Probably, we ought to fix that up in nvptx newlib.
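To make the mismatch concrete, here is a small illustrative C sketch (not part of this commit). 'snprintf' stands in for 'printf' so that nothing gets printed, and shows the C-standard "characters printed" result. The hypothetical helper 'cuda_style_printf_result' only approximates the CUDA documentation quoted above by counting conversion specifications in the format string. It ignores '*' widths and similar details, so please read it as an illustration, not as a model of the actual PTX 'vprintf'.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: approximates the documented CUDA 'printf' return
   value by counting conversion specifications ("arguments parsed").
   '%%' is a literal percent sign and consumes no argument.  '*' field
   widths/precisions are ignored in this simplified model.  */
static int
cuda_style_printf_result (const char *fmt)
{
  if (fmt == NULL)
    return -1;  /* Documented: a NULL format string yields -1.  */
  int n = 0;
  for (const char *p = fmt; *p != '\0'; ++p)
    {
      if (*p != '%')
        continue;
      if (p[1] == '%')
        {
          ++p;  /* Skip the escaped '%'.  */
          continue;
        }
      if (p[1] == '\0')
        break;  /* Stray trailing '%': not a conversion.  */
      ++n;
    }
  return n;
}

/* C-standard semantics: the number of characters that would be printed.  */
static int
standard_printf_result (const char *fmt, int value)
{
  char buf[64];
  return snprintf (buf, sizeof buf, fmt, value);
}
```

For 'printf ("hello %d\n", 42)' the C standard requires a return value of 9, whereas the documented CUDA behavior yields 1. Test cases that check 'printf''s return value therefore presumably cannot pass on nvptx until newlib compensates.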

	gcc/testsuite/
	* gcc.c-torture/execute/printf-1.c: XFAIL for nvptx.
	* gcc.c-torture/execute/printf-chk-1.c: Likewise.
	* gcc.c-torture/execute/vprintf-1.c: Likewise.
	* gcc.c-torture/execute/vprintf-chk-1.c: Likewise.
	* lib/target-supports.exp (check_effective_target_freestanding):
	Disable for nvptx.
---
 gcc/testsuite/gcc.c-torture/execute/printf-1.c  | 1 +
 gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c  | 1 +
 gcc/testsuite/gcc.c-torture/execute/vprintf-1.c | 1 +
 gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c | 1 +
 gcc/testsuite/lib/target-supports.exp   | 3 ---
 5 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.c-torture/execute/printf-1.c b/gcc/testsuite/gcc.c-torture/execute/printf-1.c
index 654e62766a8..e1201365c1f 100644
--- a/gcc/testsuite/gcc.c-torture/execute/printf-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/printf-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires io" { freestanding } }  */
+/* { dg-xfail-run-if {unexpected PTX 'vprintf' return value} { nvptx-*-* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c b/gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c
index aab43062bae..6418957edae 100644
--- a/gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/printf-chk-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires io" { freestanding } }  */
+/* { dg-xfail-run-if {unexpected PTX 'vprintf' return value} { nvptx-*-* } } */
 
 #include 
 #include 
diff --git a/gcc/testsuite/gcc.c-torture/execute/vprintf-1.c b/gcc/testsuite/gcc.c-torture/execute/vprintf-1.c
index 259397ebda3..0fb1ade94e0 100644
--- a/gcc/testsuite/gcc.c-torture/execute/vprintf-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/vprintf-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires io" { freestanding } }  */
+/* { dg-xfail-run-if {unexpected PTX 'vprintf' return value} { nvptx-*-* } } */
 
 #ifndef test
 #include 
diff --git a/gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c b/gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c
index 04ecc4df4d9..7ea3617e184 100644
--- a/gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c
+++ b/gcc/testsuite/gcc.c-torture/execute/vprintf-chk-1.c
@@ -1,4 +1,5 @@
 /* { dg-skip-if "requires io" { freestanding } }  */
+/* { dg-xfail-run-if {unexpected PTX 'vprintf' return value} { nvptx-*-* } } */
 
 #ifndef test
 #include 
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 459af8e58c6..1c9bbf64817 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -859,9 +859,6 @@ proc check_profiling_available { test_what } {
 # in Section 4 of C99 standard. Effectively, it is a target which supports no
 # extra headers or libraries other than what is considered essential.
 proc check_effective_target_freestanding { } {
-if { [istarget nvptx-*-*] } {
-	return 1
-}
 return 0
 }
 
-- 
2.34.1



Handle non-grouped stores as single-lane SLP: adjust 'gcc.dg/vect/slp-26.c', GCN (was: [PATCH 3/3] Handle non-grouped stores as single-lane SLP)

2024-10-07 Thread Thomas Schwinge
Hi!

On 2024-10-03T13:34:47+0200, Richard Biener  wrote:
> On Thu, 3 Oct 2024, Thomas Schwinge wrote:
>> On 2024-09-06T11:30:06+0200, Richard Biener  wrote:
>> > On Thu, 5 Sep 2024, Richard Biener wrote:
>> >> The following enables single-lane loop SLP discovery for non-grouped 
>> >> stores
>> >> and adjusts vectorizable_store to properly handle those.
>> 
>> > I have now pushed this as r15-3509-gd34cda72098867
>> 
>> >> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
>> >> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
>> >> @@ -50,4 +50,5 @@ int main (void)
>> >>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
>> >>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
>> >>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
>> >> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
>> >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || loongarch_sx } } } } } */
>> >> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target riscv_v } } } */
>> 
>> For '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'),
>> I see:
>> 
>> PASS: gcc.dg/vect/slp-26.c (test for excess errors)
>> PASS: gcc.dg/vect/slp-26.c execution test
>> PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 loops" 1
>> [-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1
>> 
>> gcc.dg/vect/slp-26.c: pattern found 2 times
>> 
>> ..., so I suppose I'll apply the same change to 'amdgcn-*-*' as you did
>> to 'riscv_v'?
>
> I guess yes

Pushed to trunk branch commit b137e4bbcc488b44a037baad62a8da90659d7468
"Handle non-grouped stores as single-lane SLP: adjust 'gcc.dg/vect/slp-26.c', GCN",
see attached.


Grüße
 Thomas


> I don't remember exactly the reason but IIRC it's about the
> unsigned division which gcn might also be able to do - the 32817
> value is explicitly excluded from pattern recognition.  We don't have
> an effective target for unsigned [short] integer division.
>
> Richard.


>From b137e4bbcc488b44a037baad62a8da90659d7468 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 3 Oct 2024 12:52:30 +0200
Subject: [PATCH] Handle non-grouped stores as single-lane SLP: adjust
 'gcc.dg/vect/slp-26.c', GCN

As of commit d34cda720988674bcf8a24267c9e1ec61335d6de
"Handle non-grouped stores as single-lane SLP", we see for
'--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'):

PASS: gcc.dg/vect/slp-26.c (test for excess errors)
PASS: gcc.dg/vect/slp-26.c execution test
PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 loops" 1
[-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

gcc.dg/vect/slp-26.c: pattern found 2 times

Apply the same change to 'amdgcn-*-*' as done for 'riscv_v'.

	gcc/testsuite/
	* gcc.dg/vect/slp-26.c: Adjust GCN.
---
 gcc/testsuite/gcc.dg/vect/slp-26.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-26.c b/gcc/testsuite/gcc.dg/vect/slp-26.c
index cdb5d9c694b..23917474ddc 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-26.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
@@ -50,5 +50,5 @@ int main (void)
 /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
 /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || loongarch_sx } } } } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target riscv_v } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || loongarch_sx } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target { riscv_v || amdgcn-*-* } } } } */
-- 
2.34.1



Re: [PATCH 3/3] Handle non-grouped stores as single-lane SLP

2024-10-03 Thread Thomas Schwinge
Hi!

On 2024-09-06T11:30:06+0200, Richard Biener  wrote:
> On Thu, 5 Sep 2024, Richard Biener wrote:
>> The following enables single-lane loop SLP discovery for non-grouped stores
>> and adjusts vectorizable_store to properly handle those.

> I have now pushed this as r15-3509-gd34cda72098867

>> --- a/gcc/testsuite/gcc.dg/vect/slp-26.c
>> +++ b/gcc/testsuite/gcc.dg/vect/slp-26.c
>> @@ -50,4 +50,5 @@ int main (void)
>>  /* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
>>  /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
>>  /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 0 "vect" { target { ! { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } } */
>> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || { riscv_v || loongarch_sx } } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { target { mips_msa || { amdgcn-*-* || loongarch_sx } } } } } */
>> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 2 "vect" { target riscv_v } } } */

For '--target=amdgcn-amdhsa' (tested '-march=gfx908', '-march=gfx1100'),
I see:

PASS: gcc.dg/vect/slp-26.c (test for excess errors)
PASS: gcc.dg/vect/slp-26.c execution test
PASS: gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorized 1 loops" 1
[-PASS:-]{+FAIL:+} gcc.dg/vect/slp-26.c scan-tree-dump-times vect "vectorizing stmts using SLP" 1

gcc.dg/vect/slp-26.c: pattern found 2 times

..., so I suppose I'll apply the same change to 'amdgcn-*-*' as you did
to 'riscv_v'?


Grüße
 Thomas


Re: [nvptx PATCH] Implement isfinite and isnormal optabs in nvptx.md.

2024-09-27 Thread Thomas Schwinge
Hi Roger!

If you don't mind, I could use your help here (but: low priority!):

On 2024-07-27T19:18:35+0100, "Roger Sayle"  wrote:
> Previously, for isnormal, GCC -O2 would generate: [...]
> and with this patch becomes:
>
> mov.f64 %r23, %ar0;
> setp.neu.f64 %r24, %r23, 0d;
> testp.normal.f64 %r25, %r23;
> and.pred %r26, %r24, %r25;
> selp.u32 %value, 1, 0, %r26;

Looking at this, shouldn't we be able to optimize ("combine") this into
something like (untested):

mov.f64 %r23, %ar0;
testp.normal.f64 %r25, %r23;
setp.neu.and.f64 %r26, %r23, 0d, %r25;
selp.u32 %value, 1, 0, %r26;

(I hope I correctly understood PTX 'setp', 'combine [...] with a
predicate value by applying a Boolean operator'!)

That is, "combine":

CmpOp = { eq, ne, lt, le, gt, ge, lo, ls, hi, hs, equ, neu, ltu, leu, gtu, geu, num, nan };

BoolOp = { and, or, xor };

setp.CmpOp.TYPE %3, %2, %1;
BoolOp.pred %5, %3, %4

... into:

setp.CmpOp.BoolOp.TYPE %5, %2, %1, %4;
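Here is a C model of my reading of these semantics. It is a hypothetical illustration, not GCC or PTX code, and covers only a few float CmpOps. It checks that the two-instruction form and the fused form agree for every operator combination, including NaN and signed-zero inputs. The last two assertions show the 'isnormal' case: with %4 being the 'testp.normal' result (true for zero per Roger's description), 'setp.neu.and' against 0.0 gives false for zero.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of a few PTX 'setp' comparison operators.  The
   unordered variant ('neu') and 'nan' are true if either operand is NaN;
   the ordered ones ('eq', 'ne', 'lt') are false then.  */
enum cmp_op { CMP_EQ, CMP_NE, CMP_NEU, CMP_LT, CMP_NAN };
enum bool_op { BOOL_AND, BOOL_OR, BOOL_XOR };

static bool
apply_cmp (enum cmp_op op, double a, double b)
{
  bool unordered = isnan (a) || isnan (b);
  switch (op)
    {
    case CMP_EQ:  return !unordered && a == b;
    case CMP_NE:  return !unordered && a != b;
    case CMP_NEU: return unordered || a != b;
    case CMP_LT:  return !unordered && a < b;
    case CMP_NAN: return unordered;
    }
  return false;
}

static bool
apply_bool (enum bool_op op, bool x, bool y)
{
  switch (op)
    {
    case BOOL_AND: return x && y;
    case BOOL_OR:  return x || y;
    case BOOL_XOR: return x != y;
    }
  return false;
}

/* Two instructions:  setp.CmpOp.TYPE %3, %2, %1;  BoolOp.pred %5, %3, %4;  */
static bool
setp_then_boolop (enum cmp_op c, enum bool_op b, double op2, double op1,
                  bool p4)
{
  bool p3 = apply_cmp (c, op2, op1);
  return apply_bool (b, p3, p4);
}

/* One instruction:  setp.CmpOp.BoolOp.TYPE %5, %2, %1, %4;  */
static bool
setp_fused (enum cmp_op c, enum bool_op b, double op2, double op1, bool p4)
{
  return apply_bool (b, apply_cmp (c, op2, op1), p4);
}
```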

I tried adding a corresponding 'define_insn' for just the 'and' case at
hand (eventually to be generalized to 'BoolOp'), see the attached
"WIP nvptx: 'setp', 'combine [...] with a predicate value by applying a Boolean operator'".
This does do the expected transformation for quite a number of instances
in the GCC/nvptx target libraries (again: completely untested!) -- but it
doesn't for the new 'gcc.target/nvptx/isnormal.c', and I don't know how
to read the '-fdump-rtl-combine-all' output to understand why.  Any "RTFM"
or other pointers, or guidance about how to approach such an issue, are
gladly accepted.  (Or tell me it's just 'TARGET_RTX_COSTS'...)


Grüße
 Thomas


> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md

> +(define_insn "setcc_isnormal"
> +  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
> + (unspec:BI [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
> +UNSPEC_ISNORMAL))]
> +  ""
> +  "%.\\ttestp.normal%t1\\t%0, %1;")
> +
> +(define_expand "isnormal2"
> +  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> + (unspec:SI [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
> +UNSPEC_ISNORMAL))]
> +  ""
> +{
> +  rtx pred1 = gen_reg_rtx (BImode);
> +  rtx pred2 = gen_reg_rtx (BImode);
> +  rtx pred3 = gen_reg_rtx (BImode);
> +  rtx zero = CONST0_RTX (mode);
> +  rtx cmp = gen_rtx_fmt_ee (NE, BImode, operands[1], zero);
> +  emit_insn (gen_cmp (pred1, cmp, operands[1], zero));
> +  emit_insn (gen_setcc_isnormal (pred2, operands[1]));
> +  emit_insn (gen_andbi3 (pred3, pred1, pred2));
> +  emit_insn (gen_setccsi_from_bi (operands[0], pred3));
> +  DONE;
> +})

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/isnormal.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int isnormal(double x)
> +{
> +  return __builtin_isnormal(x);
> +}
> +
> +/* { dg-final { scan-assembler-times "testp.normal.f64" 1 } } */


>From c4c389a6bd262356023202adab08a48f044e59b2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 27 Sep 2024 15:14:19 +0200
Subject: [PATCH] WIP nvptx: 'setp', 'combine [...] with a predicate value by
 applying a Boolean operator'

Re "Implement isfinite and isnormal optabs in nvptx.md"

mov.f64 %r23, %ar0;
setp.neu.f64 %r24, %r23, 0d;
testp.normal.f64 %r25, %r23;
and.pred %r26, %r24, %r25;
selp.u32 %value, 1, 0, %r26;

Can we optimize this into something like (untested):

mov.f64 %r23, %ar0;
testp.normal.f64 %r25, %r23;
setp.neu.and.f64 %r26, %r23, 0d, %r25;
selp.u32 %value, 1, 0, %r26;

That is, "combine":

CmpOp = { eq, ne, lt, le, gt, ge, lo, ls, hi, hs, equ, neu, ltu, leu, gtu, geu, num, nan };

BoolOp = { and, or, xor };

setp.CmpOp.TYPE %3, %2, %1;
BoolOp.pred %5, %3, %4

..., into:

setp.CmpOp.BoolOp.TYPE %5, %2, %1, %4;
---
 gcc/config/nvptx/nvptx.cc |  3 +++
 gcc/config/nvptx/nvptx.md | 23 ---
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 96a1134220e..b4c4f9ff021 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -3080,6 +3080,9 @@ nvptx_print_operand (FILE *file, rtx x, int code)
 	default:
 	  gcc_unrea

Re: [nvptx PATCH] Implement isfinite and isnormal optabs in nvptx.md.

2024-09-27 Thread Thomas Schwinge
Hi Roger!

On 2024-07-27T19:18:35+0100, "Roger Sayle"  wrote:
> Firstly, thanks to Haochen Gui for recently adding optab support for
> isfinite and isnormal to the middle-end.

Do we, by the way, have documentation (I suppose that should be in
"GNU Compiler Collection (GCC) Internals"?) about the rationale and
subsequent optimization opportunities for having vs. not having
representations of "codes" (like, 'isfinite') in the various GCC IRs
etc., like builtins, internal functions, GIMPLE, optabs, RTL (..., and
I've probably missed some more)?

Of course, a lot of it can be inferred from the context or otherwise,
like having builtins corresponding to C library functions and then be
able to optimize according to their defined semantics, but others are not
always clear to me: like, why do we have 'copysign' RTL but not
'isnormal'?

> This patch adds define_expand
> for both these functions to the nvptx backend, which conveniently has
> special instructions to simplify their implementation.

ACK.

> As this patch
> adds UNSPEC_ISFINITE and UNSPEC_ISNORMAL, I've also taken the opportunity
> to include/repost my tweak to clean-up/eliminate UNSPEC_COPYSIGN.

I'd seen your 2023 "Add RTX codes for [...] COPYSIGN", but not yet seen a
patch to use it for nvptx -- but indeed have stumbled over nvptx
'UNSPEC_COPYSIGN' a while ago; ACK.

> Previously, for isfinite, GCC on nvptx-none with -O2 would generate:
>
> mov.f64 %r26, %ar0;
> abs.f64 %r28, %r26;
> setp.gtu.f64 %r31, %r28, 0d7fef;
> selp.u32 %value, 0, 1, %r31;
>
> and with this patch, we now generate:
>
> mov.f64 %r23, %ar0;
> testp.finite.f64 %r24, %r23;
> selp.u32 %value, 1, 0, %r24;

Nice!
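For reference, here is a C sketch of what the old four-instruction sequence computes. It reflects my reading only. The truncated constant '0d7fef...' in the quoted PTX is presumably DBL_MAX (0x7fefffffffffffff), which is an assumption. 'setp.gtu' means "greater than, or unordered", so the old sequence returns 0 for NaN as well as for infinities. That matches 'isfinite', which 'testp.finite' now computes in a single instruction.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Model of PTX 'abs.f64' (clears the sign bit).  */
static double
abs_f64 (double x)
{
  return signbit (x) ? -x : x;
}

/* Model of PTX 'setp.gtu': a > b, or either operand is NaN.  */
static bool
setp_gtu (double a, double b)
{
  return isnan (a) || isnan (b) || a > b;
}

/* Old sequence: abs.f64; setp.gtu.f64 ..., DBL_MAX (assumed);
   selp.u32 %value, 0, 1, ...  */
static int
isfinite_old_sequence (double x)
{
  return setp_gtu (abs_f64 (x), DBL_MAX) ? 0 : 1;
}
```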

> Previously, for isnormal, GCC -O2 would generate:
>
> mov.f64 %r28, %ar0;
> abs.f64 %r22, %r28;
> setp.gtu.f64 %r32, %r22, 0d7fef;
> setp.ltu.f64 %r35, %r22, 0d0010;
> or.pred %r43, %r35, %r32;
> selp.u32 %value, 0, 1, %r43;
>
> and with this patch becomes:
>
> mov.f64 %r23, %ar0;
> setp.neu.f64 %r24, %r23, 0d;
> testp.normal.f64 %r25, %r23;
> and.pred %r26, %r24, %r25;
> selp.u32 %value, 1, 0, %r26;
>
> Notice that although nvptx provides a testp.normal.f{32,64} instruction,
> the semantics don't quite match those required of libm [+0.0 and -0.0
> are considered normal by this instruction, but need to return false
> for __builtin_isnormal, hence the additional logic

Ugh.  ;-)

> which is still
> better than the original].

ACK.
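A C sketch of why the extra comparison is needed. It models 'testp.normal' as Roger describes it: true for normal numbers and also for +0.0 and -0.0. That model is taken from the quoted text, not checked against the PTX ISA manual. With the 'setp.neu' against zero, the sequence matches C 'isnormal' for all inputs. Without it, zeros would wrongly be reported as normal.

```c
#include <math.h>
#include <stdbool.h>

/* Model of PTX 'testp.normal' as described above: normal numbers test
   true, and so do +0.0 and -0.0.  */
static bool
testp_normal (double x)
{
  int c = fpclassify (x);
  return c == FP_NORMAL || c == FP_ZERO;
}

/* Model of PTX 'setp.neu': not equal, or unordered (true for NaN).  */
static bool
setp_neu (double a, double b)
{
  return !(a == b);
}

/* New sequence: setp.neu; testp.normal; and.pred; selp.u32 %value, 1, 0.  */
static int
isnormal_new_sequence (double x)
{
  return (setp_neu (x, 0.0) && testp_normal (x)) ? 1 : 0;
}

/* 'testp.normal' alone, without the additional zero check.  */
static int
isnormal_testp_only (double x)
{
  return testp_normal (x) ? 1 : 0;
}
```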

> This patch has been tested on nvptx-none hosted by x86_64-pc-linux-gnu
> using make and make -k check, with only one new failure in the testsuite.
> The test case g++.dg/opt/pr107569.C exposes a latent bug in the middle-end
> (actually a missed optimization) as evrp fails to bound the results of
> isfinite.  This issue is independent of the back-end, as the tree-ssa
> evrp pass is run long before __builtin_finite is expanded by the backend,
> and the existence of an (any) isfinite optab is sufficient to expose it.
> Fortunately, Haochen Gui has already posted/proposed a fix at
> https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657881.html
> [which I'm sad to see is taking a while to review/get approved].

Well, now this nvptx one here took me even longer to look into, so the
'g++.dg/opt/pr107569.C' regression is resolved by now.  ;-\

> Ok for mainline?

Just minor items: generally, I do like seeing logically separate changes
as separate commits (like, the 'copysign' cleanup is not conceptually
related to the 'isfinite', 'isnormal' enhancements).  However, that's my
own ambition; I do acknowledge that others do things differently, like
mixing in small cleanups with other changes.  Also, I personally strive
to go one step further with enhancing test suite coverage (for example,
move towards using 'check-function-bodies' instead of 'scan-assembler',
and first push the current/"bad" test case as its own commit, possibly
partly XFAILed, and as part of the code-changes commit then "fix up" the
test case, so that the latter changes are visible in the commit history).
But again, that's my own ambition; I do acknowledge that others do things
differently.

All that said, the patch is OK as is, with just one small enhancement,
see below.  Thank you!

> Thanks in advance (p.s. don't forget the nvptx_rtx_costs patch),

Aye-aye!

> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md

> +(define_insn "setcc_isnormal"
> +  [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
> + (unspec:BI [(match_operand:SDFM 1 "nvptx_register_operand" "R")]
> +UNSPEC_ISNORMAL))]
> +  ""
> +  "%.\\ttestp.normal%t1\\t%0, %1;")
> +
> +(define

RE: [nvptx] Fix code-gen for alias attribute

2024-09-23 Thread Thomas Schwinge
Hi Prathamesh!

On 2024-09-23T08:24:36+, Prathamesh Kulkarni  wrote:
> Thanks for the review and sorry for late reply.

No worries.  My replies often are way more delayed...  ;'-|

> The attached patch addresses the above suggestions.
> Does it look OK ?

ACK, thanks!

> (Also, could you please test it at your end as well?)

As expected:

 PASS: gcc.target/nvptx/alias-to-alias-1.c (test for excess errors)
+PASS: gcc.target/nvptx/alias-to-alias-1.c execution test
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)\\tcall bar;$ 0
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)\\tcall baz;$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)\\tcall foo;$ 0
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DECL: bar$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DECL: baz$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DECL: foo$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DEF: bar$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DEF: baz$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^// BEGIN GLOBAL FUNCTION DEF: foo$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.alias bar,foo;$ 1
-PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.alias baz,bar;$ 1
+PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.alias baz,foo;$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.visible \\.func bar;$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.visible \\.func baz;$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.visible \\.func foo$ 1
 PASS: gcc.target/nvptx/alias-to-alias-1.c scan-assembler-times (?n)^\\.visible \\.func foo;$ 1
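The key change in this log is '.alias baz,bar' turning into '.alias baz,foo'. Because PTX supports only one level of aliasing, each alias must name the final, defined function. Below is a self-contained C sketch of that flattening walk. 'struct symbol', 'ultimate_target' and 'format_alias' are hypothetical stand-ins, not GCC's 'cgraph_node' API. Alias cycles are not handled.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical symbol: either a definition (alias_of == NULL) or an alias
   of another symbol.  */
struct symbol
{
  const char *name;
  const struct symbol *alias_of;
};

/* Follow the alias chain until reaching a symbol that is not an alias,
   i.e. the defined function a PTX '.alias' directive may refer to.  */
static const struct symbol *
ultimate_target (const struct symbol *s)
{
  while (s->alias_of != NULL)
    s = s->alias_of;
  return s;
}

/* Format a '.alias' line naming the ultimate target, not the direct one.  */
static int
format_alias (char *buf, size_t size, const struct symbol *alias)
{
  return snprintf (buf, size, ".alias %s,%s;", alias->name,
                   ultimate_target (alias)->name);
}
```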


Grüße
 Thomas


> nvptx: Partial support for aliases to aliases.
>
> For the following test (adapted from pr96390.c):
>
> __attribute__((noipa)) int foo () { return 42; }
> int bar () __attribute__((alias ("foo")));
> int baz () __attribute__((alias ("bar")));
>
> int main ()
> {
>   int n;
>   #pragma omp target map(from:n)
> n = baz ();
>   return n;
> }
>
> gcc emits following ptx for baz:
> .visible .func (.param.u32 %value_out) bar;
> .alias bar,foo;
> .visible .func (.param.u32 %value_out) baz;
> .alias baz,bar;
>
> which is incorrect since PTX requires aliasee to be a defined function.
> The patch instead uses cgraph_node::get(name)->ultimate_alias_target,
> which generates the following PTX:
>
> .visible .func (.param.u32 %value_out) baz;
> .alias baz,foo;
>
> gcc/ChangeLog:
>   PR target/104957
> * config/nvptx/nvptx.cc (nvptx_asm_output_def_from_decls): Use
> cgraph_node::get(name)->ultimate_alias_target instead of value.
>
> gcc/testsuite/ChangeLog:
>   PR target/104957
>   * gcc.target/nvptx/alias-to-alias-1.c: Adjust.
>
> Signed-off-by: Prathamesh Kulkarni 
> Co-authored-by: Thomas Schwinge 
>
> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
> index 4a7c64f05eb..96a1134220e 100644
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -7582,7 +7582,8 @@ nvptx_mem_local_p (rtx mem)
>while (0)
>  
>  void
> -nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
> +nvptx_asm_output_def_from_decls (FILE *stream, tree name,
> +  tree value ATTRIBUTE_UNUSED)
>  {
>if (nvptx_alias == 0 || !TARGET_PTX_6_3)
>  {
> @@ -7617,7 +7618,8 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
>return;
>  }
>  
> -  if (!cgraph_node::get (name)->referred_to_p ())
> +  cgraph_node *cnode = cgraph_node::get (name);
> +  if (!cnode->referred_to_p ())
>  /* Prevent "Internal error: reference to deleted section".  */
>  return;
>  
> @@ -7626,11 +7628,27 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
>fputs (s.str ().c_str (), stream);
>  
>tree id = DECL_ASSEMBLER_NAME (name);
> +
> +  /* Walk alias chain to get reference callgraph node.
> + The rationale of using ultimate_alias_target here is that
> + PTX's .alias directive only supports 1-level aliasing where
> + aliasee is function defined in same module.
> +
> + So for the following case:
> + int foo() { return 42; }
> + int bar () __attribute__((alias ("foo")));

Re: [COMMITTED] testsuite: debug: fix dejagnu directive syntax

2024-09-21 Thread Thomas Schwinge
Hi Andrew, Sam!

On 2024-09-20T14:21:33-0700, Andrew Pinski  wrote:
> On Fri, Sep 20, 2024 at 1:53 AM Thomas Schwinge  wrote:
>> On 2024-09-20T05:12:19+0100, Sam James  wrote:
>> > In this case, they were all harmless in reality (no diff in test logs).
>>
>> > -/* { dg-do compile )  */
>> > +/* { dg-do compile } */
>>
>> DejaGnu directives are matched by '{ dg-[...] }' (simplified; see
>> '/usr/share/dejagnu/dg.exp:dg-get-options' for the details), so your
>> changes did not "fix dejagnu directive syntax", but rather fix whitespace
>> around DejaGnu directives.  ;-P (Thanks either way!)
>
> I think you missed that in these cases it was `)` vs `}` . yes some
> fonts it is sometimes hard to tell especially it is on different lines
> but they are different characters.

Indeed, you're absolutely right, thanks for pointing that out, Andrew!
Sam, I apologize -- you did "fix dejagnu directive syntax" after all!
I'll blame it 1/3 on the font display (..., but now I do see it, of
course...), 1/3 on me not paying careful attention, and 1/3 on me not
paying careful attention due to having a cold after returning from the
GNU Tools Cauldron 2024 yet trying to be funny.
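To show why the ')' matters, here is a simplified C approximation of directive recognition using POSIX regular expressions. The pattern is my own illustration and not DejaGnu's actual one (see 'dg.exp:dg-get-options' for that). A directive needs an opening '{', 'dg-NAME', and a closing '}'. '{ dg-do compile )' never gets closed, so it is not recognized as a directive at all.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified approximation (not DejaGnu's actual pattern): an opening '{',
   whitespace, 'dg-NAME', optional arguments, and a closing '}' on the
   same line.  Bracket expressions avoid ERE escaping pitfalls.  */
static bool
looks_like_dg_directive (const char *line)
{
  regex_t re;
  if (regcomp (&re, "[{][[:space:]]+dg-[-a-z]+([[:space:]][^}]*)?[}]",
               REG_EXTENDED | REG_NOSUB) != 0)
    return false;
  bool match = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return match;
}
```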

..., and now let me please crawl back under my stone, and hide in shame.


Grüße
 Thomas


Re: [Patch, v3] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-09-20 Thread Thomas Schwinge
Hi Tobias!

I've not verified, but I very much suspect that this change:

On 2024-09-13T16:24:47+0200, Tobias Burnus  wrote:
> commit 508ef585243d4674d06b0737bfe8769fc18f824f
> Author: Tobias Burnus 
> Date:   Fri Sep 13 16:18:46 2024 +0200
>
> gcn/mkoffload.cc: Use #embed for including the generated ELF file

... is responsible for:

[-PASS:-]{+FAIL:+} libgomp.c/simd-math-1.c (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.c/simd-math-1.c [-execution test-]{+compilation failed to produce executable+}

/tmp/ccHVeRbm.c: In function 'configure_stack_size':
/tmp/ccHVeRbm.c:80:21: error: implicit declaration of function 'getenv' [-Wimplicit-function-declaration]
   80 |   const char *val = getenv ("GCN_STACK_SIZE");
  | ^~
/tmp/ccHVeRbm.c:1:1: note: 'getenv' is defined in header ''; this is probably fixable by adding '#include '
  +++ |+#include 
1 | static const int gcn_num_vars = 0;
[...]
/tmp/ccHVeRbm.c:82:5: error: implicit declaration of function 'setenv' [-Wimplicit-function-declaration]
   82 | setenv ("GCN_STACK_SIZE", "300", true);
  | ^~
/tmp/ccHVeRbm.c:82:42: error: 'true' undeclared (first use in this function)
   82 | setenv ("GCN_STACK_SIZE", "300", true);
  |  ^~~~
/tmp/ccHVeRbm.c:1:1: note: 'true' is defined in header ''; this is probably fixable by adding '#include '
  +++ |+#include 
1 | static const int gcn_num_vars = 0;
/tmp/ccHVeRbm.c:82:42: note: each undeclared identifier is reported only once for each function it appears in
   82 | setenv ("GCN_STACK_SIZE", "300", true);
  |  ^~~~
gcn mkoffload: fatal error: [...]/build-gcc/gcc/xgcc returned 1 exit status
compilation terminated.
lto-wrapper: fatal error: [...]/install/offload-amdgcn-amdhsa/libexec/gcc/x86_64-pc-linux-gnu/15.0.0//accel/amdgcn-amdhsa/mkoffload returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
compiler exited with status 1
FAIL: libgomp.c/simd-math-1.c (test for excess errors)

..., due to:

> --- a/gcc/config/gcn/mkoffload.cc
> +++ b/gcc/config/gcn/mkoffload.cc

> @@ -651,10 +613,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)

> -  fprintf (cfile, "#include \n");
> -  fprintf (cfile, "#include \n");
> -  fprintf (cfile, "#include \n\n");

Did you not see that happen in your testing?
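For illustration, here is a hypothetical standalone reconstruction of the generated 'configure_stack_size', showing which declarations it depends on. The body is inferred only from the diagnostics above and the stack size value is a parameter, so this is not the actual mkoffload output. 'getenv' and 'setenv' come from <stdlib.h> ('setenv' being POSIX, hence the feature-test macro), and 'true' comes from <stdbool.h>.

```c
/* 'setenv' is POSIX, not ISO C; request it explicitly.  */
#define _POSIX_C_SOURCE 200112L
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical sketch of the generated helper: set GCN_STACK_SIZE only if
   the user has not already set it.  'default_size' stands in for whatever
   value mkoffload actually emits.  */
static void
configure_stack_size (const char *default_size)
{
  const char *val = getenv ("GCN_STACK_SIZE");
  if (!val)
    setenv ("GCN_STACK_SIZE", default_size, true);
}
```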


Grüße
 Thomas


Re: [Patch][v2] OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

2024-09-20 Thread Thomas Schwinge
Hi Tobias!

On 2024-09-19T19:11:32+0200, Tobias Burnus  wrote:
> OpenMP: Add get_device_from_uid/omp_get_uid_from_device routines

'[omp_]get_device_from_uid'.

> Those TR13/OpenMP 6.0 routines permit a reproducible offloading to
> a specific device by mapping an OpenMP device number to a
> unique ID (UID). The GPU device UIDs should be universally unique,
> the one for the host is not.

> --- a/gcc/omp-general.cc
> +++ b/gcc/omp-general.cc
> @@ -3260,6 +3260,7 @@ omp_runtime_api_procname (const char *name)
>"alloc",
>"calloc",
>"free",
> +  "get_device_from_uid",
>"get_interop_int",
>"get_interop_ptr",
>"get_mapped_ptr",
> @@ -3338,12 +3339,13 @@ omp_runtime_api_procname (const char *name)
>as DECL_NAME only omp_* and omp_*_8 appear.  */
>"display_env",
>"get_ancestor_thread_num",
> -  "init_allocator",
> +  "omp_get_uid_from_device",
>"get_partition_place_nums",
>"get_place_num_procs",
>"get_place_proc_ids",
>"get_schedule",
>"get_team_size",
> +  "init_allocator",
>"set_default_device",
>"set_dynamic",
>"set_max_active_levels",

..., but here without 'omp_' prefix: 'get_uid_from_device' (and properly
sorted).

Apparently we don't have test suite coverage for these things?
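A check along these lines might be cheap to add. Below is a minimal C sketch. The arrays in the usage are hypothetical excerpts, not the full lists from 'omp_runtime_api_procname'. It verifies that a name table is in ascending order and that no entry carries the 'omp_' prefix, the two problems noted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if NAMES[0..N-1] is in strictly ascending strcmp order and
   no entry carries the 'omp_' prefix (the table stores bare names).  */
static bool
names_well_formed (const char *const *names, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    {
      if (strncmp (names[i], "omp_", 4) == 0)
        return false;
      if (i > 0 && strcmp (names[i - 1], names[i]) >= 0)
        return false;
    }
  return true;
}
```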

> --- a/libgomp/libgomp.h
> +++ b/libgomp/libgomp.h
> @@ -1387,6 +1387,7 @@ struct gomp_device_descr
>  
>/* The name of the device.  */
>const char *name;
> +  const char *uid;

Caching this here, instead of acquiring via 'GOMP_OFFLOAD_get_uid' for
each call, is a minor performance optimization?  (Similar to other items
cached here, I guess.)

> @@ -1399,6 +1400,7 @@ struct gomp_device_descr
>  
>/* Function handlers.  */
>__typeof (GOMP_OFFLOAD_get_name) *get_name_func;
> +  __typeof (GOMP_OFFLOAD_get_uid) *get_uid_func;
>__typeof (GOMP_OFFLOAD_get_caps) *get_caps_func;
>__typeof (GOMP_OFFLOAD_get_type) *get_type_func;
>__typeof (GOMP_OFFLOAD_get_num_devices) *get_num_devices_func;

Please also update 'libgomp/oacc-host.c:host_dispatch'.

> --- a/libgomp/omp_lib.f90.in
> +++ b/libgomp/omp_lib.f90.in

> +interface
> +  ! Note: In gfortran, strings are \0 termined
> +  integer(c_int) function omp_get_device_from_uid(uid) bind(C)
> +use iso_c_binding
> +character(c_char), intent(in) :: uid(*)
> +  end function omp_get_device_from_uid
> +end interface

For my understanding: in general, Fortran strings are *not*
NUL-terminated, right?  So this is a specific property of 'gfortran'
and/or this GCC/OpenMP interface, that it requires passing in a
NUL-terminated string?  (..., so that you're permitted to simply
'bind(C)' to the C 'omp_get_device_from_uid'?)
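For comparison, suppose the Fortran side passed a plain character buffer whose length travels separately instead of a NUL-terminated string. A C shim would then have to copy the buffer into a NUL-terminated string before calling the C routine. Here is a minimal sketch. 'call_with_fortran_string' and 'count_chars' are hypothetical names, and trailing-blank handling is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical C callee expecting a NUL-terminated string, standing in for
   a routine like 'omp_get_device_from_uid'.  */
static int
count_chars (const char *s)
{
  return (int) strlen (s);
}

/* Hypothetical shim: a Fortran character buffer is not NUL-terminated and
   its length is passed separately.  Copy it into a NUL-terminated buffer
   before calling the C function.  Returns -1 on allocation failure.  */
static int
call_with_fortran_string (const char *buf, size_t len)
{
  char *s = malloc (len + 1);
  if (s == NULL)
    return -1;
  memcpy (s, buf, len);
  s[len] = '\0';
  int result = count_chars (s);
  free (s);
  return result;
}
```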

> +interface omp_get_uid_from_device
> +  ! Deviation from OpenMP 6.0: VALUE added.

(..., which I suppose you've reported to OpenMP...)

> +  character(:) function omp_get_uid_from_device (device_num)
> +use iso_c_binding
> +pointer :: omp_get_uid_from_device
> +integer(c_int32_t), intent(in), value :: device_num
> +  end function omp_get_uid_from_device
> +
> +  character(:) function omp_get_uid_from_device_8 (device_num)
> +use iso_c_binding
> +pointer :: omp_get_uid_from_device_8
> +integer(c_int64_t), intent(in), value :: device_num
> +  end function omp_get_uid_from_device_8
> +end interface omp_get_uid_from_device

> --- a/libgomp/omp_lib.h.in
> +++ b/libgomp/omp_lib.h.in

Likewise.

> --- a/libgomp/plugin/plugin-gcn.c
> +++ b/libgomp/plugin/plugin-gcn.c

> +const char *
> +GOMP_OFFLOAD_get_uid (int ord)
> +{
> +  char *str;
> +  hsa_status_t status;
> +  struct agent_info *agent = get_agent_info (ord);
> +
> +  /* HSA documentation states: maximally 21 characters including NUL.  */
> +  str = GOMP_PLUGIN_malloc (21 * sizeof (char));
> +  status = hsa_fns.hsa_agent_get_info_fn (agent->id, HSA_AMD_AGENT_INFO_UUID,
> +                                          str);
> +  if (status != HSA_STATUS_SUCCESS)
> +    hsa_fatal ("Could not obtain device UUID", status);
> +  return str;
> +}

I guess I'd have just put this code into 'init_hsa_context', filling a
new statically-sized 'uuid' field in 'hsa_context_info' (like
'driver_version_s'; and assuming that 'hsa_context_info' is the right
abstraction for this), and then just return that 'uuid' from
'GOMP_OFFLOAD_get_uid'.  That way, you'd avoid the unclear semantics of
who gets to 'free' the buffer returned from 'GOMP_OFFLOAD_get_uid' upon
'GOMP_OFFLOAD_fini_device' -- currently the memory is lost?
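To illustrate the suggested alternative, here is a minimal sketch. It assumes a hypothetical per-device context struct with a statically-sized 'uuid' field that is filled once at initialization time, so the getter returns a pointer into plugin-owned storage and there is nothing for anyone to 'free'.  The names 'device_ctx', 'query_uuid', 'device_init' and 'device_get_uid' are made up for illustration; they stand in for 'init_hsa_context'/'hsa_context_info' and the HSA query.  Only the 21-byte size comes from the patch comment above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Size taken from the patch comment: at most 21 characters including NUL.  */
#define UUID_BUF_SIZE 21

/* Hypothetical per-device context, standing in for something like
   'hsa_context_info' (GCN) or 'ptx_device' (nvptx).  */
struct device_ctx
{
  char uuid[UUID_BUF_SIZE];  /* Statically sized, owned by the plugin.  */
  int uuid_valid;
};

/* Hypothetical stand-in for the runtime's UUID query; fills BUF and
   returns 0 on success, nonzero if the result did not fit.  */
static int
query_uuid (int ord, char *buf, size_t size)
{
  int n = snprintf (buf, size, "GPU-%016x", (unsigned) ord);
  return n < 0 || (size_t) n >= size;
}

/* Called once per device during initialization.  */
static void
device_init (struct device_ctx *ctx, int ord)
{
  assert (ctx != NULL);
  ctx->uuid_valid = (query_uuid (ord, ctx->uuid, sizeof ctx->uuid) == 0);
}

/* The getter returns the cached string: no allocation per call, and
   nothing for the caller to free when the device is finalized.  */
static const char *
device_get_uid (const struct device_ctx *ctx)
{
  return ctx->uuid_valid ? ctx->uuid : NULL;
}
```

Repeated calls to 'device_get_uid' hand out the same pointer, which is exactly what removes the ownership question.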

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c

> +const char *
> +GOMP_OFFLOAD_get_uid (int ord)
> +{

Likewise.  ('nvptx_init', new statically-sized 'uuid' field in
'ptx_device', or similar.)

> +  CUresult r;
> +  CUuuid s;
> +  struct ptx_device *de

Re: [COMMITTED] testsuite: debug: fix dejagnu directive syntax

2024-09-20 Thread Thomas Schwinge
Hi Sam!

On 2024-09-20T05:12:19+0100, Sam James  wrote:
> In this case, they were all harmless in reality (no diff in test logs).

> -/* { dg-do compile )  */
> +/* { dg-do compile } */

DejaGnu directives are matched by '{ dg-[...] }' (simplified; see
'/usr/share/dejagnu/dg.exp:dg-get-options' for the details), so your
changes did not "fix dejagnu directive syntax", but rather fix whitespace
around DejaGnu directives.  ;-P (Thanks either way!)
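To illustrate why whitespace around the delimiters matters, here is a rough approximation using a POSIX extended regular expression. A line is treated as a directive only if '{' and '}' are separated by whitespace from 'dg-NAME' and its arguments.  This pattern is a simplified assumption for illustration.  It is not the actual pattern used in 'dg-get-options'.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Rough approximation (not DejaGnu's actual pattern) of what counts as a
   directive: '{', whitespace, 'dg-NAME', optional arguments, whitespace,
   '}'.  */
static bool
looks_like_dg_directive (const char *line)
{
  regex_t re;
  bool match;
  int rc = regcomp (&re,
                    "[{][[:space:]]+dg-[-a-z]+([[:space:]].*)?[[:space:]][}]",
                    REG_EXTENDED | REG_NOSUB);
  assert (rc == 0);
  match = regexec (&re, line, 0, NULL, 0) == 0;
  regfree (&re);
  return match;
}
```

Under this approximation, '{dg-do compile }' and '{ dg-do compile}' are silently not directives at all, whereas extra whitespace outside the braces is irrelevant.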


Regards
 Thomas


GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated

2024-09-19 Thread Thomas Schwinge
Hi!

Regarding ongoing maintenance efforts, and to avoid building multilib
variants that probably nobody uses apart from a few of us testing them
as a matter of routine (via building/linking with explicit '-mptx=3.1'),
I propose: "GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated",
see attached, "[...], and will be removed in GCC 16".  Any objections?
If not, then I'll push this before the GCC 15 release, and promptly after
the GCC 15 release apply the corresponding code changes (yet to be
implemented).  (That is, no actual change for GCC release users for
another 1.5 years.)

These '-mptx=3.1' multilib variants are only useful for users of ancient
CUDA/Nvidia Driver versions, which don't support GCC's default PTX ISA 6.0
multilib variants; PTX ISA 6.0 is supported as of CUDA 9, 2017-09.


Regards
 Thomas


>From 8c099b2c4fed4f0745ef913c865868e76c061232 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Thu, 19 Sep 2024 22:04:28 +0200
Subject: [PATCH] GCC 15: nvptx '-mptx=3.1' multilib variants are deprecated

---
 htdocs/gcc-15/changes.html | 4 
 1 file changed, 4 insertions(+)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 7c372688..99242d2c 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -191,6 +191,10 @@ a work-in-progress.
   For this, a recent version of https://gcc.gnu.org/install/specific.html#nvptx-x-none";
   >nvptx-tools is required.
+  
+The -mptx=3.1 multilib variants are deprecated and will be
+removed in GCC 16.
+  
 
 
 
-- 
2.45.2



Re: libgomp: with USM, init 'link' variables with host address

2024-09-17 Thread Thomas Schwinge
Hi Tobias!

On 2024-09-15T00:32:21+0200, Tobias Burnus  wrote:
> The idea of link variables is to replace the full device variable by a
> pointer, permitting to map only parts of the variable to the device, 
> saving memory.
>
> However, having a pointer permits for (unified) shared memory to point 
> to the host variable.
>
> That's what this patch does: instead of having a dangling pointer, upon 
> loading the image, the device side pointers are updated to point to the 
> host. With the current patch, this is only done when explicitly 
> requesting unified-shared memory.
>
> Tested on x86-64-gnu-linux and nvptx offloading (that supports USM).
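Here is a rough model of the idea in plain C, with no offloading involved. The device copy of a 'declare target link' variable is represented by a pointer slot.  Image loading either leaves the slot unset, to be filled in once the variable is mapped, or, with unified shared memory, points it directly at the host variable.  'struct link_slot', 'load_image' and 'read_first' are hypothetical names.  In the patch itself, the store is done via 'gomp_copy_host2dev' in 'gomp_load_image_to_device'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the device-side representation of a
   'declare target link' variable: just a pointer, not a full copy.  */
struct link_slot
{
  void *target;
};

/* Hypothetical model of image loading.  Without USM the pointer stays
   unset until the variable is mapped; with USM it is initialized to the
   host variable's address, so device code can use it right away.  */
static void
load_image (struct link_slot *slot, void *host_addr, bool usm)
{
  slot->target = usm ? host_addr : NULL;
}

/* Device-side accesses go through the pointer.  */
static int
read_first (const struct link_slot *slot)
{
  assert (slot->target != NULL);
  return ((const int *) slot->target)[0];
}
```

With USM, a host-side store to the variable is immediately visible through the slot, because both refer to the same memory.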

(I yet have to set up such a USM configuration...)

> Remarks/comments/suggestions before I commit it?

> libgomp: with USM, init 'link' variables with host address
>
> If requires unified_shared_memory is set, make 'declare target link'
> variables to point initially to the host pointer.
>
> libgomp/ChangeLog:
>
>   * target.c (gomp_load_image_to_device): For requires
>   unified_shared_memory, update 'link' vars to point to the host var.
>   * testsuite/libgomp.c-c++-common/target-link-3.c: New test.
>
>  libgomp/target.c   |  5 +++
>  .../testsuite/libgomp.c-c++-common/target-link-3.c | 52 ++
>  2 files changed, 57 insertions(+)

> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -2451,6 +2451,11 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
>array->right = NULL;
>splay_tree_insert (&devicep->mem_map, array);
>array++;

Do I understand correctly that even if
'GOMP_REQUIRES_UNIFIED_SHARED_MEMORY' is set, we cannot just skip all the
'mem_map' setup in 'gomp_load_image_to_device' etc., because we're not
(yet?) setting 'GOMP_OFFLOAD_CAP_SHARED_MEM'?  (I've not yet worked
through the "libgomp: Enable USM for some nvptx devices" discussion from
earlier this year.)

> +
> +  if (is_link_var
> +   && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
> + gomp_copy_host2dev (devicep, NULL, (void *) target_var->start,
> + &k->host_start, sizeof (void *), false, NULL);
>  }

Calling 'gomp_copy_host2dev' looks a bit funny given we've just
determined USM (..., but I'm not asking for plain 'memcpy').

There is nothing to un-do in 'gomp_unload_image_from_device', right?

What's the advantage/rationale of doing this here vs. in
'gomp_map_vars_internal' for 'REFCOUNT_LINK'?  (May be worth a source
code comment?)

> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-3.c
> @@ -0,0 +1,52 @@
> +/* { dg-do run }  */
> +
> +#include <stdint.h>
> +#include <omp.h>
> +
> +#pragma omp requires unified_shared_memory
> +

Intentionally mixing non-'static' vs. 'static' in the following?

> +int A[3] = {-3,-4,-5};
> +static int q = -401;
> +#pragma omp declare target link(A, q)
> +
> +#pragma omp begin declare target
> +void
> +f (uintptr_t *pA, uintptr_t *pq)
> +{
> +  if (A[0] != 1 || A[1] != 2 || A[2] != 3 || q != 42)
> +    __builtin_abort ();
> +  A[0] = 13;
> +  A[1] = 14;
> +  A[2] = 15;
> +  q = 23;
> +  *pA = (uintptr_t) &A[0];
> +  *pq = (uintptr_t) &q;
> +}
> +#pragma omp end declare target
> +
> +int
> +main ()
> +{
> +  uintptr_t hpA = (uintptr_t) &A[0];
> +  uintptr_t hpq = (uintptr_t) &q;
> +  uintptr_t dpA, dpq;
> +
> +  A[0] = 1;
> +  A[1] = 2;
> +  A[2] = 3;
> +  q = 42;
> +
> +  for (int i = 0; i <= omp_get_num_devices (); ++i)
> +    {
> +      #pragma omp target device(device_num: i) map(dpA, dpq)
> +        f (&dpA, &dpq);
> +      if (hpA != dpA || hpq != dpq)
> +        __builtin_abort ();
> +      if (A[0] != 13 || A[1] != 14 || A[2] != 15 || q != 23)
> +        __builtin_abort ();
> +      A[0] = 1;
> +      A[1] = 2;
> +      A[2] = 3;
> +      q = 42;
> +    }
> +}


RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-10 Thread Thomas Schwinge
Hi Prathamesh!

On 2024-09-10T13:22:10+, Prathamesh Kulkarni  wrote:
>> -Original Message-
>> From: Thomas Schwinge 
>> Sent: Monday, September 9, 2024 8:50 PM

>> > Could you please test the patch for gcn backend ?

I've successfully tested x86_64 host with GCN as well as nvptx
offloading, and also ppc64le host with nvptx offloading.

I just realized two more minor things:

> [nvptx] Pass host specific ABI opts from mkoffload.
>
> The patch adds an option -foffload-abi-host-opts, which
> is set by host in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes its value
> to host_compiler.
>

Please add here "   PR target/96265".

> gcc/ChangeLog:
>   * common.opt (foffload-abi-host-opts): New option.
>   * config/aarch64/aarch64.cc (aarch64_offload_options): Pass
>   -foffload-abi-host-opts.
>   * config/i386/i386-opts.cc (ix86_offload_options): Likewise.
>   * config/rs6000/rs6000.cc (rs6000_offload_options): Likewise.
>   * config/nvptx/mkoffload.cc (offload_abi_host_opts): Define.
>   (compile_native): Append offload_abi_host_opts to argv_obstack.
>   (main): Handle option -foffload-abi-host-opts.
>   * config/gcn/mkoffload.cc (offload_abi_host_opts): Define.
>   (compile_native): Append offload_abi_host_opts to argv_obstack.
>   (main): Handle option -foffload-abi-host-opts.
>   * lto-wrapper.cc (merge_and_complain): Handle
>   -foffload-abi-host-opts.
>   (append_compiler_options): Likewise.
>   * opts.cc (common_handle_option): Likewise.
>
> Signed-off-by: Prathamesh Kulkarni 

Given that we're adding a new option to 'gcc/common.opt', do we need to
update (regenerate?) 'gcc/common.opt.urls'?  (I've not yet had the need
myself, and therefore not yet looked up how to do that.)  Or maybe not,
given that '-foffload-abi-host-opts=[...]' isn't documented?

Otherwise looks good to me; OK to push (with these minor items addressed,
as necessary), thanks!


Regards
 Thomas


> diff --git a/gcc/common.opt b/gcc/common.opt
> index ea39f87ae71..d270e524ff4 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2361,6 +2361,10 @@ Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32)
>  EnumValue
>  Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64)
>  
> +foffload-abi-host-opts=
> +Common Joined MissingArgError(option missing after %qs)
> +-foffload-abi-host-opts=Specify host ABI options.
> +
>  fomit-frame-pointer
>  Common Var(flag_omit_frame_pointer) Optimization
>  When possible do not generate stack frames.
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 6a3f1a23a9f..6ccf08d1cc0 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -19000,9 +19000,9 @@ static char *
>  aarch64_offload_options (void)
>  {
>if (TARGET_ILP32)
> -return xstrdup ("-foffload-abi=ilp32");
> +return xstrdup ("-foffload-abi=ilp32 -foffload-abi-host-opts=-mabi=ilp32");
>else
> -return xstrdup ("-foffload-abi=lp64");
> +return xstrdup ("-foffload-abi=lp64 -foffload-abi-host-opts=-mabi=lp64");
>  }
>  
>  static struct machine_function *
> diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
> index b8d981878ed..345bbf7709c 100644
> --- a/gcc/config/gcn/mkoffload.cc
> +++ b/gcc/config/gcn/mkoffload.cc
> @@ -133,6 +133,8 @@ static const char *gcn_dumpbase;
>  static struct obstack files_to_cleanup;
>  
>  enum offload_abi offload_abi = OFFLOAD_ABI_UNSET;
> +const char *offload_abi_host_opts = NULL;
> +
>  uint32_t elf_arch = EF_AMDGPU_MACH_AMDGCN_GFX900;  // Default GPU architecture.
>  uint32_t elf_flags = EF_AMDGPU_FEATURE_SRAMECC_UNSUPPORTED_V4;
>  
> @@ -819,17 +821,10 @@ compile_native (const char *infile, const char *outfile, const char *compiler,
>obstack_ptr_grow (&argv_obstack, gcn_dumpbase);
>obstack_ptr_grow (&argv_obstack, "-dumpbase-ext");
>obstack_ptr_grow (&argv_obstack, ".c");
> -  switch (offload_abi)
> -{
> -case OFFLOAD_ABI_LP64:
> -  obstack_ptr_grow (&argv_obstack, "-m64");
> -  break;
> -case OFFLOAD_ABI_ILP32:
> -  obstack_ptr_grow (&argv_obstack, "-m32");
> -  break;
> -default:
> -  gcc_unreachable ();
> -}
> +  if (!offload_abi_host_opts)
> +fatal_error (input_location,
> +  "%<-foffload-abi-host-opts%> not specified.");
> +  obstack_ptr_grow (&argv_obstack, offload_abi_host_opts);
>obstack_ptr_grow (&argv_obstack, infile);
>obstack_

RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-09 Thread Thomas Schwinge
Hi Prathamesh!

On 2024-09-09T06:31:18+, Prathamesh Kulkarni  wrote:
>> -Original Message-
>> From: Thomas Schwinge 
>> Sent: Friday, September 6, 2024 2:31 PM
>> On 2024-08-16T15:36:29+, Prathamesh Kulkarni wrote:
>> >> > On 13.08.2024 at 17:48, Thomas Schwinge wrote:
>> >> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni wrote:
>> >> >> I added another option -foffload-abi-host-opts to specify host abi
>> >> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit which
>> >> >> mkoffload can use to enable/disable offloading (as before).

>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -18999,9 +18999,9 @@ static char *
>> >  aarch64_offload_options (void)
>> >  {
>> >if (TARGET_ILP32)
>> > -return xstrdup ("-foffload-abi=ilp32");
>> > +return xstrdup ("-foffload-abi=ilp32 -foffload-abi-host-opts=-mabi=ilp32");
>> >else
>> > -return xstrdup ("-foffload-abi=lp64");
>> > +return xstrdup ("-foffload-abi=lp64 -foffload-abi-host-opts=-mabi=lp64");
>> >  }
>> 
>> As none of the current offload compilers is set up for ILP32, I suggest
>> we continue to pass '-foffload-abi=ilp32' without '-foffload-abi-host-
>> opts=[...]' -- the 'mkoffload's in that case should get to the point
>> where the latter is used.

Oh...  I was wrong about the latter item: I failed to see that the
'mkoffload's still do 'compile_native' even if they don't create an
actual offload image, sorry!

> Um, would that still possibly result in arch mismatch for host objects and 
> xnvptx-none.o if we don't pass host ABI opts for ILP32 ?
> For eg, if the host compiler defaults to 64-bit code-gen (and user requests 
> for 32-bit code gen on host), and we avoid passing host ABI opts for 
> -foffload-abi=ilp32, it will generate 64-bit xnvptx-none.o (corresponding
> to empty ptx_cfile_name), while rest of the host objects will be 32-bit, or
> am I misunderstanding ?

You're quite right -- my fault.

> The attached patch avoids passing -foffload-abi-host-opts if -foffload-abi=ilp32.

So, sorry for the back and forth.  I think we now agree that we do need
'-foffload-abi-host-opts=[...]' specified in all cases (as you
originally had), and then again unconditionally use
'offload_abi_host_opts' in the 'mkoffload's' 'compile_native' functions.
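Here is a minimal sketch of what "unconditionally use 'offload_abi_host_opts'" amounts to in a 'compile_native'-style helper. The host ABI options string is required, and it is passed through to the host compiler's argument vector instead of being re-synthesized from the LP64/ILP32 setting.  'build_native_argv' and the fixed-size argv array are illustrative simplifications.  The real code uses an obstack and 'fatal_error', and passes further arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Set from '-foffload-abi-host-opts=...'; NULL if not given.  */
static const char *offload_abi_host_opts = NULL;

/* Hypothetical, simplified 'compile_native': fill ARGV with the host
   compiler invocation.  Returns the argument count, or -1 (where the
   real code would call fatal_error) if the host options are missing.  */
static int
build_native_argv (const char **argv, size_t max, const char *compiler,
                   const char *infile, const char *outfile)
{
  size_t n = 0;
  if (!offload_abi_host_opts)
    {
      fprintf (stderr, "'-foffload-abi-host-opts' not specified\n");
      return -1;
    }
  assert (max >= 7);
  argv[n++] = compiler;
  argv[n++] = "-c";
  argv[n++] = "-o";
  argv[n++] = outfile;
  /* Passed through verbatim, e.g. "-m64" on x86_64 or "-mabi=lp64" on
     aarch64, instead of being derived from '-foffload-abi=lp64'.  */
  argv[n++] = offload_abi_host_opts;
  argv[n++] = infile;
  argv[n] = NULL;
  return (int) n;
}
```

Because the host target itself supplies the string, the 'mkoffload' no longer needs to know which flag spelling a given host backend accepts.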

> Could you please test the patch for gcn backend ?

I'll do that.

> [nvptx] Pass host specific ABI opts from mkoffload.
>
> The patch adds an option -foffload-abi-host-opts, which
> is set by host in TARGET_OFFLOAD_OPTIONS, and mkoffload then passes it's value

"its", by the way.  ;-)

> to host_compiler.

> --- a/gcc/common.opt
> +++ b/gcc/common.opt

> +foffload-abi-host-opts=
> +Common Driver Joined MissingArgError(option missing after %qs)
> +-foffload-abi-host-opts= Specify host ABI options.
> +

Still need TAB between '-foffload-abi-host-opts=' and its help
text.

> --- a/gcc/config/gcn/mkoffload.cc
> +++ b/gcc/config/gcn/mkoffload.cc

> @@ -998,6 +996,14 @@ main (int argc, char **argv)
>"unrecognizable argument of option %<" STR "%>");
>   }
>  #undef STR
> +  else if (startswith (argv[i], "-foffload-abi-host-opts="))
> + {
> +   if (offload_abi_host_opts)
> + fatal_error (input_location,
> +  "-foffload-abi-host-opts specified multiple times");

ACK, but again '%<-foffload-abi-host-opts%>', please.  (May also use
another '#define STR "[...]"' for the duplicated string, but I don't
care.)
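For reference, the option handling under review boils down to a prefix match plus a duplicate check. Here is a self-contained sketch in which a local 'startswith' helper and an error return stand in for GCC's own helpers and 'fatal_error'; 'parse_host_opts' and 'HOST_OPTS_PREFIX' are hypothetical names.  The quoting request above concerns GCC's diagnostic format ('%<...%>'), which plain 'fprintf' does not interpret, so the sketch uses ASCII quotes instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define HOST_OPTS_PREFIX "-foffload-abi-host-opts="

static bool
startswith (const char *str, const char *prefix)
{
  return strncmp (str, prefix, strlen (prefix)) == 0;
}

/* Scan ARGV for '-foffload-abi-host-opts=VALUE'.  Stores VALUE in *OUT
   and returns 0; returns -1 if the option is given more than once.  */
static int
parse_host_opts (int argc, const char *const *argv, const char **out)
{
  *out = NULL;
  for (int i = 1; i < argc; i++)
    if (startswith (argv[i], HOST_OPTS_PREFIX))
      {
        if (*out)
          {
            fprintf (stderr, "'%s' specified multiple times\n",
                     "-foffload-abi-host-opts");
            return -1;
          }
        *out = argv[i] + strlen (HOST_OPTS_PREFIX);
      }
  return 0;
}
```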

> --- a/gcc/config/nvptx/mkoffload.cc
> +++ b/gcc/config/nvptx/mkoffload.cc

> @@ -721,6 +718,14 @@ main (int argc, char **argv)
>"unrecognizable argument of option " STR);
>   }
>  #undef STR
> +  else if (startswith (argv[i], "-foffload-abi-host-opts="))
> + {
> +   if (offload_abi_host_opts)
> + fatal_error (input_location,
> +  "-foffload-abi-host-opts specified multiple times");

Likewise.

> --- a/gcc/lto-wrapper.cc
> +++ b/gcc/lto-wrapper.cc
> @@ -484,6 +484,7 @@ merge_and_complain (vec &decoded_options,
>   
>  
>   case OPT_foffload_abi_:
> + case OPT_foffload_abi_host_opts_:
>  

Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c' re 'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635] (was: [PATCH] Match: Fix ordered and nonequal)

2024-09-08 Thread Thomas Schwinge
Hi!

On 2024-09-04T13:43:45+0800, "Hu, Lin1"  wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
> @@ -0,0 +1,49 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
> +
> +int is_ordered_and_nonequal_sh_1 (float a, float b)
> +{
> +  return !__builtin_isunordered (a, b) && (a != b);
> +}
> +
> +[...]
> +
> +/* { dg-final { scan-tree-dump-times "gimple_simplified to\[^\n\r]*<>" 9 "forwprop1" } } */

OK to push
"Match: Fix ordered and nonequal: Fix 'gcc.dg/opt-ordered-and-nonequal-1.c' re 
'LOGICAL_OP_NON_SHORT_CIRCUIT' [PR116635]",
see attached?
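Some background for readers of this test: "ordered and nonequal" is the predicate '!isunordered (a, b) && a != b'. The test expects it to be simplified into the single '<>' comparison that the 'scan-tree-dump-times' pattern looks for.  In C, that predicate is available as 'islessgreater' from '<math.h>'.  The sketch below (helper names are made up) therefore sanity-checks the equivalence on a few inputs, including NaN, signed zeros and infinities.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* The source-level form used in the test.  */
static bool
ordered_and_nonequal (float a, float b)
{
  return !isunordered (a, b) && (a != b);
}

/* C99's quiet "less or greater" comparison, i.e. ordered and not equal.  */
static bool
less_or_greater (float a, float b)
{
  return islessgreater (a, b);
}

/* Check that both forms agree for one pair of operands, in both orders.  */
static bool
agree (float a, float b)
{
  return ordered_and_nonequal (a, b) == less_or_greater (a, b)
         && ordered_and_nonequal (b, a) == less_or_greater (b, a);
}
```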


Regards
 Thomas


>From 3e85cb373fb86db5fad86a452a69e713c7050f16 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 9 Sep 2024 08:39:10 +0200
Subject: [PATCH] Match: Fix ordered and nonequal: Fix
 'gcc.dg/opt-ordered-and-nonequal-1.c' re 'LOGICAL_OP_NON_SHORT_CIRCUIT'
 [PR116635]

Fix up to make 'gcc.dg/opt-ordered-and-nonequal-1.c' of
commit 91421e21e8f0f05f440174b8de7a43a311700e08
"Match: Fix ordered and nonequal" work for default
'LOGICAL_OP_NON_SHORT_CIRCUIT == false' configurations.

	PR testsuite/116635
	gcc/testsuite/
	* gcc.dg/opt-ordered-and-nonequal-1.c: Fix re
	'LOGICAL_OP_NON_SHORT_CIRCUIT'.
---
 gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
index 6d102c2bd0c..d61c3322214 100644
--- a/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
+++ b/gcc/testsuite/gcc.dg/opt-ordered-and-nonequal-1.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-tree-forwprop1-details" } */
+/* Make this work for default 'LOGICAL_OP_NON_SHORT_CIRCUIT == false' configurations:
+   { dg-additional-options "--param logical-op-non-short-circuit=1" } */
 
 int is_ordered_and_nonequal_sh_1 (float a, float b)
 {
-- 
2.34.1



RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-09-06 Thread Thomas Schwinge
Hi!

On 2024-08-16T15:36:29+, Prathamesh Kulkarni  wrote:
>> > On 13.08.2024 at 17:48, Thomas Schwinge wrote:
>> > On 2024-08-12T07:50:07+, Prathamesh Kulkarni wrote:
>> >>> From: Thomas Schwinge 
>> >>> Sent: Friday, August 9, 2024 12:55 AM
>> >
>> >>> On 2024-08-08T06:46:25-0700, Andrew Pinski wrote:
>> >>>> On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni wrote:
>> >>>>> compiled with -fopenmp -foffload=nvptx-none now fails with:
>> >>>>> gcc: error: unrecognized command-line option '-m64'
>> >>>>> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status
>> >>>>> compilation terminated.
>> >>>
>> >>> Heh.  Yeah...
>> >>>
>> >>>>> As mentioned in RFC email, this happens because
>> >>>>> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host
>> >>>>> compiler depending on whether offload_abi is OFFLOAD_ABI_LP64 or
>> >>>>> OFFLOAD_ABI_ILP32, and aarch64 backend doesn't recognize these
>> >>>>> options.
>> >
>> >>> So, my idea is: instead of the current strategy that the host
>> >>> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc.,
>> >>> which the 'mkoffload's then interpret and re-synthesize '-m64' etc.
>> >>> -- how about we instead directly tell the 'mkoffload's the relevant
>> >>> ABI options?  That is, 'TARGET_OFFLOAD_OPTIONS' instead synthesizes
>> >>> '-foffload-abi=-m64'
>> >>> etc., which the 'mkoffload's can then readily use.  Could you please
>> >>> give that a try, and/or does anyone see any issues with that approach?
>> >>>
>> >>> And use something like '-foffload-abi=disable' to replace the current:
>> >>>
>> >>>/* PR libgomp/65099: Currently, we only support offloading in 64-bit
>> >>>   configurations.  */
>> >>>if (offload_abi == OFFLOAD_ABI_LP64)
>> >>>  {
>> >>>
>> >>> (As discussed before, this should be done differently altogether,
>> >>> but that's for another day.)
>> >> Sorry, I don't quite follow. Currently we enable offloading if
>> >> offload_abi == OFFLOAD_ABI_LP64, which is synthesized from
>> >> -foffload-abi=lp64. If we change -foffload-abi to instead specify
>> >> host-specific ABI opts, I guess mkoffload will still need to somehow
>> >> figure out which ABI is used, so it can disable offloading for 32-bit ?
>> >> I suppose we could adjust TARGET_OFFLOAD_OPTIONS for each host to
>> >> pass -foffload-abi=disable if TARGET_ILP32 is set and offload target
>> >> is nvptx, but not sure if that'd be correct ?
>> >
>> > Basically, yes.  My idea was that all 'TARGET_OFFLOAD_OPTIONS'
>> > implementations return either the correct host flags to be used by the
>> > 'mkoffload's (the case that offloading is supported for the current
>> > host flags/ABI configuration), or otherwise return '-foffload-abi=disable'.

Oh..., you're right of course: we do need to continue to tell the
'mkoffload's which kind of offload code to generate!  My bad...
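To spell out the two separate pieces of information a 'mkoffload' needs: first, whether to generate offload code at all, from '-foffload-abi=...'. Currently that is only done for LP64, see PR libgomp/65099.  Second, which flags to use for the host-side 'compile_native' step, which runs even when no offload image is produced.  Here is a hypothetical sketch.  The enum mirrors 'offload_abi', while 'mkoffload_plan' and 'plan_mkoffload' are made-up names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the 'offload_abi' values used by the mkoffloads.  */
enum offload_abi { OFFLOAD_ABI_UNSET, OFFLOAD_ABI_LP64, OFFLOAD_ABI_ILP32 };

struct mkoffload_plan
{
  bool emit_offload_code;     /* Generate device code at all?  */
  const char *host_abi_opts;  /* Flags for the host 'compile_native' step.  */
};

/* Hypothetical decision logic: offloading only for LP64, but the host
   object is always compiled, with the host's own ABI flags, so that it
   links with the rest of the (possibly 32-bit) host objects.  */
static struct mkoffload_plan
plan_mkoffload (enum offload_abi abi, const char *host_abi_opts)
{
  struct mkoffload_plan plan;
  assert (abi != OFFLOAD_ABI_UNSET && host_abi_opts != NULL);
  plan.emit_offload_code = (abi == OFFLOAD_ABI_LP64);
  plan.host_abi_opts = host_abi_opts;
  return plan;
}
```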

>> >> I added another option -foffload-abi-host-opts to specify host abi
>> >> opts, and leave -foffload-abi to specify if ABI is 32/64 bit which
>> >> mkoffload can use to enable/disable offloading (as before).
>> >
>> > I'm not sure however, if this additional option is really necessary?
> Well, my concern was if that'd change the behavior for TARGET_ILP32 ?
> IIUC, currently for -foffload-abi=ilp32, mkoffload will create empty C file
> for ptx_cfile_name (instead of munged ptx assembly since offloading will be
> disabled), and pass that to host compiler with -m32 option (in
> compile_native).
>
> If we change -foffload-abi to specify ABI host opts, and pass
> -foffload-abi=disable for TARGET_ILP32 in TARGET_OFFLOAD_OPTIONS, mkoffload
> will no longer be able to pass 32-bit ABI opts to host compiler, which may
> result in linker error (arch mismatch?) if the host object files are 32-bit
> ABI and xnvptx-none.o is 64-bit (assuming the host compiler is configured to
> generate 64-bit code-gen by default) ?
>
> So, I thought to add ano

nvptx: Emit DECL and DEF linker markers for aliases [PR104957] (was: Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C' (was: Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning (was: [committed][

2024-09-05 Thread Thomas Schwinge
Hi!

On 2024-09-05T14:42:00+0200, I wrote:
> On 2024-09-05T14:39:46+0200, I wrote:
>> On 2024-09-05T14:36:54+0200, I wrote:
>>> On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
>>>  wrote:
>>>> [nvptx] Use .alias directive for mptx >= 6.3
>>>
>>>> --- a/gcc/config/nvptx/nvptx.cc
>>>> +++ b/gcc/config/nvptx/nvptx.cc
>>>
>>>> @@ -968,7 +969,8 @@ static void
>>>>  write_fn_proto_1 (std::stringstream &s, bool is_defn,
>>>>  const char *name, const_tree decl)
>>>>  {
>>>> -  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>>>> +  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
>>>> +write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>>>
>>> This non-emitting of DECL and DEF linker markers for aliases is
>>> problematic, as I'll discuss in the following.
>>
>> First, to show what currently is (not) happening, I've pushed to trunk
>> branch commit d0f02538494ded78cac12c63f5708a53f5a77bda
>> "Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning", see attached.
>
> Then, commit a1865fd33897bc6c6e0109df0a12ee73ce386315
> "Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C'", see attached, as
> one representative example of C++ code where the current behavior is an
> actual problem.

Finally, commit 8f5aade15e595b288a2c4ec60ddde8dc80df1a80
"nvptx: Emit DECL and DEF linker markers for aliases [PR104957]", see
attached, to address this issue.
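For context, the DECL and DEF linker markers are plain PTX comment lines of the form the test cases scan for (for example '// BEGIN GLOBAL FUNCTION DEF: f'). The following rough C sketch shows the output for an alias after this change: both markers are now present next to the '.alias' directive.  'write_marker' and 'emit_alias' are hypothetical simplifications of 'write_fn_marker' and 'nvptx_asm_output_def_from_decls', modeled only on the patterns in the tests.  The spelling used for non-public symbols (without 'GLOBAL') is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical simplification of nvptx's 'write_fn_marker', modeled on
   the '// BEGIN GLOBAL FUNCTION DECL/DEF: <name>' lines the tests scan
   for.  Appends the marker to the NUL-terminated BUF of size SIZE.  */
static void
write_marker (char *buf, size_t size, bool is_defn, bool globalize,
              const char *name)
{
  size_t len = strlen (buf);
  snprintf (buf + len, size - len, "\n// BEGIN%s FUNCTION %s: %s\n",
            globalize ? " GLOBAL" : "", is_defn ? "DEF" : "DECL", name);
}

/* Sketch of the output for ALIAS aliasing TARGET after the fix:
   DECL marker plus declaration, then DEF marker plus '.alias'.  */
static void
emit_alias (char *buf, size_t size, const char *alias, const char *target)
{
  size_t len;
  buf[0] = '\0';
  write_marker (buf, size, false, true, alias);
  len = strlen (buf);
  snprintf (buf + len, size - len, ".visible .func %s;\n", alias);
  write_marker (buf, size, true, true, alias);
  len = strlen (buf);
  snprintf (buf + len, size - len, ".alias %s,%s;\n", alias, target);
}
```

Without the DEF marker, the linker sees the alias only as declared, never as defined, which matches the 'unresolved symbol' failures described in the commit message.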


Regards
 Thomas


>From 8f5aade15e595b288a2c4ec60ddde8dc80df1a80 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 17 Jul 2024 23:56:25 +0200
Subject: [PATCH] nvptx: Emit DECL and DEF linker markers for aliases
 [PR104957]

With nvptx '-malias' enabled (as implemented in
commit f8b15e177155960017ac0c5daef8780d1127f91c
"[nvptx] Use .alias directive for mptx >= 6.3"), the C++ front end in certain
cases does 'write_fn_proto' before an eventual 'alias' attribute has been
added.  In that case, we do emit (via 'write_fn_marker') a DECL linker marker,
but then never emit a corresponding DEF linker marker for the alias.  This
causes hundreds of instances of link-time 'unresolved symbol [alias]' across
the C++ test suite, which are regressions compared to a test run with (default)
'-mno-alias' (in which case the respective functions get duplicated).

	PR target/104957
	gcc/
	* config/nvptx/nvptx.cc (write_fn_proto_1): Revert 2022-03-22
	change; 'write_fn_marker' also for alias DECL.
	(nvptx_asm_output_def_from_decls): 'write_fn_marker' for alias
	DEF.
	gcc/testsuite/
	* g++.target/nvptx/alias-g++.dg_init_dtor2-1.C: Un-XFAIL.
	* gcc.target/nvptx/alias-1.c: Likewise.
	* gcc.target/nvptx/alias-3.c: Likewise.
	* gcc.target/nvptx/alias-to-alias-1.c: Likewise.
---
 gcc/config/nvptx/nvptx.cc | 6 --
 .../g++.target/nvptx/alias-g++.dg_init_dtor2-1.C  | 4 ++--
 gcc/testsuite/gcc.target/nvptx/alias-1.c  | 4 ++--
 gcc/testsuite/gcc.target/nvptx/alias-3.c  | 4 ++--
 gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c | 8 
 5 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 144b8d0c874..4a7c64f05eb 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -997,8 +997,7 @@ static void
 write_fn_proto_1 (std::stringstream &s, bool is_defn,
 		  const char *name, const_tree decl, bool force_public)
 {
-  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
-write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
+  write_fn_marker (s, is_defn, TREE_PUBLIC (decl) || force_public, name);
 
   /* PTX declaration.  */
   if (DECL_EXTERNAL (decl))
@@ -7627,6 +7626,9 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
   fputs (s.str ().c_str (), stream);
 
   tree id = DECL_ASSEMBLER_NAME (name);
+  std::stringstream s_def;
+  write_fn_marker (s_def, true, TREE_PUBLIC (name), IDENTIFIER_POINTER (id));
+  fputs (s_def.str ().c_str (), stream);
   NVPTX_ASM_OUTPUT_DEF (stream, IDENTIFIER_POINTER (id),
 			IDENTIFIER_POINTER (value));
 }
diff --git a/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C b/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
index 747656d51d6..a30f99af308 100644
--- a/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
+++ b/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
@@ -1,6 +1,6 @@
 /* Reduced from 'g++.dg/init/dtor2.C'.  */
 
-/* { dg-do compile } */
+/* { dg-do link } */
 /* { dg-add-options nvptx_alias_ptx } */
 /* { dg-addi

Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C' (was: Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning (was: [committed][nvptx] Use .alias directive for mptx >= 6.3))

2024-09-05 Thread Thomas Schwinge
Hi!

On 2024-09-05T14:39:46+0200, I wrote:
> On 2024-09-05T14:36:54+0200, I wrote:
>> On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
>>  wrote:
>>> [nvptx] Use .alias directive for mptx >= 6.3
>>
>>> --- a/gcc/config/nvptx/nvptx.cc
>>> +++ b/gcc/config/nvptx/nvptx.cc
>>
>>> @@ -968,7 +969,8 @@ static void
>>>  write_fn_proto_1 (std::stringstream &s, bool is_defn,
>>>   const char *name, const_tree decl)
>>>  {
>>> -  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>>> +  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
>>> +write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>>
>> This non-emitting of DECL and DEF linker markers for aliases is
>> problematic, as I'll discuss in the following.
>
> First, to show what currently is (not) happening, I've pushed to trunk
> branch commit d0f02538494ded78cac12c63f5708a53f5a77bda
> "Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning", see attached.

Then, commit a1865fd33897bc6c6e0109df0a12ee73ce386315
"Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C'", see attached, as
one representative example of C++ code where the current behavior is an
actual problem.


Regards
 Thomas


>From a1865fd33897bc6c6e0109df0a12ee73ce386315 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 17 Jul 2024 18:02:50 +0200
Subject: [PATCH] Add 'g++.target/nvptx/alias-g++.dg_init_dtor2-1.C'

... as one minimized example for the issue that with nvptx '-malias' enabled
(as implemented in commit f8b15e177155960017ac0c5daef8780d1127f91c
"[nvptx] Use .alias directive for mptx >= 6.3"), there are hundreds of
instances of link-time 'unresolved symbol [alias]' across the C++ test suite,
which are regressions compared to a test run with (default) '-mno-alias'.

	PR target/104957
	gcc/testsuite/
	* g++.target/nvptx/alias-g++.dg_init_dtor2-1.C: Add.
---
 .../nvptx/alias-g++.dg_init_dtor2-1.C | 33 +++
 1 file changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C

diff --git a/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C b/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
new file mode 100644
index 000..747656d51d6
--- /dev/null
+++ b/gcc/testsuite/g++.target/nvptx/alias-g++.dg_init_dtor2-1.C
@@ -0,0 +1,33 @@
+/* Reduced from 'g++.dg/init/dtor2.C'.  */
+
+/* { dg-do compile } */
+/* { dg-add-options nvptx_alias_ptx } */
+/* { dg-additional-options -save-temps } */
+/* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
+
+struct B
+{
+  ~B();
+};
+
+B::~B () {
+}
+
+int main()
+{
+  B b;
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: _ZN1BD2Ev$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func _ZN1BD2Ev \(\.param\.u64 %in_ar0\);$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: _ZN1BD2Ev$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func _ZN1BD2Ev \(\.param\.u64 %in_ar0\)$} 1 } } */
+
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: _ZN1BD1Ev$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func _ZN1BD1Ev \(\.param\.u64 %in_ar0\);$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: _ZN1BD1Ev$} 1 { xfail *-*-* } } }
+   { dg-final { scan-assembler-times {(?n)^\.alias _ZN1BD1Ev,_ZN1BD2Ev;$} 1 } } */
+
+/* { dg-final { scan-assembler-times {(?n)\tcall _ZN1BD1Ev, \(} 1 } }
+   { dg-final { scan-assembler-times {(?n)\tcall _ZN1BD2Ev, \(} 0 } } */
-- 
2.34.1



Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning (was: [committed][nvptx] Use .alias directive for mptx >= 6.3)

2024-09-05 Thread Thomas Schwinge
Hi!

On 2024-09-05T14:36:54+0200, I wrote:
> On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
>  wrote:
>> [nvptx] Use .alias directive for mptx >= 6.3
>
>> --- a/gcc/config/nvptx/nvptx.cc
>> +++ b/gcc/config/nvptx/nvptx.cc
>
>> @@ -968,7 +969,8 @@ static void
>>  write_fn_proto_1 (std::stringstream &s, bool is_defn,
>>const char *name, const_tree decl)
>>  {
>> -  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>> +  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
>> +write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
>
> This non-emitting of DECL and DEF linker markers for aliases is
> problematic, as I'll discuss in the following.

First, to show what currently is (not) happening, I've pushed to trunk
branch commit d0f02538494ded78cac12c63f5708a53f5a77bda
"Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning", see attached.

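For readers unfamiliar with these tests: they exercise GCC's 'alias' function attribute, where one symbol is emitted as another name for an existing definition. With '-malias', nvptx emits this via the PTX '.alias' directive.  Below is a minimal host-side C sketch of that kind of source, assuming an ELF target where the attribute is supported.  It is not the contents of 'alias-1.c', although the names 'f' and '__f' follow the '.alias f,__f;' pattern the tests scan for.

```c
#include <assert.h>

static int counter;

/* The actual definition.  */
void
__f (void)
{
  counter++;
}

/* 'f' is just another name for '__f'; no separate body is generated.  */
void f (void) __attribute__ ((alias ("__f")));
```

Calling either name runs the same body, so a call through 'f' and one through '__f' both increment 'counter'.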

Regards
 Thomas


>From d0f02538494ded78cac12c63f5708a53f5a77bda Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 17 Jul 2024 15:27:51 +0200
Subject: [PATCH] Enhance 'gcc.target/nvptx/alias-*.c' assembler scanning

... in order to demonstrate unexpected behavior (XFAILed here).

	PR target/104957
	gcc/testsuite/
	* gcc.target/nvptx/alias-1.c: Enhance assembler scanning.
	* gcc.target/nvptx/alias-2.c: Likewise.
	* gcc.target/nvptx/alias-3.c: Likewise.
	* gcc.target/nvptx/alias-4.c: Likewise.
	* gcc.target/nvptx/alias-to-alias-1.c: Likewise.
---
 gcc/testsuite/gcc.target/nvptx/alias-1.c  | 15 ++---
 gcc/testsuite/gcc.target/nvptx/alias-2.c  | 16 ++
 gcc/testsuite/gcc.target/nvptx/alias-3.c  | 15 ++---
 gcc/testsuite/gcc.target/nvptx/alias-4.c  | 17 ++
 .../gcc.target/nvptx/alias-to-alias-1.c   | 22 ++-
 5 files changed, 66 insertions(+), 19 deletions(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-1.c b/gcc/testsuite/gcc.target/nvptx/alias-1.c
index 1c0642b14d9..0fb06495f67 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-1.c
@@ -23,6 +23,15 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "\\.alias f,__f;" 1 } } */
-/* { dg-final { scan-assembler-times "\\.visible \\.func __f;" 1 } } */
-/* { dg-final { scan-assembler-times "\\.visible \\.func f;" 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: __f$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func __f;$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: __f$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func __f$} 1 } } */
+
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: f$} 1 { xfail *-*-* } } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func f;$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: f$} 1 { xfail *-*-* } } }
+   { dg-final { scan-assembler-times {(?n)^\.alias f,__f;$} 1 } } */
+
+/* { dg-final { scan-assembler-times {(?n)\tcall __f;$} 0 } }
+   { dg-final { scan-assembler-times {(?n)\tcall f;$} 1 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-2.c b/gcc/testsuite/gcc.target/nvptx/alias-2.c
index 7a88b6f4f6f..8ae8b5cfaed 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
@@ -5,10 +5,18 @@
 
 #include "alias-1.c"
 
+/* Note extern and inlined, so still there.  */
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: __f$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func __f;$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: __f$} 1 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func __f$} 1 } } */
+
 /* Inlined, so no alias.  */
-/* { dg-final { scan-assembler-not "\\.alias.*;" } } */
-/* { dg-final { scan-assembler-not "\\.visible \\.func f;" } } */
 
-/* Note extern and inlined, so still there.  */
-/* { dg-final { scan-assembler-times "\\.visible \\.func __f;" 1 } } */
+/* { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DECL: f$} 0 } }
+   { dg-final { scan-assembler-times {(?n)^\.visible \.func f;$} 0 } }
+   { dg-final { scan-assembler-times {(?n)^// BEGIN GLOBAL FUNCTION DEF: f$} 0 } }
+   { dg-final { scan-assembler-times {(?n)^\.alias f,__f;$} 0 } } */
 
+/* { dg-final { scan-assembler-times {(?n)\tcall __f;$} 0 } }
+   { dg-final { scan-assembler-times {(?n)\tcall f;$} 0 } } */
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-3.c b/gcc/testsuite/gcc.target/nvptx/alias-3.c
index b55ff26269e..1906607f95f 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-3.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-3.c
@@ -25,6 +25,15 @@ main (void)
   return 0;
 }
 
-/* { d

Re: [committed][nvptx] Use .alias directive for mptx >= 6.3

2024-09-05 Thread Thomas Schwinge
Hi!

On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
 wrote:
> Starting with ptx isa version 6.3, a ptx directive .alias is available.
> Use this directive to support symbol aliases, as far as possible.

> The alias support has the following [and more] limitations.

> Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
> This is currently not prohibited by the compiler, but with the driver link we
> run into:  "Internal error: alias to unknown symbol" .

Prathamesh in

"[nvptx] Fix code-gen for alias attribute" has proposed a way to make
these work, to a degree, via resolving to 'ultimate_alias_target'.

> Unreferenced aliases are not emitted (these can occur f.i. when inlining a
> call to an alias).  This avoids driver link error "Internal error: reference
> to deleted section".

That is, indeed, (still) necessary, but also problematic: if the
reference (use) of the alias is in a different compilation unit
("there"), we can't detect that "here" when deciding to not emit the
alias that's unused "here", and we then run into an unresolved symbol
"there".  (I've not yet given this further thought in the current
GCC/nvptx using PTX '.alias' scenario.)

> At some point we may add support in the nvptx-tools linker for symbol
> aliases, and define f.i. malias=ptx and malias=ld to choose between the two in
> the compiler.

I'm working on that: 
"[nvptx] Need better alias support", via

"[LD] Handle alias in nvptx-ld as nvptx's .alias does not handle it fully".

> [nvptx] Use .alias directive for mptx >= 6.3

> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc

> @@ -968,7 +969,8 @@ static void
>  write_fn_proto_1 (std::stringstream &s, bool is_defn,
> const char *name, const_tree decl)
>  {
> -  write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);
> +  if (lookup_attribute ("alias", DECL_ATTRIBUTES (decl)) == NULL)
> +    write_fn_marker (s, is_defn, TREE_PUBLIC (decl), name);

This non-emitting of DECL and DEF linker markers for aliases is
problematic, as I'll discuss in the following.


Grüße
 Thomas




Fix 'gcc.target/nvptx/alias-2.c' comment (was: [committed][nvptx] Use .alias directive for mptx >= 6.3)

2024-09-05 Thread Thomas Schwinge
Hi!

On 2022-03-22T14:41:46+0100, Tom de Vries via Gcc-patches 
 wrote:
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/alias-1.c
> @@ -0,0 +1,27 @@
> +[...]
> +int v;
> +
> +void __f ()
> +{
> +  v = 1;
> +}
> +
> +void f () __attribute__ ((alias ("__f")));
> +
> +int
> +main (void)
> +{
> +  if (v != 0)
> +    __builtin_abort ();
> +  f ();
> +  if (v != 1)
> +    __builtin_abort ();
> +  return 0;
> +}
> +[...]
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
> @@ -0,0 +1,13 @@
> +/* { dg-do link } */
> +/* { dg-do run { target runtime_ptx_isa_version_6_3 } } */
> +/* { dg-options "-save-temps -malias -mptx=6.3 -O2" } */
> +
> +#include "alias-1.c"
> +
> +/* Inlined, so no alias.  */
> +/* { dg-final { scan-assembler-not "\\.alias.*;" } } */
> +/* { dg-final { scan-assembler-not "\\.visible \\.func f;" } } */
> +
> +/* Note static and inlined, so still there.  */
> +/* { dg-final { scan-assembler-times "\\.visible \\.func __f;" 1 } } */

Actually: 's%static%extern'.  Pushed to trunk branch
commit 973c1bf51fb0f58fbfe43651bb0a61e1d124b35d
"Fix 'gcc.target/nvptx/alias-2.c' comment", see attached.


Grüße
 Thomas


>From 973c1bf51fb0f58fbfe43651bb0a61e1d124b35d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 18 Sep 2023 22:41:56 +0200
Subject: [PATCH] Fix 'gcc.target/nvptx/alias-2.c' comment

	PR target/104957
	gcc/testsuite/
	* gcc.target/nvptx/alias-2.c: Fix comment.
---
 gcc/testsuite/gcc.target/nvptx/alias-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-2.c b/gcc/testsuite/gcc.target/nvptx/alias-2.c
index 5c4b9c787e1..7a88b6f4f6f 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
@@ -9,6 +9,6 @@
 /* { dg-final { scan-assembler-not "\\.alias.*;" } } */
 /* { dg-final { scan-assembler-not "\\.visible \\.func f;" } } */
 
-/* Note static and inlined, so still there.  */
+/* Note extern and inlined, so still there.  */
 /* { dg-final { scan-assembler-times "\\.visible \\.func __f;" 1 } } */
 
-- 
2.34.1



Move from 'gcc.target/nvptx/nvptx.exp' into 'target-supports.exp' additions for nvptx target (was: [PATCH] Make 'target-supports.exp' additions for nvptx target generally available)

2024-09-05 Thread Thomas Schwinge
Hi!

On 2024-07-18T13:44:37+0200,  wrote:
> OK to push (once testing completes) the attached
> "Make 'target-supports.exp' additions for nvptx target generally available"?
>
> The idea of this new scheme is that explicit feature/target-specific
> stuff isn't kept in 'gcc/testsuite/lib/target-supports.exp', but instead
> in feature/target-specific 'gcc/testsuite/lib/target-supports-*.exp'
> files.  (..., and hoping that other maintainers also pick up this new
> scheme, and likewise move any feature/target-specific stuff from
> 'gcc/testsuite/lib/target-supports.exp', for example, into new
> 'gcc/testsuite/lib/target-supports-*.exp' files, to un-bloat the former
> one.)

I've not yet had any response to that proposal, so I've for now done it
the standard way, and pushed to trunk branch
commit a121af90fe9244258c8620901dd6fa22537767bb
"Move from 'gcc.target/nvptx/nvptx.exp' into 'target-supports.exp' additions 
for nvptx target",
see attached.


Grüße
 Thomas


>From a121af90fe9244258c8620901dd6fa22537767bb Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 22 Jul 2024 14:40:34 +0200
Subject: [PATCH] Move from 'gcc.target/nvptx/nvptx.exp' into
 'target-supports.exp' additions for nvptx target

	gcc/testsuite/
	* gcc.target/nvptx/nvptx.exp
	(check_effective_target_default_ptx_isa_version_at_least)
	(check_effective_target_default_ptx_isa_version_at_least_6_0)
	(check_effective_target_runtime_ptx_isa_version_at_least)
	(check_effective_target_runtime_ptx_alias)
	(add_options_for_ptx_alias): Move...
	* lib/target-supports.exp
	(check_nvptx_default_ptx_isa_version_at_least)
	(check_effective_target_nvptx_default_ptx_isa_version_at_least_6_0)
	(check_nvptx_runtime_ptx_isa_version_at_least)
	(check_effective_target_nvptx_runtime_alias_ptx)
	(add_options_for_nvptx_alias_ptx): ... here.
	* gcc.target/nvptx/alias-1.c: Adjust.
	* gcc.target/nvptx/alias-2.c: Likewise.
	* gcc.target/nvptx/alias-3.c: Likewise.
	* gcc.target/nvptx/alias-4.c: Likewise.
	* gcc.target/nvptx/alias-to-alias-1.c: Likewise.
	* gcc.target/nvptx/alias-weak-1.c: Likewise.
	* gcc.target/nvptx/uniform-simt-5.c: Likewise.
	gcc/
	* doc/sourcebuild.texi (Effective-Target Keywords): Document
	'nvptx_default_ptx_isa_version_at_least_6_0',
	'nvptx_runtime_alias_ptx'.
	(Add Options): Document 'nvptx_alias_ptx'.
---
 gcc/doc/sourcebuild.texi  | 14 
 gcc/testsuite/gcc.target/nvptx/alias-1.c  |  4 +-
 gcc/testsuite/gcc.target/nvptx/alias-2.c  |  4 +-
 gcc/testsuite/gcc.target/nvptx/alias-3.c  |  4 +-
 gcc/testsuite/gcc.target/nvptx/alias-4.c  |  4 +-
 .../gcc.target/nvptx/alias-to-alias-1.c   |  2 +-
 gcc/testsuite/gcc.target/nvptx/alias-weak-1.c |  2 +-
 gcc/testsuite/gcc.target/nvptx/nvptx.exp  | 66 -
 .../gcc.target/nvptx/uniform-simt-5.c |  4 +-
 gcc/testsuite/lib/target-supports.exp | 72 +++
 10 files changed, 98 insertions(+), 78 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 7c7094dc5a9..6ba72fd44a2 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2424,6 +2424,17 @@ MSP430 target has the small memory model enabled (@code{-msmall}).
 MSP430 target has the large memory model enabled (@code{-mlarge}).
 @end table
 
+@subsubsection nvptx-specific attributes
+
+@table @code
+@item nvptx_default_ptx_isa_version_at_least_6_0
+nvptx code by default compiles for at least PTX ISA version 6.0.
+
+@item nvptx_runtime_alias_ptx
+The nvptx runtime environment supports the PTX ISA directive
+@code{.alias}.
+@end table
+
 @subsubsection PowerPC-specific attributes
 
 @table @code
@@ -3302,6 +3313,9 @@ compliance mode.
 @code{mips16} function attributes.
 Only MIPS targets support this feature, and only then in certain modes.
 
+@item nvptx_alias_ptx
+Enable using the PTX ISA directive @code{.alias} on nvptx targets.
+
 @item riscv_a
 Add the 'A' extension to the -march string on RISC-V targets.
 
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-1.c b/gcc/testsuite/gcc.target/nvptx/alias-1.c
index d251eee6e42..1c0642b14d9 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-1.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-1.c
@@ -1,7 +1,7 @@
 /* { dg-do link } */
-/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-do run { target nvptx_runtime_alias_ptx } } */
 /* { dg-options "-save-temps" } */
-/* { dg-add-options ptx_alias } */
+/* { dg-add-options nvptx_alias_ptx } */
 
 int v;
 
diff --git a/gcc/testsuite/gcc.target/nvptx/alias-2.c b/gcc/testsuite/gcc.target/nvptx/alias-2.c
index 96cb7e2c1ef..5c4b9c787e1 100644
--- a/gcc/testsuite/gcc.target/nvptx/alias-2.c
+++ b/gcc/testsuite/gcc.target/nvptx/alias-2.c
@@ -1,7 +1,7 @@
 /* { dg-do link } */
-/* { dg-do run { target runtime_ptx_alias } } */
+

nvptx: Use 'enum ptx_version', 'enum ptx_isa' instead of 'int'

2024-09-04 Thread Thomas Schwinge
Hi!

Pushed to trunk branch in commit fee2fbedbb43ad7a017a33ed2b820be79b75e7e5
"nvptx: Use 'enum ptx_version', 'enum ptx_isa' instead of 'int'", see
attached.


Grüße
 Thomas


>From fee2fbedbb43ad7a017a33ed2b820be79b75e7e5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 22 Jul 2024 10:49:16 +0200
Subject: [PATCH] nvptx: Use 'enum ptx_version', 'enum ptx_isa' instead of
 'int'

This allows getting rid of the respective type casts.  No change in behavior
intended.
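To illustrate why putting an '_unset' enumerator first lets the code test
the option variable itself (instead of casting an 'int'), here is a
minimal standalone C sketch.  All names ('enum isa', 'ISA_unset',
'isa_option', 'option_was_set', 'set_isa_option', 'isa_option_given') are
hypothetical stand-ins for 'enum ptx_isa', 'PTX_ISA_unset',
'ptx_isa_option' and 'OPTION_SET_P'; it assumes, as the new
'gcc_checking_assert' in 'nvptx_option_override' implies, that an option
variable not given on the command line keeps its zero initial value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for 'enum ptx_isa': the '_unset' enumerator comes
   first, so it has the value 0.  */
enum isa
{
  ISA_unset,
  ISA_SM30,
  ISA_SM35
};

/* Hypothetical stand-in for the option variable.  Objects with static
   storage duration are zero-initialized, so it starts out as ISA_unset.  */
static enum isa isa_option;

/* Hypothetical stand-in for what 'OPTION_SET_P' tracks.  */
static bool option_was_set;

/* Model of option parsing: with the enum type, no '(enum isa)' cast is
   needed when storing or reading the value.  */
static void
set_isa_option (enum isa value)
{
  isa_option = value;
  option_was_set = true;
}

/* Mirrors the shape of the check in 'nvptx_option_override': the value
   itself tells whether the option was given, cross-checked against the
   separate "was set" flag.  */
static bool
isa_option_given (void)
{
  assert ((isa_option == ISA_unset) == !option_was_set);
  return isa_option != ISA_unset;
}
```

Calling 'isa_option_given ()' before 'set_isa_option (ISA_SM35)' yields
false, and afterwards true.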

	gcc/
	* config/nvptx/gen-opt.sh: Use 'enum ptx_isa' instead of 'int'.
	* config/nvptx/nvptx-gen.opt: Regenerate.
	* config/nvptx/nvptx.opt: Use 'enum ptx_version' instead of 'int'.
	* config/nvptx/nvptx-opts.h (enum ptx_isa): Add 'PTX_ISA_unset'.
	(enum ptx_version): Add 'PTX_VERSION_unset'.
	* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Adjust.
	* config/nvptx/nvptx.cc (default_ptx_version_option)
	(handle_ptx_version_option, nvptx_option_override)
	(nvptx_file_start): Likewise.
---
 gcc/config/nvptx/gen-opt.sh| 14 +-
 gcc/config/nvptx/nvptx-c.cc|  6 ++
 gcc/config/nvptx/nvptx-gen.opt |  2 +-
 gcc/config/nvptx/nvptx-opts.h  |  4 +++-
 gcc/config/nvptx/nvptx.cc  | 24 
 gcc/config/nvptx/nvptx.opt |  9 ++---
 6 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/gcc/config/nvptx/gen-opt.sh b/gcc/config/nvptx/gen-opt.sh
index 3f7838251d2..6022f51f897 100644
--- a/gcc/config/nvptx/gen-opt.sh
+++ b/gcc/config/nvptx/gen-opt.sh
@@ -38,12 +38,24 @@ echo
 
 . $gen_copyright_sh opt
 
+# Not emitting the following here (in addition to having it in 'nvptx.opt'), as
+# we'll otherwise run into:
+# 
+# gtyp-input.list:10: file [...]/gcc/config/nvptx/nvptx-opts.h specified more than once for language (all)
+# make[2]: *** [Makefile:2981: s-gtype] Error 1
+: ||
+cat <http://www.gnu.org/licenses/>.
 
 Enum
-Name(ptx_isa) Type(int)
+Name(ptx_isa) Type(enum ptx_isa)
 Known PTX ISA target architectures (for use with the -misa= option):
 
 EnumValue
diff --git a/gcc/config/nvptx/nvptx-opts.h b/gcc/config/nvptx/nvptx-opts.h
index f8975327223..fb5147c143e 100644
--- a/gcc/config/nvptx/nvptx-opts.h
+++ b/gcc/config/nvptx/nvptx-opts.h
@@ -22,6 +22,7 @@
 
 enum ptx_isa
 {
+  PTX_ISA_unset,
 #define NVPTX_SM(XX, SEP) PTX_ISA_SM ## XX SEP
 #define NVPTX_SM_SEP ,
 #include "nvptx-sm.def"
@@ -31,7 +32,8 @@ enum ptx_isa
 
 enum ptx_version
 {
-  PTX_VERSION_default,
+  PTX_VERSION_unset,
+  PTX_VERSION_default = PTX_VERSION_unset,
   PTX_VERSION_3_0,
   PTX_VERSION_3_1,
   PTX_VERSION_4_2,
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 2a8f713c680..144b8d0c874 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -231,8 +231,7 @@ first_ptx_version_supporting_sm (enum ptx_isa sm)
 static enum ptx_version
 default_ptx_version_option (void)
 {
-  enum ptx_version first
-    = first_ptx_version_supporting_sm ((enum ptx_isa) ptx_isa_option);
+  enum ptx_version first = first_ptx_version_supporting_sm (ptx_isa_option);
 
   /* Pick a version that supports the sm.  */
   enum ptx_version res = first;
@@ -311,20 +310,21 @@ sm_version_to_string (enum ptx_isa sm)
 static void
 handle_ptx_version_option (void)
 {
-  if (!OPTION_SET_P (ptx_version_option)
-  || ptx_version_option == PTX_VERSION_default)
+  if (!OPTION_SET_P (ptx_version_option))
+    gcc_checking_assert (ptx_version_option == PTX_VERSION_default);
+
+  if (ptx_version_option == PTX_VERSION_default)
 {
   ptx_version_option = default_ptx_version_option ();
   return;
 }
 
-  enum ptx_version first
-    = first_ptx_version_supporting_sm ((enum ptx_isa) ptx_isa_option);
+  enum ptx_version first = first_ptx_version_supporting_sm (ptx_isa_option);
 
   if (ptx_version_option < first)
     error ("PTX version (%<-mptx%>) needs to be at least %s to support selected"
 	   " %<-misa%> (sm_%s)", ptx_version_to_string (first),
-	   sm_version_to_string ((enum ptx_isa)ptx_isa_option));
+	   sm_version_to_string (ptx_isa_option));
 }
 
 /* Implement TARGET_OPTION_OVERRIDE.  */
@@ -336,7 +336,9 @@ nvptx_option_override (void)
 
   /* Via nvptx 'OPTION_DEFAULT_SPECS', '-misa' always appears on the command
  line; but handle the case that the compiler is not run via the driver.  */
-  if (!OPTION_SET_P (ptx_isa_option))
+  gcc_checking_assert ((ptx_isa_option == PTX_ISA_unset)
+		   == (!OPTION_SET_P (ptx_isa_option)));
+  if (ptx_isa_option == PTX_ISA_unset)
     fatal_error (UNKNOWN_LOCATION, "%<-march=%> must be specified");
 
   handle_ptx_version_option ();
@@ -5953,13 +5955,11 @@ nvptx_file_start (void)
   fputs ("// BEGIN PREAMBLE\n", asm_out_file);
 
   fputs ("\t.version\t", asm_out_file);
-  fputs (ptx_ver

Fix branch prediction dump message (was: Predict loops containing recursive call with fewer iterations)

2024-09-04 Thread Thomas Schwinge
Hi!

On 2016-06-26T21:36:56+0200, Jan Hubicka  wrote:
> this patch [...]

> --- predict.c (revision 237789)
> +++ predict.c (working copy)

> @@ -3367,6 +3446,15 @@ pass_profile::execute (function *fun)
>  gimple_dump_cfg (dump_file, dump_flags);
>   if (profile_status_for_fn (fun) == PROFILE_ABSENT)
>  profile_status_for_fn (fun) = PROFILE_GUESSED;
> + if (dump_file && (dump_flags & TDF_DETAILS))
> +   {
> + struct loop *loop;
> + FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
> +   if (loop->header->frequency)
> + fprintf (dump_file, "Loop got predicted %d to iterate %i times.\n",
> +loop->num,
> +(int)expected_loop_iterations_unbounded (loop));
> +   }
>return 0;
>  }

... has some in a strange order terms.  ;-) Long ago, Frederik fixed
this (unfortunately only) on a development branch.  As obvious, I've now
pushed to trunk branch commit 35e4414bac06927387fb7a6fe10b373e766da1c1
"Fix branch prediction dump message", see attached.


Grüße
 Thomas


>From 35e4414bac06927387fb7a6fe10b373e766da1c1 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 16 Nov 2021 16:13:51 +0100
Subject: [PATCH] Fix branch prediction dump message

Instead of, for instance, "Loop got predicted 1 to iterate 10 times"
the message should be "Loop 1 got predicted to iterate 10 times".
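As a quick check of the corrected format string, here is a small C sketch
that formats the message into a buffer with 'snprintf' instead of writing
to 'dump_file'.  The helper 'format_loop_prediction' and its parameters
are hypothetical stand-ins for 'loop->num' and 'iterations.to_double ()';
note that '%f' prints six fractional digits, so the real dump shows
"10.000000" rather than "10".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the corrected dump line into BUF (hypothetical helper; GCC itself
   uses 'fprintf (dump_file, ...)' with the same format string).  */
static int
format_loop_prediction (char *buf, size_t size, int loop_num,
			double iterations)
{
  return snprintf (buf, size, "Loop %d got predicted to iterate %f times.\n",
		   loop_num, iterations);
}
```

For loop 1 with 10 expected iterations this produces
"Loop 1 got predicted to iterate 10.000000 times.".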

gcc/ChangeLog:

	* predict.cc (pass_profile::execute): Fix dump message.

Co-authored-by: Thomas Schwinge 
---
 gcc/predict.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/predict.cc b/gcc/predict.cc
index 43e3694cb42..f611161f4aa 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -4210,7 +4210,7 @@ pass_profile::execute (function *fun)
  sreal iterations;
  for (auto loop : loops_list (cfun, LI_FROM_INNERMOST))
if (expected_loop_iterations_by_profile (loop, &iterations))
-	 fprintf (dump_file, "Loop got predicted %d to iterate %f times.\n",
+	 fprintf (dump_file, "Loop %d got predicted to iterate %f times.\n",
 	   loop->num, iterations.to_double ());
}
   return 0;
-- 
2.34.1



Fix gimple_debug_cfg declaration (was: [PATCH v2 2/N] Introduce dump_flags_t type and use it instead of int, type)

2024-09-04 Thread Thomas Schwinge
Hi!

On 2017-05-17T11:02:09+0200, Martin Liška  wrote:
> On 05/17/2017 09:44 AM, Richard Biener wrote:
>> On Tue, May 16, 2017 at 4:55 PM, Martin Liška  wrote:
>>> On 05/16/2017 03:48 PM, Richard Biener wrote:
 On Fri, May 12, 2017 at 3:00 PM, Martin Liška  wrote:
> Second part changes 'int flags' to a new typedef.
> All corresponding interfaces have been changed.

> installed as r248140.

Long ago, Frederik found that while the 'gimple_debug_cfg' definition had
been adjusted:

| --- a/gcc/tree-cfg.c
| +++ b/gcc/tree-cfg.c
| @@ -2372,7 +2372,7 @@ gimple_debug_bb_n (int n)
| (see TDF_* in dumpfile.h).  */
| 
|  void
| -gimple_debug_cfg (int flags)
| +gimple_debug_cfg (dump_flags_t flags)
|  {
|gimple_dump_cfg (stderr, flags);
|  }

..., but its prototype had not been:

| --- a/gcc/tree-cfg.h
| +++ b/gcc/tree-cfg.h

|  extern void gimple_debug_cfg (int);

..., and (unfortunately only) fixed that on a development branch.
As obvious, I've now pushed to trunk branch
commit 347a953d855c6b246b1604bdf4728f615cb471b6
"Fix gimple_debug_cfg declaration", see attached.


Grüße
 Thomas


>From 347a953d855c6b246b1604bdf4728f615cb471b6 Mon Sep 17 00:00:00 2001
From: Frederik Harwath 
Date: Tue, 16 Nov 2021 16:08:40 +0100
Subject: [PATCH] Fix gimple_debug_cfg declaration

Silence a warning. The argument type did not match the definition.

gcc/ChangeLog:

	* tree-cfg.h (gimple_debug_cfg): Change argument type from int
	to dump_flags_t.
---
 gcc/tree-cfg.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-cfg.h b/gcc/tree-cfg.h
index 0564b79b4ab..e55991740e8 100644
--- a/gcc/tree-cfg.h
+++ b/gcc/tree-cfg.h
@@ -45,7 +45,7 @@ extern void clear_special_calls (void);
 extern edge find_taken_edge (basic_block, tree);
 extern void gimple_debug_bb (basic_block);
 extern basic_block gimple_debug_bb_n (int);
-extern void gimple_debug_cfg (int);
+extern void gimple_debug_cfg (dump_flags_t);
 extern void gimple_dump_cfg (FILE *, dump_flags_t);
 extern void dump_cfg_stats (FILE *);
 extern void debug_cfg_stats (void);
-- 
2.34.1



[PING] Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN' (was: [PATCH 03/11] Handwritten part of conversion of passes to C++ classes)

2024-09-04 Thread Thomas Schwinge
Hi!

Ping.

On 2024-06-28T15:06:21+0200, I wrote:
> As part of this:
>
> On 2013-07-26T11:04:33-0400, David Malcolm  wrote:
>> This patch is the hand-written part of the conversion of passes from
>> C structs to C++ classes.
>
>> --- a/gcc/passes.c
>> +++ b/gcc/passes.c
>
> ..., we did hard-code 'PUSH_INSERT_PASSES_WITHIN(PASS)' to always refer
> to the first instance of 'PASS':
>
>>  #define PUSH_INSERT_PASSES_WITHIN(PASS) \
>>{ \
>> -struct opt_pass **p = &(PASS).pass.sub;
>> +struct opt_pass **p = &(PASS ## _1)->sub;
>
> ..., however we did change 'NEXT_PASS(PASS, NUM)' to actually use 'NUM':
>
>> -#define NEXT_PASS(PASS, NUM)  (p = next_pass_1 (p, &((PASS).pass)))
>> +#define NEXT_PASS(PASS, NUM) \
>> +  do { \
>> +gcc_assert (NULL == PASS ## _ ## NUM); \
>> +if ((NUM) == 1)  \
>> +  PASS ## _1 = make_##PASS (ctxt_);  \
>> +else \
>> +  {  \
>> +gcc_assert (PASS ## _1); \
>> +PASS ## _ ## NUM = PASS ## _1->clone (); \
>> +  }  \
>> +p = next_pass_1 (p, PASS ## _ ## NUM);  \
>> +  } while (0)
>
> This was never re-synchronized later on, and is problematic if you try to
> do something like this; change:
>
> [...]
> NEXT_PASS (pass_postreload);
> PUSH_INSERT_PASSES_WITHIN (pass_postreload)
> NEXT_PASS (pass_postreload_cse);
> [...]
> NEXT_PASS (pass_cprop_hardreg);
> NEXT_PASS (pass_fast_rtl_dce);
> NEXT_PASS (pass_reorder_blocks);
> [...]
> POP_INSERT_PASSES ()
> [...]
>
> ... into:
>
> [...]
> NEXT_PASS (pass_postreload);
> PUSH_INSERT_PASSES_WITHIN (pass_postreload)
> NEXT_PASS (pass_postreload_cse);
> [...]
> NEXT_PASS (pass_cprop_hardreg);
> POP_INSERT_PASSES ()
> NEXT_PASS (pass_fast_rtl_dce);
> NEXT_PASS (pass_postreload);
> PUSH_INSERT_PASSES_WITHIN (pass_postreload)
> NEXT_PASS (pass_reorder_blocks);
> [...]
> POP_INSERT_PASSES ()
> [...]
>
> That is, interrupt the pass pipeline within 'pass_postreload', in order
> to unconditionally run 'pass_fast_rtl_dce' even if not running
> 'pass_postreload'.  What happens is that the second
> 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' overwrites the first
> 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' instead of applying to the
> second (preceding) 'NEXT_PASS (pass_postreload);'.
>
> (I ran into this in context of what I tried in
> <https://inbox.sourceware.org/87ed8i2ekt@euler.schwinge.ddns.net>
> "nvptx vs. [PATCH] Add a late-combine pass [PR106594]"; discuss that
> specific use case over there, not here.)
>
> OK to address this with the attached
> "Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'"?
>
> This depends on
> <https://inbox.sourceware.org/87jzi9tgcw@euler.schwinge.ddns.net>
> "Rewrite usage comment at the top of 'gcc/passes.def'" to avoid running
> into the 'ERROR: Can't locate [...]' that I'm adding, while processing
> the 'PUSH_INSERT_PASSES_WITHIN (PASS)' in the usage comment at the top of
> 'gcc/passes.def', where 'NEXT_PASS (PASS)' only appears later.  ;-)

(Already pushed.)

> I've verified this does the expected thing for the main 'gcc/passes.def',
> and that 'PUSH_INSERT_PASSES_WITHIN' is not used/not applicable for
> 'PASSES_EXTRA' ('gcc/config/*/*-passes.def').
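The overwriting can be reproduced outside GCC with a minimal C sketch.
This is a hypothetical, heavily simplified model, not the actual
'passes.cc'/'pass_manager.h' code: the structure, the 'INSERT_POINT_OLD'
and 'INSERT_POINT_NEW' macros and the 'build_old'/'build_new' helpers are
illustrative only.  It assumes, as in the quoted 'NEXT_PASS', that
instance NUM of a pass lives in a variable named 'PASS_NUM', and models
only the head of each sub-pass list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each pass instance has a list of sub-passes.  */
struct opt_pass
{
  const char *name;
  struct opt_pass *sub;
};

/* Two instances of 'pass_postreload', as with two 'NEXT_PASS' uses.  */
static struct opt_pass pass_postreload_1 = { "postreload", NULL };
static struct opt_pass pass_postreload_2 = { "postreload", NULL };
static struct opt_pass pass_postreload_cse_1 = { "postreload_cse", NULL };
static struct opt_pass pass_reorder_blocks_1 = { "reorder_blocks", NULL };

/* Old behavior: always refers to the first instance of PASS; NUM is
   ignored.  */
#define INSERT_POINT_OLD(PASS, NUM) (&(PASS ## _1).sub)

/* Intended behavior: refers to instance NUM, i.e., the one created by the
   preceding 'NEXT_PASS'.  */
#define INSERT_POINT_NEW(PASS, NUM) (&(PASS ## _ ## NUM).sub)

/* Split pipeline from the mail: the first 'postreload' instance gets
   'postreload_cse', the second one gets 'reorder_blocks'.  */
static void
build_old (void)
{
  pass_postreload_1.sub = pass_postreload_2.sub = NULL;
  *INSERT_POINT_OLD (pass_postreload, 1) = &pass_postreload_cse_1;
  /* Overwrites the first instance's sub-passes.  */
  *INSERT_POINT_OLD (pass_postreload, 2) = &pass_reorder_blocks_1;
}

static void
build_new (void)
{
  pass_postreload_1.sub = pass_postreload_2.sub = NULL;
  *INSERT_POINT_NEW (pass_postreload, 1) = &pass_postreload_cse_1;
  *INSERT_POINT_NEW (pass_postreload, 2) = &pass_reorder_blocks_1;
}
```

After 'build_old ()', 'pass_postreload_1.sub' points to 'reorder_blocks'
and the second instance has no sub-passes, which is the overwriting
described above; after 'build_new ()', each instance keeps its own
sub-pass.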


Grüße
 Thomas


>From e368ccba93f5bbaee882076c80849adb55a68fa0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 28 Jun 2024 12:10:12 +0200
Subject: [PATCH] Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'

..., such that, also for repeated 'NEXT_PASS', 'PUSH_INSERT_PASSES_WITHIN' for a
given 'PASS', the 'PUSH_INSERT_PASSES_WITHIN' applies to the preceding
'NEXT_PASS' instead of unconditionally applying to the first 'NEXT_PASS'.

	gcc/
	* gen-pass-instances.awk: Handle 'PUSH_INSERT_PASSES_WITHIN'.
	* pass_manager.h (PUSH_INSERT_PASSES_WITHIN): Adjust.
	* passes.cc (PUSH_INSERT_PASSES_WITHIN): Likewise.
---
 gcc/gen-pass-instances.awk | 28 +---
 gcc/pass_manager.h |  2 +-
 gcc/passes.cc  |  6 +++---
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/

Add 'gcc.target/nvptx/alias-to-alias-1.c' (was: [nvptx] Fix code-gen for alias attribute)

2024-09-04 Thread Thomas Schwinge
Hi!

On 2024-09-04T11:45:20+0200, I wrote:
> On 2024-08-26T10:50:36+, Prathamesh Kulkarni  
> wrote:
>> For the following test (adapted from pr96390.c):
>>
>> __attribute__((noipa)) int foo () { return 42; }
>> int bar () __attribute__((alias ("foo")));
>> int baz () __attribute__((alias ("bar")));
>
>> Compiling [for nvptx] results in: [...]

> proposed patch [...] (doesn't affect
> '--target=nvptx-none' test results at all...)

Pushed to trunk branch commit a89321c890b96c583671b73fc802e87545e4a2b1
"Add 'gcc.target/nvptx/alias-to-alias-1.c'", see attached, which as part
of your proposed patch you'll then need to update as follows (or
similar):

--- gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c
+++ gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c
@@ -1,6 +1,8 @@
 /* Alias to alias; 'libgomp.c-c++-common/pr96390.c'.  */
 
-/* { dg-do compile } */
+/* { dg-do link } */
+/* { dg-do run { target runtime_ptx_alias } } */
+/* { dg-options -save-temps } */
 /* { dg-add-options ptx_alias } */
 
 int v;
@@ -23,5 +25,6 @@ main (void)
 /* { dg-final { scan-assembler-times "\\.visible \\.func foo;" 1 } } */
 /* { dg-final { scan-assembler-times "\\.visible \\.func bar;" 1 } } */
 
-/* { dg-final { scan-assembler-times "\\.alias baz,bar;" 1 } } */
+/* Via 'ultimate_alias_target':
+   { dg-final { scan-assembler-times "\\.alias baz,foo;" 1 } } */
 /* { dg-final { scan-assembler-times "\\.visible \\.func baz;" 1 } } */


Grüße
 Thomas


>From a89321c890b96c583671b73fc802e87545e4a2b1 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 4 Sep 2024 09:44:33 +0200
Subject: [PATCH] Add 'gcc.target/nvptx/alias-to-alias-1.c'

... similar to alias to alias usage in 'libgomp.c-c++-common/pr96390.c'.

	PR target/104957
	gcc/testsuite/
	* gcc.target/nvptx/alias-to-alias-1.c: New.
---
 .../gcc.target/nvptx/alias-to-alias-1.c   | 27 +++
 1 file changed, 27 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c b/gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c
new file mode 100644
index 000..3db79d1fc0b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/alias-to-alias-1.c
@@ -0,0 +1,27 @@
+/* Alias to alias; 'libgomp.c-c++-common/pr96390.c'.  */
+
+/* { dg-do compile } */
+/* { dg-add-options ptx_alias } */
+
+int v;
+
+void foo () { v = 42; }
+void bar () __attribute__((alias ("foo")));
+void baz () __attribute__((alias ("bar")));
+
+int
+main (void)
+{
+  baz ();
+  if (v != 42)
+    __builtin_abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "\\.alias bar,foo;" 1 } } */
+/* { dg-final { scan-assembler-times "\\.visible \\.func foo;" 1 } } */
+/* { dg-final { scan-assembler-times "\\.visible \\.func bar;" 1 } } */
+
+/* { dg-final { scan-assembler-times "\\.alias baz,bar;" 1 } } */
+/* { dg-final { scan-assembler-times "\\.visible \\.func baz;" 1 } } */
-- 
2.34.1



Add 'gcc.target/nvptx/alias-weak-1.c' (was: [nvptx] Fix code-gen for alias attribute)

2024-09-04 Thread Thomas Schwinge
Hi!

On 2024-09-04T11:45:20+0200, I wrote:
> +int bar () __attribute__((weak, alias ("foo")));

> Now, that said: GCC/nvptx for such code currently diagnoses
> "error: weak alias definitions not supported [...]" ;-|

Pushed to trunk branch commit 2267d254eb6ad782cef7b462f2bb2128bc8ace30
"Add 'gcc.target/nvptx/alias-weak-1.c'", see attached.


Grüße
 Thomas


>From 2267d254eb6ad782cef7b462f2bb2128bc8ace30 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 4 Sep 2024 09:58:32 +0200
Subject: [PATCH] Add 'gcc.target/nvptx/alias-weak-1.c'

... testing for the GCC/nvptx "weak alias definitions not supported" error
diagnostic (limitation of PTX).

	gcc/testsuite/
	* gcc.target/nvptx/alias-weak-1.c: New.
---
 gcc/testsuite/gcc.target/nvptx/alias-weak-1.c | 10 ++
 1 file changed, 10 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/nvptx/alias-weak-1.c

diff --git a/gcc/testsuite/gcc.target/nvptx/alias-weak-1.c b/gcc/testsuite/gcc.target/nvptx/alias-weak-1.c
new file mode 100644
index 000..37d9543fc7f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/nvptx/alias-weak-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-add-options ptx_alias } */
+
+void __f ()
+{
+}
+
+void f () __attribute__ ((weak, alias ("__f")));
+/* { dg-error {weak alias definitions not supported} {} { target *-*-* } .-1 }
+   (limitation of PTX).  */
-- 
2.34.1



Re: [nvptx] Fix code-gen for alias attribute

2024-09-04 Thread Thomas Schwinge
Hi!

Honza (or others, of course), there's a question about
'ultimate_alias_target'.

On 2024-08-26T10:50:36+, Prathamesh Kulkarni  wrote:
> For the following test (adapted from pr96390.c):
>
> __attribute__((noipa)) int foo () { return 42; }
> int bar () __attribute__((alias ("foo")));
> int baz () __attribute__((alias ("bar")));

> Compiling [for nvptx] results in:
>
> ptxas fatal   : Internal error: alias to unknown symbol
> nvptx-as: ptxas returned 255 exit status

Prathamesh: thanks for looking into this, and ACK: one of the many
limitations of PTX '.alias'.  :-|

> This happens because ptx code-gen shows:
>
> // BEGIN GLOBAL FUNCTION DEF: foo
> .visible .func (.param.u32 %value_out) foo
> {
>   [...]
> }
> .visible .func (.param.u32 %value_out) bar;
> .alias bar,foo;
> .visible .func (.param.u32 %value_out) baz;
> .alias baz,bar;

> .alias baz, bar is invalid since PTX requires aliasee to be a defined 
> function:
> https://sw-docs-dgx-station.nvidia.com/cuda-latest/parallel-thread-execution/latest-internal/#kernel-and-function-directives-alias

(We ordinary mortals need to look at
;
please update the Git commit log.)

> The patch uses cgraph_node::get(name)->ultimate_alias_target () instead of 
> the provided value in nvptx_asm_output_def_from_decls.

I confirm that resolving to 'ultimate_alias_target' does work for this
case:

> For the above case, it now generates the following ptx:
>
> .alias baz,foo; 
> instead of:
> .alias baz,bar;
>
> which fixes the issue.

..., but I'm not sure if that's conceptually correct; I'm not familiar
with 'ultimate_alias_target' semantics.  (Honza?)
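For illustration, the resolving step that the proposed patch relies on
amounts to following an alias chain until reaching a symbol that is not
itself an alias.  The following C sketch is a hypothetical model only:
'struct symbol', 'resolve_ultimate' and 'write_alias' are not GCC's
cgraph API, and the sketch assumes the alias chain is acyclic.  By
construction it skips every intermediate alias, which is exactly why the
weak-alias case needs consideration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol: ALIAS_OF is NULL for a real definition, otherwise
   it points to the symbol named in the alias attribute.  */
struct symbol
{
  const char *name;
  const struct symbol *alias_of;
};

/* Follow the alias chain to the first symbol that is not an alias, in the
   spirit of (but not the same code as) 'ultimate_alias_target'.  Assumes
   the chain is acyclic.  */
static const struct symbol *
resolve_ultimate (const struct symbol *sym)
{
  while (sym->alias_of != NULL)
    sym = sym->alias_of;
  return sym;
}

/* Write a PTX '.alias' directive naming the ultimate target, so that the
   aliasee is always a defined function.  */
static void
write_alias (char *buf, size_t size, const struct symbol *alias)
{
  snprintf (buf, size, ".alias %s,%s;", alias->name,
	    resolve_ultimate (alias)->name);
}
```

With 'bar' aliasing 'foo' and 'baz' aliasing 'bar', 'write_alias' emits
'.alias baz,foo;' instead of '.alias baz,bar;'.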

Also, I wonder whether 'gcc/varasm.cc:do_assemble_alias' is prepared for
'ASM_OUTPUT_DEF_FROM_DECLS' to disregard the specified 'target'/'value'
and instead do its own thing (here, the proposed resolving to
'ultimate_alias_target')?  (No other GCC back end appears to be doing
such a thing; from a quick look, all appear to faithfully use the
specified 'target'/'value'.)

Now, consider the case that the source code is changed as follows:

 __attribute__((noipa)) int foo () { return 42; }
-int bar () __attribute__((alias ("foo")));
+int bar () __attribute__((weak, alias ("foo")));
 int baz () __attribute__((alias ("bar")));

With 'ultimate_alias_target', I've checked, you'd then still emit
'.alias baz,foo;', losing the ability to override the weak alias with a
strong 'bar' definition in another compilation unit?
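To make the concern concrete, here is a minimal model of resolving an alias chain to its final definition. It assumes a hypothetical, much simplified 'struct sym' in place of GCC's symbol table, and it does not attempt to reproduce the real 'ultimate_alias_target' semantics. It only shows that a plain walk to the end of the chain skips a weak intermediate alias just like a strong one:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified symbol record; this is NOT GCC's
   'symtab_node', just enough to model an alias chain.  */
struct sym
{
  const char *name;
  const struct sym *alias_of; /* Aliasee, or NULL for a definition.  */
  bool weak;                  /* Deliberately ignored below.  */
};

/* Follow the alias chain until reaching a symbol that is not itself an
   alias (assumes the chain is acyclic).  Weak aliases along the way are
   skipped just like strong ones, so a weak intermediate alias no longer
   appears in the emitted '.alias' directive.  */
static const struct sym *
resolve_ultimate (const struct sym *s)
{
  while (s->alias_of != NULL)
    s = s->alias_of;
  return s;
}
```

With foo <- bar <- baz, 'resolve_ultimate (&baz)' returns '&foo' whether or not 'bar' is marked weak. That corresponds to emitting '.alias baz,foo;' in both variants of the test case.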

Now, that said: GCC/nvptx for such code currently diagnoses
"error: weak alias definitions not supported [...]" ;-| -- so we may be
safe, after all?  ..., or is there any other way that the resolving to
'ultimate_alias_target' might cause issues?  If not, then at least your
proposed patch shouldn't be causing any harm (doesn't affect
'--target=nvptx-none' test results at all...), and does address one
user-visible issue ('libgomp.c-c++-common/pr96390.c'), and thus makes
sense to install.

> [nvptx] Fix code-gen for alias attribute.

I'd rather suggest something like:
"[nvptx] (Some) support for aliases to aliases" (or similar).

Also, please add "PR target/104957" to the Git commit log, as your change
directly alters this one aspect of PR104957
"[nvptx] Use .alias directive (available starting ptx isa version 6.3)"'s
commit r12-7766-gf8b15e177155960017ac0c5daef8780d1127f91c
"[nvptx] Use .alias directive for mptx >= 6.3":

| Aliases to aliases are not supported (see libgomp.c-c++-common/pr96390.c).
| This is currently not prohibited by the compiler, but with the driver link we
| run into:  "Internal error: alias to unknown symbol" .

... which we then have (some) support for with the proposed code changes:

> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -7583,7 +7583,8 @@ nvptx_mem_local_p (rtx mem)
>while (0)
>  
>  void
> -nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
> +nvptx_asm_output_def_from_decls (FILE *stream, tree name,
> +  tree value ATTRIBUTE_UNUSED)
>  {
>if (nvptx_alias == 0 || !TARGET_PTX_6_3)
>  {
> @@ -7618,7 +7619,8 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
>return;
>  }
>  
> -  if (!cgraph_node::get (name)->referred_to_p ())
> +  cgraph_node *cnode = cgraph_node::get (name);
> +  if (!cnode->referred_to_p ())
>  /* Prevent "Internal error: reference to deleted section".  */
>  return;
>  
> @@ -7627,8 +7629,10 @@ nvptx_asm_output_def_from_decls (FILE *stream, tree name, tree value)
>fputs (s.str ().c_str (), stream);
>  
>tree id = DECL_ASSEMBLER_NAME (name);
> +  symtab_node *alias_target_node = cnode->ultimate_alias_target ();
> +  tree alias_target_id = DECL_ASSEMBLER_NAME (alias_target_node->decl);
>NVPTX_ASM_OUTPUT_DEF (stream, IDENTIFIER_POINTER (id),
> - IDENTIFIER_POINTER (value));
> +   

Un-XFAIL 'gcc.dg/signbit-5.c' for GCN (was: [PATCH] RISC-V: Remove testcase XFAIL)

2024-08-27 Thread Thomas Schwinge
Hi!

On 2024-08-19T13:14:02-0700, Edwin Lu  wrote:
> The testcase has been modified to include the -fwrapv flag which now
> causes the test to pass. Remove the xfail exception

> --- a/gcc/testsuite/gcc.dg/signbit-5.c
> +++ b/gcc/testsuite/gcc.dg/signbit-5.c
> @@ -4,7 +4,6 @@
>  /* This test does not work when the truth type does not match vector type.  */
>  /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
>  /* { dg-xfail-run-if "truth type does not match vector type" { amdgcn-*-* } } */
> -/* { dg-xfail-run-if "truth type does not match vector type" { riscv_v } } */

Same thing for GCN; I've pushed to trunk branch
commit 2daf6187c7289d012365419e10995042139cf8f5
"Un-XFAIL 'gcc.dg/signbit-5.c' for GCN", see attached.


Grüße
 Thomas


>From 2daf6187c7289d012365419e10995042139cf8f5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 27 Aug 2024 12:37:29 +0200
Subject: [PATCH] Un-XFAIL 'gcc.dg/signbit-5.c' for GCN

It XPASSes after recent commit 5a3387938d4d95717cac29eecd0ba53e0ef9094d
"testsuite: Add -fwrapv to signbit-5.c".

	gcc/testsuite/
	* gcc.dg/signbit-5.c: Un-XFAIL for GCN.
---
 gcc/testsuite/gcc.dg/signbit-5.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/signbit-5.c b/gcc/testsuite/gcc.dg/signbit-5.c
index e65c8910c82..57e29e3ca63 100644
--- a/gcc/testsuite/gcc.dg/signbit-5.c
+++ b/gcc/testsuite/gcc.dg/signbit-5.c
@@ -3,7 +3,6 @@
 
 /* This test does not work when the truth type does not match vector type.  */
 /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
-/* { dg-xfail-run-if "truth type does not match vector type" { amdgcn-*-* } } */
 
 
 #include 
-- 
2.34.1



RE: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-13 Thread Thomas Schwinge
Hi Prathamesh!

On 2024-08-12T07:50:07+, Prathamesh Kulkarni  wrote:
>> From: Thomas Schwinge 
>> Sent: Friday, August 9, 2024 12:55 AM

>> On 2024-08-08T06:46:25-0700, Andrew Pinski  wrote:
>> > On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
>> >  wrote:
>> >> After differing NUM_POLY_INT_COEFFS fix for AArch64/nvptx
>> offloading, the following minimal test:
>> 
>> First, thanks for your work on enabling this!  I will say that I had
>> the plan to re-engage with Nvidia to hire us (as initial implementors
>> of GCC/nvptx offloading) to make AArch64/nvptx offloading work, but
>> now that Nvidia has its own GCC team, that's great that you're able to
>> work on this yourself!  :-)
>> 
>> Please CC me for GCC/nvptx issues for (at least potentially...) faster
>> response times.
> Thanks, will do 😊

Heh, so much for "potentially": I'm not able to spend a lot of time on
this right now, as I shall soon be out of office.  Quickly:

>> >> compiled with -fopenmp -foffload=nvptx-none now fails with:
>> >> gcc: error: unrecognized command-line option '-m64'
>> >> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status
>> >> compilation terminated.
>> 
>> Heh.  Yeah...
>> 
>> >> As mentioned in RFC email, this happens because
>> >> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host compiler
>> depending on whether offload_abi is OFFLOAD_ABI_LP64 or
>> OFFLOAD_ABI_ILP32, and aarch64 backend doesn't recognize these
>> options.

>> So, my idea is: instead of the current strategy that the host
>> 'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc., which
>> the 'mkoffload's then interpret and re-synthesize '-m64' etc. -- how
>> about we instead directly tell the 'mkoffload's the relevant ABI
>> options?  That is, 'TARGET_OFFLOAD_OPTIONS' instead synthesizes
>> '-foffload-abi=-m64'
>> etc., which the 'mkoffload's can then readily use.  Could you please
>> give that a try, and/or does anyone see any issues with that approach?
>> 
>> And use something like '-foffload-abi=disable' to replace the current:
>> 
>> /* PR libgomp/65099: Currently, we only support offloading in 64-bit
>>    configurations.  */
>> if (offload_abi == OFFLOAD_ABI_LP64)
>>   {
>> 
>> (As discussed before, this should be done differently altogether, but
>> that's for another day.)
> Sorry, I don't quite follow. Currently we enable offloading if offload_abi ==
> OFFLOAD_ABI_LP64, which is synthesized from -foffload-abi=lp64. If we change
> -foffload-abi to instead specify host-specific ABI opts, I guess mkoffload
> will still need to somehow figure out which ABI is used, so it can disable
> offloading for 32-bit ? I suppose we could adjust TARGET_OFFLOAD_OPTIONS for
> each host to pass -foffload-abi=disable if TARGET_ILP32 is set and offload
> target is nvptx, but not sure if that'd be correct ?

Basically, yes.  My idea was that all 'TARGET_OFFLOAD_OPTIONS'
implementations return either the correct host flags to be used by the
'mkoffload's (the case that offloading is supported for the current host
flags/ABI configuration), or otherwise return '-foffload-abi=disable'.
For example (untested):

>  char *
>  ix86_offload_options (void)
>  {
>if (TARGET_LP64)
> -return xstrdup ("-foffload-abi=lp64");
> +return xstrdup ("-foffload-abi=-m64");
> -  return xstrdup ("-foffload-abi=ilp32");
> +  return xstrdup ("-foffload-abi=disable");
>  }

That is, only for 'TARGET_LP64' offloading is supported, and via
'-foffload-abi=-m64' the 'mkoffload's know that they need to specify
'-m64'.  For other host flags/ABI configuration, the 'mkoffload's see
'-foffload-abi=disable' and thus disable offload code generation
(replacing the current 'if (offload_abi == OFFLOAD_ABI_LP64)' in
'mkoffload').
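On the 'mkoffload' side, the idea could be sketched roughly as follows. This is untested against GCC, and 'offload_abi_host_opts' is a hypothetical helper name: the value after '-foffload-abi=' is either 'disable' or host ABI flags that are forwarded verbatim to the host compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for a 'mkoffload': given one command-line argument,
   return the host ABI flags to pass on to the host compiler, or NULL if
   the argument is not '-foffload-abi=...' or requests 'disable'.  */
static const char *
offload_abi_host_opts (const char *arg)
{
  static const char prefix[] = "-foffload-abi=";
  const size_t prefix_len = sizeof prefix - 1;

  if (strncmp (arg, prefix, prefix_len) != 0)
    return NULL; /* Some other option.  */

  const char *value = arg + prefix_len;
  if (value[0] == '\0' || strcmp (value, "disable") == 0)
    return NULL; /* No offload code generation for this host ABI.  */

  /* E.g. "-m64"; forwarded verbatim.  Splitting a value that carries
     several options is not handled here.  */
  return value;
}
```

A real implementation would presumably keep "disabled" distinct from "option absent"; the two are merged here only to keep the sketch short.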

> In the attached patch

Yes, that's going in the right direction, thanks!

> I added another option -foffload-abi-host-opts to specify host abi
> opts, and leave -foffload-abi to specify if ABI is 32/64 bit which mkoffload
> can use to enable/disable offloading (as before).

I'm not sure however, if this additional option is really necessary?

In case we're not happy to re-purpose the flag name
'-foffload-abi=[...]', we could also rename that one to
'-foffload-abi-host-opts=[...

OpenMP: Constructors and destructors for "declare target" static aggregates: Fix effective-target keyword in test cases (was: [PATCH, v3] OpenMP: Constructors and destructors for "declare target" stat

2024-08-09 Thread Thomas Schwinge
Hi!

On 2024-08-07T14:08:42+0200, Tobias Burnus  wrote:
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C

> +// { dg-additional-options -foffload-options=-fdump-tree-optimized { target { offload_device_nvptx || offload_target_amdgcn } } }

Note here: 'offload_device_nvptx' vs. 'offload_target_amdgcn', but...

> +// { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump-not "omp_initial_device;" "optimized" { target offload_target_amdgcn } } }
> +// { dg-final { only_for_offload_target amdgcn-amdhsa scan-offload-tree-dump "v1\\._x = 5;" "optimized" { target offload_target_amdgcn } } }
> +// { dg-final { only_for_offload_target nvptx-none scan-offload-tree-dump-not "omp_initial_device;" "optimized" { target offload_target_nvptx } } }
> +// { dg-final { only_for_offload_target nvptx-none scan-offload-tree-dump "v1\\._x = 5;" "optimized" { target offload_target_nvptx } } }

... here: 'offload_target_nvptx', 'offload_target_amdgcn', resulting in a
few UNRESOLVEDs.

> [Etc.]

Pushed to trunk branch commit 9f5d22e3e2b8e4532896a4f3837cb86006d5930c
"OpenMP: Constructors and destructors for "declare target" static aggregates: 
Fix effective-target keyword in test cases",
see attached.


Grüße
 Thomas


>From 9f5d22e3e2b8e4532896a4f3837cb86006d5930c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 9 Aug 2024 11:23:15 +0200
Subject: [PATCH] OpenMP: Constructors and destructors for "declare target"
 static aggregates: Fix effective-target keyword in test cases

(Most of) the tests added in commit f1bfba3a9b3f31e3e06bfd1911c9f223869ea03f
"OpenMP: Constructors and destructors for "declare target" static aggregates"
had a mismatch between dump file production and its scanning; the former needs
to use 'offload_target_nvptx' (like 'offload_target_amdgcn'), not
'offload_device_nvptx'.

	libgomp/
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C:
	Fix effective-target keyword.
	* testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C:
	Likewise.
	* testsuite/libgomp.c-c++-common/target-is-initial-host-2.c:
	Likewise.
	* testsuite/libgomp.c-c++-common/target-is-initial-host.c:
	Likewise.
	* testsuite/libgomp.fortran/target-is-initial-host-2.f90:
	Likewise.
	* testsuite/libgomp.fortran/target-is-initial-host.f: Likewise.
	* testsuite/libgomp.fortran/target-is-initial-host.f90: Likewise.
---
 .../libgomp.c++/static-aggr-constructor-destructor-1.C  | 2 +-
 .../libgomp.c++/static-aggr-constructor-destructor-2.C  | 2 +-
 .../testsuite/libgomp.c-c++-common/target-is-initial-host-2.c   | 2 +-
 libgomp/testsuite/libgomp.c-c++-common/target-is-initial-host.c | 2 +-
 libgomp/testsuite/libgomp.fortran/target-is-initial-host-2.f90  | 2 +-
 libgomp/testsuite/libgomp.fortran/target-is-initial-host.f  | 2 +-
 libgomp/testsuite/libgomp.fortran/target-is-initial-host.f90| 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
index b5aafc8cabc..a704e39411d 100644
--- a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-1.C
@@ -1,6 +1,6 @@
 // { dg-do run }
 // { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" }
-// { dg-additional-options -foffload-options=-fdump-tree-optimized { target { offload_device_nvptx || offload_target_amdgcn } } }
+// { dg-additional-options -foffload-options=-fdump-tree-optimized { target { offload_target_nvptx || offload_target_amdgcn } } }
 
 // { dg-final { scan-tree-dump-times "omp_is_initial_device" 1 "gimple" } }
 // { dg-final { scan-tree-dump-times "_GLOBAL__off_I_v1" 1 "gimple" } }
diff --git a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
index 9652a721bbe..de481aadd34 100644
--- a/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
+++ b/libgomp/testsuite/libgomp.c++/static-aggr-constructor-destructor-2.C
@@ -1,6 +1,6 @@
 // { dg-do run }
 // { dg-additional-options "-fdump-tree-gimple -fdump-tree-optimized" }
-// { dg-additional-options -foffload-options=-fdump-tree-optimized { target { offload_device_nvptx || offload_target_amdgcn } } }
+// { dg-additional-options -foffload-options=-fdump-tree-optimized { target { offload_target_nvptx || offload_target_amdgcn } } }
 
 // { dg-final { scan-tree-dump-times "omp_is_initial_device" 1 "gimple" } }
 // { dg-final { scan-tree-dump-times

Re: [commit] amdgcn: Re-enable trampolines

2024-08-08 Thread Thomas Schwinge
Hi Andrew!

On 2024-08-08T13:50:17+, Andrew Stubbs  wrote:
> Previously, trampolines worked on GCN3 devices, but the newer GCN5
> devices had different permissions on the stack memory space we were
> using.
>
> That changed when we added the reverse-offload features because we
> switched from using the "private" memory space to using a regular memory
> allocation.
>
> The execute permissions on this new space permit trampolines to work
> just as they did before.

ACK; I see a lot of UNSUPPORTED -> PASS progressions (tested
'-march=gfx908', '-march=gfx1100').

Just two non-good ones:

[-UNSUPPORTED:-]{+FAIL:+} gcc.dg/20050607-1.c {+(test for excess errors)+}

[...]/gcc.dg/20050607-1.c: In function 'foo':
[...]/gcc.dg/20050607-1.c:7:5: warning: padding struct size to alignment boundary with 4 bytes [-Wpadded]

..., which, as I understand the test case, is what should *not* be
happening.

And the other:

[-FAIL:-]{+PASS:+} gfortran.dg/optional_absent_8.f90   -O0  (test for excess errors)
[-UNRESOLVED:-]{+FAIL:+} gfortran.dg/optional_absent_8.f90   -O0  [-compilation failed to produce executable-]{+execution test+}
PASS: gfortran.dg/optional_absent_8.f90   -O1  (test for excess errors)
PASS: gfortran.dg/optional_absent_8.f90   -O1  execution test
PASS: gfortran.dg/optional_absent_8.f90   -O2  (test for excess errors)

STOP 11


Grüße
 Thomas


> This patch has been committed to mainline and will be pushed to the OG14
> branch shortly.
>
> Andrew
>
> gcc/ChangeLog:
>
>   * config/gcn/gcn.cc (gcn_trampoline_init): Re-enable trampolines.
> ---
>  gcc/config/gcn/gcn.cc | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
> index 00f2978559b..b22132de6ab 100644
> --- a/gcc/config/gcn/gcn.cc
> +++ b/gcc/config/gcn/gcn.cc
> @@ -3799,11 +3799,6 @@ gcn_asm_trampoline_template (FILE *f)
>  static void
>  gcn_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
>  {
> -  // FIXME
> -  if (TARGET_GCN5_PLUS)
> -sorry ("nested function trampolines not supported on GCN5 due to"
> -   " non-executable stacks");
> -
>emit_block_move (m_tramp, assemble_trampoline_template (),
>  GEN_INT (TRAMPOLINE_SIZE), BLOCK_OP_NORMAL);
>  
> -- 
> 2.45.2


Re: [nvptx] Pass -m32/-m64 to host_compiler if it has multilib support

2024-08-08 Thread Thomas Schwinge
Hi Prathamesh!

On 2024-08-08T06:46:25-0700, Andrew Pinski  wrote:
> On Thu, Aug 8, 2024 at 6:11 AM Prathamesh Kulkarni
>  wrote:
>> After differing NUM_POLY_INT_COEFFS fix for AArch64/nvptx offloading, the 
>> following minimal test:

First, thanks for your work on enabling this!  I will say that I had the
plan to re-engage with Nvidia to hire us (as initial implementors of
GCC/nvptx offloading) to make AArch64/nvptx offloading work, but now that
Nvidia has its own GCC team, that's great that you're able to work on
this yourself!  :-)

Please CC me for GCC/nvptx issues for (at least potentially...) faster
response times.

>> compiled with -fopenmp -foffload=nvptx-none now fails with:
>> gcc: error: unrecognized command-line option '-m64'
>> nvptx mkoffload: fatal error: ../install/bin/gcc returned 1 exit status 
>> compilation terminated.

Heh.  Yeah...

>> As mentioned in RFC email, this happens because 
>> nvptx/mkoffload.cc:compile_native passes -m64/-m32 to host compiler 
>> depending on whether
>> offload_abi is OFFLOAD_ABI_LP64 or OFFLOAD_ABI_ILP32, and aarch64 backend 
>> doesn't recognize these options.
>>
>> Based on your suggestion in: 
>> https://gcc.gnu.org/pipermail/gcc/2024-July/244470.html,
>> The attached patch generates new macro HOST_MULTILIB derived from 
>> $enable_as_accelerator_for, and in mkoffload.cc it gates passing -m32/-m64
>> to host_compiler on HOST_MULTILIB. I verified that the macro is set to 0 for 
>> aarch64 host (and thus avoids above unrecognized command line option error),
>> and is set to 1 for x86_64 host.
>>
>> Does the patch look OK ?
>
> Note I think the usage of the name MULTILIB here is wrong because
> aarch64 (and riscv) could have MULTILIB support, just the options are
> different.

I also think the proposed patch is not quite the right hammer for the
issue at hand.

> For aarch64, it would be -mabi=ilp32/-mabi=lp64 (riscv it
> is more complex).
>
> This most likely should be something more complex due to the above.

Right.

> Maybe call it HOST_64_32 but even that seems wrong due to Aarch64
> having ILP32 support and such.

Right.

> What about HOST_64ABI_OPTS="-mabi=lp64"/HOST_32ABI_OPTS="-mabi=ilp32"
> but  I am not sure if that would be enough to support RISCV which
> requires two options.

So, my idea is: instead of the current strategy that the host
'TARGET_OFFLOAD_OPTIONS' synthesizes '-foffload-abi=lp64' etc., which the
'mkoffload's then interpret and re-synthesize '-m64' etc. -- how about we
instead directly tell the 'mkoffload's the relevant ABI options?  That
is, 'TARGET_OFFLOAD_OPTIONS' instead synthesizes '-foffload-abi=-m64'
etc., which the 'mkoffload's can then readily use.  Could you please give
that a try, and/or does anyone see any issues with that approach?

And use something like '-foffload-abi=disable' to replace the current:

/* PR libgomp/65099: Currently, we only support offloading in 64-bit
   configurations.  */
if (offload_abi == OFFLOAD_ABI_LP64)
  {

(As discussed before, this should be done differently altogether, but
that's for another day.)
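The host-side counterpart could look like the following sketch. It is written as a plain function with the ABI check passed in as a parameter instead of a real target macro such as 'TARGET_LP64'; 'host_offload_options' is a hypothetical name, and which flag a given host passes ('-m64', '-mabi=lp64', ...) is up to that back end, as noted above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a host 'TARGET_OFFLOAD_OPTIONS' hook under the
   proposed scheme: return "-foffload-abi=<host ABI flags>" if offloading
   is supported for the current host ABI, else "-foffload-abi=disable".
   The caller frees the result (the real hooks use 'xstrdup').  */
static char *
host_offload_options (bool lp64, const char *lp64_abi_flag)
{
  static const char prefix[] = "-foffload-abi=";
  const char *value = lp64 ? lp64_abi_flag : "disable";
  size_t prefix_len = sizeof prefix - 1;
  size_t value_len = strlen (value);

  char *buf = malloc (prefix_len + value_len + 1);
  if (buf == NULL)
    return NULL;
  memcpy (buf, prefix, prefix_len);
  memcpy (buf + prefix_len, value, value_len + 1); /* Includes the NUL.  */
  return buf;
}
```

For example, an x86_64 host would pass "-m64" as 'lp64_abi_flag', while an aarch64 host might pass "-mabi=lp64".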


Grüße
 Thomas


Re: [PATCH] PR116080: Fix test suite checks for musttail

2024-08-07 Thread Thomas Schwinge
Hi Andi!

On 2024-08-02T14:12:59-0700, Andi Kleen  wrote:
> Andi Kleen  writes:
>> This is a new attempt to fix PR116080. The previous try was reverted
>> because it just broke a bunch of tests, hiding the problem.
>
> The previous version still had one failure on powerpc because
> of a template call that needs a dg-error check for external_tail_call.
> I fixed that now in the below version.
>
> Okay for trunk? I would like to check that one in to avoid the noise
> in the regression reports.

I've tested this version in a few trees.

('-Wc++-compat' are the C test cases, '-std=c++YY' the C++ ones.)


For x86_64 GNU/Linux, '-m32' testing, this does resolve the previous
FAILs:

[-FAIL:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++11[-(test for excess errors)-]
[-FAIL:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++17[-(test for excess errors)-]
[-FAIL:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++26[-(test for excess errors)-]

[-FAIL:-]{+UNSUPPORTED:+} g++.dg/musttail6.C[-(test for excess errors)-]

..., but also "regresses" (PASS -> UNSUPPORTED):

[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -Wc++-compat[-(test for excess errors)-]

[-PASS: c-c++-common/musttail3.c  -Wc++-compat  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -Wc++-compat[-(test for excess errors)-]

[-PASS: c-c++-common/musttail3.c  -std=c++11  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++11[-(test for excess errors)-]
[-PASS: c-c++-common/musttail3.c  -std=c++17  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++17[-(test for excess errors)-]
[-PASS: c-c++-common/musttail3.c  -std=c++26  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++26[-(test for excess errors)-]

That's because of effective-target 'struct_musttail' for '-m32'
reporting:

struct_musttail1494739.cc: In function 'foo bar()':
struct_musttail1494739.cc:5:88: error: cannot tail-call: return value used after call

(I'm just mentioning the latter "regressions" in case those are
unexpected.)


For powerpc64le GNU/Linux, this does resolve the previous FAIL:

PASS: g++.dg/musttail10.C(test for errors, line 11)
{+PASS: g++.dg/musttail10.C(test for errors, line 15)+}
PASS: g++.dg/musttail10.C(test for errors, line 20)
PASS: g++.dg/musttail10.C(test for errors, line 24)
PASS: g++.dg/musttail10.C(test for errors, line 7)
[-FAIL:-]{+PASS:+} g++.dg/musttail10.C   (test for excess errors)

..., but similarly "regresses" (PASS -> UNSUPPORTED):

[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -Wc++-compat[-(test for excess errors)-]

[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++11[-(test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++17[-(test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail12.c  -std=c++26[-(test for excess errors)-]

[-PASS: c-c++-common/musttail3.c  -Wc++-compat  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -Wc++-compat[-(test for excess errors)-]

[-PASS: c-c++-common/musttail3.c  -std=c++11  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++11[-(test for excess errors)-]
[-PASS: c-c++-common/musttail3.c  -std=c++17  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++17[-(test for excess errors)-]
[-PASS: c-c++-common/musttail3.c  -std=c++26  (test for errors, line 26)-]
[-PASS:-]{+UNSUPPORTED:+} c-c++-common/musttail3.c  -std=c++26[-(test for excess errors)-]

Here, that's because of effective-target 'struct_musttail' reporting:

struct_musttail485321.cc: In function 'foo bar()':
struct_musttail485321.cc:5:88: error: cannot tail-call: target is not able to optimize the call into a sibling call

(Again, I'm just mentioning the latter "regressions" in case those are
unexpected.)


So: looks good, all FAILs resolved (in these GCC configurations).


Grüße
 Thomas


> This is a new attempt to fix PR116080. The previous try was reverted
> because it just broke a bunch of tests, hiding the problem.
>
> - musttail behaves differently than tailcall at -O0. Some of the tests
> run at -O0, so add separate effective target tests for musttail.
> - New effective target tests need to use unique file names
> to make dejagnu caching work
> - Change the tests to use new targets
> - Add an external_musttail test to check for target's ability
> to do tail calls between translation units. This covers some powerpc
> ABIs.
>
> gcc/testsuite/ChangeLog:
>
>   PR testsuite/116080
>   * c-c++-common/musttail1.c: Use musttail target.
>   * c-c++-common/musttail12.c: Use struct_musttail target.
>   * c-c++-common/musttail2.c: Use m

Re: [Patch] libgomp: Fix declare target link with offset array-section mapping [PR116107]

2024-08-07 Thread Thomas Schwinge
Hi Tobias!

On 2024-07-26T20:05:43+0200, Tobias Burnus  wrote:
> The main idea of 'link' is to permit putting only a subset of a
> huge array on the device. Well, in order to make this work properly,
> it requires that one can map an array section, which does not
> start with the first element.
>
> This patch adjusts the pointers such that this actually works.
>
> (Tested on x86-64-gnu-linux with Nvptx offloading.)
> Comments, suggestions, remarks before I commit it?

> libgomp: Fix declare target link with offset array-section mapping [PR116107]
>
> Assume that 'int var[100]' is 'omp declare target link(var)'. When now
> mapping an array section with offset such as 'map(to:var[20:10])',
> the device-side link pointer has to store &[0] minus
> the offset such that var[20] will access [0]. But
> the offset calculation was missed such that the device-side 'var' pointed
> to the first element of the mapped data - and var[20] points beyond at
> some invalid memory.
>
>   PR middle-end/116107
>
> libgomp/ChangeLog:
>
>   * target.c (gomp_map_vars_internal): Honor array mapping offsets
>   with declare-target 'link' variables.
>   * testsuite/libgomp.c-c++-common/target-link-2.c: New test.
>
>  libgomp/target.c   |  7 ++-
>  .../testsuite/libgomp.c-c++-common/target-link-2.c | 59 ++
>  2 files changed, 64 insertions(+), 2 deletions(-)
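The pointer adjustment described in the quoted commit message can be modelled with plain integers. In this sketch the helper name is hypothetical; the formula mirrors the patched 'tgt_addr' computation in 'gomp_map_vars_internal' quoted further down. Mapping 'arr[3:10]' into a ten-element device buffer biases the link pointer by three elements, so that the original index 5 lands on device element 2:

```c
#include <stdint.h>

/* Hypothetical model of the adjusted link-pointer computation: the
   device-side 'link' pointer is biased by the distance between the start
   of the mapped array section ('host_start') and the start of the whole
   variable ('link_host_start'), so that indexing from the link pointer
   with the original indices lands inside the mapped device copy.
   Addresses are plain integers here; unsigned wrap-around in the
   intermediate subtraction is well defined.  */
static uintptr_t
link_pointer_value (uintptr_t tgt_start, uintptr_t tgt_offset,
                    uintptr_t host_start, uintptr_t link_host_start)
{
  return tgt_start + tgt_offset - (host_start - link_host_start);
}
```

Without the '- (host_start - link_host_start)' term, the link pointer would point at the first mapped element, and 'arr[5]' would read two elements past the intended one.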

The new test case 'libgomp.c-c++-common/target-link-2.c' generally PASSes
on one-GPU systems, but on a multi-GPU system (tested nvidia5):

$ nvidia-smi -L
GPU 0: Tesla K80 (UUID: [...])
GPU 1: Tesla K80 (UUID: [...])

..., I see:

+PASS: libgomp.c/../libgomp.c-c++-common/target-link-2.c (test for excess errors)
+FAIL: libgomp.c/../libgomp.c-c++-common/target-link-2.c execution test

+PASS: libgomp.c++/../libgomp.c-c++-common/target-link-2.c (test for excess errors)
+FAIL: libgomp.c++/../libgomp.c-c++-common/target-link-2.c execution test

[...]
#2  0x77b548fc in __GI_abort () at abort.c:79
#3  0x1bd4 in main () at [...]/libgomp.c-c++-common/target-link-2.c:38
(gdb) frame 3
#3  0x1bd4 in main () at [...]/libgomp.c-c++-common/target-link-2.c:38
38  __builtin_abort ();
(gdb) list
33
34#pragma omp target map(from: res2) device(dev)
35  res2 = arr[5];
36
37if (res2 != 6)
38  __builtin_abort ();
[...]
(gdb) print res2
$1 = 60

I first thought that maybe just:

--- libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
+++ libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -54,6 +54,8 @@ int main()
   for (int i = 0; i < 10; i++)
if (res[i] != (4 + i)*10)
  __builtin_abort ();
+
+  #pragma omp target exit data map(release:arr[3:10]) device(dev)
 }
   return 0;
 }

... was missing, but that doesn't resolve the issue: same error state.
Could you please have a look what other state needs to be reset, in which
way?


Grüße
 Thomas


> diff --git a/libgomp/target.c b/libgomp/target.c
> index aa01c1367b9..e3e648f5443 100644
> --- a/libgomp/target.c
> +++ b/libgomp/target.c
> @@ -1820,8 +1820,11 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
>   if (k->aux && k->aux->link_key)
> {
>   /* Set link pointer on target to the device address of the
> -mapped object.  */
> - void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
> +mapped object. Also deal with offsets due to
> +array-section mapping. */
> + void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset
> +- (k->host_start
> +   - k->aux->link_key->host_start));
>   /* We intentionally do not use coalescing here, as it's not
>  data allocated by the current call to this function.  */
>   gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset,
> diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
> new file mode 100644
> index 000..4ff4080da76
> --- /dev/null
> +++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
> @@ -0,0 +1,59 @@
> +/* PR middle-end/116107  */
> +
> +#include 
> +
> +int arr[15] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
> +#pragma omp declare target link(arr)
> +
> +#pragma omp begin declare target
> +void f(int *res)
> +{
> +  __builtin_memcpy (res, &arr[5], sizeof(int)*10);
> +}
> +
> +void g(int *res)
> +{
> +  __builtin_memcpy (res, &arr[3], sizeof(int)*10);
> +}
> +#pragma omp end declare target
> +
> +int main()
> +{
> +  int res[10], res2;
> +  for (int dev = 0; dev < omp_get_num_devices(); dev++

Inline 'gcc/rust/Make-lang.in:RUST_LIBDEPS' (was: [PATCH 006/125] gccrs: Add 'gcc/rust/Make-lang.in:LIBFORMAT_PARSER')

2024-08-05 Thread Thomas Schwinge
Hi!

On 2024-08-01T16:56:02+0200, Arthur Cohen  wrote:
> --- a/gcc/rust/Make-lang.in
> +++ b/gcc/rust/Make-lang.in
> @@ -212,6 +212,9 @@ RUST_ALL_OBJS = $(GRS_OBJS) $(RUST_TARGET_OBJS)
>  rust_OBJS = $(RUST_ALL_OBJS) rust/rustspec.o
>  
>  LIBPROC_MACRO_INTERNAL = ../libgrust/libproc_macro_internal/libproc_macro_internal.a
> +LIBFORMAT_PARSER = rust/libformat_parser.a
> +
> +RUST_LIBDEPS = $(LIBDEPS) $(LIBPROC_MACRO_INTERNAL) $(LIBFORMAT_PARSER)
>  
>  
>  RUST_LIBDEPS = $(LIBDEPS) $(LIBPROC_MACRO_INTERNAL)

That must've been a mis-merge; my GCC/Rust master branch original of this
commit (as part of <https://github.com/Rust-GCC/gccrs/pull/2947>
"Move 'libformat_parser' build into the GCC build directory, and into libgrust")
didn't include a bogus second definition of 'RUST_LIBDEPS'.  I've pushed
to trunk branch commit aab9f33ed1f1b92444a82eb3ea5cab1048593791
"Inline 'gcc/rust/Make-lang.in:RUST_LIBDEPS'", see attached -- this
commit apparently had been omitted from the 2024-08-01 upstream
submission.


Grüße
 Thomas


>From aab9f33ed1f1b92444a82eb3ea5cab1048593791 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Feb 2024 23:06:25 +0100
Subject: [PATCH] Inline 'gcc/rust/Make-lang.in:RUST_LIBDEPS'

..., also fixing up an apparently mis-merged
commit 2340894554334a310b891a1d9e9d5e3f502357ac
"gccrs: Add 'gcc/rust/Make-lang.in:LIBFORMAT_PARSER'", which was adding a bogus
second definition of 'RUST_LIBDEPS'.

	gcc/rust/
	* Make-lang.in (RUST_LIBDEPS): Inline into all users.
---
 gcc/rust/Make-lang.in | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index c3be5f9d81b..aed9a998c80 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -226,13 +226,8 @@ rust_OBJS = $(RUST_ALL_OBJS) rust/rustspec.o
 LIBPROC_MACRO_INTERNAL = ../libgrust/libproc_macro_internal/libproc_macro_internal.a
 LIBFORMAT_PARSER = ../libgrust/libformat_parser/debug/liblibformat_parser.a
 
-RUST_LIBDEPS = $(LIBDEPS) $(LIBPROC_MACRO_INTERNAL) $(LIBFORMAT_PARSER)
-
-
-RUST_LIBDEPS = $(LIBDEPS) $(LIBPROC_MACRO_INTERNAL)
-
 # The compiler itself is called crab1
-crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(RUST_LIBDEPS) $(rust.prev)
+crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBDEPS) $(LIBPROC_MACRO_INTERNAL) $(LIBFORMAT_PARSER) $(rust.prev)
 	@$(call LINK_PROGRESS,$(INDEX.rust),start)
 	+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
 	  $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) $(CRAB1_LIBS) $(LIBPROC_MACRO_INTERNAL) $(LIBFORMAT_PARSER) $(BACKENDLIBS)
-- 
2.34.1



Don't override 'LIBS' if '--enable-languages=rust'; use 'CRAB1_LIBS' (was: [PATCH 005/125] gccrs: libgrust: Add format_parser library)

2024-08-05 Thread Thomas Schwinge
Hi!

On 2024-08-01T16:56:01+0200, Arthur Cohen  wrote:
> Compile libformat_parser and link to it.

> --- a/gcc/rust/Make-lang.in
> +++ b/gcc/rust/Make-lang.in

> +LIBS += -ldl -lpthread

That's still not correct.  I've pushed to trunk branch
commit 816c4de4d062c89f5b7a68f68f29b2b033f5b136
"Don't override 'LIBS' if '--enable-languages=rust'; use 'CRAB1_LIBS'",
see attached.


Grüße
 Thomas


>From 816c4de4d062c89f5b7a68f68f29b2b033f5b136 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 5 Aug 2024 10:06:05 +0200
Subject: [PATCH] Don't override 'LIBS' if '--enable-languages=rust'; use
 'CRAB1_LIBS'

Recent commit 6fef4d6ffcab0fec8518adcb05458cba5dbeac25
"gccrs: libgrust: Add format_parser library" added a general override of
'LIBS += -ldl -lpthread' if '--enable-languages=rust'.  This is conceptually
wrong, and will make the build fail on systems not providing such
libraries.  Instead, 'CRAB1_LIBS', added a while ago in
commit 75299e4fe50aa8d9b3ff529e48db4ed246083e64
"rust: Do not link with libdl and libpthread unconditionally", should be used,
and only for 'crab1', not generally.

	gcc/rust/
	* Make-lang.in (LIBS): Don't override.
	(crab1$(exeext):): Use 'CRAB1_LIBS'.
---
 gcc/rust/Make-lang.in | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index 24229c02770..c3be5f9d81b 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -54,8 +54,6 @@ GCCRS_D_OBJS = \
rust/rustspec.o \
$(END)
 
-LIBS += -ldl -lpthread
-
 gccrs$(exeext): $(GCCRS_D_OBJS) $(EXTRA_GCC_OBJS) libcommon-target.a $(LIBDEPS)
 	+$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
 	  $(GCCRS_D_OBJS) $(EXTRA_GCC_OBJS) libcommon-target.a \
@@ -237,7 +235,7 @@ RUST_LIBDEPS = $(LIBDEPS) $(LIBPROC_MACRO_INTERNAL)
 crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(RUST_LIBDEPS) $(rust.prev)
 	@$(call LINK_PROGRESS,$(INDEX.rust),start)
 	+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
-	  $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) $(LIBPROC_MACRO_INTERNAL) $(LIBFORMAT_PARSER) $(BACKENDLIBS)
+	  $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) $(CRAB1_LIBS) $(LIBPROC_MACRO_INTERNAL) $(LIBFORMAT_PARSER) $(BACKENDLIBS)
 	@$(call LINK_PROGRESS,$(INDEX.rust),end)
 
 # Build hooks.
-- 
2.34.1



Polish libstdc++ 'dg-final' action 'file-io-diff' (was: [PATCH 4/8] libstdc++: Add file-io-diff to replace @diff@ markup in I/O tests)

2024-07-29 Thread Thomas Schwinge
st for excess errors)
@@ -11850,2 +11860,3 @@
 PASS: 27_io/basic_istream/peek/wchar_t/12296.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/peek/wchar_t/6414.cc  -std=gnu++17  file-io-diff wistream_seeks-1
 PASS: 27_io/basic_istream/peek/wchar_t/6414.cc  -std=gnu++17 (test for excess errors)
@@ -11893,2 +11904,4 @@
 PASS: 27_io/basic_istream/seekg/char/exceptions_badbit_throw.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/seekg/char/fstream.cc  -std=gnu++17  file-io-diff istream_seeks-1
+PASS: 27_io/basic_istream/seekg/char/fstream.cc  -std=gnu++17  file-io-diff istream_seeks-2
 PASS: 27_io/basic_istream/seekg/char/fstream.cc  -std=gnu++17 (test for excess errors)
@@ -11907,2 +11920,4 @@
 PASS: 27_io/basic_istream/seekg/wchar_t/exceptions_badbit_throw.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/seekg/wchar_t/fstream.cc  -std=gnu++17  file-io-diff wistream_seeks-1
+PASS: 27_io/basic_istream/seekg/wchar_t/fstream.cc  -std=gnu++17  file-io-diff wistream_seeks-2
 PASS: 27_io/basic_istream/seekg/wchar_t/fstream.cc  -std=gnu++17 (test for excess errors)
@@ -11941,2 +11956,4 @@
 PASS: 27_io/basic_istream/tellg/char/exceptions_badbit_throw.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/tellg/char/fstream.cc  -std=gnu++17  file-io-diff istream_seeks-1
+PASS: 27_io/basic_istream/tellg/char/fstream.cc  -std=gnu++17  file-io-diff istream_seeks-2
 PASS: 27_io/basic_istream/tellg/char/fstream.cc  -std=gnu++17 (test for excess errors)
@@ -11955,2 +11972,4 @@
 PASS: 27_io/basic_istream/tellg/wchar_t/exceptions_badbit_throw.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_istream/tellg/wchar_t/fstream.cc  -std=gnu++17  file-io-diff wistream_seeks-1
+PASS: 27_io/basic_istream/tellg/wchar_t/fstream.cc  -std=gnu++17  file-io-diff wistream_seeks-2
 PASS: 27_io/basic_istream/tellg/wchar_t/fstream.cc  -std=gnu++17 (test for excess errors)
@@ -12026,2 +12045,3 @@
 PASS: 27_io/basic_ofstream/native_handle/wchar_t/1.cc  -std=gnu++26 execution test
+PASS: 27_io/basic_ofstream/open/char/1.cc  -std=gnu++17  file-io-diff ofstream_members-1
 PASS: 27_io/basic_ofstream/open/char/1.cc  -std=gnu++17 (test for excess errors)
@@ -12221,4 +12241,7 @@
 PASS: 27_io/basic_ostream/inserters_character/wchar_t/deleted.cc  -std=gnu++26 (test for excess errors)
+PASS: 27_io/basic_ostream/inserters_other/char/1.cc  -std=gnu++17  file-io-diff ostream_inserter_other-1
+PASS: 27_io/basic_ostream/inserters_other/char/1.cc  -std=gnu++17  file-io-diff ostream_inserter_other-2
 PASS: 27_io/basic_ostream/inserters_other/char/1.cc  -std=gnu++17 (test for excess errors)
 PASS: 27_io/basic_ostream/inserters_other/char/1.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_ostream/inserters_other/char/2.cc  -std=gnu++17  file-io-diff ostream_inserter_other_in ostream_inserter_other_out
 PASS: 27_io/basic_ostream/inserters_other/char/2.cc  -std=gnu++17 (test for excess errors)
@@ -12255,4 +12278,7 @@
 PASS: 27_io/basic_ostream/inserters_other/char/volatile_ptr.cc  -std=gnu++26 execution test
+PASS: 27_io/basic_ostream/inserters_other/wchar_t/1.cc  -std=gnu++17  file-io-diff wostream_inserter_other-1
+PASS: 27_io/basic_ostream/inserters_other/wchar_t/1.cc  -std=gnu++17  file-io-diff wostream_inserter_other-2
 PASS: 27_io/basic_ostream/inserters_other/wchar_t/1.cc  -std=gnu++17 (test for excess errors)
 PASS: 27_io/basic_ostream/inserters_other/wchar_t/1.cc  -std=gnu++17 execution test
+PASS: 27_io/basic_ostream/inserters_other/wchar_t/2.cc  -std=gnu++17  file-io-diff wostream_inserter_other_in.txt wostream_inserter_other_out.txt
 PASS: 27_io/basic_ostream/inserters_other/wchar_t/2.cc  -std=gnu++17 (test for excess errors)
@@ -13103,2 +13129,3 @@
 PASS: 27_io/ios_base/storage/68197.cc  -std=gnu++17 execution test
+PASS: 27_io/ios_base/sync_with_stdio/1.cc  -std=gnu++17  file-io-diff ios_base_members_static-1
 PASS: 27_io/ios_base/sync_with_stdio/1.cc  -std=gnu++17 (test for excess errors)
@@ -16601,3 +16628,2 @@
 PASS: ext/vstring/types/23767.cc  -std=gnu++17 (test for excess errors)
-FAIL: files differ
 PASS: special_functions/01_assoc_laguerre/check_nan.cc  -std=gnu++17 (test for excess errors)
@@ -19501,3 +19527,3 @@
 
-# of expected passes   18615
+# of expected passes   18641
 # of unexpected failures   1

Note that several instances of 'PASS: [...] file-io-diff [...]' appear, and
the unspecific 'FAIL: files differ' (near the end) turns into a specific one:

FAIL: 27_io/basic_istream/extractors_other/char/2.cc  -std=gnu++17  file-io-diff istream_extractor_other-2

(Again, that FAIL's injected for demonstration purposes only.)
The '*.log' and '*.sum' files also look as expected.


Grüße
 Thomas

Re: [PATCHv2 2/2] libiberty/buildargv: handle input consisting of only white space

2024-07-29 Thread Thomas Schwinge
Hi!

On 2024-02-10T17:26:01+, Andrew Burgess  wrote:
> --- a/libiberty/argv.c
> +++ b/libiberty/argv.c

> @@ -439,17 +442,8 @@ expandargv (int *argcp, char ***argvp)
>   }
>/* Add a NUL terminator.  */
>buffer[len] = '\0';
> -  /* If the file is empty or contains only whitespace, buildargv would
> -  return a single empty argument.  In this context we want no arguments,
> -  instead.  */
> -  if (only_whitespace (buffer))
> - {
> -   file_argv = (char **) xmalloc (sizeof (char *));
> -   file_argv[0] = NULL;
> - }
> -  else
> - /* Parse the string.  */
> - file_argv = buildargv (buffer);
> +  /* Parse the string.  */
> +  file_argv = buildargv (buffer);
>/* If *ARGVP is not already dynamically allocated, copy it.  */
>if (*argvp == original_argv)
>   *argvp = dupargv (*argvp);

With that (single) use of 'only_whitespace' now gone:

[...]/source-gcc/libiberty/argv.c:128:1: warning: ‘only_whitespace’ defined but not used [-Wunused-function]
  128 | only_whitespace (const char* input)
  | ^~~


Grüße
 Thomas


Re: [PATCH v1 1/2] PR116080: Fix tail call dejagnu checks

2024-07-29 Thread Thomas Schwinge
Hi Andi!

I'm lacking all possible context here, but I noticed:

On 2024-07-25T15:55:01-0700, Andi Kleen  wrote:
> - Run the target_effective tail_call checks without optimization to
> match the actual test cases.

> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -12741,7 +12741,15 @@ proc check_effective_target_tail_call { } {
>  return [check_no_messages_and_pattern tail_call ",SIBCALL" rtl-expand {
>   __attribute__((__noipa__)) void foo (void) { }
>   __attribute__((__noipa__)) void bar (void) { foo(); }
> -} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
> +}

> +proc check_effective_target_external_tail_call { } {
> +[...]
> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
>  }

> @@ -12751,9 +12759,9 @@ proc check_effective_target_struct_tail_call { } {
> [...]
> -} {-O2 -fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
> +} {-fdump-rtl-expand-all}] ;# The "SIBCALL" note requires a detailed dump.
>  }

..., that means that a number of the new test cases are UNSUPPORTED, for
example, x86_64 GNU/Linux:

+UNSUPPORTED: c-c++-common/musttail1.c  -Wc++-compat 
+UNSUPPORTED: c-c++-common/musttail12.c  -Wc++-compat 
+PASS: c-c++-common/musttail13.c  -Wc++-compat   (test for errors, line 4)
+PASS: c-c++-common/musttail13.c  -Wc++-compat  (test for excess errors)
+UNSUPPORTED: c-c++-common/musttail2.c  -Wc++-compat 
+UNSUPPORTED: c-c++-common/musttail3.c  -Wc++-compat 
+UNSUPPORTED: c-c++-common/musttail4.c  -Wc++-compat 
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for errors, line 17)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 10)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 11)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 12)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 24)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 25)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 26)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 5)
+PASS: c-c++-common/musttail5.c  -Wc++-compat   (test for warnings, line 6)
+PASS: c-c++-common/musttail5.c  -Wc++-compat  (test for excess errors)
+UNSUPPORTED: c-c++-common/musttail7.c  -Wc++-compat 
+UNSUPPORTED: c-c++-common/musttail8.c  -Wc++-compat 

(Similarly for their C++ testing.)

+UNSUPPORTED: g++.dg/musttail10.C  
+UNSUPPORTED: g++.dg/musttail11.C  
+UNSUPPORTED: g++.dg/musttail6.C  
+UNSUPPORTED: g++.dg/musttail9.C  

..., and even a few existing test cases "regress" from PASS to
UNSUPPORTED:

[-PASS:-]{+UNSUPPORTED:+} gcc.dg/plugin/must-tail-call-1.c -fplugin=./must_tail_call_plugin.so[-(test for excess errors)-]
[-PASS:-]{+UNSUPPORTED:+} gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so[-(test for errors, line 18)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 33)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 40)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 49)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so  (test for errors, line 58)-]
[-PASS: gcc.dg/plugin/must-tail-call-2.c -fplugin=./must_tail_call_plugin.so (test for excess errors)-]

Similarly for ppc64le GNU/Linux.

Is that intentional?


Grüße
 Thomas


Re: [C++ coroutines 6/6] Testsuite.

2024-07-29 Thread Thomas Schwinge
Hi Iain!

On 2019-11-17T10:28:26+, Iain Sandoe  wrote:
> There are two categories of test:
>
> 1. Checks for correctly formed source code and the error reporting.
> 2. Checks for transformation and code-gen.
>
> The second set are run as 'torture' tests for the standard options
> set, including LTO.  These are also intentionally run with no options
> provided (from the coroutines.exp script).

I was recently confused about why I'm seeing the same test case run first
without and then again with torture testing options; that's non-standard in
the GCC testsuite, at least in my experience?  Should we therefore add a short
rationale comment to the 'find' in 'g++.dg/coroutines/coroutines.exp',
explaining why 'g++.dg/coroutines/torture/' test cases are not being filtered
out there, despite the more specific 'g++.dg/coroutines/torture/coro-torture.exp'
testing these, too?


Grüße
 Thomas


> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/coroutines/coroutines.exp
> @@ -0,0 +1,50 @@

> +foreach test [lsort [find $srcdir/$subdir {*.[CH]}]] {
> +if [runtest_file_p $runtests $test] {
> +set nshort [file tail [file dirname $test]]/[file tail $test]
> +verbose "Testing $nshort $DEFAULT_COROFLAGS" 1
> +dg-test $test "" $DEFAULT_COROFLAGS
> +set testcase [string range $test [string length "$srcdir/"] end]
> +}

> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/coroutines/torture/coro-torture.exp

> +gcc-dg-runtest [lsort [glob $srcdir/$subdir/*.C]] "" $DEFAULT_COROFLAGS


[PATCH] nvptx: Specify '-mno-alias' for 'gcc.dg/pr60797.c' [PR60797, PR104957] (was: [PATCH] Fix PR60797)

2024-07-22 Thread Thomas Schwinge
Hi!

On 2014-04-11T12:37:42+0200, Richard Biener  wrote:
> This fixes the endless error reporting for unhandled aliases [...]

> *** gcc/testsuite/gcc.dg/pr60797.c(revision 0)
> --- gcc/testsuite/gcc.dg/pr60797.c(working copy)
> ***
> *** 0 
> --- 1,8 
> + /* { dg-do compile } */
> + /* { dg-skip-if "" { alias } } */
> + 
> + extern int foo __attribute__((alias("bar"))); /* { dg-error "supported" } */
> + int main()
> + {
> +   return 0;
> + }

If there's support for symbol aliases, we have to 'dg-skip-if' -- unless
there's a way to disable this support, which there is for GCC/nvptx:
'-mno-alias'.  OK to push the attached
"nvptx: Specify '-mno-alias' for 'gcc.dg/pr60797.c' [PR60797, PR104957]"?


Grüße
 Thomas


>From 72365494b47ad43a78d190ab87eae79fe57eb006 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Sun, 21 Jul 2024 22:23:40 +0200
Subject: [PATCH] nvptx: Specify '-mno-alias' for 'gcc.dg/pr60797.c' [PR60797,
 PR104957]

2014 Subversion r209299 (Git commit 8330537b5b58bd0532a0a49f9cbd59bf526a7847)
"Fix PR60797" added this test case, which we now amend so that it's able to
test its thing also in '--target=nvptx-none' configurations with symbol alias
support enabled (..., and test nvptx '-mno-alias').

	PR middle-end/60797
	PR target/104957
	gcc/testsuite/
	* gcc.dg/pr60797.c: For nvptx, specify '-mno-alias'.
---
 gcc/testsuite/gcc.dg/pr60797.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/pr60797.c b/gcc/testsuite/gcc.dg/pr60797.c
index 45090bae502..0485b2de172 100644
--- a/gcc/testsuite/gcc.dg/pr60797.c
+++ b/gcc/testsuite/gcc.dg/pr60797.c
@@ -1,5 +1,7 @@
-/* { dg-do compile } */
-/* { dg-skip-if "" { alias } } */
+/* If there's support for symbol aliases, have to 'dg-skip-if' -- unless
+   there's a way to disable this support.
+   { dg-additional-options -mno-alias { target nvptx-*-* } }
+   { dg-skip-if "" { { ! nvptx-*-* } && alias } } */
 
 extern int foo __attribute__((alias("bar"))); /* { dg-error "supported" } */
 int main()
-- 
2.34.1



[OG14] Revert "[og10] vect: Add target hook to prefer gather/scatter instructions" (was: [PATCH] [og10] vect: Add target hook to prefer gather/scatter instructions)

2024-07-19 Thread Thomas Schwinge
Hi!

On 2021-01-13T15:48:42-0800, Julian Brown  wrote:
> For AMD GCN, the instructions available for loading/storing vectors are
> always scatter/gather operations (i.e. there are separate addresses for
> each vector lane), so the current heuristic to avoid gather/scatter
> operations with too many elements in get_group_load_store_type is
> counterproductive. Avoiding such operations in that function can
> subsequently lead to a missed vectorization opportunity whereby later
> analyses in the vectorizer try to use a very wide array type which is
> not available on this target, and thus it bails out.
>
> The attached patch adds a target hook to override the "single_element_p"
> heuristic in the function as a target hook, and activates it for GCN. This
> allows much better code to be generated for affected loops.
>
> Tested with offloading to AMD GCN. I will apply to the og10 branch
> shortly.

Testing current OG14 commit 735bbbfc6eaf58522c3ebb0946b66f33958ea134 for
'--target=amdgcn-amdhsa' (I've tested '-march=gfx908', '-march=gfx1100'),
this change has been identified as causing ~100 instances of execution
test PASS -> FAIL, thus wrong-code generation.  It's possible that we've
also had the same misbehavior on OG13 and earlier, but nobody ever
tested that.  And/or, at some point in time, the original patch fell
out of sync and wasn't updated for relevant upstream vectorizer changes.
Until someone gets to analyze that (and upstream these changes here), we
shall revert this commit on OG14.  Pushed to devel/omp/gcc-14 branch
commit 8678fc697046fba1014f1db6321ee670538b0881
'Revert "[og10] vect: Add target hook to prefer gather/scatter instructions"',
see attached.


List of GCC 14.1 vs OG14 regressions (... avoided by this revert commit):

'-march=gfx1100' only:

PASS: g++.dg/vect/pr97255.cc  -std=c++14 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/vect/pr97255.cc  -std=c++14 execution test
PASS: g++.dg/vect/pr97255.cc  -std=c++17 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/vect/pr97255.cc  -std=c++17 execution test
PASS: g++.dg/vect/pr97255.cc  -std=c++20 (test for excess errors)
[-PASS:-]{+FAIL:+} g++.dg/vect/pr97255.cc  -std=c++20 execution test
UNSUPPORTED: g++.dg/vect/pr97255.cc  -std=c++98

GCN Kernel Aborted

@@ -101950,11 +101950,11 @@ PASS: gcc.dg/torture/pr52028.c   -O0  execution test
PASS: gcc.dg/torture/pr52028.c   -O1  (test for excess errors)
PASS: gcc.dg/torture/pr52028.c   -O1  execution test
PASS: gcc.dg/torture/pr52028.c   -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr52028.c   -O2  execution test
PASS: gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr52028.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
PASS: gcc.dg/torture/pr52028.c   -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr52028.c   -O3 -g  execution test
PASS: gcc.dg/torture/pr52028.c   -Os  (test for excess errors)
PASS: gcc.dg/torture/pr52028.c   -Os  execution test

GCN Kernel Aborted

@@ -102160,11 +102160,11 @@ PASS: gcc.dg/torture/pr53366-1.c   -O0  execution test
PASS: gcc.dg/torture/pr53366-1.c   -O1  (test for excess errors)
PASS: gcc.dg/torture/pr53366-1.c   -O1  execution test
PASS: gcc.dg/torture/pr53366-1.c   -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr53366-1.c   -O2  execution test
PASS: gcc.dg/torture/pr53366-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr53366-1.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
PASS: gcc.dg/torture/pr53366-1.c   -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr53366-1.c   -O3 -g  execution test
PASS: gcc.dg/torture/pr53366-1.c   -Os  (test for excess errors)
PASS: gcc.dg/torture/pr53366-1.c   -Os  execution test

GCN Kernel Aborted

PASS: gcc.dg/torture/pr93868.c   -O0  (test for excess errors)
PASS: gcc.dg/torture/pr93868.c   -O0  execution test
PASS: gcc.dg/torture/pr93868.c   -O1  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr93868.c   -O1  execution test
PASS: gcc.dg/torture/pr93868.c   -O2  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr93868.c   -O2  execution test
PASS: gcc.dg/torture/pr93868.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr93868.c   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
PASS: gcc.dg/torture/pr93868.c   -O3 -g  (test for excess errors)
[-PASS:-]{+FAIL:+} gcc.dg/torture/pr93868.

Re: [r15-2135 Regression] FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable -Os at line 32 (test for warnings, line 31) on Linux/x86_64

2024-07-19 Thread Thomas Schwinge
Hi!

First, note this is now GCC PR115989
"[15 regression] libgomp.oacc-fortran/privatized-ref-2.f90 fails after r15-2135-gc3aa339ea50f05".

Otherwise:

On 2024-07-19T06:54:46+0100, Paul Richard Thomas wrote:
> Thanks for doing that test. Here is what the error looks like on 14-branch:
> libgomp.oacc-fortran/privatized-ref-2.f90:36:22:
>36 |   A = [(3*j, j=1, 10)]
>   |  ^
> Warning: ‘a.offset’ is used uninitialized [-Wuninitialized]
> libgomp.oacc-fortran/privatized-ref-2.f90:31:30:
>31 |   integer, allocatable :: A(:)
>   |  ^
> note: ‘a’ declared here
> libgomp.oacc-fortran/privatized-ref-2.f90:36:22:
> repeats for the descriptor bounds.
>
> The scalarizer, which sets up the loops for the assignment of 'A' assigns
> the bounds and offset to variables. These are then manipulated further and
> used for the loop bounds and allocation. The patch does a once off setting
> of the bounds, to eliminate the bogus warnings. The allocate statement
> already does this.

Maybe you're already aware, but if not, please have a look at how PR108889
(Paul's commit r15-2135-gc3aa339ea50f050caf7ed2e497f5499ec2d7b9cc
"Fortran: Suppress bogus used uninitialized warnings [PR108889]") relates
to "PR77504 etc." as mentioned in
'libgomp.oacc-fortran/privatized-ref-2.f90'?

> I will patch appropriately just as soon as I am able.

Next, the proposed patch:

> On Fri, 19 Jul 2024 at 02:59, Jiang, Haochen wrote:
>> Just did a quick test.  Correcting myself from before: those lines also
>> need to be removed, since they are XPASS now.
>>
>> However, the real issue is the dg-note at line 32; that is, the warning
>> disappeared.
>>
>> diff --git a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90 b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> index 498ef70b63a..8cf79a10e8d 100644
>> --- a/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/privatized-ref-2.f90
>> @@ -29,16 +29,10 @@ program main
>>implicit none (type, external)
>>integer :: j
>>integer, allocatable :: A(:)
>> -  ! { dg-note {'a' declared here} {} { target *-*-* } .-1 }
>>character(len=:), allocatable :: my_str
>>character(len=15), allocatable :: my_str15
>>
>>A = [(3*j, j=1, 10)]
>> -  ! { dg-bogus {'a\.offset' is used uninitialized} {PR77504 etc.} { xfail *-*-* } .-1 }
>> -  ! { dg-bogus {'a\.dim\[0\]\.lbound' is used uninitialized} {PR77504 etc.} { xfail *-*-* } .-2 }
>> -  ! { dg-bogus {'a\.dim\[0\]\.ubound' is used uninitialized} {PR77504 etc.} { xfail *-*-* } .-3 }
>> -  ! { dg-bogus {'a\.dim\[0\]\.lbound' may be used uninitialized} {PR77504 etc.} { xfail { ! __OPTIMIZE__ } } .-4 }
>> -  ! { dg-bogus {'a\.dim\[0\]\.ubound' may be used uninitialized} {PR77504 etc.} { xfail { ! __OPTIMIZE__ } } .-5 }
>>call foo (A, size(A))
>>call bar (A)
>>my_str = "1234567890"
>>
>> After the change, all the tests are passed. However, is that right?

... looks exactly right to me.  Please push.


Grüße
 Thomas


>> I am not familiar with either Fortran or libgomp, but a warning like
>> '... declared here', which might report a variable declaration
>> conflict, seems needed.
>>
>> Thx,
>> Haochen
>>
>> *From:* Jiang, Haochen
>> *Sent:* Friday, July 19, 2024 9:49 AM
>> *To:* Paul Richard Thomas 
>> *Cc:* pa...@gcc.gnu.org; gcc-regress...@gcc.gnu.org;
>> gcc-patches@gcc.gnu.org
>> *Subject:* RE: [r15-2135 Regression] FAIL:
>> libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1
>> -DACC_MEM_SHARED=1 -foffload=disable -Os at line 32 (test for warnings,
>> line 31) on Linux/x86_64
>>
>> Hi Paul,
>>
>> I suspect that's not the correct way to do it; those lines are OK since
>> they are XFAIL.  The problem is that specific warning test.
>>
>> Thx,
>> Haochen
>>
>> *From:* Paul Richard Thomas 
>> *Sent:* Friday, July 19, 2024 12:28 AM
>> *To:* haochen.jiang 
>> *Cc:* pa...@gcc.gnu.org; gcc-regress...@gcc.gnu.org;
>> gcc-patches@gcc.gnu.org; Jiang, Haochen 
>> *Subject:* Re: [r15-2135 Regression] FAIL:
>> libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1
>> -DACC_MEM_SHARED=1 -foffload=disable -Os at line 32 (test for warnings,
>> line 31) on Linux/x86_64
>>
>> Hi Haochen,
>>
>> Try removing lines 37-41 since these are precisely the bogus warnings that
>> the patch is meant to eliminate.
>>
>> Regards
>>
>> Paul
>>
>> On Thu, 18 Jul 2024 at 14:38, haochen.jiang wrote:
>>
>> On Linux/x86_64,
>>
>> c3aa339ea50f050caf7ed2e497f5499ec2d7b9cc is the first bad commit
>> commit c3aa339ea50f050caf7ed2e497f5499ec2d7b9cc
>> Author: Paul Thomas 
>> Date:   Thu Jul 18 08:51:35 2024 +0100
>>
>> Fortran: Suppress bogus used uninitialized warnings [PR108889].
>>
>> caused
>>
>> FAIL: libgomp.oacc-fortran/privatized-ref-2.f90 -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O0   at line 32

[PATCH] Make 'target-supports.exp' additions for nvptx target generally available

2024-07-18 Thread Thomas Schwinge
Hi!

OK to push (once testing completes) the attached
"Make 'target-supports.exp' additions for nvptx target generally available"?

The idea of this new scheme is that explicit feature/target-specific
stuff isn't kept in 'gcc/testsuite/lib/target-supports.exp', but instead
in feature/target-specific 'gcc/testsuite/lib/target-supports-*.exp'
files.  (..., and hoping that other maintainers also pick up this new
scheme, and likewise move any feature/target-specific stuff from
'gcc/testsuite/lib/target-supports.exp', for example, into new
'gcc/testsuite/lib/target-supports-*.exp' files, to un-bloat the former
one.)


Grüße
 Thomas


>From b029aac1801ae1950e19bafef966eae28ce5b29f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 17 Jul 2024 23:11:31 +0200
Subject: [PATCH] Make 'target-supports.exp' additions for nvptx target
 generally available

..., instead of just 'gcc.target/nvptx/nvptx.exp'.

	gcc/testsuite/
	* gcc.target/nvptx/nvptx.exp: Move 'target-supports.exp' additions
	for nvptx target...
	* lib/target-supports-nvptx.exp: ... into this new file.
	* lib/target-supports.exp: Load it.
---
 gcc/testsuite/gcc.target/nvptx/nvptx.exp  | 66 ---
 .../target-supports-nvptx.exp}| 27 ++--
 gcc/testsuite/lib/target-supports.exp | 11 
 3 files changed, 15 insertions(+), 89 deletions(-)
 copy gcc/testsuite/{gcc.target/nvptx/nvptx.exp => lib/target-supports-nvptx.exp} (81%)

diff --git a/gcc/testsuite/gcc.target/nvptx/nvptx.exp b/gcc/testsuite/gcc.target/nvptx/nvptx.exp
index 3151381f51a..d526b5822f9 100644
--- a/gcc/testsuite/gcc.target/nvptx/nvptx.exp
+++ b/gcc/testsuite/gcc.target/nvptx/nvptx.exp
@@ -25,72 +25,6 @@ if ![istarget nvptx*-*-*] then {
 # Load support procs.
 load_lib gcc-dg.exp
 
-# Return 1 if code by default compiles for at least PTX ISA version
-# major.minor.
-proc check_effective_target_default_ptx_isa_version_at_least { major minor } {
-set name default_ptx_isa_version_at_least_${major}_${minor}
-
-set supported_p \
-	[concat \
-	 "((__PTX_ISA_VERSION_MAJOR__ == $major" \
-	 "  && __PTX_ISA_VERSION_MINOR__ >= $minor)" \
-	 " || (__PTX_ISA_VERSION_MAJOR__ > $major))"]
-
-set src \
-	[list \
-	 "#if $supported_p" \
-	 "#else" \
-	 "#error unsupported" \
-	 "#endif"]
-set src [join $src "\n"]
-
-set res [check_no_compiler_messages $name assembly $src ""]
-
-return $res
-}
-
-# Return 1 if code by default compiles for at least PTX ISA version 6.0.
-proc check_effective_target_default_ptx_isa_version_at_least_6_0 { } {
-return [check_effective_target_default_ptx_isa_version_at_least 6 0]
-}
-
-# Return 1 if code with PTX ISA version major.minor or higher can be run.
-proc check_effective_target_runtime_ptx_isa_version_at_least { major minor } {
-set name runtime_ptx_isa_version_${major}_${minor}
-
-set default \
-	[check_effective_target_default_ptx_isa_version_at_least \
-	 ${major} ${minor}]
-
-if { $default } {
-	set flag ""
-} else {
-	set flag "-mptx=$major.$minor"
-}
-
-set res [check_runtime $name {
-	int main (void) { return 0; }
-} $flag]
-
-return $res
-}
-
-# Return 1 if runtime environment support the PTX ISA directive .alias.
-proc check_effective_target_runtime_ptx_alias { } {
-return [check_effective_target_runtime_ptx_isa_version_at_least 6 3]
-}
-
-# Add options to enable using PTX ISA directive .alias.
-proc add_options_for_ptx_alias { flags } {
-append flags " -malias"
-
-if { ![check_effective_target_default_ptx_isa_version_at_least 6 3] } {
-	append flags " -mptx=6.3"
-}
-
-return $flags
-}
-
 # If a testcase doesn't have special options, use these.
 global DEFAULT_CFLAGS
 if ![info exists DEFAULT_CFLAGS] then {
diff --git a/gcc/testsuite/gcc.target/nvptx/nvptx.exp b/gcc/testsuite/lib/target-supports-nvptx.exp
similarity index 81%
copy from gcc/testsuite/gcc.target/nvptx/nvptx.exp
copy to gcc/testsuite/lib/target-supports-nvptx.exp
index 3151381f51a..5d014f518e0 100644
--- a/gcc/testsuite/gcc.target/nvptx/nvptx.exp
+++ b/gcc/testsuite/lib/target-supports-nvptx.exp
@@ -1,5 +1,6 @@
-# Specific regression driver for nvptx.
-# Copyright (C) 2015-2024 Free Software Foundation, Inc.
+# 'target-supports.exp' additions for nvptx target.
+
+# Copyright (C) 2022-2024 Free Software Foundation, Inc.
 
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
@@ -15,15 +16,11 @@
 # along with GCC; see the file COPYING3.  If not see
 # <http://www.gnu.org/licenses/>.
 
-# GCC testsuite that uses the `dg.exp' driver.
-
-# Exit immediately if this 

libgomp: Document 'GOMP_teams4' (was: GCN: Honor OpenMP 5.1 'num_teams' lower bound)

2024-07-16 Thread Thomas Schwinge
Hi!

On 2024-07-15T17:01:46+0100, Andrew Stubbs  wrote:
> On 15/07/2024 16:36, Thomas Schwinge wrote:
>> On 2024-07-15T12:16:30+0100, Andrew Stubbs  wrote:
>>> On 15/07/2024 10:29, Thomas Schwinge wrote:
>>>> On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote:
>>>>> And finally here is a third version, [...]
>>>>
>>>> ... which became commit 9fa72756d90e0d9edadf6e6f5f56476029925788
>>>> "libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound".
>>>>
>>>> Attached here is "GCN: Honor OpenMP 5.1 'num_teams' lower bound", which
>>>> are exactly the corresponding changes for GCN (see below Jakub's nvptx
>>>> changes for reference); OK to push?
>> 
>>> That's a lot of convoluted logic to drop in without a single comment!
>> 
>> Well, I'll pass that compliment over to Jakub ;-) -- my code changes just
>> intend to be a faithful "'s%nvptx%GCN'" of his code changes from back
>> then.
>> 
>>> The GCN bits look fine, and I assume you've probably thought about the
>>> logic here a lot, but I've no idea what you're trying to achieve, or why
>>> you're trying to achieve it (from the patch alone).
>>>
>>> Can we have some comments on motivation and goals, please?
>> 
>> Here's the original context:
>> 
>>   - <https://inbox.sourceware.org/2021190313.GV2710@tucnak> "[PATCH] openmp: Honor OpenMP 5.1 num_teams lower bound"
>>   - <https://inbox.sourceware.org/2022132023.GC2710@tucnak> "[PATCH] libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound"
>> 
>> Is that sufficient, and/or would you like to see some commentary to the
>> relevant libgomp generic/nvptx/GCN code added?
>
> Yes, sorry if it wasn't clear; I meant *code* comments.
>
> /* The team number is usually the same as the gcn_dim_pos(0), except 
> when num_teams(N) is .   */
>
> The FIXME actually tells me something useful about one of the 
> conditional cases, but that's being removed here.
>
> Also, why are we returning "false" in other cases, and what effect does 
> that have? Is that for "spare" teams when we launch more than we need?

How about the attached "libgomp: Document 'GOMP_teams4'"?  Jakub, does
that accurately reflect the relevant facts?


Grüße
 Thomas


From 149c2dc71bb44a9365ea3c360304f75cb9056084 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 16 Jul 2024 17:09:38 +0200
Subject: [PATCH] libgomp: Document 'GOMP_teams4'

For reference:

  - <https://inbox.sourceware.org/2021190313.GV2710@tucnak> "[PATCH] openmp: Honor OpenMP 5.1 num_teams lower bound"
  - <https://inbox.sourceware.org/2022132023.GC2710@tucnak> "[PATCH] libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound"

	libgomp/
	* config/gcn/target.c (GOMP_teams4): Document.
	* config/nvptx/target.c (GOMP_teams4): Likewise.
	* target.c (GOMP_teams4): Likewise.
---
 libgomp/config/gcn/target.c   | 8 
 libgomp/config/nvptx/target.c | 8 
 libgomp/target.c  | 9 +
 3 files changed, 25 insertions(+)

diff --git a/libgomp/config/gcn/target.c b/libgomp/config/gcn/target.c
index e57d2e5f93f..9cafea4e2cc 100644
--- a/libgomp/config/gcn/target.c
+++ b/libgomp/config/gcn/target.c
@@ -29,6 +29,14 @@
 
 extern volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS;
 
+/* Implement OpenMP 'teams' construct.
+
+   Initialize upon FIRST call.  Return whether this invocation is active.
+   Depending on whether NUM_TEAMS_LOWER asks for more teams than are provided
+   in hardware, we may need to loop multiple times; in that case make sure to
+   update the team-level variable used by 'omp_get_team_num', as we then can't
+   just use '__builtin_gcn_dim_pos (0)'.  */
+
 bool
 GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
 	 unsigned int thread_limit, bool first)
diff --git a/libgomp/config/nvptx/target.c b/libgomp/config/nvptx/target.c
index f14dcfc8ff6..8d4dc5f661a 100644
--- a/libgomp/config/nvptx/target.c
+++ b/libgomp/config/nvptx/target.c
@@ -31,6 +31,14 @@ extern int __gomp_team_num __attribute__((shared));
 extern volatile struct gomp_offload_icvs GOMP_ADDITIONAL_ICVS;
 volatile struct rev_offload *GOMP_REV_OFFLOAD_VAR;
 
+/* Implement OpenMP 'teams' construct.
+
+   Initialize upon FIRST call.  Return whether this invocation is active.
+   Depending on whether NUM_TEAMS_LOWER asks for more teams than are provided
+   in hardware, we may need to loop multiple times; in that case make sur

Re: GCN: Honor OpenMP 5.1 'num_teams' lower bound

2024-07-15 Thread Thomas Schwinge
Hi!

On 2024-07-15T12:16:30+0100, Andrew Stubbs  wrote:
> On 15/07/2024 10:29, Thomas Schwinge wrote:
>> On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote:
>>> And finally here is a third version, [...]
>> 
>> ... which became commit 9fa72756d90e0d9edadf6e6f5f56476029925788
>> "libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound".
>> 
>> Attached here is "GCN: Honor OpenMP 5.1 'num_teams' lower bound", which
>> contains exactly the corresponding changes for GCN (see Jakub's nvptx
>> changes below for reference); OK to push?

> That's a lot of convoluted logic to drop in without a single comment!

Well, I'll pass that compliment over to Jakub ;-) -- my code changes just
intend to be a faithful "'s%nvptx%GCN'" of his code changes from back
then.

> The GCN bits look fine, and I assume you've probably thought about the 
> logic here a lot, but I've no idea what you're trying to achieve, or why 
> you're trying to achieve it (from the patch alone).
>
> Can we have some comments on motivation and goals, please?

Here's the original context:

  - <https://inbox.sourceware.org/2021190313.GV2710@tucnak> "[PATCH] openmp: Honor OpenMP 5.1 num_teams lower bound"
  - <https://inbox.sourceware.org/2022132023.GC2710@tucnak> "[PATCH] libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound"

Is that sufficient, and/or would you like to see some commentary to the
relevant libgomp generic/nvptx/GCN code added?


Grüße
 Thomas


GCN: Honor OpenMP 5.1 'num_teams' lower bound (was: [PATCH] libgomp, nvptx, v3: Honor OpenMP 5.1 num_teams lower bound)

2024-07-15 Thread Thomas Schwinge
Hi!

On 2021-11-12T18:58:04+0100, Jakub Jelinek via Gcc-patches wrote:
> And finally here is a third version, [...]

... which became commit 9fa72756d90e0d9edadf6e6f5f56476029925788
"libgomp, nvptx: Honor OpenMP 5.1 num_teams lower bound".

Attached here is "GCN: Honor OpenMP 5.1 'num_teams' lower bound", which
contains exactly the corresponding changes for GCN (see Jakub's nvptx
changes below for reference); OK to push?


Grüße
 Thomas


> 2021-11-12  Jakub Jelinek  
>
>   * config/nvptx/team.c (__gomp_team_num): Define as
>   __attribute__((shared)) var.
>   (gomp_nvptx_main): Initialize __gomp_team_num to 0.
>   * config/nvptx/target.c (__gomp_team_num): Declare as
>   extern __attribute__((shared)) var.
>   (GOMP_teams4): Use __gomp_team_num as the team number instead of
>   %ctaid.x.  If first, initialize it to %ctaid.x.  If num_teams_lower
>   is bigger than num_blocks, use num_teams_lower teams and arrange for
>   bumping of __gomp_team_num if !first and returning false once we run
>   out of teams.
>   * config/nvptx/teams.c (__gomp_team_num): Declare as
>   extern __attribute__((shared)) var.
>   (omp_get_team_num): Return __gomp_team_num value instead of %ctaid.x.
>
> --- libgomp/config/nvptx/team.c.jj	2021-05-25 13:43:02.793121350 +0200
> +++ libgomp/config/nvptx/team.c   2021-11-12 17:49:02.847341650 +0100
> @@ -32,6 +32,7 @@
>  #include 
>  
>  struct gomp_thread *nvptx_thrs __attribute__((shared,nocommon));
> +int __gomp_team_num __attribute__((shared));
>  
>  static void gomp_thread_start (struct gomp_thread_pool *);
>  
> @@ -57,6 +58,7 @@ gomp_nvptx_main (void (*fn) (void *), vo
>    /* Starting additional threads is not supported.  */
>    gomp_global_icv.dyn_var = true;
>  
> +  __gomp_team_num = 0;
>    nvptx_thrs = alloca (ntids * sizeof (*nvptx_thrs));
>    memset (nvptx_thrs, 0, ntids * sizeof (*nvptx_thrs));
>  
> --- libgomp/config/nvptx/target.c.jj  2021-11-12 15:57:29.400632875 +0100
> +++ libgomp/config/nvptx/target.c 2021-11-12 17:47:39.499533296 +0100
> @@ -26,28 +26,41 @@
>  #include "libgomp.h"
>  #include 
>  
> +extern int __gomp_team_num __attribute__((shared));
> +
>  bool
>  GOMP_teams4 (unsigned int num_teams_lower, unsigned int num_teams_upper,
>               unsigned int thread_limit, bool first)
>  {
> +  unsigned int num_blocks, block_id;
> +  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
>    if (!first)
> -    return false;
> +    {
> +      unsigned int team_num;
> +      if (num_blocks > gomp_num_teams_var)
> +        return false;
> +      team_num = __gomp_team_num;
> +      if (team_num > gomp_num_teams_var - num_blocks)
> +        return false;
> +      __gomp_team_num = team_num + num_blocks;
> +      return true;
> +    }
>    if (thread_limit)
>      {
>        struct gomp_task_icv *icv = gomp_icv (true);
>        icv->thread_limit_var
>          = thread_limit > INT_MAX ? UINT_MAX : thread_limit;
>      }
> -  unsigned int num_blocks, block_id;
> -  asm ("mov.u32 %0, %%nctaid.x;" : "=r" (num_blocks));
> -  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
> -  /* FIXME: If num_teams_lower > num_blocks, we want to loop multiple
> -     times for some CTAs.  */
> -  (void) num_teams_lower;
> -  if (!num_teams_upper || num_teams_upper >= num_blocks)
> +  if (!num_teams_upper)
>      num_teams_upper = num_blocks;
> -  else if (block_id >= num_teams_upper)
> +  else if (num_blocks < num_teams_lower)
> +    num_teams_upper = num_teams_lower;
> +  else if (num_blocks < num_teams_upper)
> +    num_teams_upper = num_blocks;
> +  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (block_id));
> +  if (block_id >= num_teams_upper)
>      return false;
> +  __gomp_team_num = block_id;
>    gomp_num_teams_var = num_teams_upper - 1;
>    return true;
>  }
> --- libgomp/config/nvptx/teams.c.jj   2021-05-25 13:43:02.793121350 +0200
> +++ libgomp/config/nvptx/teams.c  2021-11-12 17:37:18.933361024 +0100
> @@ -28,6 +28,8 @@
>  
>  #include "libgomp.h"
>  
> +extern int __gomp_team_num __attribute__((shared));
> +
>  void
>  GOMP_teams_reg (void (*fn) (void *), void *data, unsigned int num_teams,
>   unsigned int thread_limit, unsigned int flags)
> @@ -48,9 +50,7 @@ omp_get_num_teams (void)
>  int
>  omp_get_team_num (void)
>  {
> -  int ctaid;
> -  asm ("mov.u32 %0, %%ctaid.x;" : "=r" (ctaid));
> -  return ctaid;
> +  return __gomp_team_num;
>  }
>  
>  ialias (omp_get_num_teams)
>
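The looping that Jakub's nvptx patch introduces (and that the GCN patch mirrors) can be modeled on the host, which also answers when 'GOMP_teams4' returns false. The following is a minimal sketch, assuming a simplified sequential driver: 'struct cta_state', 'model_teams4' and 'model_grid' are hypothetical names, and the two fields only stand in for '__gomp_team_num' and 'gomp_num_teams_var'. It illustrates the arithmetic; it is not libgomp code.

```c
#include <stdbool.h>

/* Hypothetical per-CTA state; stand-ins for the libgomp variables.  */
struct cta_state
{
  unsigned int team_num;       /* models __gomp_team_num (per-CTA shared) */
  unsigned int num_teams_var;  /* models gomp_num_teams_var */
};

/* Model of the nvptx GOMP_teams4 logic for CTA BLOCK_ID out of NUM_BLOCKS.  */
static bool
model_teams4 (struct cta_state *s, unsigned int block_id,
              unsigned int num_blocks, unsigned int num_teams_lower,
              unsigned int num_teams_upper, bool first)
{
  if (!first)
    {
      /* Later calls: advance by the grid size while team numbers remain.  */
      if (num_blocks > s->num_teams_var)
        return false;
      if (s->team_num > s->num_teams_var - num_blocks)
        return false;
      s->team_num += num_blocks;
      return true;
    }
  /* First call: pick the effective number of teams.  */
  if (!num_teams_upper)
    num_teams_upper = num_blocks;
  else if (num_blocks < num_teams_lower)
    num_teams_upper = num_teams_lower;  /* more teams than CTAs: loop */
  else if (num_blocks < num_teams_upper)
    num_teams_upper = num_blocks;
  if (block_id >= num_teams_upper)
    return false;                       /* spare CTA: runs no team */
  s->team_num = block_id;
  s->num_teams_var = num_teams_upper - 1;
  return true;
}

/* Drive every CTA through its 'teams' loop and count how often each
   team number executes.  COUNTS must have NCOUNTS zeroed entries.  */
static void
model_grid (unsigned int num_blocks, unsigned int lower, unsigned int upper,
            unsigned int *counts, unsigned int ncounts)
{
  for (unsigned int b = 0; b < num_blocks; b++)
    {
      struct cta_state s = { 0, 0 };
      for (bool first = true;
           model_teams4 (&s, b, num_blocks, lower, upper, first);
           first = false)
        if (s.team_num < ncounts)
          counts[s.team_num]++;
    }
}
```

With 4 CTAs and 'num_teams(10)', CTA 0 runs teams 0, 4 and 8, CTA 3 runs teams 3 and 7, and every team number from 0 to 9 runs exactly once.  With 8 CTAs and 'num_teams(2:4)', CTAs 4 to 7 return false on their first call: that is the "spare" CTA case.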

[OG14] Fortran/OpenMP: Support mapping of DT with allocatable components: disable 'generate_callback_wrapper' for nvptx target (was: [Patch][Stage 1] Fortran/OpenMP: Support mapping of DT with allocat

2024-07-03 Thread Thomas Schwinge
Hi Tobias!

I've compared test results for nvptx target for GCC 14 vs. the new OG14,
and ran into a number of unexpected regressions: thousands of compilation
PASS -> FAIL in the Fortran testsuite.  The few that I looked at were all
like:

ptxas /tmp/ccAMr7D9.o, line 63; error   : Illegal operand type to instruction 'st'
ptxas /tmp/ccAMr7D9.o, line 63; error   : Unknown symbol '%stack'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status
compiler exited with status 1

Comparing '-fdump-tree-all' for 'gfortran.dg/pr37287-1.f90' (randomly
picked) for GCC 14 vs. OG14, already in 'pr37287-1.f90.005t.original' we
see:

--- [GCC 14]/pr37287-1.f90.005t.original  2024-07-03 12:45:08.369948469 +0200
+++ [OG14]/pr37287-1.f90.005t.original   2024-07-03 12:44:57.770072298 +0200
@@ -1,3 +1,21 @@
+__attribute__((fn spec (". r r r r ")))
+integer(kind=8) __callback___iso_c_binding_C_ptr (integer(kind=8) (*) (void *, void * & restrict, integer(kind=2), void (*) (void)) cb, void * token, void * this_ptr, integer(kind=2) flag)
+{
+  integer(kind=8) result;
+  void * * scalar;
+
+  result = 0;
+  if (flag == 1)
+{
+  result = cb (token, &this_ptr, 64, 3, 0B);
+  return result;
+}
+  L$1:;
+  scalar = (void * *) this_ptr;
+  return result;
+}
+
+
 __attribute__((fn spec (". . . ")))
 void __copy___iso_c_binding_C_ptr (void * & restrict src, void * & restrict dst)
 {

(In addition to the whole function '__callback___iso_c_binding_C_ptr',
also note that the 'L$1:' label and 'scalar' variable are dead here; but
that's likely unrelated to the issue at hand?)

This points to OG14 commit 92c3af3d4f82351c7133b6ee90e213a8a5a485db
"Fortran/OpenMP: Support mapping of DT with allocatable components":

On 2022-03-01T16:34:18+0100, Tobias Burnus  wrote:
> this patch adds support for mapping something like
>type t
>  type(t2), allocatable :: a, b(:)
>  integer, allocatable :: c, d(:)
>end type t
>type(t), allocatable :: var, var2(:,:)
>
>!$omp target enter data map(var, var2)
>
> which does a deep walk of the components at runtime.
>
> [...]
>
> Issues: None known, but I am sure with experimenting,
> more can be found - [...]

Due to a number of other commits (at least textually) depending on this
one, this commit isn't easy to revert on OG14.

But: if I disable it for nvptx target as per the attached
"Fortran/OpenMP: Support mapping of DT with allocatable components: disable 
'generate_callback_wrapper' for nvptx target",
then we're back to good -- all GCC 14 vs. OG14 regressions resolved for
nvptx target.

By the way: it's possible that we've had the same misbehavior also on
OG13 and earlier, but just nobody ever tested that for nvptx target.

Note that also outside of OG14 (that is, in GCC 14 as well as GCC trunk),
we have a number of instances of:

ptxas /tmp/ccAMr7D9.o, line 63; error   : Illegal operand type to instruction 'st'
ptxas /tmp/ccAMr7D9.o, line 63; error   : Unknown symbol '%stack'

... all over the Fortran test suite (only).  My current theory therefore
is that there is some latent issue, which is just greatly exacerbated by
OG14 commit 92c3af3d4f82351c7133b6ee90e213a8a5a485db
"Fortran/OpenMP: Support mapping of DT with allocatable components" (or
some related change).

This could be the Fortran front end generating incorrect GIMPLE, or the
middle end or (more likely?) nvptx back end not correctly handling
something that only comes into existence via the Fortran front end.

Anyway: until we understand the underlying issue, OK to push the attached
"Fortran/OpenMP: Support mapping of DT with allocatable components: disable 
'generate_callback_wrapper' for nvptx target"
to devel/omp/gcc-14 branch?


Grüße
 Thomas


From 3fb9e4cabea736ace66ee197be1b13a978af10ac Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 3 Jul 2024 22:09:39 +0200
Subject: [PATCH] Fortran/OpenMP: Support mapping of DT with allocatable
 components: disable 'generate_callback_wrapper' for nvptx target

This is, obviously, not the final fix for this issue.

	gcc/fortran/
	* class.cc (generate_callback_wrapper) [GCC_NVPTX_H]: Disable.
---
 gcc/fortran/class.cc | 25 +
 1 file changed, 25 insertions(+)

diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index 15aacd98fd8..2c062204e5a 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gfortran.h"
 #include "constructor.h"
 #include "target-memory.h"

WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 'pass_late_compilation' (was: nvptx vs. [PATCH] Add a late-combine pass [PR106594])

2024-07-01 Thread Thomas Schwinge
n GCC target libraries for nvptx.  (For avoidance of
> doubt: "mess" is a great exaggeration here.)

But that then disturbs non-nvptx targets; see (prerequisite)
<https://inbox.sourceware.org/87ed8htdwy@euler.schwinge.ddns.net>
"Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'" for why.

Then, see the attached -- just for later, for now --
"WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into 
'pass_late_compilation'"
for how to make this work properly.  (This also puts back
'pass_fast_rtl_dce' into 'pass_late_compilation' instead of running it
unconditionally, in order to not change any behavior in that regard.)


Grüße
 Thomas


>>> But: should we expect '-fno-late-combine-instructions' vs.
>>> '-flate-combine-instructions' to behave in the same way?  (After all,
>>> '%r22' remains unused also with '-flate-combine-instructions', and
>>> doesn't need to be emitted.)  This could, of course, also be a nvptx back
>>> end issue?
>>>
>>> I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
>>> is attached if you'd like to reproduce this with your own nvptx target
>>> 'cc1':
>>>
>>> $ [...]/configure --target=nvptx-none --enable-languages=c
>>> $ make -j12 all-gcc
>>> $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase \
>>>     tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin \
>>>     -o tmp-libc_a-lnumeric.s -fdump-rtl-all # -fno-late-combine-instructions
>>>
>>>
>>> Grüße
>>>  Thomas


From ef14e15c3255059f374e04a47d838e9c98c9da2c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 28 Jun 2024 00:41:54 +0200
Subject: [PATCH] WIP Move 'pass_fast_rtl_dce' from 'pass_postreload' into
 'pass_late_compilation'

id:87ed8i2ekt@euler.schwinge.ddns.net
---
 gcc/passes.cc  | 8 
 gcc/passes.def | 6 ++
 2 files changed, 14 insertions(+)

diff --git a/gcc/passes.cc b/gcc/passes.cc
index e444b462113..1cdd4a77f5b 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -685,6 +685,10 @@ public:
   {}
 
   /* opt_pass methods: */
+  opt_pass *clone () final override
+  {
+return new pass_postreload (m_ctxt);
+  }
   bool gate (function *) final override
   {
 if (reload_completed)
@@ -728,6 +732,10 @@ public:
   {}
 
   /* opt_pass methods: */
+  opt_pass *clone () final override
+  {
+return new pass_late_compilation (m_ctxt);
+  }
   bool gate (function *) final override
   {
 return reload_completed || targetm.no_register_allocation;
diff --git a/gcc/passes.def b/gcc/passes.def
index 72198bc4c4e..cb221438a1e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -529,7 +529,13 @@ along with GCC; see the file COPYING3.  If not see
 	  NEXT_PASS (pass_regrename);
 	  NEXT_PASS (pass_fold_mem_offsets);
 	  NEXT_PASS (pass_cprop_hardreg);
+  POP_INSERT_PASSES ()
+  NEXT_PASS (pass_late_compilation);
+  PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
 	  NEXT_PASS (pass_fast_rtl_dce);
+  POP_INSERT_PASSES ()
+  NEXT_PASS (pass_postreload);
+  PUSH_INSERT_PASSES_WITHIN (pass_postreload)
 	  NEXT_PASS (pass_reorder_blocks);
 	  NEXT_PASS (pass_leaf_regs);
 	  NEXT_PASS (pass_split_before_sched2);
-- 
2.34.1



Re: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-07-01 Thread Thomas Schwinge
Hi Richard!

On 2024-06-28T17:48:30+0100, Richard Sandiford wrote:
> Richard Sandiford  writes:
>> Thomas Schwinge  writes:
>>> On 2024-06-27T23:20:18+0200, I wrote:
>>>> On 2024-06-27T22:27:21+0200, I wrote:
>>>>> On 2024-06-27T18:49:17+0200, I wrote:
>>>>>> On 2023-10-24T19:49:10+0100, Richard Sandiford wrote:
>>>>>>> This patch adds a combine pass that runs late in the pipeline.
>>>>>
>>>>> [After sending, I realized I replied to a previous thread of this work.]
>>>>>
>>>>>> I've been looking a bit through recent nvptx target code generation
>>>>>> changes for GCC target libraries, and thought I'd also share here my
>>>>>> findings for the "late-combine" changes in isolation, for nvptx target.
>>>>>> 
>>>>>> First the unexpected thing:
>>>>>
>>>>> So much for "unexpected thing" -- next level of unexpected here...
>>>>> Appreciated if anyone feels like helping me find my way through this, but
>>>>> I totally understand if you've got other things to do.
>>>>
>>>> OK, I found something already.  (Unexpectedly quickly...)  ;-)
>>>>
>>>>>> there are a few cases where we now see unused
>>>>>> registers get declared
>>>
>>>> But in fact, for both cases
>>>
>>> Now tested: 's%both%all'.  :-)
>>>
>>>> the unexpected difference goes away if after
>>>> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.  That's normally run
>>>> as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but that's
>>>> all not active for nvptx target given '!reload_completed', given nvptx is
>>>> 'targetm.no_register_allocation'.  Maybe we need to enable a few more
>>>> passes, or is there anything in 'pass_late_combine' to change, so that we
>>>> don't run into this?  Does it inadvertently mark registers live or
>>>> something like that?
>>>
>>> Basically, is 'pass_late_combine' potentially doing things that depend
>>> on later clean-up?  (..., or shouldn't it be doing these things in the
>>> first place?)
>>
>> It's possible that late-combine could expose dead code, but I imagine
>> it's a niche case.
>>
>> I had a look at the nvptx logs from my comparison, and the cases in
>> which I saw this seemed to be those where late-combine doesn't find
>> anything to do.  Does that match your examples?  Specifically,
>> the effect should be the same with -fdbg-cnt=late_combine:0-0
>>
>> I think what's happening is that:
>>
>> - combine exposes dead code
>>
>> - ce2 previously ran df_analyze with DF_LR_RUN_DCE set, and so cleared
>>   up the dead code
>>
>> - late-combine instead runs df_analyze without that flag (since late-combine
>>   itself doesn't really care whether dead code is present)
>>
>> - if late-combine doesn't do anything, ce2's df_analyze call has nothing
>>   to do, and skips even the DCE
>>
>> The easiest fix would be to add:
>>
>>   df_set_flags (DF_LR_RUN_DCE);
>>
>> before df_analyze in late-combine.cc, so that it behaves like ce2.
>> But the arrangement feels wrong.  I would have expected DF_LR_RUN_DCE
>> to depend on whether df_analyze had been called since the last DCE pass
>> (whether DF_LR_RUN_DCE or a full DCE).
>
> I'm testing the attached patch to do that.  I'll submit it properly if
> testing passes, but it seems to fix the extra-register problem for me.

> Give fast DCE a separate dirty flag

Thanks, and yes, your analysis makes sense to me (to the extent that I
only superficially understand these parts of GCC) -- and I confirm that
your proposed change to "Give fast DCE a separate dirty flag" does
address the issue for nvptx target.
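Richard's sequence of events can be reduced to a handful of booleans.  The following is a hypothetical model: 'struct df_model' and the 'model_*' functions are illustrative names, not GCC's df API, and 'separate_dce_flag' only mimics the idea described in "Give fast DCE a separate dirty flag", not its actual implementation.  It shows that with a single dirty flag, a late-combine 'df_analyze' call without DF_LR_RUN_DCE clears the state that ce2's later call relies on, so the dead code survives.

```c
#include <stdbool.h>

/* Hypothetical model of the df "dirty" bookkeeping.  */
struct df_model
{
  bool separate_dce_flag;  /* true: model the separate-flag variant */
  bool df_dirty;           /* df information needs recomputing */
  bool dce_dirty;          /* insns changed since the last fast DCE */
  bool dead_code;          /* the function currently contains dead code */
};

/* A pass such as combine changes insns and may expose dead code.  */
static void
model_change_insns (struct df_model *m, bool exposes_dead_code)
{
  m->df_dirty = true;
  m->dce_dirty = true;
  if (exposes_dead_code)
    m->dead_code = true;
}

/* Model of df_analyze; RUN_DCE stands for DF_LR_RUN_DCE being set.  */
static void
model_df_analyze (struct df_model *m, bool run_dce)
{
  bool dce_needed = m->separate_dce_flag ? m->dce_dirty : m->df_dirty;
  if (run_dce && dce_needed)
    {
      m->dead_code = false;
      m->dce_dirty = false;
    }
  m->df_dirty = false;
}

/* combine, then optionally a late-combine that changes nothing, then ce2.
   Return whether the dead code survives.  */
static bool
model_dead_code_survives (bool separate_dce_flag, bool run_late_combine)
{
  struct df_model m = { separate_dce_flag, false, false, false };
  model_change_insns (&m, true);   /* combine exposes dead code */
  if (run_late_combine)
    model_df_analyze (&m, false);  /* late-combine: no DF_LR_RUN_DCE */
  model_df_analyze (&m, true);     /* ce2: DF_LR_RUN_DCE set */
  return m.dead_code;
}
```

Only the combination "single flag plus an intervening late-combine" leaves dead code behind; tracking DCE staleness separately removes that dependency on pass order.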


Grüße
 Thomas


> Thomas pointed out that we sometimes failed to eliminate some dead code
> (specifically clobbers of otherwise unused registers) on nvptx when
> late-combine is enabled.  This happens because:
>
> - combine is able to optimise the function in a way that exposes dead code.
>   This leaves the df information in a "dirty" state.
>
> - late_combine calls df_analyze without DF_LR_RUN_DCE set.
>   This updates the df information and clears the "dirty" state.
>
> - late_combine doesn

Document 'pass_postreload' vs. 'pass_late_compilation' (was: The nvptx port [4/11+] Post-RA pipeline)

2024-06-28 Thread Thomas Schwinge
Hi!

Before we start looking into enabling certain 'pass_postreload' passes
for nvptx, as we've been discussing in
<https://inbox.sourceware.org/007a01dac921$82d0d9c0$88728d40$@nextmovesoftware.com>
"nvptx vs. [PATCH] Add a late-combine pass [PR106594]", let's first
document the (not quite obvious) status quo:

On 2014-10-20T16:24:43+0200, Bernd Schmidt  wrote:
> This stops most of the post-regalloc passes to be run if the target 
> doesn't want register allocation. I'd previously moved them all out of 
> postreload to the toplevel, but Jakub (I think) pointed out that the 
> idea is not to run them to avoid crashes if reload fails e.g. for an 
> invalid asm. So I've made a new container pass.
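The status quo to be documented comes down to two gates.  A minimal sketch with hypothetical function names: the 'pass_late_compilation' gate is 'reload_completed || targetm.no_register_allocation' as in Bernd's patch, while the 'pass_postreload' gate is simplified here to 'reload_completed' (an assumption; the real gate body does more than return a flag).

```c
#include <stdbool.h>

/* Simplified model of pass_postreload's gate.  */
static bool
model_postreload_gate (bool reload_completed)
{
  return reload_completed;
}

/* Model of pass_late_compilation's gate from Bernd's patch.  */
static bool
model_late_compilation_gate (bool reload_completed,
                             bool no_register_allocation)
{
  return reload_completed || no_register_allocation;
}

/* Which containers run: bit 0 = pass_postreload,
   bit 1 = pass_late_compilation.  */
static unsigned int
model_containers_run (bool reload_completed, bool no_register_allocation)
{
  return (model_postreload_gate (reload_completed) ? 1u : 0u)
         | (model_late_compilation_gate (reload_completed,
                                         no_register_allocation) ? 2u : 0u);
}
```

For a target with register allocation after a successful reload, both containers run (3).  For nvptx ('targetm.no_register_allocation', reload never completes), only 'pass_late_compilation' runs (2).  If reload fails, for example on an invalid asm, neither runs (0), which is the point Jakub raised.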

OK to push "Document 'pass_postreload' vs. 'pass_late_compilation'", see
attached?


Grüße
 Thomas


> A later patch will make thread_prologue_and_epilogue_insns callable from 
> the backend.
>
>
> Bernd
>
>   gcc/
>   * passes.def (pass_compute_alignments, pass_duplicate_computed_gotos,
>   pass_variable_tracking, pass_free_cfg, pass_machine_reorg,
>   pass_cleanup_barriers, pass_delay_slots,
>   pass_split_for_shorten_branches, pass_convert_to_eh_region_ranges,
>   pass_shorten_branches, pass_est_nothrow_function_flags,
>   pass_dwarf2_frame, pass_final): Move outside of pass_postreload and
>   into pass_late_compilation.
>   (pass_late_compilation): Add.
>   * passes.c (pass_data_late_compilation, pass_late_compilation,
>   make_pass_late_compilation): New.
>   * timevar.def (TV_LATE_COMPILATION): New.
>
> 
> Index: gcc/passes.def
> ===
> --- gcc/passes.def.orig
> +++ gcc/passes.def
> @@ -415,6 +415,9 @@ along with GCC; see the file COPYING3.
> NEXT_PASS (pass_split_before_regstack);
> NEXT_PASS (pass_stack_regs_run);
> POP_INSERT_PASSES ()
> +  POP_INSERT_PASSES ()
> +  NEXT_PASS (pass_late_compilation);
> +  PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
> NEXT_PASS (pass_compute_alignments);
> NEXT_PASS (pass_variable_tracking);
> NEXT_PASS (pass_free_cfg);
> Index: gcc/passes.c
> ===
> --- gcc/passes.c.orig
> +++ gcc/passes.c
> @@ -569,6 +569,44 @@ make_pass_postreload (gcc::context *ctxt
>return new pass_postreload (ctxt);
>  }
>  
> +namespace {
> +
> +const pass_data pass_data_late_compilation =
> +{
> +  RTL_PASS, /* type */
> +  "*all-late_compilation", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_LATE_COMPILATION, /* tv_id */
> +  PROP_rtl, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  0, /* todo_flags_finish */
> +};
> +
> +class pass_late_compilation : public rtl_opt_pass
> +{
> +public:
> +  pass_late_compilation (gcc::context *ctxt)
> +: rtl_opt_pass (pass_data_late_compilation, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  virtual bool gate (function *)
> +  {
> +return reload_completed || targetm.no_register_allocation;
> +  }
> +
> +}; // class pass_late_compilation
> +
> +} // anon namespace
> +
> +static rtl_opt_pass *
> +make_pass_late_compilation (gcc::context *ctxt)
> +{
> +  return new pass_late_compilation (ctxt);
> +}
> +
>  
>  
>  /* Set the static pass number of pass PASS to ID and record that
> Index: gcc/timevar.def
> ===
> --- gcc/timevar.def.orig
> +++ gcc/timevar.def
> @@ -270,6 +270,7 @@ DEFTIMEVAR (TV_EARLY_LOCAL , "early
>  DEFTIMEVAR (TV_OPTIMIZE   , "unaccounted optimizations")
>  DEFTIMEVAR (TV_REST_OF_COMPILATION   , "rest of compilation")
>  DEFTIMEVAR (TV_POSTRELOAD , "unaccounted post reload")
> +DEFTIMEVAR (TV_LATE_COMPILATION   , "unaccounted late compilation")
>  DEFTIMEVAR (TV_REMOVE_UNUSED  , "remove unused locals")
>  DEFTIMEVAR (TV_ADDRESS_TAKEN  , "address taken")
>  DEFTIMEVAR (TV_TODO   , "unaccounted todo")


From 7f708dd9774773e704cb06b7a6f296927f9057df Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 28 Jun 2024 16:04:18 +0200
Subject: [PATCH] Document 'pass_postreload' vs. 'pass_late_compilation'

See Subversion r217124 (Git commit 433e4164339f18d0b8798968444a56b681b5232c)
"Reorganize post-ra pipeline fo

Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN' (was: [PATCH 03/11] Handwritten part of conversion of passes to C++ classes)

2024-06-28 Thread Thomas Schwinge
Hi!

As part of this:

On 2013-07-26T11:04:33-0400, David Malcolm  wrote:
> This patch is the hand-written part of the conversion of passes from
> C structs to C++ classes.

> --- a/gcc/passes.c
> +++ b/gcc/passes.c

..., we did hard-code 'PUSH_INSERT_PASSES_WITHIN(PASS)' to always refer
to the first instance of 'PASS':

>  #define PUSH_INSERT_PASSES_WITHIN(PASS) \
>{ \
> -struct opt_pass **p = &(PASS).pass.sub;
> +struct opt_pass **p = &(PASS ## _1)->sub;

..., however we did change 'NEXT_PASS(PASS, NUM)' to actually use 'NUM':

> -#define NEXT_PASS(PASS, NUM)  (p = next_pass_1 (p, &((PASS).pass)))
> +#define NEXT_PASS(PASS, NUM) \
> +  do { \
> +gcc_assert (NULL == PASS ## _ ## NUM); \
> +if ((NUM) == 1)  \
> +  PASS ## _1 = make_##PASS (ctxt_);  \
> +else \
> +  {  \
> +gcc_assert (PASS ## _1); \
> +PASS ## _ ## NUM = PASS ## _1->clone (); \
> +  }  \
> +p = next_pass_1 (p, PASS ## _ ## NUM);  \
> +  } while (0)

This was never re-synchronized later on, and is problematic if you try to
do something like this; change:

[...]
NEXT_PASS (pass_postreload);
PUSH_INSERT_PASSES_WITHIN (pass_postreload)
NEXT_PASS (pass_postreload_cse);
[...]
NEXT_PASS (pass_cprop_hardreg);
NEXT_PASS (pass_fast_rtl_dce);
NEXT_PASS (pass_reorder_blocks);
[...]
POP_INSERT_PASSES ()
[...]

... into:

[...]
NEXT_PASS (pass_postreload);
PUSH_INSERT_PASSES_WITHIN (pass_postreload)
NEXT_PASS (pass_postreload_cse);
[...]
NEXT_PASS (pass_cprop_hardreg);
POP_INSERT_PASSES ()
NEXT_PASS (pass_fast_rtl_dce);
NEXT_PASS (pass_postreload);
PUSH_INSERT_PASSES_WITHIN (pass_postreload)
NEXT_PASS (pass_reorder_blocks);
[...]
POP_INSERT_PASSES ()
[...]

That is, interrupt the pass pipeline within 'pass_postreload', in order
to unconditionally run 'pass_fast_rtl_dce' even if not running
'pass_postreload'.  What happens is that the second
'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' overwrites the first
'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' instead of applying to the
second (preceding) 'NEXT_PASS (pass_postreload);'.
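The overwriting can be reproduced with a much-simplified model of the macros.  In this sketch, 'struct pass', 'append' and the two 'WITHIN_*' macros are hypothetical stand-ins for 'opt_pass', 'next_pass_1' and 'PUSH_INSERT_PASSES_WITHIN'; only the token pasting ('PASS ## _1' versus 'PASS ## _ ## NUM') mirrors the real code.

```c
#include <stddef.h>

/* Hypothetical, much-simplified stand-in for opt_pass.  */
struct pass
{
  const char *name;
  struct pass *sub;   /* first nested pass */
  struct pass *next;  /* next pass at the same level */
};

/* Store PASS at *SLOT and return the slot for the following pass,
   loosely modeled on next_pass_1.  */
static struct pass **
append (struct pass **slot, struct pass *pass)
{
  *slot = pass;
  return &pass->next;
}

/* Two instances of one container pass, as NEXT_PASS (pass_postreload, 1)
   and NEXT_PASS (pass_postreload, 2) would create, plus two sub-passes.  */
static struct pass postreload_1 = { "postreload", NULL, NULL };
static struct pass postreload_2 = { "postreload", NULL, NULL };
static struct pass cprop_hardreg = { "cprop_hardreg", NULL, NULL };
static struct pass reorder_blocks = { "reorder_blocks", NULL, NULL };

/* Old scheme: always the first instance.  */
#define WITHIN_FIRST(PASS) (&PASS ## _1.sub)
/* Fixed scheme: the instance number is passed through.  */
#define WITHIN_NUM(PASS, NUM) (&PASS ## _ ## NUM.sub)

/* Build postreload{cprop_hardreg} followed by postreload{reorder_blocks}.  */
static void
model_build (int use_num)
{
  struct pass **p;

  postreload_1.sub = postreload_2.sub = NULL;
  cprop_hardreg.next = reorder_blocks.next = NULL;

  p = use_num ? WITHIN_NUM (postreload, 1) : WITHIN_FIRST (postreload);
  p = append (p, &cprop_hardreg);
  p = use_num ? WITHIN_NUM (postreload, 2) : WITHIN_FIRST (postreload);
  p = append (p, &reorder_blocks);
  (void) p;
}
```

With the old scheme, the second "push" re-targets the first instance's sub-list: 'cprop_hardreg' is dropped and the second instance stays empty.  Passing the instance number through, as the awk script already does for 'NEXT_PASS', keeps each sub-list attached to its own instance.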

(I ran into this in context of what I tried in
<https://inbox.sourceware.org/87ed8i2ekt@euler.schwinge.ddns.net>
"nvptx vs. [PATCH] Add a late-combine pass [PR106594]"; discuss that
specific use case over there, not here.)

OK to address this with the attached
"Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'"?

This depends on
<https://inbox.sourceware.org/87jzi9tgcw@euler.schwinge.ddns.net>
"Rewrite usage comment at the top of 'gcc/passes.def'" to avoid running
into the 'ERROR: Can't locate [...]' that I'm adding, while processing
the 'PUSH_INSERT_PASSES_WITHIN (PASS)' in the usage comment at the top of
'gcc/passes.def', where 'NEXT_PASS (PASS)' only appears later.  ;-)

I've verified this does the expected thing for the main 'gcc/passes.def',
and that 'PUSH_INSERT_PASSES_WITHIN' is not used/not applicable for
'PASSES_EXTRA' ('gcc/config/*/*-passes.def').


Grüße
 Thomas


From e368ccba93f5bbaee882076c80849adb55a68fa0 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 28 Jun 2024 12:10:12 +0200
Subject: [PATCH] Handle 'NUM' in 'PUSH_INSERT_PASSES_WITHIN'

..., such that also for repeated 'NEXT_PASS', 'PUSH_INSERT_PASSES_WITHIN' for a
given 'PASS', the 'PUSH_INSERT_PASSES_WITHIN' applies to the preceding
'NEXT_PASS', and not unconditionally to the first 'NEXT_PASS'.

	gcc/
	* gen-pass-instances.awk: Handle 'PUSH_INSERT_PASSES_WITHIN'.
	* pass_manager.h (PUSH_INSERT_PASSES_WITHIN): Adjust.
	* passes.cc (PUSH_INSERT_PASSES_WITHIN): Likewise.
---
 gcc/gen-pass-instances.awk | 28 +---
 gcc/pass_manager.h |  2 +-
 gcc/passes.cc  |  6 +++---
 3 files changed, 29 insertions(+), 7 deletions(-)

diff --git a/gcc/gen-pass-instances.awk b/gcc/gen-pass-instances.awk
index 449889663f7..871ac0cdb52 100644
--- a/gcc/gen-pass-instances.awk
+++ b/gcc/gen-pass-instances.awk
@@ -16,7 +16,7 @@
 
 # This Awk script takes passes.def and writes pass-instances.def,
 # counting the instances of each kind of pass, adding an instance number
-# to everywhere that NEXT_PASS is used.
+# to everywhere that NEXT_PASS or PUSH_INSERT_PASSES_WITHIN are used.
 # Also handle INSERT_PASS_AFTER, INSERT_

Rewrite usage comment at the top of 'gcc/passes.def' (was: [PATCH 02/11] Generate pass-instances.def)

2024-06-28 Thread Thomas Schwinge
Hi!

On 2013-07-26T11:04:32-0400, David Malcolm  wrote:
> Introduce a new gen-pass-instances.awk script, and use it at build time
> to make a pass-instances.def from passes.def.

(The script has later been rewritten and extended, but the issue I'm
discussing is relevant already in its original version.)

> The generated pass-instances.def contains similar content to passes.def,
> but the pass instances within it are explicitly numbered, so that e.g.
> the third instance of:
>
>   NEXT_PASS (pass_copy_prop)
>
> becomes:
>
>   NEXT_PASS (pass_copy_prop, 3)

> --- a/gcc/passes.c
> +++ b/gcc/passes.c
> @@ -1315,12 +1315,12 @@ pipeline::pipeline (context *ctxt)
>  #define POP_INSERT_PASSES() \
>}
>  
> -#define NEXT_PASS(PASS)  (p = next_pass_1 (p, &((PASS).pass)))
> +#define NEXT_PASS(PASS, NUM)  (p = next_pass_1 (p, &((PASS).pass)))
>  
>  #define TERMINATE_PASS_LIST() \
>*p = NULL;
>  
> -#include "passes.def"
> +#include "pass-instances.def"

Given this, the usage comment at the top of 'gcc/passes.def' (see below)
is no longer accurate (even if that latter file does continue to use the
'NEXT_PASS' form without 'NUM') -- and, worse, the 'NEXT_PASS' etc. in
that usage comment are processed by the 'gcc/gen-pass-instances.awk'
script:

--- source-gcc/gcc/passes.def   2024-06-24 18:55:15.132561641 +0200
+++ build-gcc/gcc/pass-instances.def	2024-06-24 18:55:27.768562714 +0200
[...]
@@ -20,546 +22,578 @@
 /*
  Macros that should be defined when using this file:
INSERT_PASSES_AFTER (PASS)
PUSH_INSERT_PASSES_WITHIN (PASS)
POP_INSERT_PASSES ()
-   NEXT_PASS (PASS)
+   NEXT_PASS (PASS, 1)
TERMINATE_PASS_LIST (PASS)
  */
[...]

(That is, this is 'NEXT_PASS' for the first instance of pass 'PASS'.)
That's benign so far, but with another thing that I'll be extending, I'd
then run into an error while the script handles this comment block.  ;-\

OK to push "Rewrite usage comment at the top of 'gcc/passes.def'", see
attached?


Grüße
 Thomas


From 072cdf7d9cf86fb2b0553b93365648e153b4376b Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 28 Jun 2024 14:05:04 +0200
Subject: [PATCH] Rewrite usage comment at the top of 'gcc/passes.def'

Since Subversion r201359 (Git commit a167b052dfe9a8509bb23c374ffaeee953df0917)
"Introduce gen-pass-instances.awk and pass-instances.def", the usage comment at
the top of 'gcc/passes.def' is no longer accurate (even if that latter file
does continue to use the 'NEXT_PASS' form without 'NUM') -- and, worse, the
'NEXT_PASS' etc. in that usage comment are processed by the
'gcc/gen-pass-instances.awk' script:

--- source-gcc/gcc/passes.def   2024-06-24 18:55:15.132561641 +0200
+++ build-gcc/gcc/pass-instances.def   2024-06-24 18:55:27.768562714 +0200
[...]
@@ -20,546 +22,578 @@
 /*
  Macros that should be defined when using this file:
INSERT_PASSES_AFTER (PASS)
PUSH_INSERT_PASSES_WITHIN (PASS)
POP_INSERT_PASSES ()
-   NEXT_PASS (PASS)
+   NEXT_PASS (PASS, 1)
TERMINATE_PASS_LIST (PASS)
  */
[...]

(That is, this is 'NEXT_PASS' for the first instance of pass 'PASS'.)
That's benign so far, but with another thing that I'll be extending, I'd
then run into an error while the script handles this comment block.  ;-\

	gcc/
	* passes.def: Rewrite usage comment at the top.
---
 gcc/passes.def | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 1f222729d39..3f65fcf71d6 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -17,14 +17,11 @@ You should have received a copy of the GNU General Public License
 along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
-/*
- Macros that should be defined when using this file:
-   INSERT_PASSES_AFTER (PASS)
-   PUSH_INSERT_PASSES_WITHIN (PASS)
-   POP_INSERT_PASSES ()
-   NEXT_PASS (PASS)
-   TERMINATE_PASS_LIST (PASS)
- */
+/* Note that this file is processed by a simple parser:
+   'gen-pass-instances.awk', so carefully verify the generated
+   'pass-instances.def' if you deviate from the syntax otherwise used in
+   here.  */
+
 
  /* All passes needed to lower the function into shape optimizers can
 operate on.  These passes are always run first on the function, but
-- 
2.34.1



Re: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-06-27 Thread Thomas Schwinge
Hi!

On 2024-06-27T23:20:18+0200, I wrote:
> On 2024-06-27T22:27:21+0200, I wrote:
>> On 2024-06-27T18:49:17+0200, I wrote:
>>> On 2023-10-24T19:49:10+0100, Richard Sandiford  
>>> wrote:
 This patch adds a combine pass that runs late in the pipeline.
>>
>> [After sending, I realized I replied to a previous thread of this work.]
>>
>>> I've been looking a bit through recent nvptx target code generation
>>> changes for GCC target libraries, and thought I'd also share here my
>>> findings for the "late-combine" changes in isolation, for nvptx target.
>>> 
>>> First the unexpected thing:
>>
>> So much for "unexpected thing" -- next level of unexpected here...
>> Appreciated if anyone feels like helping me find my way through this, but
>> I totally understand if you've got other things to do.
>
> OK, I found something already.  (Unexpectedly quickly...)  ;-)
>
>>> there are a few cases where we now see unused
>>> registers get declared

> But in fact, for both cases

Now tested: 's%both%all'.  :-)

> the unexpected difference goes away if after
> 'pass_late_combine' I inject a 'pass_fast_rtl_dce'.  That's normally run
> as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but none of that
> is active for the nvptx target: nvptx is 'targetm.no_register_allocation', so
> 'reload_completed' remains false.  Maybe we need to enable a few more
> passes, or is there anything in 'pass_late_combine' to change, so that we
> don't run into this?  Does it inadvertently mark registers live or
> something like that?

Basically, is 'pass_late_combine' potentially doing things that depend
on later clean-up?  (..., or shouldn't it be doing these things in the
first place?)

> The following makes these two cases work, but evidently needs a lot more
> analysis: a lot of other passes are enabled that may be anything between
> beneficial and harmful for 'targetm.no_register_allocation'/nvptx.
>
> --- gcc/passes.cc
> +++ gcc/passes.cc
> @@ -676,17 +676,17 @@ const pass_data pass_data_postreload =
>  class pass_postreload : public rtl_opt_pass
>  {
>  public:
>pass_postreload (gcc::context *ctxt)
>  : rtl_opt_pass (pass_data_postreload, ctxt)
>{}
>  
>/* opt_pass methods: */
> -  bool gate (function *) final override { return reload_completed; }
> +  bool gate (function *) final override { return reload_completed || targetm.no_register_allocation; }
> --- gcc/regcprop.cc
> +++ gcc/regcprop.cc
> @@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass
>  public:
>pass_cprop_hardreg (gcc::context *ctxt)
>  : rtl_opt_pass (pass_data_cprop_hardreg, ctxt)
>{}
>  
>/* opt_pass methods: */
>bool gate (function *) final override
>  {
> -  return (optimize > 0 && (flag_cprop_registers));
> +  return (optimize > 0 && flag_cprop_registers && !targetm.no_register_allocation);
>  }

Also, that quickly ICEs; more '[...] && !targetm.no_register_allocation'
are needed elsewhere, at least.

The following simpler thing, however, does work; move 'pass_fast_rtl_dce'
out of 'pass_postreload':

--- gcc/passes.cc
+++ gcc/passes.cc
@@ -677,14 +677,15 @@ class pass_postreload : public rtl_opt_pass
 {
 public:
   pass_postreload (gcc::context *ctxt)
 : rtl_opt_pass (pass_data_postreload, ctxt)
   {}
 
   /* opt_pass methods: */
+  opt_pass * clone () final override { return new pass_postreload (m_ctxt); }
   bool gate (function *) final override { return reload_completed; }
 
 }; // class pass_postreload
--- gcc/passes.def
+++ gcc/passes.def
@@ -529,7 +529,10 @@ along with GCC; see the file COPYING3.  If not see
  NEXT_PASS (pass_regrename);
  NEXT_PASS (pass_fold_mem_offsets);
  NEXT_PASS (pass_cprop_hardreg);
- NEXT_PASS (pass_fast_rtl_dce);
+  POP_INSERT_PASSES ()
+  NEXT_PASS (pass_fast_rtl_dce);
+  NEXT_PASS (pass_postreload);
+  PUSH_INSERT_PASSES_WITHIN (pass_postreload)
  NEXT_PASS (pass_reorder_blocks);
  NEXT_PASS (pass_leaf_regs);
  NEXT_PASS (pass_split_before_sched2);

This (only) cleans up "the mess that 'pass_late_combine' created"; no
further changes in GCC target libraries for nvptx.  (For avoidance of
doubt: "mess" is a great exaggeration here.)
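To spell out why moving 'pass_fast_rtl_dce' helps, here is a toy C++ model of a pass tree.  The types are hypothetical and are not GCC's 'opt_pass' machinery.  Sub-passes added via 'PUSH_INSERT_PASSES_WITHIN' run only if the parent pass's gate returns true.  With a gate like 'reload_completed', which never holds for nvptx, nothing nested below 'pass_postreload' runs.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy model of a pass tree; not GCC's opt_pass classes.
struct toy_pass
{
  std::string name;
  std::function<bool ()> gate;  // like opt_pass::gate; empty means "always"
  std::vector<toy_pass> sub;    // passes added via PUSH_INSERT_PASSES_WITHIN
};

// Run PASS if its gate allows; its sub-passes are only visited in that
// case.  The names of the passes that ran are appended to TRACE.
void
run_pass (const toy_pass &pass, std::vector<std::string> &trace)
{
  if (pass.gate && !pass.gate ())
    return;
  trace.push_back (pass.name);
  for (const toy_pass &sub : pass.sub)
    run_pass (sub, trace);
}
```

When the gate returns false, running a 'postreload' node that contains 'fast_rtl_dce' records nothing.  A top-level 'fast_rtl_dce' node with an always-true gate does run.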


Grüße
 Thomas


>> But: should we expect '-fno-late-combine-instructions' vs.
>> '-flate-combine-instructions' to behave in the same way?  (After all,
>> '%r22' remains unused also with '-flate-combine-instructions', and
>> doesn't need to be emitted.)  This could, of course, also be an nvptx back
>> end issue?
>>
>> I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
>> is attached if you'd like to reproduce this with your own nvptx target
>> 'cc1':
>>
>> $ [...]/configure --target=nvptx-none --enable-langu

Re: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-06-27 Thread Thomas Schwinge
Hi!

On 2024-06-27T22:27:21+0200, I wrote:
> On 2024-06-27T18:49:17+0200, I wrote:
>> On 2023-10-24T19:49:10+0100, Richard Sandiford  
>> wrote:
>>> This patch adds a combine pass that runs late in the pipeline.
>
> [After sending, I realized I replied to a previous thread of this work.]
>
>> I've been looking a bit through recent nvptx target code generation
>> changes for GCC target libraries, and thought I'd also share here my
>> findings for the "late-combine" changes in isolation, for nvptx target.
>> 
>> First the unexpected thing:
>
> So much for "unexpected thing" -- next level of unexpected here...
> Appreciated if anyone feels like helping me find my way through this, but
> I totally understand if you've got other things to do.

OK, I found something already.  (Unexpectedly quickly...)  ;-)

>> there are a few cases where we now see unused
>> registers get declared, for example (random) in
>> 'nvptx-none/newlib/libc/libm_a-s_modf.o:modf'

I've now looked into the former one ('tmp-libm_a-s_modf.i.xz' is
attached), to avoid...

> I first looked into a simpler case: newlib 'libc/locale/lnumeric.c'.

> ../../../source-gcc/newlib/libc/locale/lnumeric.c:88:10: warning: ‘ret’ is used uninitialized [-Wuninitialized]
>88 |   return ret;
>   |  ^~~
> ../../../source-gcc/newlib/libc/locale/lnumeric.c:48:7: note: ‘ret’ was declared here
>48 |   int ret;
>   |   ^~~
>
> Uh.  Given nothing else is going on in that function, I suppose '%r22'
> relates to the uninitialized 'ret' -- and given undefined behavior, GCC
> of course is fine to emit an unused 'reg' in that case...

... the undefined behavior here.

But in fact, for both cases, the unexpected difference goes away if after
'pass_late_combine' I inject a 'pass_fast_rtl_dce'.  That's normally run
as part of 'PUSH_INSERT_PASSES_WITHIN (pass_postreload)' -- but none of that
is active for the nvptx target: nvptx is 'targetm.no_register_allocation', so
'reload_completed' remains false.  Maybe we need to enable a few more
passes, or is there anything in 'pass_late_combine' to change, so that we
don't run into this?  Does it inadvertently mark registers live or
something like that?
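For reference, here is what such a clean-up does in principle.  This is a generic backward dead-code elimination over a hypothetical straight-line toy IR.  It is neither RTL nor how 'pass_fast_rtl_dce' is implemented.  An instruction without side effects whose result is not live afterwards gets deleted.  An example is 'r24 = selp (r45)' after its only use has been rewritten to read 'r45' directly.

```cpp
#include <set>
#include <vector>

// Hypothetical toy instruction: defines register DEF (negative: none) and
// reads USES.  Instructions with side effects (stores etc.) are always kept.
struct dce_insn
{
  int def;
  std::vector<int> uses;
  bool side_effects;
};

// Backward dead-code elimination over straight-line code.
std::vector<dce_insn>
toy_dce (const std::vector<dce_insn> &insns)
{
  std::set<int> live;  // registers live at the current point
  std::vector<dce_insn> kept;
  for (auto it = insns.rbegin (); it != insns.rend (); ++it)
    {
      bool needed = it->side_effects
                    || (it->def >= 0 && live.count (it->def) != 0);
      if (!needed)
        continue;  // dead: its result is never read
      if (it->def >= 0)
        live.erase (it->def);
      live.insert (it->uses.begin (), it->uses.end ());
      kept.insert (kept.begin (), *it);
    }
  return kept;
}
```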

The following makes these two cases work, but evidently needs a lot more
analysis: a lot of other passes are enabled that may be anything between
beneficial and harmful for 'targetm.no_register_allocation'/nvptx.

--- gcc/passes.cc
+++ gcc/passes.cc
@@ -676,17 +676,17 @@ const pass_data pass_data_postreload =
 class pass_postreload : public rtl_opt_pass
 {
 public:
   pass_postreload (gcc::context *ctxt)
 : rtl_opt_pass (pass_data_postreload, ctxt)
   {}
 
   /* opt_pass methods: */
-  bool gate (function *) final override { return reload_completed; }
+  bool gate (function *) final override { return reload_completed || targetm.no_register_allocation; }
--- gcc/regcprop.cc
+++ gcc/regcprop.cc
@@ -1305,17 +1305,17 @@ class pass_cprop_hardreg : public rtl_opt_pass
 public:
   pass_cprop_hardreg (gcc::context *ctxt)
 : rtl_opt_pass (pass_data_cprop_hardreg, ctxt)
   {}
 
   /* opt_pass methods: */
   bool gate (function *) final override
 {
-  return (optimize > 0 && (flag_cprop_registers));
+  return (optimize > 0 && flag_cprop_registers && !targetm.no_register_allocation);
 }


Grüße
 Thomas


> But: should we expect '-fno-late-combine-instructions' vs.
> '-flate-combine-instructions' to behave in the same way?  (After all,
> '%r22' remains unused also with '-flate-combine-instructions', and
> doesn't need to be emitted.)  This could, of course, also be an nvptx back
> end issue?
>
> I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
> is attached if you'd like to reproduce this with your own nvptx target
> 'cc1':
>
> $ [...]/configure --target=nvptx-none --enable-languages=c
> $ make -j12 all-gcc
> $ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin -o tmp-libc_a-lnumeric.s -fdump-rtl-all # -fno-late-combine-instructions
>
>
> Grüße
>  Thomas




tmp-libm_a-s_modf.i.xz
Description: application/xz


Re: nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-06-27 Thread Thomas Schwinge
Hi!

On 2024-06-27T18:49:17+0200, I wrote:
> On 2023-10-24T19:49:10+0100, Richard Sandiford  
> wrote:
>> This patch adds a combine pass that runs late in the pipeline.

[After sending, I realized I replied to a previous thread of this work.]

> I've been looking a bit through recent nvptx target code generation
> changes for GCC target libraries, and thought I'd also share here my
> findings for the "late-combine" changes in isolation, for nvptx target.
> 
> First the unexpected thing:

So much for "unexpected thing" -- next level of unexpected here...
Appreciated if anyone feels like helping me find my way through this, but
I totally understand if you've got other things to do.

> there are a few cases where we now see unused
> registers get declared, for example (random) in
> 'nvptx-none/newlib/libc/libm_a-s_modf.o:modf'

I first looked into a simpler case: newlib 'libc/locale/lnumeric.c'.

Here we get the following 'diff' for '*.s' for
'-fno-late-combine-instructions' vs. (default)
'-flate-combine-instructions':

 .visible .func (.param.u32 %value_out) __numeric_load_locale (.param.u64 %in_ar0, .param.u64 %in_ar1, .param.u64 %in_ar2, .param.u64 %in_ar3)
 {
.reg.u32 %value;
.reg.u64 %ar0;
ld.param.u64 %ar0, [%in_ar0];
.reg.u64 %ar1;
ld.param.u64 %ar1, [%in_ar1];
.reg.u64 %ar2;
ld.param.u64 %ar2, [%in_ar2];
.reg.u64 %ar3;
ld.param.u64 %ar3, [%in_ar3];
+   .reg.u32 %r22;
.file 2 "../../../source-gcc/newlib/libc/locale/lnumeric.c"
.loc 2 89 1
mov.u32 %value, 0;
st.param.u32 [%value_out], %value;
ret;
 }

Clearly, '%r22' is unused.  However, looking at the source code (manually
trimmed):

int
__numeric_load_locale (struct __locale_t *locale, const char *name ,
   void *f_wctomb, const char *charset)
{
  int ret;
  struct lc_numeric_T nm;
  char *bufp = NULL;

#ifdef __CYGWIN__
  [...]
#else
  /* TODO */
#endif
  return ret;
}

..., and adding '-Wall' (why isn't top-level/newlib build system doing
that...):

[...]
../../../source-gcc/newlib/libc/locale/lnumeric.c:88:10: warning: ‘ret’ is used uninitialized [-Wuninitialized]
   88 |   return ret;
  |  ^~~
../../../source-gcc/newlib/libc/locale/lnumeric.c:48:7: note: ‘ret’ was declared here
   48 |   int ret;
  |   ^~~

Uh.  Given nothing else is going on in that function, I suppose '%r22'
relates to the uninitialized 'ret' -- and given undefined behavior, GCC
of course is fine to emit an unused 'reg' in that case...

But: should we expect '-fno-late-combine-instructions' vs.
'-flate-combine-instructions' to behave in the same way?  (After all,
'%r22' remains unused also with '-flate-combine-instructions', and
doesn't need to be emitted.)  This could, of course, also be an nvptx back
end issue?

I'm happy to supply any dump files etc.  Also, 'tmp-libc_a-lnumeric.i.xz'
is attached if you'd like to reproduce this with your own nvptx target
'cc1':

$ [...]/configure --target=nvptx-none --enable-languages=c
$ make -j12 all-gcc
$ gcc/cc1 -fpreprocessed tmp-libc_a-lnumeric.i -quiet -dumpbase tmp-libc_a-lnumeric.c -dumpbase-ext .c -misa=sm_30 -g -O2 -fno-builtin -o tmp-libc_a-lnumeric.s -fdump-rtl-all # -fno-late-combine-instructions


Grüße
 Thomas




tmp-libc_a-lnumeric.i.xz
Description: application/xz


nvptx vs. [PATCH] Add a late-combine pass [PR106594]

2024-06-27 Thread Thomas Schwinge
Hi!

On 2023-10-24T19:49:10+0100, Richard Sandiford  
wrote:
> This patch adds a combine pass that runs late in the pipeline.

Great!

In context of 
'nvptx vs. "fwprop: invoke change_is_worthwhile to judge if a replacement is worthwhile"',
I've been looking a bit through recent nvptx target code generation
changes for GCC target libraries, and thought I'd also share here my
findings for the "late-combine" changes in isolation, for nvptx target.

First the unexpected thing: there are a few cases where we now see unused
registers get declared, for example (random) in
'nvptx-none/newlib/libc/libm_a-s_modf.o:modf' (full 'diff' before vs.
after):

[...]
 .visible .func (.param .f64 %value_out) modf (.param .f64 %in_ar0, .param .u64 %in_ar1)
 {
 .reg .f64 %value;
 .reg .f64 %ar0;
 ld.param.f64 %ar0,[%in_ar0];
 .reg .u64 %ar1;
 ld.param.u64 %ar1,[%in_ar1];
 .reg .u32 %r23;
 .reg .f64 %r32;
 .reg .u32 %r41;
 .reg .u32 %r42;
 .reg .f64 %r43;
 .reg .u32 %r46;
+.reg .u64 %r48;
+.reg .u64 %r49;
+.reg .u64 %r50;
+.reg .u64 %r51;
+.reg .u64 %r52;
 .reg .f64 %r53;
 .reg .f64 %r54;
 .reg .u64 %r55;
[...]

That is, five additional registers declared, without any use.  I suppose
that's some 'gen_reg_rtx' that needs to be "confined" (in whichever way;
to be looked into).  I suppose that's not actually a problem: the PTX JIT
should clean these out again, but it's still noise when reading
GCC-emitted PTX code.
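One conceivable (hypothetical) way to "confine" these on the emitting side is to declare only registers that some emitted instruction actually references.  That would merely hide the symptom rather than avoid creating the registers.  The sketch uses made-up types and makes no claim about how 'nvptx.cc' currently decides what to declare.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a pseudo register and its PTX type.
struct toy_pseudo
{
  int regno;
  std::string ptx_type;  // e.g. ".u64"
};

// Emit '.reg' declarations only for pseudos in REFERENCED, i.e. those that
// some emitted instruction mentions; unreferenced ones are skipped.
std::vector<std::string>
declare_referenced_regs (const std::vector<toy_pseudo> &pseudos,
                         const std::set<int> &referenced)
{
  std::vector<std::string> decls;
  for (const toy_pseudo &p : pseudos)
    if (referenced.count (p.regno) != 0)
      decls.push_back (".reg " + p.ptx_type + " %r"
                       + std::to_string (p.regno) + ";");
  return decls;
}
```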


Other than that, I've only got good things to report :-) -- a few
examples in the following, if anyone's interested, without much
commentary.  I haven't looked into how much these "visible" PTX code
generation changes translate into actual GPU SASS code improvements after
the PTX JIT has done its thing, but the PTX is certainly easier to read, with
less state to keep track of during human reading (primarily due to fewer live
registers).

'nvptx-none/libatomic/cas_16_.o':

[...]
-.reg .u32 %r24;
[...]
 setp.eq.u64 %r45,%r40,0;
-selp.u32 %r24,1,0,%r45;
 .loc 3 113 6
 setp.ne.u64 %r48,%r58,%r60;
 @ %r48 bra $L2;
@@ -86,7 +84,7 @@
 call libat_unlock_1,(%out_arg1);
 }
 .loc 3 122 1
-mov.u32 %value,%r24;
+selp.u32 %value,1,0,%r45;
 st.param.u32 [%value_out],%value;
 ret;
 }

A lot more instances of similar patterns.
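The 'cas_16_.o' change boils down to two steps: substitute a single-use definition into the instruction that uses it, then delete the definition.  Below is a toy C++ model of just that step.  It assumes a hypothetical SSA-like IR in which each register is defined once, and it ignores the instruction-validity and cost checks a real pass needs.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR in SSA-like form (each register defined once);
// not RTL and not the late-combine implementation.
struct toy_insn
{
  std::string dest;
  std::string op;
  std::vector<std::string> args;
};

static int
count_uses (const std::vector<toy_insn> &insns, const std::string &reg)
{
  int n = 0;
  for (const toy_insn &insn : insns)
    n += static_cast<int> (std::count (insn.args.begin (), insn.args.end (), reg));
  return n;
}

// Substitute a definition into its single "mov" use and delete the
// definition, e.g. "r24 = selp 1,0,r45; value = mov r24" becomes
// "value = selp 1,0,r45".
std::vector<toy_insn>
forward_into_moves (std::vector<toy_insn> insns)
{
  for (std::size_t i = 0; i < insns.size (); ++i)
    {
      if (insns[i].op != "mov" || insns[i].args.size () != 1)
        continue;
      const std::string src = insns[i].args[0];
      for (std::size_t d = 0; d < i; ++d)
        if (insns[d].dest == src && count_uses (insns, src) == 1)
          {
            insns[i].op = insns[d].op;
            insns[i].args = insns[d].args;
            insns.erase (insns.begin () + d);
            --i;  // the rewritten insn moved down by one slot
            break;
          }
    }
  return insns;
}
```

Applied to 'r45 = setp.eq r40,0; r24 = selp 1,0,r45; r48 = setp.ne r58,r60; value = mov r24', this yields 'value = selp 1,0,r45' and drops the definition of 'r24'.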

'nvptx-none/libbacktrace/dwarf.o':

[...]
-.reg .u64 %r695;
[...]
 setp.ne.u32 %r679,%r678,0;
-@ ! %r679 bra $L1149;
-add.u64 %r695,%r212,32;
-bra $L1090;
-$L1149:
+@ %r679 bra $L1090;
[...]
 $L1090:
 .loc 2 4046 7
-mov.u64 %r28,%r695;
+add.u64 %r28,%r212,32;
 $L1096:
[...]

'nvptx-none/libbacktrace/dwarf.o':

[...]
-.reg .u64 %r41;
[...]
-add.u64 %r41,%frame,8;
 mov.u64 %r43,0;
-st.u64 [%r41],%r43;
-st.u64 [%r41+8],%r43;
-st.u64 [%r41+16],%r43;
+st.u64 [%frame+8],%r43;
+st.u64 [%frame+16],%r43;
+st.u64 [%frame+24],%r43;
[...]

'nvptx-none/libgfortran/generated/cshift1_8.o':

[...]
-.reg .u64 %r293;
[...]
-add.u64 %r293,%r292,120;
-st.u64 [%r293],%r409;
+st.u64 [%r292+120],%r409;
[...]
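Both of these fold an 'add' of a constant into the addresses that use its result.  Here is a toy C++ model with hypothetical types.  It assumes the addresses are the add's only uses and ignores any target limits on offsets.

```cpp
#include <string>
#include <vector>

// Hypothetical toy: "DEST = BASE + IMM", like 'add.u64 %r41,%frame,8'.
struct toy_add
{
  std::string dest, base;
  long imm;
};

// A memory address "[BASE+OFFSET]" as used by 'st'.
struct toy_addr
{
  std::string base;
  long offset;
};

// Fold ADD into every address based on its result, e.g. [%r41+8] becomes
// [%frame+16].  Returns whether anything was rewritten; if the addresses
// were the add's only uses, the add is dead afterwards.
bool
fold_add_into_addresses (const toy_add &add, std::vector<toy_addr> &addrs)
{
  bool changed = false;
  for (toy_addr &a : addrs)
    if (a.base == add.dest)
      {
        a.base = add.base;
        a.offset += add.imm;
        changed = true;
      }
  return changed;
}
```

With '%r41 = %frame + 8' and the addresses '[%r41]', '[%r41+8]', '[%r41+16]', this produces '[%frame+8]', '[%frame+16]', '[%frame+24]', as in the 'dwarf.o' diff.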

'nvptx-none/newlib/libc/stdio/libc_a-siscanf.o', have 'st' do 'u32'
truncation instead of explicit 'cvt':

[...]
-.reg .u32 %r23;
[...]
-cvt.u32.u64 %r23,%r32;
-st.u32 [%frame+8],%r23;
+st.u32 [%frame+8],%r32;
[...]


'nvptx-none/newlib/libc/string/libc_a-memccpy.o', simplify (supposedly)
char -> int -> short into char -> short:

[...]
-.reg .u32 %r37;
[...]
-cvt.u32.u8 %r37,%r49;
[...]
-cvt.u16.u32 %r67,%r37;
+cvt.u16.u8 %r67,%r49;
[...]
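The memccpy simplification preserves values.  Zero-extending 8 to 32 bits and then truncating to 16 bits gives the same result as zero-extending 8 to 16 bits directly.  Below is a brute-force C++ check over all 256 inputs.  It assumes 'cvt.u32.u8' zero-extends and 'cvt.u16.u32' truncates, as their unsigned types suggest.

```cpp
#include <cstdint>

// 'cvt.u32.u8' then 'cvt.u16.u32': zero-extend to 32 bits, then truncate.
std::uint16_t
via_u32 (std::uint8_t c)
{
  std::uint32_t wide = c;
  return static_cast<std::uint16_t> (wide);
}

// 'cvt.u16.u8': zero-extend straight to 16 bits.
std::uint16_t
direct_u16 (std::uint8_t c)
{
  return c;
}

// Compare both for every possible 8-bit input.
bool
conversions_agree ()
{
  for (unsigned v = 0; v < 256; ++v)
    {
      std::uint8_t c = static_cast<std::uint8_t> (v);
      if (via_u32 (c) != direct_u16 (c))
        return false;
    }
  return true;
}
```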

'nvptx-none/libgcc/crt0.o':

[...]
-.reg .u64 %r30;
[...]
 cvta.global.u64 %r29,stack$0+131072;
 st.shared.u64 [__nvptx_stacks],%r29;
-cvta.shared.u64 %r30,__nvptx_uni;
 mov.u32 %r31,0;
-st.u32 [%r30],%r31;
+st.shared.u32 [__nvptx_uni],%r31;
[...]

There are a lot more instances of getting rid of a register in favor of
using more complex instructions.

'nvptx-none/libgfortran/generated/findloc2_s4.o':

[...]
-.reg .u64 %r33;
[...]
-neg.s64 %r33,%r35;
[...]
-add.u64 %r39,%r39,%r33;
+sub.u64 %r39,%r39,%r35;
[...]

..., but in turn also a multi-use variant of the 'neg' (is this still
beneficial?), 'nvptx-none/newlib/libc/search/libc_a-bsd_qsort_r.o':

[...]
-.reg .u64 %r34;
[...]
-neg.s64 %r34,%r82;
-.loc 2 213 9
-add.u64 %r35,%r76,%r34;
+sub.u64 %r35,%r76,%r82;
[...]
-add.u64 %r37,%r71,%r34;
-add.u64 %r38,%r37,%r34;
+sub.u64 %r37,%r71,%r82;
+sub.u64 %r38,%r37,%r82;
[...]
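Regarding correctness (not profitability) of this rewrite: in 64-bit two's-complement arithmetic, adding a negated value equals subtracting it, for all inputs.  A small C++ illustration follows.  It uses unsigned (modulo 2^64) arithmetic, which is well-defined on overflow.

```cpp
#include <cstdint>

// 'neg.s64 t,y' + 'add.u64 d,x,t': negate, then add.  Unsigned arithmetic
// wraps modulo 2^64, matching two's-complement register behavior.
std::uint64_t
neg_then_add (std::uint64_t x, std::uint64_t y)
{
  std::uint64_t t = 0 - y;
  return x + t;
}

// 'sub.u64 d,x,y'.
std::uint64_t
plain_sub (std::uint64_t x, std::uint64_t y)
{
  return x - y;
}
```

In the visible excerpt, the multi-use variant also replaces one 'neg' plus three 'add's with three 'sub's.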

'nvptx-none/newlib/libm/complex/libm_a-cephes_subr.o':

[...]
-.reg .f64 %r35;
[...]
-neg.f64 %r35,%r27;
-fma.rn.f64 %r22,%r35,0d400921fb5400,%r32;
+fma.rn.f64 %r22,%r27,0dc00921fb5400,%r32;
 .loc 2 80 21
   

Re: [PATCH][v2] Support single def-use cycle optimization for SLP reduction vectorization

2024-06-25 Thread Thomas Schwinge
Hi!

On 2024-06-14T11:08:15+0200, Richard Biener  wrote:
> We can at least mimic single def-use cycle optimization when doing
> single-lane SLP reductions and that's required to avoid regressing
> compared to non-SLP.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.
>
>   * tree-vect-loop.cc (vectorizable_reduction): Allow
>   single-def-use cycles with SLP.
>   (vect_transform_reduction): Handle SLP single def-use cycles.
>   (vect_transform_cycle_phi): Likewise.
>
>   * gcc.dg/vect/slp-reduc-12.c: New testcase.

For GCN target (tested '-march=gfx908' on current sources), I see:

+PASS: gcc.dg/vect/slp-reduc-12.c (test for excess errors)
+FAIL: gcc.dg/vect/slp-reduc-12.c scan-tree-dump vect "using single def-use cycle for reduction"

..., where we've got (see attached):

[...]
[...]/gcc.dg/vect/slp-reduc-12.c:10:21: optimized: loop vectorized using 256 byte vectors
[...]
[...]/gcc.dg/vect/slp-reduc-12.c:10:21: note:   Reduce using direct vector reduction.
[...]/gcc.dg/vect/slp-reduc-12.c:10:21: note:   vectorizing stmts using SLP.
[...]

How to address?
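Here is a guess that I have not verified against the GCN dumps.  The quoted patch enables the SLP case only if 'vect_get_num_copies (loop_vinfo, vectype_in) > 1'.  Suppose the vectorization factor is set by the 'int' vector type, and suppose that on GCN the 'int' and 'double' vector types have the same number of lanes.  Then each needs only one copy, and neither the single def-use cycle nor its dump message would be expected.  A back-of-the-envelope model with hypothetical names:

```cpp
// Hypothetical model: with vectorization factor VF, a statement whose
// vector type has LANES lanes needs VF / LANES vector copies.
constexpr unsigned
num_copies (unsigned vf, unsigned lanes)
{
  return vf / lanes;
}

// Simplified form of the new SLP condition in the quoted patch: only the
// lane-count and copies parts; all other conditions are assumed to hold.
constexpr bool
single_defuse_cycle_possible (unsigned vf, unsigned double_lanes,
                              unsigned slp_lanes)
{
  return slp_lanes == 1 && num_copies (vf, double_lanes) > 1;
}

// E.g. 256-bit vectors: 8 x int, 4 x double -> VF 8, two 'double' copies.
static_assert (single_defuse_cycle_possible (8, 4, 1), "two copies");
// Assumed: equal lane counts for 'int' and 'double' (say 64) -> one copy.
static_assert (!single_defuse_cycle_possible (64, 64, 1), "one copy");
```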


Grüße
 Thomas


>  gcc/testsuite/gcc.dg/vect/slp-reduc-12.c | 18 ++
>  gcc/tree-vect-loop.cc| 45 ++--
>  2 files changed, 45 insertions(+), 18 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-reduc-12.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c b/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c
> new file mode 100644
> index 000..625f8097c54
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/slp-reduc-12.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target vect_hw_misalign } */
> +/* { dg-additional-options "-Ofast" } */
> +
> +double foo (double *x, int * __restrict a, int n)
> +{
> +  double r = 0.;
> +  for (int i = 0; i < n; ++i)
> +{
> +  a[i] = a[i] + i;
> +  r += x[i];
> +}
> +  return r;
> +}
> +
> +/* { dg-final { scan-tree-dump "using single def-use cycle for reduction" "vect" } } */
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index bbd5d261907..d9a2ad69484 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -8320,7 +8320,11 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
> participating.  When unrolling we want each unrolled iteration to have its
> own reduction accumulator since one of the main goals of unrolling a
> reduction is to reduce the aggregate loop-carried latency.  */
> -  if (ncopies > 1
> +  if ((ncopies > 1
> +   || (slp_node
> +&& !REDUC_GROUP_FIRST_ELEMENT (stmt_info)
> +&& SLP_TREE_LANES (slp_node) == 1
> +&& vect_get_num_copies (loop_vinfo, vectype_in) > 1))
>&& (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
>&& reduc_chain_length == 1
>&& loop_vinfo->suggested_unroll_factor == 1)
> @@ -8373,6 +8377,10 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>   single_defuse_cycle = false;
>   }
>  }
> +  if (dump_enabled_p () && single_defuse_cycle)
> +dump_printf_loc (MSG_NOTE, vect_location,
> +  "using single def-use cycle for reduction by reducing "
> +  "multiple vectors to one in the loop body\n");
>STMT_VINFO_FORCE_SINGLE_CYCLE (reduc_info) = single_defuse_cycle;
>  
>/* If the reduction stmt is one of the patterns that have lane
> @@ -8528,9 +8536,8 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>  {
>tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
>class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> -  int i;
> -  int ncopies;
> -  int vec_num;
> +  unsigned ncopies;
> +  unsigned vec_num;
>  
>stmt_vec_info reduc_info = info_for_reduction (loop_vinfo, stmt_info);
>gcc_assert (reduc_info->is_reduc_info);
> @@ -8577,7 +8584,6 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>auto_vec vec_oprnds0;
>auto_vec vec_oprnds1;
>auto_vec vec_oprnds2;
> -  tree def0;
>  
>if (dump_enabled_p ())
>  dump_printf_loc (MSG_NOTE, vect_location, "transform reduction.\n");
> @@ -8652,20 +8658,21 @@ vect_transform_reduction (loop_vec_info loop_vinfo,
>   definition.  */
>if (single_defuse_cycle)
>  {
> -  gcc_assert (!slp_node);
> -  vect_get_vec_defs_for_operand (loop_vinfo, stmt_info, 1,
> -  op.ops[reduc_index],
> -  reduc_index == 0 ? &vec_oprnds0
> -  : (reduc_index == 1 ? &vec_oprnds1
> - : &vec_oprnds2));
> +  vect_get_vec_defs (loop_vinfo, stmt_info, slp_node, 1,
> +  reduc_index == 0 ? op.ops[0] : NULL_TREE, &vec_oprnds0,
> +  reduc_index == 1 ? op.ops[1] : NULL_TREE, &vec_oprnds1,
> +  redu

rs6000: Properly default-disable late-combine passes [PR106594, PR115622, PR115633] (was: [PATCH 6/6] Add a late-combine pass [PR106594])

2024-06-25 Thread Thomas Schwinge
Hi!

On 2024-06-25T10:07:47+0100, Richard Sandiford  
wrote:
> Thomas Schwinge  writes:
>> On 2024-06-20T14:34:18+0100, Richard Sandiford  
>> wrote:
>>> This patch adds a combine pass that runs late in the pipeline.
>>> [...]
>>
>> Nice!
>>
>>> The patch [...] disables the pass by default on i386, rs6000
>>> and xtensa.
>>
>> Like here:
>>
>>> --- a/gcc/config/i386/i386-options.cc
>>> +++ b/gcc/config/i386/i386-options.cc
>>> @@ -1942,6 +1942,10 @@ ix86_override_options_after_change (void)
>>> flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
>>>  }
>>>  
>>> +  /* Late combine tends to undo some of the effects of STV and RPAD,
>>> + by combining instructions back to their original form.  */
>>> +  if (!OPTION_SET_P (flag_late_combine_instructions))
>>> +flag_late_combine_instructions = 0;
>>>  }
>>
>> ..., I think also here:
>>
>>> --- a/gcc/config/rs6000/rs6000.cc
>>> +++ b/gcc/config/rs6000/rs6000.cc
>>> @@ -4768,6 +4768,14 @@ rs6000_option_override_internal (bool global_init_p)
>>> targetm.expand_builtin_va_start = NULL;
>>>  }
>>>  
>>> +  /* One of the late-combine passes runs after register allocation
>>> + and can match define_insn_and_splits that were previously used
>>> + only before register allocation.  Some of those define_insn_and_splits
>>> + use gen_reg_rtx unconditionally.  Disable late-combine by default
>>> + until the define_insn_and_splits are fixed.  */
>>> +  if (!OPTION_SET_P (flag_late_combine_instructions))
>>> +flag_late_combine_instructions = 0;
>>> +
>>>rs6000_override_options_after_change ();
>>
>> ..., this needs to be done in 'rs6000_override_options_after_change'
>> instead of 'rs6000_option_override_internal', to address the PRs under
>> discussion.  I'm testing such a patch.
>
> Oops!  Sorry about that, and thanks for tracking it down.

No worries.  ;-) OK to push the attached
"rs6000: Properly default-disable late-combine passes [PR106594, PR115622, PR115633]"?


Grüße
 Thomas


>From ccd12107fb06017f878384d2186ed5f01a1dab79 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 25 Jun 2024 10:55:41 +0200
Subject: [PATCH] rs6000: Properly default-disable late-combine passes
 [PR106594, PR115622, PR115633]

..., so that it also works for '__attribute__ ((optimize("[...]")))' etc.

	PR target/106594
	PR target/115622
	PR target/115633
	gcc/
	* config/rs6000/rs6000.cc (rs6000_option_override_internal): Move
	default-disable of late-combine passes from here...
	(rs6000_override_options_after_change): ... to here.
---
 gcc/config/rs6000/rs6000.cc | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index f39b8909925..713fac75f26 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -3431,6 +3431,14 @@ rs6000_override_options_after_change (void)
   /* If we are inserting ROP-protect instructions, disable shrink wrap.  */
   if (rs6000_rop_protect)
 flag_shrink_wrap = 0;
+
+  /* One of the late-combine passes runs after register allocation
+ and can match define_insn_and_splits that were previously used
+ only before register allocation.  Some of those define_insn_and_splits
+ use gen_reg_rtx unconditionally.  Disable late-combine by default
+ until the define_insn_and_splits are fixed.  */
+  if (!OPTION_SET_P (flag_late_combine_instructions))
+flag_late_combine_instructions = 0;
 }
 
 #ifdef TARGET_USES_LINUX64_OPT
@@ -4768,14 +4776,6 @@ rs6000_option_override_internal (bool global_init_p)
 	targetm.expand_builtin_va_start = NULL;
 }
 
-  /* One of the late-combine passes runs after register allocation
- and can match define_insn_and_splits that were previously used
- only before register allocation.  Some of those define_insn_and_splits
- use gen_reg_rtx unconditionally.  Disable late-combine by default
- until the define_insn_and_splits are fixed.  */
-  if (!OPTION_SET_P (flag_late_combine_instructions))
-flag_late_combine_instructions = 0;
-
   rs6000_override_options_after_change ();
 
   /* If not explicitly specified via option, decide whether to generate indexed
-- 
2.34.1



Re: [PATCH 6/6] Add a late-combine pass [PR106594]

2024-06-25 Thread Thomas Schwinge
Hi!

On 2024-06-20T14:34:18+0100, Richard Sandiford  
wrote:
> This patch adds a combine pass that runs late in the pipeline.
> [...]

Nice!

> The patch [...] disables the pass by default on i386, rs6000
> and xtensa.

Like here:

> --- a/gcc/config/i386/i386-options.cc
> +++ b/gcc/config/i386/i386-options.cc
> @@ -1942,6 +1942,10 @@ ix86_override_options_after_change (void)
>   flag_cunroll_grow_size = flag_peel_loops || optimize >= 3;
>  }
>  
> +  /* Late combine tends to undo some of the effects of STV and RPAD,
> + by combining instructions back to their original form.  */
> +  if (!OPTION_SET_P (flag_late_combine_instructions))
> +flag_late_combine_instructions = 0;
>  }

..., I think also here:

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -4768,6 +4768,14 @@ rs6000_option_override_internal (bool global_init_p)
>   targetm.expand_builtin_va_start = NULL;
>  }
>  
> +  /* One of the late-combine passes runs after register allocation
> + and can match define_insn_and_splits that were previously used
> + only before register allocation.  Some of those define_insn_and_splits
> + use gen_reg_rtx unconditionally.  Disable late-combine by default
> + until the define_insn_and_splits are fixed.  */
> +  if (!OPTION_SET_P (flag_late_combine_instructions))
> +flag_late_combine_instructions = 0;
> +
>rs6000_override_options_after_change ();

..., this needs to be done in 'rs6000_override_options_after_change'
instead of 'rs6000_option_override_internal', to address the PRs under
discussion.  I'm testing such a patch.
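Here is why the placement matters, as a toy C++ model with hypothetical types.  The model assumes that processing, for example, '__attribute__ ((optimize ("O2")))' re-applies the -O2 defaults, which enable late-combine.  It further assumes that only the 'override_options_after_change' hook runs afterwards, and that 'rs6000_option_override_internal' does not.

```cpp
// Hypothetical toy model of the two option hooks; not GCC code.
struct toy_options
{
  bool late_combine = true;       // assumed -O2 default: enabled
  bool late_combine_set = false;  // stands in for OPTION_SET_P
};

// Stands in for rs6000_override_options_after_change.  NEW_PLACEMENT
// selects whether the default-disable lives here (as in the patch).
void
after_change (toy_options &o, bool new_placement)
{
  if (new_placement && !o.late_combine_set)
    o.late_combine = false;
}

// Stands in for rs6000_option_override_internal, which runs for the
// command-line options and calls the after-change hook at its end.
void
override_internal (toy_options &o, bool new_placement)
{
  if (!new_placement && !o.late_combine_set)
    o.late_combine = false;  // old placement
  after_change (o, new_placement);
}

// Assumed behavior for an 'optimize' attribute: the -O defaults are
// re-applied, and then only the after-change hook runs.
toy_options
apply_optimize_attribute (bool new_placement)
{
  toy_options o;  // late_combine re-enabled by the defaults
  after_change (o, new_placement);
  return o;
}
```

On the command-line path, both placements disable late-combine.  On the attribute path, only the disable in the after-change hook takes effect.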


Grüße
 Thomas


nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Thomas Schwinge
Hi!

> On 2023-01-20T22:16:00+0100, I wrote:
> On 2023-01-20T22:04:02+0100, I wrote:
>> We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
>> offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
>> configuration of libgfortran.
>
> This is achieved by 'nvptx, libgfortran: Switch out of "minimal" mode',
> see attached, again based on WIP work by Andrew Stubbs.

I've recently slightly revised this, in particular:

> The OpenACC XFAILs: "[...] overflows the stack [...]"

... I now avoid by use of commit 0d25989d60d15866ef4737d66e02432f50717255
"nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable [PR97384, PR105274]".

The underlying issue remains...

> [...] unresolved at this point; see the discussion around
> "Handling of large stack objects in GPU code generation -- maybe transform into heap allocation?",
> and my "nvptx: '-mframe-malloc-threshold', '-Wframe-malloc-threshold'"
> experimenting.  (The latter works to some extent, but also has other
> issues that I shall detail at some later point in time.)

(No progress.)


Pushed to trunk branch commit 3a4775d4403f2e88b589e88a9937cc1fd45a0e87
'nvptx, libgfortran: Switch out of "minimal" mode', see attached.

This, unsurprisingly, also greatly improves GCC/Fortran test results for
nvptx target.


Grüße
 Thomas


>From 3a4775d4403f2e88b589e88a9937cc1fd45a0e87 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 13:13:24 +0200
Subject: [PATCH] nvptx, libgfortran: Switch out of "minimal" mode

..., in order to enable (portions of) Fortran I/O, for example.

	libgfortran/
	* configure.ac: No longer set 'LIBGFOR_MINIMAL' for nvptx.
	* configure: Regenerate.
	libgomp/
	* libgomp.texi (nvptx): Update.
	* testsuite/libgomp.fortran/target-print-1-nvptx.f90: Remove.
	* testsuite/libgomp.fortran/target-print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/error_stop-2-nvptx.f: New.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1-nvptx.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/print-1.f90: Adjust.
	* testsuite/libgomp.oacc-fortran/stop-2-nvptx.f: New.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Adjust.

Co-authored-by: Andrew Stubbs 
---
 libgfortran/configure | 21 --
 libgfortran/configure.ac  | 17 +++-
 libgomp/libgomp.texi  | 10 +++--
 .../libgomp.fortran/target-print-1-nvptx.f90  | 11 -
 .../libgomp.fortran/target-print-1.f90|  3 --
 .../libgomp.oacc-fortran/error_stop-2-nvptx.f | 39 ++
 .../libgomp.oacc-fortran/error_stop-2.f   |  3 +-
 .../libgomp.oacc-fortran/print-1-nvptx.f90| 40 +++
 .../libgomp.oacc-fortran/print-1.f90  |  4 +-
 .../libgomp.oacc-fortran/stop-2-nvptx.f   | 36 +
 .../testsuite/libgomp.oacc-fortran/stop-2.f   |  3 +-
 11 files changed, 134 insertions(+), 53 deletions(-)
 delete mode 100644 libgomp/testsuite/libgomp.fortran/target-print-1-nvptx.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/error_stop-2-nvptx.f
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/stop-2-nvptx.f

diff --git a/libgfortran/configure b/libgfortran/configure
index 774dd52fc95..11a1bc5f070 100755
--- a/libgfortran/configure
+++ b/libgfortran/configure
@@ -6207,17 +6207,12 @@ else
 fi
 
 
-# For GPU offloading, not everything in libfortran can be supported.
-# Currently, the only target that has this problem is nvptx.  The
-# following is a (partial) list of features that are unsupportable on
-# this particular target:
-# * Constructors
-# * alloca
-# * C library support for I/O, with printf as the one notable exception
-# * C library support for other features such as signal, environment
-#   variables, time functions
-
- if test "x${target_cpu}" = xnvptx; then
+# "Minimal" mode is for targets that cannot (yet) support all features of
+# libgfortran.  It avoids the need for working constructors, alloca, and C
+# library support for I/O, signals, environment variables, time functions, etc.
+# At present there are no targets that require this mode.
+
+ if false; then
   LIBGFOR_MINIMAL_TRUE=
   LIBGFOR_MINIMAL_FALSE='#'
 else
@@ -12852,7 +12847,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12855 "configure"
+#line 12850 "configure"
 #include "confdefs.h"
 
 #if HAVE_DLFCN_H
@@ -12958,7 +12953,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 12961 "configure"
+#line 12956 "configure"

Re: nvptx, libgcc: Stub unwinding implementation

2024-06-06 Thread Thomas Schwinge
Hi!

On 2023-01-20T22:04:02+0100, I wrote:
> We've been (t)asked to enable (portions of) GCC/Fortran I/O for nvptx
> offloading, which means building a normal (non-'LIBGFOR_MINIMAL')
> configuration of libgfortran.  One prerequisite patch, based on WIP work
> by Andrew Stubbs, is: "nvptx, libgcc: Stub unwinding implementation"

Pushed to trunk branch commit a29c5852a606588175d11844db84da0881227100
"nvptx, libgcc: Stub unwinding implementation", see attached.
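To show what such stubs mean for a caller, here is a minimal host-side sketch. It assumes a compiler that ships <unwind.h> (GCC and Clang do). The names 'stub_unwind_backtrace', 'count_frame' and 'frames_seen_with_stub' are hypothetical, chosen so as not to clash with the host's real '_Unwind_Backtrace'. Because the stub returns 0 ('_URC_NO_REASON') without ever invoking the trace callback, a libbacktrace-style collector simply observes zero frames, rather than an error code.

```c
#include <assert.h>
#include <unwind.h>

/* Hypothetical stand-in mirroring the stub in
   'libgcc/config/nvptx/unwind-nvptx.c': it never walks the stack and
   never calls the trace callback.  */
static _Unwind_Reason_Code
stub_unwind_backtrace (_Unwind_Trace_Fn trace, void *trace_argument)
{
  (void) trace;
  (void) trace_argument;
  return _URC_NO_REASON;  /* Same value as the stub's 'return 0;'.  */
}

/* Trace callback of a simple backtrace collector: counts the frames it
   is shown.  */
static _Unwind_Reason_Code
count_frame (struct _Unwind_Context *context, void *data)
{
  (void) context;
  ++*(int *) data;
  return _URC_NO_REASON;
}

/* Returns the number of frames a caller observes through the stub.  */
int
frames_seen_with_stub (void)
{
  int frames = 0;
  _Unwind_Reason_Code rc = stub_unwind_backtrace (count_frame, &frames);
  assert (rc == _URC_NO_REASON);
  return frames;
}
```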


Grüße
 Thomas


>From a29c5852a606588175d11844db84da0881227100 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 13:11:04 +0200
Subject: [PATCH] nvptx, libgcc: Stub unwinding implementation

Adding stub '_Unwind_Backtrace', '_Unwind_GetIPInfo' functions is necessary
for linking libbacktrace, as a normal (non-'LIBGFOR_MINIMAL') configuration
of libgfortran wants to do, for example.

The file 'libgcc/config/nvptx/unwind-nvptx.c' is copied from
'libgcc/config/gcn/unwind-gcn.c'.

libgcc/ChangeLog:

	* config/nvptx/t-nvptx: Add unwind-nvptx.c.
	* config/nvptx/unwind-nvptx.c: New file.

Co-authored-by: Andrew Stubbs 
---
 libgcc/config/nvptx/t-nvptx|  3 ++-
 libgcc/config/nvptx/unwind-nvptx.c | 37 ++
 2 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 libgcc/config/nvptx/unwind-nvptx.c

diff --git a/libgcc/config/nvptx/t-nvptx b/libgcc/config/nvptx/t-nvptx
index 260ed6334db..1ff574c2982 100644
--- a/libgcc/config/nvptx/t-nvptx
+++ b/libgcc/config/nvptx/t-nvptx
@@ -1,6 +1,7 @@
 LIB2ADD=$(srcdir)/config/nvptx/reduction.c \
 	$(srcdir)/config/nvptx/mgomp.c \
-	$(srcdir)/config/nvptx/atomic.c
+	$(srcdir)/config/nvptx/atomic.c \
+	$(srcdir)/config/nvptx/unwind-nvptx.c
 
 # Until we have libstdc++-v3/libsupc++ proper.
 LIB2ADD += $(srcdir)/c++-minimal/guard.c
diff --git a/libgcc/config/nvptx/unwind-nvptx.c b/libgcc/config/nvptx/unwind-nvptx.c
new file mode 100644
index 000..d08ba266be1
--- /dev/null
+++ b/libgcc/config/nvptx/unwind-nvptx.c
@@ -0,0 +1,37 @@
+/* Stub unwinding implementation.
+
+   Copyright (C) 2019-2024 Free Software Foundation, Inc.
+   Contributed by Mentor Graphics
+
+   This file is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by the
+   Free Software Foundation; either version 3, or (at your option) any
+   later version.
+
+   This file is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include "unwind.h"
+
+_Unwind_Reason_Code
+_Unwind_Backtrace(_Unwind_Trace_Fn trace, void * trace_argument)
+{
+  return 0;
+}
+
+_Unwind_Ptr
+_Unwind_GetIPInfo (struct _Unwind_Context *c, int *ip_before_insn)
+{
+  return 0;
+}
-- 
2.34.1



nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2' for offloading)

2024-06-06 Thread Thomas Schwinge
Hi!

On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]

..., which I then recently revised; see
commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'".

> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for 
> offloading".

Similarly revised, I've now pushed to trunk branch
commit 5bbe5350a0932c78d4ffce292ba4104a6fe6ef96
"nvptx offloading: Global constructor, destructor support, via nvptx-tools 'ld'",
see attached.
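To make the control flow of the plugin change easier to follow, here is a host-only model of the destructor loop added to 'nvptx_close_device'. It is a sketch, not the plugin code: 'struct image', 'run_dtors' and 'close_device' are hypothetical stand-ins for 'struct ptx_image_data', 'nvptx_do_global_cdtors' and the real function, and no CUDA calls are made. As in the patch, the destructors of every loaded image are attempted before the device is torn down, and a failure in any image is remembered in 'ret' rather than ending the loop early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, host-only model of one loaded offload image.  */
struct image
{
  const char *name;
  bool dtor_fails;   /* Simulate a failing destructor kernel launch.  */
  int dtors_run;     /* How often this image's destructors were run.  */
  struct image *next;
};

/* Stand-in for launching the destructor entry kernel of one image;
   returns false on failure.  */
static bool
run_dtors (struct image *img, const char *entry)
{
  /* Only destructor entry points ('__do_global_dtors__entry' and its
     '__mgomp' variant) are expected here.  */
  assert (strncmp (entry, "__do_global_dtors__entry", 24) == 0);
  img->dtors_run++;
  return !img->dtor_fails;
}

/* Mirrors the new loop in 'nvptx_close_device': try every image, and
   report overall failure if any of them failed.  */
bool
close_device (struct image *images)
{
  bool ret = true;
  for (struct image *img = images; img != NULL; img = img->next)
    if (!run_dtors (img, "__do_global_dtors__entry"))
      ret = false;
  /* The real function then frees memory blocks, destroys the CUDA
     context, and returns 'ret'.  */
  return ret;
}
```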


Grüße
 Thomas


>From 5bbe5350a0932c78d4ffce292ba4104a6fe6ef96 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 12:40:50 +0200
Subject: [PATCH] nvptx offloading: Global constructor, destructor support, via
 nvptx-tools 'ld'

This extends commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'"
for offloading.

	libgcc/
	* config/nvptx/gbl-ctors.c ["mgomp"]
	(__do_global_ctors__entry__mgomp)
	(__do_global_dtors__entry__mgomp): New.
	[!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
	New.
	libgomp/
	* plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
	(nvptx_close_device, GOMP_OFFLOAD_load_image)
	(GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/gbl-ctors.c |  55 +++
 libgomp/plugin/plugin-nvptx.c   | 117 +++-
 2 files changed, 171 insertions(+), 1 deletion(-)

diff --git a/libgcc/config/nvptx/gbl-ctors.c b/libgcc/config/nvptx/gbl-ctors.c
index a2ca053e5e3..a56d64f8ef8 100644
--- a/libgcc/config/nvptx/gbl-ctors.c
+++ b/libgcc/config/nvptx/gbl-ctors.c
@@ -68,6 +68,61 @@ __gbl_ctors (void)
 }
 
 
+/* For nvptx offloading configurations, need '.entry' wrappers.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+/* OpenMP */
+
+/* See 'crt0.c', 'mgomp.c'.  */
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __static_do_global_ctors ();
+}
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __static_do_global_dtors ();
+}
+
+# else
+
+/* OpenACC */
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __static_do_global_ctors ();
+}
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __static_do_global_dtors ();
+}
+
+# endif
+
+
 /* The following symbol just provides a means for the nvptx-tools 'ld' to
trigger linking in this file.  */
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 4cedc5390a3..0f3a3be1898 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -346,6 +346,11 @@ static struct ptx_device **ptx_devices;
default is set here.  */
 static unsigned lowlat_pool_size = 8 * 1024;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+				    const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -565,6 +570,18 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
     return true;
 
+  bool ret = true;
+
+  for (struct ptx_image_data *image = ptx_dev->images;
+       image != NULL;
+       image = image->next)
+    {
+      if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+				   "__do_global_dtors__entry"
+				   /* or "__do_global_dtors__entry__mgomp" */))
+	ret = false;
+    }
+
   for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
     {
       struct ptx_free_block *b_next = b->next;
@@ -585,7 +602,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
     CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
 
   free (ptx_dev);
-  return true;
+
+  return ret;
 }
 
 static int
@@ -1317,6 +1335,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
 GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda

Re: Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]"

2024-06-06 Thread Thomas Schwinge
Hi!

On 2023-01-20T21:12:05+0100, I wrote:
> Re the newlib commit 05a2d7a8b3277b469e7cb121115bba398adc8559
> "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]"
> that I've just pushed to newlib main branch:
>
> On 2023-01-19T23:00:05+0100, I wrote:
>> This is still not properly resolving <https://gcc.gnu.org/PR85463>
>> '[nvptx] "exit" in offloaded region doesn't terminate process', but is
>> one step in that direction, and allows for simplifying some GCC code.
>
>> --- a/newlib/libc/machine/nvptx/_exit.c
>> +++ b/newlib/libc/machine/nvptx/_exit.c
>
>> @@ -26,7 +27,15 @@ void __attribute__((noreturn))
>>  _exit (int status)
>>  {
>>    if (__exitval_ptr)
>> -    *__exitval_ptr = status;
>> -  for (;;)
>> -    asm ("exit;" ::: "memory");
>> +    {
>> +      *__exitval_ptr = status;
>> +      for (;;)
>> +	asm ("exit;" ::: "memory");
>> +    }
>> +  else /* offloading */
>> +    {
>> +      /* Map to 'abort'; see <https://gcc.gnu.org/PR85463>
>> +	 '[nvptx] "exit" in offloaded region doesn't terminate process'.  */
>> +      abort ();
>> +    }
>>  }
>
> That has put "the PR85463 stuff" into the one central place, and allows
> for simplifying GCC as per the attached
> 'Clean up after newlib "nvptx: In offloading execution, map '_exit' to 
> 'abort' [GCC PR85463]"',
> which I've just pushed to GCC devel/omp/gcc-12 branch in
> commit 094b379f461bb4b635327cde26eabc0966159fec, and intend to push to
> GCC master branch once the latter depends on updated newlib for other
> (functional) reasons.

Better late than never: I've now pushed to GCC trunk branch
commit 395ac0417a17ba6405873f891f895417d696b603
'Clean up after newlib "nvptx: In offloading execution, map '_exit' to 'abort' [GCC PR85463]"',
see attached.
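The decision that the removed GCC-side 'exit' overrides used to make now lives in newlib's '_exit'. For illustration only, here is a host-side model of that decision: 'model_nvptx_exit' and 'enum exit_path' are hypothetical, and instead of executing PTX 'exit;' or calling 'abort', the model just reports which path the real code would take.

```c
#include <assert.h>
#include <stddef.h>

/* Which path newlib's nvptx '_exit' would take (hypothetical model).  */
enum exit_path { EXIT_PATH_STORE_AND_EXIT, EXIT_PATH_ABORT };

enum exit_path
model_nvptx_exit (int *exitval_ptr, int status)
{
  if (exitval_ptr)
    {
      /* Non-offloading: hand the status back via '__exitval_ptr'; the
         real code then spins on PTX 'exit;'.  */
      *exitval_ptr = status;
      return EXIT_PATH_STORE_AND_EXIT;
    }
  /* Offloading: no '__exitval_ptr'; the real code calls 'abort' (see
     PR85463), so GCC no longer needs its own 'exit' to 'abort' mapping.  */
  return EXIT_PATH_ABORT;
}
```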


Grüße
 Thomas


>From 395ac0417a17ba6405873f891f895417d696b603 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 5 Jun 2024 14:34:06 +0200
Subject: [PATCH] Clean up after newlib "nvptx: In offloading execution, map
 '_exit' to 'abort' [GCC PR85463]"

	PR target/85463
	libgfortran/
	* runtime/minimal.c [__nvptx__] (exit): Don't override.
	libgomp/
	* config/nvptx/error.c (exit): Don't override.
	* testsuite/libgomp.oacc-fortran/error_stop-1.f: Update.
	* testsuite/libgomp.oacc-fortran/error_stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/error_stop-3.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-1.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-2.f: Likewise.
	* testsuite/libgomp.oacc-fortran/stop-3.f: Likewise.
---
 libgfortran/runtime/minimal.c   |  8 
 libgomp/config/nvptx/error.c|  7 ---
 .../testsuite/libgomp.oacc-fortran/error_stop-1.f   |  8 +---
 .../testsuite/libgomp.oacc-fortran/error_stop-2.f   |  8 +---
 .../testsuite/libgomp.oacc-fortran/error_stop-3.f   |  8 +---
 libgomp/testsuite/libgomp.oacc-fortran/stop-1.f | 13 +
 libgomp/testsuite/libgomp.oacc-fortran/stop-2.f |  6 +-
 libgomp/testsuite/libgomp.oacc-fortran/stop-3.f | 12 
 8 files changed, 37 insertions(+), 33 deletions(-)

diff --git a/libgfortran/runtime/minimal.c b/libgfortran/runtime/minimal.c
index f13b3a4bf90..619f818c844 100644
--- a/libgfortran/runtime/minimal.c
+++ b/libgfortran/runtime/minimal.c
@@ -31,14 +31,6 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #endif
 
 
-#if __nvptx__
-/* Map "exit" to "abort"; see PR85463 '[nvptx] "exit" in offloaded region
-   doesn't terminate process'.  */
-# undef exit
-# define exit(status) do { (void) (status); abort (); } while (0)
-#endif
-
-
 #if __nvptx__
 /* 'printf' is all we have.  */
 # undef estr_vprintf
diff --git a/libgomp/config/nvptx/error.c b/libgomp/config/nvptx/error.c
index 7e668276004..f7a2536c29b 100644
--- a/libgomp/config/nvptx/error.c
+++ b/libgomp/config/nvptx/error.c
@@ -58,11 +58,4 @@
 #endif
 
 
-/* The 'exit (EXIT_FAILURE);' of an Fortran (only, huh?) OpenMP 'error'
-   directive with 'severity (fatal)' causes a hang, so 'abort' instead of
-   'exit'.  */
-#undef exit
-#define exit(status) abort ()
-
-
 #include "../../error.c"
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f b/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f
index de727749a53..3918d6853f6 100644
--- a/libgomp/testsuite/libgomp.oacc-fortran/error_stop-1.f
+++ b/libgomp/testsuite/libgomp

nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution, via 'vote.all.pred' (was: nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution (was: [committed][nvptx] Add un

2024-06-04 Thread Thomas Schwinge
Hi!

On 2022-12-15T19:27:08+0100, I wrote:
> First "a bit" of context; skip to "the proposed patch" if you'd like to
> see just that.

I'm not repeating all the context here; see the previous email if
necessary.

> My following discussion is about the implementation of
> 'nvptx_uniform_warp_check', originally introduced as follows:
>
> On 2022-02-01T19:31:27+0100, Tom de Vries via Gcc-patches 
>  wrote:
>> --- a/gcc/config/nvptx/nvptx.md
>> +++ b/gcc/config/nvptx/nvptx.md

>> +(define_insn "nvptx_uniform_warp_check"
>> +  [(unspec_volatile [(const_int 0)] UNSPECV_UNIFORM_WARP_CHECK)]
>> +  ""
>> +  {
>> +    output_asm_insn ("{", NULL);
>> +    output_asm_insn ("\\t"   ".reg.b32""\\t" "act;", NULL);
>> +    output_asm_insn ("\\t"   "vote.ballot.b32" "\\t" "act,1;", NULL);
>> +    output_asm_insn ("\\t"   ".reg.pred"   "\\t" "uni;", NULL);
>> +    output_asm_insn ("\\t"   "setp.eq.b32" "\\t" "uni,act,0x;",
>> +		     NULL);
>> +    output_asm_insn ("@ !uni\\t" "trap;", NULL);
>> +    output_asm_insn ("@ !uni\\t" "exit;", NULL);
>> +    output_asm_insn ("}", NULL);
>> +    return "";
>> +  }
>> +  [(set_attr "predicable" "false")])
>
> Later adjusted, but the fundamental idea is still the same.

> Now, "the proposed patch".  I'd like to make 'nvptx_uniform_warp_check'
> fit for non-full-warp execution.  For example, to be able to execute such
> code in single-threaded 'cuLaunchKernel' for execution of global
> constructors/destructors, where those may, for example, call into nvptx
> target libraries compiled with '-mgomp' (thus, '-muniform-simt').
>
> OK to push (after proper testing, and with TODO markers adjusted/removed)
> the attached
> "nvptx: Make 'nvptx_uniform_warp_check' fit for non-full-warp execution"?

> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -2282,10 +2282,24 @@
>"{",
>"\\t"".reg.b32""\\t" "%%r_act;",
>"%.\\t"  "vote.ballot.b32" "\\t" "%%r_act,1;",
> +  /* For '%r_exp', we essentially need 'activemask.b32', but that is "Introduced in PTX ISA version 6.2", and this code here is used only 'if (!TARGET_PTX_6_0)'.  Thus, emulate it.
> + TODO Is that actually correct?  Wouldn't 'activemask.b32' rather replace our 'vote.ballot.b32' given that it registers the *currently active threads*?  */
> +  /* Compute the "membermask" of all threads of the warp that are expected to be converged here.
> +  For OpenACC, '%ntid.x' is 'vector_length', which per 'nvptx_goacc_validate_dims' always is a multiple of 32.
> +  For OpenMP, '%ntid.x' always is 32.
> +  Thus, this is typically 0x, but additionally always for the case that not all 32 threads of the warp have been launched.
> +  This assumes that lane IDs are assigned in ascending order.  */
> +  //TODO Can we rely on '1 << 32 == 0', and '0 - 1 = 0x'?
> +  //TODO https://developer.nvidia.com/blog/using-cuda-warp-level-primitives/
> +  //TODO https://stackoverflow.com/questions/54055195/activemask-vs-ballot-sync
> +  "\\t"".reg.b32""\\t" "%%r_exp;",
> +  "%.\\t"  "mov.b32" "\\t" "%%r_exp, %%ntid.x;",
> +  "%.\\t"  "shl.b32" "\\t" "%%r_exp, 1, %%r_exp;",
> +  "%.\\t"  "sub.u32" "\\t" "%%r_exp, %%r_exp, 1;",
>"\\t"".reg.pred"   "\\t" "%%r_do_abort;",
>"\\t""mov.pred""\\t" "%%r_do_abort,0;",
>"%.\\t"  "setp.ne.b32" "\\t" "%%r_do_abort,%%r_act,"
> -   "0x;",
> +   "%%r_exp;",
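For reference, the proposed '%r_exp' computation and the 'setp.ne.b32' comparison can be modelled on the host as follows. This is a sketch under stated assumptions: the names are hypothetical; lane IDs are assumed to be assigned in ascending order, as the patch comment says; and, regarding the '1 << 32' TODO, it assumes that PTX 'shl.b32' clamps shift amounts larger than the register width to 32 (yielding 0), so that a full warp, or an OpenACC 'vector_length' that is a multiple of 32, produces an all-ones mask. C itself leaves a 32-bit shift by 32 undefined, hence the explicit branch. With a single-threaded launch ('%ntid.x' == 1), 'vote.ballot.b32' presumably yields 0x1, which the expected mask now matches, whereas a fixed full-warp comparison would trigger the abort.

```c
#include <assert.h>
#include <stdint.h>

/* Host model of '(1 << %ntid.x) - 1' under the clamping assumption.  */
static uint32_t
expected_membermask (uint32_t ntid_x)
{
  assert (ntid_x > 0);  /* A launched block has at least one thread.  */
  if (ntid_x >= 32)
    return UINT32_MAX;  /* Full warp(s): all 32 lanes expected.  */
  return ((uint32_t) 1 << ntid_x) - 1;  /* Lanes 0 .. ntid_x - 1.  */
}

/* Mirrors 'setp.ne.b32 %r_do_abort, %r_act, %r_exp': nonzero means the
   lanes that voted ('%r_act') differ from those expected.  */
int
uniform_warp_check_fails (uint32_t act, uint32_t ntid_x)
{
  return act != expected_membermask (ntid_x);
}
```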

[PATCH 4/4] Add 'c-c++-common/initpri1{, -lto, -split}-static.c' as internal linkage variants

2024-06-04 Thread Thomas Schwinge
gcc/testsuite/
	* c-c++-common/initpri1_part_c1.c: Consider 'CDTOR_LINKAGE'.
	* c-c++-common/initpri1_part_c2.c: Likewise.
	* c-c++-common/initpri1_part_c3.c: Likewise.
	* c-c++-common/initpri1_part_cd4.c: Likewise.
	* c-c++-common/initpri1_part_d1.c: Likewise.
	* c-c++-common/initpri1_part_d2.c: Likewise.
	* c-c++-common/initpri1_part_d3.c: Likewise.
	* c-c++-common/initpri1.c: Specify it.
	* c-c++-common/initpri1-lto.c: Likewise.
	* c-c++-common/initpri1-split.c: Likewise.
	* c-c++-common/initpri1-static.c: New.
	* c-c++-common/initpri1-lto-static.c: Likewise.
	* c-c++-common/initpri1-split-static.c: Likewise.
---
 .../c-c++-common/{initpri1-lto.c => initpri1-lto-static.c} | 1 +
 gcc/testsuite/c-c++-common/initpri1-lto.c  | 1 +
 .../c-c++-common/{initpri1-split.c => initpri1-split-static.c} | 1 +
 gcc/testsuite/c-c++-common/initpri1-split.c| 1 +
 .../c-c++-common/{initpri1-lto.c => initpri1-static.c} | 3 +--
 gcc/testsuite/c-c++-common/initpri1.c  | 1 +
 gcc/testsuite/c-c++-common/initpri1_part_c1.c  | 2 ++
 gcc/testsuite/c-c++-common/initpri1_part_c2.c  | 2 ++
 gcc/testsuite/c-c++-common/initpri1_part_c3.c  | 2 ++
 gcc/testsuite/c-c++-common/initpri1_part_cd4.c | 2 ++
 gcc/testsuite/c-c++-common/initpri1_part_d1.c  | 2 ++
 gcc/testsuite/c-c++-common/initpri1_part_d2.c  | 2 ++
 gcc/testsuite/c-c++-common/initpri1_part_d3.c  | 2 ++
 13 files changed, 20 insertions(+), 2 deletions(-)
 copy gcc/testsuite/c-c++-common/{initpri1-lto.c => initpri1-lto-static.c} (81%)
 copy gcc/testsuite/c-c++-common/{initpri1-split.c => initpri1-split-static.c} (86%)
 copy gcc/testsuite/c-c++-common/{initpri1-lto.c => initpri1-static.c} (70%)

diff --git a/gcc/testsuite/c-c++-common/initpri1-lto.c b/gcc/testsuite/c-c++-common/initpri1-lto-static.c
similarity index 81%
copy from gcc/testsuite/c-c++-common/initpri1-lto.c
copy to gcc/testsuite/c-c++-common/initpri1-lto-static.c
index 433ef356c7e..6393f7ec99b 100644
--- a/gcc/testsuite/c-c++-common/initpri1-lto.c
+++ b/gcc/testsuite/c-c++-common/initpri1-lto-static.c
@@ -2,5 +2,6 @@
 /* { dg-require-effective-target lto } */
 /* { dg-options "-flto -O3" } */
 /* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
+/* { dg-additional-options -DCDTOR_LINKAGE=static } */
 
 #include "initpri1.c"
diff --git a/gcc/testsuite/c-c++-common/initpri1-lto.c b/gcc/testsuite/c-c++-common/initpri1-lto.c
index 433ef356c7e..7fb4bf1aa82 100644
--- a/gcc/testsuite/c-c++-common/initpri1-lto.c
+++ b/gcc/testsuite/c-c++-common/initpri1-lto.c
@@ -2,5 +2,6 @@
 /* { dg-require-effective-target lto } */
 /* { dg-options "-flto -O3" } */
 /* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
+/* { dg-additional-options -DCDTOR_LINKAGE= } */
 
 #include "initpri1.c"
diff --git a/gcc/testsuite/c-c++-common/initpri1-split.c b/gcc/testsuite/c-c++-common/initpri1-split-static.c
similarity index 86%
copy from gcc/testsuite/c-c++-common/initpri1-split.c
copy to gcc/testsuite/c-c++-common/initpri1-split-static.c
index 11755ee9f6a..02d8b162e19 100644
--- a/gcc/testsuite/c-c++-common/initpri1-split.c
+++ b/gcc/testsuite/c-c++-common/initpri1-split-static.c
@@ -1,3 +1,4 @@
 /* { dg-do run { target init_priority } } */
 /* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
 /* { dg-additional-sources {initpri1_part_c1.c initpri1_part_c2.c initpri1_part_c3.c initpri1_part_d1.c initpri1_part_d2.c initpri1_part_d3.c initpri1_part_cd4.c initpri1_part_main.c} } */
+/* { dg-additional-options -DCDTOR_LINKAGE=static } */
diff --git a/gcc/testsuite/c-c++-common/initpri1-split.c b/gcc/testsuite/c-c++-common/initpri1-split.c
index 11755ee9f6a..f1482c7e0c1 100644
--- a/gcc/testsuite/c-c++-common/initpri1-split.c
+++ b/gcc/testsuite/c-c++-common/initpri1-split.c
@@ -1,3 +1,4 @@
 /* { dg-do run { target init_priority } } */
 /* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
 /* { dg-additional-sources {initpri1_part_c1.c initpri1_part_c2.c initpri1_part_c3.c initpri1_part_d1.c initpri1_part_d2.c initpri1_part_d3.c initpri1_part_cd4.c initpri1_part_main.c} } */
+/* { dg-additional-options -DCDTOR_LINKAGE= } */
diff --git a/gcc/testsuite/c-c++-common/initpri1-lto.c b/gcc/testsuite/c-c++-common/initpri1-static.c
similarity index 70%
copy from gcc/testsuite/c-c++-common/initpri1-lto.c
copy to gcc/testsuite/c-c++-common/initpri1-static.c
index 433ef356c7e..ac101ff63cb 100644
--- a/gcc/testsuite/c-c++-common/initpri1-lto.c
+++ b/gcc/testsuite/c-c++-common/initpri1-static.c
@@ -1,6 +1,5 @@
 /* { dg-do run { target init_priority } }

[PATCH 3/4] Add 'c-c++-common/initpri1-split.c': 'c-c++-common/initpri1.c' split into separate translation units

2024-06-04 Thread Thomas Schwinge
gcc/testsuite/
	* c-c++-common/initpri1.c: Split into...
	* c-c++-common/initpri1_part_c1.c: ... this, and...
	* c-c++-common/initpri1_part_c2.c: ... this, and...
	* c-c++-common/initpri1_part_c3.c: ... this, and...
	* c-c++-common/initpri1_part_cd4.c: ... this, and...
	* c-c++-common/initpri1_part_d1.c: ... this, and...
	* c-c++-common/initpri1_part_d2.c: ... this, and...
	* c-c++-common/initpri1_part_d3.c: ... this, and...
	* c-c++-common/initpri1_part_main.c: ... this part.
	* c-c++-common/initpri1-split.c: New.
---
 .../{initpri1.c => initpri1-split.c}  | 60 +--
 gcc/testsuite/c-c++-common/initpri1.c | 73 ---
 .../{initpri1.c => initpri1_part_c1.c}| 54 +-
 .../{initpri1.c => initpri1_part_c2.c}| 54 +-
 .../{initpri1.c => initpri1_part_c3.c}| 54 +-
 .../{initpri1.c => initpri1_part_cd4.c}   | 54 +-
 .../{initpri1.c => initpri1_part_d1.c}| 54 +-
 .../{initpri1.c => initpri1_part_d2.c}| 54 +-
 .../{initpri1.c => initpri1_part_d3.c}| 53 +-
 .../{initpri1.c => initpri1_part_main.c}  | 50 +
 10 files changed, 33 insertions(+), 527 deletions(-)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1-split.c} (14%)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1_part_c1.c} (20%)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1_part_c2.c} (20%)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1_part_c3.c} (20%)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1_part_cd4.c} (22%)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1_part_d1.c} (20%)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1_part_d2.c} (20%)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1_part_d3.c} (23%)
 copy gcc/testsuite/c-c++-common/{initpri1.c => initpri1_part_main.c} (21%)

diff --git a/gcc/testsuite/c-c++-common/initpri1.c b/gcc/testsuite/c-c++-common/initpri1-split.c
similarity index 14%
copy from gcc/testsuite/c-c++-common/initpri1.c
copy to gcc/testsuite/c-c++-common/initpri1-split.c
index 387f2a39658..11755ee9f6a 100644
--- a/gcc/testsuite/c-c++-common/initpri1.c
+++ b/gcc/testsuite/c-c++-common/initpri1-split.c
@@ -1,61 +1,3 @@
 /* { dg-do run { target init_priority } } */
 /* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
-
-int i;
-int j;
-
-void c1() __attribute__((constructor (500)));
-void c2() __attribute__((constructor (700)));
-void c3() __attribute__((constructor (600)));
-
-void c1() {
-  if (i++ != 0)
-    __builtin_abort ();
-}
-
-void c2() {
-  if (i++ != 2)
-    __builtin_abort ();
-}
-
-void c3() {
-  if (i++ != 1)
-    __builtin_abort ();
-}
-
-void d1() __attribute__((destructor (500)));
-void d2() __attribute__((destructor (700)));
-void d3() __attribute__((destructor (600)));
-
-void d1() {
-  if (--i != 0)
-    __builtin_abort ();
-}
-
-void d2() {
-  if (--i != 2)
-    __builtin_abort ();
-}
-
-void d3() {
-  if (j != 2)
-    __builtin_abort ();
-  if (--i != 1)
-    __builtin_abort ();
-}
-
-void cd4() __attribute__((constructor (800), destructor (800)));
-
-void cd4() {
-  if (i != 3)
-    __builtin_abort ();
-  ++j;
-}
-
-int main () {
-  if (i != 3)
-    return 1;
-  if (j != 1)
-    __builtin_abort ();
-  return 0;
-}
+/* { dg-additional-sources {initpri1_part_c1.c initpri1_part_c2.c initpri1_part_c3.c initpri1_part_d1.c initpri1_part_d2.c initpri1_part_d3.c initpri1_part_cd4.c initpri1_part_main.c} } */
diff --git a/gcc/testsuite/c-c++-common/initpri1.c b/gcc/testsuite/c-c++-common/initpri1.c
index 387f2a39658..f50137a489b 100644
--- a/gcc/testsuite/c-c++-common/initpri1.c
+++ b/gcc/testsuite/c-c++-common/initpri1.c
@@ -1,61 +1,18 @@
 /* { dg-do run { target init_priority } } */
 /* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
 
-int i;
-int j;
-
-void c1() __attribute__((constructor (500)));
-void c2() __attribute__((constructor (700)));
-void c3() __attribute__((constructor (600)));
-
-void c1() {
-  if (i++ != 0)
-    __builtin_abort ();
-}
-
-void c2() {
-  if (i++ != 2)
-    __builtin_abort ();
-}
-
-void c3() {
-  if (i++ != 1)
-    __builtin_abort ();
-}
-
-void d1() __attribute__((destructor (500)));
-void d2() __attribute__((destructor (700)));
-void d3() __attribute__((destructor (600)));
-
-void d1() {
-  if (--i != 0)
-    __builtin_abort ();
-}
-
-void d2() {
-  if (--i != 2)
-    __builtin_abort ();
-}
-
-void d3() {
-  if (j != 2)
-    __builtin_abort ();
-  if (--i != 1)
-    __builtin_abort ();
-}
-
-void cd4() __attribute__((constructor (800), destructor (800)));
-
-void cd4() {
-  if (i != 3)
-    __builtin_abort ();
-  ++j;
-}
-
-int main () {
-  if (i != 3)
-    return 1;
-  if (j != 1)
-    __builtin_abort ();
-  return 0;
-}
+#include "init

[PATCH 2/4] Add C++ testing for 'gcc.dg/initpri1-lto.c': 'c-c++-common/initpri1-lto.c'

2024-06-04 Thread Thomas Schwinge
Similar to TODO
"Consolidate similar C/C++ test cases for 'constructor', 'destructor' function attributes with priority".

gcc/testsuite/
	* gcc.dg/initpri1-lto.c: Integrate this...
	* c-c++-common/initpri1-lto.c: ... here.
---
 gcc/testsuite/{gcc.dg => c-c++-common}/initpri1-lto.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
 rename gcc/testsuite/{gcc.dg => c-c++-common}/initpri1-lto.c (48%)

diff --git a/gcc/testsuite/gcc.dg/initpri1-lto.c b/gcc/testsuite/c-c++-common/initpri1-lto.c
similarity index 48%
rename from gcc/testsuite/gcc.dg/initpri1-lto.c
rename to gcc/testsuite/c-c++-common/initpri1-lto.c
index 0c97cf4b1c9..433ef356c7e 100644
--- a/gcc/testsuite/gcc.dg/initpri1-lto.c
+++ b/gcc/testsuite/c-c++-common/initpri1-lto.c
@@ -1,5 +1,6 @@
 /* { dg-do run { target init_priority } } */
 /* { dg-require-effective-target lto } */
 /* { dg-options "-flto -O3" } */
+/* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
 
-#include "../c-c++-common/initpri1.c"
+#include "initpri1.c"
-- 
2.34.1



[PATCH 1/4] Consolidate similar C/C++ test cases for 'constructor', 'destructor' function attributes with priority

2024-06-04 Thread Thomas Schwinge
gcc/testsuite/
	* gcc.dg/initpri1.c: Integrate this...
	* g++.dg/special/initpri1.C: ..., and this...
	* c-c++-common/initpri1.c: ... here.
	* gcc.dg/initpri1-lto.c: Adjust.
	* gcc.dg/initpri2.c: Integrate this...
	* g++.dg/special/initpri2.C: ..., and this...
	* c-c++-common/initpri2.c: ... here.
---
 .../{gcc.dg => c-c++-common}/initpri1.c   | 21 +++
 .../{gcc.dg => c-c++-common}/initpri2.c   |  1 +
 gcc/testsuite/g++.dg/special/initpri1.C   | 62 ---
 gcc/testsuite/g++.dg/special/initpri2.C   | 39 
 gcc/testsuite/gcc.dg/initpri1-lto.c   |  2 +-
 5 files changed, 12 insertions(+), 113 deletions(-)
 rename gcc/testsuite/{gcc.dg => c-c++-common}/initpri1.c (68%)
 rename gcc/testsuite/{gcc.dg => c-c++-common}/initpri2.c (92%)
 delete mode 100644 gcc/testsuite/g++.dg/special/initpri1.C
 delete mode 100644 gcc/testsuite/g++.dg/special/initpri2.C

diff --git a/gcc/testsuite/gcc.dg/initpri1.c b/gcc/testsuite/c-c++-common/initpri1.c
similarity index 68%
rename from gcc/testsuite/gcc.dg/initpri1.c
rename to gcc/testsuite/c-c++-common/initpri1.c
index b6afd7690de..387f2a39658 100644
--- a/gcc/testsuite/gcc.dg/initpri1.c
+++ b/gcc/testsuite/c-c++-common/initpri1.c
@@ -1,6 +1,5 @@
 /* { dg-do run { target init_priority } } */
-
-extern void abort (void);
+/* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
 
 int i;
 int j;
@@ -11,17 +10,17 @@ void c3() __attribute__((constructor (600)));
 
 void c1() {
   if (i++ != 0)
-    abort ();
+    __builtin_abort ();
 }
 
 void c2() {
   if (i++ != 2)
-    abort ();
+    __builtin_abort ();
 }
 
 void c3() {
   if (i++ != 1)
-    abort ();
+    __builtin_abort ();
 }
 
 void d1() __attribute__((destructor (500)));
@@ -30,26 +29,26 @@ void d3() __attribute__((destructor (600)));
 
 void d1() {
   if (--i != 0)
-    abort ();
+    __builtin_abort ();
 }
 
 void d2() {
   if (--i != 2)
-    abort ();
+    __builtin_abort ();
 }
 
 void d3() {
   if (j != 2)
-    abort ();
+    __builtin_abort ();
   if (--i != 1)
-    abort ();
+    __builtin_abort ();
 }
 
 void cd4() __attribute__((constructor (800), destructor (800)));
 
 void cd4() {
   if (i != 3)
-    abort ();
+    __builtin_abort ();
   ++j;
 }
 
@@ -57,6 +56,6 @@ int main () {
   if (i != 3)
     return 1;
   if (j != 1)
-    abort ();
+    __builtin_abort ();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.dg/initpri2.c b/gcc/testsuite/c-c++-common/initpri2.c
similarity index 92%
rename from gcc/testsuite/gcc.dg/initpri2.c
rename to gcc/testsuite/c-c++-common/initpri2.c
index fa9fda0d7f3..bda2a626c64 100644
--- a/gcc/testsuite/gcc.dg/initpri2.c
+++ b/gcc/testsuite/c-c++-common/initpri2.c
@@ -1,4 +1,5 @@
 /* { dg-do compile { target init_priority } } */
+/* Via the magic string "-std=*++" indicate that testing one (the default) C++ standard is sufficient.  */
 
 /* Priorities must be in the range [0, 65535].  */
 void c1()
diff --git a/gcc/testsuite/g++.dg/special/initpri1.C b/gcc/testsuite/g++.dg/special/initpri1.C
deleted file mode 100644
index bd24961e46b..000
--- a/gcc/testsuite/g++.dg/special/initpri1.C
+++ /dev/null
@@ -1,62 +0,0 @@
-/* { dg-do run { target init_priority } } */
-
-extern "C" void abort ();
-
-int i;
-int j;
-
-void c1() __attribute__((constructor (500)));
-void c2() __attribute__((constructor (700)));
-void c3() __attribute__((constructor (600)));
-
-void c1() {
-  if (i++ != 0)
-abort ();
-}
-
-void c2() {
-  if (i++ != 2)
-abort ();
-}
-
-void c3() {
-  if (i++ != 1)
-abort ();
-}
-
-void d1() __attribute__((destructor (500)));
-void d2() __attribute__((destructor (700)));
-void d3() __attribute__((destructor (600)));
-
-void d1() {
-  if (--i != 0)
-abort ();
-}
-
-void d2() {
-  if (--i != 2)
-abort ();
-}
-
-void d3() {
-  if (j != 2)
-abort ();
-  if (--i != 1)
-abort ();
-}
-
-void cd4() __attribute__((constructor (800), destructor (800)));
-
-void cd4() {
-  if (i != 3)
-abort ();
-  ++j;
-}
-
-int main () {
-  if (i != 3)
-return 1;
-  if (j != 1)
-abort ();
-  return 0;
-}
diff --git a/gcc/testsuite/g++.dg/special/initpri2.C b/gcc/testsuite/g++.dg/special/initpri2.C
deleted file mode 100644
index fa9fda0d7f3..000
--- a/gcc/testsuite/g++.dg/special/initpri2.C
+++ /dev/null
@@ -1,39 +0,0 @@
-/* { dg-do compile { target init_priority } } */
-
-/* Priorities must be in the range [0, 65535].  */
-void c1()
- __attribute__((constructor (-1))); /* { dg-error "priorities" } */
-void c2() 
- __attribute__((constructor (65536))); /* { dg-error "priorities" } */
-void d1() 
- __attribute__((destructor (-1))); /* { dg-error "priorities" } */
-void d2() 
- __attribute__((destructor (65536))); /* { dg-error "priorities" } */
-
-/* Priorities 0-100 are reserved for system libraries.  */
-void c3() 
- __attribute__((constructor (50))); /* { dg-warning "reser

More variants of C/C++ test cases for 'constructor', 'destructor' function attributes with priority

2024-06-04 Thread Thomas Schwinge
Hi!

For my recent work on
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'",
I needed more variants of C/C++ test cases for 'constructor',
'destructor' function attributes with priority: in particular, split into
separate translation units, in combination with internal linkage
variants.  Out of that fell the following four patches.  OK to push?
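For readers less familiar with these attributes, here is a minimal single-file sketch (not one of the proposed test cases) of how priorities order constructors, assuming a GCC-compatible compiler on a target where 'init_priority' is supported. As the existing 'initpri1' tests check, a constructor with a lower priority number runs earlier, destructors run in the reverse order, and priorities 0-100 are reserved for system libraries. The array and function names are made up for illustration.

```c
#include <stddef.h>

/* Illustrative bookkeeping: records the priority of each constructor as it
   runs, so the execution order can be inspected from main.  */
static int order[3];
static size_t n_ran;

/* Declared in shuffled order on purpose: only the priority number decides
   when each constructor runs (lower number runs earlier).  User code
   starts at 101 because 0-100 are reserved.  */
static void ctor_late (void) __attribute__ ((constructor (300)));
static void ctor_early (void) __attribute__ ((constructor (101)));
static void ctor_middle (void) __attribute__ ((constructor (200)));

static void ctor_late (void) { order[n_ran++] = 300; }
static void ctor_early (void) { order[n_ran++] = 101; }
static void ctor_middle (void) { order[n_ran++] = 200; }

/* Destructors with priorities would run in the opposite order at exit:
   the highest priority number first, the lowest last.  */
```

By the time 'main' runs, 'order' holds 101, 200, 300 in that sequence.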

This depends on

"Clarify that 'gcc.dg/initpri3.c' is a LTO variant of 'gcc.dg/initpri1.c': 'gcc.dg/initpri1-lto.c' [PR46083]".


Grüße
 Thomas



Clarify that 'gcc.dg/initpri3.c' is a LTO variant of 'gcc.dg/initpri1.c': 'gcc.dg/initpri1-lto.c' [PR46083] (was: PR lto/46083 (destructor priorities are wrong))

2024-06-04 Thread Thomas Schwinge
Hi!

On 2011-01-10T13:56:06+0100, Richard Guenther  wrote:
> On Sun, 9 Jan 2011, Jan Hubicka wrote:
>> On 2011-01-09T07:24:57-0800, "H.J. Lu"  wrote:
>> > On Sat, Jan 8, 2011 at 5:01 PM, Jan Hubicka  wrote:
>> > > the PR is about testsuite/initpri1.c failing with lto.
>> > >
>> > > I am not sure why the testcase is not run with -flto flags. It is
>> > > declared as /* { dg-do run { target init_priority } } */ and thus I
>> > > would expect all default flags to be cycled over.
>> > 
>> > It is because it isn't in lto nor torture directories.

>> > > The problem is simple - FINI_PRIORITY is not streamed at all.  [...]
>> > 
>> > Can you add a testcase?
>>
>> Copying initpri1.c into lto directory should do the trick then, right?
>> I will give it a try.
>
> Ok with a testcase.

No need for "Copying initpri1.c" if there's '#include "initpri1.c"'.  ;-P
(In preparation for further changes) OK to push the attached
"Clarify that 'gcc.dg/initpri3.c' is a LTO variant of 'gcc.dg/initpri1.c': 'gcc.dg/initpri1-lto.c' [PR46083]"?


Grüße
 Thomas


>From 102c530d32b06e98b3536841b760fc16e9fac7eb Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 24 Apr 2024 10:11:02 +0200
Subject: [PATCH] Clarify that 'gcc.dg/initpri3.c' is a LTO variant of
 'gcc.dg/initpri1.c': 'gcc.dg/initpri1-lto.c' [PR46083]

Added in commit 06c9eb5136fe0e778cc3a643131eba2a3dfb77a8 (Subversion r168642)
"re PR lto/46083 (gcc.dg/initpri1.c FAILs with -flto/-fwhopr (attribute constructor/destructor doesn't work))".

	PR lto/46083
	gcc/testsuite/
	* gcc.dg/initpri3.c: Remove.
	* gcc.dg/initpri1-lto.c: New.
---
 .../gcc.dg/{initpri3.c => initpri1-lto.c} | 61 +--
 1 file changed, 1 insertion(+), 60 deletions(-)
 rename gcc/testsuite/gcc.dg/{initpri3.c => initpri1-lto.c} (12%)

diff --git a/gcc/testsuite/gcc.dg/initpri3.c b/gcc/testsuite/gcc.dg/initpri1-lto.c
similarity index 12%
rename from gcc/testsuite/gcc.dg/initpri3.c
rename to gcc/testsuite/gcc.dg/initpri1-lto.c
index 1633da0141f..98a43c3ff0d 100644
--- a/gcc/testsuite/gcc.dg/initpri3.c
+++ b/gcc/testsuite/gcc.dg/initpri1-lto.c
@@ -2,63 +2,4 @@
 /* { dg-require-effective-target lto } */
 /* { dg-options "-flto -O3" } */
 
-extern void abort ();
-
-int i;
-int j;
-
-void c1() __attribute__((constructor (500)));
-void c2() __attribute__((constructor (700)));
-void c3() __attribute__((constructor (600)));
-
-void c1() {
-  if (i++ != 0)
-abort ();
-}
-
-void c2() {
-  if (i++ != 2)
-abort ();
-}
-
-void c3() {
-  if (i++ != 1)
-abort ();
-}
-
-void d1() __attribute__((destructor (500)));
-void d2() __attribute__((destructor (700)));
-void d3() __attribute__((destructor (600)));
-
-void d1() {
-  if (--i != 0)
-abort ();
-}
-
-void d2() {
-  if (--i != 2)
-abort ();
-}
-
-void d3() {
-  if (j != 2)
-abort ();
-  if (--i != 1)
-abort ();
-}
-
-void cd4() __attribute__((constructor (800), destructor (800)));
-
-void cd4() {
-  if (i != 3)
-abort ();
-  ++j;
-}
-
-int main () {
-  if (i != 3)
-return 1;
-  if (j != 1)
-abort ();
-  return 0;
-}
+#include "initpri1.c"
-- 
2.34.1



nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable [PR97384, PR105274]

2024-06-04 Thread Thomas Schwinge
Hi!

Any comments before I push to trunk branch the attached
"nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable [PR97384, PR105274]"?

While this happens to implement some baseline work for the PRs indicated,
my original need for this is in upcoming libgomp Fortran test cases
(where I can't easily call 'cuCtxSetLimit(CU_LIMIT_STACK_SIZE, [bytes])'
in the test cases themselves).
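A standalone sketch of the kind of validation applied to the variable's value, assuming only the C standard library. The helper name 'parse_stack_size' is hypothetical and this is not the libgomp code. Unlike the patch as posted, it clears 'errno' before calling 'strtoul': no library function ever resets 'errno' to zero, so a stale 'ERANGE' or 'EINVAL' left over from an earlier call could otherwise trigger a spurious "Error parsing" diagnostic. It also requires a leading digit, since 'strtoul' silently accepts leading whitespace and a '-' sign (which negates the result).

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not part of libgomp): parse a decimal byte count,
   such as the value of GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE.
   On success, store it in *OUT and return true; otherwise return false
   and leave *OUT untouched.  */
static bool
parse_stack_size (const char *s, unsigned *out)
{
  /* Reject NULL, the empty string, leading whitespace and signs.  */
  if (s == NULL || !isdigit ((unsigned char) s[0]))
    return false;

  char *endptr;
  errno = 0;  /* strtoul only ever sets errno, so clear it first.  */
  unsigned long val = strtoul (s, &endptr, 10);

  /* Reject overflow, trailing garbage, and values that don't fit.  */
  if (errno == ERANGE || *endptr != '\0' || val > UINT_MAX)
    return false;

  *out = (unsigned) val;
  return true;
}
```

For instance, "65536" is accepted and yields 65536, whereas "12k", "-1" and " 12" are rejected.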


Grüße
 Thomas


>From d32f1a6a73b767ab5cf2da502fc88975612b80f2 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 31 May 2024 17:04:39 +0200
Subject: [PATCH] nvptx offloading: 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'
 environment variable [PR97384, PR105274]

... as a means to manually set the "native" GPU thread stack size.

	PR libgomp/97384
	PR libgomp/105274
	libgomp/
	* plugin/cuda-lib.def (cuCtxSetLimit): Add.
	* plugin/plugin-nvptx.c (nvptx_open_device): Handle
	'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE' environment variable.
---
 libgomp/plugin/cuda-lib.def   |  1 +
 libgomp/plugin/plugin-nvptx.c | 45 +++
 2 files changed, 46 insertions(+)

diff --git a/libgomp/plugin/cuda-lib.def b/libgomp/plugin/cuda-lib.def
index 007c6e0f4df..9255c1cff68 100644
--- a/libgomp/plugin/cuda-lib.def
+++ b/libgomp/plugin/cuda-lib.def
@@ -4,6 +4,7 @@ CUDA_ONE_CALL (cuCtxGetCurrent)
 CUDA_ONE_CALL (cuCtxGetDevice)
 CUDA_ONE_CALL (cuCtxPopCurrent)
 CUDA_ONE_CALL (cuCtxPushCurrent)
+CUDA_ONE_CALL (cuCtxSetLimit)
 CUDA_ONE_CALL (cuCtxSynchronize)
 CUDA_ONE_CALL (cuDeviceGet)
 CUDA_ONE_CALL (cuDeviceGetAttribute)
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index a4a050521b4..e722ee2b400 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -150,6 +150,8 @@ init_cuda_lib (void)
 
 #include "secure_getenv.h"
 
+static void notify_var (const char *, const char *);
+
 #undef MIN
 #undef MAX
 #define MIN(X,Y) ((X) < (Y) ? (X) : (Y))
@@ -341,6 +343,9 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+/* "Native" GPU thread stack size.  */
+static unsigned native_gpu_thread_stack_size = 0;
+
 /* OpenMP kernels reserve a small amount of ".shared" space for use by
omp_alloc.  The size is configured using GOMP_NVPTX_LOWLAT_POOL, but the
default is set here.  */
@@ -550,6 +555,46 @@ nvptx_open_device (int n)
   ptx_dev->free_blocks = NULL;
   pthread_mutex_init (&ptx_dev->free_blocks_lock, NULL);
 
+  /* "Native" GPU thread stack size.  */
+  {
+/* This is intentionally undocumented, until we work out a proper, common
+   scheme (as much as makes sense) between all offload plugins as well
+   as between nvptx offloading use of "native" stacks for OpenACC vs.
+   OpenMP "soft stacks" vs. OpenMP '-msoft-stack-reserve-local=[...]'.
+
+   GCN offloading has a 'GCN_STACK_SIZE' environment variable (without
+   'GOMP_' prefix): documented; presumably used for all things OpenACC and
+   OpenMP?  Based on GCN command-line option '-mstack-size=[...]' (marked
+   "obsolete"), that one may be set via a GCN 'mkoffload'-synthesized
+   'constructor' function.  */
+const char *var_name = "GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE";
+const char *env_var = secure_getenv (var_name);
+notify_var (var_name, env_var);
+
+if (env_var != NULL)
+  {
+	char *endptr;
+	unsigned long val = strtoul (env_var, &endptr, 10);
+	if (endptr == NULL || *endptr != '\0'
+	|| errno == ERANGE || errno == EINVAL
+	|| val > UINT_MAX)
+	  GOMP_PLUGIN_error ("Error parsing %s", var_name);
+	else
+	  native_gpu_thread_stack_size = val;
+  }
+  }
+  if (native_gpu_thread_stack_size == 0)
+; /* Zero means use default.  */
+  else
+{
+  GOMP_PLUGIN_debug (0, "Setting \"native\" GPU thread stack size"
+			 " ('CU_LIMIT_STACK_SIZE') to %u bytes\n",
+			 native_gpu_thread_stack_size);
+  CUDA_CALL (cuCtxSetLimit,
+		 CU_LIMIT_STACK_SIZE, (size_t) native_gpu_thread_stack_size);
+}
+
+  /* OpenMP "soft stacks".  */
   ptx_dev->omp_stacks.ptr = 0;
   ptx_dev->omp_stacks.size = 0;
   pthread_mutex_init (&ptx_dev->omp_stacks.lock, NULL);
-- 
2.34.1



Re: [patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-06-03 Thread Thomas Schwinge
Hi!

On 2024-04-25T16:07:53+0100, Andrew Stubbs  wrote:
> On 25/04/2024 11:51, Tobias Burnus wrote:
>> Motivated by a surprise of a colleague that with -m32,
>> no offload dumps were created; that's because mkoffload
>> does not process host binaries when they are 32bit (i.e. ilp32).
>> 
>> Internally, that is done as follows: The host compiler passes to
>> 'mkoffload' the used host ABI, i.e. -foffload-abi=ilp32 or -foffload-abi=lp64
>> 
>> That's done via TARGET_OFFLOAD_OPTIONS, which is supported by aarch64, i386, 
>> and rs6000.
>> 
>> While it is sensible (albeit not strictly required) that GCC requires that
>> the host and device side agree and that only 64bit is implemented for the
>> device side, it can be confusing that silently no offloading code is 
>> generated.
>> 
>> 
>> Hence, I propose to print a warning in that case - as implemented in the 
>> attached patch:
>> 
>> $ gcc -fopenmp -m32 test.c
>> nvptx mkoffload: warning: offload code generation skipped: offloading with 32-bit host code is currently not supported
>> gcn mkoffload: warning: offload code generation skipped: offloading with 32-bit host code is currently not supported
>> 
>> * * *
>> 
>> This shouldn't have any effect on offload builds using -m64
>> and non-offload builds – while several testcases already have
>> issues with '-m32' when offloading is enabled or an offloading
>> device is available.
>> 
>> To make it not worse, this patch adds some pruning and for
>> a subset of the failing testcases, I added code to avoid FAILs.
>> There are some more fails, but those aren't new.
>> 
>> Comments, remarks, suggestions?

For "continuity", please reference "PR libgomp/65099" in the Git commit
log.

>> Is the mkoffload.cc part is okay?
>
> The mkoffload part looks reasonable to me. I'm not sure if there are 
> other ABIs we might want to warn about

Right, let's please generalize this -- warn for all cases that we don't
support.  That is, instead of:

if (offload_abi == OFFLOAD_ABI_ILP32)
  [warning]
else if (offload_abi == OFFLOAD_ABI_LP64)
  {
[...]

..., use:

if (offload_abi != OFFLOAD_ABI_LP64)
  [warning]
else
  {
[...]

For the 'warning' diagnostic, you may use 'const char *abi' (as currently
present in the GCN 'mkoffload'; similarly adapt into the nvptx one).
Maybe:

warning (0, "offload code generation skipped: currently not supported for %qs", abi);
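To make the suggested control flow concrete, a self-contained sketch: the enum below is a local stand-in assumed to mirror GCC's 'offload_abi' enumerators, and 'fprintf' substitutes for GCC's 'warning' diagnostic, so neither is the real 'mkoffload' code. Testing for "not LP64" rather than "is ILP32" means that 'OFFLOAD_ABI_UNSET', or any ABI added later, would also be reported instead of silently skipped.

```c
#include <stdbool.h>
#include <stdio.h>

/* Local stand-in, assumed to mirror GCC's 'enum offload_abi'.  */
enum offload_abi { OFFLOAD_ABI_UNSET, OFFLOAD_ABI_LP64, OFFLOAD_ABI_ILP32 };

/* Hypothetical helper: return true if offload code should be generated
   for host ABI ABI (named ABI_NAME in the diagnostic); otherwise print a
   warning and return false.  Only LP64 is supported, so every other
   value, including ones not listed today, takes the warning path.  */
static bool
offload_abi_supported (enum offload_abi abi, const char *abi_name)
{
  if (abi != OFFLOAD_ABI_LP64)
    {
      fprintf (stderr, "mkoffload: warning: offload code generation skipped:"
	       " currently not supported for '%s'\n", abi_name);
      return false;
    }
  return true;
}
```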

Similarly then adjust 'libgomp-dg-prune', and in the test cases, use
'target lp64' (untested) instead of 'target { ! ia32 }', to match what
we're doing in the 'mkoffload's, and also correspondingly adjust the
"Skip for ia32 [...]" comments.

Instead of:

- { dg-note {requires-7-aux\.c' has 'unified_address'} {} { xfail *-*-* } 0 }
+ { dg-note {requires-7-aux\.c' has 'unified_address'} {} { xfail { *-*-* } } 0 }

..., I think you're looking for (untested):

- { dg-note {requires-7-aux\.c' has 'unified_address'} {} { xfail *-*-* } 0 }
+ { dg-note {requires-7-aux\.c' has 'unified_address'} {} { target lp64 xfail *-*-* } 0 }

(I've not generally reviewed necessary test suite changes.)

> but this is definitely an 
> improvement.

Right, thanks!


A follow-up change could then make sure that 'offload_target_amdgcn' etc.
only return true if offloading compilation actually is supported given
the current command-line options.  Either again via a
'check_effective_target_lp64' check in
'libgomp_check_effective_target_offload_target', or -- better? -- change
the driver to only conditionally print 'OFFLOAD_TARGET_NAMES=[...]'?
Does that basically mean to move the 'offload_abi' checks from the
'mkoffload's into the driver?  That appears to make sense indeed.
(..., so that we don't even invoke the 'mkoffload's for unsupported
configurations/options.)


Grüße
 Thomas


nvptx target: Global constructor, destructor support, via nvptx-tools 'ld' (was: nvptx: Support global constructors/destructors via 'collect2')

2024-05-31 Thread Thomas Schwinge
Hi!

On 2022-12-02T14:35:35+0100, I wrote:
> On 2022-12-01T22:13:38+0100, I wrote:
>> I'm working on support for global constructors/destructors with
>> GCC/nvptx
>
> See "nvptx: Support global constructors/destructors via 'collect2'"
> attached; [...]
>
> Per my quick scanning of 'gcc/config.gcc' history, for more than two
> decades, there was a clear trend to remove 'use_collect2=yes'
> configurations; now finally a new one is being added -- making sure we're
> not slowly dispensing with the need for the early 1990s piece of work
> that 'gcc/collect2*' is...  ;'-P

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is
<https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
"ld: Global constructor/destructor support".

Thus, this:

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2783,6 +2783,7 @@ nvptx-*)
>   tm_file="${tm_file} newlib-stdint.h"
>   use_gcc_stdint=wrap
>   tmake_file="nvptx/t-nvptx"
> + use_collect2=yes
>   if test x$enable_as_accelerator = xyes; then
>   extra_programs="${extra_programs} mkoffload\$(exeext)"
>   tm_file="${tm_file} nvptx/offload.h"

... now is gone again.  ;'-)

Pushed to trunk branch commit d9c90c82d900fdae95df4499bf5f0a4ecb903b53
"nvptx target: Global constructor, destructor support, via nvptx-tools 'ld'",
see attached.
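The ChangeLog of the attached commit adds a weak function '__gbl_ctors' to 'libgcc/config/nvptx/crt0.c' and has '__main' invoke it. As a generic illustration of how such an optional hook can be wired up (not the actual crt0.c code; all names below are hypothetical), a weak reference lets startup code call the hook only if an object file defining it was linked in, assuming an ELF-style toolchain that resolves undefined weak symbols to address zero:

```c
#include <stddef.h>

/* Weak reference: if no object file defines this function, the link still
   succeeds and its address compares equal to NULL.  */
extern void hypothetical_gbl_ctors (void) __attribute__ ((weak));

/* Hypothetical startup routine in the spirit of crt0's '__main': run the
   global-constructor hook only when it was linked in.
   Returns 1 if the hook ran, 0 otherwise.  */
int
run_startup_hooks (void)
{
  if (hypothetical_gbl_ctors != NULL)
    {
      hypothetical_gbl_ctors ();
      return 1;
    }
  return 0;
}
```

The design keeps the constructor machinery out of programs that don't need it: only when the defining object is pulled in does the hook become non-NULL.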

(Support for nvptx offloading, enablement of full libgfortran for nvptx,
and corresponding documentation updates, etc. are to follow as separate
commits.)


Compared to the 2022 'collect2' version, this 'ld' version also
happens to avoid one class of FAILs:

[-FAIL:-]{+PASS:+} gfortran.dg/implicit_class_1.f90   -O0  (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} gfortran.dg/implicit_class_1.f90   -O0  [-compilation failed to produce executable-]{+execution test+}
[...]

That was due to:

Executing on host: [gfortran] [...] [...]/gfortran.dg/implicit_class_1.f90 [...] -fdump-fortran-original [...]
[...]
cc1: error: unrecognized command-line option '-fdump-fortran-original'; did you mean '-fdump-tree-original'?
collect2: fatal error: gcc returned 1 exit status
compilation terminated.
compiler exited with status 1
FAIL: gfortran.dg/implicit_class_1.f90   -O0  (test for excess errors)

That is, the 'gcc' invocation by 'collect2' is passed
'-fdump-fortran-original', but doesn't know what to do with that.  (Maybe
using '-Wno-complain-wrong-lang' in 'collect2' would help?)  (I'm not
going to look into that any further.)


Grüße
 Thomas


>From d9c90c82d900fdae95df4499bf5f0a4ecb903b53 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 28 May 2024 23:20:29 +0200
Subject: [PATCH] nvptx target: Global constructor, destructor support, via
 nvptx-tools 'ld'

The function attributes 'constructor', 'destructor', and 'init_priority' now
work, as do the C++ features making use of this.  Test cases with effective
target 'global_constructor' and 'init_priority' now generally work, and
'check-gcc-c++' test results greatly improve; no more
"sorry, unimplemented: global constructors not supported on this target".

For proper execution test results, this depends on
<https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
"ld: Global constructor/destructor support".

	gcc/
	* config/nvptx/nvptx.h: Configure global constructor, destructor
	support.
	gcc/testsuite/
	* gcc.dg/no_profile_instrument_function-attr-1.c: GCC/nvptx is
	'NO_DOT_IN_LABEL' but not 'NO_DOLLAR_IN_LABEL', so '$' may appear
	in identifiers.
	* lib/target-supports.exp
	(check_effective_target_global_constructor): Enable for nvptx.
	libgcc/
	* config/nvptx/crt0.c (__gbl_ctors): New weak function.
	(__main): Invoke it.
	* config/nvptx/gbl-ctors.c: New.
	* config/nvptx/t-nvptx: Configure global constructor, destructor
	support.
---
 gcc/config/nvptx/nvptx.h  | 14 +++-
 .../no_profile_instrument_function-attr-1.c   |  2 +-
 gcc/testsuite/lib/target-supports.exp |  3 +-
 libgcc/config/nvptx/crt0.c| 12 +++
 libgcc/config/nvptx/gbl-ctors.c   | 74 +++
 libgcc/config/nvptx/t-nvptx   |  9 ++-
 6 files changed, 109 insertions(+), 5 deletions(-)
 create mode 100644 libg

Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768] (was: [PATCH] rtlanal: Fix set_noop_p for volatile loads or stores [PR114768])

2024-04-19 Thread Thomas Schwinge
Hi!

On 2024-04-19T12:30:25+0200, Jakub Jelinek  wrote:
> On Fri, Apr 19, 2024 at 12:23:03PM +0200, Thomas Schwinge wrote:
>> On 2024-04-19T08:24:03+0200, Jakub Jelinek  wrote:
>> > --- gcc/testsuite/gcc.dg/pr114768.c.jj	2024-04-18 15:37:49.139433678 +0200
>> > +++ gcc/testsuite/gcc.dg/pr114768.c	2024-04-18 15:43:30.389730365 +0200
>> > @@ -0,0 +1,10 @@
>> > +/* PR rtl-optimization/114768 */
>> > +/* { dg-do compile } */
>> > +/* { dg-options "-O2 -fdump-rtl-final" } */
>> > +/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" { target { ! { nvptx*-*-* } } } } } */
>> > +
>> > +void
>> > +foo (int *p)
>> > +{
>> > +  *p = *(volatile int *) p;
>> > +}
>> 
>> Why exclude nvptx target here?  As far as I can see, it does behave in
>> the exactly same way as expected; see 'diff' of before vs. after the
>> 'gcc/rtlanal.cc' code changes:
>
> I wasn't sure if the non-RA targets (for which we don't have an effective
> target) even have final dump.
> If they do as you show, then guess the target guard can go.

ACK.  Pushed to trunk branch in
commit 9451b6c0a941dc44ca6f14ff8565d74fe56cca59
"Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768]", see attached.


Grüße
 Thomas


>From 9451b6c0a941dc44ca6f14ff8565d74fe56cca59 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Fri, 19 Apr 2024 12:32:03 +0200
Subject: [PATCH] Enable 'gcc.dg/pr114768.c' for nvptx target [PR114768]

Follow-up to commit 9f295847a9c32081bdd0fe908ffba58e830a24fb
"rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]": nvptx does
behave in exactly the same way as expected; see 'diff' of before vs. after the
'gcc/rtlanal.cc' code changes:

PASS: gcc.dg/pr114768.c (test for excess errors)
[-FAIL:-]{+PASS:+} gcc.dg/pr114768.c scan-rtl-dump final "\\(mem/v:"

--- 0/pr114768.c.347r.final	2024-04-19 11:34:34.577037596 +0200
+++ ./pr114768.c.347r.final	2024-04-19 12:08:00.118312524 +0200
@@ -13,15 +13,27 @@
 ;;  entry block defs 	 1 [%stack] 2 [%frame] 3 [%args]
 ;;  exit block uses 	 1 [%stack] 2 [%frame]
 ;;  regs ever live
-;;  ref usage 	r1={1d,2u} r2={1d,2u} r3={1d,1u}
-;;total ref usage 8{3d,5u,0e} in 1{1 regular + 0 call} insns.
+;;  ref usage 	r1={1d,3u} r2={1d,3u} r3={1d,2u} r22={1d,1u} r23={1d,2u}
+;;total ref usage 16{5d,11u,0e} in 4{4 regular + 0 call} insns.
 (note 1 0 4 NOTE_INSN_DELETED)
 (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
-(note 2 4 3 2 NOTE_INSN_DELETED)
+(insn 2 4 3 2 (set (reg/v/f:DI 23 [ p ])
+(unspec:DI [
+(const_int 0 [0])
+] UNSPEC_ARG_REG)) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":8:1 14 {load_arg_regdi}
+ (nil))
 (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
-(note 6 3 10 2 NOTE_INSN_DELETED)
-(note 10 6 11 2 NOTE_INSN_EPILOGUE_BEG)
-(jump_insn 11 10 12 2 (return) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":10:1 289 {return}
+(insn 6 3 7 2 (set (reg:SI 22 [ _1 ])
+(mem/v:SI (reg/v/f:DI 23 [ p ]) [1 MEM[(volatile int *)p_3(D)]+0 S4 A32])) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":9:8 6 {*movsi_insn}
+ (nil))
+(insn 7 6 10 2 (set (mem:SI (reg/v/f:DI 23 [ p ]) [1 *p_3(D)+0 S4 A32])
+(reg:SI 22 [ _1 ])) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":9:6 6 {*movsi_insn}
+ (expr_list:REG_DEAD (reg/v/f:DI 23 [ p ])
+(expr_list:REG_DEAD (reg:SI 22 [ _1 ])
+(nil
+(note 10 7 13 2 NOTE_INSN_EPILOGUE_BEG)
+(note 13 10 11 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
+(jump_insn 11 13 12 3 (return) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":10:1 289 {return}
	  (nil)
  -> return)
 (barrier 12 11 0)

--- 0/pr114768.s	2024-04-19 11:34:34.577037596 +0200
+++ ./pr114768.s	2024-04-19 12:08:00.118312524 +0200
@@ -13,5 +13,10 @@
 {
	.reg.u64 %ar0;
	ld.param.u64 %ar0, [%in_ar0];
+	.reg.u32 %r22;
+	.reg.u64 %r23;
+		mov.u64	%r23, %ar0;
+		ld.u32	%r22, [%r23];
+		st.u32	[%r23], %r22;
	ret;
 }

	PR testsuite/114768
	gcc/testsuite/
	* gcc.dg/pr114768.c: Enable for nvptx target.
---
 gcc/testsuite/gcc.dg/pr114768.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/pr114768.c b/gcc/testsuite/gcc.dg/pr114768.c
index 2075f0d6b82..ffe3b368638 100644
--- a/gcc/testsuite/gcc.dg/pr114768.c
+++ b/gcc/testsuite/gcc.dg/pr114768.c
@@ -1,7 +1,7 @@
 /* PR rtl-optimization/114768 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-final" } */
-/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" { target { ! { nvptx*-*-* } } } } } */
+/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" } } */
 
 void
 foo (int *p)
-- 
2.34.1



Re: [PATCH] rtlanal: Fix set_noop_p for volatile loads or stores [PR114768]

2024-04-19 Thread Thomas Schwinge
Hi Jakub!

On 2024-04-19T08:24:03+0200, Jakub Jelinek  wrote:
> --- gcc/testsuite/gcc.dg/pr114768.c.jj	2024-04-18 15:37:49.139433678 +0200
> +++ gcc/testsuite/gcc.dg/pr114768.c   2024-04-18 15:43:30.389730365 +0200
> @@ -0,0 +1,10 @@
> +/* PR rtl-optimization/114768 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-final" } */
> +/* { dg-final { scan-rtl-dump "\\\(mem/v:" "final" { target { ! { nvptx*-*-* } } } } } */
> +
> +void
> +foo (int *p)
> +{
> +  *p = *(volatile int *) p;
> +}

Why exclude nvptx target here?  As far as I can see, it does behave in
exactly the same way as expected; see 'diff' of before vs. after the
'gcc/rtlanal.cc' code changes:

PASS: gcc.dg/pr114768.c (test for excess errors)
[-FAIL:-]{+PASS:+} gcc.dg/pr114768.c scan-rtl-dump final "\\(mem/v:"

--- 0/pr114768.c.347r.final 2024-04-19 11:34:34.577037596 +0200
+++ ./pr114768.c.347r.final 2024-04-19 12:08:00.118312524 +0200
@@ -13,15 +13,27 @@
 ;;  entry block defs1 [%stack] 2 [%frame] 3 [%args]
 ;;  exit block uses 1 [%stack] 2 [%frame]
 ;;  regs ever live 
-;;  ref usage  r1={1d,2u} r2={1d,2u} r3={1d,1u} 
-;;total ref usage 8{3d,5u,0e} in 1{1 regular + 0 call} insns.
+;;  ref usage  r1={1d,3u} r2={1d,3u} r3={1d,2u} r22={1d,1u} r23={1d,2u} 
+;;total ref usage 16{5d,11u,0e} in 4{4 regular + 0 call} insns.
 (note 1 0 4 NOTE_INSN_DELETED)
 (note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
-(note 2 4 3 2 NOTE_INSN_DELETED)
+(insn 2 4 3 2 (set (reg/v/f:DI 23 [ p ])
+(unspec:DI [
+(const_int 0 [0])
+] UNSPEC_ARG_REG)) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":8:1 14 {load_arg_regdi}
+ (nil))
 (note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
-(note 6 3 10 2 NOTE_INSN_DELETED)
-(note 10 6 11 2 NOTE_INSN_EPILOGUE_BEG)
-(jump_insn 11 10 12 2 (return) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":10:1 289 {return}
+(insn 6 3 7 2 (set (reg:SI 22 [ _1 ])
+(mem/v:SI (reg/v/f:DI 23 [ p ]) [1 MEM[(volatile int *)p_3(D)]+0 S4 A32])) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":9:8 6 {*movsi_insn}
+ (nil))
+(insn 7 6 10 2 (set (mem:SI (reg/v/f:DI 23 [ p ]) [1 *p_3(D)+0 S4 A32])
+(reg:SI 22 [ _1 ])) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":9:6 6 {*movsi_insn}
+ (expr_list:REG_DEAD (reg/v/f:DI 23 [ p ])
+(expr_list:REG_DEAD (reg:SI 22 [ _1 ])
+(nil
+(note 10 7 13 2 NOTE_INSN_EPILOGUE_BEG)
+(note 13 10 11 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
+(jump_insn 11 13 12 3 (return) "source-gcc/gcc/testsuite/gcc.dg/pr114768.c":10:1 289 {return}
  (nil)
  -> return)
 (barrier 12 11 0)

--- 0/pr114768.s	2024-04-19 11:34:34.577037596 +0200
+++ ./pr114768.s	2024-04-19 12:08:00.118312524 +0200
@@ -13,5 +13,10 @@
 {
.reg.u64 %ar0;
ld.param.u64 %ar0, [%in_ar0];
+   .reg.u32 %r22;
+   .reg.u64 %r23;
+   mov.u64 %r23, %ar0;
+   ld.u32  %r22, [%r23];
+   st.u32  [%r23], %r22;
ret;
 }


Grüße
 Thomas


GCN: Enable effective-target 'vect_long_long'

2024-04-16 Thread Thomas Schwinge
Hi!

OK to push the attached "GCN: Enable effective-target 'vect_long_long'"?
(Or is that not what you'd expect to see for GCN?  I haven't checked the
actual back end code...)


Grüße
 Thomas


>From d74cc9caadfe36652503782a8da172ae1975915c Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 16 Apr 2024 14:10:15 +0200
Subject: [PATCH] GCN: Enable effective-target 'vect_long_long'

... as made apparent by a number of unexpectedly UNSUPPORTED test cases, which
now all turn into PASS, with just one exception:

PASS: gcc.dg/vect/vect-early-break_124-pr114403.c (test for excess errors)
PASS: gcc.dg/vect/vect-early-break_124-pr114403.c execution test
FAIL: gcc.dg/vect/vect-early-break_124-pr114403.c scan-tree-dump vect "LOOP VECTORIZED"

..., which needs to be looked into, separately.

	gcc/testsuite/
	* lib/target-supports.exp (check_effective_target_vect_long_long):
	Enable for GCN.
---
 gcc/testsuite/lib/target-supports.exp | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 594837653bb..1a8459561c6 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -7692,7 +7692,8 @@ proc check_effective_target_vect_long_long { } {
 	 || ([istarget riscv*-*-*]
 		 && [check_effective_target_riscv_v])
 	 || ([istarget loongarch*-*-*]
-		 && [check_effective_target_loongarch_sx])}}]
+		 && [check_effective_target_loongarch_sx])
+	 || [istarget amdgcn-*-*] }}]
 }
 
 
-- 
2.34.1



build: Use of cargo not yet supported here in Canadian cross configurations (was: [PATCH] build: Check for cargo when building rust language)

2024-04-15 Thread Thomas Schwinge
Hi!

On 2024-04-15T13:14:42+0200, I wrote:
> On 2024-04-08T18:33:38+0200, pierre-emmanuel.pa...@embecosm.com wrote:
>> The rust frontend requires cargo to build some of it's components,
>
> In GCC upstream still: 's%requires%is going to require'.  ;-)
>
>> it's presence was not checked during configuration.
>
> After confirming the desired semantics/diagnostics, I've now pushed this
> to trunk branch in commit 3e1e73fc99584440e5967577f2049573eeaf4596
> "build: Check for cargo when building rust language".

On top of that, OK to push the attached
"build: Use of cargo not yet supported here in Canadian cross configurations"?


Grüße
 Thomas


>From eb38990b4147951dd21f19def43072368f782af5 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 15 Apr 2024 14:27:45 +0200
Subject: [PATCH] build: Use of cargo not yet supported here in Canadian cross
 configurations

..., until <https://github.com/Rust-GCC/gccrs/issues/2898>
"'cargo' should build for the host system" is resolved.

Follow-up to commit 3e1e73fc99584440e5967577f2049573eeaf4596
"build: Check for cargo when building rust language".

	* configure.ac (have_cargo): Force to "no" in Canadian cross
	configurations.
	* configure: Regenerate.
---
 configure| 13 +
 configure.ac | 12 
 2 files changed, 25 insertions(+)

diff --git a/configure b/configure
index e254aa132b5..e59a870b2bd 100755
--- a/configure
+++ b/configure
@@ -9179,6 +9179,19 @@ $as_echo "$as_me: WARNING: --enable-host-shared required to build $language" >&2
   ;;
 esac
 
+# Pre-conditions to consider whether cargo being supported.
+if test x"$have_cargo" = xyes \
+  && test x"$build" != x"$host"; then
+  # Until <https://github.com/Rust-GCC/gccrs/issues/2898>
+  # "'cargo' should build for the host system" is resolved:
+  { $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: use of cargo not yet supported here in Canadian cross configurations" >&5
+$as_echo "$as_me: WARNING: use of cargo not yet supported here in Canadian cross configurations" >&2;}
+  have_cargo=no
+else
+  # Assume that cargo-produced object files are compatible with what
+  # we're going to build here.
+  :
+fi
 # Disable Rust if cargo is unavailable.
 case ${add_this_lang}:${language}:${have_cargo} in
   yes:rust:no)
diff --git a/configure.ac b/configure.ac
index 87205d0ac1f..4ab54431475 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2306,6 +2306,18 @@ directories, to avoid imposing the performance cost of
   ;;
 esac
 
+# Pre-conditions to consider whether cargo being supported.
+if test x"$have_cargo" = xyes \
+  && test x"$build" != x"$host"; then
+  # Until <https://github.com/Rust-GCC/gccrs/issues/2898>
+  # "'cargo' should build for the host system" is resolved:
+  AC_MSG_WARN([use of cargo not yet supported here in Canadian cross configurations])
+  have_cargo=no
+else
+  # Assume that cargo-produced object files are compatible with what
+  # we're going to build here.
+  :
+fi
 # Disable Rust if cargo is unavailable.
 case ${add_this_lang}:${language}:${have_cargo} in
   yes:rust:no)
-- 
2.34.1



build: Don't check for host-prefixed 'cargo' program (was: [PATCH] build: Check for cargo when building rust language)

2024-04-15 Thread Thomas Schwinge
Hi!

On 2024-04-15T13:14:42+0200, I wrote:
> On 2024-04-08T18:33:38+0200, pierre-emmanuel.pa...@embecosm.com wrote:
>> The rust frontend requires cargo to build some of it's components,
>
> In GCC upstream still: 's%requires%is going to require'.  ;-)
>
>> it's presence was not checked during configuration.
>
> After confirming the desired semantics/diagnostics, I've now pushed this
> to trunk branch in commit 3e1e73fc99584440e5967577f2049573eeaf4596
> "build: Check for cargo when building rust language".
>
>
> I now wonder: instead of 'AC_CHECK_TOOL', shouldn't this use
> 'AC_CHECK_PROG'?  (We always want plain 'cargo', not host-prefixed
> 'aarch64-linux-gnu-cargo' etc., right?)  I'll look into changing this.

OK to push "build: Don't check for host-prefixed 'cargo' program", see
attached?


Grüße
 Thomas


>From 913be0412665d02561f8aeb999860ce8d292c61e Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Mon, 15 Apr 2024 13:33:48 +0200
Subject: [PATCH] build: Don't check for host-prefixed 'cargo' program

Follow-up to commit 3e1e73fc99584440e5967577f2049573eeaf4596
"build: Check for cargo when building rust language":

On 2024-04-15T13:14:42+0200, I wrote:
> I now wonder: instead of 'AC_CHECK_TOOL', shouldn't this use
> 'AC_CHECK_PROG'?  (We always want plain 'cargo', not host-prefixed
> 'aarch64-linux-gnu-cargo' etc., right?)  I'll look into changing this.

	* configure: Regenerate.
	config/
	* acx.m4 (ACX_PROG_CARGO): Use 'AC_CHECK_PROGS'.
---
 config/acx.m4 |  3 +--
 configure | 64 ++-
 2 files changed, 8 insertions(+), 59 deletions(-)

diff --git a/config/acx.m4 b/config/acx.m4
index 3c5fe67342e..c45e55e7f51 100644
--- a/config/acx.m4
+++ b/config/acx.m4
@@ -427,8 +427,7 @@ fi
 # Test for Rust
 # We require cargo and rustc for some parts of the rust compiler.
 AC_DEFUN([ACX_PROG_CARGO],
-[AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
-AC_CHECK_TOOL(CARGO, cargo, no)
+[AC_CHECK_PROGS(CARGO, cargo, no)
 if test "x$CARGO" != xno; then
   have_cargo=yes
 else
diff --git a/configure b/configure
index dd96445ac4a..e254aa132b5 100755
--- a/configure
+++ b/configure
@@ -5818,10 +5818,10 @@ else
   have_gdc=no
 fi
 
-
-if test -n "$ac_tool_prefix"; then
-  # Extract the first word of "${ac_tool_prefix}cargo", so it can be a program name with args.
-set dummy ${ac_tool_prefix}cargo; ac_word=$2
+for ac_prog in cargo
+do
+  # Extract the first word of "$ac_prog", so it can be a program name with args.
+set dummy $ac_prog; ac_word=$2
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
 $as_echo_n "checking for $ac_word... " >&6; }
 if ${ac_cv_prog_CARGO+:} false; then :
@@ -5837,7 +5837,7 @@ do
   test -z "$as_dir" && as_dir=.
 for ac_exec_ext in '' $ac_executable_extensions; do
   if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
-ac_cv_prog_CARGO="${ac_tool_prefix}cargo"
+ac_cv_prog_CARGO="$ac_prog"
 $as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
 break 2
   fi
@@ -5857,59 +5857,9 @@ $as_echo "no" >&6; }
 fi
 
 
-fi
-if test -z "$ac_cv_prog_CARGO"; then
-  ac_ct_CARGO=$CARGO
-  # Extract the first word of "cargo", so it can be a program name with args.
-set dummy cargo; ac_word=$2
-{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
-$as_echo_n "checking for $ac_word... " >&6; }
-if ${ac_cv_prog_ac_ct_CARGO+:} false; then :
-  $as_echo_n "(cached) " >&6
-else
-  if test -n "$ac_ct_CARGO"; then
-  ac_cv_prog_ac_ct_CARGO="$ac_ct_CARGO" # Let the user override the test.
-else
-as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
-for as_dir in $PATH
-do
-  IFS=$as_save_IFS
-  test -z "$as_dir" && as_dir=.
-for ac_exec_ext in '' $ac_executable_extensions; do
-  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
-ac_cv_prog_ac_ct_CARGO="cargo"
-$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
-break 2
-  fi
+  test -n "$CARGO" && break
 done
-  done
-IFS=$as_save_IFS
-
-fi
-fi
-ac_ct_CARGO=$ac_cv_prog_ac_ct_CARGO
-if test -n "$ac_ct_CARGO"; then
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CARGO" >&5
-$as_echo "$ac_ct_CARGO" >&6; }
-else
-  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
-$as_echo "no" >&6; }
-fi
-
-  if test "x$ac_ct_CARGO" = x; then
-CARGO="no"
-  else
-case $cross_compiling:$ac_tool_warned in
-yes:)
-{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5
-$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;}
-ac_tool_warned=yes ;;
-esac
-CARGO=$ac_ct_CARGO
-  fi
-else
-  CARGO="$ac_cv_prog_CARGO"
-fi
+test -n "$CARGO" || CARGO="no"
 
 if test "x$CARGO" != xno; then
   have_cargo=yes
-- 
2.34.1
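
The lookup difference discussed above can be illustrated with a rough Python model. This is not Autoconf's implementation: the function names and the `path` parameter are hypothetical, and the real macros also handle result caching, `$ac_executable_extensions`, and the "not prefixed with host triplet" warning. `AC_CHECK_TOOL` prefers the host-prefixed name and falls back to the plain one, while `AC_CHECK_PROGS` only ever searches the plain names:

```python
import shutil
from typing import Iterable, Optional


def check_tool(prog: str, host_prefix: str, path: Optional[str] = None) -> str:
    """Rough model of AC_CHECK_TOOL(VAR, prog, no).

    Prefer the host-prefixed name (e.g. 'aarch64-linux-gnu-cargo'),
    then fall back to the plain name, else return the default 'no'.
    """
    if host_prefix and shutil.which(host_prefix + prog, path=path):
        return host_prefix + prog
    if shutil.which(prog, path=path):
        return prog
    return "no"


def check_progs(progs: Iterable[str], default: str = "no",
                path: Optional[str] = None) -> str:
    """Rough model of AC_CHECK_PROGS(VAR, progs, default).

    Only the plain names are searched; no host prefix is ever applied.
    """
    for prog in progs:
        if shutil.which(prog, path=path):
            return prog
    return default
```

With both `cargo` and `aarch64-linux-gnu-cargo` on `PATH`, the first model picks the prefixed one for an aarch64 host, while the second always picks plain `cargo`, which is the behavior the patch asks for.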



Re: [PATCH] build: Check for cargo when building rust language

2024-04-15 Thread Thomas Schwinge
Hi!

On 2024-04-08T18:33:38+0200, pierre-emmanuel.pa...@embecosm.com wrote:
> The rust frontend requires cargo to build some of its components,

In GCC upstream still: 's%requires%is going to require'.  ;-)

> its presence was not checked during configuration.

After confirming the desired semantics/diagnostics, I've now pushed this
to trunk branch in commit 3e1e73fc99584440e5967577f2049573eeaf4596
"build: Check for cargo when building rust language".


I now wonder: instead of 'AC_CHECK_TOOL', shouldn't this use
'AC_CHECK_PROG'?  (We always want plain 'cargo', not host-prefixed
'aarch64-linux-gnu-cargo' etc., right?)  I'll look into changing this.


Grüße
 Thomas


> Prevent rust language from building when cargo is
> missing.
>
> config/ChangeLog:
>
>   * acx.m4: Add a macro to check for rust
>   components.
>
> ChangeLog:
>
>   * configure: Regenerate.
>   * configure.ac: Emit an error message when cargo
>   is missing.
>
> Signed-off-by: Pierre-Emmanuel Patry 
> ---
>  config/acx.m4 |  11 +
>  configure | 117 ++
>  configure.ac  |  18 
>  3 files changed, 146 insertions(+)
>
> diff --git a/config/acx.m4 b/config/acx.m4
> index 7efe98aaf96..3c5fe67342e 100644
> --- a/config/acx.m4
> +++ b/config/acx.m4
> @@ -424,6 +424,17 @@ else
>  fi
>  ])
>  
> +# Test for Rust
> +# We require cargo and rustc for some parts of the rust compiler.
> +AC_DEFUN([ACX_PROG_CARGO],
> +[AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> +AC_CHECK_TOOL(CARGO, cargo, no)
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> +else
> +  have_cargo=no
> +fi])
> +
>  # Test for D.
>  AC_DEFUN([ACX_PROG_GDC],
>  [AC_REQUIRE([AC_CHECK_TOOL_PREFIX])
> diff --git a/configure b/configure
> index 874966fb9f0..46e66e20197 100755
> --- a/configure
> +++ b/configure
> @@ -714,6 +714,7 @@ PGO_BUILD_GEN_CFLAGS
>  HAVE_CXX11_FOR_BUILD
>  HAVE_CXX11
>  do_compare
> +CARGO
>  GDC
>  GNATMAKE
>  GNATBIND
> @@ -5786,6 +5787,104 @@ else
>have_gdc=no
>  fi
>  
> +
> +if test -n "$ac_tool_prefix"; then
> +  # Extract the first word of "${ac_tool_prefix}cargo", so it can be a program name with args.
> +set dummy ${ac_tool_prefix}cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$CARGO"; then
> +  ac_cv_prog_CARGO="$CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_CARGO="${ac_tool_prefix}cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +CARGO=$ac_cv_prog_CARGO
> +if test -n "$CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $CARGO" >&5
> +$as_echo "$CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +
> +fi
> +if test -z "$ac_cv_prog_CARGO"; then
> +  ac_ct_CARGO=$CARGO
> +  # Extract the first word of "cargo", so it can be a program name with args.
> +set dummy cargo; ac_word=$2
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
> +$as_echo_n "checking for $ac_word... " >&6; }
> +if ${ac_cv_prog_ac_ct_CARGO+:} false; then :
> +  $as_echo_n "(cached) " >&6
> +else
> +  if test -n "$ac_ct_CARGO"; then
> +  ac_cv_prog_ac_ct_CARGO="$ac_ct_CARGO" # Let the user override the test.
> +else
> +as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
> +for as_dir in $PATH
> +do
> +  IFS=$as_save_IFS
> +  test -z "$as_dir" && as_dir=.
> +for ac_exec_ext in '' $ac_executable_extensions; do
> +  if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
> +ac_cv_prog_ac_ct_CARGO="cargo"
> +$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
> +break 2
> +  fi
> +done
> +  done
> +IFS=$as_save_IFS
> +
> +fi
> +fi
> +ac_ct_CARGO=$ac_cv_prog_ac_ct_CARGO
> +if test -n "$ac_ct_CARGO"; then
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_ct_CARGO" >&5
> +$as_echo "$ac_ct_CARGO" >&6; }
> +else
> +  { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
> +$as_echo "no" >&6; }
> +fi
> +
> +  if test "x$ac_ct_CARGO" = x; then
> +CARGO="no"
> +  else
> +case $cross_compiling:$ac_tool_warned in
> +yes:)
> +{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: using cross tools not prefixed with host triplet" >&5
> +$as_echo "$as_me: WARNING: using cross tools not prefixed with host triplet" >&2;}
> +ac_tool_warned=yes ;;
> +esac
> +CARGO=$ac_ct_CARGO
> +  fi
> +else
> +  CARGO="$ac_cv_prog_CARGO"
> +fi
> +
> +if test "x$CARGO" != xno; then
> +  have_cargo=yes
> +els

Re: [gcc r14-7544] gccrs: libproc_macro: Build statically

2024-04-15 Thread Thomas Schwinge
Hi!

On 2024-01-16T17:43:10+, Arthur Cohen via Gcc-cvs wrote:
> https://gcc.gnu.org/g:71180a9eed367667e7b2c3f6aea1ee1bba15e9b3
>
> commit r14-7544-g71180a9eed367667e7b2c3f6aea1ee1bba15e9b3
> Author: Pierre-Emmanuel Patry 
> Date:   Wed Apr 26 10:31:35 2023 +0200
>
> gccrs: libproc_macro: Build statically
> 
> We do not need dynamic linking: all use cases this library covers can
> be handled statically, hence the change.
> 
> gcc/rust/ChangeLog:
> 
> * Make-lang.in: Link against the static libproc_macro.

> --- a/gcc/rust/Make-lang.in
> +++ b/gcc/rust/Make-lang.in
> @@ -182,11 +182,14 @@ RUST_ALL_OBJS = $(GRS_OBJS) $(RUST_TARGET_OBJS)
>  
>  rust_OBJS = $(RUST_ALL_OBJS) rust/rustspec.o
>  
> +RUST_LDFLAGS = $(LDFLAGS) -L./../libgrust/libproc_macro
> +RUST_LIBDEPS = $(LIBDEPS) ../libgrust/libproc_macro/libproc_macro.a
> +
>  # The compiler itself is called crab1
> -crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBDEPS) $(rust.prev)
> +crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(RUST_LIBDEPS) $(rust.prev)
>   @$(call LINK_PROGRESS,$(INDEX.rust),start)
> - +$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
> -   $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) $(BACKENDLIBS)
> + +$(LLINKER) $(ALL_LINKERFLAGS) $(RUST_LDFLAGS) -o $@ \
> +   $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) ../libgrust/libproc_macro/libproc_macro.a $(BACKENDLIBS)
>   @$(call LINK_PROGRESS,$(INDEX.rust),end)

The 'crab1' compiler is (at least potentially) just one of several
executables that 'gcc/rust/Make-lang.in' may build, which may all have
different library dependencies, etc.  Instead of via generic 'RUST_[...]'
variables, those dependencies etc. should therefore be specified as they
are individually necessary.

I've pushed to trunk branch the following clean-up commits, see attached:

  - commit cb70a49b30f0a22ec7a1b7df29c3ab370d603f90 "Remove 'libgrust/libproc_macro_internal' from 'gcc/rust/Make-lang.in:RUST_LDFLAGS'"
  - commit f7c8fa7280c85cbdea45be9c09f36123ff16a78a "Inline 'gcc/rust/Make-lang.in:RUST_LDFLAGS' into single user"
  - commit 24d92f65f9ed9b3c730c59f700ce2f5c038c8207 "Add 'gcc/rust/Make-lang.in:LIBPROC_MACRO_INTERNAL'"
  - commit e3fda76af4f342ad1ba8bd901a72d811e8357e99 "Inline 'gcc/rust/Make-lang.in:RUST_LIBDEPS' into single user"


Grüße
 Thomas


>From cb70a49b30f0a22ec7a1b7df29c3ab370d603f90 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Feb 2024 22:41:42 +0100
Subject: [PATCH 1/4] Remove 'libgrust/libproc_macro_internal' from
 'gcc/rust/Make-lang.in:RUST_LDFLAGS'

This isn't necessary, as the full path to 'libproc_macro_internal.a' is
specified elsewhere.

	gcc/rust/
	* Make-lang.in (RUST_LDFLAGS): Remove
	'libgrust/libproc_macro_internal'.
---
 gcc/rust/Make-lang.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index 4d73412739d..e901668b93d 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -208,7 +208,7 @@ RUST_ALL_OBJS = $(GRS_OBJS) $(RUST_TARGET_OBJS)
 
 rust_OBJS = $(RUST_ALL_OBJS) rust/rustspec.o
 
-RUST_LDFLAGS = $(LDFLAGS) -L./../libgrust/libproc_macro_internal
+RUST_LDFLAGS = $(LDFLAGS)
 RUST_LIBDEPS = $(LIBDEPS) ../libgrust/libproc_macro_internal/libproc_macro_internal.a
 
 # The compiler itself is called crab1
-- 
2.34.1

>From f7c8fa7280c85cbdea45be9c09f36123ff16a78a Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 28 Feb 2024 22:45:18 +0100
Subject: [PATCH 2/4] Inline 'gcc/rust/Make-lang.in:RUST_LDFLAGS' into single
 user

	gcc/rust/
	* Make-lang.in (RUST_LDFLAGS): Inline into single user.
---
 gcc/rust/Make-lang.in | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/rust/Make-lang.in b/gcc/rust/Make-lang.in
index e901668b93d..ffeb325d6ce 100644
--- a/gcc/rust/Make-lang.in
+++ b/gcc/rust/Make-lang.in
@@ -208,13 +208,12 @@ RUST_ALL_OBJS = $(GRS_OBJS) $(RUST_TARGET_OBJS)
 
 rust_OBJS = $(RUST_ALL_OBJS) rust/rustspec.o
 
-RUST_LDFLAGS = $(LDFLAGS)
 RUST_LIBDEPS = $(LIBDEPS) ../libgrust/libproc_macro_internal/libproc_macro_internal.a
 
 # The compiler itself is called crab1
 crab1$(exeext): $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(RUST_LIBDEPS) $(rust.prev)
 	@$(call LINK_PROGRESS,$(INDEX.rust),start)
-	+$(LLINKER) $(ALL_LINKERFLAGS) $(RUST_LDFLAGS) -o $@ \
+	+$(LLINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) -o $@ \
 	  $(RUST_ALL_OBJS) attribs.o $(BACKEND) $(LIBS) ../libgrust/libproc_macro_internal/libproc_macro_internal.a $(BACKENDLIBS)
 	@$(call LINK_PROGRESS,$(INDEX.rust),end)
 
-- 
2.34.1

>From 24d92f65f9ed9b3c730c59f700ce2f5c038c8207 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 

Re: [nvptx PATCH] Correct pattern for popcountdi2 insn in nvptx.md.

2024-04-12 Thread Thomas Schwinge
Hi Roger!

On 2023-01-09T13:29:14+, "Roger Sayle"  wrote:
> The result of a POPCOUNT operation in RTL should have the same mode
> as its operand.  This corrects the specification of popcount in
> the nvptx backend, splitting the current generic define_insn into
> two, one for popcountsi2 and the other for popcountdi2 (the latter
> with an explicit truncate).
>
> This patch has been tested on nvptx-none (hosted on x86_64-pc-linux-gnu)
> with make and make -k check with no new failures.  This functionality is
> already tested by gcc.target/nvptx/popc-[123].c.

So I compared '-fdump-rtl-all' and '*.s' of current vs. patched for those
three '*.c' files.  It is expected that I only see '(popcount:SI [DI])'
-> '(truncate:SI (popcount:DI [DI]))', but not any actually observable
change, right?

Shouldn't the current erroneous form trigger a '--enable-checking=rtl'
error?

> Ok for mainline?

OK, thanks.


..., and sorry for the long delay!  The chaos that came upon my group
half a year ago, and that resulted in our having to switch employers, has
not exactly helped me set aside proper time for learning the GCC back end
better.  But, fortunately, we've been able to switch employers!


Grüße
 Thomas


> 2023-01-09  Roger Sayle  
>
> gcc/ChangeLog
>   * config/nvptx/nvptx.md (popcount2): Split into...
>   (popcountsi2): define_insn handling SImode popcount.
>   (popcountdi2): define_insn handling DImode popcount, with an
>   explicit truncate:SI to produce an SImode result.
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 740c4de..461540e 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -658,11 +658,18 @@
>DONE;
>  })
>  
> -(define_insn "popcount2"
> +(define_insn "popcountsi2"
>[(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> - (popcount:SI (match_operand:SDIM 1 "nvptx_register_operand" "R")))]
> + (popcount:SI (match_operand:SI 1 "nvptx_register_operand" "R")))]
>""
> -  "%.\\tpopc.b%T1\\t%0, %1;")
> +  "%.\\tpopc.b32\\t%0, %1;")
> +
> +(define_insn "popcountdi2"
> +  [(set (match_operand:SI 0 "nvptx_register_operand" "=R")
> + (truncate:SI
> +   (popcount:DI (match_operand:DI 1 "nvptx_register_operand" "R"]
> +  ""
> +  "%.\\tpopc.b64\\t%0, %1;")
>  
>  ;; Multiplication variants
>  
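
Why the explicit 'truncate:SI' is harmless can be checked with a small Python model, with plain integers standing in for RTL modes (the helper names are made up for illustration): a DImode popcount yields at most 64, so narrowing it to SImode never changes the value, while the RTL now correctly states that the 'popcount' itself operates in DImode.

```python
def popcount64(x: int) -> int:
    """Population count of an unsigned 64-bit value (the DImode operand)."""
    if not 0 <= x < 1 << 64:
        raise ValueError("not a 64-bit unsigned value")
    return bin(x).count("1")


def truncate_si(v: int) -> int:
    """Model of (truncate:SI ...): keep only the low 32 bits."""
    return v & 0xFFFF_FFFF


# A DImode popcount is at most 64, so truncating it to SImode is lossless.
for x in (0, 1, 0xFFFF_FFFF, 0x8000_0000_0000_0001, (1 << 64) - 1):
    result = popcount64(x)
    assert result <= 64
    assert truncate_si(result) == result
```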


Re: [PATCH] Regenerate opt.urls

2024-04-12 Thread Thomas Schwinge
Hi!

After having received around a dozen more buildbot notifications...

On 2024-04-10T06:46:04-0700, Palmer Dabbelt  wrote:
> On Tue, 09 Apr 2024 07:57:24 PDT (-0700), ishitatsuy...@gmail.com wrote:
>> Fixes: 97069657c4e ("RISC-V: Implement TLS Descriptors.")
>>
>> gcc/ChangeLog:
>>  * config/riscv/riscv.opt.urls: Regenerated.
>> ---
>>  gcc/config/riscv/riscv.opt.urls | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/gcc/config/riscv/riscv.opt.urls b/gcc/config/riscv/riscv.opt.urls
>> index da31820e234..351f7f0dda2 100644
>> --- a/gcc/config/riscv/riscv.opt.urls
>> +++ b/gcc/config/riscv/riscv.opt.urls
>> @@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
>>  minline-strlen
>>  UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
>>
>> +; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
>> +
>
> Thanks.  I had another one over here 
> ,
>  
> but let's go with yours -- I think the actual contents are the same, but 
> I didn't actually run the regenerate script.  So
>
> Reviewed-by: Palmer Dabbelt 
> Acked-by: Palmer Dabbelt 

..., I've now pushed this to trunk branch in
commit c9500083073ff5e0f5c1c9db92d7ce6e51a62919
"Regenerate opt.urls".


Grüße
 Thomas


Re: [PATCH] contrib/check-params-in-docs.py: Ignore target-specific params

2024-04-12 Thread Thomas Schwinge
Hi!

On 2024-04-12T09:08:13+0200, Filip Kastl  wrote:
> On Thu 2024-04-11 20:51:55, Thomas Schwinge wrote:
>> On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
>> > contrib/check-params-in-docs.py is a script that checks that all
>> > options reported with ./gcc/xgcc -Bgcc --help=param are in
>> > gcc/doc/invoke.texi and vice versa.
>> 
>> Eh, first time I'm hearing about this one!
>> 
>> (a) Shouldn't this be running as part of the GCC build process?
>> 
>> > gcn-preferred-vectorization-factor is in the manual but normally not
>> > reported by --help, probably because I do not have gcn offload
>> > configured.
>> 
>> No, because you've not been building GCC for GCN target.  ;-P
>> 
>> > This patch makes the script silent about this particular
>> > fact.
>> 
>> (b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
>> to how that's done for "skip aarch64 params"?
>> 
>> (c) ..., and shouldn't we likewise skip any "x86" ones?
>> 
>> (d) ..., or in fact any target specific ones, following after the generic
>> section?  (Easily achieved with a special marker in
>> 'gcc/doc/invoke.texi', just before:
>> 
>> The following choices of @var{name} are available on AArch64 targets:
>> 
>> ..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
>> accordingly?

> I've made a patch to address (b), (c), (d).  I didn't adjust takewhile.  I
> chose to do it differently since target-specific params in both invoke.texi
> and --help=params have to be ignored.

Right, I realized that after I had sent my email...

> The downside of this patch is that the script won't complain if someone adds a
> target-specific param and doesn't document it.

Yes, but that's a pre-existing problem -- unless you happened to be
targeting some x86 variant.  The target-specific '--param's will have to
be handled differently.

> What do you think?

Looks like a good incremental improvement to me, thanks!


Grüße
 Thomas


> contrib/check-params-in-docs.py is a script that checks that all options
> reported with gcc --help=params are in gcc/doc/invoke.texi and vice
> versa.
> gcc/doc/invoke.texi lists target-specific params but gcc --help=params
> doesn't.  This meant that the script would mistakenly complain about
> params missing from --help=params.  Previously, the script was just set
> to ignore aarch64 and gcn params, which solved this issue only for x86.
> This patch sets the script to ignore all target-specific params.
>
> contrib/ChangeLog:
>
>   * check-params-in-docs.py: Ignore target specific params.
>
> Signed-off-by: Filip Kastl 
> ---
>  contrib/check-params-in-docs.py | 21 +
>  1 file changed, 13 insertions(+), 8 deletions(-)
>
> diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
> index f7879dd8e08..ccdb8d72169 100755
> --- a/contrib/check-params-in-docs.py
> +++ b/contrib/check-params-in-docs.py
> @@ -38,6 +38,9 @@ def get_param_tuple(line):
>      description = line[i:].strip()
>      return (name, description)
>  
> +def target_specific(param):
> +    return param.split('-')[0] in ('aarch64', 'gcn', 'x86')
> +
>  
>  parser = argparse.ArgumentParser()
>  parser.add_argument('texi_file')
> @@ -45,13 +48,16 @@ parser.add_argument('params_output')
>  
>  args = parser.parse_args()
>  
> -ignored = {'logical-op-non-short-circuit', 'gcn-preferred-vectorization-factor'}
> -params = {}
> +ignored = {'logical-op-non-short-circuit'}
> +help_params = {}
>  
>  for line in open(args.params_output).readlines():
>      if line.startswith(' ' * 2) and not line.startswith(' ' * 8):
>          r = get_param_tuple(line)
> -        params[r[0]] = r[1]
> +        help_params[r[0]] = r[1]
> +
> +# Skip target-specific params
> +help_params = [x for x in help_params.keys() if not target_specific(x)]
>  
>  # Find section in .texi manual with parameters
>  texi = ([x.strip() for x in open(args.texi_file).readlines()])
> @@ -66,14 +72,13 @@ for line in texi:
>  texi_params.append(line[len(token):])
>  break
>  
> -# skip digits
> +# Skip digits
>  texi_params = [x for x in texi_params if not x[0].isdigit()]
> -# skip aarch64 params
> -texi_params = [x for x in texi_params if not x.startswith('aarch64')]
> -sorted_params = sorted(texi_params)
> +# Skip target-specific params
> +texi_params = [x for x in texi_params if not target_specific(x)]
>  
>  texi_set = set(texi_params) - ignored
> -params_set = set(params.keys()) - ignored
> +params_set = set(help_params) - ignored
>  
>  success = True
>  extra = texi_set - params_set
> -- 
> 2.43.1
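
The filtering approach of this patch can be sketched in isolation as follows. The 'compare' helper and the sample parameter names are illustrative only, not taken from the script; 'target_specific' uses the same test as the patch. It also shows the downside mentioned in the discussion: an undocumented target-specific param (here the made-up 'x86-new-param') is silently ignored.

```python
TARGET_PREFIXES = ("aarch64", "gcn", "x86")
IGNORED = frozenset({"logical-op-non-short-circuit"})


def target_specific(param: str) -> bool:
    """Same test as the patch: the first '-'-separated word names a target."""
    return param.split("-")[0] in TARGET_PREFIXES


def compare(texi_params, help_params, ignored=IGNORED):
    """Return (documented-but-not-reported, reported-but-undocumented)."""
    texi_set = {p for p in texi_params if not target_specific(p)} - ignored
    help_set = {p for p in help_params if not target_specific(p)} - ignored
    return texi_set - help_set, help_set - texi_set


# Illustrative names only; a non-GCN build does not report the gcn param.
texi = ["max-inline-insns-auto", "gcn-preferred-vectorization-factor",
        "aarch64-sve-compare-costs", "logical-op-non-short-circuit"]
help_out = ["max-inline-insns-auto", "x86-new-param", "undocumented-param"]
extra, missing = compare(texi, help_out)
print(extra, missing)  # set() {'undocumented-param'}
```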


Re: [PATCH, OpenACC 2.7, v3] Adjust acc_map_data/acc_unmap_data interaction with reference counters

2024-04-12 Thread Thomas Schwinge
Hi Chung-Lin!

On 2024-04-11T22:08:47+0800, Chung-Lin Tang  wrote:
> On 2024/3/15 7:24 PM, Thomas Schwinge wrote:
>> -  if (n->refcount != REFCOUNT_INFINITY)
>> +  if (n->refcount != REFCOUNT_INFINITY
>> +  && n->refcount != REFCOUNT_ACC_MAP_DATA)
>>  n->refcount--;
>>n->dynamic_refcount--;
>>  }
>>  
>> +  /* Mappings created by 'acc_map_data' may only be deleted by
>> + 'acc_unmap_data'.  */
>> +  if (n->refcount == REFCOUNT_ACC_MAP_DATA
>> +  && n->dynamic_refcount == 0)
>> +n->dynamic_refcount = 1;
>> +
>>if (n->refcount == 0)
>>  {
>>bool copyout = (kind == GOMP_MAP_FROM
>> 
>> ..., which really should have the same semantics?  No strong opinion on
>> which of the two variants you now chose.
>
> My guess is that breaking off the REFCOUNT_ACC_MAP_DATA case separately will
> be lighter on any branch predictors (faster performing overall)

Eh, OK...

> so I will
> stick with my version here.
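
To make the semantics of the quoted decrement fragment concrete, here is a simplified Python model. It is not libgomp's code: the sentinel values and the 'Mapping'/'decrement' names are hypothetical, and the enclosing condition around the quoted 'dynamic_refcount' decrement is omitted. It shows that an 'acc_map_data' mapping is never removed by a decrement, while an ordinary mapping is.

```python
from dataclasses import dataclass

# Hypothetical sentinel values; libgomp uses special uintptr_t values instead.
REFCOUNT_INFINITY = -1
REFCOUNT_ACC_MAP_DATA = -2


@dataclass
class Mapping:
    refcount: int
    dynamic_refcount: int


def decrement(n: Mapping) -> bool:
    """Model of the quoted fragment; returns True if the mapping would be removed."""
    if n.refcount not in (REFCOUNT_INFINITY, REFCOUNT_ACC_MAP_DATA):
        n.refcount -= 1
    n.dynamic_refcount -= 1
    # Mappings created by 'acc_map_data' may only be deleted by 'acc_unmap_data'.
    if n.refcount == REFCOUNT_ACC_MAP_DATA and n.dynamic_refcount == 0:
        n.dynamic_refcount = 1
    return n.refcount == 0
```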


>>>> It's not clear to me why you need this handling -- instead of just
>>>> handling 'REFCOUNT_ACC_MAP_DATA' like 'REFCOUNT_INFINITY' here, that is,
>>>> early 'return'?
>>>>
>>>> Per my understanding, this code is for OpenACC only exercised for
>>>> structured data regions, and it seems strange (unnecessary?) to adjust
>>>> the 'dynamic_refcount' for these for 'acc_map_data'-mapped data?  Or am I
>>>> missing anything?
>>>
>>> No, that is not true.  Almost everything goes through
>>> gomp_map_vars_existing/_internal.
>>> This is what happens when you acc_create/acc_copyin on a mapping created
>>> by acc_map_data.

I still don't follow.  If you 'acc_map_data' something, and then
'acc_create' the same memory region, then that's handled, with
'dynamic_refcount', via 'acc_create' -> 'goacc_enter_datum' ->
'goacc_map_var_existing', all in 'libgomp/oacc-mem.c'.  Agree?

>> But I don't understand what you foresee breaking with the following (on
>> top of your v2):
>> 
>> --- a/libgomp/target.c
>> +++ b/libgomp/target.c
>> @@ -476,14 +476,14 @@ gomp_free_device_memory (struct gomp_device_descr *devicep, void *devptr)
>>  static inline void
>>  gomp_increment_refcount (splay_tree_key k, htab_t *refcount_set)
>>  {
>> -  if (k == NULL || k->refcount == REFCOUNT_INFINITY)
>> +  if (k == NULL
>> +  || k->refcount == REFCOUNT_INFINITY
>> +  || k->refcount == REFCOUNT_ACC_MAP_DATA)
>>  return;
>>  
>>uintptr_t *refcount_ptr = &k->refcount;
>>  
>> -  if (k->refcount == REFCOUNT_ACC_MAP_DATA)
>> -refcount_ptr = &k->dynamic_refcount;
>> -  else if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>> +  if (REFCOUNT_STRUCTELEM_FIRST_P (k->refcount))
>>  refcount_ptr = &k->structelem_refcount;
> ...
>> Can you please show a test case?

That is, a test case where the 'libgomp/target.c:gomp_increment_refcount'
etc. handling is relevant.  Those test cases:

> I have re-tested the patch *without* the gomp_increment/decrement_refcount changes,
> and have these regressions (just to demonstrate what is affected):
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O0  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/nested-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O2  execution test
> +FAIL: libgomp.oacc-c++/../libgomp.oacc-c-c++-common/pr92854-1.c -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -

Re: [PATCH] contrib/check-params-in-docs.py: Ignore gcn-preferred-vectorization-factor

2024-04-11 Thread Thomas Schwinge
Hi!

On 2024-04-11T19:52:51+0200, Martin Jambor  wrote:
> contrib/check-params-in-docs.py is a script that checks that all
> options reported with ./gcc/xgcc -Bgcc --help=param are in
> gcc/doc/invoke.texi and vice versa.

Eh, first time I'm hearing about this one!

(a) Shouldn't this be running as part of the GCC build process?

> gcn-preferred-vectorization-factor is in the manual but normally not
> reported by --help, probably because I do not have gcn offload
> configured.

No, because you've not been building GCC for GCN target.  ;-P

> This patch makes the script silent about this particular
> fact.

(b) Shouldn't we instead ignore any '--param's with "gcn" prefix, similar
to how that's done for "skip aarch64 params"?

(c) ..., and shouldn't we likewise skip any "x86" ones?

(d) ..., or in fact any target specific ones, following after the generic
section?  (Easily achieved with a special marker in
'gcc/doc/invoke.texi', just before:

The following choices of @var{name} are available on AArch64 targets:

..., and adjusting the 'takewhile' in 'contrib/check-params-in-docs.py'
accordingly?


Grüße
 Thomas


> I'll push the patch as obvious momentarily.
>
> Martin
>
>
> contrib/ChangeLog:
>
> 2024-04-11  Martin Jambor  
>
>   * check-params-in-docs.py (ignored): Add
>   gcn-preferred-vectorization-factor.
> ---
>  contrib/check-params-in-docs.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/contrib/check-params-in-docs.py b/contrib/check-params-in-docs.py
> index 623c82284e2..f7879dd8e08 100755
> --- a/contrib/check-params-in-docs.py
> +++ b/contrib/check-params-in-docs.py
> @@ -45,7 +45,7 @@ parser.add_argument('params_output')
>  
>  args = parser.parse_args()
>  
> -ignored = {'logical-op-non-short-circuit'}
> +ignored = {'logical-op-non-short-circuit', 'gcn-preferred-vectorization-factor'}
>  params = {}
>  
>  for line in open(args.params_output).readlines():
> -- 
> 2.44.0


Re: [PATCH, OpenACC 2.7] Connect readonly modifier to points-to analysis

2024-04-11 Thread Thomas Schwinge
Hi Chung-Lin, Richard!

From me just a few mechanical pieces, see below.  Richard, are you able
to again comment on Chung-Lin's general strategy, as I'm not at all
familiar with those parts of the code?

On 2024-04-03T19:50:55+0800, Chung-Lin Tang  wrote:
> On 2023/10/30 8:46 PM, Richard Biener wrote:
>>>
>>> What Chung-Lin's first patch does is mark the OMP clause for 'x' (not the
>>> 'x' decl itself!) as 'readonly', via a new 'OMP_CLAUSE_MAP_READONLY'
>>> flag.
>>>
>>> The actual optimization then is done in this second patch.  Chung-Lin
>>> found that he could use 'SSA_NAME_POINTS_TO_READONLY_MEMORY' for that.
>>> I don't have much experience with most of the following generic code, so
>>> would appreciate a helping hand, whether that conceptually makes sense as
>>> well as from the implementation point of view:
>
> First of all, I have removed all of the gimplify-stage scanning and setting of
> DECL_POINTS_TO_READONLY and SSA_NAME_POINTS_TO_READONLY_MEMORY (so there are
> no changes to gimplify.cc now).
>
> I remember this code was an artifact of earlier attempts to also make
> struct-member pointer mappings work (e.g. map(readonly:rec.ptr[:N])), which
> failed anyway.  I think the omp_data_* member accesses used when building the
> child-function-side receiver_refs are blocking points-to analysis from
> working (I didn't try digging deeper).
>
> Also, during gimplify, VAR_DECLs appeared to be reused (at least in some
> cases) for building map clause decl references, so hoping that the variables
> "happen to be" single-use, and relaying DECL_POINTS_TO_READONLY into
> SSA_NAME_POINTS_TO_READONLY_MEMORY, does appear to be a little risky.
>
> However, for firstprivate pointers processed during omp-low, the situation
> appears to be somewhat different (see the description below).
>
>> No, I don't think you can use that flag on non-default-defs, nor
>> preserve it on copying.  So it also doesn't nicely extend to DECLs as
>> done by the patch.  We currently _only_ use it for incoming parameters.
>> When used on arbitrary code you can end up with, for example:
>> 
>> ptr1(points-to-readonly-memory) = &p->x;
>> ... access via ptr1 ...
>> ptr2 = &p->x;
>> ... access via ptr2 ...
>> 
>> where both are your OMP regions differently constrained (the constraint is
>> on the code in the region, _not_ on the actual protections of the
>> pointed-to data, much like for the fortran case).  But now CSE comes along
>> and happily replaces all ptr2 with ptr2 in the second region and ... oops!
>
> Richard, I assume what you meant was "happily replaces all ptr2 with ptr1
> in the second region"?
>
> That doesn't happen, because during omp-lower/expand, OMP target regions
> (which are all that this currently applies to) are separated into
> individual child functions.
>
> (Currently, the only "effective" use of DECL_POINTS_TO_READONLY is during
> omp-lower: for firstprivate pointers (i.e. 'a' here) we set this bit when
> constructing the first load of this pointer.)
>
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
> foo (a, a[8]);
> r = a[8];
>   }
>   #pragma acc parallel copyin(readonly: a[:32]) copyout(r)
>   {
> foo (a, a[12]);
> r = a[12];
>   }
>
> After omp-expand (before SSA):
>
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.1 (const struct .omp_data_t.3 & restrict .omp_data_i)
> {
>  ...
>:
>   D.2962 = .omp_data_i->D.2947;
>   a.8 = D.2962;
>   r.1 = (*a.8)[12];
>   foo (a.8, r.1);
>   r.1 = (*a.8)[12];
>   D.2965 = .omp_data_i->r;
>   *D.2965 = r.1;
>   return;
> }
>
> __attribute__((oacc parallel, omp target entrypoint, noclone))
> void main._omp_fn.0 (const struct .omp_data_t.2 & restrict .omp_data_i)
> {
>   ...
>:
>   D.2968 = .omp_data_i->D.2939;
>   a.4 = D.2968;
>   r.0 = (*a.4)[8];
>   foo (a.4, r.0);
>   r.0 = (*a.4)[8];
>   D.2971 = .omp_data_i->r;
>   *D.2971 = r.0;
>   return;
> }
>
> So the creation of DECL_POINTS_TO_READONLY and its relaying to
> SSA_NAME_POINTS_TO_READONLY_MEMORY here is actually quite similar to a
> default-def for a PARM_DECL, at least conceptually.
>
> (If offloading were structured significantly differently, say if child
> functions were separated much earlier, before omp-lowering, then this
> readonly modifier might possibly be a direct application of 'r' in the
> "fn spec" attribute.)
>
> Other changes since the first version of the patch include:
> 1) update of the C/C++ FE changes to the new style in c-family/c-omp.cc
> 2) merging of two if cases in fortran/trans-openmp.cc, as Thomas suggested
> 3) update of the readonly-2.c testcase to scan before/after the "fre1" pass,
>    to verify removal of a MEM load, also as Thomas suggested.

Thanks!

> I have re-tested this patch using mainline, with no regressions. Is this okay 
> for mainline?

> 2024-04-03  Chung-Lin Tang  
>
> gcc/c-family/ChangeLog:
>
>   * c-omp.cc (c_omp_address_inspector::expand_array_base):
>   Set OMP_CLAUSE_MAP_POINTS_

Re: [PATCH] openmp: Add support for the 'indirect' clause in C/C++

2024-04-11 Thread Thomas Schwinge
Hi!

I've filed <https://gcc.gnu.org/PR114690>
"OpenMP 'indirect' clause: dynamic image loading/unloading" for the
following issue:

On 2023-11-13T12:47:04+0100, Tobias Burnus  wrote:
> On 13.11.23 11:59, Thomas Schwinge wrote:
>>>> Also, for my understanding: why is 'build_indirect_map' done at kernel
>>>> invocation time (here) instead of at image load time?
>>> The splay_tree is generated on the device itself - and we currently do
>>> not start a kernel during GOMP_OFFLOAD_load_image. We could, the
>>> question is whether it makes sense. (Generating the splay_tree on the
>>> host for the device is a hassle and error prone as it needs to use
>>> device pointers at the end.)
>> Hmm.  It seems conceptually cleaner to me to set this up upfront, and
>> avoids potentially slowing down every device kernel invocation (at least
>> another function call, and 'gomp_mutex_lock' check).  Though, I agree
>> this may be "in the noise" with regards to all the other stuff going on
>> in 'gomp_gcn_enter_kernel' and elsewhere...
>
> I think the most common case is GOMP_INDIRECT_ADDR_MAP == NULL.
>
> The question is whether the lock should/could be moved inside
> 'if (!indirect_array)' or not. Probably yes:
> * do an atomic load for the outer '!indirect_array' check, work on a local
>   array for the build-up and only assign it at the end; and, just after
>   taking the lock, check again whether '!indirect_array'.
>
> That way, it is lock-free once built, and while it is being built there is
> no race.
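
For illustration, the scheme described above (an atomic fast-path check, building into a local object, re-checking after taking the lock, and publishing the finished table with a single store) is classic double-checked locking. Below is a minimal host-side sketch in C11, assuming pthreads and <stdatomic.h>; 'published_table', 'build_indirect_table' and 'get_indirect_table' are hypothetical names, not the actual libgomp code, and device-side code would need the target's own atomics and lock primitives instead:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical entry: maps a host function address to a device address.  */
struct indirect_entry { const void *host; const void *dev; };

/* Hypothetical stand-in for the indirect address map.  */
struct indirect_table { size_t n; struct indirect_entry *entries; };

/* Published pointer; NULL until built.  Readers only ever observe a
   completely built table.  */
static _Atomic (struct indirect_table *) published_table;
static pthread_mutex_t indirect_lock = PTHREAD_MUTEX_INITIALIZER;
static int build_count;  /* Number of builds; only touched under the lock.  */

/* Build into a fresh local object, never directly into the shared pointer.  */
static struct indirect_table *
build_indirect_table (const struct indirect_entry *src, size_t n)
{
  struct indirect_table *t = malloc (sizeof *t);
  struct indirect_entry *e = malloc ((n ? n : 1) * sizeof *e);
  if (!t || !e)
    abort ();
  for (size_t i = 0; i < n; i++)
    e[i] = src[i];
  t->n = n;
  t->entries = e;
  build_count++;
  return t;
}

static struct indirect_table *
get_indirect_table (const struct indirect_entry *src, size_t n)
{
  /* Fast path: lock-free once the table has been published.  */
  struct indirect_table *t
    = atomic_load_explicit (&published_table, memory_order_acquire);
  if (t)
    return t;

  pthread_mutex_lock (&indirect_lock);
  /* Re-check under the lock: another thread may have built it meanwhile.
     Relaxed is enough here because the mutex already orders us after
     any store made by a previous holder of the lock.  */
  t = atomic_load_explicit (&published_table, memory_order_relaxed);
  if (!t)
    {
      t = build_indirect_table (src, n);
      /* A single release store publishes the finished table.  */
      atomic_store_explicit (&published_table, t, memory_order_release);
    }
  pthread_mutex_unlock (&indirect_lock);
  return t;
}
```

The acquire load on the fast path pairs with the release store, so a thread that sees a non-NULL pointer also sees fully initialized entries; after the first build, callers never touch the mutex.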
>
>> What I just realized: what's also unclear to me is how the current
>> implementation works with regards to several images getting loaded --
>> don't we then overwrite 'GOMP_INDIRECT_ADDR_MAP' instead of
>> (conceptually) appending to it?
>
> Yes, I think that will happen - but it looks as if the same issue also
> exists in the other code? I think that's not the first variable that has
> that issue?
>
> I think we should try to cleanup that handling, also to support calling
> a device function in a shared library from a target region in the main
> program, which currently also fails.
>
> All device routines that are in normal static libraries and in the
> object files of the main program should simply work thanks to offload
> LTO such that there is only a single GOMP_offload_register_ver call (per
> device type) and GOMP_OFFLOAD_load_image call (per device).
>
> Likewise if the offloading is only done via a single shared library.
> Any mixing will currently fail, unfortunately. This patch just adds
> another item which does not handle it properly.
>
> (Not good but IMHO also not a showstopper for this patch.)
>
>> In the general case, additional images may also get loaded during
>> execution.  We thus need proper locking of the shared data structure, uh?
>> Or, can we have separate on-device data structures per image?  (I've not
>> yet thought about that in detail.)
>
> I think we could - but the case of a main-program 'omp target' region that
> calls a shared-library 'declare target' function means that we need to
> handle multiple GOMP_offload_register_ver / load_image calls such that
> they can work together.
>
> Obviously, it gets harder if the user keeps doing dlopen() / dlclose()
> of libraries containing offload code where a target/compute region is
> run before, between, and after those calls (but hopefully not running
> when calling dlopen/dlclose).
>
>> Relatedly then, when images are unloaded, we also need to remove stale
>> items from the table, and release resources (for example, the
>> 'GOMP_OFFLOAD_alloc' for 'map_target_addr').
>
> True. I think the general assumption is that images only get unloaded at
> the very end, which matches most but not all code. Yet another work item.
>
> I think we should open a new PR about this topic and collect work items
> there.
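
To illustrate the per-image alternative raised above, here is a minimal sketch in C (hypothetical names throughout, not libgomp's actual data structures) where each loaded image registers its own table, so a later load adds to the set instead of overwriting a single global map, and unloading an image drops its stale entries. Lookup is a linear scan under a mutex purely for simplicity; the discussion above mentions a splay tree, and taking a lock on every lookup is exactly the per-invocation cost raised earlier, so a real design would need something cheaper on that path:

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical entry: maps a host function address to a device address.  */
struct indirect_entry { const void *host; const void *dev; };

/* One table per loaded image, chained in a list.  */
struct image_table {
  const void *image;                    /* Hypothetical image handle.  */
  size_t n;
  const struct indirect_entry *entries; /* Caller-owned; not copied here.  */
  struct image_table *next;
};

static struct image_table *images;
static pthread_mutex_t images_lock = PTHREAD_MUTEX_INITIALIZER;

/* At image load time: add a new table rather than overwrite the old one.  */
static int
register_image (const void *image, const struct indirect_entry *e, size_t n)
{
  struct image_table *t = malloc (sizeof *t);
  if (!t)
    return -1;
  t->image = image;
  t->n = n;
  t->entries = e;
  pthread_mutex_lock (&images_lock);
  t->next = images;
  images = t;
  pthread_mutex_unlock (&images_lock);
  return 0;
}

/* At image unload time: unlink the image's table and free it.  */
static void
unregister_image (const void *image)
{
  pthread_mutex_lock (&images_lock);
  for (struct image_table **p = &images; *p; p = &(*p)->next)
    if ((*p)->image == image)
      {
        struct image_table *dead = *p;
        *p = dead->next;
        free (dead);
        break;
      }
  pthread_mutex_unlock (&images_lock);
}

/* Find the device address for a host function across all loaded images;
   returns NULL if no loaded image provides it.  */
static const void *
lookup_indirect (const void *host)
{
  const void *dev = NULL;
  pthread_mutex_lock (&images_lock);
  for (struct image_table *t = images; t && !dev; t = t->next)
    for (size_t i = 0; i < t->n; i++)
      if (t->entries[i].host == host)
        {
          dev = t->entries[i].dev;
          break;
        }
  pthread_mutex_unlock (&images_lock);
  return dev;
}
```

Any other per-image resources (such as the 'GOMP_OFFLOAD_alloc' for 'map_target_addr' mentioned above) would be released in the same place that 'unregister_image' frees the table.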


Grüße
 Thomas


Regeneration of 'gcc/config/riscv/riscv.opt.urls' (was: [PATCH v2 2/3] aarch64: Add support for aarch64-gnu (GNU/Hurd on AArch64))

2024-04-10 Thread Thomas Schwinge
Hi!

On 2024-04-09T09:24:29-0700, Palmer Dabbelt  wrote:
> On Tue, 09 Apr 2024 01:04:34 PDT (-0700), buga...@gmail.com wrote:
>> On Tue, Apr 9, 2024 at 10:27 AM Thomas Schwinge  
>> wrote:
>>> Thanks, pushed to trunk branch:
>>>
>>>   - commit 532c57f8c3a15b109a46d3e2b14d60a5c40979d5 "Move GNU/Hurd startfile spec from config/i386/gnu.h to config/gnu.h"
>>>   - commit 9670a2326333caa8482377c00beb65723b7b4b26 "aarch64: Add support for aarch64-gnu (GNU/Hurd on AArch64)"
>>>   - commit 46c91665f4bceba19aed56f5bd6e934c548b84ff "libgcc: Add basic support for aarch64-gnu (GNU/Hurd on AArch64)"
>>
>> \o/ Thanks a lot!
>>
>> This will unblock merging the aarch64-gnu glibc port upstream.

\o/


>> I assume the buildbot failure that I just got an email about is
>> unrelated; it's failing on some RISC-V thing.
>
> Sorry if I missed something here, do you have a pointer?

<https://inbox.sourceware.org/20240409074850.ed7bd3858...@sourceware.org>
and several more such messages, requesting:

--- a/gcc/config/riscv/riscv.opt.urls
+++ b/gcc/config/riscv/riscv.opt.urls
@@ -89,3 +89,5 @@ UrlSuffix(gcc/RISC-V-Options.html#index-minline-strncmp)
 minline-strlen
 UrlSuffix(gcc/RISC-V-Options.html#index-minline-strlen)
 
+; skipping UrlSuffix for 'mtls-dialect=' due to finding no URLs
+

To be fixed by
<https://inbox.sourceware.org/20240409145724.9640-1-ishitatsuy...@gmail.com>
"Regenerate opt.urls".


Grüße
 Thomas

