from:"Tobias Burnus"

[gcc r15-2381] libgomp.texi: Update 'Device Information Routines' section

2024-07-29 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:8d3325708c107d20d41f0bddb0ad161c18934561

commit r15-2381-g8d3325708c107d20d41f0bddb0ad161c18934561
Author: Tobias Burnus 
Date:   Mon Jul 29 17:43:42 2024 +0200

libgomp.texi: Update 'Device Information Routines' section

Update 'OpenMP Runtime Library Routines' by adding a note that invoking
inside a target region might invoke unspecified behavior. Additionally,
update omp_{get,set}_default_device for omp_{initial,invalid}_device
named constants.

libgomp/ChangeLog:

* libgomp.texi (OpenMP Runtime Library Routines): Add missing
title to some commented still undocumented items.
(Device Information Routines): Update.

Diff:
---
 libgomp/libgomp.texi | 48 +---
 1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 50da248b74db..07cd75124b07 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1208,11 +1208,11 @@ They have C linkage and do not throw exceptions.
 
 @menu
 * omp_get_proc_bind::   Whether threads may be moved between CPUs
-@c * omp_get_num_places:: 
-@c * omp_get_place_num_procs:: 
-@c * omp_get_place_proc_ids:: 
-@c * omp_get_place_num:: 
-@c * omp_get_partition_num_places:: 
+@c * omp_get_num_places::   Get the number of places available
+@c * omp_get_place_num_procs::  Get the number of processes associated with a 
place
+@c * omp_get_place_proc_ids::   Get number of processes associated with a place
+@c * omp_get_place_num::Get place number of the associated task
+@c * omp_get_partition_num_places:: Get number of places of innermost task
 @c * omp_get_partition_place_nums:: 
 @c * omp_set_affinity_format:: 
 @c * omp_get_affinity_format:: 
@@ -1627,8 +1627,12 @@ Returns the number of processors online on that device.
 @subsection @code{omp_set_default_device} -- Set the default device for target 
regions
 @table @asis
 @item @emph{Description}:
-Set the default device for target regions without device clause.  The argument
-shall be a nonnegative device number.
+Get the value of the @emph{default-device-var} ICV, which is used
+for target regions without a device clause.  The argument
+shall be a nonnegative device number, @code{omp_initial_device},
+or @code{omp_invalid_device}.
+
+The effect of running this routine in a @code{target} region is unspecified.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -1654,7 +1658,15 @@ shall be a nonnegative device number.
 @subsection @code{omp_get_default_device} -- Get the default device for target 
regions
 @table @asis
 @item @emph{Description}:
-Get the default device for target regions without device clause.
+Get the value of the @emph{default-device-var} ICV, which is used
+for target regions without a device clause. The value is either a
+nonnegative device number, @code{omp_initial_device} or
+@code{omp_invalid_device}. Note that for the host, the ICV can have two values
+and, hence, this routine might return either the value of the named constant
+@code{omp_initial_device} or the value returned by the
+@code{omp_get_initial_device} routine.
+
+The effect of running this routine in a @code{target} region is unspecified.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -1667,7 +1679,8 @@ Get the default device for target regions without device 
clause.
 @end multitable
 
 @item @emph{See also}:
-@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
+@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device},
+@ref{omp_get_initial_device}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@@ -1681,6 +1694,8 @@ Get the default device for target regions without device 
clause.
 @item @emph{Description}:
 Returns the number of target devices.
 
+The effect of running this routine in a @code{target} region is unspecified.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@@ -1702,9 +1717,9 @@ Returns the number of target devices.
 @table @asis
 @item @emph{Description}:
 This function returns a device number that represents the device that the
-current thread is executing on. For OpenMP 5.0, this must be equal to the
-value returned by the @code{omp_get_initial_device} function when called
-from the host.
+current thread is executing on. When called on the host, it returns
+the same value as returned by the @code{omp_get_initial_device} function
+as required since OpenMP 5.0.
 
 @item @emph{C/C++}
 @multitable @columnfractions .20 .80
@@ -1754,9 +1769,11 @@ their language-specific counterparts.
 @table @asis
 @item @emph{Description}:
 This function returns a device number that represents the host device.
-For OpenMP 5.1, this must be equal to the value returned by the
+Since OpenMP 5.1, this is equal to the value returned by the
 @code{omp_get_num_devices} function

[Patch] libgomp.texi: Update 'Device Information Routines' section

2024-07-29 Thread Tobias Burnus


I recently stumbled over omp_get_default_device returning -1 (= 
omp_initial_device)
vs. returning omp_get_num_devices(). Thus, it makes sense to document this 
properly.
I also updated some wording and made a tiny step to documenting the missing 
functions
by adding a title to the commented @menu items.

→ https://gcc.gnu.org/onlinedocs/libgomp/#toc-OpenMP-Runtime-Library-Routines
for the current wording.

Comments or suggestions before I commit it?

Tobias
libgomp.texi: Update 'Device Information Routines' section

Update 'OpenMP Runtime Library Routines' by adding a note that invoking
inside a target region might invoke unspecified behavior. Additionally,
update omp_{get,set}_default_device for omp_{initial,invalid}_device
named constants.

libgomp/ChangeLog:

	* libgomp.texi (OpenMP Runtime Library Routines): Add missing
	title to some commented still undocumented items.
	(Device Information Routines): Update.

 libgomp/libgomp.texi | 48 +---
 1 file changed, 33 insertions(+), 15 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 50da248b74d..8fe74d58562 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -1208,11 +1208,11 @@ They have C linkage and do not throw exceptions.
 
 @menu
 * omp_get_proc_bind::   Whether threads may be moved between CPUs
-@c * omp_get_num_places:: 
-@c * omp_get_place_num_procs:: 
-@c * omp_get_place_proc_ids:: 
-@c * omp_get_place_num:: 
-@c * omp_get_partition_num_places:: 
+@c * omp_get_num_places::   Get the number of places available
+@c * omp_get_place_num_procs::  Get the number of processes associated with a place
+@c * omp_get_place_proc_ids::   Get number of processes associated with a place
+@c * omp_get_place_num::Get place number of the associated task
+@c * omp_get_partition_num_places:: Get number of places of innermost task
 @c * omp_get_partition_place_nums:: 
 @c * omp_set_affinity_format:: 
 @c * omp_get_affinity_format:: 
@@ -1627,8 +1627,12 @@ Returns the number of processors online on that device.
 @subsection @code{omp_set_default_device} -- Set the default device for target regions
 @table @asis
 @item @emph{Description}:
-Set the default device for target regions without device clause.  The argument
-shall be a nonnegative device number.
+Get the value of the @emph{default-device-var} ICV, which is used
+for target regions without device clause.  The argument
+shall be a nonnegative device number, @code{omp_initial_device},
+or @code{omp_invalid_device}.
+
+The effect of running this routine in a @code{target} region is unspecified.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -1654,7 +1658,15 @@ shall be a nonnegative device number.
 @subsection @code{omp_get_default_device} -- Get the default device for target regions
 @table @asis
 @item @emph{Description}:
-Get the default device for target regions without device clause.
+Get the value of the @emph{default-device-var} ICV, which is used
+for target regions without device clause. The value is either a
+nonnegative device number, @code{omp_initial_device} or
+@code{omp_invalid_device}. Note that for the host, the ICV can have two values
+and, hence, this routine might return either the value of the named constant
+@code{omp_initial_device} or the value returned by the
+@code{omp_get_initial_device} routine.
+
+The effect of running this routine in a @code{target} region is unspecified.
 
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
@@ -1667,7 +1679,8 @@ Get the default device for target regions without device clause.
 @end multitable
 
 @item @emph{See also}:
-@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device}
+@ref{OMP_DEFAULT_DEVICE}, @ref{omp_set_default_device},
+@ref{omp_get_initial_device}
 
 @item @emph{Reference}:
 @uref{https://www.openmp.org, OpenMP specification v4.5}, Section 3.2.30.
@@ -1681,6 +1694,8 @@ Get the default device for target regions without device clause.
 @item @emph{Description}:
 Returns the number of target devices.
 
+The effect of running this routine in a @code{target} region is unspecified.
+
 @item @emph{C/C++}:
 @multitable @columnfractions .20 .80
 @item @emph{Prototype}: @tab @code{int omp_get_num_devices(void);}
@@ -1702,9 +1717,9 @@ Returns the number of target devices.
 @table @asis
 @item @emph{Description}:
 This function returns a device number that represents the device that the
-current thread is executing on. For OpenMP 5.0, this must be equal to the
-value returned by the @code{omp_get_initial_device} function when called
-from the host.
+current thread is executing on. When called on the host, it returns
+the same value as returned by the @code{omp_get_initial_device} function
+as required since OpenMP 5.0.
 
 @item @emph{C/C++}
 @multitable @columnfractions .20 .80
@@ -1754,9 +1769,11 @@ their language-specific counterparts.
 @table @asis
 @item @emph{Description}:
 This function returns a device number that

[gcc/devel/omp/gcc-14] Merge remote-tracking branch 'origin/releases/gcc-14' into devel/omp/gcc-14

2024-07-29 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:8ad1a509662a9af828600d053652d6d0f414027c

commit 8ad1a509662a9af828600d053652d6d0f414027c
Merge: 9e05aff533d8 98baaa17561c
Author: Tobias Burnus 
Date:   Mon Jul 29 12:52:36 2024 +0200

Merge remote-tracking branch 'origin/releases/gcc-14' into devel/omp/gcc-14

Merge up to commit r14-10515-g98baaa17561ca2 (29th July 2024)

Diff:

 gcc/ChangeLog  |  131 +
 gcc/DATESTAMP  |2 +-
 gcc/config/i386/avx512dqintrin.h   |   16 +-
 gcc/config/i386/avx512vlbwintrin.h |4 +-
 gcc/config/i386/avx512vlintrin.h   |2 +-
 gcc/config/i386/i386.md|2 +-
 gcc/config/riscv/bitmanip.md   |2 +-
 gcc/config/rs6000/rs6000-logue.cc  |   47 +-
 gcc/config/rs6000/rs6000.cc|   12 +
 gcc/config/rs6000/rs6000.md|6 +-
 gcc/cp/ChangeLog   |   21 +
 gcc/cp/call.cc |2 +-
 gcc/cp/coroutines.cc   |   18 +-
 gcc/ipa-icf-gimple.cc  |4 +
 gcc/ipa-inline.cc  |   79 +-
 gcc/ipa-modref.cc  |   16 +-
 gcc/ipa-prop.cc|4 +-
 gcc/po/ChangeLog   |4 +
 gcc/po/gcc.pot | 7713 ++--
 gcc/testsuite/ChangeLog|  132 +
 .../g++.dg/coroutines/pr104981-preview-this.C  |   34 +
 .../g++.dg/coroutines/pr115550-preview-this.C  |   47 +
 .../g++.dg/cpp23/explicit-obj-diagnostics11.C  |   48 +
 gcc/testsuite/g++.target/powerpc/pr106069.C|2 +-
 gcc/testsuite/gcc.c-torture/compile/pr115277.c |   28 +
 gcc/testsuite/gcc.c-torture/execute/pr114207.c |   23 +
 gcc/testsuite/gcc.c-torture/execute/pr115033.c |   35 +
 gcc/testsuite/gcc.c-torture/pr111613.c |   29 +
 gcc/testsuite/gcc.dg/pr116034.c|   23 +
 gcc/testsuite/gcc.dg/vect/tsvc/vect-tsvc-s1281.c   |3 +
 .../gcc.target/i386/avx512bw-vpalignr-1b.c |   18 +
 .../gcc.target/i386/avx512dq-vfpclasssd-1b.c   |   14 +
 .../gcc.target/i386/avx512dq-vfpcla-1b.c   |   14 +
 .../gcc.target/i386/avx512dq-vreducesd-1b.c|   16 +
 .../gcc.target/i386/avx512dq-vreducess-1b.c|   16 +
 .../gcc.target/i386/avx512vl-valignq-1b.c  |   15 +
 gcc/testsuite/gcc.target/i386/prefetchi-1.c|4 +-
 gcc/testsuite/gcc.target/powerpc/pr114759-2.c  |   17 +
 gcc/testsuite/gcc.target/powerpc/pr114759-3.c  |   21 +
 gcc/testsuite/gcc.target/powerpc/pr115389.c|   17 +
 gcc/testsuite/gcc.target/riscv/pr116035-1.c|   29 +
 gcc/testsuite/gcc.target/riscv/pr116035-2.c|   26 +
 gcc/testsuite/lib/target-supports.exp  |2 +-
 gcc/tree-ssa.cc|5 +-
 libstdc++-v3/ChangeLog |   17 +
 libstdc++-v3/src/c++23/print.cc|8 +-
 46 files changed, 4796 insertions(+), 3932 deletions(-)

[gcc/devel/omp/gcc-14] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]

2024-07-29 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:9e05aff533d8c506aa6a99a1b9ac5c1743862af7

commit 9e05aff533d8c506aa6a99a1b9ac5c1743862af7
Author: Tobias Burnus 
Date:   Mon Jul 29 12:52:11 2024 +0200

OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause 
[PR115559]

Contrary to a normal 'declare target', the 'declare target link' attribute
also needs to set node->offloadable and push the offload_vars in the front 
end.

Linked variables require that the data is mapped. For module variables, this
can happen anywhere. For variables in an external subprograms or the main
programm, this can only happen in the either that program itself or in an
internal subprogram. - Whether a variable is just normally mapped or linked 
then
becomes relevant if a device routine exists that can access that variable,
i.e. an internal procedure has then to be marked as declare target.

PR fortran/115559

gcc/fortran/ChangeLog:

* trans-common.cc (build_common_decl): Add 'omp declare target' and
'omp declare target link' variables to offload_vars.
* trans-decl.cc (add_attributes_to_decl): Likewise; update args and
call decl_attributes.
(get_proc_pointer_decl, gfc_get_extern_function_decl,
build_function_decl): Update calls.
(gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
to avoid errors with symtab_node::get_create.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/declare-target-link.f90: New test.

(cherry picked from commit 29b1587e7d34667a1fd63071c1e4f5475cd71026)

Diff:
---
 gcc/fortran/ChangeLog.omp  |  15 +++
 gcc/fortran/trans-common.cc|  21 
 gcc/fortran/trans-decl.cc  |  81 +-
 libgomp/ChangeLog.omp  |   8 ++
 .../libgomp.fortran/declare-target-link.f90| 116 +
 5 files changed, 215 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/ChangeLog.omp b/gcc/fortran/ChangeLog.omp
index 917fad1de90a..31470f4852e4 100644
--- a/gcc/fortran/ChangeLog.omp
+++ b/gcc/fortran/ChangeLog.omp
@@ -1,3 +1,18 @@
+2024-07-29  Tobias Burnus  
+
+   Backported from master:
+   2024-07-29  Tobias Burnus  
+
+   PR fortran/115559
+   * trans-common.cc (build_common_decl): Add 'omp declare target' and
+   'omp declare target link' variables to offload_vars.
+   * trans-decl.cc (add_attributes_to_decl): Likewise; update args and
+   call decl_attributes.
+   (get_proc_pointer_decl, gfc_get_extern_function_decl,
+   build_function_decl): Update calls.
+   (gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
+   to avoid errors with symtab_node::get_create.
+
 2024-07-03  Thomas Schwinge  
 
* class.cc (generate_callback_wrapper) [GCC_NVPTX_H]: Disable.
diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc
index 5f44e7bd663d..e714342c3c0b 100644
--- a/gcc/fortran/trans-common.cc
+++ b/gcc/fortran/trans-common.cc
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "cgraph.h"
+#include "context.h"
+#include "omp-offload.h"
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
@@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, 
bool is_init)
  = tree_cons (get_identifier ("omp declare target"),
   omp_clauses, DECL_ATTRIBUTES (decl));
 
+  if (com->omp_declare_target_link || com->omp_declare_target)
+   {
+ /* Add to offload_vars; get_create does so for omp_declare_target,
+omp_declare_target_link requires manual work.  */
+ gcc_assert (symtab_node::get (decl) == 0);
+ symtab_node *node = symtab_node::get_create (decl);
+ if (node != NULL && com->omp_declare_target_link)
+   {
+ node->offloadable = 1;
+ if (ENABLE_OFFLOADING)
+   {
+ g->have_offload = true;
+ if (is_a  (node))
+   vec_safe_push (offload_vars, decl);
+   }
+   }
+   }
+
   /* Place the back end declaration for this common block in
  GLOBAL_BINDING_LEVEL.  */
   gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 643c93f36ee8..019e845bf9f0 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -46,7 +46,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-stmt.h"
 #include "gomp-constants.h"
 #include "gimplify.h"
+#include "context.h"
 #inclu

[gcc/devel/omp/gcc-14] libgomp: Fix declare target link with offset array-section mapping [PR116107]

2024-07-29 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:c9e52a1a3d2c2970065c254a414bab76f798ce7d

commit c9e52a1a3d2c2970065c254a414bab76f798ce7d
Author: Tobias Burnus 
Date:   Mon Jul 29 12:50:56 2024 +0200

libgomp: Fix declare target link with offset array-section mapping 
[PR116107]

Assume that 'int var[100]' is 'omp declare target link(var)'. When now
mapping an array section with offset such as 'map(to:var[20:10])',
the device-side link pointer has to store &[0] minus
the offset such that var[20] will access [0]. But
the offset calculation was missed such that the device-side 'var' pointed
to the first element of the mapped data - and var[20] points beyond at
some invalid memory.

PR middle-end/116107

libgomp/ChangeLog:

* target.c (gomp_map_vars_internal): Honor array mapping offsets
with declare-target 'link' variables.
* testsuite/libgomp.c-c++-common/target-link-2.c: New test.

(cherry picked from commit 14c47e7eb06e8b95913794f6059560fc2fa6de91)

Diff:
---
 libgomp/ChangeLog  | 11 
 libgomp/target.c   |  7 ++-
 .../testsuite/libgomp.c-c++-common/target-link-2.c | 59 ++
 3 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 555f1f126f26..aaa2e8defe3e 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,14 @@
+-- libgomp/ChangeLog -- 
+2024-07-29  Tobias Burnus  
+
+   Backported from master:
+   2024-07-29  Tobias Burnus  
+
+   PR middle-end/116107
+   * target.c (gomp_map_vars_internal): Honor array mapping offsets
+   with declare-target 'link' variables.
+   * testsuite/libgomp.c-c++-common/target-link-2.c: New test.
+
 2024-05-07  Jakub Jelinek  
 
Backported from master:
diff --git a/libgomp/target.c b/libgomp/target.c
index caa501c27acb..eb02d478e109 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1892,8 +1892,11 @@ gomp_map_vars_internal (struct gomp_device_descr 
*devicep,
if (k->aux && k->aux->link_key)
  {
/* Set link pointer on target to the device address of the
-  mapped object.  */
-   void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
+  mapped object.  Also deal with offsets due to
+  array-section mapping.  */
+   void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset
+  - (k->host_start
+ - 
k->aux->link_key->host_start));
/* We intentionally do not use coalescing here, as it's not
   data allocated by the current call to this function.  */
gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset,
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c 
b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
new file mode 100644
index ..15da1656ebf9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -0,0 +1,59 @@
+/* { dg-do run }  */
+/* PR middle-end/116107  */
+
+#include 
+
+int arr[15] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+#pragma omp declare target link(arr)
+
+#pragma omp begin declare target
+void f(int *res)
+{
+  __builtin_memcpy (res, [5], sizeof(int)*10);
+}
+
+void g(int *res)
+{
+  __builtin_memcpy (res, [3], sizeof(int)*10);
+}
+#pragma omp end declare target
+
+int main()
+{
+  int res[10], res2;
+  for (int dev = 0; dev < omp_get_num_devices(); dev++)
+{
+  __builtin_memset (res, 0, sizeof (res));
+  res2 = 99;
+
+  #pragma omp target enter data map(arr[5:10]) device(dev)
+
+  #pragma omp target map(from: res) device(dev)
+   f (res);
+
+  #pragma omp target map(from: res2) device(dev)
+   res2 = arr[5];
+
+  if (res2 != 6)
+   __builtin_abort ();
+  for (int i = 0; i < 10; i++)
+   if (res[i] != 6 + i)
+ __builtin_abort ();
+
+  #pragma omp target exit data map(release:arr[5:10]) device(dev)
+
+  for (int i = 0; i < 15; i++)
+   arr[i] *= 10;
+  __builtin_memset (res, 0, sizeof (res));
+
+  #pragma omp target enter data map(arr[3:10]) device(dev)
+
+  #pragma omp target map(from: res) device(dev)
+   g (res);
+
+  for (int i = 0; i < 10; i++)
+   if (res[i] != (4 + i)*10)
+ __builtin_abort ();
+}
+  return 0;
+}

[gcc/devel/omp/gcc-14] (33 commits) Merge remote-tracking branch 'origin/releases/gcc-14' into

2024-07-29 Thread Tobias Burnus via Gcc-cvs

The branch 'devel/omp/gcc-14' was updated to point to:

 8ad1a509662a... Merge remote-tracking branch 'origin/releases/gcc-14' into 

It previously pointed to:

 b71fc8d1382b... Merge remote-tracking branch 'origin/releases/gcc-14' into 

Diff:

Summary of changes (added commits):
---

  8ad1a50... Merge remote-tracking branch 'origin/releases/gcc-14' into 
  9e05aff... OpenMP/Fortran: Fix handling of 'declare target' with 'link
  c9e52a1... libgomp: Fix declare target link with offset array-section 
  98baaa1... Fix ICE with -fdump-tree-moref (*)
  affb2e8... i386: Fix AVX512 intrin macro typo (*)
  b858a51... Daily bump. (*)
  c3eef3d... Daily bump. (*)
  8eae5b0... Daily bump. (*)
  92eb0ee... Daily bump. (*)
  a32aff1... Regenerate gcc.pot (*)
  a7f07e5... Daily bump. (*)
  181f40f... testsuite: Fix up pr116034.c test for big/pdp endian [PR116 (*)
  ab03866... RISC-V: Disable Zba optimization pattern if XTheadMemIdx is (*)
  ae2909a... Daily bump. (*)
  a544898... testsuite: Disable finite math only for test  [PR115826] (*)
  b41487a... libstdc++: Use [[maybe_unused]] attribute in src/c++23/prin (*)
  5fad887... libstdc++: Do not use isatty on avr [PR115482] (*)
  084768c... ssa: Fix up maybe_rewrite_mem_ref_base complex type handlin (*)
  81f356f... i386: Change prefetchi output template (*)
  109b389... [powerpc] [testsuite] reorder dg directives [PR106069] (*)
  066c789... c++/coroutines: correct passing *this to promise type [PR10 (*)
  50ff112... c++: xobj fn call without obj [PR115783] (*)
  dfae324... Daily bump. (*)
  9ddd5f8... Fix modref's iteraction with store merging (*)
  bd535b4... rs6000: Catch unsupported ABI errors when using -mrop-prote (*)
  35e5c2d... rs6000: Error on CPUs and ABIs that don't support the ROP p (*)
  e2d746e... rs6000: ROP - Emit hashst and hashchk insns on Power8 and l (*)
  33ebeb2... rs6000: Compute rop_hash_save_offset for non-Altivec compil (*)
  c33532c... rs6000: Update ELFv2 stack frame comment showing the correc (*)
  27ef3a0... Fix modref_eaf_analysis::analyze_ssa_name handling of value (*)
  f2e9808... Fix accounting of offsets in unadjusted_ptr_and_unit_offset (*)
  c5397d3... Compare loop bounds in ipa-icf (*)
  9a7d668... Reduce recursive inlining of always_inline functions (*)

(*) This commit already exists in another branch.
Because the reference `refs/heads/devel/omp/gcc-14' matches
your hooks.email-new-commits-only configuration,
no separate email is sent for this commit.

[gcc r15-2378] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR115559]

2024-07-29 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:29b1587e7d34667a1fd63071c1e4f5475cd71026

commit r15-2378-g29b1587e7d34667a1fd63071c1e4f5475cd71026
Author: Tobias Burnus 
Date:   Mon Jul 29 11:46:57 2024 +0200

OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause 
[PR115559]

Contrary to a normal 'declare target', the 'declare target link' attribute
also needs to set node->offloadable and push the offload_vars in the front 
end.

Linked variables require that the data is mapped. For module variables, this
can happen anywhere. For variables in an external subprograms or the main
programm, this can only happen in the either that program itself or in an
internal subprogram. - Whether a variable is just normally mapped or linked 
then
becomes relevant if a device routine exists that can access that variable,
i.e. an internal procedure has then to be marked as declare target.

PR fortran/115559

gcc/fortran/ChangeLog:

* trans-common.cc (build_common_decl): Add 'omp declare target' and
'omp declare target link' variables to offload_vars.
* trans-decl.cc (add_attributes_to_decl): Likewise; update args and
call decl_attributes.
(get_proc_pointer_decl, gfc_get_extern_function_decl,
build_function_decl): Update calls.
(gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
to avoid errors with symtab_node::get_create.

libgomp/ChangeLog:

* testsuite/libgomp.fortran/declare-target-link.f90: New test.

Diff:
---
 gcc/fortran/trans-common.cc|  21 
 gcc/fortran/trans-decl.cc  |  81 +-
 .../libgomp.fortran/declare-target-link.f90| 116 +
 3 files changed, 192 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc
index 5f44e7bd663d..e714342c3c0b 100644
--- a/gcc/fortran/trans-common.cc
+++ b/gcc/fortran/trans-common.cc
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "cgraph.h"
+#include "context.h"
+#include "omp-offload.h"
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
@@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, 
bool is_init)
  = tree_cons (get_identifier ("omp declare target"),
   omp_clauses, DECL_ATTRIBUTES (decl));
 
+  if (com->omp_declare_target_link || com->omp_declare_target)
+   {
+ /* Add to offload_vars; get_create does so for omp_declare_target,
+omp_declare_target_link requires manual work.  */
+ gcc_assert (symtab_node::get (decl) == 0);
+ symtab_node *node = symtab_node::get_create (decl);
+ if (node != NULL && com->omp_declare_target_link)
+   {
+ node->offloadable = 1;
+ if (ENABLE_OFFLOADING)
+   {
+ g->have_offload = true;
+ if (is_a  (node))
+   vec_safe_push (offload_vars, decl);
+   }
+   }
+   }
+
   /* Place the back end declaration for this common block in
  GLOBAL_BINDING_LEVEL.  */
   gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 82fa2bb61349..0fdc41b1784b 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -46,7 +46,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-stmt.h"
 #include "gomp-constants.h"
 #include "gimplify.h"
+#include "context.h"
 #include "omp-general.h"
+#include "omp-offload.h"
 #include "attr-fnspec.h"
 #include "tree-iterator.h"
 #include "dependency.h"
@@ -1472,19 +1474,18 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
 }
 
 
-static tree
-add_attributes_to_decl (symbol_attribute sym_attr, tree list)
+static void
+add_attributes_to_decl (tree *decl_p, const gfc_symbol *sym)
 {
   unsigned id;
-  tree attr;
+  tree list = NULL_TREE;
+  symbol_attribute sym_attr = sym->attr;
 
   for (id = 0; id < EXT_ATTR_NUM; id++)
 if (sym_attr.ext_attr & (1 << id) && ext_attr_list[id].middle_end_name)
   {
-   attr = build_tree_list (
-get_identifier (ext_attr_list[id].middle_end_name),
-NULL_TREE);
-   list = chainon (list, attr);
+   tree ident = get_identifier (ext_attr_list[id].middle_end_name);
+   list = tree_cons (ident, NULL_TREE, list);
   }
 
   tree clauses = NULL_TREE;
@@ -1547,6 +1548,7 @@ add_attributes_to_decl (symbol_attribute sym_

[gcc r15-2377] libgomp: Fix declare target link with offset array-section mapping [PR116107]

2024-07-29 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:14c47e7eb06e8b95913794f6059560fc2fa6de91

commit r15-2377-g14c47e7eb06e8b95913794f6059560fc2fa6de91
Author: Tobias Burnus 
Date:   Mon Jul 29 11:40:38 2024 +0200

libgomp: Fix declare target link with offset array-section mapping 
[PR116107]

Assume that 'int var[100]' is 'omp declare target link(var)'. When now
mapping an array section with offset such as 'map(to:var[20:10])',
the device-side link pointer has to store &[0] minus
the offset such that var[20] will access [0]. But
the offset calculation was missed such that the device-side 'var' pointed
to the first element of the mapped data - and var[20] points beyond at
some invalid memory.

PR middle-end/116107

libgomp/ChangeLog:

* target.c (gomp_map_vars_internal): Honor array mapping offsets
with declare-target 'link' variables.
* testsuite/libgomp.c-c++-common/target-link-2.c: New test.

Diff:
---
 libgomp/target.c   |  7 ++-
 .../testsuite/libgomp.c-c++-common/target-link-2.c | 59 ++
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index aa01c1367b98..efed6ad68ff4 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1820,8 +1820,11 @@ gomp_map_vars_internal (struct gomp_device_descr 
*devicep,
if (k->aux && k->aux->link_key)
  {
/* Set link pointer on target to the device address of the
-  mapped object.  */
-   void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
+  mapped object.  Also deal with offsets due to
+  array-section mapping.  */
+   void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset
+  - (k->host_start
+ - 
k->aux->link_key->host_start));
/* We intentionally do not use coalescing here, as it's not
   data allocated by the current call to this function.  */
gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset,
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c 
b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
new file mode 100644
index ..15da1656ebf9
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -0,0 +1,59 @@
+/* { dg-do run }  */
+/* PR middle-end/116107  */
+
+#include 
+
+int arr[15] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+#pragma omp declare target link(arr)
+
+#pragma omp begin declare target
+void f(int *res)
+{
+  __builtin_memcpy (res, [5], sizeof(int)*10);
+}
+
+void g(int *res)
+{
+  __builtin_memcpy (res, [3], sizeof(int)*10);
+}
+#pragma omp end declare target
+
+int main()
+{
+  int res[10], res2;
+  for (int dev = 0; dev < omp_get_num_devices(); dev++)
+{
+  __builtin_memset (res, 0, sizeof (res));
+  res2 = 99;
+
+  #pragma omp target enter data map(arr[5:10]) device(dev)
+
+  #pragma omp target map(from: res) device(dev)
+   f (res);
+
+  #pragma omp target map(from: res2) device(dev)
+   res2 = arr[5];
+
+  if (res2 != 6)
+   __builtin_abort ();
+  for (int i = 0; i < 10; i++)
+   if (res[i] != 6 + i)
+ __builtin_abort ();
+
+  #pragma omp target exit data map(release:arr[5:10]) device(dev)
+
+  for (int i = 0; i < 15; i++)
+   arr[i] *= 10;
+  __builtin_memset (res, 0, sizeof (res));
+
+  #pragma omp target enter data map(arr[3:10]) device(dev)
+
+  #pragma omp target map(from: res) device(dev)
+   g (res);
+
+  for (int i = 0; i < 10; i++)
+   if (res[i] != (4 + i)*10)
+ __builtin_abort ();
+}
+  return 0;
+}

Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Tobias Burnus


Hi Andre, hi all,

Andre Vehreschild wrote:

yes, I could have looked harder 


I wrote ;-) on purpose as this feature is somewhat hidden and writing 
'dg-do compile' doesn't harm.


In case of gcc/testsuite, the 'run' is also needed and were often missed 
(or rather caused by invalid variants such as 'dg-run' (should be: 
'dg-do run') or '{dg-do run }' (missing space after '{') prevented the 
running of the code). Sam did fix some of those (and some other dg-* 
issues) recently, e.g. in r15-2349-ga75c6295252d0d (→ 
https://gcc.gnu.org/r15-2349-ga75c6295252d0d ).



This isn't by any chance documented on the developer website of gcc somewhere?
It would be sad, if that knowledge is not publicy available for the future.


https://gcc.gnu.org/onlinedocs/gccint/Directives.html#Specify-how-to-build-the-test 
documents it.


And libgomp has: lib/libgomp.exp:set dg-do-what-default run

The all arguments vs. only -O2 is set in libgomp via:

libgomp.c++/c++.exp:    set DEFAULT_CFLAGS "-O2"

libgomp.c/c.exp:    set DEFAULT_CFLAGS "-O2"

and for libgomp.*fortran/fortran.exp, the difference between 'dg-do run' 
vs. default is *not* *documented,* but seems to be the result of the 
following:


# For Fortran we're doing torture testing, as Fortran has far more tests
# with arrays etc. that testing just -O0 or -O2 is insufficient, that is
# typically not the case for C/C++.
gfortran-dg-runtest $tests "" ""


Tobias

Re: [Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-29 Thread Tobias Burnus


Hi Andre,

Andre Vehreschild wrote:

I am wondering why the testcase has no `!{ dg-do ... }` line. What will dejagnu
do then? Sorry for the may be stupid question, but I never encountered a
testcase without a dg-do line. It was the minimum for me.


Well, then you need look harder ;-)

In gcc/testsuite/, the default is '{ dg-do compile }', i.e. you can
specify or leave out that line without any additional effect. Having it
might be a tad clearer, albeit makes the test a tad longer.

But if you want to 'run' or 'link', you need to specify the dg-do line.
There are several files which don't have the "dg-do compile" line, also
under gcc/testsuite/gfortran.dg

In case of libgomp, it is becomes interesting: the default is running
the code, i.e. you need a 'compile' or 'link' when it shouldn't be run.

However, at least for Fortran (libgomp.{oacc-}fortran), there is a
difference between specifying nothing and specifying 'dg-do run': In
case of the default, it is compiled and run. But if you specify 'dg-do
run', it is compiled multiple times with different optimization options
and then run.

(Actually, also under gcc/testsuite/gfortran.dg, you get multiple
compilations + runs with 'dg-do run'. If you use dg-additional-options,
you can also add options. I think with dg-options, you set it to a
single run [not confirmed].)

The downside of compiling + running it multiple times is a longer test
time without any real benefit. However, especially with Fortran,
compiling with different optimization levels did expose issues in the
past, both in the Fortran front end and in the middle end. — Thus, there
some benefit of using it.

In any case, there more complex the code is that front-end + middle-end
code have to process, the more useful is "dg-do run". The more work is
done by the run-time library, be it libgfortran or libgomp, the less
useful it becomes as the heavy lifting is done in the run-time library.
— As libgomp progressing already takes quite some time (albeit it can
now run in parallel), there are some who prefer few 'dg-do run' and
others who prefer if all Fortran testcases there use 'dg-do run' …

I hope it helps,

Tobias

[Patch, v2] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-07-26 Thread Tobias Burnus


Updated patch - only change is to the testcase:

* With the just posted patch for PR116107, array sections with offset 
work for 'link', hence, I updated the testcase.


* For 'arr2', I added ref to the associated PR.

I intent to commit it once PR116107 has been committed.

Tobias

Tobias Burnus wrote:

Hi all,

it turned out that 'declare target' with 'link' clause was broken in multiple 
ways.

The main fix is the attached patch, i.e. namely pushing the variables already to
the offload-vars list already in the FE.

When implementing it, I noticed:
* C has a similar issue when using nested functions, which is
   a GNU extension →https://gcc.gnu.org/115574

* When doing partial mapping of arrays (which is one of the reasons for 'link'),
   offsets are mishandled in Fortran (not tested in C), see FIXME in the patch)
   There: arr2(10) should print 10 but with map(arr2(10:)) it prints 19.
   (I will file a PR about this).

* It might happen that linked variables do not get linked. I have not 
investigated
   why, but 'arr2' gives link errors – while 'arr' works.
   See FIXME in the patch. (I will file a PR about this)

* For COMMON blocks, map(/common/) is rejected,https://gcc.gnu.org/PR115577

* When then mapping map(a,b,c) which is identical for 'common /mycom/ a,b,c',
   it fails to link the device side as the 'mycom_' symbol cannot be found on 
the
   device side.  (I will file a PR about this)

As COMMON as issues, an alternative would be to defer the trans-common.cc
changes to a later patch.

Comments, questions, concerns?

Tobias

PS: Tested with nvptx offloading with a page-migration supporting system with
nvptx and GCN offloading configured and no new fails observed.OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

Contrary to a normal 'declare target', the 'declare target link' attribute
also needs to set node->offloadable and push the offload_vars in the front end.

Linked variables require that the data is mapped. For module variables, this
can happen anywhere. For variables in an external subprograms or the main
programm, this can only happen in the either that program itself or in an
internal subprogram. - Whether a variable is just normally mapped or linked then
becomes relevant if a device routine exists that can access that variable,
i.e. an internal procedure has then to be marked as declare target.

	PR fortran/115559

gcc/fortran/ChangeLog:

	* trans-common.cc (build_common_decl): Add 'omp declare target' and
	'omp declare target link' variables to offload_vars.
	* trans-decl.cc (add_attributes_to_decl): Likewise; update args and
	call decl_attributes.
	(get_proc_pointer_decl, gfc_get_extern_function_decl,
	build_function_decl): Update calls.
	(gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
	to avoid errors with symtab_node::get_create.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: New test.

 gcc/fortran/trans-common.cc|  21 
 gcc/fortran/trans-decl.cc  |  81 +-
 .../libgomp.fortran/declare-target-link.f90| 116 +
 3 files changed, 192 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc
index 5f44e7bd663..e714342c3c0 100644
--- a/gcc/fortran/trans-common.cc
+++ b/gcc/fortran/trans-common.cc
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "cgraph.h"
+#include "context.h"
+#include "omp-offload.h"
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
@@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init)
 	  = tree_cons (get_identifier ("omp declare target"),
 		   omp_clauses, DECL_ATTRIBUTES (decl));
 
+  if (com->omp_declare_target_link || com->omp_declare_target)
+	{
+	  /* Add to offload_vars; get_create does so for omp_declare_target,
+	 omp_declare_target_link requires manual work.  */
+	  gcc_assert (symtab_node::get (decl) == 0);
+	  symtab_node *node = symtab_node::get_create (decl);
+	  if (node != NULL && com->omp_declare_target_link)
+	{
+	  node->offloadable = 1;
+	  if (ENABLE_OFFLOADING)
+		{
+		  g->have_offload = true;
+		  if (is_a  (node))
+		vec_safe_push (offload_vars, decl);
+		}
+	}
+	}
+
   /* Place the back end declaration for this common block in
  GLOBAL_BINDING_LEVEL.  */
   gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 82fa2bb6134..0fdc41b1784 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -46,7 +46,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-stmt.h"
 #include "gomp-cons

[Patch] libgomp: Fix declare target link with offset array-section mapping [PR116107]

2024-07-26 Thread Tobias Burnus


The main idea of 'link' is to permit putting only a subset of a
huge array on the device. Well, in order to make this work properly,
it requires that one can map an array section, which does not
start with the first element.

This patch adjusts the pointers such, that this actually works.

(Tested on x86-64-gnu-linux with Nvptx offloading.)
Comments, suggestions, remarks before I commit it?

Tobias
libgomp: Fix declare target link with offset array-section mapping [PR116107]

Assume that 'int var[100]' is 'omp declare target link(var)'. When now
mapping an array section with offset such as 'map(to:var[20:10])',
the device-side link pointer has to store &[0] minus
the offset such that var[20] will access [0]. But
the offset calculation was missed such that the device-side 'var' pointed
to the first element of the mapped data - and var[20] points beyond at
some invalid memory.

	PR middle-end/116107

libgomp/ChangeLog:

	* target.c (gomp_map_vars_internal): Honor array mapping offsets
	with declare-target 'link' variables.
	* testsuite/libgomp.c-c++-common/target-link-2.c: New test.

 libgomp/target.c   |  7 ++-
 .../testsuite/libgomp.c-c++-common/target-link-2.c | 59 ++
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/libgomp/target.c b/libgomp/target.c
index aa01c1367b9..e3e648f5443 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1820,8 +1820,11 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
 		if (k->aux && k->aux->link_key)
 		  {
 		/* Set link pointer on target to the device address of the
-		   mapped object.  */
-		void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset);
+		   mapped object. Also deal with offsets due to
+		   array-section mapping. */
+		void *tgt_addr = (void *) (tgt->tgt_start + k->tgt_offset
+	   - (k->host_start
+		  - k->aux->link_key->host_start));
 		/* We intentionally do not use coalescing here, as it's not
 		   data allocated by the current call to this function.  */
 		gomp_copy_host2dev (devicep, aq, (void *) n->tgt_offset,
diff --git a/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
new file mode 100644
index 000..4ff4080da76
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/target-link-2.c
@@ -0,0 +1,59 @@
+/* PR middle-end/116107  */
+
+#include 
+
+int arr[15] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
+#pragma omp declare target link(arr)
+
+#pragma omp begin declare target
+void f(int *res)
+{
+  __builtin_memcpy (res, [5], sizeof(int)*10);
+}
+
+void g(int *res)
+{
+  __builtin_memcpy (res, [3], sizeof(int)*10);
+}
+#pragma omp end declare target
+
+int main()
+{
+  int res[10], res2;
+  for (int dev = 0; dev < omp_get_num_devices(); dev++)
+{
+  __builtin_memset (res, 0, sizeof (res));
+  res2 = 99;
+
+  #pragma omp target enter data map(arr[5:10]) device(dev)
+
+  #pragma omp target map(from: res) device(dev)
+	f (res);
+
+  #pragma omp target map(from: res2) device(dev)
+	res2 = arr[5];
+
+  if (res2 != 6)
+	__builtin_abort ();
+  for (int i = 0; i < 10; i++)
+	if (res[i] != 6 + i)
+	  __builtin_abort ();
+
+  #pragma omp target exit data map(release:arr[5:10]) device(dev)
+
+  for (int i = 0; i < 15; i++)
+	res[i] *= 10;
+	  __builtin_abort ();
+
+  #pragma omp target enter data map(arr[3:10]) device(dev)
+  __builtin_memset (res, 0, sizeof (res));
+
+  #pragma omp target map(from: res) device(dev)
+	g (res);
+
+  for (int i = 0; i < 10; i++)
+	if (res[i] != (4 + i)*10)
+	  __builtin_abort ();
+}
+  return 0;
+}

Re: [PATCH v3 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-25 Thread Tobias Burnus


Hi Sandra,

thanks for your patch. (Disclaimer: I have not finished reading through 
your patch.)


Some upfront generic remarks:

[* When first compiling it (incremental build), I did run into the issue 
that OMP_METADIRECTIVE_CHECK wasn't declared. Thus, there seems to be a 
dependency issue causing that tree-check.h might generated after code 
that includes tree.h is processed. (Unrelated to your patch itself, but 
for completeness …)]


* Not required right now, but eventually we need to check whether 
https://gcc.gnu.org/PR112779 is fully fixed by this patch set or whether 
follow-up work is required (and if so which). There is also PR107067 for 
a Fortran ICE.


* There are some not-implemented/FIXME comments in the patches for 
missing features. I think we should ensure that those won't get 
forgotten, e.g. by filing PRs for those. – For declare variant, some PRs 
might already exist.


Can you eventually take care of the last two items?

(For the last item: e.g. 'target_device' for declare_variant, for which 
'sorry' already existed.)


* * *

I might have asked the following question before – and you might have 
answered it already:


Sandra Loosemore wrote:


This patch adds the OMP_METADIRECTIVE tree node and shared tree-level
support for manipulating metadirectives.  It defines/exposes
interfaces that will be used in subsequent patches that add front-end
and middle-end support, but nothing generates these nodes yet.


I have to admit that I do not understand the part:


+  else if (set == OMP_TRAIT_SET_TARGET_DEVICE)
+/* The target_device set is dynamic, so treat it as always
+   resolvable.  */
+continue;
+


The current code has 3 states:

* 0 - if a trait is false; this directly returns as it cannot be fixed later

* 1 - if the all traits are known to match (initial value)

* -1 - if one trait cannot be evaluated, either because it is too early 
(e.g. during parsing) or because it is a dynamic context selector.


Thus, I had expected:

(a) ret = -1 as default in this case (not known)

(b) for cases where it is known, a 'return 0' / not-setting -1. In 
particular:


* n == const → device_num(n) – false if '< -1' and, for 
'!ENABLE_OFFLOADING || offload_targets == NULL' either false for n > 0 
or otherwise false.


* Checks similar to OMP_TRAIT_DEVICE_{KIND,ARCH,ISA}, i.e. kind(any) → 
true, kind(fpga) → false, arch(something_unknown) → false if not true 
for any device. With '!ENABLE_OFFLOADING || offload_targets == NULL', 
the kind_arch_isa check can be done as for the host.


* * *

Have I missed something and is it sensible to return 1 instead of -1 here?

* * *



@@ -1804,6 +1834,12 @@ omp_context_selector_matches (tree ctx)


   case OMP_TRAIT_USER_CONDITION:
 if (set == OMP_TRAIT_SET_USER)

for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
  if (OMP_TP_NAME (p) == NULL_TREE)
{
+ /* OpenMP 5.1 allows non-constant conditions for
+metadirectives.  */
+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
+

 if (integer_zerop (OMP_TP_VALUE (p)))
   return 0;
 if (integer_nonzerop (OMP_TP_VALUE (p)))
   break;
 ret = -1;
   }



* Comment wording: Please change to imply >= 5.1 not == 5.0 * Comment: I 
don't see why the non-const only applies to metadirectives; the OpenMP 
>= 5.1 seems to imply that it is also valid for declare variant. Thus, 
I would change the wording. * The current code seems to already handle 
non-const values as expected. ... except that it changes "res" to -1, 
while the idea seems to be not to modify 'ret' in this case for 
metadirectives. (Why? Same question as above).

* * *

Quotes from the specifications regarding the expressions:

The current spec has:

"Restrictions to context selectors are as follows:" …

"A variable or procedure that is referenced in an expression that 
appears in a context selector
must be visible at the location of the directive on which the context 
selector appears unless
the directive is a declare_variant directive and the variable is an 
argument of the

associated base function."

5.1 wording is the following (approx. same except for argument bit):

"All variables that are referenced in an expression that appears in
the context selector of a match clause must be accessible at a call site 
to the base function

according to the base language rules."

5.0 had (e.g. for C): "The condition(boolean-expr) selector defines a 
constant expression that must evaluate to true for the selector to be true."


* * *


+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
+
  if (integer_zerop

[gcc r15-2226] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:b95c82d60c8c88f6346c5602f2e22a4531afe47c

commit r15-2226-gb95c82d60c8c88f6346c5602f2e22a4531afe47c
Author: Tobias Burnus 
Date:   Tue Jul 23 12:41:40 2024 +0200

install.texi (gcn): Suggest newer commit for Newlib

Newlib 4.4.0 lacks two commits: 7dd4eb1db (2024-03-25) to fix device console
output for GFX10/GFX11 and ed50a50b9 (2024-04-04) to make the added lock.h
compilable with C++. This commit mentiones now also the second commit.

gcc/ChangeLog:

* doc/install.texi (amdgcn-x-amdhsa): Suggest newer git version
for newlib.

Diff:
---
 gcc/doc/install.texi | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index b54569925837..dda623f4410a 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3952,9 +3952,9 @@ Instead of GNU Binutils, you will need to install LLVM 
15, or later, and copy
 by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
-Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
-GFX11 devices).
+Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commits
+7dd4eb1db and ed50a50b9 (2024-04-04, post-4.4.0) fix device console output
+for GFX10 and GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use

[Patch] install.texi (gcn): Suggest newer commit for Newlib

2024-07-23 Thread Tobias Burnus


Hi Andrew, hi all,

to be compatible with C++ (and Thomas' WIP work for GCN C++ support), I 
suggest the attach patch that also suggest Thomas' Newlib commit (April 
4, 2024)


ed50a50b9   amdgcn: Implement proper locks: Fix 
'newlib/libc/sys/amdgcn/include/sys/lock.h' for C++


and not only your commit (March 25, 2024)

7dd4eb1db amdgcn: Implement proper locks

Comments or suggestions before I commit it?

Tobias
install.texi (gcn): Suggest newer commit for Newlib

Newlib 4.4.0 lacks two commits: 7dd4eb1db (2024-03-25) to fix device console
output for GFX10/GFX11 and ed50a50b9 (2024-04-04) to make the added lock.h
compilable with C++. This commit mentiones now also the second commit.

gcc/ChangeLog:

	* doc/install.texi (amdgcn-x-amdhsa): Suggest newer git version
	for newlib.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index b5456992583..dda623f4410 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3952,9 +3952,9 @@ Instead of GNU Binutils, you will need to install LLVM 15, or later, and copy
 by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
-Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
-GFX11 devices).
+Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commits
+7dd4eb1db and ed50a50b9 (2024-04-04, post-4.4.0) fix device console output
+for GFX10 and GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the
 @uref{https://rocm.docs.amd.com/,,ROCm Platform}, and use

[gcc/devel/omp/gcc-14] Merge remote-tracking branch 'origin/releases/gcc-14' into devel/omp/gcc-14

2024-07-22 Thread Tobias Burnus via Libstdc++-cvs

https://gcc.gnu.org/g:b71fc8d1382ba569126b3a905cea7927a37e98ff

commit b71fc8d1382ba569126b3a905cea7927a37e98ff
Merge: 8678fc697046 323d010fa5d4
Author: Tobias Burnus 
Date:   Mon Jul 22 12:08:44 2024 +0200

Merge remote-tracking branch 'origin/releases/gcc-14' into devel/omp/gcc-14

Merge up to commit 323d010fa5d (22nd July 2024)

Diff:

 contrib/ChangeLog  | 7 +
 gcc/ChangeLog  |  1351 ++
 gcc/DATESTAMP  | 2 +-
 gcc/ada/ChangeLog  |20 +
 gcc/ada/Makefile.rtl   |13 +-
 gcc/ada/exp_ch6.adb|11 +-
 gcc/ada/exp_util.adb   | 6 +
 gcc/ada/sem_ch6.adb|12 +-
 gcc/analyzer/ChangeLog |22 +
 gcc/analyzer/access-diagram.cc | 3 +-
 gcc/analyzer/diagnostic-manager.cc |18 +-
 gcc/analyzer/infinite-loop.cc  | 2 +-
 gcc/analyzer/infinite-recursion.cc | 2 +-
 gcc/analyzer/varargs.cc| 2 +-
 gcc/attribs.cc |20 +-
 gcc/builtins.cc|22 +-
 gcc/c-family/ChangeLog |43 +
 gcc/c-family/c-common.cc   |18 +-
 gcc/c-family/c-opts.cc | 2 +-
 gcc/c-family/c-warn.cc |13 +-
 gcc/c-family/c.opt | 2 +-
 gcc/c/ChangeLog|32 +
 gcc/c/c-decl.cc|53 +-
 gcc/c/c-parser.cc  | 4 +-
 gcc/combine.cc | 6 +-
 gcc/common/config/i386/cpuinfo.h   | 4 +-
 gcc/common/config/i386/i386-common.cc  | 4 +-
 gcc/common/config/i386/i386-cpuinfo.h  | 5 +-
 gcc/common/config/i386/i386-isas.h | 4 +-
 gcc/common/config/riscv/riscv-common.cc|   182 +-
 gcc/config/aarch64/aarch64-c.cc| 6 +
 gcc/config/aarch64/aarch64-cores.def   | 2 +
 gcc/config/aarch64/aarch64-ldp-fusion.cc   | 4 +-
 gcc/config/aarch64/aarch64-simd.md | 2 -
 gcc/config/aarch64/aarch64-tune.md | 2 +-
 gcc/config/alpha/alpha.cc  |12 +
 gcc/config/alpha/alpha.md  |31 +-
 gcc/config/alpha/constraints.md| 2 +-
 gcc/config/arm/arm.cc  |   135 +-
 gcc/config/arm/arm.h   | 4 +-
 gcc/config/arm/arm.md  | 8 +-
 gcc/config/arm/mve.md  | 2 +-
 gcc/config/arm/predicates.md   | 5 +
 gcc/config/arm/sync.md | 4 +-
 gcc/config/avr/avr-dimode.md   |26 +-
 gcc/config/avr/avr.cc  |41 +-
 gcc/config/avr/avr.md  |64 +-
 gcc/config/i386/i386-expand.cc | 7 +
 gcc/config/i386/i386-options.cc|79 +-
 gcc/config/i386/i386.cc|   236 +-
 gcc/config/i386/i386.h | 6 +-
 gcc/config/i386/i386.md|10 +-
 gcc/config/i386/mingw-w64.h| 2 +
 gcc/config/i386/mingw32.h  | 2 +
 gcc/config/i386/x86-tune-costs.h   |10 +-
 gcc/config/i386/x86-tune.def   |13 +-
 gcc/config/loongarch/loongarch.cc  |19 +-
 gcc/config/loongarch/loongarch.h   | 7 -
 gcc/config/mips/mips.cc|11 +-
 gcc/config/pa/pa.md|18 -
 gcc/config/pa/pa32-linux.h | 5 +
 gcc/config/riscv/autovec.md| 5 +-
 gcc/config/riscv/elf.h | 1 +
 gcc/config/riscv/freebsd.h | 1 +
 gcc/config/riscv/linux.h   | 1 +
 gcc/config/riscv/riscv-c.cc| 2 +-
 gcc/config/riscv/riscv-protos.h| 4 +
 gcc/config/riscv/riscv-subset.h|12 +-
 gcc/config/riscv/riscv-target-attr.cc  |   119 +-
 gcc/config/riscv/riscv-vector-builtins.cc  |51 +
 gcc/config/riscv/riscv.cc  |90 +-
 gcc/config/riscv/riscv.opt | 6 +-
 gcc/config/riscv/vector-iterators.md   | 6 +
 gcc/config/riscv/vector.md |   131 +-
 gcc

[gcc/devel/omp/gcc-14] (312 commits) Merge remote-tracking branch 'origin/releases/gcc-14' into

2024-07-22 Thread Tobias Burnus via Gcc-cvs

The branch 'devel/omp/gcc-14' was updated to point to:

 b71fc8d1382b... Merge remote-tracking branch 'origin/releases/gcc-14' into 

It previously pointed to:

 8678fc697046... Revert "[og10] vect: Add target hook to prefer gather/scatt

Diff:

Summary of changes (added commits):
---

  b71fc8d... Merge remote-tracking branch 'origin/releases/gcc-14' into 
  323d010... [PR115565] cse: Don't use a valid regno for non-register in (*)
  91a6faf... Daily bump. (*)
  043f3ad... Daily bump. (*)
  bb34b7e... s390: Fix unresolved iterators bhfgq and xdee (*)
  2eca8a9... Avoid undefined behaviour in build_option_suggestions (*)
  94e4661... Revert "Fortran: Auto array allocation with function depend (*)
  6b6a056... Daily bump. (*)
  d15664f... Fortran: Fix wrong code in unlimited polymorphic assignment (*)
  5034af8... Fortran: Auto array allocation with function dependencies [ (*)
  1205104... rs6000: Fix .machine cpu selection w/ altivec [PR97367] (*)
  ca0fa18... Fortran: character array constructor with >= 4 constant ele (*)
  187eec8... Fix Xcode 16 build break with NULL != nullptr (*)
  0abce41... RISC-V: Split vwadd.wx and vwsub.wx and add helpers. (*)
  937713a... RISC-V: Do not allow v0 as dest when merging [PR115068]. (*)
  3a7e796... RISC-V: Add -X to link spec (*)
  92003fa... RISC-V: Fix parsing of Zic* extensions (*)
  68ef0c3... RISC-V: Bugfix ICE for RVV intrinisc vfw on _Float16 scalar (*)
  c38dbfc... RISC-V: Fix missing boolean_expression in zmmul extension (*)
  4db3875... RISC-V: Bugfix vec_extract v mode iterator restriction mism (*)
  87346ed... RISC-V: Bugfix vec_extract vls mode iterator restriction mi (*)
  c32995c... [PATCH] RISC-V: Fix unrecognizable pattern in riscv_expand_ (*)
  2d7dda8... RISC-V: Use tu policy for first-element vec_set [PR115725]. (*)
  b218c42... [RISC-V] add implied extension repeatly until stable (*)
  a2a2916... Daily bump. (*)
  493035c... eh: ICE with std::initializer_list and ASan [PR115865] (*)
  747c4b5... Do not use caller-saved registers for COMDAT functions (*)
  c314867... c++: ICE with __has_unique_object_representations [PR115476 (*)
  a4c9ade... i386: PR target/115351: RTX costs for *concatditi3 and *ins (*)
  b0452ed... analyzer: fix ICE seen with -fsanitize=undefined [PR114899] (*)
  0b7ec50... Fix points_to_local_or_readonly_memory_p wrt TARGET_MEM_REF (*)
  0f593e4... PR tree-optimization/113673: Avoid load merging when potent (*)
  0fbad21... testsuite: Fix up builtin-clear-padding-3.c for -funsigned- (*)
  f0c3a1c... c++/modules: Conditionally start timer during lazy load [PR (*)
  4871b0f... Daily bump. (*)
  1bbfe78... c++: constrained partial spec type context [PR111890] (*)
  2249c63... c++: alias template with dependent attributes [PR115897] (*)
  79c5a09... c++: bad 'this' conversion for nullary memfn [PR106760] (*)
  3a963d4... alpha: Fix duplicate !tlsgd!62 assemble error [PR115526] (*)
  01dfc5b... bitint: Use gsi_insert_on_edge rather than gsi_insert_on_ed (*)
  d668f87... gimple-fold: Fix up __builtin_clear_padding lowering [PR115 (*)
  297ea7e... c++: Fix ICE on constexpr placement new [PR115754] (*)
  bf64404... vect: Merge loop mask and cond_op mask in fold-left reducti (*)
  c58bede... tree-optimization/115868 - ICE with .MASK_CALL in simdclone (*)
  5fad0b5... c++/modules: Propagate BINDING_VECTOR_*_DUPS_P on realloc [ (*)
  4039c74... Daily bump. (*)
  59ed01d... tree-optimization/115841 - reduction epilogue placement iss (*)
  06829e5... tree-optimization/115843 - fix wrong-code with fully-masked (*)
  e01012c... tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_ (*)
  6f74a5f... tree-optimization/115701 - factor out maybe_duplicate_ssa_i (*)
  ca275b6... tree-optimization/115867 - ICE with simdcall vectorization  (*)
  4a04110... Fixup unaligned load/store cost for znver5 (*)
  d702a95... Fixup unaligned load/store cost for znver4 (*)
  c8fdef7... [alpha] adjust MEM alignment for block move [PR115459] (*)
  b3cff83... RISC-V: Allow adding enabled extension via target arch attr (*)
  0e1f599... RISC-V: Rewrite target attribute handling (*)
  b604d59... RISC-V: Fix comment/naming in attribute parsing code (*)
  20fb450... RISC-V: Deduplicate arch subset list processing (*)
  ea5907d... RISC-V: testsuite: Properly gate LTO tests (*)
  7bc63f1... [i386] adjust flag_omit_frame_pointer in a single function  (*)
  102bcf1... [i386] restore recompute to override opts after change [PR1 (*)
  1fff665... x86: Update branch hint for Redwood Cove. (*)
  0fcadb3... Daily bump. (*)
  71ec9ed... Fortran: improve attribute conflict checking [PR93635] (*)
  13bfc38... Fix SSA_NAME leak due to def_stmt is removed before use_stm (*)
  53dd1ce... Daily bump. (*)
  c80a746... fortran: Assume there is no cyclic reference with submodule (*)
  55988c4... fortran: Correctly evaluate scalar MASK arguments of MINLOC (*)
  8197264... Daily bump. (*)
  89f9342... LoongArch: TFmode is not allowed to be stored in

Re: [PATCH v2 3/8] OpenMP: middle-end support for dispatch + adjust_args

2024-07-22 Thread Tobias Burnus


Hi PA,

as discussed off list, I was stumbling over the call to GOMP_task. I now 
understand why: I was looking at a different version of the OpenMP spec.


Namely, OpenMP 5.2 contains the changes for spec Issue 2741 "dispatch 
construct data scoping issues". Namely: Performance issue due to 'task' 
compared to direct call, effect of unintended firstprivatization, …


The currrent version has

(a) nowait

"The addition of the *nowait* element to the semantic requirement set by 
the *dispatch* directive has no effect on the dispatch construct apart 
from the effect it may have on the arguments that are passed when 
calling a function variant." (I assume the latter is about 'append_args' 
of interop objects)


(b) depend

"If the *dispatch* directive adds one or more _depend_ element to the 
semantic requirement set, and those element are not removed by the 
effect of a declare variant directive, the behavior is as if those 
properties were applied as *depend* clauses to a *taskwait* construct 
that is executed before the *dispatch* region is executed."


I think it would good to match the 5.2 behavior.

* * *

I have not fully checked whether the 'device' routine is properly 
handled. The current wording states:


"If the device clause is present, the value of the default-device-var 
ICV is set to the value of the expression in the clause on entry to the 
dispatch region and is restored to its previous value at the end of the 
region."


For the code itself, it seems to be handled correctly, see attached 
testcase (consider including).


I was wondering (and haven't checked) whether the ICV is set for too 
much (i.e. not only the "data environment" (i.e.
"The variables associated with the execution of a given region"), but is 
also imminently visible by other concurrently running threads outside of 
that region).


Can you check. (Albeit, my question might also be answered once I finish 
reading the patch …)


Thanks,

Tobias
#include 

int f ()
{
  return omp_get_default_device ();
}

int main ()
{
  for (int d = omp_initial_device; d <= omp_get_num_devices (); d++)
{
  int dev = omp_invalid_device;
  omp_set_default_device (d);

  #pragma omp dispatch
	dev = f ();

  if (d == omp_initial_device || d == omp_get_num_devices ())
	{
	  if (dev != omp_initial_device && dev != omp_get_num_devices ())
	__builtin_abort ();
	  if (omp_get_default_device() != omp_initial_device
	  && omp_get_default_device() != omp_get_num_devices ())
	__builtin_abort ();
	}
  else
	if (dev != d || d != omp_get_default_device())
	  __builtin_abort ();

  for (int d2 = omp_initial_device; d2 <= omp_get_num_devices (); d2++)
	{
	  dev = omp_invalid_device;
	  #pragma omp dispatch device(d2)
	dev = f ();

	  if (d == omp_initial_device || d == omp_get_num_devices ())
	{
	  if (omp_get_default_device() != omp_initial_device
		  && omp_get_default_device() != omp_get_num_devices ())
		__builtin_abort ();
	}
	  else if (d != omp_get_default_device())
	__builtin_abort ();

	  if (d2 == omp_initial_device || d2 == omp_get_num_devices ())
	{
	  if (dev != omp_initial_device && dev != omp_get_num_devices ())
		__builtin_abort ();
	}
	  else if (dev != d2)
	__builtin_abort ();
	}
}
  return 0;
}

[Patch, v3] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-07-19 Thread Tobias Burnus


Hi,

Jakub Jelinek wrote:

+  "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == 
__STDC_EMBED_FOUND__\n"

If this was an attempt to deal gracefully with no #embed support, then
the above would be wrong and should have been
#if defined(__STDC_EMBED_FOUND__) && defined(__has_embed)
#if __has_embed ("whatever") == __STDC_EMBED_FOUND__


I was kind of both – assuming that #embed is available (as it should be 
compiled by the accompanied compiler) but handle the case that it is not.


However, as '#embed' is well diagnosed if unsupported, that part is not 
really needed.



Now, if all you want is an error if the file doesn't exist, then
#embed "whatever"
will do that too […]

If you want an error not just when it doesn't exist, but also when it
is empty, then you could do
#embed "whatever" if_empty (%%%)


The idea was to also error out if the file is empty – as that shouldn't 
happen here: if offloading code was found, the code gen should be done. 
However, using an invalid expression seems to be a good idea as that's 
really a special case that shouldn't happen.


* * *

I have additionally replaced the #include by __UINTPTR_TYPE__ and 
__SIZE_TYPE__ to avoid including 3 header files; this doesn't have a 
large effect, but still.


Updated patch attached.

OK for mainline, once Jakub's #embed is committed?

* * *

BTW: Testing shows for a hello world program (w/o #embed patch)

For -foffload=...: 'disable' 0.04s, 'nvptx-none' 0.15s, 'amdgcn-amdhsa' 
1.2s.


With a simple #embed (this patch plus Jakub's first patch), the 
performance is unchanged. I then applied Jakub's follow up patches, but 
I then get an ICE (Jakub will have a look).


But compiling it with 'g++' (→ COLLECT_GCC is g++) works; result: takes 
0.2s (~6× faster) and compiling for both nvptx and gcn takes 0.3s, 
nearly 5× faster.


Tobias
 gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (read_file): Remove.
	(process_asm): Do not add '#include' to generated C file.
	(process_obj): Generate C file that uses #embed and use
	__SIZE_TYPE__ and __UINTPTR_TYPE__ instead the #include-defined
	size_t and uintptr.
	(main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 79 +++--
 1 file changed, 12 insertions(+), 67 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..c3c998639ff 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-{
-  /* Get the file size.  */
-  long s = ftell (stream);
-  if (s >= 0)
-	alloc = s + 100;
-  fseek (stream, 0, SEEK_SET);
-}
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-{
-  size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-  if (!n)
-	break;
-  base += n;
-  if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-}
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -657,10 +619,6 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
   struct oaccdims *dims = XOBFINISH (_os, struct oaccdims *);
   struct regcount *regcounts = XOBFINISH (_os, struct regcount *);
 
-  fprintf (cfile, "#include \n");
-  fprintf (cfile, "#include \n");
-  fprintf (cfile, "#include \n\n");
-
   fprintf (cfile, "static const int gcn_num_vars = %d;\n\n", var_count);
   fprintf (cfile, "static const int gcn_num_ind_funcs = %d;\n\n", ind_fn_count);
 
@@ -725,35 +683,28 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, );
-
   /* Dump out an array containing the binary.
- FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-{
-  fprintf (cfile, "\n\t");
-  for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-}
-  fprintf (cfile, "\n};\n\n");
+ If the file is empty, a parse error is shown as the argument to is_empty
+ is an undeclared identifier.  */
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#embed \"%s\" if_empty (error_file_is_empty)\n"
+	   "};\n\n", fname_in);

Re: [PATCH v2 3/8] OpenMP: middle-end support for dispatch + adjust_args

2024-07-18 Thread Tobias Burnus


Hi PA,

not yet a full review, but some observations:

First: Please include the change
  gcc/fortran/types.def (BT_FN_PTR_CONST_PTR_INT)
of "[PATCH v2 7/8] OpenMP: Fortran front-end support for dispatch + 
adjust_args"


Do so either in this patch (3/8) - or in the previous (2/8) one that 
adds it to gcc/builtin-types.def.


Otherwise this will break the build as omp-builtins.def (modified
in this patch) is also used by gfortran.
Causing intermittened build fails is bad - first, in general, and
secondly it causes issues when bisecting.

* * *

If I try your testcase and move "bar" and "baz" *after* 'foo' and leave 
only the following before:


int baz (double *d_bv, const double *d_av, int n);
int bar (double *d_bv, const double *d_av, int n);

it fails at runtime with:

ERROR at 1: 0.00 (act) != 2.718280 (exp)

as the two calls to __builtin_omp_get_mapped_ptr are now missing.

With both the declaration and the definition before the declare target, 
it works.


* * *

I think this variant needs to be either supported – or an error has to 
be printed that it cannot be supported, but that would be rather 
unfortunate.


Thanks,

Tobias

Re: [PATCH v2 2/8] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces

2024-07-18 Thread Tobias Burnus


Paul-Antoine Arras wrote:

This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.


LGTM.

OFF TOPIC regarding "OMP_TRAIT_SET_NEED_DEVICE_PTR" and
"pseudo-set selector used to convey argument list until variant has a decl":
This reminds me vaguely of the issue that we should store the variant 
declarations with the base function and not with the variant, cf.

https://gcc.gnu.org/PR113905

Thanks for the patch!

Tobias


It also adds support for new OpenMP context selectors: `dispatch` as trait
selector and `need_device_ptr` as pseudo-trait set selector. The purpose of the
latter is for the C++ front-end to store the list of arguments (that need to be
converted to device pointers) until the declaration of the variant function
becomes available.

gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_tss_code): Add
OMP_TRAIT_SET_NEED_DEVICE_PTR.
(enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.
---
  gcc/builtin-types.def|  1 +
  gcc/omp-selectors.h  |  3 +++
  gcc/tree-core.h  |  7 +++
  gcc/tree-pretty-print.cc | 21 +
  gcc/tree.cc  |  4 
  gcc/tree.def |  5 +
  gcc/tree.h   |  7 +++
  7 files changed, 48 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..ef7aaf67d13 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, 
BT_FEXCEPT_T_PTR,
  DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT,
 BT_CONST_FEXCEPT_T_PTR, BT_INT)
  DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, 
BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)
  
  DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)
  
diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h

index c61808ec0ad..12bc9e9afa0 100644
--- a/gcc/omp-selectors.h
+++ b/gcc/omp-selectors.h
@@ -31,6 +31,8 @@ enum omp_tss_code {
OMP_TRAIT_SET_TARGET_DEVICE,
OMP_TRAIT_SET_IMPLEMENTATION,
OMP_TRAIT_SET_USER,
+  OMP_TRAIT_SET_NEED_DEVICE_PTR, // pseudo-set selector used to convey argument
+// list until variant has a decl
OMP_TRAIT_SET_LAST,
OMP_TRAIT_SET_INVALID = -1
  };
@@ -55,6 +57,7 @@ enum omp_ts_code {
OMP_TRAIT_CONSTRUCT_PARALLEL,
OMP_TRAIT_CONSTRUCT_FOR,
OMP_TRAIT_CONSTRUCT_SIMD,
+  OMP_TRAIT_CONSTRUCT_DISPATCH,
OMP_TRAIT_LAST,
OMP_TRAIT_INVALID = -1
  };
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c7702..508f5c580d4 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -542,6 +542,13 @@ enum omp_clause_code {
  
/* OpenACC clause: nohost.  */

OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: novariants (scalar-expression).  */
+  OMP_CLAUSE_NOVARIANTS,
+
+  /* OpenMP clause: nocontext (scalar-expression).  */
+  OMP_CLAUSE_NOCONTEXT,
+
  };
  
  #undef DEFTREESTRUCT

diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 4bb946bb0e8..752a402e0d0 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, 
dump_flags_t flags)
  case OMP_CLAUSE_EXCLUSIVE:
name = "exclusive";
goto print_remap;
+case OMP_CLAUSE_NOVARIANTS:
+  pp_string (pp, "novariants");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
+case OMP_CLAUSE_NOCONTEXT:
+  pp_string (pp, "nocontext");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
  case OMP_CLAUSE__LOOPTEMP_:
name = "_looptemp_";
goto print_remap;
@@ -3947,6 +3963,11 @@

Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-16 Thread Tobias Burnus


Hi Sandra,

Am 16.07.24 um 19:03 schrieb Sandra Loosemore:
Well, I still do not understand why backward compatibility concerns 
specific to some other directive should affect the ABI for a new 
directive that does not have any current libgomp runtime support,


I am happy that I managed to explain you the background of the "-1" 
mess. Otherwise:



The backward-compatibility hack is not required, but it has two 
advantages: consistency of the values used and it makes the code inside 
target.c way simpler by just using


  struct gomp_device_descr *devicep = resolve_device (device, true);

instead of handling several additional cases.


However, as written, avoiding the '(n == -1) ? -2 : n' code generation 
also has advantages; hence, I am also happy with that variant. (i.e. -2 
or -3 denoting the default device).


However, if you use -2 == default device, you need to fix the 
libgomp/target.c implementation as your code doesn't handle 
omp_default_device correctly, which 'resolve_device (device, true);' 
would handle automatically.



you just tell me what ABI you want me to implement and I will re-do 
the code that way.


Having looked at the code again – and in particular at libgomp/target.c, 
I realized the merits of using -2. Thus, at the end, I am happy with 
*either* variant.


But either version requires some changes: One the creation of the 
conditional gimple code + much simplified code in target.c. And the 
other, keeping the current gimple code – but fixing/extending target.c.


Tobias

Re: [PATCH v2 1/8] Fix warnings for tree formats in gfc_error

2024-07-16 Thread Tobias Burnus

I think it would be nice if some C/C++/global maintainer could rubber 
stamp the following patch.



Otherwise, I think it is trivial, i.e. I think it can be committed in a 
few days, unless someone has concerns.


This change to gcc/c-family/c-format.cc LGTM from the *gfortran* POV and 
is trivially copied from gcc_tdiag_char_table or gcc_cdiag_char_table 
(which both have it).


* * *

Background:

While this is for gcc/c-family/c-format.cc, the 'gcc_gfc_char_table' is 
for diagnostic for compiling gcc/fortran/, only.


Namely, the gfc_error, gfc_warning etc. functions are annotated by the
format checking attribute:

#define ATTRIBUTE_GCC_GFC(m, n) __attribute__ ((__format__ (__gcc_gfc__, 
m, n))) ATTRIBUTE_NONNULL(m)


* * *

As gfc_error etc. call the common diagnostic at the end, '%qE', %qD' 
etc. are already supported.


(As tested manually; it is also used by this patch series of PA.)

But while %qE is already supported, without the 'gcc_gfc_char_table' 
change, the '__format__ (__gcc_gfc__' check does not recognize it and

yields a -Werror, causing that a bootstrap fails.

Hence, we need this patch …

* * *

Paul-Antoine Arras wrote:

This enables proper warnings for formats like %qD.

gcc/c-family/ChangeLog:

* c-format.cc (gcc_gfc_char_table): Add formats for tree objects.
---
  gcc/c-family/c-format.cc | 4 
  1 file changed, 4 insertions(+)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 5bfd2fc4469..f4163c9cbc0 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -847,6 +847,10 @@ static const format_char_info gcc_gfc_char_table[] =
/* This will require a "locus" at runtime.  */
{ "L",   0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN  }, "", "R", NULL },
  
+  /* These will require a "tree" at runtime.  */

+  { "DFTV", 1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "'",   NULL },
+  { "E",   1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   NULL },
+
/* These will require nothing.  */
{ "<>",0, STD_C89, NOARGUMENTS, "",  "",   NULL },
{ NULL,  0, STD_C89, NOLENGTHS, NULL, NULL, NULL

Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-07-16 Thread Tobias Burnus


Hi Sandra,

Sandra Loosemore wrote:

+    /* omp_initial_device is -1, omp_invalid_device is -4; choose
+   a value that isn't otherwise defined to indicate the default
+   device.  */
+    device_num = build_int_cst (integer_type_node, -2);


Don't do this - we do it differently for 'target' and it should do the 
same. Some value usage history:


Without caring for backward compatibility, I think we had somewhere

#define OMP_DEFAULT_DEVICE -2

and would simply use it everywhere when doing API calls.


But to handle old code, we have to handle both:
 -1 → default device
and
 -1 → initial device (= host).


Before coming back to your code, let's try to explain the history
and reason again. Maybe I manage to explain it better this time:

* * *

The problem is that -1 on the user side and -1 on the internal-use
side mean different things. Namely:

In the old days OpenMP had on the user side:
  device numbers 0 ... omp_get_num_devices()
where the upper bound was the initial device (= host), 
omp_get_initial_device().


For
  omp target num_device(n)
the device number has to be passed to the run time – and GCC just passes 
"n" here.


But GCC also needs to handle:
  omp target
i.e. not specifying a device number (= using the default device). It has 
been implemented in the obvious way, i.e. passing '-1'.



Later, OpenMP added:
  omp_initial_device == -1
  omp_invalid_device (negative, implementation defined, != 
omp_initial_device)


GCC set the latter rather arbitrary to -4.


RESULT: Everything works fine, except for -1 as
  omp target device_num(omp_initial_device)
and
  omp target
are now the same, but semantically one uses the host and the other the
default device.


Therefore, GCC uses:
(A) API routines - use omp_initial_device == -1 as value.

(B) Directives - use -1 for no clause (= backward compatible), using the 
default device.

Using -2 for omp_initial_device.


Hence, the following defines exist:

#define GOMP_DEVICE_ICV -1
#define GOMP_DEVICE_HOST_FALLBACK   -2
#define GOMP_DEVICE_INVALID -4


If you call an OpenMP runtime API routine, you need to use -1 for the 
initial device and for GOMP_* functions related to directives -2 using

GOMP_DEVICE_HOST_FALLBACK, when constructing it manually.

Code wise, GCC handles num_device(n) by generating code like:
  if !num_device
devnum = GOMP_DEVICE_ICV;
  else
devnum = (n == -1) ? GOMP_DEVICE_HOST_FALLBACK : n;

That's not ideal but one solution to handle backward compatibility.



Inside libgomp/target.c, there is:

  resolve_device (int device_id, bool remapped)

and 'remapped' is
- 'false' for OpenMP API routines and
- 'true' for GOMP_* calls.

The following code in resolve_device does then undo the '-1':

  if (remapped && device_id == GOMP_DEVICE_ICV)
  device_id = icv->default_device_var;
  remapped = false;
  if (device_id < 0)
  if (device_id == (remapped ? GOMP_DEVICE_HOST_FALLBACK
 : omp_initial_device))
return NULL;

* * *

Now coming back to your code:

If you call
  resolve_device
directly, using the GOMP_* variant makes sense, i.e. passing
the device number as is with 'remap = true'. This also makes
sense for consistency with the remaining code.

Downside: This requires to add
  (n == -1) ? -2 : n
for user-specified 'n'.


If you handle the device_num resolution yourself in libgomp, you have 
two variants to chose from:


(a) using a different value to denote the default-device (e.g. '-2' or 
'-3')  and pass it as is


(b) call resolve_device with remapping in libgomp, but handling -1 for 
the default device as '(n == -1) ? -2 : n' during code gen


I think either works - and either variant is confusing in one way or the
other.

* * *

Jumping to:

[PATCH v2 03/12] libgomp: runtime support for target_device selector

libgomp/target.c:


+bool
+GOMP_evaluate_target_device (int device_num, const char *kind,
+const char *arch, const char *isa)
+{


If you do the remapping, you could just use:

  struct gomp_device_descr *devicep = resolve_device (device, true);
  if (kind && strcmp (kind, "any") == 0)
kind = NULL;
  if (devicep == NULL)
result = GOMP_evaluate_current_device (kind, arch, isa);
  else
result = device->evaluate_device_func (device_num, kind, arch, isa);

which seems to be simpler than the code you have.

If you don't do the remapping:


+  bool result = true;
+
+  /* -2 is a magic number to indicate the device number was not specified;
+ in that case it's supposed to use the default device.  */
+  if (device_num == -2)
+device_num = omp_get_default_device ();


… then you need to handle -2 yourself.


+  if (kind && strcmp (kind, "any") == 0)
+kind = NULL;
+
+  gomp_debug (1, "%s: device_num = %u, kind=%s, arch=%s, isa=%s",
+ __FUNCTION__, device_num, kind, arch, isa);
+
+  if (omp_get_device_num () == device_num)
+result = GOMP_evaluate_current_device (kind,

x86_64-gnu-linux bootstrap fail (was: [PATCH v2 2/6] Extract ix86 dllimport implementation to mingw)

2024-06-25 Thread Tobias Burnus


Hi Evgeny,

I am not sure whether I have chosen the right email in the thread but:
a x86-64 GNU Linux build currently fails as follows.

At a glance, it seems to be sufficient to remove the prototype 
declaration in i386.cc.


Namely:

gcc/config/i386/i386.cc:107:12: error: 'rtx_def* 
legitimize_dllimport_symbol(rtx, bool)' declared 'static' but never 
defined [-Werror=unused-function]

  107 | static rtx legitimize_dllimport_symbol (rtx, bool);
  |^~~

gcc/gcc/config/i386/i386.cc:108:12: error: 'rtx_def* 
legitimize_pe_coff_extern_decl(rtx, bool)' declared 'static' but never 
defined [-Werror=unused-function]

  108 | static rtx legitimize_pe_coff_extern_decl (rtx, bool);
  |^~
^Cmake[3]: *** [Makefile:2556: i386.o] Interrupt

There is:

config/i386/i386.cc:static rtx legitimize_dllimport_symbol (rtx, bool);
config/mingw/winnt-dll.cc:legitimize_dllimport_symbol (rtx symbol, bool 
want_reg)
config/mingw/winnt-dll.cc:  return legitimize_dllimport_symbol 
(addr, inreg);
config/mingw/winnt-dll.cc:rtx t = legitimize_dllimport_symbol 
(XEXP (XEXP (addr, 0), 0), inreg);



And:

config/i386/i386.cc:static rtx legitimize_pe_coff_extern_decl (rtx, bool);
config/mingw/winnt-dll.cc:legitimize_pe_coff_extern_decl (rtx symbol, 
bool want_reg)
config/mingw/winnt-dll.cc:return legitimize_pe_coff_extern_decl 
(addr, inreg);
config/mingw/winnt-dll.cc:  rtx t = legitimize_pe_coff_extern_decl 
(XEXP (XEXP (addr, 0), 0), inreg);


Tobias

[Patch, v2] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Tobias Burnus


[I messed up copying from the build system, picking up an old version.
Changes to v1 (bottom of the diff): fopen is no longer required.]

Tobias Burnus wrote:

mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however,
once #embed has a large-file mode, it should also speed up
the offloading compilation quit a bit.

OK for mainline, once '#embed' support is in?

Tobiasgcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (read_file): Remove.
	(process_obj): Generate C file that uses #embed.
	(main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 72 -
 1 file changed, 12 insertions(+), 60 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..0c840318b2d 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-{
-  /* Get the file size.  */
-  long s = ftell (stream);
-  if (s >= 0)
-	alloc = s + 100;
-  fseek (stream, 0, SEEK_SET);
-}
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-{
-  size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-  if (!n)
-	break;
-  base += n;
-  if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-}
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -725,31 +687,27 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, );
-
   /* Dump out an array containing the binary.
  FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-{
-  fprintf (cfile, "\n\t");
-  for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-}
-  fprintf (cfile, "\n};\n\n");
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
+	   "#embed \"%s\"\n"
+	   "#else\n"
+	   "#error \"#embed '%s' failed\"\n"
+	   "#endif\n"
+	   "};\n\n", fname_in, fname_in, fname_in);
 
   fprintf (cfile,
 	   "static const struct gcn_image {\n"
 	   "  size_t size;\n"
 	   "  void *image;\n"
 	   "} gcn_image = {\n"
-	   "  %zu,\n"
+	   "  sizeof(gcn_code),\n"
 	   "  gcn_code\n"
-	   "};\n\n",
-	   len);
+	   "};\n\n");
 
   fprintf (cfile,
 	   "static const struct gcn_data {\n"
@@ -1312,13 +1270,7 @@ main (int argc, char **argv)
   fork_execute (ld_argv[0], CONST_CAST (char **, ld_argv), true, ".ld_args");
   obstack_free (_argv_obstack, NULL);
 
-  in = fopen (gcn_o_name, "r");
-  if (!in)
-	fatal_error (input_location, "cannot open intermediate gcn obj file");
-
-  process_obj (in, cfile, omp_requires);
-
-  fclose (in);
+  process_obj (gcn_o_name, cfile, omp_requires);
 
   xputenv (concat ("GCC_EXEC_PREFIX=", execpath, NULL));
   xputenv (concat ("COMPILER_PATH=", cpath, NULL));

[Patch] gcn/mkoffload.cc: Use #embed for including the generated ELF file

2024-06-21 Thread Tobias Burnus


mkoffload's generated .c file looks much nicer with '#embed'.

This patch depends on Jakub's #embed patch at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html

It might be a tiny bit faster than currently (or not); however,
once #embed has a large-file mode, it should also speed up
the offloading compilation quit a bit.

OK for mainline, once '#embed' support is in?

Tobias
gcn/mkoffload.cc: Use #embed for including the generated ELF file

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (read_file): Remove.
	(process_obj): Generate C file that uses #embed.
	(main): Update call to it; remove no longer needed file I/O.

 gcc/config/gcn/mkoffload.cc | 66 +
 1 file changed, 12 insertions(+), 54 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 810298a799b..0ccb874398a 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -182,44 +182,6 @@ xputenv (const char *string)
   putenv (CONST_CAST (char *, string));
 }
 
-/* Read the whole input file.  It will be NUL terminated (but
-   remember, there could be a NUL in the file itself.  */
-
-static const char *
-read_file (FILE *stream, size_t *plen)
-{
-  size_t alloc = 16384;
-  size_t base = 0;
-  char *buffer;
-
-  if (!fseek (stream, 0, SEEK_END))
-{
-  /* Get the file size.  */
-  long s = ftell (stream);
-  if (s >= 0)
-	alloc = s + 100;
-  fseek (stream, 0, SEEK_SET);
-}
-  buffer = XNEWVEC (char, alloc);
-
-  for (;;)
-{
-  size_t n = fread (buffer + base, 1, alloc - base - 1, stream);
-
-  if (!n)
-	break;
-  base += n;
-  if (base + 1 == alloc)
-	{
-	  alloc *= 2;
-	  buffer = XRESIZEVEC (char, buffer, alloc);
-	}
-}
-  buffer[base] = 0;
-  *plen = base;
-  return buffer;
-}
-
 /* Parse STR, saving found tokens into PVALUES and return their number.
Tokens are assumed to be delimited by ':'.  */
 
@@ -725,31 +687,27 @@ process_asm (FILE *in, FILE *out, FILE *cfile)
 /* Embed an object file into a C source file.  */
 
 static void
-process_obj (FILE *in, FILE *cfile, uint32_t omp_requires)
+process_obj (const char *fname_in, FILE *cfile, uint32_t omp_requires)
 {
-  size_t len = 0;
-  const char *input = read_file (in, );
-
   /* Dump out an array containing the binary.
  FIXME: do this with objcopy.  */
-  fprintf (cfile, "static unsigned char gcn_code[] = {");
-  for (size_t i = 0; i < len; i += 17)
-{
-  fprintf (cfile, "\n\t");
-  for (size_t j = i; j < i + 17 && j < len; j++)
-	fprintf (cfile, "%3u,", (unsigned char) input[j]);
-}
-  fprintf (cfile, "\n};\n\n");
+  fprintf (cfile,
+	   "static unsigned char gcn_code[] = {\n"
+	   "#if defined(__STDC_EMBED_FOUND__) && __has_embed (\"%s\") == __STDC_EMBED_FOUND__\n"
+	   "#embed \"%s\"\n"
+	   "#else\n"
+	   "#error \"#embed '%s' failed\"\n"
+	   "#endif\n"
+	   "};\n\n", fname_in, fname_in, fname_in);
 
   fprintf (cfile,
 	   "static const struct gcn_image {\n"
 	   "  size_t size;\n"
 	   "  void *image;\n"
 	   "} gcn_image = {\n"
-	   "  %zu,\n"
+	   "  sizeof(gcn_code),\n"
 	   "  gcn_code\n"
-	   "};\n\n",
-	   len);
+	   "};\n\n");
 
   fprintf (cfile,
 	   "static const struct gcn_data {\n"
@@ -1316,7 +1274,7 @@ main (int argc, char **argv)
   if (!in)
 	fatal_error (input_location, "cannot open intermediate gcn obj file");
 
-  process_obj (in, cfile, omp_requires);
+  process_obj (gcn_o_name, cfile, omp_requires);
 
   fclose (in);

[Patch] OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

2024-06-21 Thread Tobias Burnus


Hi all,

it turned out that 'declare target' with 'link' clause was broken in multiple 
ways.

The main fix is the attached patch, i.e. namely pushing the variables already to
the offload-vars list already in the FE.

When implementing it, I noticed:
* C has a similar issue when using nested functions, which is
  a GNU extension →https://gcc.gnu.org/115574

* When doing partial mapping of arrays (which is one of the reasons for 'link'),
  offsets are mishandled in Fortran (not tested in C), see FIXME in the patch)
  There: arr2(10) should print 10 but with map(arr2(10:)) it prints 19.
  (I will file a PR about this).

* It might happen that linked variables do not get linked. I have not 
investigated
  why, but 'arr2' gives link errors – while 'arr' works.
  See FIXME in the patch. (I will file a PR about this)

* For COMMON blocks, map(/common/) is rejected,https://gcc.gnu.org/PR115577

* When then mapping map(a,b,c) which is identical for 'common /mycom/ a,b,c',
  it fails to link the device side as the 'mycom_' symbol cannot be found on the
  device side.  (I will file a PR about this)

As COMMON as issues, an alternative would be to defer the trans-common.cc
changes to a later patch.

Comments, questions, concerns?

Tobias

PS: Tested with nvptx offloading with a page-migration supporting system with
nvptx and GCN offloading configured and no new fails observed.
OpenMP/Fortran: Fix handling of 'declare target' with 'link' clause [PR11555]

Contrary to a normal 'declare target', the 'declare target link' attribute
also needs to set node->offloadable and push the offload_vars in the front end.

Linked variables require that the data is mapped. For module variables, this
can happen anywhere. For variables in an external subprograms or the main
programm, this can only happen in the either that program itself or in an
internal subprogram. - Whether a variable is just normally mapped or linked then
becomes relevant if a device routine exists that can access that variable,
i.e. an internal procedure has then to be marked as declare target.

	PR fortran/115559

gcc/fortran/ChangeLog:

	* trans-common.cc (build_common_decl): Add 'omp declare target' and
	'omp declare target link' variables to offload_vars.
	* trans-decl.cc (add_attributes_to_decl): Likewise; update args and
	call decl_attributes.
	(get_proc_pointer_decl, gfc_get_extern_function_decl,
	build_function_decl): Update calls.
	(gfc_get_symbol_decl): Likewise; move after 'DECL_STATIC (t)=1'
	to avoid errors with symtab_node::get_create.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/declare-target-link.f90: New test.

 gcc/fortran/trans-common.cc|  21 
 gcc/fortran/trans-decl.cc  |  81 +-
 .../libgomp.fortran/declare-target-link.f90| 119 +
 3 files changed, 195 insertions(+), 26 deletions(-)

diff --git a/gcc/fortran/trans-common.cc b/gcc/fortran/trans-common.cc
index 5f44e7bd663..e714342c3c0 100644
--- a/gcc/fortran/trans-common.cc
+++ b/gcc/fortran/trans-common.cc
@@ -98,6 +98,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "coretypes.h"
 #include "tm.h"
 #include "tree.h"
+#include "cgraph.h"
+#include "context.h"
+#include "omp-offload.h"
 #include "gfortran.h"
 #include "trans.h"
 #include "stringpool.h"
@@ -497,6 +500,24 @@ build_common_decl (gfc_common_head *com, tree union_type, bool is_init)
 	  = tree_cons (get_identifier ("omp declare target"),
 		   omp_clauses, DECL_ATTRIBUTES (decl));
 
+  if (com->omp_declare_target_link || com->omp_declare_target)
+	{
+	  /* Add to offload_vars; get_create does so for omp_declare_target,
+	 omp_declare_target_link requires manual work.  */
+	  gcc_assert (symtab_node::get (decl) == 0);
+	  symtab_node *node = symtab_node::get_create (decl);
+	  if (node != NULL && com->omp_declare_target_link)
+	{
+	  node->offloadable = 1;
+	  if (ENABLE_OFFLOADING)
+		{
+		  g->have_offload = true;
+		  if (is_a  (node))
+		vec_safe_push (offload_vars, decl);
+		}
+	}
+	}
+
   /* Place the back end declaration for this common block in
  GLOBAL_BINDING_LEVEL.  */
   gfc_map_of_all_commons[identifier] = pushdecl_top_level (decl);
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 8d4f06a4e1d..4067dd6ed77 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -46,7 +46,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans-stmt.h"
 #include "gomp-constants.h"
 #include "gimplify.h"
+#include "context.h"
 #include "omp-general.h"
+#include "omp-offload.h"
 #include "attr-fnspec.h"
 #include "tree-iterator.h"
 #include "dependency.h"
@@ -1470,19 +1472,18 @@ gfc_add_assign_aux_vars (gfc_symbol * sym)
 }
 
 
-static tree
-add_attributes_to_decl (symbol_attribute sym_attr, tree list)
+static void
+add_attributes_to_decl (tree *decl_p, const gfc_symbol *sym)
 {
   unsigned id;
-  tree attr;
+  tree list =

Re: [PATCH v5 2/6] libgomp, openmp: Add ompx_gnu_pinned_mem_alloc

2024-06-12 Thread Tobias Burnus


Andrew Stubbs wrote:

Compared to the previous v4 (1/5) posting of this patch:
- The enumeration of the ompx allocators have been moved (again) to 200
   (as 100 is already in use by another toolchain vendor and this seems
   like a possible source of confusion).
- The "ompx" has also been changed to "ompx_gnu" to highlight that these
   are specifically GNU extensions.
- The failure mode of the testcases had been modified, including adding
   an abort in CHECK_SIZE and skipping the test on unsupported platforms.
- The OMP_ALLOCATE environment variable now supports the new allocator.
- The Fortran frontend allows use of the new allocator in "allocator"
   clauses.

---

This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  This is not in the OpenMP standard so it uses the "ompx"
namespace and an independent enum baseline of 200 (selected to not clash with
other known implementations).

The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.  One motivation for having this feature is
for use by the (planned) -foffload-memory=pinned feature.


The patch LGTM.

Thanks!

Tobias

gcc/fortran/ChangeLog:

* openmp.cc (is_predefined_allocator): Update valid ranges to
  incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:

* allocator.c (ompx_gnu_min_predefined_alloc): New.
(ompx_gnu_max_predefined_alloc): New.
(predefined_alloc_mapping): Rename to ...
(predefined_omp_alloc_mapping): ... this.
(predefined_ompx_gnu_alloc_mapping): New.
(_Static_assert): Adjust for the new name, and add a new assert for the
new table.
(predefined_allocator_p): New.
(predefined_alloc_mapping): New.
(omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.
Use predefined_allocator_p and predefined_alloc_mapping.
(omp_free): Likewise.
(omp_alligned_calloc): Likewise.
(omp_realloc): Likewise.
* env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
* libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
* omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
* omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge
---
  gcc/fortran/openmp.cc |  11 +-
  .../gfortran.dg/gomp/allocate-pinned-1.f90|  16 +++
  libgomp/allocator.c   | 115 +-
  libgomp/env.c |   1 +
  libgomp/libgomp.texi  |   7 +-
  libgomp/omp.h.in  |   1 +
  libgomp/omp_lib.f90.in|   2 +
  libgomp/omp_lib.h.in  |   2 +
  libgomp/testsuite/libgomp.c/alloc-pinned-5.c  | 100 +++
  libgomp/testsuite/libgomp.c/alloc-pinned-6.c  | 102 
  .../libgomp.fortran/alloc-pinned-1.f90|  16 +++
  11 files changed, 336 insertions(+), 37 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/gomp/allocate-pinned-1.f90
  create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-5.c
  create mode 100644 libgomp/testsuite/libgomp.c/alloc-pinned-6.c
  create mode 100644 libgomp/testsuite/libgomp.fortran/alloc-pinned-1.f90

Re: [PATCH v5 1/6] libgomp: change alloc-pinned tests failure mode

2024-06-12 Thread Tobias Burnus


Andrew Stubbs wrote:

The feature doesn't work on non-Linux hosts, at present, so skip the tests
entirely.

On Linux systems that have insufficient lockable memory configured we still
need to fail or else the feature won't be getting tested when we think it is,
but now there's a message to explain why.

libgomp/ChangeLog:

* testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to
dg-skip-if.
Correct spelling mistake.
Abort on insufficient lockable memory.
Use #error on non-linux hosts.
* testsuite/libgomp.c/alloc-pinned-2.c: Likewise.


LGTM. Thanks!

Tobias

Re: [Patch, PR Fortran/90072] Polymorphic Dispatch to Polymophic Return Type Memory Leak

2024-06-08 Thread Tobias Burnus


Andre Vehreschild wrote:

PS That's good news about the funding. Maybe we will get to see "built in"
coarrays soon?

You hopefully will see Nikolas work on the shared memory coarray support, if
that is what you mean by "built in" coarrays. I will be working on the
distributed memory coarray support esp. fixing the module issues and some other
team related things.


Cool! (Both of it.)

I assume "distributed memory coarray support" is still based on Open
Coarrays?

* * *

I am asking because there is coarray API being defined: Parallel Runtime
Interface for Fortran (PRIF), https://go.lbl.gov/prif

with an implementation called Caffeine – CoArray Fortran Framework of
Efficient Interfaces to Network Environments,
https://crd.lbl.gov/caffeine which uses GASNet or POSIX processes.

Well, the among the implementers is (unsurprising?) Damian – and the
idea seems to be that LLVM's FLANG will use the API.

Tobias

PS: I think it might be useful in the long run to support both
PRIF/Caffeine and OpenCoarrays.

I have attached my hello-world patch for -fcoarray=prif that I wrote
after ISC-HPC; it only handles this_image() / num_images() + init/stop.
I got confirmation by the PRIF developers that the next revision will
permit calling __prif_MOD_prif_init multiple times such that one can use
it in the constructor for static coarrays, which won't work otherwise.
gcc/ChangeLog:

	* flag-types.h (enum gfc_fcoarray):

gcc/fortran/ChangeLog:

	* invoke.texi:
	* lang.opt:
	* trans-decl.cc (gfc_build_builtin_function_decls):
	(create_main_function):
	* trans-intrinsic.cc (trans_this_image):
	(trans_num_images):
	* trans.h (GTY):

 gcc/flag-types.h   |  3 ++-
 gcc/fortran/invoke.texi|  7 +-
 gcc/fortran/lang.opt   |  5 +++-
 gcc/fortran/trans-decl.cc  | 56 --
 gcc/fortran/trans-intrinsic.cc | 42 +++
 gcc/fortran/trans.h|  5 
 6 files changed, 108 insertions(+), 10 deletions(-)

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 5a2b461fa75..babd747c01d 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -427,7 +427,8 @@ enum gfc_fcoarray
 {
   GFC_FCOARRAY_NONE = 0,
   GFC_FCOARRAY_SINGLE,
-  GFC_FCOARRAY_LIB
+  GFC_FCOARRAY_LIB,
+  GFC_FCOARRAY_PRIF
 };
 
 
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 40e8e4a7cdd..331a40d31db 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1753,7 +1753,12 @@ Single-image mode, i.e. @code{num_images()} is always one.
 
 @item @samp{lib}
 Library-based coarray parallelization; a suitable GNU Fortran coarray
-library needs to be linked.
+library needs to be linked such as @url{http://opencoarrays.org}.
+
+@item @samp{prif}
+Using the Parallel Runtime Interface for Fortran (PRIF),
+@url{https://go.lbl.gov/@/prif}; for instance, via Caffeine,
+@url{https://go.lbl.gov/@/caffeine}.
 @end table
 
 
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 5efd4a0129a..9ba957d5571 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -786,7 +786,7 @@ Copy array sections into a contiguous block on procedure entry.
 
 fcoarray=
 Fortran RejectNegative Joined Enum(gfc_fcoarray) Var(flag_coarray) Init(GFC_FCOARRAY_NONE)
--fcoarray=	Specify which coarray parallelization should be used.
+-fcoarray=	Specify which coarray parallelization should be used.
 
 Enum
 Name(gfc_fcoarray) Type(enum gfc_fcoarray) UnknownError(Unrecognized option: %qs)
@@ -800,6 +800,9 @@ Enum(gfc_fcoarray) String(single) Value(GFC_FCOARRAY_SINGLE)
 EnumValue
 Enum(gfc_fcoarray) String(lib) Value(GFC_FCOARRAY_LIB)
 
+EnumValue
+Enum(gfc_fcoarray) String(prif) Value(GFC_FCOARRAY_PRIF)
+
 fcheck=
 Fortran RejectNegative JoinedOrMissing
 -fcheck=[...]	Specify which runtime checks are to be performed.
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index dca7779528b..d1c0e2ee997 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -170,6 +170,10 @@ tree gfor_fndecl_co_sum;
 tree gfor_fndecl_caf_is_present;
 tree gfor_fndecl_caf_random_init;
 
+tree gfor_fndecl_prif_init;
+tree gfor_fndecl_prif_stop;
+tree gfor_fndecl_prif_this_image_no_coarray;
+tree gfor_fndecl_prif_num_images;
 
 /* Math functions.  Many other math functions are handled in
trans-intrinsic.cc.  */
@@ -4147,6 +4151,31 @@ gfc_build_builtin_function_decls (void)
 	get_identifier (PREFIX("caf_random_init")),
 	void_type_node, 2, logical_type_node, logical_type_node);
 }
+  else if (flag_coarray == GFC_FCOARRAY_PRIF)
+{
+  tree pint_type = build_pointer_type (integer_type_node);
+  tree pbool_type = build_pointer_type (boolean_type_node);
+  tree pintmax_type_node = get_typenode_from_name (INTMAX_TYPE);
+  pintmax_type_node = build_pointer_type (pintmax_type_node);
+
+  gfor_fndecl_prif_init = gfc_build_library_function_decl_with_spec (
+	get_identifier ("__prif_MOD_prif_init"), ". W ",
+	void_type_node, 1,

Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-08 Thread Tobias Burnus


Hi Gerald,

Gerald Pfeifer wrote:

Looks like a janitorial task to fix the absolute links, possibly
excluding those with /git, /onlinedocs, /wiki – or assuming that the
main page is GCC.gnu.org, relying on the redirects.

It's on my list. A first quick check indicates there isn't much to do,
though. :-)


You could consider

htdocs/search.html:

to avoid a redirect (but it is not a broken link);
otherwise, I but I concur that it seems to be (mostly) fine :-)

* * *


+  loop-transformation constructs are now supported.
I'm thinking "loop transformation" in English? Or is this a specific term
from the standard?

Loop transformation happens at the end. But e.g "(#pragma omp) unroll
full" is a directive and, e.g.
...
is a construct (= directive + structured block (if any) + end directive
(if any)).

I believe there was a misunderstanding and I wasn't clear enough: I was
wondering whether instead of "loop-transformation" the patch should have
"loop transformation".

In your response you use the version without dash, so I guess we agree?
:-)


(Pedantically it's a hyphen (-) and not a(n en/em) dash (–/—), i.e. '-' 
not '--' or '---' in TeX.)


No, we don't. – There is a difference whether the two words are used 
alone or as modifier to a noun, like the "this is well defined" vs. "a 
well-defined project".


Thus, while "loop transformation happens" is without hyphen (as we both 
agree),* for "loop(-| )tranformation constructs" the (non-)usage of 
hyphens is not well defined; grouping wise, those are clearly '((loop 
transformation) constructs)' and not '(loop (transformation constructs))'.


I believe both variants are perfectly fine.

BTW: In the OpenMP pre-6.0 draft (TR12), the verb 'transform' is now 
used as noun not with suffix '-ation' but with the suffix '-ing' (also 
referred to as gerund) such that a section title now uses 
"Loop-Transforming Constructs"; I think for '(word) plus (-ing word)' – 
used as modifier –, a hyphen is a tad more common than for '(word) plus 
'(word with -ation suffix)'.


Tobias

* The Oxford Guide to Style points out some words that do get 
hyphenated: clear-cut, drip-proof, take-off, part-time, … – or to refer 
to the abstract meaning rather than literal: bull's-eye, crow's-feet, … 
— Formerly, present particle plus noun got hyphenated when the compound 
was acted on: walking-stick, walking-frame. Likewise, it was formerly 
normal in British English to hyphenate a single adjectival noun and the 
noun it modified: note-cue, title-page, volume-number (less common now, 
but can linger in some combination). And until recently: small 
scale-factory (vs. small-scale factory), white water-lily (vs. 
white-water lily).

gcc-wwwdocs branch master updated. 4260d675af42b9c97e29818ab3b3154d27103d49

2024-06-07 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  4260d675af42b9c97e29818ab3b3154d27103d49 (commit)
  from  8507122b38e6b60e8f2f3c8cd339d4f318377203 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 4260d675af42b9c97e29818ab3b3154d27103d49
Author: Tobias Burnus 
Date:   Fri Jun 7 10:06:52 2024 +0200

gcc-15/changes.html + projects/gomp: update for new OpenMP features

GCC 15 now supports unified-shared memory and the tile/unroll constructs
in OpenMP.

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index 0ea7bdec..a121f40a 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -40,6 +40,24 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
+
+  OpenMP
+  
+
+  Support for unified-shared memory has been added for some AMD and Nvidia
+  GPU devices, enabled when using the unified_shared_memory
+  clause to the requires directive. For details,
+  see the offload-target specifics section in the
+  https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html;
+  >GNU Offloading and Multi Processing Runtime Library Manual.
+
+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+
+  
+
+
 
 
 
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 94bda5ff..d1765fc3 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -313,18 +313,21 @@ than listed, depending on resolved corner cases and 
optimizations.
   
   
 requires directive
-
+
   GCC9
   GCC12
   GCC13
-  GCC14
+  GCC14
+  GCC15
 
 
   (atomic_default_mem_order)
   (dynamic_allocators)
   complete but no non-host devices provides unified_address or
   unified_shared_memory
-  complete but no non-host devices provides 
unified_shared_memory
+  complete but no non-host devices provides 
unified_shared_memory
+  complete; see also https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html;>
+  Offload-Target Specifics
 
   
   
@@ -706,7 +709,7 @@ than listed, depending on resolved corner cases and 
optimizations.
   
   
 Loop transformation constructs
-No
+GCC15
 
   
   

---

Summary of changes:
 htdocs/gcc-15/changes.html  | 18 ++
 htdocs/projects/gomp/index.html | 11 +++
 2 files changed, 25 insertions(+), 4 deletions(-)


hooks/post-receive
-- 
gcc-wwwdocs

gcc-wwwdocs branch master updated. 8507122b38e6b60e8f2f3c8cd339d4f318377203

2024-06-07 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  8507122b38e6b60e8f2f3c8cd339d4f318377203 (commit)
  from  1db5b34eb8cf47f070f643f993d835149bce2ec7 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 8507122b38e6b60e8f2f3c8cd339d4f318377203
Author: Tobias Burnus 
Date:   Fri Jun 7 09:58:52 2024 +0200

gcc-15/changes.html (nvptx): Constructors are now supported

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..0ea7bdec 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -85,7 +85,14 @@ a work-in-progress.
 
 
 
-
+NVPTX
+
+
+  GCC's nvptx target now supports constructors and destructors.
+  For this, a recent version of https://gcc.gnu.org/install/specific.html#nvptx-x-none;
+  >nvptx-tools is required.
+
 
 
 

---

Summary of changes:
 htdocs/gcc-15/changes.html | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)


hooks/post-receive
-- 
gcc-wwwdocs

Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Tobias Burnus


Hi Gerald,

Gerald Pfeifer wrote:

+++ b/htdocs/gcc-15/changes.html
+
+  https://gcc.gnu.org/projects/gomp/;>OpenMP

Can you please make this a relative link, i.e. "../projects/gomp/"?


Good point. I thought such links should be absolute because of 
(www.)GNU.org, i.e.


https://www.gnu.org/software/gcc/releases.html

... but also that page has https://www.gnu.org/software/gcc/projects/gomp/

GNU.org does not have the documentation, but going to 
https://www.gnu.org/software/gcc/onlinedocs/ or a subpage redirects (302 
temporary redirect) to the GCC website. Likewise for '../git' but for 
'../wiki' it has a HTTP 404 not found; fortunately, ../wiki/ works.


I think there are plenty of links which could be relative ones but are 
absolute ones.


Looks like a janitorial task to fix the absolute links, possibly 
excluding those with /git, /onlinedocs, /wiki – or assuming that the 
main page is GCC.gnu.org, relying on the redirects.


In any case, those links are probably broken on GNU.org:

htdocs/gcc-14/porting_to.html:href="/onlinedocs/gcc-14.1.0/gcc/Diagnostic-Pragmas.html">#pragma 
GCC diagnostic warning


htdocs/gcc-5/changes.html:    A href="/onlinedocs/libstdc++/manual/using_dual_abi.html">Dual


* * *


+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+

I'm thinking "loop transformation" in English? Or is this a specific term
from the standard?


Loop transformation happens at the end. But e.g "(#pragma omp) unroll 
full" is a directive and, e.g.


#pragma omp unroll partial(2)

for (int i=0; i < n; i++)

a[i] = 5;

is a construct (= directive + structured block (if any) + end directive 
(if any)).


Tobias

Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Tobias Burnus


Sandra Loosemore wrote:

On 6/6/24 06:06, Tobias Burnus wrote:
+@item I/O within OpenMP target regions and OpenACC compute regions 
is supported

+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} 
statements are
+  supported within OpenMP target regions, but not yet OpenACC 
compute
+  regions.  @c The latter needs 
'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.




I think an "in" (or 'within') is missing before OpenACC.


Yes, "...not yet within OpenACC compute regions", please.


Thanks! Committed as https://gcc.gnu.org/r15-1072-g423522aacd9f30

Tobias

[gcc r15-1072] libgomp.texi (nvptx): Add missing preposition

2024-06-06 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:423522aacd9f30bb75aa77d38fccb630bfc4c98a

commit r15-1072-g423522aacd9f30bb75aa77d38fccb630bfc4c98a
Author: Tobias Burnus 
Date:   Thu Jun 6 16:37:55 2024 +0200

libgomp.texi (nvptx): Add missing preposition

libgomp/
* libgomp.texi (nvptx): Add missing preposition.

Diff:
---
 libgomp/libgomp.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index eb608915938..73e8e39ca42 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6432,7 +6432,7 @@ The implementation remark:
 @item I/O within OpenMP target regions and OpenACC compute regions is supported
   using the C library @code{printf} functions.
   Additionally, the Fortran @code{print}/@code{write} statements are
-  supported within OpenMP target regions, but not yet OpenACC compute
+  supported within OpenMP target regions, but not yet within OpenACC 
compute
   regions.  @c The latter needs 'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.
 @item Compilation OpenMP code that contains @code{requires reverse_offload}
   requires at least @code{-march=sm_35}, compiling for @code{-march=sm_30}

Re: [committed] nvptx, libgfortran: Switch out of "minimal" mode

2024-06-06 Thread Tobias Burnus


Hi Thomas,

regarding the commit r15-1070-g3a4775d4403f2e / https://gcc.gnu.org/r15-1070

First, thanks for adding I/O support to nvptx offloading.

I have a wording nit, to be confirmed by a native speaker:


--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi

...

+@item I/O within OpenMP target regions and OpenACC compute regions is 
supported

+  using the C library @code{printf} functions.
+  Additionally, the Fortran @code{print}/@code{write} statements are
+  supported within OpenMP target regions, but not yet OpenACC compute
+  regions.  @c The latter needs 
'GOMP_NVPTX_NATIVE_GPU_THREAD_STACK_SIZE'.




I think an "in" (or 'within') is missing before OpenACC.

Otherwise, it seemed to fine at a glance – and I am happy that that 
feature now finally works :-)


Hooray, no longer using reverse offload ("!$omp target 
device(ancestor:1)") for Fortran I/O when debugging.


Thanks,

Tobias

Re: [PATCH v4 1/5] libgomp, openmp: Add ompx_pinned_mem_alloc

2024-06-06 Thread Tobias Burnus


Hi Andrew, hi Jakub, hello world,

Andrew Stubbs wrote:


Compared to the previous v3 posting of this patch, the enumeration of
the "ompx" allocators have been moved to start at "100"


100 is a bad value - as can be seen below.

As Jakub suggested at 
https://gcc.gnu.org/pipermail/gcc-patches/2023-December/640432.html
"given that LLVM uses 100-102 range, perhaps pick a different one, 200 or 150"

(I know that the first review email suggested 100.)


This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP.  The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.


Namely: ompx_pinned_mem_alloc

RFC: Should we use this name or - similar to LLVM - prefix this by
a vendor prefix instead (gnu_omp_ or gcc_omp_ instead of ompx_)?

IMHO it is fine to use ompx_ for pinned as the semantic is clear
and should be compatible with IBM and AMD.

For other additional memspaces / allocators, I am less sure, i.e.
on OG13 there are:
- ompx_unified_shared_mem_space, ompx_host_mem_space
- ompx_unified_shared_mem_alloc, ompx_host_mem_alloc

(BTW: In light of TR13 naming, the USM one could be
..._devices_all_mem_{alloc,space}, just to start some bikeshading
or following LLVM + Intel '…target_{host,shared}…'.)

* * *

Looking at other compilers:

IBM's compiler, https://www.ibm.com/docs/en/SSXVZZ_16.1.1/pdf/compiler.pdf , 
has:
- ompx_pinned_mem_alloc, tagged as IBM extension and otherwise without 
documenting it further

Checking omp.h, they define it as:
  ompx_pinned_mem_alloc = 9, /* Preview of host pinned memory support */
and additionally have:
  LOMP_MAX_MEM_ALLOC = 1024,

AMD's compiler based on clang has:
  /* Preview of pinned memory support */
  ompx_pinned_mem_alloc = 120,
in addition to the LLVM defines shown below.

Regarding LLVM:
- they don't offer 'pinned'
- they use the prefix 'llvm_omp' not 'ompx'

Namely:
typedef enum omp_allocator_handle_t
...
  llvm_omp_target_host_mem_alloc = 100,
  llvm_omp_target_shared_mem_alloc = 101,
  llvm_omp_target_device_mem_alloc = 102,
...
typedef enum omp_memspace_handle_t
...
  llvm_omp_target_host_mem_space = 100,
  llvm_omp_target_shared_mem_space = 101,
  llvm_omp_target_device_mem_space = 102,

Remark: I did not find a documentation - and while I
understand in principle host and shared, I wonder how
LLVM handles 'device_mem_space' when there is more than
one device.

BTW: OpenMP TR13 avoids this issue by adding two sets of
API routines. Namely:

First, for memspaces,
- omp_get_{device,devices}_memspace
- omp_get_{device,devices}_and_host_memspace
- omp_get_devices_all_memspace

and, secondly, for allocators:
- omp_get_{device,devices}_allocator
- omp_get_{device,devices}_and_host_allocator
- omp_get_devices_all_allocator

where omp_get_device_* takes a single device number and
omp_get_devices_* a list of device numbers while _and_host
automatically adds the initial device to the list.

* * *

Looking at Intel, they even use extensions without prefix:

omp_target_{host,shared,device}_mem_{space,alloc}

and contrary to LLVM they document it with the semantic, cf.
https://www.intel.com/content/www/us/en/docs/dpcpp-cpp-compiler/developer-guide-reference/2023-1/openmp-memory-spaces-and-allocators.html

* * *


The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.


...


diff --git a/libgomp/allocator.c b/libgomp/allocator.c
index cdedc7d80e9..18e3f525ec6 100644
--- a/libgomp/allocator.c
+++ b/libgomp/allocator.c
@@ -99,6 +99,8 @@ GOMP_is_alloc (void *ptr)


...


   #define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))
-_Static_assert (ARRAY_SIZE (predefined_alloc_mapping)
+_Static_assert (ARRAY_SIZE (predefined_omp_alloc_mapping)
== omp_max_predefined_alloc + 1,
-   "predefined_alloc_mapping must match omp_memspace_handle_t");
+   "predefined_omp_alloc_mapping must match 
omp_memspace_handle_t");
+#define ARRAY_SIZE(A) (sizeof (A) / sizeof ((A)[0]))


I am surprised that this compiles: Why do you re-#define this macro?

* * *


--- a/libgomp/omp.h.in
+++ b/libgomp/omp.h.in
@@ -134,6 +134,7 @@ typedef enum omp_allocator_handle_t __GOMP_UINTPTR_T_ENUM
 omp_cgroup_mem_alloc = 6,
 omp_pteam_mem_alloc = 7,
 omp_thread_mem_alloc = 8,
+  ompx_pinned_mem_alloc = 100,


See remark regarding "100" at the top of this email.


--- a/libgomp/omp_lib.f90.in
+++ b/libgomp/omp_lib.f90.in
+integer (kind=omp_allocator_handle_kind), &
+ parameter :: ompx_pinned_mem_alloc = 100


Likewise.

* * *

Why didn't you also update omp_lib.h.in?

* * *

I think you really want to update the checking code inside GCC itself,

i.e. for Fortran:

3 |   !$omp allocate(a) allocator(100)

  | 21

Error: Predefined allocator required in ALLOCATOR clause at (1) as the list 
item 'a' at (2) has the

[wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-06 Thread Tobias Burnus


GCC 15 now supports unified-shared memory and the tile/unroll constructs
in OpenMP.

Updates https://gcc.gnu.org/gcc-15/changes.html
and https://gcc.gnu.org/projects/gomp/

Comments?

Tobias
gcc-15/changes.html + projects/gomp: update for new OpenMP features

GCC 15 now supports unified-shared memory and the tile/unroll constructs
in OpenMP.

 htdocs/gcc-15/changes.html  | 27 ++-
 htdocs/projects/gomp/index.html | 11 +++
 2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..94528ebd 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -40,6 +40,24 @@ a work-in-progress.
 
 New Languages and Language specific improvements
 
+
+  https://gcc.gnu.org/projects/gomp/;>OpenMP
+  
+
+  Support for unified-shared memory has been added for some AMD and Nvidia
+  GPUs devices, enabled only when using the
+  unified_shared_memory clause to the requires
+  directive. For details, see the offload-target specifics section in the
+  https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html;
+  >GNU Offloading and Multi Processing Runtime Library Manual.
+
+
+  OpenMP 5.1: The unroll and tile
+  loop-transformation constructs are now supported.
+
+  
+
+
 
 
 
diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 94bda5ff..d1765fc3 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -313,18 +313,21 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 requires directive
-
+
   GCC9
   GCC12
   GCC13
-  GCC14
+  GCC14
+  GCC15
 
 
   (atomic_default_mem_order)
   (dynamic_allocators)
   complete but no non-host devices provides unified_address or
   unified_shared_memory
-  complete but no non-host devices provides unified_shared_memory
+  complete but no non-host devices provides unified_shared_memory
+  complete; see also https://gcc.gnu.org/onlinedocs/libgomp/Offload-Target-Specifics.html;>
+  Offload-Target Specifics
 
   
   
@@ -706,7 +709,7 @@ than listed, depending on resolved corner cases and optimizations.
   
   
 Loop transformation constructs
-No
+GCC15

ping – Re: [wwwdocs] gcc-15/changes.html (nvptx): Constructors are now supported

2024-06-05 Thread Tobias Burnus

Regarding 
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653417.html , are 
there any …


Tobias Burnus wrote:

Comments or fine as is?


Tobias

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-05 Thread Tobias Burnus


Hi Andrew, hello world,

Now with AMD Instinct MI200 data - see below.

And a better look at the numbers. In terms of USM,
there does not seem to be any clear winner of both
approaches. If we want to draw conclusions, definitely
more runs are needed (statistics):

The runs below show that the differences between runs
can be larger than the effect of mapping vs. USM.
And that OG13's USM was be 40% slower on MI210
(compared with mainline or OG13 'map') while
mainline's USM is about as fast as 'map' (OG13 or mainline)
is not consistent with the MI250X result, were both USM are
slower with mainline's USM being much slower with ~30%
than OG13 with 12%.



Tobias Burnus wrote:


I have now tried it on my laptop with 
BabelStream,https://github.com/UoB-HPC/BabelStream

Compiling with:
echo "#pragma omp requires unified_shared_memory" > omp-usm.h
cmake -DMODEL=omp -DCMAKE_CXX_COMPILER=$HOME/projects/gcc-trunk-offload/bin/g++ 
\
   -DCXX_EXTRA_FLAGS="-g -include ../omp-usm.h -foffload=nvptx-none 
-fopenmp" -DOFFLOAD=ON ..

(and the variants: no -include (→ map) + -DOFFLOAD=OFF (= host), and with 
hostfallback,
via env var (or usm-14 by due to lacking support.)

For mainline, I get (either with libgomp.so of mainline or GCC 14, i.e. w/o USM 
support):
host-14.log 195.84user 0.94system 0 11.20elapsed 1755%CPU 
(0avgtext+0avgdata 1583268maxresident)k
host-mainline.log   200.16user 1.00system 0 11.89elapsed 1691%CPU 
(0avgtext+0avgdata 1583272maxresident)k
hostfallback-mainline.log   288.99user 4.57system 0 19.39elapsed 1513%CPU 
(0avgtext+0avgdata 1583972maxresident)k
usm-14.log  279.91user 5.38system 0 19.57elapsed 1457%CPU 
(0avgtext+0avgdata 1590168maxresident)k
map-14.log  4.17user 0.45system 0   03.58elapsed 129%CPU 
(0avgtext+0avgdata 1691152maxresident)k
map-mainline.log    4.15user 0.44system 0   03.58elapsed 128%CPU 
(0avgtext+0avgdata 1691260maxresident)k
usm-mainline.log    3.63user 1.96system 0   03.88elapsed 144%CPU 
(0avgtext+0avgdata 1692068maxresident)k

Thus: GPU is faster than host, host fallback takes 40% longer than doing host 
compilation.
USM is 15% faster than mapping.


Correction: I shouldn't look at user time but at elapsed time. For the 
latter, USM is 8% slower on mainline; hostfallback is ~70% slower than 
host execution.



With OG13, the pattern is similar, except that USM is only 3% faster.
Here, USM (elapsed) is 2.5% faster. It is a bit difficult to compare the 
results as OG13 is faster for mapping and USM, which makes 
distinguishing OG13 vs mainline performance and the two different USM 
approaches difficult.

host-og13.log   191.51user 0.70system 0 09.80elapsed 1960%CPU 
(0avgtext+0avgdata 1583280maxresident)k
map-hostfallback-og13.log   205.12user 1.09system 0 10.82elapsed 1905%CPU 
(0avgtext+0avgdata 1585092maxresident)k
usm-hostfallback-og13.log   338.82user 4.60system 0 19.34elapsed 1775%CPU 
(0avgtext+0avgdata 1584580maxresident)k
map-og13.log4.43user 0.42system 0   03.59elapsed 135%CPU 
(0avgtext+0avgdata 1692692maxresident)k
usm-og13.log4.31user 1.18system 0   03.68elapsed 149%CPU 
(0avgtext+0avgdata 1686256maxresident)k

* * *


As IT issues are now solved:

(A) On  AMD Instinct MI210 (gfx90a)

The host fallback is here very slow with elapsed time 24s vs. 1.6s for host 
execution.
map and USM seem to be in the same ballpark.
For two 'map' runs, I see a difference of 8%, the USM times are between those 
map results.

I see similar results for OG13 than mainline, except for USM which is ~40% 
slower (elapse time)
than map (OG13 or mainline - or mainline's USM).

host-mainline-2.log 194.00user 7.21system 0 01.44elapsed 13954%CPU 
(0avgtext+0avgdata 1320960maxresident)k
host-mainline.log   221.53user 5.58system 0 01.78elapsed 12716%CPU 
(0avgtext+0avgdata 1318912maxresident)k
hostfallback-mainline-1.log 3073.35user 146.22system 0  24.25elapsed 
13272%CPU (0avgtext+0avgdata 1644544maxresident)k
hostfallback-mainline-2.log 2268.62user 146.13system 0  23.39elapsed 
10320%CPU (0avgtext+0avgdata 1650544maxresident)k
map-mainline-1.log  5.38user 16.16system 0  03.00elapsed 716%CPU 
(0avgtext+0avgdata 1714936maxresident)k
map-mainline-2.log  5.12user 15.93system 0  02.74elapsed 768%CPU 
(0avgtext+0avgdata 1714932maxresident)k
usm-mainline-1.log  7.61user 2.30system 0   02.89elapsed 342%CPU 
(0avgtext+0avgdata 1716984maxresident)k
usm-mainline-2.log  7.75user 2.92system 0   02.89elapsed 369%CPU 
(0avgtext+0avgdata 1716980maxresident)k

host-og13-1.log 213.69user 6.37system 0 01.56elapsed 14026%CPU 
(0avgtext+0avgdata 1316864maxresident)k
hostfallback-map-og13-1.log 3026.68user 123.77system 0  23.69elapsed 
13295%CPU (0avgtext+0avgdata 1642496maxresident)k
hostfallback-map-og1

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-04 Thread Tobias Burnus


Andrew Stubbs wrote:


PS: I would love to do some comparisons [...]

Actually, I think testing only data transfer is fine for this, but we
might like to try some different access patterns, besides straight
linear copies.


I have now tried it on my laptop with 
BabelStream,https://github.com/UoB-HPC/BabelStream

Compiling with:
echo "#pragma omp requires unified_shared_memory" > omp-usm.h
cmake -DMODEL=omp -DCMAKE_CXX_COMPILER=$HOME/projects/gcc-trunk-offload/bin/g++ 
\
  -DCXX_EXTRA_FLAGS="-g -include ../omp-usm.h -foffload=nvptx-none 
-fopenmp" -DOFFLOAD=ON ..

(and the variants: no -include (→ map) + -DOFFLOAD=OFF (= host), and with 
hostfallback,
via env var (or usm-14 by due to lacking support.)

For mainline, I get (either with libgomp.so of mainline or GCC 14, i.e. w/o USM 
support):

host-14.log 195.84user 0.94system 0 11.20elapsed 1755%CPU 
(0avgtext+0avgdata 1583268maxresident)k
host-mainline.log   200.16user 1.00system 0 11.89elapsed 1691%CPU 
(0avgtext+0avgdata 1583272maxresident)k
hostfallback-mainline.log   288.99user 4.57system 0 19.39elapsed 1513%CPU 
(0avgtext+0avgdata 1583972maxresident)k
usm-14.log  279.91user 5.38system 0 19.57elapsed 1457%CPU 
(0avgtext+0avgdata 1590168maxresident)k
map-14.log  4.17user 0.45system 0   03.58elapsed 129%CPU 
(0avgtext+0avgdata 1691152maxresident)k
map-mainline.log    4.15user 0.44system 0   03.58elapsed 128%CPU 
(0avgtext+0avgdata 1691260maxresident)k
usm-mainline.log    3.63user 1.96system 0   03.88elapsed 144%CPU 
(0avgtext+0avgdata 1692068maxresident)k

Thus: GPU is faster than host, host fallback takes 40% longer than doing host 
compilation.
USM is 15% faster than mapping.


With OG13, the pattern is similar, except that USM is only 3% faster. Thus, HMM 
seems to win my my laptop.

host-og13.log   191.51user 0.70system 0 09.80elapsed 1960%CPU 
(0avgtext+0avgdata 1583280maxresident)k
map-hostfallback-og13.log   205.12user 1.09system 0 10.82elapsed 1905%CPU 
(0avgtext+0avgdata 1585092maxresident)k
usm-hostfallback-og13.log   338.82user 4.60system 0 19.34elapsed 1775%CPU 
(0avgtext+0avgdata 1584580maxresident)k
map-og13.log4.43user 0.42system 0   03.59elapsed 135%CPU 
(0avgtext+0avgdata 1692692maxresident)k
usm-og13.log4.31user 1.18system 0   03.68elapsed 149%CPU 
(0avgtext+0avgdata 1686256maxresident)k

* * *

I planned to try an AMD Instinct MI200 device, but due to two IT issues, I 
cannot.
(Shutdown for maintenance of the MI250X system and an NFS issues for the MI210 
run,
but being unable to reboot due to the absence of a colleague having tons of 
editors
still open).

Tobias

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Tobias Burnus


Andrew Stubbs wrote:

On 03/06/2024 17:46, Tobias Burnus wrote:

Andrew Stubbs wrote:

+    /* If USM has been requested and is supported by all devices
+   of this type, set the capability accordingly. */
+    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+  current_device.capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
+


This breaks my USM patches that add the omp_alloc support (because 
it now short-circuits all of those code-paths),


which I believe is fine. Your USM patches are for pseudo-USM, i.e. a 
(useful) bandaid for systems where the memory is not truely 
unified-shared memory but only specially tagged host memory is device 
accessible. (e.g. only memory allocated via cuMemAllocManaged) — And, 
quite similar, for -foffload-memory=pinned.


Er, no.

The default do-nothing USM uses slow uncachable PCI memory accesses 
(on devices that don't have truly shared memory, like APUs).


I have no idea what a "default do nothing USM" is – and using the PCI-E 
to transfer the data is the only option unless there is either a common 
memory controller or some other interconnect Infinity Fabric interconnect).


However, your description sounds as if you talk about pinned memory – 
which by construction cannot migrate – and not about managed memory, 
which is one of the main approaches for USM – especially as that's how 
HMM works and as it avoids to transfer any memory access.


If you use a Linux kernel with HMM and have support for it, the default 
is that upon device access, the page migrates to the GPU (using, e.g. 
PCI-E) and then stays there until the host accesses that memory page 
again, triggering a page fault and transfer back. That's the whole idea 
of HMM and works similar to the migrate to disk feature (aka swapping), 
cf. https://docs.kernel.org/mm/hmm.html


That's the very same behavior as with hipMallocManaged with XNACK 
enabled according to 
https://rocm.docs.amd.com/en/develop/conceptual/gpu-memory.html


As PowerPC + Volta (+ normal kernel) does not support USM but a system 
with + Nvlink does, I bet that on such a system, the memory stays on the 
host and Nvlink does the remote access, but I don't know how Nvlink 
handles caching. (The feature flags state that direct host-memory access 
from the device is possible.)


By contrast, for my laptop GPU (Nvidia RTX A1000) with open kernel 
drivers + CUDA drivers, I bet the memory migration will happen – 
especially as the feature flags direct host-memory access is not possible.


* * *

If host and device access data on the same memory page, page migration 
forth and back will happen continuously, which is very slow.


Also slow is if data is spread over many pages as one gets keeps getting 
page faults until the data is finally completely migrated. The solution 
in that case is a large page such that the data is transferred in 
one/few large chunks.


In general using manual allocation (x = omp_alloc(...)) with a suitable 
allocator can manually avoid the problem by using pinning or large pages 
or … Without knowing the algorithm it is hard to have a generic solution.


If there such a concurrent access issue occurs for compiler generated 
code or with the run-time library, we should definitely try to fix it; 
for user code, it is probably hopeless in the generic case.


* * *

I actually tried to find an OpenMP target-offload benchmark, possibly 
for USM, but I failed. Most seem to be either not available or seriously 
broken – when testing starts by fixing OpenMP syntax bugs, it does not 
increase the trust in the testcase. — Can you suggest a testcase?


* * *

The CUDA Managed Memory and AMD Coarse Grained memory implementation 
uses proper page migration and permits full-speed memory access on the 
device (just don't thrash the pages too fast).


As written, in my understanding that is what happens with HMM kernel 
support for any memory that is not explicitly pinned. The only extra 
trick an implementation can play is pinning the page – such that it 
knows that the memory host does not change (e.g. won't migrates to the 
other NUMA memory of the CPU or to swap space) such that the memory can 
be directly accessed.


I am pretty sure that's the reason, e.g., CUDA pinned memory is faster – 
and it might also help with HMM migration if the destination is known 
not to change; no idea whether the managed memory routines play such 
tricks or not.


Another optimization opportunity exists if it is known that the memory 
won't be accessed by host until the kernel ends, but I don't see this 
guaranteed in general in user code.


* * *

On AMD MI200, your check broken my USM testcases (because the code 
they were testing isn't active).  This is a serious performance problem.


"I need more data." — First, a valid USM testcase should not be broken 
in the mainline. Secondly, I don't see how a generic testcase can have a 
performance issue when USM works. And, I didn't see a tes

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-06-03 Thread Tobias Burnus


Andrew Stubbs wrote:

+    /* If USM has been requested and is supported by all devices
+   of this type, set the capability accordingly.  */
+    if (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY)
+  current_device.capabilities |= GOMP_OFFLOAD_CAP_SHARED_MEM;
+


This breaks my USM patches that add the omp_alloc support (because it 
now short-circuits all of those code-paths),


which I believe is fine. Your USM patches are for pseudo-USM, i.e. a 
(useful) bandaid for systems where the memory is not truely 
unified-shared memory but only specially tagged host memory is device 
accessible. (e.g. only memory allocated via cuMemAllocManaged) — And, 
quite similar, for -foffload-memory=pinned.


I think if a user wants to have pseudo USM – and does so by passing 
-foffload-memory=unified – we can add another flag to the internal 
omp_requires_mask. - By passing this option, a user should then also be 
aware of all the unavoidable special-case issues of pseudo-USM and 
cannot complain if they run into those.


If not, well, then the user either gets true USM (if supported) - or 
host fallback. Either of it is perfectly fine.


With -foffload-memory=unified, the compiler can then add all the 
omp_alloc calls – and, e.g., set a new GOMP_REQUIRES_OFFLOAD_MANAGED 
flag. If that's set, we wouldn't do the line above quoted capability 
setting in libgomp/target.c.


For nvidia, GOMP_REQUIRES_OFFLOAD_MANAGED probably requires 
CU_DEVICE_ATTRIBUTE_CONCURRENT_MANAGED_ACCESS, i.e. when 0 then we 
probably want to return -1 also for -foffload-memory=unified. - A quick 
check shows that Tesla K20 (Kepler, sm_35) has 0 while Volta, Ada, 
Ampere (sm_70, sm_82, sm_89) have 1. (I recall using managed memory on 
an old system; page migration to the device worked fine, but a on-host 
accesses while the kernel was still running, crashed the program.|)

|

For amdgcn, my impression is that we don't need to handle 
-foffload-memory=unified as only the MI200 series (+ APUs) supports this 
well, but MI200 also supports true USM (with page migration; for APU it 
makes even less sense). - But, of course, we still may. — Auto-setting 
HSA_XNACK could be still be done MI200, but I wonder how to distinguish 
MI300X vs. MI300A, but it probably doesn't harm (nor help) to set 
HSA_XNACK for APUs …



and it's just not true for devices where all host memory isn't 
magically addressable on the device.

Is there another way to detect truly shared memory?


Do you have any indication that the current checks become true when the 
memory is not accessible?


Tobias

[committed] install.texi (gcn): Fix date of recommended newlib version

2024-06-03 Thread Tobias Burnus


Somehow, I was one year ahead. The commit wasn't 2025-03-25 but in 2024.

Committed as obvious, also to avoid future confusions.

Tobias
commit 16fb3abf0fb4b88ee0e27732db217909fa429a81
Author: Tobias Burnus 
Date:   Mon Jun 3 12:56:39 2024 +0200

install.texi (gcn): Fix date of recommended newlib version

gcc/ChangeLog:

* doc/install.texi (gcn): Fix date of recommended newlib version.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 42b462a2ce2..c781646ac1f 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3950,7 +3950,7 @@ by specifying a @code{--with-multilib-list=} that does not list @code{gfx1100}
 and @code{gfx1103}.
 
 Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2025-03-25, post-4.4.0) fixes device console output for GFX10 and
+7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
 GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the

[gcc r15-990] install.texi (gcn): Fix date of recommended newlib version

2024-06-03 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:16fb3abf0fb4b88ee0e27732db217909fa429a81

commit r15-990-g16fb3abf0fb4b88ee0e27732db217909fa429a81
Author: Tobias Burnus 
Date:   Mon Jun 3 12:56:39 2024 +0200

install.texi (gcn): Fix date of recommended newlib version

gcc/ChangeLog:

* doc/install.texi (gcn): Fix date of recommended newlib version.

Diff:
---
 gcc/doc/install.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 42b462a2ce2..c781646ac1f 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -3950,7 +3950,7 @@ by specifying a @code{--with-multilib-list=} that does 
not list @code{gfx1100}
 and @code{gfx1103}.
 
 Use Newlib (4.3.0 or newer; 4.4.0 contains some improvements and git commit
-7dd4eb1db (2025-03-25, post-4.4.0) fixes device console output for GFX10 and
+7dd4eb1db (2024-03-25, post-4.4.0) fixes device console output for GFX10 and
 GFX11 devices).
 
 To run the binaries, install the HSA Runtime from the

Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-03 Thread Tobias Burnus


Richard Biener wrote:

install.texi also has the issue that it's not pre-packaged in a
easy to discover and readable file in the release tarballs and that
the online version is only for trunk.


I always wondered why it is not included at 
https://gcc.gnu.org/onlinedocs/ — it would then also be linked from, 
e.g., https://gcc.gnu.org/gcc-14/index.html


Tobias

Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-03 Thread Tobias Burnus


Richard Biener wrote:

On Mon, 3 Jun 2024, Tobias Burnus wrote:

Thomas Schwinge wrote:

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is
<https://github.com/SourceryTools/nvptx-tools/commit/96f8fc59a757767b9e98157d95c21e9fef22a93b>
"ld: Global constructor/destructor support".

The attached patch makes clearer which version should be
installed by recommending this patch (= latest nvptx-tools)
in install.texi.

Can we simply say "newerst" where I guess refering to a github repo
already implies this?


Good question. The problem I see with just referring to a repository 
(even with newest) often means: yes, that software I have (whatever 
version). While if some reference goes to a 2024 version, I might not 
know what version I have but likely an older version → I will update.


Admittedly, as people tend to *not* read the documentation, this 
approach might fail as well. But, maybe, it is sufficient to update GCC 
15's release notes?*


It won't help those not reading with the release notes before building 
and the wording* had to be changed a bit as install.texi no longer 
states what version should be used, but it would be an alternative


(*) https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653417.html

Tobias

[wwwdocs] gcc-15/changes.html (nvptx): Constructors are now supported

2024-06-03 Thread Tobias Burnus


Comments or fine as is?

Tobias
gcc-15/changes.html (nvptx): Constructors are now supported

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index b59fd3be..b3305079 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -85,7 +103,14 @@ a work-in-progress.
 
 
 
-
+NVPTX
+
+
+  GCC's nvptx target now supports constructors and destructors;
+  for this, a recent version of nvptx-tools is https://gcc.gnu.org/install/specific.html#nvptx-x-none;
+  >required.
+

[nvptx] ping - [patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-06-03 Thread Tobias Burnus


Hi Thomas, hi Tom,

any comment regarding this patch?
 https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650007.html

Tobias

Am 25.04.24 um 12:51 schrieb Tobias Burnus:

Motivated by a surprise of a colleague that with -m32,
no offload dumps were created; that's because mkoffload
does not process host binaries when the are 32bit (i.e. ilp32).

Internally, that done as follows: The host compiler passes to
'mkoffload' the used host ABI, i.e. -foffload-abi=ilp32 or -foffload-abi=lp64

That's done via TARGET_OFFLOAD_OPTIONS, which is supported by aarch64, i386, 
and rs6000.

While it is sensible (albeit not strictly required) that GCC requires that
the host and device side agree and that only 64bit is implemented for the
device side, it can be confusing that silently no offloading code is generated.


Hence, I propose to print a warning in that case - as implemented in the 
attached patch:

$ gcc -fopenmp -m32 test.c
nvptx mkoffload: warning: offload code generation skipped: offloading with 
32-bit host code is currently not supported
gcn mkoffload: warning: offload code generation skipped: offloading with 32-bit 
host code is currently not supported

* * *

This shouldn't have any effect on offload builds using -m64
and non-offload builds – while several testcases already have
issues with '-m32' when offloading is enabled or an offloading
device is available.

To make it not worse, this patch adds some pruning and for
a subset of the failing testcases, I added code to avoids FAILS.
There are some more fails, but those aren't new.

Comments, remarks, suggestions?
Is the mkoffload.cc part is okay?

Tobias

[patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30 (was: Re: nvptx target: Global constructor, destructor support, via nvptx-tools 'ld')

2024-06-03 Thread Tobias Burnus


Thomas Schwinge wrote:

In the following, I have then reconsidered that stance; we may actually
"Implement global constructor, destructor support in a conceptually
simpler way than using 'collect2' (the program): implement the respective
functionality in the nvptx-tools 'ld'".  The latter is

"ld: Global constructor/destructor support".


The attached patch makes clearer which version should be
installed by recommending this patch (= latest nvptx-tools)
in install.texi.

OK? Comments, remarks?

Tobias

PS: If the https://github.com/SourceryTools/nvptx-tools/pull/47
(nvptx-ld.cc: Improve C++11 compatibility with older compilers)
proofs worthwhile and gets merged, we should point to that commit
instead.install.texi (nvptx): Recommend nvptx-tools 2024-05-30

gcc/
	* doc/install.texi (nvptx): Recommend nvptx-tools 2024-05-30 or newer.

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 42b462a2ce2..4859f6743ab 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4698,7 +4698,8 @@ Andes NDS32 target in big endian mode.
 Nvidia PTX target.
 
 Instead of GNU binutils, you will need to install
-@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}.
+@uref{https://github.com/SourceryTools/nvptx-tools,,nvptx-tools}
+(recommended: 96f8fc5 of 2024-05-30 -- or newer).
 Tell GCC where to find it:
 @option{--with-build-time-tools=[install-nvptx-tools]/nvptx-none/bin}.

Re: [PATCH v2 01/12] OpenMP: metadirective tree data structures and front-end interfaces

2024-05-31 Thread Tobias Burnus


Hi Sandra,

some observations/comments, but in general it looks good.

Sandra Loosemore wrote:

This patch adds the OMP_METADIRECTIVE tree node and shared tree-level
support for manipulating metadirectives.  It defines/exposes
interfaces that will be used in subsequent patches that add front-end
and middle-end support, but nothing generates these nodes yet.

This patch also adds compile-time support for dynamic context
selectors (the target_device selector set and the condition selector
of the user selector set) for metadirectives only.  The "declare
variant" directive still supports only static selectors.

...

  /* Return 1 if context selector matches the current OpenMP context, 0
 if it does not and -1 if it is unknown and need to be determined later.
 Some properties can be checked right away during parsing (this routine),
 others need to wait until the whole TU is parsed, others need to wait until
-   IPA, others until vectorization.  */
+   IPA, others until vectorization.
+
+   METADIRECTIVE_P is true if this is a metadirective context, and DELAY_P
+   is true if it's too early in compilation to determine whether some
+   properties match.
+
+   Dynamic properties (which are evaluated at run-time) should always
+   return 1.  */

I have to admit that I don't really see the use of metadirective_p as …

  int
-omp_context_selector_matches (tree ctx)
+omp_context_selector_matches (tree ctx, bool metadirective_p, bool delay_p)

...

+   if (metadirective_p && delay_p)
+ return -1;


I do see why the resolution of KIND/ARCH/ISA should be delayed – for 
both variant/metadirective as long as the code is run by the host and 
the device. Except that we could exclude, e.g., 'kind(FPGA)' early on as 
we don't support it at all.


But once the device code is split off, I don't see why we can't expand 
the DEVICE clause right away for both variant and metadirective – while 
for 'target_device', we cannot do much until runtime – except of 
excluding things like 'kind(fpga)' – or excluding all 'arch' known not 
to be supported neither by the host nor by any enabled offload devices.


Thus, I see why there is a 'delay_p', but not why there is a 
'metadirective_p'.


But I might have missed something important ...


 case OMP_TRAIT_USER_CONDITION:
   if (set == OMP_TRAIT_SET_USER)
 for (tree p = OMP_TS_PROPERTIES (ts); p; p = TREE_CHAIN (p))
   if (OMP_TP_NAME (p) == NULL_TREE)
 {
+ /* OpenMP 5.1 allows non-constant conditions for
+metadirectives.  */
+ if (metadirective_p
+ && !tree_fits_shwi_p (OMP_TP_VALUE (p)))
+   break;
   if (integer_zerop (OMP_TP_VALUE (p)))
 return 0;
   if (integer_nonzerop (OMP_TP_VALUE (p)))
 break;
   ret = -1;
 }


(BTW: I am happy to be enlightened as I likely have miss some fine print.)

Regarding the comment: True, but shouldn't this be handled before by 
issuing an error when such a clause is used in 'declare variant', i.e. 
only occur when metadirective_p is/can be true?


Besides, I have to admit that I do not understand the new code. The 
current code has: constant zero → whole selector known to be false 
("return 0"); nonzero constant → keep current state, i.e. either 'true' 
(1) or don't known ('-1') and continue; otherwise (not const) → set to 
"don't know" (-1) and continue with the next item.


That seems to make also sense for metadirectives. But your patch changes 
this to keep current state if a variable. In that case, '1' is used if 
this is the only item or the previous condition is true. Or "-1" when 
the previous item is "don't know" (-1). - I think that doesn't make 
sense and it should always return -1 for a run time value.


Additionally, I wonder why you use tree_fits_shwi_p instead of a simple 
'TREE_CODE (OMP_TP_VALUE (p)) != INTEGER_CST'. It does not seem to 
matter here, but '(uint128_t)-1' looks like a valid condition and valid 
constant, which integer_nonzerop should handled but if the hwi is 128bit 
wide, it won't fit into a signed variable.


(As integer_nonzerop and the current code both do "break;" it won't 
change the result of the current code.)


* * *

+static tree
+omp_dynamic_cond (tree ctx)
+{

...

+  /* The user condition is not dynamic if it is constant.  */
+  if (!tree_fits_shwi_p (TREE_VALUE (expr_list)))


Any reason for using tree_fits_shwi_p instead of INTEGER_CST? Here, 
(uint128_t)-1 could make a difference …



+   /* omp_initial_device is -1, omp_invalid_device is -4; choose
+  a value that isn't otherwise defined to indicate the default
+  device.  */
+   device_num = build_int_cst (integer_type_node, -2);


Don't do this - we do it differently

[gcc r15-924] libgomp.texi: Impl. update for USM and missing 5.2 item

2024-05-30 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:370df6ef0fe6d99613050d33a18cc008be7ceca4

commit r15-924-g370df6ef0fe6d99613050d33a18cc008be7ceca4
Author: Tobias Burnus 
Date:   Thu May 30 13:21:43 2024 +0200

libgomp.texi: Impl. update for USM and missing 5.2 item

libgomp/ChangeLog:

* libgomp.texi (OpenMP 5.0 status): Mark 'requires' as done and
link to 'Offload-Target Specifics'.
(OpenMP 5.2 status): Add item about additional map-type modifiers
in 'declare mapper'.

Diff:
---
 libgomp/libgomp.texi | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e79bd7a3392..d612488ad10 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -198,8 +198,8 @@ The OpenMP 4.5 specification is fully supported.
 @item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
   env variable @tab Y @tab
 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
-@item @code{requires} directive @tab P
-  @tab complete but no non-host device provides 
@code{unified_shared_memory}
+@item @code{requires} directive @tab Y
+  @tab See also @ref{Offload-Target Specifics}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab P
   @tab Full support for C/C++, partial for Fortran
@@ -443,6 +443,8 @@ to address of matching mapped list item per 5.1, Sect. 
2.21.7.2 @tab N @tab
   of the @code{interop} construct @tab N @tab
 @item Invoke virtual member functions of C++ objects created on the host device
   on other devices @tab N @tab
+@item @code{iterator} and @code{mapper} as map-type modifier in @code{declare 
mappter}
+  @tab N @tab
 @end multitable

[patch] libgomp.texi: Impl. update for USM and missing 5.2 item

2024-05-29 Thread Tobias Burnus

Now that unified-shared memory works (with some devices), mark it as 'Y' 
and link to the device-specific chapter. While there is always room for 
improvement (like having opt-in partial support for managed-memory 
semi-USM devices), it works sufficienty for a 'Y'.


Additionally, I saw that 5.2 now extended what is permitted inside 
'declare mapper'. Instead of listening the permitted clauses as in 5.1, 
it now refers to the 'map' clause such that 'delete'/'release', 
'present' and in particular 'iterator' and 'mapper' itself are permitted 
inside a declare-mapper 'map' clause. - Thus, I added it as to-do item 
to the 5.2 status.


Comments?

Tobias

PS: As this is also about USM, the declare-target USM issue I mentioned 
in several patch emails is now filed as https://gcc.gnu.org/PR115279libgomp.texi: Impl. update for USM and missing 5.2 item

libgomp/ChangeLog:

	* libgomp.texi (OpenMP 5.0 status): Mark 'requires' as done and
	link to 'Offload-Target Specifics'.
	(OpenMP 5.2 status): Add item about additional map-type modifiers
	in 'declare mapper'.

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index e79bd7a3392..03e6455219d 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -198,8 +198,8 @@ The OpenMP 4.5 specification is fully supported.
 @item @var{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
   env variable @tab Y @tab
 @item Nested-parallel changes to @var{max-active-levels-var} ICV @tab Y @tab
-@item @code{requires} directive @tab P
-  @tab complete but no non-host device provides @code{unified_shared_memory}
+@item @code{requires} directive @tab Y
+  @tab See @ref{Offload-Target Specifics}
 @item @code{teams} construct outside an enclosing target region @tab Y @tab
 @item Non-rectangular loop nests @tab P
   @tab Full support for C/C++, partial for Fortran
@@ -443,6 +443,8 @@ to address of matching mapped list item per 5.1, Sect. 2.21.7.2 @tab N @tab
   of the @code{interop} construct @tab N @tab
 @item Invoke virtual member functions of C++ objects created on the host device
   on other devices @tab N @tab
+@item @code{iterator} and @code{mapper} as map-type modifier in @code{declare mappter}
+  @tab N @tab
 @end multitable

[gcc r15-899] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-29 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:18f477980c8597fe3dca2c2e8bd533c0c2b17aa6

commit r15-899-g18f477980c8597fe3dca2c2e8bd533c0c2b17aa6
Author: Tobias Burnus 
Date:   Wed May 29 15:29:06 2024 +0200

libgomp: Enable USM for AMD APUs and MI200 devices

If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true,
all GPUs on the system support unified shared memory. That's
the case for APUs and MI200 devices when XNACK is enabled.

XNACK can be enabled by setting HSA_XNACK=1 as env var for
supported devices; otherwise, if disable, USM code will
use host fallback.

gcc/ChangeLog:

* config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo.

include/ChangeLog:

* hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add
enum value.

libgomp/ChangeLog:

* libgomp.texi (gcn): Update USM handling
* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle
USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true.

Diff:
---
 gcc/config/gcn/gcn-hsa.h|  2 +-
 include/hsa.h   |  4 +++-
 libgomp/libgomp.texi|  9 +++--
 libgomp/plugin/plugin-gcn.c | 17 +
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 4611bc55392..03220555075 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -80,7 +80,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
writes a new AMD GPU object file and the ABI version needs to be the
same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
-   Note that Fiji is only suppored with LLVM <= 17 as version 3 is no longer
+   Note that Fiji is only supported with LLVM <= 17 as version 3 is no longer
supported in LLVM >= 18.  */
 #define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
 "!march=*|march=*:--amdhsa-code-object-version=4"
diff --git a/include/hsa.h b/include/hsa.h
index f9b5d9daf85..3c7be95d7fd 100644
--- a/include/hsa.h
+++ b/include/hsa.h
@@ -466,7 +466,9 @@ typedef enum {
   /**
   * String containing the ROCr build identifier.
   */
-  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200
+  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200,
+
+  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202
 } hsa_system_info_t;
 
 /**
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 22868635230..e79bd7a3392 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6360,8 +6360,13 @@ The implementation remark:
   such that the next reverse offload region is only executed after the 
previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any GCN device from the list of
-  available devices (``host fallback'').
+  @code{unified_shared_memory} is only supported if all AMD GPUs have the
+  @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property; for
+  discrete GPUs, this may require setting the @code{HSA_XNACK} environment
+  variable to @samp{1}; for systems with both an APU and a discrete GPU 
that
+  does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES} to
+  enable only the APU.  If not supported, all AMD GPU devices are removed
+  from the list of available devices (``host fallback'').
 @item The available stack size can be changed using the @code{GCN_STACK_SIZE}
   environment variable; the default is 32 kiB per thread.
 @item Low-latency memory (@code{omp_low_lat_mem_space}) is supported when the
diff --git a/libgomp/plugin/plugin-gcn.c b/libgomp/plugin/plugin-gcn.c
index 3cdc7ba929f..3d882b5ab63 100644
--- a/libgomp/plugin/plugin-gcn.c
+++ b/libgomp/plugin/plugin-gcn.c
@@ -3355,8 +3355,25 @@ GOMP_OFFLOAD_get_num_devices (unsigned int 
omp_requires_mask)
   if (hsa_context.agent_count > 0
   && ((omp_requires_mask
   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+  | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether host page access is supported; this is per system level
+ (all GPUs supported by HSA).  While intrinsically true for APUs, it
+ requires XNACK support for discrete GPUs.  */
+  if (hsa_context.agent_count > 0
+  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+{
+  bool b;
+  hsa_system_info_t type = HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT;
+  hsa_status_t status = hsa_fns.hsa_system_get_info_fn (type, );
+  if (status != HSA_STATUS_SUCCESS)
+   GOMP_PLUGIN_error ("HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT "
+  "failed");
+  if (!b)
+   return -1;
+}
+
   return hsa_context.agent_count;
 }

[gcc r15-898] libgomp: Enable USM for some nvptx devices

2024-05-29 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:4ccb3366ade6ec9493f8ca20ab73b0da4b9816db

commit r15-898-g4ccb3366ade6ec9493f8ca20ab73b0da4b9816db
Author: Tobias Burnus 
Date:   Wed May 29 15:14:38 2024 +0200

libgomp: Enable USM for some nvptx devices

A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared
memory is supported in hardware. This patch enables support for those -
if all installed nvptx devices have this feature (as the capabilities
are per device type).

This exposes a bug in gomp_copy_back_icvs as it did before use
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in cased of shared memory. But in this case,
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regards to '-1' for the
device number as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

* cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add.

libgomp/ChangeLog:

* libgomp.texi (nvptx): Update USM description.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
Claim support when requesting USM and all devices support
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the
devices supports USM.

Diff:
---
 include/cuda/cuda.h   |  3 ++-
 libgomp/libgomp.texi  |  7 +--
 libgomp/plugin/plugin-nvptx.c | 15 +++
 libgomp/target.c  | 24 +++-
 4 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..804d08ca57e 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,7 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
-  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS = 88
 } CUdevice_attribute;
 
 enum {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..22868635230 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6435,8 +6435,11 @@ The implementation remark:
   the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any nvptx device from the
-  list of available devices (``host fallback'').
+  @code{unified_shared_memory} runs on nvptx devices if and only if
+  all of those support the @code{pageableMemoryAccess} property;@footnote{
+  
@uref{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements}}
+  otherwise, all nvptx device are removed from the list of available
+  devices (``host fallback'').
 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
   in the GCC manual.
 @item The OpenMP routines @code{omp_target_memcpy_rect} and
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5aad3448a8d..4cedc5390a3 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1201,8 +1201,23 @@ GOMP_OFFLOAD_get_num_devices (unsigned int 
omp_requires_mask)
   if (num_devices > 0
   && ((omp_requires_mask
   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+  | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether host page access (direct or via migration) is supported;
+ if so, enable USM.  Currently, capabilities is per device type, hence,
+ check all devices.  */
+  if (num_devices > 0
+  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+for (int dev = 0; dev < num_devices; dev++)
+  {
+   int pi;
+   CUresult r;
+   r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, ,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS, dev);
+   if (r != CUDA_SUCCESS || pi == 0)
+ return -1;
+  }
   return num_devices;
 }
 
diff --git a/libgomp/target.c b/libgomp/target.c
index 5ec19ae489e..48689920d4a 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -2969,8 +2969,25 @@ gomp_copy_back_icvs (struct gomp_device_descr *devicep, 
int device)
   if (item == NULL)
 return;
 
+  gomp_mutex_lock (>lock);
+
+  struct splay_tree_s *mem_map = >mem_map;
+  struct splay_tree_key_s cur_node;
+  void *dev_ptr = NULL;
+
   void *host_ptr = >icvs;
-  void *dev_ptr = omp_get_mapped_ptr

[patch] libgomp: Enable USM for AMD APUs and MI200 devices

2024-05-29 Thread Tobias Burnus


This patch depends (on the libgomp/target.c parts) of the patch
"[patch] libgomp: Enable USM for some nvptx devices",
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652987.html

AMD GPUs that are either APU devices or MI200 [or MI300X]
(with HSA_XNACK=1 set) can access host memory; the run-time library
returns in that case HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = true.

Thus, it makes sense to enable USM support for those devices, which
this patch does. — A simple test with all unified_shared_memory tests
shipping with sollve_vv now works:*

  Test passed on the device.

as tested on an MI200 series device. In line with (some) other compilers,
it requires that HSA_XNACK=1 is set, otherwise the code will be executed
on the host.

(* Well, for C++, -O2 -fno-exception was used but stillonly 5 test case PASS, 1 delete[] etc. link error 1 ICE (segfault during 
IPA pass: cpin gcn gcc) 1 runtime fail for 
tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp [**] but 
all 15 Fortran and 16 C tests PASS.)


Comments, remarks, suggestions?
Any reason not to commit it to mainline?

Tobias

PS: Richard confirmed that his gfx1036 APU also has
HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT == true; at least when
he disables the discrete gfx1030, which neither supports xnack not
is an APU.

** rocgdb shows:

Thread 4 "a.out" received signal SIGSEGV, Segmentation fault.
[Switching to thread 4, lane 0 (AMDGPU Lane 1:1:1:1/0 (0,0,0)[0,0,0])]
0x77309c30 in main._omp_fn () at 
tests/5.2/unified_shared_mem/test_target_struct_obj_access.cpp:88
88if (Emp.name[i] != RefStr[i]) {

but I have not tried to debug this.
libgomp: Enable USM for AMD APUs and MI200 devices

If HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true,
all GPUs on the system support unified shared memory. That's
the case for APUs and MI200 devices when XNACK is enabled.

XNACK can be enabled by setting HSA_XNACK=1 as env var for
supported devices; otherwise, if disable, USM code will
use host fallback.

gcc/ChangeLog:

	* config/gcn/gcn-hsa.h (gcn_local_sym_hash): Fix typo.

include/ChangeLog:

	* hsa.h (HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT): Add
	enum value.

libgomp/ChangeLog:

	* libgomp.texi (gcn): Update USM handling
	* plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Handle
	USM if HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT is true.

 gcc/config/gcn/gcn-hsa.h|  2 +-
 include/hsa.h   |  4 +++-
 libgomp/libgomp.texi|  9 +++--
 libgomp/plugin/plugin-gcn.c | 18 ++
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/gcc/config/gcn/gcn-hsa.h b/gcc/config/gcn/gcn-hsa.h
index 4611bc55392..03220555075 100644
--- a/gcc/config/gcn/gcn-hsa.h
+++ b/gcc/config/gcn/gcn-hsa.h
@@ -80,7 +80,7 @@ extern unsigned int gcn_local_sym_hash (const char *name);
writes a new AMD GPU object file and the ABI version needs to be the
same. - LLVM <= 17 defaults to 4 while LLVM >= 18 defaults to 5.
GCC supports LLVM >= 13.0.1 and only LLVM >= 14 supports version 5.
-   Note that Fiji is only suppored with LLVM <= 17 as version 3 is no longer
+   Note that Fiji is only supported with LLVM <= 17 as version 3 is no longer
supported in LLVM >= 18.  */
 #define ABI_VERSION_SPEC "march=fiji:--amdhsa-code-object-version=3;" \
 			 "!march=*|march=*:--amdhsa-code-object-version=4"
diff --git a/include/hsa.h b/include/hsa.h
index f9b5d9daf85..3c7be95d7fd 100644
--- a/include/hsa.h
+++ b/include/hsa.h
@@ -466,7 +466,9 @@ typedef enum {
   /**
   * String containing the ROCr build identifier.
   */
-  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200
+  HSA_AMD_SYSTEM_INFO_BUILD_VERSION = 0x200,
+
+  HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT = 0x202
 } hsa_system_info_t;
 
 /**
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 22868635230..e79bd7a3392 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6360,8 +6360,13 @@ The implementation remark:
   such that the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any GCN device from the list of
-  available devices (``host fallback'').
+  @code{unified_shared_memory} is only supported if all AMD GPUs have the
+  @code{HSA_AMD_SYSTEM_INFO_SVM_ACCESSIBLE_BY_DEFAULT} property; for
+  discrete GPUs, this may require setting the @code{HSA_XNACK} environment
+  variable to @samp{1}; for systems with both an APU and a discrete GPU that
+  does not support XNACK, consider using @code{ROCR_VISIBLE_DEVICES} to
+  enable only the APU.  If not supported, all AMD GPU devices are removed
+  from the list of available devices (``host fallback'').
 @item The available stack size can be changed using the @code{GCN_STACK_SIZE}
   environment variable; the default is 32 kiB per thread.
 @item Low-latency memory

Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-29 Thread Tobias Burnus


Jakub Jelinek wrote:

I mean, if we want to add something, maybe better would an -include like
option that instead of including a file includes it directly.
gcc --include-inline '#pragma omp requires unified_shared_memory' ...


Likewise for Fortran, but there the question is whether it should be in 
the use-stmt, import-stmt, implicit-part or declaration-part; I guess 
having one --include-inline-use-stmt and --include-inline-declaration 
would make sense …


And, I guess, multiple flags should be permitted, which can then be 
processed as separate lines.


Tobias

Re: [patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-29 Thread Tobias Burnus


Jakub Jelinek wrote:

How is that option different from
echo '#pragma omp requires unified_shared_memory' > omp-usm.h
gcc -include omp-usm.h
?
I mean with -include you can add anything you want, not just one particular
directive, and adding a separate option for each is just weird.


For C/C++, -include seems to be indeed sufficient (albeit not widely 
known). For Fortran, there at two issues: One placement/semantic issue: 
it has to be added per "compilation unit", i.e. to the specification 
part of a module, subprogram or main program. And a practical issue, 
gfortran shows:


error: command-line option '-include !$omp requires' is valid for 
C/C++/ObjC/ObjC++ but not for Fortran


Thus, for Fortran it is still intrinsically useful – even if one can 
argue whether that feature is needed at all / whether it should be added 
as command-line argument.


Tobias

Re: [patch] libgomp: Enable USM for some nvptx devices

2024-05-29 Thread Tobias Burnus


Tobias Burnus wrote:
While most of the nvptx systems I have access to don't have the 
support for 
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, one 
has:


Actually, CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS is sufficient. And 
I finally also found the proper webpage for this feature; I couldn't 
find it as Nvidia's documentation uses pageableMemoryAccess and not 
CU_... for that feature. The updated patch is attached.


For details: 
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements


In principle, this proper USM is supported by Grace Hopper, PowerPC9 + 
Volta (sm_70) – but for some reasons, our PPC/Volta system does not 
support it. It is also said to work with Turing (sm_75) and newer when 
using Linux Kernel's HMM and the Open Kernel Modules (newer CUDA have 
this but don't use them by default). See link above.


I am not quite sure whether there are unintended side effects, hence, 
I have not enabled support for it in general. In particular, 'declare 
target enter(global_var)' seems to be mishandled (I think it should be 
link + pointer updated to point to the host; cf. description for 
'self_maps'). Thus, it is not enabled by default but only when USM has 
been requested.

OK for mainline?
Comments? Remarks? Suggestions?

Tobias

PS: I guess some more USM tests should be added…
libgomp: Enable USM for some nvptx devices

A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS; for those, unified shared
memory is supported in hardware. This patch enables support for those -
if all installed nvptx devices have this feature (as the capabilities
are per device type).

This exposes a bug in gomp_copy_back_icvs as it did before use
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in cased of shared memory. But in this case,
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regards to '-1' for the
device number as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

	* cuda/cuda.h (CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS): Add.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Update USM description.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Claim support when requesting USM and all devices support 
	CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS.
	* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
	(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the
	devices supports USM.

 include/cuda/cuda.h   |  3 ++-
 libgomp/libgomp.texi  |  7 +--
 libgomp/plugin/plugin-nvptx.c | 16 
 libgomp/target.c  | 24 +++-
 4 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..804d08ca57e 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,7 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
-  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS = 88
 } CUdevice_attribute;
 
 enum {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..ba534b6b3c4 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6435,8 +6435,11 @@ The implementation remark:
   the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any nvptx device from the
-  list of available devices (``host fallback'').
+  @code{unified_shared_memory} will run on nvptx devices if and only if
+  all of those support the @code{pageableMemoryAccess} property;@footnote{
+  @uref{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements}}
+  otherwise, all nvptx device are removed from the list of available
+  devices (``host fallback'').
 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
   in the GCC manual.
 @item The OpenMP routines @code{omp_target_memcpy_rect} and
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5aad3448a8d..d3764185d4b 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1201,8 +1201,24 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (num_devices > 0
   && ((omp_requires_mask
 	   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+	   | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
 	   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether host page access (direct or via migration) is supported;
+ if so, enable USM.  Currently, capa

[patch] libgomp: Enable USM for some nvptx devices

2024-05-28 Thread Tobias Burnus

While most of the nvptx systems I have access to don't have the support 
for CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES, 
one has:


Tesla V100-SXM2-16GB (as installed, e.g., on ORNL's Summit) does support 
this feature. And with that feature, unified-shared memory support does 
work, presumably by handling automatic page migration when a page fault 
occurs.


Hence: Enable USM support for those. When doing so, all 'requires 
unified_shared_memory' tests of sollve_vv pass :-)


I am not quite sure whether there are unintended side effects, hence, I 
have not enabled support for it in general. In particular, 'declare 
target enter(global_var)' seems to be mishandled (I think it should be 
link + pointer updated to point to the host; cf. description for 
'self_maps'). Thus, it is not enabled by default but only when USM has 
been requested.


OK for mainline?
Comments? Remarks? Suggestions?

Tobias

PS: I guess some more USM tests should be added…

libgomp: Enable USM for some nvptx devices

A few high-end nvptx devices support the attribute
CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES;
for those, unified shared memory is supported in hardware. This
patch enables support for those - if all installed nvptx devices
have this feature (as the capabilities are per device type).

This exposes a bug in gomp_copy_back_icvs as it did before use
omp_get_mapped_ptr to find mapped variables, but that returns
the unchanged pointer in cased of shared memory. But in this case,
we have a few actually mapped pointers - like the ICV variables.
Additionally, there was a mismatch with regards to '-1' for the
device number as gomp_copy_back_icvs and omp_get_mapped_ptr count
differently. Hence, do the lookup manually.

include/ChangeLog:

	* cuda/cuda.h
	(CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES):
	Add.

libgomp/ChangeLog:

	* libgomp.texi (nvptx): Update USM description.
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_get_num_devices):
	Claim support when requesting USM and all devices support 
	CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES.
	* target.c (gomp_copy_back_icvs): Fix device ptr lookup.
	(gomp_target_init): Set GOMP_OFFLOAD_CAP_SHARED_MEM is the
	devices supports USM.

 include/cuda/cuda.h   |  3 ++-
 libgomp/libgomp.texi  |  5 -
 libgomp/plugin/plugin-nvptx.c | 15 +++
 libgomp/target.c  | 24 +++-
 4 files changed, 44 insertions(+), 3 deletions(-)

diff --git a/include/cuda/cuda.h b/include/cuda/cuda.h
index 0dca4b3a5c0..db640d20366 100644
--- a/include/cuda/cuda.h
+++ b/include/cuda/cuda.h
@@ -83,7 +83,8 @@ typedef enum {
   CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR = 39,
   CU_DEVICE_ATTRIBUTE_ASYNC_ENGINE_COUNT = 40,
   CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING = 41,
-  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82
+  CU_DEVICE_ATTRIBUTE_MAX_REGISTERS_PER_MULTIPROCESSOR = 82,
+  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES = 100
 } CUdevice_attribute;
 
 enum {
diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 71d62105a20..e0d37f67983 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -6435,7 +6435,10 @@ The implementation remark:
   the next reverse offload region is only executed after the previous
   one returned.
 @item OpenMP code that has a @code{requires} directive with
-  @code{unified_shared_memory} will remove any nvptx device from the
+  @code{unified_shared_memory} will run on nvptx devices if and only if
+  all of those support the
+  @code{CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES}
+  attribute; otherwise, all nvptx device are removed from the
   list of available devices (``host fallback'').
 @item The default per-warp stack size is 128 kiB; see also @code{-msoft-stack}
   in the GCC manual.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 5aad3448a8d..c4b0f5dd4bf 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1201,8 +1201,23 @@ GOMP_OFFLOAD_get_num_devices (unsigned int omp_requires_mask)
   if (num_devices > 0
   && ((omp_requires_mask
 	   & ~(GOMP_REQUIRES_UNIFIED_ADDRESS
+	   | GOMP_REQUIRES_UNIFIED_SHARED_MEMORY
 	   | GOMP_REQUIRES_REVERSE_OFFLOAD)) != 0))
 return -1;
+  /* Check whether automatic page migration is supported; if so, enable USM.
+ Currently, capabilities is per device type, hence, check all devices.  */
+  if (num_devices > 0
+  && (omp_requires_mask & GOMP_REQUIRES_UNIFIED_SHARED_MEMORY))
+for (int dev = 0; dev < num_devices; dev++)
+  {
+	int pi;
+	CUresult r;
+	r = CUDA_CALL_NOCHECK (cuDeviceGetAttribute, ,
+	  CU_DEVICE_ATTRIBUTE_PAGEABLE_MEMORY_ACCESS_USES_HOST_PAGE_TABLES,
+	  dev);
+	if (r != CUDA_SUCCESS || pi == 0)
+	  return -1;
+  }
   return num_devices;
 }
 
diff --git a/libgomp/target.c

[patch] OpenMP: Add -fopenmp-force-usm mode

2024-05-28 Thread Tobias Burnus

-fopenmp-force-usm can be useful for some badly written code. Explicity 
using 'omp requires' makes more sense but still. It might also make 
sense for testing purpose.


Unfortunately, I did not see a simple way of testing it. When trying it 
manually, I looked at the 'a.xamdgcn-amdhsa.c' -save-temps file, where 
gcn_data has the omp_requires_mask as second argument and testing showed 
that an explicit pragma and the -f... argument have the same result.


Alternative would be to move this code later, e.g. to lto-cgraph.cc's 
omp_requires_mask, which might be safer (as it avoids changing as many 
locations). On the other hand, it might require more special cases 
elsewhere.*


Comment, suggestions?

Tobias

*I am especially thinking about a global variable and "#pragma omp 
declare target". At least with 'omp requires self_maps' of OpenMP 6, it 
seems as if 'declare target enter(global_var)' should become 
'link(global_var)' where the global_var pointer is updated to point to 
the host version.


At least I don't see how otherwise the "all corresponding list items 
created by the 'enter' clauses specified by declare target directives in 
the compilation unit share storage with the original list items." could 
be fulfilled.


This will require generating different code for 'self_maps' (and, 
potentially / [RFC] 'unified_shared_memory') than normal code, which 
would be the first compiler code-gen change due to USM (→ 
GOMP_OFFLOAD_CAP_SHARED_MEM) for non-host devices.
OpenMP: Add -fopenmp-force-usm mode

Add an implicit 'omp requires unified_shared_memory' to all files that
use target constructs ("OMP_REQUIRES_TARGET_USED").  As constructed, the
diagnostic "'unified_shared_memory' clause used lexically after first target
construct or offloading API" is not inhibited.

The option has no effect without -fopenmp and does not affect OpenACC code,
matching what the directive would do.  The name of the command-line option
matches Clang's, added in LLVM 18.

gcc/c-family/ChangeLog:

	* c.opt (fopenmp-force-usm): New.
	* c.opt.urls: Regenerated

gcc/c/ChangeLog:

	* c-parser.cc (c_parser_omp_target_data, c_parser_omp_target_update,
	c_parser_omp_target_enter_data, c_parser_omp_target_exit_data,
	c_parser_omp_target): When setting OMP_REQUIRES_TARGET_USED, also
	set OMP_REQUIRES_UNIFIED_SHARED_MEMORY if -fopenmp-force-usm is
	in force.

gcc/cp/ChangeLog:

	* parser.cc (cp_parser_omp_target_data,
	cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data,
	cp_parser_omp_target_update, cp_parser_omp_target): When setting
	OMP_REQUIRES_TARGET_USED, also set OMP_REQUIRES_UNIFIED_SHARED_MEMORY
	if -fopenmp-force-usm is in force.


gcc/ChangeLog:

	* doc/invoke.texi (-fopenmp-force-usm): Document new option.

gcc/fortran/ChangeLog:

	* invoke.texi (-fopenmp-force-usm): Document new option.
	* lang.opt (fopenmp-force-usm): New.
	* lang.opt.urls: Regenerate.
	* parse.cc (gfc_parse_file): When setting
	OMP_REQUIRES_TARGET_USED, also set OMP_REQUIRES_UNIFIED_SHARED_MEMORY
	if -fopenmp-force-usm is in force.

 gcc/c-family/c.opt|  4 
 gcc/c-family/c.opt.urls   |  3 +++
 gcc/c/c-parser.cc | 50 +--
 gcc/cp/parser.cc  | 50 +--
 gcc/doc/invoke.texi   | 11 +--
 gcc/fortran/invoke.texi   |  7 +++
 gcc/fortran/lang.opt  |  4 
 gcc/fortran/lang.opt.urls |  3 +++
 gcc/fortran/parse.cc  | 10 --
 9 files changed, 118 insertions(+), 24 deletions(-)

diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index fb34c3b7031..4985cd61c48 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -2136,6 +2136,10 @@ fopenmp
 C ObjC C++ ObjC++ LTO Var(flag_openmp)
 Enable OpenMP (implies -frecursive in Fortran).
 
+fopenmp-force-usm
+C ObjC C++ ObjC++ Var(flag_openmp_force_usm)
+Behave as if the source file contained OpenMP's 'requires unified_shared_memory'.
+
 fopenmp-simd
 C ObjC C++ ObjC++ Var(flag_openmp_simd)
 Enable OpenMP's SIMD directives.
diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index dd455d7c0dc..34b3a395e84 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -1222,6 +1222,9 @@ UrlSuffix(gcc/C-Dialect-Options.html#index-fopenacc-dim)
 fopenmp
 UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp)
 
+fopenmp-force-usm
+UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp-force-usm) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp-force-usm)
+
 fopenmp-simd
 UrlSuffix(gcc/C-Dialect-Options.html#index-fopenmp-simd) LangUrlSuffix_Fortran(gfortran/Fortran-Dialect-Options.html#index-fopenmp-simd)
 
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 00f8bf4376e..93c9cd1c9d0 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -23849,8 +23849,14 @@ static tree
 c_parser_omp_target_data (location_t loc, c_parser *parser, bool *if_p)
 {
   if

[gcc r15-867] testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

2024-05-28 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:c0d78289fcd9c04110907f8cad90d7e1e5c55a44

commit r15-867-gc0d78289fcd9c04110907f8cad90d7e1e5c55a44
Author: Tobias Burnus 
Date:   Tue May 28 19:52:44 2024 +0200

testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/lastprivate-conditional-1.c: Remove
'{ dg-prune-output "not supported yet" }'.
* c-c++-common/gomp/requires-1.c: Likewise.
* c-c++-common/gomp/requires-2.c: Likewise.
* c-c++-common/gomp/reverse-offload-1.c: Likewise.
* g++.dg/gomp/requires-1.C: Likewise.
* gfortran.dg/gomp/requires-1.f90: Likewise.
* gfortran.dg/gomp/requires-2.f90: Likewise.
* gfortran.dg/gomp/requires-4.f90: Likewise.
* gfortran.dg/gomp/requires-5.f90: Likewise.
* gfortran.dg/gomp/requires-6.f90: Likewise.
* gfortran.dg/gomp/requires-7.f90: Likewise.

Diff:
---
 gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c | 2 --
 gcc/testsuite/c-c++-common/gomp/requires-1.c| 2 --
 gcc/testsuite/c-c++-common/gomp/requires-2.c| 2 --
 gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c | 2 --
 gcc/testsuite/g++.dg/gomp/requires-1.C  | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-1.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-2.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-4.f90   | 1 -
 gcc/testsuite/gfortran.dg/gomp/requires-5.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-6.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-7.f90   | 1 -
 11 files changed, 20 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c 
b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
index 722aba79a52..d4ef49690e8 100644
--- a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
@@ -60,5 +60,3 @@ bar (int *p)
s = u;
   }
 }
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-1.c 
b/gcc/testsuite/c-c++-common/gomp/requires-1.c
index e1f2e3a503f..a47ec659566 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-1.c
@@ -10,5 +10,3 @@ foo ()
 
 #pragma omp requires unified_shared_memory unified_address
 #pragma omp requires atomic_default_mem_order(seq_cst)
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-2.c 
b/gcc/testsuite/c-c++-common/gomp/requires-2.c
index 717b65caeea..d7430b1b1a4 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-2.c
@@ -6,5 +6,3 @@
 #pragma omp requires dynamic_allocators , dynamic_allocators   /* { dg-error 
"too many 'dynamic_allocators' clauses" } */
 #pragma omp requires atomic_default_mem_order(seq_cst) 
atomic_default_mem_order(seq_cst)   /* { dg-error "too many 
'atomic_default_mem_order' clauses" } */
 #pragma omp requires atomic_default_mem_order (seq_cst)/* { dg-error 
"more than one 'atomic_default_mem_order' clause in a single compilation unit" 
} */
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c 
b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
index 9a3fa5230f8..ddc3c2c6be1 100644
--- a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
@@ -6,8 +6,6 @@
 /* { dg-final { scan-tree-dump-times "__attribute__\\(\\(omp declare 
target\\)\\)\[\n\r\]*int called_in_target2" 1 "omplower" } }  */
 /* { dg-final { scan-tree-dump-times "__attribute__\\(\\(omp declare target, 
omp declare target block\\)\\)\[\n\r\]*void tg_fn" 1 "omplower" } }  */
 
-/* { dg-prune-output "'reverse_offload' clause on 'requires' directive not 
supported yet" } */
-
 #pragma omp requires reverse_offload
 
 extern int add_3 (int);
diff --git a/gcc/testsuite/g++.dg/gomp/requires-1.C 
b/gcc/testsuite/g++.dg/gomp/requires-1.C
index aefeb288dad..5ca5e006da1 100644
--- a/gcc/testsuite/g++.dg/gomp/requires-1.C
+++ b/gcc/testsuite/g++.dg/gomp/requires-1.C
@@ -8,5 +8,3 @@ namespace M {
 #pragma omp requires atomic_default_mem_order(seq_cst)
 }
 }
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90 
b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
index b115a654e71..19007834c45 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
@@ -9,5 +9,3 @@ subroutine bar
 !$omp requires unified_shared_memory unified_address
 !$omp requires atomic_de

[Patch] testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

2024-05-28 Thread Tobias Burnus

Improve test coverage by removing 'prune-output' given that the features 
are implemented in the meanwhile.


Comments, suggestions? Otherwise I will commit the patch as obvious.

Tobias
testsuite/*/gomp: Remove 'dg-prune-output "not supported yet"'

gcc/testsuite/ChangeLog:

	* c-c++-common/gomp/lastprivate-conditional-1.c: Remove
	'{ dg-prune-output "not supported yet" }'.
	* c-c++-common/gomp/requires-1.c: Likewise.
	* c-c++-common/gomp/requires-2.c: Likewise.
	* c-c++-common/gomp/reverse-offload-1.c: Likewise.
	* g++.dg/gomp/requires-1.C: Likewise.
	* gfortran.dg/gomp/requires-1.f90: Likewise.
	* gfortran.dg/gomp/requires-2.f90: Likewise.
	* gfortran.dg/gomp/requires-4.f90: Likewise.
	* gfortran.dg/gomp/requires-5.f90: Likewise.
	* gfortran.dg/gomp/requires-6.f90: Likewise.
	* gfortran.dg/gomp/requires-7.f90: Likewise.

 gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c | 2 --
 gcc/testsuite/c-c++-common/gomp/requires-1.c| 2 --
 gcc/testsuite/c-c++-common/gomp/requires-2.c| 2 --
 gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c | 2 --
 gcc/testsuite/g++.dg/gomp/requires-1.C  | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-1.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-2.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-4.f90   | 1 -
 gcc/testsuite/gfortran.dg/gomp/requires-5.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-6.f90   | 2 --
 gcc/testsuite/gfortran.dg/gomp/requires-7.f90   | 1 -
 11 files changed, 20 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
index 722aba79a52..d4ef49690e8 100644
--- a/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/lastprivate-conditional-1.c
@@ -63,2 +62,0 @@ bar (int *p)
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-1.c b/gcc/testsuite/c-c++-common/gomp/requires-1.c
index e1f2e3a503f..a47ec659566 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-1.c
@@ -13,2 +12,0 @@ foo ()
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/requires-2.c b/gcc/testsuite/c-c++-common/gomp/requires-2.c
index 717b65caeea..d7430b1b1a4 100644
--- a/gcc/testsuite/c-c++-common/gomp/requires-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/requires-2.c
@@ -9,2 +8,0 @@
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
index 9a3fa5230f8..ddc3c2c6be1 100644
--- a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
+++ b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c
@@ -9,2 +8,0 @@
-/* { dg-prune-output "'reverse_offload' clause on 'requires' directive not supported yet" } */
-
diff --git a/gcc/testsuite/g++.dg/gomp/requires-1.C b/gcc/testsuite/g++.dg/gomp/requires-1.C
index aefeb288dad..5ca5e006da1 100644
--- a/gcc/testsuite/g++.dg/gomp/requires-1.C
+++ b/gcc/testsuite/g++.dg/gomp/requires-1.C
@@ -11,2 +10,0 @@ namespace M {
-
-/* { dg-prune-output "not supported yet" } */
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
index b115a654e71..19007834c45 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-1.f90
@@ -12,2 +11,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-2.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
index 5f11a7bfb2a..f144d391034 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-2.f90
@@ -13,2 +12,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-4.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
index c870a2840d3..9d936197f8f 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-4.f90
@@ -36 +35,0 @@ end
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-5.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
index e719e929294..87be933ba49 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-5.f90
@@ -15,2 +14,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-6.f90 b/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
index cabd3d94a90..b20c218dd6b 100644
--- a/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/requires-6.f90
@@ -15,2 +14,0 @@ end
-
-! { dg-prune-output "not yet supported" }
diff --git a/gcc/testsuite/gfortran.dg/gomp/requires-7.f90

[gcc r12-10476] Fortran: Fix SHAPE for zero-size arrays

2024-05-28 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:e0b2c4f90f908a9bca4038c7ae0d8ca6ee157d8f

commit r12-10476-ge0b2c4f90f908a9bca4038c7ae0d8ca6ee157d8f
Author: Tobias Burnus 
Date:   Mon May 20 08:34:48 2024 +0200

Fortran: Fix SHAPE for zero-size arrays

PR fortran/115150

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_bound): Fix SHAPE
for zero-size arrays

gcc/testsuite/ChangeLog:

* gfortran.dg/shape_12.f90: New test.

(cherry picked from commit b701306a9b38bd74cdc26c7ece5add22f2203b56)

Diff:
---
 gcc/fortran/trans-intrinsic.cc |  4 ++-
 gcc/testsuite/gfortran.dg/shape_12.f90 | 51 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index c30cdfd37f9..9393ca10b06 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -3083,7 +3083,9 @@ gfc_conv_intrinsic_bound (gfc_se * se, gfc_expr * expr, 
enum gfc_isym_id op)
  lbound, gfc_index_one_node);
}
   else if (op == GFC_ISYM_SHAPE)
-   se->expr = size;
+   se->expr = fold_build2_loc (input_location, MAX_EXPR,
+   gfc_array_index_type, size,
+   gfc_index_zero_node);
   else
gcc_unreachable ();
 
diff --git a/gcc/testsuite/gfortran.dg/shape_12.f90 
b/gcc/testsuite/gfortran.dg/shape_12.f90
new file mode 100644
index 000..e672e1ff9f9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/shape_12.f90
@@ -0,0 +1,51 @@
+! { dg-do run }
+!
+! PR fortran/115150
+!
+! Check that SHAPE handles zero-sized arrays correctly
+!
+implicit none
+call one
+call two
+
+contains
+
+subroutine one
+  real,allocatable :: A(:),B(:,:)
+  allocate(a(3:0), b(5:1, 2:5))
+
+  if (any (shape(a) /= [0])) stop 1
+  if (any (shape(b) /= [0, 4])) stop 2
+  if (size(a) /= 0) stop 3
+  if (size(b) /= 0) stop 4
+  if (any (lbound(a) /= [1])) stop 5
+  if (any (lbound(b) /= [1, 2])) stop 6
+  if (any (ubound(a) /= [0])) stop 5
+  if (any (ubound(b) /= [0,5])) stop 6
+end
+
+subroutine two
+integer :: x1(10), x2(10,10)
+call f(x1, x2, -3)
+end
+
+subroutine f(y1, y2, n)
+  integer, value :: n
+  integer :: y1(1:n)
+  integer :: y2(1:n,4,2:*)
+  call g(y1, y2)
+end
+
+subroutine g(z1, z2)
+  integer :: z1(..), z2(..)
+
+  if (any (shape(z1) /= [0])) stop 1
+  if (any (shape(z2) /= [0, 4, -1])) stop 2
+  if (size(z1) /= 0) stop 3
+  if (size(z2) /= 0) stop 4
+  if (any (lbound(z1) /= [1])) stop 5
+  if (any (lbound(z2) /= [1, 1, 1])) stop 6
+  if (any (ubound(z1) /= [0])) stop 5
+  if (any (ubound(z2) /= [0, 4, -1])) stop 6
+end
+end

[gcc r13-8805] Fortran: Fix SHAPE for zero-size arrays

2024-05-28 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:3185cfe495944e6e5d000ccd820bed2e6f10cd6c

commit r13-8805-g3185cfe495944e6e5d000ccd820bed2e6f10cd6c
Author: Tobias Burnus 
Date:   Mon May 20 08:34:48 2024 +0200

Fortran: Fix SHAPE for zero-size arrays

PR fortran/115150

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_bound): Fix SHAPE
for zero-size arrays

gcc/testsuite/ChangeLog:

* gfortran.dg/shape_12.f90: New test.

(cherry picked from commit b701306a9b38bd74cdc26c7ece5add22f2203b56)

Diff:
---
 gcc/fortran/trans-intrinsic.cc |  4 ++-
 gcc/testsuite/gfortran.dg/shape_12.f90 | 51 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index aa0dea50089..455b61aa564 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -3090,7 +3090,9 @@ gfc_conv_intrinsic_bound (gfc_se * se, gfc_expr * expr, 
enum gfc_isym_id op)
  lbound, gfc_index_one_node);
}
   else if (op == GFC_ISYM_SHAPE)
-   se->expr = size;
+   se->expr = fold_build2_loc (input_location, MAX_EXPR,
+   gfc_array_index_type, size,
+   gfc_index_zero_node);
   else
gcc_unreachable ();
 
diff --git a/gcc/testsuite/gfortran.dg/shape_12.f90 
b/gcc/testsuite/gfortran.dg/shape_12.f90
new file mode 100644
index 000..e672e1ff9f9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/shape_12.f90
@@ -0,0 +1,51 @@
+! { dg-do run }
+!
+! PR fortran/115150
+!
+! Check that SHAPE handles zero-sized arrays correctly
+!
+implicit none
+call one
+call two
+
+contains
+
+subroutine one
+  real,allocatable :: A(:),B(:,:)
+  allocate(a(3:0), b(5:1, 2:5))
+
+  if (any (shape(a) /= [0])) stop 1
+  if (any (shape(b) /= [0, 4])) stop 2
+  if (size(a) /= 0) stop 3
+  if (size(b) /= 0) stop 4
+  if (any (lbound(a) /= [1])) stop 5
+  if (any (lbound(b) /= [1, 2])) stop 6
+  if (any (ubound(a) /= [0])) stop 5
+  if (any (ubound(b) /= [0,5])) stop 6
+end
+
+subroutine two
+integer :: x1(10), x2(10,10)
+call f(x1, x2, -3)
+end
+
+subroutine f(y1, y2, n)
+  integer, value :: n
+  integer :: y1(1:n)
+  integer :: y2(1:n,4,2:*)
+  call g(y1, y2)
+end
+
+subroutine g(z1, z2)
+  integer :: z1(..), z2(..)
+
+  if (any (shape(z1) /= [0])) stop 1
+  if (any (shape(z2) /= [0, 4, -1])) stop 2
+  if (size(z1) /= 0) stop 3
+  if (size(z2) /= 0) stop 4
+  if (any (lbound(z1) /= [1])) stop 5
+  if (any (lbound(z2) /= [1, 1, 1])) stop 6
+  if (any (ubound(z1) /= [0])) stop 5
+  if (any (ubound(z2) /= [0, 4, -1])) stop 6
+end
+end

gcc-wwwdocs branch master updated. 30f0c75e77a10942590037b749a64db74b0c8480

2024-05-28 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  30f0c75e77a10942590037b749a64db74b0c8480 (commit)
  from  582d3e94dbcdf2aa63134532dc66b01d651d7a1d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 30f0c75e77a10942590037b749a64db74b0c8480
Author: Tobias Burnus 
Date:   Tue May 28 15:20:50 2024 +0200

gcc-15/changes.html: Fortran - mention F2023 logical-kind additions

diff --git a/htdocs/gcc-15/changes.html b/htdocs/gcc-15/changes.html
index a89a7f2b..b59fd3be 100644
--- a/htdocs/gcc-15/changes.html
+++ b/htdocs/gcc-15/changes.html
@@ -49,7 +49,13 @@ a work-in-progress.
 
 
 
-
+Fortran
+
+
+  Fortran 2023: The selected_logical_kind intrinsic function
+  and, in the ISO_FORTRAN_ENV module, the named constants
+  logical{8,16,32,64} and real16 were added.
+
 
 
 

---

Summary of changes:
 htdocs/gcc-15/changes.html | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)


hooks/post-receive
-- 
gcc-wwwdocs

[gcc r14-10251] Fortran: Fix SHAPE for zero-size arrays

2024-05-28 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:dbeb3d127da07963ecaa26680da62a255199e9c2

commit r14-10251-gdbeb3d127da07963ecaa26680da62a255199e9c2
Author: Tobias Burnus 
Date:   Mon May 20 08:34:48 2024 +0200

Fortran: Fix SHAPE for zero-size arrays

PR fortran/115150

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_bound): Fix SHAPE
for zero-size arrays

gcc/testsuite/ChangeLog:

* gfortran.dg/shape_12.f90: New test.

(cherry picked from commit b701306a9b38bd74cdc26c7ece5add22f2203b56)

Diff:
---
 gcc/fortran/trans-intrinsic.cc |  4 ++-
 gcc/testsuite/gfortran.dg/shape_12.f90 | 51 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 4e26af21b46..7cb7c2e6949 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -3090,7 +3090,9 @@ gfc_conv_intrinsic_bound (gfc_se * se, gfc_expr * expr, 
enum gfc_isym_id op)
  lbound, gfc_index_one_node);
}
   else if (op == GFC_ISYM_SHAPE)
-   se->expr = size;
+   se->expr = fold_build2_loc (input_location, MAX_EXPR,
+   gfc_array_index_type, size,
+   gfc_index_zero_node);
   else
gcc_unreachable ();
 
diff --git a/gcc/testsuite/gfortran.dg/shape_12.f90 
b/gcc/testsuite/gfortran.dg/shape_12.f90
new file mode 100644
index 000..e672e1ff9f9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/shape_12.f90
@@ -0,0 +1,51 @@
+! { dg-do run }
+!
+! PR fortran/115150
+!
+! Check that SHAPE handles zero-sized arrays correctly
+!
+implicit none
+call one
+call two
+
+contains
+
+subroutine one
+  real,allocatable :: A(:),B(:,:)
+  allocate(a(3:0), b(5:1, 2:5))
+
+  if (any (shape(a) /= [0])) stop 1
+  if (any (shape(b) /= [0, 4])) stop 2
+  if (size(a) /= 0) stop 3
+  if (size(b) /= 0) stop 4
+  if (any (lbound(a) /= [1])) stop 5
+  if (any (lbound(b) /= [1, 2])) stop 6
+  if (any (ubound(a) /= [0])) stop 5
+  if (any (ubound(b) /= [0,5])) stop 6
+end
+
+subroutine two
+integer :: x1(10), x2(10,10)
+call f(x1, x2, -3)
+end
+
+subroutine f(y1, y2, n)
+  integer, value :: n
+  integer :: y1(1:n)
+  integer :: y2(1:n,4,2:*)
+  call g(y1, y2)
+end
+
+subroutine g(z1, z2)
+  integer :: z1(..), z2(..)
+
+  if (any (shape(z1) /= [0])) stop 1
+  if (any (shape(z2) /= [0, 4, -1])) stop 2
+  if (size(z1) /= 0) stop 3
+  if (size(z2) /= 0) stop 4
+  if (any (lbound(z1) /= [1])) stop 5
+  if (any (lbound(z2) /= [1, 1, 1])) stop 6
+  if (any (ubound(z1) /= [0])) stop 5
+  if (any (ubound(z2) /= [0, 4, -1])) stop 6
+end
+end

[wwwdocs][patch] gcc-15/changes.html: Fortran - mention F2023 logical-kind additions

2024-05-28 Thread Tobias Burnus

Let's make https://gcc.gnu.org/gcc-15/changes.html a bit more useful … 
While there were several useful Fortran commits already, only one seems 
to be about a new feature.


Thus, document selected_logical_kind and the ISO_FORTRAN_ENV additions.

Comments or suggestions before I commit it?

Tobias
Title: GCC 15 Release Series — Changes, New Features, and Fixes








GCC 15 Release SeriesChanges, New Features, and Fixes


This page is a "brief" summary of some of the huge number of improvements
in GCC 15.



Note: GCC 15 has not been released yet, so this document is
a work-in-progress.


Caveats

  ...




General Improvements


New Languages and Language specific improvements










Fortran


  Fortran 2023: The selected_logical_kind intrinsic function
  and, in the ISO_FORTRAN_ENV module, the named constants
  logical{8,16,32,64} and real16 were added.








New Targets and Target Specific Improvements








































Operating Systems



























Other significant improvements

Re: [PATCH 6/7] OpenMP: Fortran front-end support for dispatch + adjust_args

2024-05-28 Thread Tobias Burnus


Hi PA, hi all,

two remarks while quickly browsing the code:

Paul-Antoine Arras:

+ if (n->sym->ts.type != BT_DERIVED
+ || !n->sym->ts.u.derived->ts.is_iso_c)
+   {
+ gfc_error ("argument list item %qs in "
+"% at %L must be of "
+"TYPE(C_PTR)",
+n->sym->name, >where);


I think you need to rule out 'c_funptr' as well, e.g. via:

|| (n->sym->ts.u.derived->intmod_sym_id
!= ISOCBINDING_PTR)))

I do note that in openmp.cc, we have one check which checks explicitly 
for c_ptr and one existing one which only checks for (c_ptr or 
c_funptr); can you fix that one as well?


* * *

But I mainly miss an update to 'module.cc' for the 'declare variant' 
change; the 'adjust_args' (for 'need_device_ptr', only) list items have

to be saved in the .mod file - otherwise the following will not work:

-aux.f90
! { dg-do compile { target skip-all-targets } }
module my_mod
  ...
  !$omp declare variant ... adjust_args(need_device_ptr: ...)
  ...
end module

.f90
{ dg-do ...
! { dg-additional-sources -aux.f90 }
  ...
  call 
  ...
  !$omp displatch
   call 
end


For C++ modules, it should be fine as those for those, the tree is dumped.

Tobias

Re: [Patch] Fortran: invoke.texi - link to OpenCoarrays.org + mention libcaf_single

2024-05-21 Thread Tobias Burnus


Hi Bernhard,

rep.dot@gmail.com wrote:

library such as @url{http://opencoarrays.org} needs to be linked.

Maybe use https?


Works, but as the certificate is not valid, it requires to ignore the 
errors in a browser, which is a worse user experience.


The error is, e.g.,

"curl: (60) SSL certificate problem: self-signed certificate"

Or at 
https://www.ssllabs.com/ssltest/analyze.html?d=www.opencoarrays.org=on


"Common names: invalid-sni.invalid / Issuer: invalid-sni.invalid  
(Self-signed)"


@Damian: Can you fix the server to actually have a valid certificate?

Tobias

[gcc r15-749] contrib/gcc-changelog/git_update_version.py: Improve diagnostic

2024-05-21 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:9596f6567ce6fdf94227b97ac28d3549f421ef73

commit r15-749-g9596f6567ce6fdf94227b97ac28d3549f421ef73
Author: Tobias Burnus 
Date:   Tue May 21 10:13:13 2024 +0200

contrib/gcc-changelog/git_update_version.py: Improve diagnostic

contrib/ChangeLog:

* gcc-changelog/git_update_version.py: Add '-i'/'--ignore' argument
to add to-be-ignored commits via the command line.
(ignored_commits): Rename from IGNORED_COMMITS and change
type from tuple to set.
(prepend_to_changelog_files): Show git hash if errors occurred.
(update_current_branch): Mark argument as optional by defaulting
to None.

Diff:
---
 contrib/gcc-changelog/git_update_version.py | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/contrib/gcc-changelog/git_update_version.py 
b/contrib/gcc-changelog/git_update_version.py
index 24f6c43d0b2..c69a3a6897a 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -22,6 +22,7 @@ import argparse
 import datetime
 import logging
 import os
+import re
 
 from git import Repo
 
@@ -30,7 +31,7 @@ from git_repository import parse_git_revisions
 current_timestamp = datetime.datetime.now().strftime('%Y%m%d\n')
 
 # Skip the following commits, they cannot be correctly processed
-IGNORED_COMMITS = (
+ignored_commits = {
 'c2be82058fb40f3ae891c68d185ff53e07f14f45',
 '04a040d907a83af54e0a98bdba5bfabc0ef4f700',
 '2e96b5f14e4025691b57d2301d71aa6092ed44bc',
@@ -41,7 +42,7 @@ IGNORED_COMMITS = (
 '040e5b0edbca861196d9e2ea2af5e805769c8d5d',
 '8057f9aa1f7e70490064de796d7a8d42d446caf8',
 '109f1b28fc94c93096506e3df0c25e331cef19d0',
-'39f81924d88e3cc197fc3df74204c9b5e01e12f7')
+'39f81924d88e3cc197fc3df74204c9b5e01e12f7'}
 
 FORMAT = '%(asctime)s:%(levelname)s:%(name)s:%(message)s'
 logging.basicConfig(level=logging.INFO, format=FORMAT,
@@ -58,6 +59,7 @@ def read_timestamp(path):
 
 def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 if not git_commit.success:
+logging.info(f"While processing {git_commit.info.hexsha}:")
 for error in git_commit.errors:
 logging.info(error)
 raise AssertionError()
@@ -93,13 +95,15 @@ parser.add_argument('-d', '--dry-mode',
  ' is expected')
 parser.add_argument('-c', '--current', action='store_true',
 help='Modify current branch (--push argument is ignored)')
+parser.add_argument('-i', '--ignore', action='append',
+help='list of commits to ignore')
 args = parser.parse_args()
 
 repo = Repo(args.git_path)
 origin = repo.remotes['origin']
 
 
-def update_current_branch(ref_name):
+def update_current_branch(ref_name=None):
 commit = repo.head.commit
 commit_count = 1
 while commit:
@@ -123,7 +127,7 @@ def update_current_branch(ref_name):
 head = head.parents[1]
 commits = parse_git_revisions(args.git_path, '%s..%s'
   % (commit.hexsha, head.hexsha), ref_name)
-commits = [c for c in commits if c.info.hexsha not in IGNORED_COMMITS]
+commits = [c for c in commits if c.info.hexsha not in ignored_commits]
 for git_commit in reversed(commits):
 prepend_to_changelog_files(repo, args.git_path, git_commit,
not args.dry_mode)
@@ -153,6 +157,9 @@ def update_current_branch(ref_name):
 else:
 logging.info('DATESTAMP unchanged')
 
+if args.ignore is not None:
+for item in args.ignore:
+ignored_commits.update(set(i for i in re.split(r'\s*,\s*|\s+', item)))
 
 if args.current:
 logging.info('=== Working on the current branch ===')

Re: [Patch] contrib/gcc-changelog/git_update_version.py: Improve diagnostic

2024-05-21 Thread Tobias Burnus


Hi Jakub,

Jakub Jelinek wrote:

On Mon, May 20, 2024 at 08:31:02AM +0200, Tobias Burnus wrote:

Hmm, there were now two daily bumps: [...] I really wonder why.

Because I've done it by hand.


Okay, that explains it.

I still do not understand why it slipped through at the first place; I 
tried old versions down to r12-709-g772e5e82e3114f and it still FAIL for 
the invalid commit ("ERR: cannot find a ChangeLog location in message").


Thus, I wonder whether the commit hook is active at all?!?


I have in ~gccadmin a gcc-changelog copy and adjusted update_version_git
script which doesn't use contrib/gcc-changelog subdirectory from the
checkout it makes but from the ~gccadmin directory,

[...]

I'm already using something similar in
my hack (just was doing it for even successful commits, but I think your
patch is better).
And, I think best would be if update_version_git script simply
accepted a list of ignored commits from the command line too,
passed it to the git_update_version.py script and that one
added those to IGNORED_COMMITS.


Updated version:

* Uses my diagnostic

* Adds an -i/--ignore argument for commits. Permits to use '-i hash1  -i 
hash2' but also '-i hash1,hash2' or '-i "hash1 hash2'


* I changed the global variable to lower case as Python's style guide 
states that all uppercase variables is for constants.


* The '=None' matches one of the current usages (no argument passed); 
hence, it is now explicit and 'pylint' is happy.


OK for mainline?

Tobias

PS: I have not updated the hashes. If needed/wanted, I leave that to 
you, Jakub.
contrib/gcc-changelog/git_update_version.py: Improve diagnostic

contrib/ChangeLog:

	* gcc-changelog/git_update_version.py: Add '-i'/'--ignore' argument
	to add to-be-ignored commits via the command line.
	(ignored_commits): Rename from IGNORED_COMMITS and change
	type from tuple to set.
	(prepend_to_changelog_files): Show git hash if errors occurred.
	(update_current_branch): Mark argument as optional by defaulting
	to None.

 contrib/gcc-changelog/git_update_version.py | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index 24f6c43d0b2..c69a3a6897a 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -22,6 +22,7 @@ import argparse
 import datetime
 import logging
 import os
+import re
 
 from git import Repo
 
@@ -30,7 +31,7 @@ from git_repository import parse_git_revisions
 current_timestamp = datetime.datetime.now().strftime('%Y%m%d\n')
 
 # Skip the following commits, they cannot be correctly processed
-IGNORED_COMMITS = (
+ignored_commits = {
 'c2be82058fb40f3ae891c68d185ff53e07f14f45',
 '04a040d907a83af54e0a98bdba5bfabc0ef4f700',
 '2e96b5f14e4025691b57d2301d71aa6092ed44bc',
@@ -41,7 +42,7 @@ IGNORED_COMMITS = (
 '040e5b0edbca861196d9e2ea2af5e805769c8d5d',
 '8057f9aa1f7e70490064de796d7a8d42d446caf8',
 '109f1b28fc94c93096506e3df0c25e331cef19d0',
-'39f81924d88e3cc197fc3df74204c9b5e01e12f7')
+'39f81924d88e3cc197fc3df74204c9b5e01e12f7'}
 
 FORMAT = '%(asctime)s:%(levelname)s:%(name)s:%(message)s'
 logging.basicConfig(level=logging.INFO, format=FORMAT,
@@ -58,6 +59,7 @@ def read_timestamp(path):
 
 def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 if not git_commit.success:
+logging.info(f"While processing {git_commit.info.hexsha}:")
 for error in git_commit.errors:
 logging.info(error)
 raise AssertionError()
@@ -93,13 +95,15 @@ parser.add_argument('-d', '--dry-mode',
  ' is expected')
 parser.add_argument('-c', '--current', action='store_true',
 help='Modify current branch (--push argument is ignored)')
+parser.add_argument('-i', '--ignore', action='append',
+help='list of commits to ignore')
 args = parser.parse_args()
 
 repo = Repo(args.git_path)
 origin = repo.remotes['origin']
 
 
-def update_current_branch(ref_name):
+def update_current_branch(ref_name=None):
 commit = repo.head.commit
 commit_count = 1
 while commit:
@@ -123,7 +127,7 @@ def update_current_branch(ref_name):
 head = head.parents[1]
 commits = parse_git_revisions(args.git_path, '%s..%s'
   % (commit.hexsha, head.hexsha), ref_name)
-commits = [c for c in commits if c.info.hexsha not in IGNORED_COMMITS]
+commits = [c for c in commits if c.info.hexsha not in ignored_commits]
 for git_commit in reversed(commits):
 prepend_to_changelog_files(repo, args.git_path, git_commit,
not args.dry_mode)
@@ -153,6 +157,9 @@ def update_current_branch(ref_name):
 else:
 logging.info('DATESTAMP unchanged')
 
+if args.ignore is not None:
+

[gcc r15-658] Fortran: Fix SHAPE for zero-size arrays

2024-05-20 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:b701306a9b38bd74cdc26c7ece5add22f2203b56

commit r15-658-gb701306a9b38bd74cdc26c7ece5add22f2203b56
Author: Tobias Burnus 
Date:   Mon May 20 08:34:48 2024 +0200

Fortran: Fix SHAPE for zero-size arrays

PR fortran/115150

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_bound): Fix SHAPE
for zero-size arrays

gcc/testsuite/ChangeLog:

* gfortran.dg/shape_12.f90: New test.

Diff:
---
 gcc/fortran/trans-intrinsic.cc |  4 ++-
 gcc/testsuite/gfortran.dg/shape_12.f90 | 51 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 80dc3426ab04..912c1000e186 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -3090,7 +3090,9 @@ gfc_conv_intrinsic_bound (gfc_se * se, gfc_expr * expr, 
enum gfc_isym_id op)
  lbound, gfc_index_one_node);
}
   else if (op == GFC_ISYM_SHAPE)
-   se->expr = size;
+   se->expr = fold_build2_loc (input_location, MAX_EXPR,
+   gfc_array_index_type, size,
+   gfc_index_zero_node);
   else
gcc_unreachable ();
 
diff --git a/gcc/testsuite/gfortran.dg/shape_12.f90 
b/gcc/testsuite/gfortran.dg/shape_12.f90
new file mode 100644
index ..e672e1ff9f95
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/shape_12.f90
@@ -0,0 +1,51 @@
+! { dg-do run }
+!
+! PR fortran/115150
+!
+! Check that SHAPE handles zero-sized arrays correctly
+!
+implicit none
+call one
+call two
+
+contains
+
+subroutine one
+  real,allocatable :: A(:),B(:,:)
+  allocate(a(3:0), b(5:1, 2:5))
+
+  if (any (shape(a) /= [0])) stop 1
+  if (any (shape(b) /= [0, 4])) stop 2
+  if (size(a) /= 0) stop 3
+  if (size(b) /= 0) stop 4
+  if (any (lbound(a) /= [1])) stop 5
+  if (any (lbound(b) /= [1, 2])) stop 6
+  if (any (ubound(a) /= [0])) stop 5
+  if (any (ubound(b) /= [0,5])) stop 6
+end
+
+subroutine two
+integer :: x1(10), x2(10,10)
+call f(x1, x2, -3)
+end
+
+subroutine f(y1, y2, n)
+  integer, value :: n
+  integer :: y1(1:n)
+  integer :: y2(1:n,4,2:*)
+  call g(y1, y2)
+end
+
+subroutine g(z1, z2)
+  integer :: z1(..), z2(..)
+
+  if (any (shape(z1) /= [0])) stop 1
+  if (any (shape(z2) /= [0, 4, -1])) stop 2
+  if (size(z1) /= 0) stop 3
+  if (size(z2) /= 0) stop 4
+  if (any (lbound(z1) /= [1])) stop 5
+  if (any (lbound(z2) /= [1, 1, 1])) stop 6
+  if (any (ubound(z1) /= [0])) stop 5
+  if (any (ubound(z2) /= [0, 4, -1])) stop 6
+end
+end

[gcc r15-657] Fortran: invoke.texi - link to OpenCoarrays.org + mention libcaf_single

2024-05-20 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:544d5dcc9150c0ea278fba79ea515f5a87732ce7

commit r15-657-g544d5dcc9150c0ea278fba79ea515f5a87732ce7
Author: Tobias Burnus 
Date:   Mon May 20 08:33:31 2024 +0200

Fortran: invoke.texi - link to OpenCoarrays.org + mention libcaf_single

gcc/fortran/ChangeLog:

* invoke.texi (fcoarray): Link to OpenCoarrays.org;
mention libcaf_single.

Diff:
---
 gcc/fortran/invoke.texi | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 40e8e4a7cdde..6bc42afe2c4f 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1753,7 +1753,10 @@ Single-image mode, i.e. @code{num_images()} is always 
one.
 
 @item @samp{lib}
 Library-based coarray parallelization; a suitable GNU Fortran coarray
-library needs to be linked.
+library such as @url{http://opencoarrays.org} needs to be linked.
+Alternatively, GCC's @code{libcaf_single} library can be linked,
+albeit it only supports a single image.
+
 @end table

[Patch] contrib/gcc-changelog/git_update_version.py: Improve diagnostic (was: [Patch] contrib/gcc-changelog/git_update_version.py: Add ignore commit, improve diagnostic)

2024-05-20 Thread Tobias Burnus

Hmm, there were now two daily bumps:

Date:   Mon May 20 00:16:30 2024 +

Date:   Sun May 19 18:15:28 2024 +

I really wonder why.

I guess, the 'ignore commit' is hence not needed – but I think the 
improved diagnostic part still makes sense.

See updated patch.

On May 19, 24 Tobias Burnus wrote:

I noticed that the last bump happened on Thursday.

* * *

The error is according to
https://gcc.gnu.org/pipermail/gccadmin/2024q2/021298.html

2024-05-19 00:17:28,643:INFO:root:cannot find a ChangeLog location in 
message

That's the commit
---
    Revert "Revert: "Enable prange support.""

    This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and 
enables prange

    support again.
---

* * * The attached patch adds this commit to the ignore list and helps 
with the diagnosis by showing the failing hash in the error message.

OK for mainline?

Post commit: Can someone install the new version + fix the ChangeLog 
for the ignored commit?

* * *

What I do not understand: Why does this commit get applied? I do see 
for both

contrib/gcc-changelog/git_check_commit.py -v -p 
da73261ce7731be7f2b164f1db796878cdc23365

and

contrib/gcc-changelog/git_email.py 
0001-Revert-Revert-Enable-prange-support.patch the error above. - And 
I do not understand why it made it past the commit check but now fails?

Likewise for8057f9aa1f7e70490064de796d7a8d42d446caf8

Does the commit hook use an older version of the check scripts? Does 
it ignore the errors? Or what goes wrong here? Any idea?

TobiasFrom f56b1764f2b5c2c83c6852607405e5be0a763a2c Mon Sep 17 00:00:00 2001
From: Tobias Burnus 
Date: Sun, 19 May 2024 08:17:42 +0200
Subject: [PATCH] contrib/gcc-changelog/git_update_version.py: Improve diagnostic

contrib/ChangeLog:

* gcc-changelog/git_update_version.py (prepend_to_changelog_files): Output
	git hash in case errors occurred.

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index 24f6c43d0b2..ec0151b83fe 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -58,6 +58,7 @@ def read_timestamp(path):

 def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 if not git_commit.success:
+logging.info(f"While processing {git_commit.info.hexsha}:")
 for error in git_commit.errors:
 logging.info(error)
 raise AssertionError()
-- 
2.45.0

[Patch] Fortran: Fix SHAPE for zero-size arrays

2024-05-19 Thread Tobias Burnus

That is for https://gcc.gnu.org/PR115150 – a GCC 12/13/14/15 regression, 
caused when switching from a libgomp call to inline code and missing the 
corner case of zero-size arrays ...


OK for mainline + all affected branches?

Tobias
Fortran: Fix SHAPE for zero-size arrays

	PR fortran/115150

gcc/fortran/ChangeLog:

	* trans-intrinsic.cc (gfc_conv_intrinsic_bound): Fix SHAPE
	for zero-size arrays

gcc/testsuite/ChangeLog:

	* gfortran.dg/shape_12.f90: New test.

 gcc/fortran/trans-intrinsic.cc |  4 ++-
 gcc/testsuite/gfortran.dg/shape_12.f90 | 51 ++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 80dc3426ab0..912c1000e18 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -3090,7 +3090,9 @@ gfc_conv_intrinsic_bound (gfc_se * se, gfc_expr * expr, enum gfc_isym_id op)
   lbound, gfc_index_one_node);
 	}
   else if (op == GFC_ISYM_SHAPE)
-	se->expr = size;
+	se->expr = fold_build2_loc (input_location, MAX_EXPR,
+gfc_array_index_type, size,
+gfc_index_zero_node);
   else
 	gcc_unreachable ();
 
diff --git a/gcc/testsuite/gfortran.dg/shape_12.f90 b/gcc/testsuite/gfortran.dg/shape_12.f90
new file mode 100644
index 000..e672e1ff9f9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/shape_12.f90
@@ -0,0 +1,51 @@
+! { dg-do run }
+!
+! PR fortran/115150
+!
+! Check that SHAPE handles zero-sized arrays correctly
+!
+implicit none
+call one
+call two
+
+contains
+
+subroutine one
+  real,allocatable :: A(:),B(:,:)
+  allocate(a(3:0), b(5:1, 2:5))
+
+  if (any (shape(a) /= [0])) stop 1
+  if (any (shape(b) /= [0, 4])) stop 2
+  if (size(a) /= 0) stop 3
+  if (size(b) /= 0) stop 4
+  if (any (lbound(a) /= [1])) stop 5
+  if (any (lbound(b) /= [1, 2])) stop 6
+  if (any (ubound(a) /= [0])) stop 5
+  if (any (ubound(b) /= [0,5])) stop 6
+end
+
+subroutine two
+integer :: x1(10), x2(10,10)
+call f(x1, x2, -3)
+end
+
+subroutine f(y1, y2, n)
+  integer, value :: n
+  integer :: y1(1:n)
+  integer :: y2(1:n,4,2:*)
+  call g(y1, y2)
+end
+
+subroutine g(z1, z2)
+  integer :: z1(..), z2(..)
+
+  if (any (shape(z1) /= [0])) stop 1
+  if (any (shape(z2) /= [0, 4, -1])) stop 2
+  if (size(z1) /= 0) stop 3
+  if (size(z2) /= 0) stop 4
+  if (any (lbound(z1) /= [1])) stop 5
+  if (any (lbound(z2) /= [1, 1, 1])) stop 6
+  if (any (ubound(z1) /= [0])) stop 5
+  if (any (ubound(z2) /= [0, 4, -1])) stop 6
+end
+end

[Patch] Fortran: invoke.texi - link to OpenCoarrays.org + mention libcaf_single

2024-05-19 Thread Tobias Burnus

I noticed that gfortran's coarray support did not link to the 
http://www.opencoarrays.org/


As that library is needed to support parallelization, it makes sense to 
have the link.


Motivated by someone claiming at ISC-HPC that GCC only supports a single 
image.


And also motivated by Damian's presentation, which showed that 
gfortran's coarrays could successfully run the ICAR atmospheric model 
with 25,600 processes (OpenCoarrays with OpenSHMEM backend), which 
definitely is more than one image :-)


I think mentioning the existing libcaf_single is still useful, even 
though it is only of limited use (except that it does ship with GCC and 
permits to do some testings. Especially, it is used by GCC's testsuite).


OK for mainline?

Tobias
Fortran: invoke.texi - link to OpenCoarrays.org + mention libcaf_single

gcc/fortran/ChangeLog:

	* invoke.texi (fcoarray): Link to OpenCoarrays.org;
	mention libcaf_single.

 gcc/fortran/invoke.texi | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 40e8e4a7cdd..78a2910b8d8 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1753,7 +1753,10 @@ Single-image mode, i.e. @code{num_images()} is always one.
 
 @item @samp{lib}
 Library-based coarray parallelization; a suitable GNU Fortran coarray
-library needs to be linked.
+library needs to be linked such as @url{http://opencoarrays.org}.
+Alternatively, GCC's @code{libcaf_single} library can be linked,
+albeit it only supports a single image.
+
 @end table

[Patch] contrib/gcc-changelog/git_update_version.py: Add ignore commit, improve diagnostic

2024-05-19 Thread Tobias Burnus


I noticed that the last bump happened on Thursday.

* * *

The error is according to
https://gcc.gnu.org/pipermail/gccadmin/2024q2/021298.html

2024-05-19 00:17:28,643:INFO:root:cannot find a ChangeLog location in message

That's the commit
---
Revert "Revert: "Enable prange support.""

This reverts commit d7bb8eaade3cd3aa70715c8567b4d7b08098e699 and enables 
prange
support again.
---

* * * The attached patch adds this commit to the ignore list and helps 
with the diagnosis by showing the failing hash in the error message. OK 
for mainline? Post commit: Can someone install the new version + fix the 
ChangeLog for the ignored commit? * * * What I do not understand: Why does this commit get applied? I do see for both
contrib/gcc-changelog/git_check_commit.py -v -p 
da73261ce7731be7f2b164f1db796878cdc23365 and 
contrib/gcc-changelog/git_email.py 
0001-Revert-Revert-Enable-prange-support.patch the error above. - And I 
do not understand why it made it past the commit check but now fails?

Likewise for8057f9aa1f7e70490064de796d7a8d42d446caf8
Does the commit hook use an older version of the check scripts? Does it 
ignore the errors? Or what goes wrong here? Any idea? Tobias
From f56b1764f2b5c2c83c6852607405e5be0a763a2c Mon Sep 17 00:00:00 2001
From: Tobias Burnus 
Date: Sun, 19 May 2024 08:17:42 +0200
Subject: [PATCH] contrib/gcc-changelog/git_update_version.py: Add ignore
 commit, improve diagnostic

contrib/ChangeLog:

* gcc-changelog/git_update_version.py (IGNORED_COMMITS): Add
	cfceb070e2aea3cef9bd1f50d8d030c51449f45b.
	(prepend_to_changelog_files): Output git hash in case of error.

diff --git a/contrib/gcc-changelog/git_update_version.py b/contrib/gcc-changelog/git_update_version.py
index 24f6c43d0b2..ec0151b83fe 100755
--- a/contrib/gcc-changelog/git_update_version.py
+++ b/contrib/gcc-changelog/git_update_version.py
@@ -41,7 +41,8 @@ IGNORED_COMMITS = (
 '040e5b0edbca861196d9e2ea2af5e805769c8d5d',
 '8057f9aa1f7e70490064de796d7a8d42d446caf8',
 '109f1b28fc94c93096506e3df0c25e331cef19d0',
-'39f81924d88e3cc197fc3df74204c9b5e01e12f7')
+'39f81924d88e3cc197fc3df74204c9b5e01e12f7',
+'da73261ce7731be7f2b164f1db796878cdc23365')
 
 FORMAT = '%(asctime)s:%(levelname)s:%(name)s:%(message)s'
 logging.basicConfig(level=logging.INFO, format=FORMAT,
@@ -58,6 +59,7 @@ def read_timestamp(path):
 
 def prepend_to_changelog_files(repo, folder, git_commit, add_to_git):
 if not git_commit.success:
+logging.info(f"While processing {git_commit.info.hexsha}:")
 for error in git_commit.errors:
 logging.info(error)
 raise AssertionError()
-- 
2.45.0

[wwwdocs,committed] projects/gomp: Update doc links for GCC 14

2024-05-14 Thread Tobias Burnus


Minor update – to include GCC 14 and update mainline to 15.

I also replaced the doc links to the latest release; shouldn't matter 
for the status but it is nicer nonetheless.


Tobias
commit 6d76756d2070040c35e7991a626805a736edea1d
Author: Tobias Burnus 
Date:   Tue May 14 09:34:47 2024 +0200

projects/gomp: Update doc links for GCC 14

And link to latest GCC 12 + 13 release version

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 05b81f1e..94bda5ff 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -144,10 +144,12 @@ filing a bug report.
 
 Implementation status in libgomp manual:
 https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Implementation-Status.html;
->Mainline (GCC 14),
-https://gcc.gnu.org/onlinedocs/gcc-13.1.0/libgomp/OpenMP-Implementation-Status.html;
+>Mainline (GCC 15),
+https://gcc.gnu.org/onlinedocs/gcc-14.1.0/libgomp/OpenMP-Implementation-Status.html;
+>GCC 14,
+https://gcc.gnu.org/onlinedocs/gcc-13.2.0/libgomp/OpenMP-Implementation-Status.html;
 >GCC 13,
-https://gcc.gnu.org/onlinedocs/gcc-12.1.0/libgomp/OpenMP-Implementation-Status.html;
+https://gcc.gnu.org/onlinedocs/gcc-12.3.0/libgomp/OpenMP-Implementation-Status.html;
 >GCC 12.
 
 Disclaimer: A feature might be only fully supported in a later GCC version

gcc-wwwdocs branch master updated. 6d76756d2070040c35e7991a626805a736edea1d

2024-05-14 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  6d76756d2070040c35e7991a626805a736edea1d (commit)
  from  de51d0fe7b7f29ce6037224f33a3d82281aac88e (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 6d76756d2070040c35e7991a626805a736edea1d
Author: Tobias Burnus 
Date:   Tue May 14 09:34:47 2024 +0200

projects/gomp: Update doc links for GCC 14

And link to latest GCC 12 + 13 release version

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index 05b81f1e..94bda5ff 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -144,10 +144,12 @@ filing a bug report.
 
 Implementation status in libgomp manual:
 https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-Implementation-Status.html;
->Mainline (GCC 14),
-https://gcc.gnu.org/onlinedocs/gcc-13.1.0/libgomp/OpenMP-Implementation-Status.html;
+>Mainline (GCC 15),
+https://gcc.gnu.org/onlinedocs/gcc-14.1.0/libgomp/OpenMP-Implementation-Status.html;
+>GCC 14,
+https://gcc.gnu.org/onlinedocs/gcc-13.2.0/libgomp/OpenMP-Implementation-Status.html;
 >GCC 13,
-https://gcc.gnu.org/onlinedocs/gcc-12.1.0/libgomp/OpenMP-Implementation-Status.html;
+https://gcc.gnu.org/onlinedocs/gcc-12.3.0/libgomp/OpenMP-Implementation-Status.html;
 >GCC 12.
 
 Disclaimer: A feature might be only fully supported in a later GCC version

---

Summary of changes:
 htdocs/projects/gomp/index.html | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)


hooks/post-receive
-- 
gcc-wwwdocs

[patch] [gcn][nvptx] Add warning to mkoffload for 32bit host code

2024-04-25 Thread Tobias Burnus


Motivated by a surprise of a colleague that with -m32,
no offload dumps were created; that's because mkoffload
does not process host binaries when the are 32bit (i.e. ilp32).

Internally, that done as follows: The host compiler passes to
'mkoffload' the used host ABI, i.e. -foffload-abi=ilp32 or -foffload-abi=lp64

That's done via TARGET_OFFLOAD_OPTIONS, which is supported by aarch64, i386, 
and rs6000.

While it is sensible (albeit not strictly required) that GCC requires that
the host and device side agree and that only 64bit is implemented for the
device side, it can be confusing that silently no offloading code is generated.


Hence, I propose to print a warning in that case - as implemented in the 
attached patch:

$ gcc -fopenmp -m32 test.c
nvptx mkoffload: warning: offload code generation skipped: offloading with 
32-bit host code is currently not supported
gcn mkoffload: warning: offload code generation skipped: offloading with 32-bit 
host code is currently not supported

* * *

This shouldn't have any effect on offload builds using -m64
and non-offload builds – while several testcases already have
issues with '-m32' when offloading is enabled or an offloading
device is available.

To make it not worse, this patch adds some pruning and for
a subset of the failing testcases, I added code to avoids FAILS.
There are some more fails, but those aren't new.

Comments, remarks, suggestions?
Is the mkoffload.cc part is okay?

Tobias
[gcn][nvptx] Add warning to mkoffload for 32bit host code

mkoffload in principle handles 32bit and 64bit offload targets,
but 32bit support has no been implemented.  Before this patch,
offloading is then silently disabled for the respective target.

With the patch, the user gets a warning by mkoffload (and the
programm continues to be build with out offloading code).

gcc/ChangeLog:

	* config/gcn/mkoffload.cc (main): Warn for -foffload-abi=ilp32
	that no offload code will be generated.
	* config/nvptx/mkoffload.cc (main): Likewise.

libgomp/ChangeLog:

	* testsuite/lib/libgomp-dg.exp (libgomp-dg-prune): Prune warning
	by mkoffload that 32-bit offloading is not supported.
	* testsuite/libgomp.c-c++-common/requires-1.c: Silence a FAIL for
	'ia32' targets as for them no offload code is generated.
	* testsuite/libgomp.c-c++-common/requires-3.c: Likewise.
	* testsuite/libgomp.c-c++-common/requires-7.c: Likewise.
	* testsuite/libgomp.c-c++-common/variable-not-offloaded.c: Likewise.
	* testsuite/libgomp.fortran/requires-1.f90: Likewise.

 gcc/config/gcn/mkoffload.cc|  5 -
 gcc/config/nvptx/mkoffload.cc  |  5 -
 libgomp/testsuite/lib/libgomp-dg.exp   |  3 +++
 libgomp/testsuite/libgomp.c-c++-common/requires-1.c|  8 +---
 libgomp/testsuite/libgomp.c-c++-common/requires-3.c|  8 +---
 libgomp/testsuite/libgomp.c-c++-common/requires-7.c| 10 ++
 .../testsuite/libgomp.c-c++-common/variable-not-offloaded.c|  4 ++--
 libgomp/testsuite/libgomp.fortran/requires-1.f90   |  8 +---
 8 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/gcc/config/gcn/mkoffload.cc b/gcc/config/gcn/mkoffload.cc
index 9a438de331a..c37c269d4d2 100644
--- a/gcc/config/gcn/mkoffload.cc
+++ b/gcc/config/gcn/mkoffload.cc
@@ -1143,7 +1143,10 @@ main (int argc, char **argv)
 fatal_error (input_location, "cannot open %qs", gcn_cfile_name);
 
   /* Currently, we only support offloading in 64-bit configurations.  */
-  if (offload_abi == OFFLOAD_ABI_LP64)
+  if (offload_abi == OFFLOAD_ABI_ILP32)
+warning (0, "offload code generation skipped: offloading with 32-bit host "
+		"code is currently not supported");
+  else if (offload_abi == OFFLOAD_ABI_LP64)
 {
   const char *mko_dumpbase = concat (dumppfx, ".mkoffload", NULL);
   const char *hsaco_dumpbase = concat (dumppfx, ".mkoffload.hsaco", NULL);
diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index 503b1abcefd..a7ff32cf8bd 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -798,7 +798,10 @@ main (int argc, char **argv)
 
   /* PR libgomp/65099: Currently, we only support offloading in 64-bit
  configurations.  */
-  if (offload_abi == OFFLOAD_ABI_LP64)
+  if (offload_abi == OFFLOAD_ABI_ILP32)
+warning (0, "offload code generation skipped: offloading with 32-bit host "
+		"code is currently not supported");
+  else if (offload_abi == OFFLOAD_ABI_LP64)
 {
   char *mko_dumpbase = concat (dumppfx, ".mkoffload", NULL);
   if (save_temps)
diff --git a/libgomp/testsuite/lib/libgomp-dg.exp b/libgomp/testsuite/lib/libgomp-dg.exp
index ebf78e17e6d..9c9a5f2ed4b 100644
--- a/libgomp/testsuite/lib/libgomp-dg.exp
+++ b/libgomp/testsuite/lib/libgomp-dg.exp
@@ -3,5 +3,8 @@ proc libgomp-dg-test { prog do_what extra_tool_flags } {
 }
 
 proc libgomp-dg-prune { system text } {
+global additional_prunes
+

Generated files in libgfortran for Fortran intrinsic procedures (was: Updated Sourceware infrastructure plans)

2024-04-18 Thread Tobias Burnus


Hi Janne,

Janne Blomqvist wrote:

back when I was active I did think about this
issue. IMHO the best of my ideas was to convert these into C++
templates.


I think this will work – but we have to be super careful:

With C++, there is the problem that we definitely do not want to add 
dependency on libstdc++ nor to use some features which require special 
hardware support (like exceptions [always bad], symbol aliases, ...). — 
On some systems, a full C++ support might be not available, like 
embedded systems (including some odd embedded OS) or offloading devices.


The libstdc++ dependency would be detected by linking as we currently 
do. For in-language features, we have to ensure the appropriate flags 
-fno-exceptions (and probably a few more). And it should be clear what 
language features to use.


If we do, I think that would surely be an option.


What we're essentially doing with the M4 stuff and the
proposed in-house Python reimplementation is to make up for lack of
monomorphization in plain old C. Rather than doing some DIY templates,
switch the implementation language to something which has that feature
built-in, in this case C++.  No need to convert the entire libgfortran
to C++ if you don't want to, just those objects that are generated
from the M4 templates. Something like

template
void matmul(T* a, T* b, T* c, ...)
{
// actual matmul code here
}

extern "C" {
   // Instantiate template for every type and export the symbol
   void matmul_r4(gfc_array_r4* a, gfc_array_r4* b, gfc_array_r4* c, ...)
   {
 matmul(a, b, c, ...);
   }
   // And so on for other types
}


Cheers,

Tobias

gcc-wwwdocs branch master updated. 794555052d5c1d9a92298aba1fc4b645042946dd

2024-04-16 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  794555052d5c1d9a92298aba1fc4b645042946dd (commit)
  from  c5e08294215518f00e9762cebe3d6f46f1f00526 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 794555052d5c1d9a92298aba1fc4b645042946dd
Author: Tobias Burnus 
Date:   Tue Apr 16 09:57:57 2024 +0200

gcc-14/changes.html + projects/gomp/: Fix OpenMP/OpenACC changes 
section/anchor

In earlier release notes, OpenMP and OpenACC changes were under "New
Languages and Language specific improvements", either directly under that
section as in 4.2, 4.4, 4.7, 4.9, 5, 6 (+ c-family + Fortran), 10, 11, and 
12
or under a subsection in 4.5 (Fortran), 4.8 (C++), 7 (Fortran), 9 
(c-family).

In gcc-13, the OpenMP and OpenACC ended up by chance under "General
Improvements", which gcc-14 replicated.

This commit does not touch gcc-13 to avoid breaking links, but it corrects 
the
anchor used in the links to GCC 13 in projects/gomp/.

However, for GCC 14, it moves the OpenMP/OpenACC changes to the language
section.

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index b4c602a5..6035ae37 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -59,6 +59,75 @@ a work-in-progress.
 
 General Improvements
 
+
+  For offload-device code generated via OpenMP and OpenACC, the math
+  and the Fortran runtime libraries will now automatically be linked,
+  when the user or compiler links them on the host side. Thus, it is no
+  longer required to explicitly pass -lm and/or
+  -lgfortran to the offload-device linker using the https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload-options;
+  >-foffload-options= flag.
+  
+  
+New configure options: --enable-host-pie, to build the
+compiler executables as PIE; and --enable-host-bind-now,
+to link the compiler executables with -Wl,-z,now in order
+to enable additional hardening.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fhardened;>-fhardened,
+an umbrella option that enables a set of hardening flags.
+The options it enables can be displayed using the
+--help=hardened option.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fharden-control-flow-redundancy;>-fharden-control-flow-redundancy,
+to verify, at the end of functions, that the visited basic blocks
+correspond to a legitimate execution path, so as to detect and
+prevent attacks that transfer control into the middle of
+functions.
+  
+  
+New type attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-hardbool-type-attribute;>hardbool,
+for C and Ada.  Hardened
+booleans take user-specified representations for true
+and false, presumably with higher hamming distance
+than standard booleans, and get verified at every use, detecting
+memory corruption and some malicious attacks.
+  
+  
+New type attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-strub-type-attribute;>strub
+to control stack scrubbing
+properties of functions and variables.  The stack frame used by
+functions marked with the attribute gets zeroed-out upon returning
+or exception escaping.  Scalar variables marked with the attribute
+cause functions contaning or accessing them to get stack scrubbing
+enabled implicitly.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-finline-stringops;>-finline-stringops,
+to force inline
+expansion of memcmp, memcpy,
+memmove and memset, even when that is
+not an optimization, to avoid relying on library
+implementations.
+  
+  
+
+New function attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-null_005fterminated_005fstring_005farg-function-attribute;>
 null_terminated_string_arg(PARAM_IDX)
+for indicating parameters that are expected to be null-terminated
+strings.
+  
+
+
+New Languages and Language specific improvements
+
 
   https://gcc.gnu.org/projects/gomp/;>OpenMP
   
@@ -136,73 +205,7 @@ a work-in-progress.
   acc_memcpy_from_device_async.
   
   
-  For offload-device code generated via OpenMP and OpenACC, the math
-  and the Fortran runtime libraries will now automatically be linked,
-  when the user or compiler links them on the host side. Thus, it is no
-  longer re

Re: [wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Tobias Burnus


Richard Biener wrote:

I do wonder whether hot-patching the ELF header from the libgomp plugin
with the actual micro-subarch would be possible to make the driver happy.


For completeness, there is also the possibility to play with an 
environment variable as in HSA_OVERRIDE_GFX_VERSION=9.0.0 or 
HSA_OVERRIDE_GFX_VERSION=11.0.0


Tobias

[wwwdocs] gcc-14/changes.html + projects/gomp/: Fix OpenMP/OpenACC changes section/anchor

2024-04-15 Thread Tobias Burnus

When clicking on the GCC..1x links at 
https://gcc.gnu.org/projects/gomp/#omp5.0 , I noticed that the GCC 13 
and 14 links did not link to the OpenMP changes.


It turned out that in GCC 12 and before (see commit message for 
details), the OpenMP and OpenACC changes are under "New Languages and 
Language-Specific Improvements" – while for GCC 13 and 14 they are under 
"General Improvements"


Example: GCC 12 – https://gcc.gnu.org/gcc-12/changes.html#languages 
(directly under  and before the first  entry ["Ada"]).


GCC 13: https://gcc.gnu.org/gcc-13/changes.html#general

The attached patch keeps GCC 13 for backward compatibility but moves 
them for GCC 14 "back" to languages.


To fix the links at projects/gomp/, it therefore it updates the page 
anchors to 'general'.


* * *

Comments or remarks?

Tobias
gcc-14/changes.html + projects/gomp/: Fix OpenMP/OpenACC changes section/anchor

In earlier release notes, OpenMP and OpenACC changes were under "New
Languages and Language specific improvements", either directly under that
section as in 4.2, 4.4, 4.7, 4.9, 5, 6 (+ c-family + Fortran), 10, 11, and 12
or under a subsection in 4.5 (Fortran), 4.8 (C++), 7 (Fortran), 9 (c-family).

In gcc-13, the OpenMP and OpenACC ended up by chance under "General
Improvements", which gcc-14 replicated.

This commit does not touch gcc-13 to avoid breaking links, but it corrects the
anchor used in the links to GCC 13 in projects/gomp/.

However, for GCC 14, it moves the OpenMP/OpenACC changes to the language
section.

 htdocs/gcc-14/changes.html  | 135 
 htdocs/projects/gomp/index.html |  44 ++---
 2 files changed, 91 insertions(+), 88 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index b4c602a5..6035ae37 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -59,6 +59,75 @@ a work-in-progress.
 
 General Improvements
 
+
+  For offload-device code generated via OpenMP and OpenACC, the math
+  and the Fortran runtime libraries will now automatically be linked,
+  when the user or compiler links them on the host side. Thus, it is no
+  longer required to explicitly pass -lm and/or
+  -lgfortran to the offload-device linker using the https://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html#index-foffload-options;
+  >-foffload-options= flag.
+  
+  
+New configure options: --enable-host-pie, to build the
+compiler executables as PIE; and --enable-host-bind-now,
+to link the compiler executables with -Wl,-z,now in order
+to enable additional hardening.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fhardened;>-fhardened,
+an umbrella option that enables a set of hardening flags.
+The options it enables can be displayed using the
+--help=hardened option.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html#index-fharden-control-flow-redundancy;>-fharden-control-flow-redundancy,
+to verify, at the end of functions, that the visited basic blocks
+correspond to a legitimate execution path, so as to detect and
+prevent attacks that transfer control into the middle of
+functions.
+  
+  
+New type attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-hardbool-type-attribute;>hardbool,
+for C and Ada.  Hardened
+booleans take user-specified representations for true
+and false, presumably with higher hamming distance
+than standard booleans, and get verified at every use, detecting
+memory corruption and some malicious attacks.
+  
+  
+New type attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-strub-type-attribute;>strub
+to control stack scrubbing
+properties of functions and variables.  The stack frame used by
+functions marked with the attribute gets zeroed-out upon returning
+or exception escaping.  Scalar variables marked with the attribute
+cause functions contaning or accessing them to get stack scrubbing
+enabled implicitly.
+  
+  
+New option
+https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-finline-stringops;>-finline-stringops,
+to force inline
+expansion of memcmp, memcpy,
+memmove and memset, even when that is
+not an optimization, to avoid relying on library
+implementations.
+  
+  
+
+New function attribute
+https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-null_005fterminated_005fstring_005farg-function-attribute;> null_terminated_string_arg(PARAM_IDX)
+for indicating parameters that are expected to be null-terminated
+strings.
+  
+
+
+New Languages and Language specific improvements
+
 
   https://gcc.gnu.org/projects/gomp/;>OpenMP
   
@@ -136,73 +205,7 @@ a work-in-progress.
   acc_memcpy_from_device_async.
   
   
-  For offload-device code

gcc-wwwdocs branch master updated. c5e08294215518f00e9762cebe3d6f46f1f00526

2024-04-15 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  c5e08294215518f00e9762cebe3d6f46f1f00526 (commit)
  from  d18a80a52a7ec2edd7ef9a583d8920d61c0b48e5 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit c5e08294215518f00e9762cebe3d6f46f1f00526
Author: Tobias Burnus 
Date:   Mon Apr 15 13:16:36 2024 +0200

gcc-14/changes.html (AMD GCN): Mention gfx1036 support

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 8ac08e9a..b4c602a5 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -623,8 +623,9 @@ a work-in-progress.
 AMD Radeon (GCN)
 
 
-  Initial support for the AMD Radeon gfx1030 (RDNA2),
-gfx1100 and gfx1103 (RDNA3) devices has been
+  Initial support for the AMD Radeon gfx1030,
+gfx1036 (RDNA2), gfx1100 and
+gfx1103 (RDNA3) devices has been
 added. LLVM 15+ (assembler and linker) is https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa;>required
 to support GFX11.

---

Summary of changes:
 htdocs/gcc-14/changes.html | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


hooks/post-receive
-- 
gcc-wwwdocs

[wwwdocs] gcc-14/changes.html (AMD GCN): Mention gfx1036 support

2024-04-15 Thread Tobias Burnus

I experimented with some variants to make clearer that each of RDNA2 and 
RNDA3 applies to two card types, but at the end I settled on the 
fewest-word version.


Comments, remarks, suggestions? (To this change or in general?)

Current version: https://gcc.gnu.org/gcc-14/changes.html#amdgcn

Compiler flags, listing the the gfx* cards: 
https://gcc.gnu.org/onlinedocs/gcc/AMD-GCN-Options.html


Tobias

PS: On the compiler side, I am looking forward to a .def file which 
reduces the number of files to change when adding a new gfx* card, given 
that we have doubled the number of entries. [Well, 1 missing but I know 
of one WIP addition.]
gcc-14/changes.html (AMD GCN): Mention gfx1036 support

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 8ac08e9a..b4c602a5 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -623,8 +623,9 @@ a work-in-progress.
 AMD Radeon (GCN)
 
 
-  Initial support for the AMD Radeon gfx1030 (RDNA2),
-gfx1100 and gfx1103 (RDNA3) devices has been
+  Initial support for the AMD Radeon gfx1030,
+gfx1036 (RDNA2), gfx1100 and
+gfx1103 (RDNA3) devices has been
 added. LLVM 15+ (assembler and linker) is https://gcc.gnu.org/install/specific.html#amdgcn-x-amdhsa;>required
 to support GFX11.

[gcc r14-9843] Fortran: Accept again tab as alternative to space as separator [PR114304]

2024-04-08 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:477c8a82f38e353a8c6313b38197c70b12deea80

commit r14-9843-g477c8a82f38e353a8c6313b38197c70b12deea80
Author: Tobias Burnus 
Date:   Mon Apr 8 21:47:51 2024 +0200

Fortran: Accept again tab as alternative to space as separator [PR114304]

This fixes a side-effect of/regression caused by r14-9822-g93adf88cc6744a,
which was for the same PR.

PR libfortran/114304

libgfortran/ChangeLog:

* io/list_read.c (eat_separator): Accept tab as alternative to 
space.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr114304-2.f90: New test.

Diff:
---
 gcc/testsuite/gfortran.dg/pr114304-2.f90 | 82 
 libgfortran/io/list_read.c   |  2 +-
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/pr114304-2.f90 
b/gcc/testsuite/gfortran.dg/pr114304-2.f90
new file mode 100644
index 000..5ef5874f528
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr114304-2.f90
@@ -0,0 +1,82 @@
+! { dg-do run }
+!
+! PR fortran/114304
+!
+! Ensure that '\t' (tab) is supported as separator in list-directed input
+! While not really standard conform, this is widely used in user input and
+! widely supported.
+!
+
+use iso_c_binding
+implicit none
+character(len=*,kind=c_char), parameter :: tab = C_HORIZONTAL_TAB
+
+! Accept '' as variant to ' ' as separator
+! Check that  and  are handled
+
+character(len=*,kind=c_char), parameter :: nml_str &
+   = ''//C_CARRIAGE_RETURN // C_NEW_LINE // &
+ 'first'//tab//'='//tab//' .true.'// C_NEW_LINE // &
+ ' , other'//tab//' ='//tab//'3'//tab//', 2'//tab//'/'
+
+! Check that  is handled,
+
+! Note: For new line, Unix uses \n, Windows \r\n but old Apple systems used 
'\r'
+!
+! Gfortran does not seem to support all \r, but the following is supported
+! since ages, ! which seems to be a gfortran extension as ifort and flang 
don't like it.
+
+character(len=*,kind=c_char), parameter :: nml_str2 &
+   = ''//C_CARRIAGE_RETURN // C_NEW_LINE // &
+ 'first'//C_NEW_LINE//'='//tab//' .true.'// C_CARRIAGE_RETURN // &
+ ' , other'//tab//' ='//tab//'3'//tab//', 2'//tab//'/'
+
+character(len=*,kind=c_char), parameter :: str &
+   = tab//'1'//tab//'2,'//tab//'3'//tab//',4'//tab//','//tab//'5'//tab//'/'
+character(len=*,kind=c_char), parameter :: str2 &
+   = tab//'1'//tab//'2;'//tab//'3'//tab//';4'//tab//';'//tab//'5'//tab//'/'
+logical :: first
+integer :: other(4)
+integer :: ints(6)
+namelist /inparm/ first , other
+
+other = 1
+
+open(99, file="test.inp")
+write(99, '(a)') nml_str
+rewind(99)
+read(99,nml=inparm)
+close(99, status="delete")
+
+if (.not.first .or. any (other /= [3,2,1,1])) stop 1
+
+other = 9
+
+open(99, file="test.inp")
+write(99, '(a)') nml_str2
+rewind(99)
+read(99,nml=inparm)
+close(99, status="delete")
+
+if (.not.first .or. any (other /= [3,2,9,9])) stop 2
+
+ints = 66
+
+open(99, file="test.inp", decimal='point')
+write(99, '(a)') str
+rewind(99)
+read(99,*) ints
+close(99, status="delete")
+
+if (any (ints /= [1,2,3,4,5,66])) stop 3
+
+ints = 77 
+
+open(99, file="test.inp", decimal='comma')
+write(99, '(a)') str2
+rewind(99)
+read(99,*) ints
+close(99, status="delete")
+
+if (any (ints /= [1,2,3,4,5,77])) stop 4
+end
diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index b56f2a4e6d6..5bbbef26c26 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -463,7 +463,7 @@ eat_separator (st_parameter_dt *dtp)
 
   dtp->u.p.comma_flag = 0;
   c = next_char (dtp);
-  if (c == ' ')
+  if (c == ' ' || c == '\t')
 {
   eat_spaces (dtp);
   c = next_char (dtp);

[Patch] Fortran: List-directed read - accept again tab as alternative to space as separator [PR114304] (was: [patch, libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not a

2024-04-08 Thread Tobias Burnus


Jerry D wrote:

See attached updated patch.


It turned rather quickly out that this patch – committed as 
r14-9822-g93adf88cc6744a – caused regressions.


Namely, real-world code use tab(s) as separator instead of spaces.

[For instance, PR114304 which contains a named-list input file from SPEC 
CPU 2017; that example uses tabs before the '=' sign, but the issue is 
more generic.]


I think the ISO Fortran standard only permits spaces, but as it feels 
natural and is widely supported, tabs are used and should remain supported.


It is not quite clear how '\r' are or should be handled, but as 
eat_spaces did use it, I thought I would add one testcase using them as 
well.


That test is not affected by my change; it did work before with GCC and 
still does – but it does fail with ifort/ifx/flang. I have not thought 
deeply whether it should be supported or not – and looking at the 
libgfortran source file, it often but (→ testcase) not consistently 
requires that an \n follows the \r.


OK for mainline? [And: When the previous patch gets backported, this 
surely needs to be included as well.]


Tobias
Fortran: Accept again tab as alternative to space as separator [PR114304]

This fixes a side-effect of/regression caused by r14-9822-g93adf88cc6744a,
which was for the same PR.

	PR libfortran/114304

libgfortran/ChangeLog:

	* io/list_read.c (eat_separator): Accept tab as alternative to space.

gcc/testsuite/ChangeLog:

	* gfortran.dg/pr114304-2.f90: New test.

 gcc/testsuite/gfortran.dg/pr114304-2.f90 | 82 
 libgfortran/io/list_read.c   |  2 +-
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gfortran.dg/pr114304-2.f90 b/gcc/testsuite/gfortran.dg/pr114304-2.f90
new file mode 100644
index 000..5ef5874f528
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr114304-2.f90
@@ -0,0 +1,82 @@
+! { dg-do run }
+!
+! PR fortran/114304
+!
+! Ensure that '\t' (tab) is supported as separator in list-directed input
+! While not really standard conform, this is widely used in user input and
+! widely supported.
+!
+
+use iso_c_binding
+implicit none
+character(len=*,kind=c_char), parameter :: tab = C_HORIZONTAL_TAB
+
+! Accept '' as variant to ' ' as separator
+! Check that  and  are handled
+
+character(len=*,kind=c_char), parameter :: nml_str &
+   = ''//C_CARRIAGE_RETURN // C_NEW_LINE // &
+ 'first'//tab//'='//tab//' .true.'// C_NEW_LINE // &
+ ' , other'//tab//' ='//tab//'3'//tab//', 2'//tab//'/'
+
+! Check that  is handled,
+
+! Note: For new line, Unix uses \n, Windows \r\n but old Apple systems used '\r'
+!
+! Gfortran does not seem to support all \r, but the following is supported
+! since ages, ! which seems to be a gfortran extension as ifort and flang don't like it.
+
+character(len=*,kind=c_char), parameter :: nml_str2 &
+   = ''//C_CARRIAGE_RETURN // C_NEW_LINE // &
+ 'first'//C_NEW_LINE//'='//tab//' .true.'// C_CARRIAGE_RETURN // &
+ ' , other'//tab//' ='//tab//'3'//tab//', 2'//tab//'/'
+
+character(len=*,kind=c_char), parameter :: str &
+   = tab//'1'//tab//'2,'//tab//'3'//tab//',4'//tab//','//tab//'5'//tab//'/'
+character(len=*,kind=c_char), parameter :: str2 &
+   = tab//'1'//tab//'2;'//tab//'3'//tab//';4'//tab//';'//tab//'5'//tab//'/'
+logical :: first
+integer :: other(4)
+integer :: ints(6)
+namelist /inparm/ first , other
+
+other = 1
+
+open(99, file="test.inp")
+write(99, '(a)') nml_str
+rewind(99)
+read(99,nml=inparm)
+close(99, status="delete")
+
+if (.not.first .or. any (other /= [3,2,1,1])) stop 1
+
+other = 9
+
+open(99, file="test.inp")
+write(99, '(a)') nml_str2
+rewind(99)
+read(99,nml=inparm)
+close(99, status="delete")
+
+if (.not.first .or. any (other /= [3,2,9,9])) stop 2
+
+ints = 66
+
+open(99, file="test.inp", decimal='point')
+write(99, '(a)') str
+rewind(99)
+read(99,*) ints
+close(99, status="delete")
+
+if (any (ints /= [1,2,3,4,5,66])) stop 3
+
+ints = 77 
+
+open(99, file="test.inp", decimal='comma')
+write(99, '(a)') str2
+rewind(99)
+read(99,*) ints
+close(99, status="delete")
+
+if (any (ints /= [1,2,3,4,5,77])) stop 4
+end
diff --git a/libgfortran/io/list_read.c b/libgfortran/io/list_read.c
index b56f2a4e6d6..5bbbef26c26 100644
--- a/libgfortran/io/list_read.c
+++ b/libgfortran/io/list_read.c
@@ -463,7 +463,7 @@ eat_separator (st_parameter_dt *dtp)
 
   dtp->u.p.comma_flag = 0;
   c = next_char (dtp);
-  if (c == ' ')
+  if (c == ' ' || c == '\t')
 {
   eat_spaces (dtp);
   c = next_char (dtp);

Re: [patch, libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

2024-04-05 Thread Tobias Burnus


Hi Jerry, hello world,

Jerry D wrote:

On 4/5/24 10:47 AM, Jerry D wrote:

On 4/4/24 2:41 PM, Tobias Burnus wrote:
I think for the current testcases, I like the patch – the question 
is only what's about:

   ',3' as input for 'comma'   (or '.3' as input for 'point')
[...]
But for 'comma': [...]
* GCC with your patch: Same result: ios != 0 and nothing read.

Expected: [...] read-in value is 0.3. [...]



See attached updated patch.
Regressions tested on x86-64. OK for trunk and 13 after a bit.


OK. Thanks for the patch!

Tobias

gcc-wwwdocs branch master updated. 8765e9c73ae14cfad592b8a3885fe1bcc3ff96cd

2024-04-05 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  8765e9c73ae14cfad592b8a3885fe1bcc3ff96cd (commit)
   via  62e1ccdc5b71b7fa9162c336c0964d13c6fa5c79 (commit)
  from  c9e275660a19c804dd8c591c73cb9b169a9d7573 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 8765e9c73ae14cfad592b8a3885fe1bcc3ff96cd
Author: Tobias Burnus 
Date:   Fri Apr 5 11:58:56 2024 +0200

gcc-14/changes.html: Mention OpenACC 2.7's 'readonly' modifier

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 6ddd2788..2d8968cf 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -121,7 +121,9 @@ a work-in-progress.
   
 OpenACC 2.7: The self clause was added to be used on
   compute constructs and the default clause for data
-  constructs.
+  constructs. Additionally, the readonly modifier is now
+  handled in the copyin clause and cache
+  directive.
 OpenACC 3.2: The following API routines are now available in
   Fortran using the openacc module or the
   openacc_lib.h header file:

commit 62e1ccdc5b71b7fa9162c336c0964d13c6fa5c79
Author: Tobias Burnus 
Date:   Fri Apr 5 11:58:06 2024 +0200

gcc-14/changes.html: Comment out  of empty sections

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 1cc68430..6ddd2788 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -748,7 +748,7 @@ __asm (".global __flmap_lock"  "\n\t"
 
 
 
-Operating Systems
+
 
 
 
@@ -994,7 +994,7 @@ it emits:
 
 
 
-Other significant improvements
+
 
 
 

---

Summary of changes:
 htdocs/gcc-14/changes.html | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)


hooks/post-receive
-- 
gcc-wwwdocs

Re: [patch, libgfortran] PR114304 - [13/14 Regression] libgfortran I/O – bogus "Semicolon not allowed as separator with DECIMAL='point'"

2024-04-04 Thread Tobias Burnus


Hi Jerry,

I think for the current testcases, I like the patch – the question is 
only what's about:


  ',3' as input for 'comma'   (or '.3' as input for 'point')

For 'point' – 0.3 is read and ios = 0 (as expected)
But for 'comma':
* GCC 12 reads nothing and has ios = 0.
* GCC 13/mainline has an error (ios != 0 – and reads nothing)
* GCC with your patch: Same result: ios != 0 and nothing read.

Expected: Same as with ','/'comma' – namely: read-in value is 0.3.
→ https://godbolt.org/z/4rc8fz4sT for the full example, which works with 
ifort, ifx and flang


* * *

Can you check and fix this? It looks perfectly valid to me to have 
remove the '0' in the floating point numbers '0.3' or '0,3' seems to be 
permitted – and it works for '.' (with 'point') but not for ',' (with 
'comma').


F2023's "13.10.3.1 List-directed input forms" refers to "13.7.2.3.2 F 
editing", which states:


"The standard form of the input field [...] The form of the mantissa is 
an optional sign, followed by a string of one or more digits optionally 
containing a decimal symbol."


The latter does not require that the digit has to be before the decimal 
sign and as for output, it is optional, it is surely intended that ",3" 
is a valid floating-point number for decimal='comma'.


* * *

I extended the testcase to check for this – see attached diff. All 
'point' work, all 'comma' fail.


Thanks for working on this!

Tobiasdiff --git a/gcc/testsuite/gfortran.dg/pr114304.f90 b/gcc/testsuite/gfortran.dg/pr114304.f90
index 8344a9ea857..2bcf9bc7f57 100644
--- a/gcc/testsuite/gfortran.dg/pr114304.f90
+++ b/gcc/testsuite/gfortran.dg/pr114304.f90
@@ -70,7 +70,25 @@
   call t(.true.,  'point', '4,4 ,', .true.)
   call t(.true.,  'comma', '4;4 ;', .true.)
   call t(.true.,  'point', '4,4 ;', .true.)
+
+  call t2('comma', ',2')
+  call t2('point', '.2')
+  call t2('comma', ',2;')
+  call t2('point', '.2,')
+  call t2('comma', ',2 ,')
+  call t2('point', '.2 .')
 contains
+subroutine t2(dec, testinput)
+  character(*) :: dec, testinput
+  integer ios
+  real :: r
+  r = 42
+  read(testinput,*,decimal=dec,iostat=ios) r
+  if (ios /= 0 .or.  abs(r - 0.2) > epsilon(r)) then
+print '(*(g0))', dec, ', testinput = "',testinput,'"',', r=',r,' ios=',ios
+stop 3 
+  end if
+end
 subroutine t(valid, dec, testinput, isreal)
   logical, value :: valid
   character(len=*) :: dec, testinput

[wwwdocs] gcc-14/changes.html: Comment out of empty sections

2024-04-04 Thread Tobias Burnus

I find it confusing to see multiple  in a row without content. 
Actually, both have  as content, but those are commented out as 
actual news is missing ...


See https://gcc.gnu.org/gcc-14/changes.html and see the last entry at 
the bottom of the page and "Operating Systems" somewhere in between.


And comment, remark or suggestion before I commit this?

Tobias
gcc-14/changes.html: Comment out  of empty sections

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 1cc68430..6ddd2788 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -748,7 +748,7 @@ __asm (".global __flmap_lock"  "\n\t"
 
 
 
-Operating Systems
+
 
 
 
@@ -994,7 +994,7 @@ it emits:
 
 
 
-Other significant improvements
+

[wwwdocs] gcc-14/changes.html: Mention OpenACC 2.7's 'readonly' modifier

2024-04-04 Thread Tobias Burnus


Minor OpenACC 2.7 update to https://gcc.gnu.org/gcc-14/changes.html#openacc

The 'readonly' modifier is now in (well, since March), albeit more 2.7 
features are in the pipeline...


Comments, remarks, suggestions before I commit it?

Tobias
gcc-14/changes.html: Mention OpenACC 2.7's 'readonly' modifier

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 045893cf..58f153ec 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -121,7 +121,9 @@ a work-in-progress.
   
 OpenACC 2.7: The self clause was added to be used on
   compute constructs and the default clause for data
-  constructs.
+  constructs. Additionally, the readonly modifier is now
+  handled in the copyin clause and cache
+  directive.
 OpenACC 3.2: The following API routines are now available in
   Fortran using the openacc module or the
   openacc_lib.h header file:

[wwwdocs,committed] gcc-14/changes.html: Fix HTML syntax

2024-04-04 Thread Tobias Burnus


Found when testing my own change via https://validator.w3.org/nu/#file

Committed as obvious.

Tobias
commit c9e275660a19c804dd8c591c73cb9b169a9d7573
Author: Tobias Burnus 
Date:   Thu Apr 4 22:07:28 2024 +0200

gcc-14/changes.html: Fix HTML syntax

W3.org's HTML checker complained about missing  and
about ... within a ... (or rather: it complained about
the unexpected '').
---
 htdocs/gcc-14/changes.html | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 045893cf..1cc68430 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -861,7 +861,7 @@ __asm (".global __flmap_lock"  "\n\t"
   
 
 The analyzer now makes use of the function attribute
-https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute;>alloc_size
+https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute;>alloc_size
 allowing
 https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-fanalyzer;>-fanalyzer
 to emit
@@ -887,7 +887,7 @@ __asm (".global __flmap_lock"  "\n\t"
   
   
 
-The warning
+  The warning
   https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-Wanalyzer-out-of-bounds;>-Wanalyzer-out-of-bounds
   has been extended so that, where possible, it will emit a text-based
   diagram visualizing the spatial relationship between
@@ -899,9 +899,9 @@ __asm (".global __flmap_lock"  "\n\t"
   whether they overlap, are touching, are close or far apart;
   which one is before or after in memory, the relative sizes involved,
   the direction of the access (read vs write), and, in some cases,
-  the values of data involved.
+  the values of data involved.
 Such "text art" diagrams can be controlled (or suppressed) via a new
-  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-text-art-charset;>-fdiagnostics-text-art-charset= option.
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-text-art-charset;>-fdiagnostics-text-art-charset= option.
 For example, given the out-of-bounds write in strcat in:
   
 
@@ -953,17 +953,17 @@ it emits:
   
 
 The SARIF output from
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=
 now adds indentation and newlines to reflect the logical JSON structure of the data.  The previous compact behavior can be restored via the new option
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fno-diagnostics-json-formatting;>-fno-diagnostics-json-formatting.
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fno-diagnostics-json-formatting;>-fno-diagnostics-json-formatting.
 This also applies to the older output format named "json".
   
   
 
 If profiling information about the compiler itself is requested via
-https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-ftime-report;>-ftime-report,
+https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-ftime-report;>-ftime-report,
 and a SARIF output format is requested via
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=,
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=,
 then the timing and memory usage data is now written in JSON form into
 the SARIF output, rather than as plain text to stderr.

gcc-wwwdocs branch master updated. c9e275660a19c804dd8c591c73cb9b169a9d7573

2024-04-04 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  c9e275660a19c804dd8c591c73cb9b169a9d7573 (commit)
  from  6eeeb6a53c2e57e3f02f97da176589cf15877247 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit c9e275660a19c804dd8c591c73cb9b169a9d7573
Author: Tobias Burnus 
Date:   Thu Apr 4 22:07:28 2024 +0200

gcc-14/changes.html: Fix HTML syntax

W3.org's HTML checker complained about missing  and
about ... within a ... (or rather: it complained about
the unexpected '').

diff --git a/htdocs/gcc-14/changes.html b/htdocs/gcc-14/changes.html
index 045893cf..1cc68430 100644
--- a/htdocs/gcc-14/changes.html
+++ b/htdocs/gcc-14/changes.html
@@ -861,7 +861,7 @@ __asm (".global __flmap_lock"  "\n\t"
   
 
 The analyzer now makes use of the function attribute
-https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute;>alloc_size
+https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute;>alloc_size
 allowing
 https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-fanalyzer;>-fanalyzer
 to emit
@@ -887,7 +887,7 @@ __asm (".global __flmap_lock"  "\n\t"
   
   
 
-The warning
+  The warning
   https://gcc.gnu.org/onlinedocs/gcc/Static-Analyzer-Options.html#index-Wanalyzer-out-of-bounds;>-Wanalyzer-out-of-bounds
   has been extended so that, where possible, it will emit a text-based
   diagram visualizing the spatial relationship between
@@ -899,9 +899,9 @@ __asm (".global __flmap_lock"  "\n\t"
   whether they overlap, are touching, are close or far apart;
   which one is before or after in memory, the relative sizes involved,
   the direction of the access (read vs write), and, in some cases,
-  the values of data involved.
+  the values of data involved.
 Such "text art" diagrams can be controlled (or suppressed) via a new
-  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-text-art-charset;>-fdiagnostics-text-art-charset=
 option.
+  https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-text-art-charset;>-fdiagnostics-text-art-charset=
 option.
 For example, given the out-of-bounds write in strcat in:
   
 
@@ -953,17 +953,17 @@ it emits:
   
 
 The SARIF output from
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=
 now adds indentation and newlines to reflect the logical JSON structure of 
the data.  The previous compact behavior can be restored via the new option
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fno-diagnostics-json-formatting;>-fno-diagnostics-json-formatting.
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fno-diagnostics-json-formatting;>-fno-diagnostics-json-formatting.
 This also applies to the older output format named "json".
   
   
 
 If profiling information about the compiler itself is requested via
-https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-ftime-report;>-ftime-report,
+https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html#index-ftime-report;>-ftime-report,
 and a SARIF output format is requested via
-https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=,
+https://gcc.gnu.org/onlinedocs/gcc/Diagnostic-Message-Formatting-Options.html#index-fdiagnostics-format;>-fdiagnostics-format=,
 then the timing and memory usage data is now written in JSON form into
 the SARIF output, rather than as plain text to stderr.
   

---

Summary of changes:
 htdocs/gcc-14/changes.html | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)


hooks/post-receive
-- 
gcc-wwwdocs

[gcc r14-9792] nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl

2024-04-04 Thread Tobias Burnus via Gcc-cvs

https://gcc.gnu.org/g:7520a4992c94254016085a461c58c972497c4483

commit r14-9792-g7520a4992c94254016085a461c58c972497c4483
Author: Tobias Burnus 
Date:   Thu Apr 4 21:55:29 2024 +0200

nvptx: In mkoffload.cc, call diagnostic_color_init + gcc_init_libintl

gcc/ChangeLog:

* config/nvptx/mkoffload.cc (main): Call
gcc_init_libintl and diagnostic_color_init.

Diff:
---
 gcc/config/nvptx/mkoffload.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/nvptx/mkoffload.cc b/gcc/config/nvptx/mkoffload.cc
index a7fc28cbd3f..503b1abcefd 100644
--- a/gcc/config/nvptx/mkoffload.cc
+++ b/gcc/config/nvptx/mkoffload.cc
@@ -638,7 +638,9 @@ main (int argc, char **argv)
   const char *outname = 0;
 
   progname = tool_name;
+  gcc_init_libintl ();
   diagnostic_initialize (global_dc, 0);
+  diagnostic_color_init (global_dc);
 
   if (atexit (mkoffload_cleanup) != 0)
 fatal_error (input_location, "atexit failed");

gcc-wwwdocs branch master updated. 5355f9e63f8240f6a3753a6f9ae10133d0c34e38

2024-04-04 Thread Tobias Burnus via Gcc-cvs-wwwdocs

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gcc-wwwdocs".

The branch, master has been updated
   via  5355f9e63f8240f6a3753a6f9ae10133d0c34e38 (commit)
  from  501aef9bacc3842d0b7d022a4333c9d71d419d4d (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -
commit 5355f9e63f8240f6a3753a6f9ae10133d0c34e38
Author: Tobias Burnus 
Date:   Thu Apr 4 12:22:12 2024 +0200

projects/gomp/: Update 5.2 (fix misplaced GCC 14) and TR12 (new items) 
status

diff --git a/htdocs/projects/gomp/index.html b/htdocs/projects/gomp/index.html
index b8f11508..798efb21 100644
--- a/htdocs/projects/gomp/index.html
+++ b/htdocs/projects/gomp/index.html
@@ -846,7 +846,7 @@ than listed, depending on resolved corner cases and 
optimizations.
   
   
 declare mapper with iterator and present 
modifiers
-GCC14
+No
 
   
   
@@ -871,7 +871,7 @@ than listed, depending on resolved corner cases and 
optimizations.
   
   
 New allocators directive for Fortran
-No
+GCC14
 
   
   
@@ -1225,9 +1225,9 @@ error.
 
   
   
-coexecute directive for Fortran
+workdistribute directive for Fortran
 No
-
+Renamed just after TR12; added in TR12 as coexecute
   
   
 Fortran DO CONCURRENT as associated loop in a loop
@@ -1295,6 +1295,11 @@ error.
 No
 
   
+  
+Canonical loop nest enclosed in (multiple) curly braces (C/C++) or 
BLOCK constructs (Fortran)
+No
+
+  
   
 Relaxed Fortran restrictions to the aligned clause
 No

---

Summary of changes:
 htdocs/projects/gomp/index.html | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)


hooks/post-receive
-- 
gcc-wwwdocs

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3602 matches

Mail list logo