[PATCH] stor-layout: Create DECL_BIT_FIELD_REPRESENTATIVE even for bitfields in unions [PR101062]

2021-06-16 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase is miscompiled on x86_64-linux: the bitfield store
is implemented as a 64-bit read-modify-write (RMW) operation at d+24, even
though the d variable has a size of only 28 bytes, and scheduling moves a
store to a different variable, one that happens to be placed right after
the d variable, in between the R and W parts.

The reason for this is that we weren't creating
DECL_BIT_FIELD_REPRESENTATIVEs for bitfields in unions.

The following patch does create them, but treats all such bitfields as if
they were in a structure where the particular bitfield is the only field.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-06-16  Jakub Jelinek  

PR middle-end/101062
* stor-layout.c (finish_bitfield_representative): For fields in unions
assume nextf is always NULL.
(finish_bitfield_layout): Compute bit field representatives also in
unions, but handle it as if each bitfield was the only field in the
aggregate.

* gcc.dg/pr101062.c: New test.

--- gcc/stor-layout.c.jj	2021-03-30 18:11:52.537092233 +0200
+++ gcc/stor-layout.c   2021-06-15 10:58:59.244353965 +0200
@@ -2072,9 +2072,14 @@ finish_bitfield_representative (tree rep
   bitsize = (bitsize + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
 
   /* Now nothing tells us how to pad out bitsize ...  */
-  nextf = DECL_CHAIN (field);
-  while (nextf && TREE_CODE (nextf) != FIELD_DECL)
-    nextf = DECL_CHAIN (nextf);
+  if (TREE_CODE (DECL_CONTEXT (field)) == RECORD_TYPE)
+    {
+      nextf = DECL_CHAIN (field);
+      while (nextf && TREE_CODE (nextf) != FIELD_DECL)
+        nextf = DECL_CHAIN (nextf);
+    }
+  else
+    nextf = NULL_TREE;
   if (nextf)
 {
   tree maxsize;
@@ -2167,13 +2172,6 @@ finish_bitfield_layout (tree t)
   tree field, prev;
   tree repr = NULL_TREE;
 
-  /* Unions would be special, for the ease of type-punning optimizations
- we could use the underlying type as hint for the representative
- if the bitfield would fit and the representative would not exceed
- the union in size.  */
-  if (TREE_CODE (t) != RECORD_TYPE)
-return;
-
   for (prev = NULL_TREE, field = TYPE_FIELDS (t);
field; field = DECL_CHAIN (field))
 {
@@ -2233,7 +2231,13 @@ finish_bitfield_layout (tree t)
   if (repr)
DECL_BIT_FIELD_REPRESENTATIVE (field) = repr;
 
-      prev = field;
+      if (TREE_CODE (t) == RECORD_TYPE)
+        prev = field;
+      else if (repr)
+        {
+          finish_bitfield_representative (repr, field);
+          repr = NULL_TREE;
+        }
 }
 
   if (repr)
--- gcc/testsuite/gcc.dg/pr101062.c.jj  2021-06-15 10:42:58.642919880 +0200
+++ gcc/testsuite/gcc.dg/pr101062.c 2021-06-15 10:42:40.897171191 +0200
@@ -0,0 +1,29 @@
+/* PR middle-end/101062 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-toplevel-reorder -frename-registers" } */
+
+union U { signed b : 5; };
+int c;
+volatile union U d[7] = { { 8 } };
+short e = 1;
+
+__attribute__((noipa)) void
+foo ()
+{
+  d[6].b = 0;
+  d[6].b = 0;
+  d[6].b = 0;
+  d[6].b = 0;
+  d[6].b = 0;
+  e = 0;
+  c = 0;
+}
+
+int
+main ()
+{
+  foo ();
+  if (e != 0)
+__builtin_abort ();
+  return 0;
+}

Jakub



[PATCH] tree-optimization/101083 - fix ICE with SLP reassoc

2021-06-16 Thread Richard Biener
This makes us pass down the vector type for the two-operand
SLP node build rather than picking it up from operand one, which,
when constant or external, could be NULL.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-16  Richard Biener  

PR tree-optimization/101083
* tree-vect-slp.c (vect_slp_build_two_operator_nodes): Get
vectype as argument.
(vect_build_slp_tree_2): Adjust.

* gcc.dg/vect/pr97832-4.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr97832-4.c | 28 +++
 gcc/tree-vect-slp.c   |  5 ++---
 2 files changed, 30 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr97832-4.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr97832-4.c b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
new file mode 100644
index 000..74ae27ff873
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr97832-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Ofast" } */
+/* { dg-require-effective-target vect_double } */
+
+void foo1x1(double* restrict y, const double* restrict x, int clen)
+{
+  int xi = clen & 2;
+  double f_re = x[0+xi+0];
+  double f_im = x[4+xi+0];
+  int clen2 = (clen+xi) * 2;
+#pragma GCC unroll 0
+  for (int c = 0; c < clen2; c += 8) {
+#pragma GCC unroll 4
+for (int k = 0; k < 4; ++k) {
+  double x_re = x[k];
+  double x_im = x[c+4+k];
+  double y_re = y[c+0+k];
+  double y_im = y[c+4+k];
+  y_re = y_re - x_re * f_re - x_im * f_im;;
+  y_im = y_im + x_re * f_im - x_im * f_re;
+  y[c+0+k] = y_re;
+  y[c+4+k] = y_im;
+}
+  }
+}
+
+/* { dg-final { scan-tree-dump "vectorizing stmts using SLP" "vect" } } */
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" "vect" } } */
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 9ded58592c8..8ec589b7948 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1536,13 +1536,12 @@ vect_build_slp_tree (vec_info *vinfo,
 /* Helper for building an associated SLP node chain.  */
 
 static void
-vect_slp_build_two_operator_nodes (slp_tree perm,
+vect_slp_build_two_operator_nodes (slp_tree perm, tree vectype,
   slp_tree op0, slp_tree op1,
   stmt_vec_info oper1, stmt_vec_info oper2,
   vec > lperm)
 {
   unsigned group_size = SLP_TREE_LANES (op1);
-  tree vectype = SLP_TREE_VECTYPE (op1);
 
   slp_tree child1 = new _slp_tree;
   SLP_TREE_DEF_TYPE (child1) = vect_internal_def;
@@ -2087,7 +2086,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
  for (unsigned lane = 0; lane < group_size; ++lane)
lperm.quick_push (std::make_pair
  (chains[lane][i].code != chains[0][i].code, lane));
- vect_slp_build_two_operator_nodes (child, op0, op1,
+ vect_slp_build_two_operator_nodes (child, vectype, op0, op1,
 (chains[0][i].code == code
  ? op_stmt_info
  : other_op_stmt_info),
-- 
2.26.2


[PATCH] libffi: Fix up x86_64 classify_argument

2021-06-16 Thread Jakub Jelinek via Gcc-patches
Hi!

As the following testcase shows, libffi's classify_argument didn't
properly handle structures at byte offsets not divisible by
UNITS_PER_WORD.  The following patch adjusts it to match what
config/i386/ classify_argument does in that case and also ports the
PR38781 fix there (the second chunk).

This has been committed to upstream libffi already:
https://github.com/libffi/libffi/commit/5651bea284ad0822eafe768e3443c2f4d7da2c8f

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-06-16  Jakub Jelinek  

* src/x86/ffi64.c (classify_argument): For FFI_TYPE_STRUCT set words
to number of words needed for type->size + byte_offset bytes rather
than just type->size bytes.  Compute pos before the loop and check
total size of the structure.
* testsuite/libffi.call/nested_struct12.c: New test.

--- libffi/src/x86/ffi64.c.jj   2020-01-14 20:02:48.557583260 +0100
+++ libffi/src/x86/ffi64.c  2021-06-15 19:50:06.059108230 +0200
@@ -217,7 +217,8 @@ classify_argument (ffi_type *type, enum
 case FFI_TYPE_STRUCT:
   {
const size_t UNITS_PER_WORD = 8;
-   size_t words = (type->size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+   size_t words = (type->size + byte_offset + UNITS_PER_WORD - 1)
+  / UNITS_PER_WORD;
ffi_type **ptr;
int i;
enum x86_64_reg_class subclasses[MAX_CLASSES];
@@ -241,16 +242,16 @@ classify_argument (ffi_type *type, enum
/* Merge the fields of structure.  */
for (ptr = type->elements; *ptr != NULL; ptr++)
  {
-   size_t num;
+   size_t num, pos;
 
byte_offset = ALIGN (byte_offset, (*ptr)->alignment);
 
num = classify_argument (*ptr, subclasses, byte_offset % 8);
if (num == 0)
  return 0;
-          for (i = 0; i < num; i++)
+          pos = byte_offset / 8;
+          for (i = 0; i < num && (i + pos) < words; i++)
            {
-              size_t pos = byte_offset / 8;
               classes[i + pos] =
                 merge_classes (subclasses[i], classes[i + pos]);
            }
--- libffi/testsuite/libffi.call/nested_struct12.c.jj   2021-06-15 20:31:43.327144303 +0200
+++ libffi/testsuite/libffi.call/nested_struct12.c  2021-06-15 20:47:13.129489263 +0200
@@ -0,0 +1,107 @@
+/* Area:   ffi_call, closure_call
+   Purpose:Check structure passing.
+   Limitations:none.
+   PR: none.
+   Originator:  and  20210609*/
+
+/* { dg-do run } */
+#include "ffitest.h"
+
+typedef struct A {
+  float a, b;
+} A;
+
+typedef struct B {
+  float x;
+  struct A y;
+} B;
+
+B B_fn(float b0, struct B b1)
+{
+  struct B result;
+
+  result.x = b0 + b1.x;
+  result.y.a = b0 + b1.y.a;
+  result.y.b = b0 + b1.y.b;
+
+  printf("%g %g %g %g: %g %g %g\n", b0, b1.x, b1.y.a, b1.y.b,
+result.x, result.y.a, result.y.b);
+
+  return result;
+}
+
+static void
+B_gn(ffi_cif* cif __UNUSED__, void* resp, void** args,
+ void* userdata __UNUSED__)
+{
+  float b0;
+  struct B b1;
+
+  b0 = *(float*)(args[0]);
+  b1 = *(struct B*)(args[1]);
+
+  *(B*)resp = B_fn(b0, b1);
+}
+
+int main (void)
+{
+  ffi_cif cif;
+  void *code;
+  ffi_closure *pcl = ffi_closure_alloc(sizeof(ffi_closure), &code);
+  void* args_dbl[3];
+  ffi_type* cls_struct_fields[3];
+  ffi_type* cls_struct_fields1[3];
+  ffi_type cls_struct_type, cls_struct_type1;
+  ffi_type* dbl_arg_types[3];
+
+  float e_dbl = 12.125f;
+  struct B f_dbl = { 24.75f, { 31.625f, 32.25f } };
+
+  struct B res_dbl;
+
+  cls_struct_type.size = 0;
+  cls_struct_type.alignment = 0;
+  cls_struct_type.type = FFI_TYPE_STRUCT;
+  cls_struct_type.elements = cls_struct_fields;
+
+  cls_struct_type1.size = 0;
+  cls_struct_type1.alignment = 0;
+  cls_struct_type1.type = FFI_TYPE_STRUCT;
+  cls_struct_type1.elements = cls_struct_fields1;
+
+  cls_struct_fields[0] = &ffi_type_float;
+  cls_struct_fields[1] = &ffi_type_float;
+  cls_struct_fields[2] = NULL;
+
+  cls_struct_fields1[0] = &ffi_type_float;
+  cls_struct_fields1[1] = &cls_struct_type;
+  cls_struct_fields1[2] = NULL;
+
+
+  dbl_arg_types[0] = &ffi_type_float;
+  dbl_arg_types[1] = &cls_struct_type1;
+  dbl_arg_types[2] = NULL;
+
+  CHECK(ffi_prep_cif(&cif, FFI_DEFAULT_ABI, 2, &cls_struct_type1,
+dbl_arg_types) == FFI_OK);
+
+  args_dbl[0] = &e_dbl;
+  args_dbl[1] = &f_dbl;
+  args_dbl[2] = NULL;
+
+  ffi_call(&cif, FFI_FN(B_fn), &res_dbl, args_dbl);
+  /* { dg-output "12.125 24.75 31.625 32.25: 36.875 43.75 44.375" } */
+  CHECK( res_dbl.x == (e_dbl + f_dbl.x));
+  CHECK( res_dbl.y.a == (e_dbl + f_dbl.y.a));
+  CHECK( res_dbl.y.b == (e_dbl + f_dbl.y.b));
+
+  CHECK(ffi_prep_closure_loc(pcl, &cif, B_gn, NULL, code) == FFI_OK);
+
+  res_dbl = ((B(*)(float, B))(code))(e_dbl, f_dbl);
+  /* { dg-output "\n12.125 24.75 31.625 32.25: 36.875 43.75 44.375" } */
+  CHECK( res_dbl.x == (e_dbl + f_dbl.x));
+  CHECK( res_dbl.y.a == (e_dbl + f_dbl.y.a));
+  CHECK( re

Re: [PATCH] libffi: Fix up x86_64 classify_argument

2021-06-16 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 16, 2021 at 9:49 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As the following testcase shows, libffi didn't handle properly
> classify_arguments of structures at byte offsets not divisible by
> UNITS_PER_WORD.  The following patch adjusts it to match what
> config/i386/ classify_argument does for that and also ports the
> PR38781 fix there (the second chunk).
>
> This has been committed to upstream libffi already:
> https://github.com/libffi/libffi/commit/5651bea284ad0822eafe768e3443c2f4d7da2c8f
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-06-16  Jakub Jelinek  
>
> * src/x86/ffi64.c (classify_argument): For FFI_TYPE_STRUCT set words
> to number of words needed for type->size + byte_offset bytes rather
> than just type->size bytes.  Compute pos before the loop and check
> total size of the structure.
> * testsuite/libffi.call/nested_struct12.c: New test.

OK.

Thanks,
Uros.

[Ada] Small cleanup in System.Exceptions

2021-06-16 Thread Pierre-Marie de Rodat
This removes obsolete stuff.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-except.ads (ZCX_By_Default): Delete.
(Require_Body): Likewise.
* libgnat/s-except.adb: Replace body with pragma No_Body.

diff --git a/gcc/ada/libgnat/s-except.adb b/gcc/ada/libgnat/s-except.adb
--- a/gcc/ada/libgnat/s-except.adb
+++ b/gcc/ada/libgnat/s-except.adb
@@ -29,17 +29,4 @@
 --  --
 --
 
---  This package does not require a body, since it is a package renaming. We
---  provide a dummy file containing a No_Body pragma so that previous versions
---  of the body (which did exist) will not interfere.
-
---  pragma No_Body;
-
---  The above pragma is commented out, since for now we can't use No_Body in
---  a unit marked as a Compiler_Unit, since this requires GNAT 6.1, and we
---  do not yet require this for bootstrapping. So instead we use a dummy Taft
---  amendment type to require the body:
-
-package body System.Exceptions is
-   type Require_Body is new Integer;
-end System.Exceptions;
+pragma No_Body;


diff --git a/gcc/ada/libgnat/s-except.ads b/gcc/ada/libgnat/s-except.ads
--- a/gcc/ada/libgnat/s-except.ads
+++ b/gcc/ada/libgnat/s-except.ads
@@ -34,30 +34,10 @@ pragma Compiler_Unit_Warning;
 package System.Exceptions is
 
pragma Preelaborate;
-   --  To let Ada.Exceptions "with" us and let us "with" Standard_Library
-
-   ZCX_By_Default : constant Boolean;
-   --  Visible copy to allow Ada.Exceptions to know the exception model
+   --  To let Ada.Exceptions "with" us
 
 private
 
-   type Require_Body;
-   --  Dummy Taft-amendment type to make it legal (and required) to provide
-   --  a body for this package.
-   --
-   --  We do this because this unit used to have a body in earlier versions
-   --  of GNAT, and it causes various bootstrap path problems etc if we remove
-   --  a body, since we may pick up old unwanted bodies.
-   --
-   --  Note: we use this standard Ada method of requiring a body rather
-   --  than the cleaner pragma No_Body because System.Exceptions is a compiler
-   --  unit, and older bootstrap compilers do not support pragma No_Body. This
-   --  type can be removed, and s-except.adb can be replaced by a source
-   --  containing just that pragma, when we decide to move to a 2008 compiler
-   --  as the minimal bootstrap compiler version. ???
-
-   ZCX_By_Default : constant Boolean := System.ZCX_By_Default;
-
Foreign_Exception : exception;
pragma Unreferenced (Foreign_Exception);
--  This hidden exception is used to represent non-Ada exception to




[Ada] Reorder code for validity checks of unchecked conversions

2021-06-16 Thread Pierre-Marie de Rodat
The placement of the check for the types of an unchecked conversion
being generic was done in the middle of switching from their original
to their underlying views, which was inconsistent and made the
surrounding comments confusing.

Code cleanup in preparation for a follow-up improvement; behaviour is
unaffected, because the flag Is_Generic_Type is present in all views.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Validate_Unchecked_Conversion): Move detection
of generic types before switching to their private views; fix
style in using AND THEN.

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -16578,18 +16578,7 @@ package body Sem_Ch13 is
   --  here because the processing for generic instantiation always makes
   --  subtypes, and we want the original frozen actual types.
 
-  --  If we are dealing with private types, then do the check on their
-  --  fully declared counterparts if the full declarations have been
-  --  encountered (they don't have to be visible, but they must exist).
-
   Source := Ancestor_Subtype (Etype (First_Formal (Act_Unit)));
-
-  if Is_Private_Type (Source)
-and then Present (Underlying_Type (Source))
-  then
- Source := Underlying_Type (Source);
-  end if;
-
   Target := Ancestor_Subtype (Etype (Act_Unit));
 
   --  If either type is generic, the instantiation happens within a generic
@@ -16600,6 +16589,16 @@ package body Sem_Ch13 is
  return;
   end if;
 
+  --  If we are dealing with private types, then do the check on their
+  --  fully declared counterparts if the full declarations have been
+  --  encountered (they don't have to be visible, but they must exist).
+
+  if Is_Private_Type (Source)
+and then Present (Underlying_Type (Source))
+  then
+ Source := Underlying_Type (Source);
+  end if;
+
   if Is_Private_Type (Target)
 and then Present (Underlying_Type (Target))
   then
@@ -16692,8 +16691,8 @@ package body Sem_Ch13 is
   --  in the same unit as the unchecked conversion, then set the flag
   --  No_Strict_Aliasing (no strict aliasing is implicit here)
 
-  if Is_Access_Type (Target) and then
-In_Same_Source_Unit (Target, N)
+  if Is_Access_Type (Target)
+and then In_Same_Source_Unit (Target, N)
   then
  Set_No_Strict_Aliasing (Implementation_Base_Type (Target));
   end if;




[Ada] Raise expressions and unconstrained components

2021-06-16 Thread Pierre-Marie de Rodat
During the implementation of AI12-0172, we incorrectly allowed illegal
cases of unconstrained components.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch3.adb (Analyze_Component_Declaration): Do not special
case raise expressions.

gcc/testsuite/

* gnat.dg/limited4.adb: Disable illegal code.

diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -2083,21 +2083,10 @@ package body Sem_Ch3 is
  end if;
   end if;
 
-  --  Avoid reporting spurious errors if the component is initialized with
-  --  a raise expression (which is legal in any expression context)
-
-  if Present (E)
-and then
-  (Nkind (E) = N_Raise_Expression
- or else (Nkind (E) = N_Qualified_Expression
-and then Nkind (Expression (E)) = N_Raise_Expression))
-  then
- null;
-
   --  The parent type may be a private view with unknown discriminants,
   --  and thus unconstrained. Regular components must be constrained.
 
-  elsif not Is_Definite_Subtype (T)
+  if not Is_Definite_Subtype (T)
 and then Chars (Id) /= Name_uParent
   then
  if Is_Class_Wide_Type (T) then


diff --git a/gcc/testsuite/gnat.dg/limited4.adb b/gcc/testsuite/gnat.dg/limited4.adb
--- a/gcc/testsuite/gnat.dg/limited4.adb
+++ b/gcc/testsuite/gnat.dg/limited4.adb
@@ -22,11 +22,12 @@ procedure Limited4 is
 Obj2 : Lim_Tagged'Class := Lim_Tagged'Class'(raise TBD_Error);
 
 --  b) initialization expression of a CW component_declaration
-
-type Rec is record
-   Comp01 : Lim_Tagged'Class := (raise TBD_Error);
-   Comp02 : Lim_Tagged'Class := Lim_Tagged'Class'((raise TBD_Error));
-end record;
+--  ... is illegal: cannot have unconstrained components.
+--
+--  type Rec is record
+-- Comp01 : Lim_Tagged'Class := (raise TBD_Error);
+-- Comp02 : Lim_Tagged'Class := Lim_Tagged'Class'((raise TBD_Error));
+--  end record;
 
 --  c) the expression of a record_component_association
 
@@ -55,4 +56,4 @@ procedure Limited4 is
 begin
 Check := Do_Test1 (raise TBD_Error);
 Check := Do_Test2;
-end;
\ No newline at end of file
+end;




[Ada] Small cleanup in C header files

2021-06-16 Thread Pierre-Marie de Rodat
This removes obsolete stuff.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* initialize.c: Do not include vxWorks.h and fcntl.h from here.
(__gnat_initialize) [__MINGW32__]: Remove #ifdef and attribute
(__gnat_initialize) [init_float]: Delete.
(__gnat_initialize) [VxWorks]: Likewise.
(__gnat_initialize) [PA-RISC HP-UX 10]: Likewise.
* runtime.h: Add comment about vxWorks.h include.

diff --git a/gcc/ada/initialize.c b/gcc/ada/initialize.c
--- a/gcc/ada/initialize.c
+++ b/gcc/ada/initialize.c
@@ -29,29 +29,19 @@
  *  *
  /
 
-/*  This unit provides default implementation for __gnat_initialize ()
-which is called before the elaboration of the partition. It is provided
-in a separate file/object so that users can replace it easily.
-The default implementation should be null on most targets.  */
-
-/* The following include is here to meet the published VxWorks requirement
-   that the __vxworks header appear before any other include.  */
-#ifdef __vxworks
-#include "vxWorks.h"
-#endif
+/*  This unit provides the default implementation of __gnat_initialize, which
+is called before the elaboration of the partition.  It is provided in a
+separate file so that users can replace it easily.  But the implementation
+should be empty on most targets.  */
 
 #ifdef IN_RTS
 #include "runtime.h"
-/* We don't have libiberty, so use malloc.  */
-#define xmalloc(S) malloc (S)
-#define xrealloc(V,S) realloc (V,S)
 #else
 #include "config.h"
 #include "system.h"
 #endif
 
 #include "raise.h"
-#include 
 
 #ifdef __cplusplus
 extern "C" {
@@ -63,65 +53,16 @@ extern "C" {
 
 #if defined (__MINGW32__)
 
-extern void __gnat_install_SEH_handler (void *);
-
 void
-__gnat_initialize (void *eh ATTRIBUTE_UNUSED)
+__gnat_initialize (void *eh)
 {
-   /* Note that we do not activate this for the compiler itself to avoid a
-  bootstrap path problem.  Older version of gnatbind will generate a call
-  to __gnat_initialize() without argument. Therefore we cannot use eh in
-  this case.  It will be possible to remove the following #ifdef at some
-  point.  */
-#ifdef IN_RTS
/* Install the Structured Exception handler.  */
if (eh)
  __gnat_install_SEH_handler (eh);
-#endif
-}
-
-/**/
-/* __gnat_initialize (init_float version) */
-/**/
-
-#elif defined (__Lynx__) || defined (__FreeBSD__) || defined(__NetBSD__) \
-  || defined (__OpenBSD__) || defined (__DragonFly__)
-
-void
-__gnat_initialize (void *eh ATTRIBUTE_UNUSED)
-{
-}
-
-/***/
-/* __gnat_initialize (VxWorks Version) */
-/***/
-
-#elif defined(__vxworks)
-
-void
-__gnat_initialize (void *eh)
-{
-}
-
-#elif defined(_T_HPUX10) || (!defined(IN_RTS) && defined(_X_HPUX10))
-
-//
-/* __gnat_initialize (PA-RISC HP-UX 10 Version) */
-//
-
-extern void __main (void);
-
-void
-__gnat_initialize (void *eh ATTRIBUTE_UNUSED)
-{
-  __main ();
 }
 
 #else
 
-/* For all other versions of GNAT, the initialize routine and handler
-   installation do nothing */
-
 /***/
 /* __gnat_initialize (Default Version) */
 /***/
@@ -130,6 +71,7 @@ void
 __gnat_initialize (void *eh ATTRIBUTE_UNUSED)
 {
 }
+
 #endif
 
 #ifdef __cplusplus


diff --git a/gcc/ada/runtime.h b/gcc/ada/runtime.h
--- a/gcc/ada/runtime.h
+++ b/gcc/ada/runtime.h
@@ -31,9 +31,11 @@
 
 /* This file provides common definitions used by GNAT C runtime files.  */
 
+/* The following include is here to meet the published VxWorks requirement
+   that the vxWorks.h header appear before any other header.  */
 #ifdef __vxworks
 #include "vxWorks.h"
-#endif /* __vxworks */
+#endif
 
 #ifndef ATTRIBUTE_UNUSED
 #define ATTRIBUTE_UNUSED __attribute__((unused))




[Ada] Use more straightforward implementation for Current_Entity_In_Scope

2021-06-16 Thread Pierre-Marie de Rodat
This new version is both more straightforward to understand and easier
to map to the generated code at high optimization.  No functional changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Current_Entity_In_Scope): Reimplement.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -6955,19 +6955,30 @@ package body Sem_Util is
-
 
function Current_Entity_In_Scope (N : Name_Id) return Entity_Id is
-  E  : Entity_Id;
   CS : constant Entity_Id := Current_Scope;
 
-  Transient_Case : constant Boolean := Scope_Is_Transient;
+  E  : Entity_Id;
 
begin
   E := Get_Name_Entity_Id (N);
-  while Present (E)
-and then Scope (E) /= CS
-and then (not Transient_Case or else Scope (E) /= Scope (CS))
-  loop
- E := Homonym (E);
-  end loop;
+
+  if No (E) then
+ null;
+
+  elsif Scope_Is_Transient then
+ while Present (E) loop
+exit when Scope (E) = CS or else Scope (E) = Scope (CS);
+
+E := Homonym (E);
+ end loop;
+
+  else
+ while Present (E) loop
+exit when Scope (E) = CS;
+
+E := Homonym (E);
+ end loop;
+  end if;
 
   return E;
end Current_Entity_In_Scope;




[Ada] Fix missing array bounds checking

2021-06-16 Thread Pierre-Marie de Rodat
For an assignment statement of the form "A.B(C).D := ...", in a loop,
the index check on C can be missing.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* ghost.adb: Add another special case where full analysis is
needed. This bug is due to quirks in the way
Mark_And_Set_Ghost_Assignment works (it happens very early,
before name resolution is done).

diff --git a/gcc/ada/ghost.adb b/gcc/ada/ghost.adb
--- a/gcc/ada/ghost.adb
+++ b/gcc/ada/ghost.adb
@@ -1245,11 +1245,21 @@ package body Ghost is
   --  processing them in that mode can lead to spurious errors.
 
   if Expander_Active then
+ --  Cases where full analysis is needed, involving array indexing
+ --  which would otherwise be missing array-bounds checks:
+
  if not Analyzed (Orig_Lhs)
-   and then Nkind (Orig_Lhs) = N_Indexed_Component
-   and then Nkind (Prefix (Orig_Lhs)) = N_Selected_Component
-   and then Nkind (Prefix (Prefix (Orig_Lhs))) =
-   N_Indexed_Component
+   and then
+ ((Nkind (Orig_Lhs) = N_Indexed_Component
+and then Nkind (Prefix (Orig_Lhs)) = N_Selected_Component
+and then Nkind (Prefix (Prefix (Orig_Lhs))) =
+   N_Indexed_Component)
+  or else
+ (Nkind (Orig_Lhs) = N_Selected_Component
+  and then Nkind (Prefix (Orig_Lhs)) = N_Indexed_Component
+  and then Nkind (Prefix (Prefix (Orig_Lhs))) =
+ N_Selected_Component
+  and then Nkind (Parent (N)) /= N_Loop_Statement))
  then
 Analyze (Orig_Lhs);
  end if;




[Ada] ACATS 4.1R-c611a04: Class-wide preconditions in dispatching calls

2021-06-16 Thread Pierre-Marie de Rodat
This completes the previous patch, since it introduced a regression on
ACATS c611a03 under the certified runtime.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_disp.adb (Build_Class_Wide_Check): Ensure that evaluation
of actuals is side effects free (since the check duplicates
actuals).

diff --git a/gcc/ada/exp_disp.adb b/gcc/ada/exp_disp.adb
--- a/gcc/ada/exp_disp.adb
+++ b/gcc/ada/exp_disp.adb
@@ -868,6 +868,7 @@ package body Exp_Disp is
 
  Str_Loc : constant String := Build_Location_String (Loc);
 
+ A    : Node_Id;
  Cond : Node_Id;
  Msg  : Node_Id;
  Prec : Node_Id;
@@ -900,6 +901,15 @@ package body Exp_Disp is
return;
 end if;
 
+--  Ensure that the evaluation of the actuals will not produce side
+--  effects (since the check will use a copy of the actuals).
+
+A := First_Actual (Call_Node);
+while Present (A) loop
+   Remove_Side_Effects (A);
+   Next_Actual (A);
+end loop;
+
 --  The expression for the precondition is analyzed within the
 --  generated pragma. The message text is the last parameter of
 --  the generated pragma, indicating source of precondition.




[Ada] Do not perform useless work in Check_No_Parts_Violations

2021-06-16 Thread Pierre-Marie de Rodat
There is no need to walk the hierarchy rooted at a type that does not
come from source only to drop the result on the floor.  No functional
changes.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* freeze.adb (Check_No_Parts_Violations): Return earlier if the
type is elementary or does not come from source.

diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -2584,27 +2584,30 @@ package body Freeze is
 
  --  Local declarations
 
- Types_With_Aspect : Elist_Id :=
-   Get_Types_With_Aspect_In_Hierarchy (Typ);
-
- Aspect_Value : Entity_Id;
- Curr_Value   : Entity_Id;
- Curr_Typ_Elmt: Elmt_Id;
- Curr_Body_Elmt   : Elmt_Id;
- Curr_Formal_Elmt : Elmt_Id;
- Gen_Bodies   : Elist_Id;
- Gen_Formals  : Elist_Id;
- Scop : Entity_Id;
+ Aspect_Value  : Entity_Id;
+ Curr_Value: Entity_Id;
+ Curr_Typ_Elmt : Elmt_Id;
+ Curr_Body_Elmt: Elmt_Id;
+ Curr_Formal_Elmt  : Elmt_Id;
+ Gen_Bodies: Elist_Id;
+ Gen_Formals   : Elist_Id;
+ Scop  : Entity_Id;
+ Types_With_Aspect : Elist_Id;
 
   --  Start of processing for Check_No_Parts_Violations
 
   begin
- --  There are no types with No_Parts specified, so there
- --  is nothing to check.
+ --  Nothing to check if the type is elementary or artificial
 
- if Is_Empty_Elmt_List (Types_With_Aspect)
-   or else not Comes_From_Source (Typ)
- then
+ if Is_Elementary_Type (Typ) or else not Comes_From_Source (Typ) then
+return;
+ end if;
+
+ Types_With_Aspect := Get_Types_With_Aspect_In_Hierarchy (Typ);
+
+ --  Nothing to check if there are no types with No_Parts specified
+
+ if Is_Empty_Elmt_List (Types_With_Aspect) then
 return;
  end if;
 




[Ada] Make Incomplete_Or_Partial_View independent of the context

2021-06-16 Thread Pierre-Marie de Rodat
The value returned by Incomplete_Or_Partial_View depends on the current
scope, which is unexpected at best.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Incomplete_Or_Partial_View): Retrieve the scope of
the parameter and use it to find its incomplete view, if any.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -14625,6 +14625,8 @@ package body Sem_Util is

 
function Incomplete_Or_Partial_View (Id : Entity_Id) return Entity_Id is
+  S : constant Entity_Id := Scope (Id);
+
   function Inspect_Decls
 (Decls : List_Id;
  Taft  : Boolean := False) return Entity_Id;
@@ -14693,7 +14695,13 @@ package body Sem_Util is
begin
   --  Deferred constant or incomplete type case
 
-  Prev := Current_Entity_In_Scope (Id);
+  Prev := Current_Entity (Id);
+
+  while Present (Prev) loop
+ exit when Scope (Prev) = S;
+
+ Prev := Homonym (Prev);
+  end loop;
 
   if Present (Prev)
 and then (Is_Incomplete_Type (Prev) or else Ekind (Prev) = E_Constant)
@@ -14706,13 +14714,12 @@ package body Sem_Util is
   --  Private or Taft amendment type case
 
   declare
- Pkg  : constant Entity_Id := Scope (Id);
- Pkg_Decl : Node_Id := Pkg;
+ Pkg_Decl : Node_Id;
 
   begin
- if Present (Pkg)
-   and then Is_Package_Or_Generic_Package (Pkg)
- then
+ if Present (S) and then Is_Package_Or_Generic_Package (S) then
+Pkg_Decl := S;
+
 while Nkind (Pkg_Decl) /= N_Package_Specification loop
Pkg_Decl := Parent (Pkg_Decl);
 end loop;
@@ -14737,7 +14744,7 @@ package body Sem_Util is
 --  Taft amendment type. The incomplete view should be located in
 --  the private declarations of the enclosing scope.
 
-elsif In_Package_Body (Pkg) then
+elsif In_Package_Body (S) then
return Inspect_Decls (Private_Declarations (Pkg_Decl), True);
 end if;
  end if;




[Ada] Non-static Interrupt_Priority allowed with restriction Static_Priorities

2021-06-16 Thread Pierre-Marie de Rodat
This patch fixes an issue in the compiler whereby a non-static
expression is allowed as an argument to aspect Interrupt_Priority
despite restriction Static_Priorities being in effect.
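
A hypothetical unit of the kind the check now rejects (identifiers are invented; assumes the restriction is in effect):

```ada
pragma Restrictions (Static_Priorities);

with System;

procedure Demo is
   function Dyn_Prio return System.Interrupt_Priority is
     (System.Interrupt_Priority'First);

   --  Non-static aspect expression: now flagged as violating
   --  restriction Static_Priorities
   protected Dev with Interrupt_Priority => Dyn_Prio is
   end Dev;

   protected body Dev is
   end Dev;
begin
   null;
end Demo;
```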

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Make_Aitem_Pragma): Check for static expressions
in Priority aspect arguments for restriction Static_Priorities.

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -3382,6 +3382,13 @@ package body Sem_Ch13 is
   | Aspect_Interrupt_Priority
   | Aspect_Priority
=>
+  --  Verify the expression is static when Static_Priorities is
+  --  enabled.
+
+  if not Is_OK_Static_Expression (Expr) then
+ Check_Restriction (Static_Priorities, Expr);
+  end if;
+
   if Nkind (N) in N_Subprogram_Body | N_Subprogram_Declaration
   then
  --  Analyze the aspect expression




[Ada] Fix ALI source location for dominance markers

2021-06-16 Thread Pierre-Marie de Rodat
As First_Sloc is used to compute the source location for entry guards, it
must also be used to compute the dominance marker source location in
this particular case (when a statement is dominated by an entry guard).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* par_sco.adb (Set_Statement_Entry): Change sloc for dominance
marker.
(Traverse_One): Fix typo.
(Output_Header): Fix comment.

diff --git a/gcc/ada/par_sco.adb b/gcc/ada/par_sco.adb
--- a/gcc/ada/par_sco.adb
+++ b/gcc/ada/par_sco.adb
@@ -683,9 +683,12 @@ package body Par_SCO is
--  two levels (through the pragma argument association) to
--  get to the pragma node itself. For the guard on a select
--  alternative, we do not have access to the token location for
-   --  the WHEN, so we use the first sloc of the condition itself
-   --  (note: we use First_Sloc, not Sloc, because this is what is
-   --  referenced by dominance markers).
+   --  the WHEN, so we use the first sloc of the condition itself.
+   --  First_Sloc gives the most sensible result, but we have to
+   --  beware of also using it when computing the dominance marker
+   --  sloc (in the Set_Statement_Entry procedure), as this is not
+   --  fully equivalent to the "To" sloc computed by
+   --  Sloc_Range (Guard, To, From).
 
--  Doesn't this requirement of using First_Sloc need to be
--  documented in the spec ???
@@ -1579,6 +1582,18 @@ package body Par_SCO is
 To := No_Location;
  end if;
 
+ --  Be consistent with the location determined in
+ --  Output_Header.
+
+ if Current_Dominant.K = 'T'
+and then Nkind (Parent (Current_Dominant.N))
+   in N_Accept_Alternative
+| N_Delay_Alternative
+| N_Terminate_Alternative
+ then
+From := First_Sloc (Current_Dominant.N);
+ end if;
+
  Set_Raw_Table_Entry
(C1 => '>',
 C2 => Current_Dominant.K,
@@ -1867,7 +1882,7 @@ package body Par_SCO is
  Process_Decisions_Defer (Cond, 'G');
 
  --  For an entry body with a barrier, the entry body
- --  is dominanted by a True evaluation of the barrier.
+ --  is dominated by a True evaluation of the barrier.
 
  Inner_Dominant := ('T', N);
   end if;




[Ada] Spurious accessibility error on "for of" loop parameter

2021-06-16 Thread Pierre-Marie de Rodat
This patch fixes an issue in the compiler whereby taking 'Access of a
"for of" loop parameter caused spurious compile-time accessibility
errors.
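
A hypothetical reproducer for the kind of code that no longer triggers the spurious error:

```ada
procedure Demo is
   type Int_Ptr is access all Integer;

   Buf : array (1 .. 3) of aliased Integer := (others => 0);
   Ptr : Int_Ptr;
begin
   for E of Buf loop
      --  E is expanded into a renaming of the array component; taking
      --  'Access of it previously produced a spurious accessibility error
      Ptr := E'Access;
      Ptr.all := Ptr.all + 1;
   end loop;
end Demo;
```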

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Accessibility_Level): Take into account
renamings of loop parameters.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -664,6 +664,15 @@ package body Sem_Util is
return Make_Level_Literal
 (Scope_Depth (Enclosing_Dynamic_Scope (E)) + 1);
 
+--  Check if E is an expansion-generated renaming of an iterator
+--  by examining Related_Expression. If so, determine the
+--  accessibility level based on the original expression.
+
+elsif Ekind (E) in E_Constant | E_Variable
+  and then Present (Related_Expression (E))
+then
+   return Accessibility_Level (Related_Expression (E));
+
 --  Normal object - get the level of the enclosing scope
 
 else




[Ada] Mixing of positional and named entries allowed in enum rep

2021-06-16 Thread Pierre-Marie de Rodat
This patch fixes an issue in the compiler whereby the mixing of
positional and named entries in an enumeration representation clause was
erroneously allowed instead of rejected - as per RM rules.
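
For example, a hypothetical clause of the following form is now rejected:

```ada
type Color is (Red, Green, Blue);

--  Positional entry for Red mixed with named entries: illegal
for Color use (1, Green => 5, Blue => 9);

--  Either all positional or all named entries would be fine:
--  for Color use (1, 5, 9);
--  for Color use (Red => 1, Green => 5, Blue => 9);
```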

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch13.adb (Analyze_Enumeration_Representation_Clause): Add
check for the mixing of entries.

diff --git a/gcc/ada/sem_ch13.adb b/gcc/ada/sem_ch13.adb
--- a/gcc/ada/sem_ch13.adb
+++ b/gcc/ada/sem_ch13.adb
@@ -7999,6 +7999,15 @@ package body Sem_Ch13 is
("extra parentheses surrounding aggregate not allowed", Aggr);
  return;
 
+  --  Reject the mixing of named and positional entries in the aggregate
+
+  elsif Present (Expressions (Aggr))
+and then Present (Component_Associations (Aggr))
+  then
+ Error_Msg_N ("cannot mix positional and named entries in "
+   & "enumeration rep clause", N);
+ return;
+
   --  All tests passed, so set rep clause in place
 
   else
@@ -8013,7 +8022,7 @@ package body Sem_Ch13 is
 
   Elit := First_Literal (Enumtype);
 
-  --  First the positional entries if any
+  --  Process positional entries
 
   if Present (Expressions (Aggr)) then
  Expr := First (Expressions (Aggr));
@@ -8042,11 +8051,10 @@ package body Sem_Ch13 is
 Next (Expr);
 Next (Elit);
  end loop;
-  end if;
 
-  --  Now process the named entries if present
+  --  Process named entries
 
-  if Present (Component_Associations (Aggr)) then
+  elsif Present (Component_Associations (Aggr)) then
  Assoc := First (Component_Associations (Aggr));
  while Present (Assoc) loop
 Choice := First (Choices (Assoc));




[Ada] Remove unused initialization with New_List

2021-06-16 Thread Pierre-Marie de Rodat
Code cleanup; behaviour is unaffected. The extra initialization most
likely comes from an initial version of the patch, before the limit on
parameters to a subsequent call to New_List was discovered.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch3.adb (Build_Slice_Assignment): Remove unused
initialization.

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -4388,7 +4388,7 @@ package body Exp_Ch3 is
 
   declare
  Spec: Node_Id;
- Formals : List_Id := New_List;
+ Formals : List_Id;
 
   begin
  Formals := New_List (




[Ada] Adapt Is_Actual_Parameter to also work for entry parameters

2021-06-16 Thread Pierre-Marie de Rodat
Routine Is_Actual_Parameter now also detects entry parameters; this
change is harmless for GNAT and allows this routine to be reused in
GNATprove.
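
A minimal sketch of the case now recognized (task and entry names are invented):

```ada
procedure Demo is
   task type Worker is
      entry Put (V : Integer);
   end Worker;

   task body Worker is
   begin
      accept Put (V : Integer) do
         null;
      end Put;
   end Worker;

   W : Worker;
begin
   --  The actual 42 below is now detected by Is_Actual_Parameter,
   --  just as an actual in a subprogram call would be
   W.Put (42);
end Demo;
```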

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.ads (Is_Actual_Parameter): Update comment.
* sem_util.adb (Is_Actual_Parameter): Also detect entry parameters.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -15467,7 +15467,9 @@ package body Sem_Util is
  when N_Parameter_Association =>
 return N = Explicit_Actual_Parameter (Parent (N));
 
- when N_Subprogram_Call =>
+ when N_Entry_Call_Statement
+| N_Subprogram_Call
+ =>
 return Is_List_Member (N)
   and then
 List_Containing (N) = Parameter_Associations (Parent (N));


diff --git a/gcc/ada/sem_util.ads b/gcc/ada/sem_util.ads
--- a/gcc/ada/sem_util.ads
+++ b/gcc/ada/sem_util.ads
@@ -1726,7 +1726,7 @@ package Sem_Util is
--  subprogram call.
 
function Is_Actual_Parameter (N : Node_Id) return Boolean;
-   --  Determines if N is an actual parameter in a subprogram call
+   --  Determines if N is an actual parameter in a subprogram or entry call
 
function Is_Actual_Tagged_Parameter (N : Node_Id) return Boolean;
--  Determines if N is an actual parameter of a formal of tagged type in a




[Ada] Wrong reference to System.Tasking in expanded code

2021-06-16 Thread Pierre-Marie de Rodat
The expanded code should never reference entities in the tasking runtime
(libgnarl) except when expanding tasking constructs directly.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* rtsfind.ads, libgnarl/s-taskin.ads, exp_ch3.adb, exp_ch4.adb,
exp_ch6.adb, exp_ch9.adb, sem_ch6.adb: Move master related
entities to the expander directly.

diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -1696,8 +1696,7 @@ package body Exp_Ch3 is
 
   if Has_Task (Full_Type) then
  if Restriction_Active (No_Task_Hierarchy) then
-Append_To (Args,
-  New_Occurrence_Of (RTE (RE_Library_Task_Level), Loc));
+Append_To (Args, Make_Integer_Literal (Loc, Library_Task_Level));
  else
 Append_To (Args, Make_Identifier (Loc, Name_uMaster));
  end if;
@@ -2218,8 +2217,8 @@ package body Exp_Ch3 is
 
  if Has_Task (Rec_Type) then
 if Restriction_Active (No_Task_Hierarchy) then
-   Append_To (Args,
- New_Occurrence_Of (RTE (RE_Library_Task_Level), Loc));
+   Append_To
+ (Args, Make_Integer_Literal (Loc, Library_Task_Level));
 else
Append_To (Args, Make_Identifier (Loc, Name_uMaster));
 end if;
@@ -9071,7 +9070,7 @@ package body Exp_Ch3 is
  Defining_Identifier =>
Make_Defining_Identifier (Loc, Name_uMaster),
  Parameter_Type  =>
-   New_Occurrence_Of (RTE (RE_Master_Id), Loc)));
+   New_Occurrence_Of (Standard_Integer, Loc)));
 
  Set_Has_Master_Entity (Proc_Id);
 


diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -5193,8 +5193,8 @@ package body Exp_Ch4 is
   end if;
 
   if Restriction_Active (No_Task_Hierarchy) then
- Append_To (Args,
-   New_Occurrence_Of (RTE (RE_Library_Task_Level), Loc));
+ Append_To
+   (Args, Make_Integer_Literal (Loc, Library_Task_Level));
   else
  Append_To (Args,
New_Occurrence_Of


diff --git a/gcc/ada/exp_ch6.adb b/gcc/ada/exp_ch6.adb
--- a/gcc/ada/exp_ch6.adb
+++ b/gcc/ada/exp_ch6.adb
@@ -602,7 +602,7 @@ package body Exp_Ch6 is
   --  Use a dummy _master actual in case of No_Task_Hierarchy
 
   if Restriction_Active (No_Task_Hierarchy) then
- Actual := New_Occurrence_Of (RTE (RE_Library_Task_Level), Loc);
+ Actual := Make_Integer_Literal (Loc, Library_Task_Level);
 
   --  In the case where we use the master associated with an access type,
   --  the actual is an entity and requires an explicit reference.


diff --git a/gcc/ada/exp_ch9.adb b/gcc/ada/exp_ch9.adb
--- a/gcc/ada/exp_ch9.adb
+++ b/gcc/ada/exp_ch9.adb
@@ -1756,34 +1756,21 @@ package body Exp_Ch9 is
   --  Generate a dummy master if tasks or tasking hierarchies are
   --  prohibited.
 
-  --_Master : constant Master_Id := 3;
+  --_Master : constant Integer := Library_Task_Level;
 
   if not Tasking_Allowed
 or else Restrictions.Set (No_Task_Hierarchy)
 or else not RTE_Available (RE_Current_Master)
   then
- declare
-Expr : Node_Id;
-
- begin
---  RE_Library_Task_Level is not always available in configurable
---  RunTime
-
-if not RTE_Available (RE_Library_Task_Level) then
-   Expr := Make_Integer_Literal (Loc, Uint_3);
-else
-   Expr := New_Occurrence_Of (RTE (RE_Library_Task_Level), Loc);
-end if;
-
-Master_Decl :=
-  Make_Object_Declaration (Loc,
-Defining_Identifier =>
-  Make_Defining_Identifier (Loc, Name_uMaster),
-Constant_Present=> True,
-Object_Definition   =>
-  New_Occurrence_Of (Standard_Integer, Loc),
-Expression  => Expr);
- end;
+ Master_Decl :=
+   Make_Object_Declaration (Loc,
+ Defining_Identifier =>
+   Make_Defining_Identifier (Loc, Name_uMaster),
+ Constant_Present=> True,
+ Object_Definition   =>
+   New_Occurrence_Of (Standard_Integer, Loc),
+ Expression  =>
+   Make_Integer_Literal (Loc, Library_Task_Level));
 
   --  Generate:
   --_master : constant Integer := Current_Master.all;
@@ -3628,7 +3615,8 @@ package body Exp_Ch9 is
   Master_Decl :=
 Make_Object_Renaming_Declaration (Loc,
   Defining_Identifier => Master_Id,
-  Subtype_Mark=> New_Occurrence_Of (RTE (RE_Master_Id), Loc),
+  Subtype_Mark=>
+New_Occurrence_Of (Standa

[Ada] Implementation of AI12-0152: legality rules for Raise_Expression

2021-06-16 Thread Pierre-Marie de Rodat
This patch implements the legality rules specified in RM 11.3 (2) for
the placement of Raise_Expressions. In Ada 2020, aspect specifications can
appear in several new declarative contexts, and the keyword "with" can
be part of a Raise_Expression or the start of a list of aspect
specifications.  To prevent ambiguities, a Raise_Expression in such
contexts must appear within parentheses, or be part of a larger
parenthesized expression.
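
A sketch of the rule for a default expression, one of the contexts in which "with" could also start an aspect specification (subprogram names are invented):

```ada
--  Rejected: the raise_expression is not parenthesized
procedure P (X : Integer := raise Program_Error);

--  Legal: the raise_expression is parenthesized
procedure Q (X : Integer := (raise Program_Error));
```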

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_res.adb (Resolve_Raise_Expression): Apply Ada_2020 rules
concerning the need for parentheses around Raise_Expressions in
various contexts.

diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -10532,8 +10532,57 @@ package body Sem_Res is
   if Typ = Raise_Type then
  Error_Msg_N ("cannot find unique type for raise expression", N);
  Set_Etype (N, Any_Type);
+
   else
  Set_Etype (N, Typ);
+
+ --  Apply check for required parentheses in the enclosing
+ --  context of raise_expressions (RM 11.3 (2)), including default
+ --  expressions in contexts that can include aspect specifications,
+ --  and ancestor parts of extension aggregates.
+
+ declare
+Par : Node_Id := Parent (N);
+Parentheses_Found : Boolean := Paren_Count (N) > 0;
+
+ begin
+while Present (Par)
+  and then Nkind (Par) in N_Has_Etype
+loop
+   if Paren_Count (Par) > 0 then
+  Parentheses_Found := True;
+   end if;
+
+   if Nkind (Par) = N_Extension_Aggregate
+ and then N = Ancestor_Part (Par)
+   then
+  exit;
+   end if;
+
+   Par := Parent (Par);
+end loop;
+
+if not Parentheses_Found
+  and then Comes_From_Source (Par)
+  and then
+((Nkind (Par) in N_Modular_Type_Definition
+   | N_Floating_Point_Definition
+   | N_Ordinary_Fixed_Point_Definition
+   | N_Decimal_Fixed_Point_Definition
+   | N_Extension_Aggregate
+   | N_Discriminant_Specification
+   | N_Parameter_Specification
+   | N_Formal_Object_Declaration)
+
+  or else (Nkind (Par) = N_Object_Declaration
+and then
+  Nkind (Parent (Par)) /= N_Extended_Return_Statement))
+then
+   Error_Msg_N
+ ("raise_expression must be parenthesized in this context",
+   N);
+end if;
+ end;
   end if;
end Resolve_Raise_Expression;
 




[Ada] Fix aliasing check for actual parameters passed by reference

2021-06-16 Thread Pierre-Marie de Rodat
The aliasing check applies when some of the formals have their passing
mechanism unspecified; RM 6.2 (12/3). Previously it only applied when
the first formal had its passing mechanism unspecified and the second
had its passing mechanism either unspecified or by-reference.
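
For instance, with a record type whose passing mechanism is unspecified (neither elementary nor by-reference), the check now applies regardless of which formal comes first; a hypothetical sketch:

```ada
procedure Demo is
   type Rec is record
      V : Integer := 0;
   end record;
   --  Rec is neither elementary nor by-reference, so its passing
   --  mechanism is unspecified (RM 6.2 (12/3))

   procedure Swap (A, B : in out Rec) is
      T : constant Rec := A;
   begin
      A := B;
      B := T;
   end Swap;

   R : Rec;
begin
   Swap (R, R);  --  the two actuals overlap: an overlap check is emitted
end Demo;
```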

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* checks.adb (Apply_Scalar_Range_Check): Fix handling of check depending
on the parameter passing mechanism.  Grammar adjustment ("has"
=> "have").
(Parameter_Passing_Mechanism_Specified): Add a hyphen in a comment.

diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -2306,6 +2306,11 @@ package body Checks is
is
   Loc : constant Source_Ptr := Sloc (Call);
 
+  function Parameter_Passing_Mechanism_Specified
+(Typ : Entity_Id)
+ return Boolean;
+  --  Returns True if parameter-passing mechanism is specified for type Typ
+
   function May_Cause_Aliasing
 (Formal_1 : Entity_Id;
  Formal_2 : Entity_Id) return Boolean;
@@ -2332,6 +2337,19 @@ package body Checks is
   --  Check contains all and-ed simple tests generated so far or remains
   --  unchanged in the case of detailed exception messaged.
 
+  -------------------------------------------
+  -- Parameter_Passing_Mechanism_Specified --
+  -------------------------------------------
+
+  function Parameter_Passing_Mechanism_Specified
+(Typ : Entity_Id)
+ return Boolean
+  is
+  begin
+ return Is_Elementary_Type (Typ)
+   or else Is_By_Reference_Type (Typ);
+  end Parameter_Passing_Mechanism_Specified;
+
   ------------------------
   -- May_Cause_Aliasing --
   ------------------------
@@ -2493,10 +2511,7 @@ package body Checks is
  --  Elementary types are always passed by value, therefore actuals of
  --  such types cannot lead to aliasing. An aggregate is an object in
  --  Ada 2012, but an actual that is an aggregate cannot overlap with
- --  another actual. A type that is By_Reference (such as an array of
- --  controlled types) is not subject to the check because any update
- --  will be done in place and a subsequent read will always see the
- --  correct value, see RM 6.2 (12/3).
+ --  another actual.
 
  if Nkind (Orig_Act_1) = N_Aggregate
or else (Nkind (Orig_Act_1) = N_Qualified_Expression
@@ -2504,10 +2519,7 @@ package body Checks is
  then
 null;
 
- elsif Is_Object_Reference (Orig_Act_1)
-   and then not Is_Elementary_Type (Etype (Orig_Act_1))
-   and then not Is_By_Reference_Type (Etype (Orig_Act_1))
- then
+ elsif Is_Object_Reference (Orig_Act_1) then
 Actual_2 := Next_Actual (Actual_1);
 Formal_2 := Next_Formal (Formal_1);
 while Present (Actual_2) and then Present (Formal_2) loop
@@ -2518,18 +2530,28 @@ package body Checks is
--  the mode of the two formals may lead to aliasing.
 
if Is_Object_Reference (Orig_Act_2)
- and then not Is_Elementary_Type (Etype (Orig_Act_2))
  and then May_Cause_Aliasing (Formal_1, Formal_2)
then
-  Remove_Side_Effects (Actual_1);
-  Remove_Side_Effects (Actual_2);
-
-  Overlap_Check
-(Actual_1 => Actual_1,
- Actual_2 => Actual_2,
- Formal_1 => Formal_1,
- Formal_2 => Formal_2,
- Check=> Check);
+
+  --  The aliasing check only applies when some of the formals
+  --  have their passing mechanism unspecified; RM 6.2 (12/3).
+
+  if Parameter_Passing_Mechanism_Specified (Etype (Orig_Act_1))
+   and then
+ Parameter_Passing_Mechanism_Specified (Etype (Orig_Act_2))
+  then
+ null;
+  else
+ Remove_Side_Effects (Actual_1);
+ Remove_Side_Effects (Actual_2);
+
+ Overlap_Check
+   (Actual_1 => Actual_1,
+Actual_2 => Actual_2,
+Formal_1 => Formal_1,
+Formal_2 => Formal_2,
+Check=> Check);
+  end if;
end if;
 
Next_Actual (Actual_2);




[Ada] Include info about containers in GNAT RM Implementation Advice section

2021-06-16 Thread Pierre-Marie de Rodat
For each instance of implementation advice given in the Ada RM, the
Implementation Advice section of the GNAT RM documents whether the
advice is followed (see Ada RM M.3(1)). Such documentation was missing for
the implementation advice regarding Ada.Containers and its child units.
Rectify this omission.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* doc/gnat_rm/implementation_advice.rst: Add a section for RM
A.18.
* gnat_rm.texi: Regenerate.

patch.diff.gz
Description: application/gzip


[Ada] Do not generate an Itype_Reference node for slices in GNATprove mode

2021-06-16 Thread Pierre-Marie de Rodat
As part of the work on changing side-effects removal in SPARK, a special
case was introduced to generate an Itype_Reference for Itypes in slices.
This was based on a misunderstanding of existing checks for bounds when
analyzing slices. These Itype_Reference are actually not needed to get
the corresponding run-time checks in GNATprove, and are actually harmful
in some cases (inside quantified expressions) as the insertion point for
the Itype_Reference ends up being outside of the quantifier scope,
leading to unprovable checks.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_res.adb (Set_Slice_Subtype): Revert special-case
introduced previously, which is not needed as Itypes created for
slices are precisely always used.

diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -12607,10 +12607,9 @@ package body Sem_Res is
   --  the point where actions for the slice are analyzed). Note that this
   --  is different from freezing the itype immediately, which might be
   --  premature (e.g. if the slice is within a transient scope). This needs
-  --  to be done only if expansion is enabled, or in GNATprove mode to
-  --  capture the associated run-time exceptions if any.
+  --  to be done only if expansion is enabled.
 
-  elsif Expander_Active or GNATprove_Mode then
+  elsif Expander_Active then
  Ensure_Defined (Typ => Slice_Subtype, N => N);
   end if;
end Set_Slice_Subtype;




[Ada] Fix floating-point exponentiation with Integer'First exponent

2021-06-16 Thread Pierre-Marie de Rodat
It works neither at compile time nor at run time because of the classical
issue that -Integer'First is not a valid Integer value.
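
The problem and its resolution can be sketched as follows, mirroring the new special case added to s-exponr.adb:

```ada
function Exponr_First (Left : Float) return Float is
begin
   --  The generic negative-exponent path 1.0 / Left ** (-Right)
   --  cannot be used for Right = Integer'First, because
   --  -Integer'First = 2**31 is not a valid Integer.  Since
   --  Integer'First = -(Integer'Last + 1), split off one factor:
   return 1.0 / (Left ** Integer'Last * Left);
end Exponr_First;
```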

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* urealp.adb (Scale): Change first parameter to Uint and adjust.
(Equivalent_Decimal_Exponent): Pass U.Den directly to Scale.
* libgnat/s-exponr.adb (Negative): Rename to...
(Safe_Negative): ...this and change its lower bound.
(Exponr): Adjust to above renaming and deal with Integer'First.

diff --git a/gcc/ada/libgnat/s-exponr.adb b/gcc/ada/libgnat/s-exponr.adb
--- a/gcc/ada/libgnat/s-exponr.adb
+++ b/gcc/ada/libgnat/s-exponr.adb
@@ -57,8 +57,8 @@ function System.Exponr (Left : Num; Right : Integer) return Num is
subtype Double_T is Double_Real.Double_T;
--  The double floating-point type
 
-   subtype Negative is Integer range Integer'First .. -1;
-   --  The range of negative exponents
+   subtype Safe_Negative is Integer range Integer'First + 1 .. -1;
+   --  The range of safe negative exponents
 
function Expon (Left : Num; Right : Natural) return Num;
--  Routine used if Right is greater than 4
@@ -113,9 +113,12 @@ begin
 return Num'Machine (Sqr * Sqr);
  end;
 
-  when Negative =>
+  when Safe_Negative =>
  return Num'Machine (1.0 / Exponr (Left, -Right));
 
+  when Integer'First =>
+ return Num'Machine (1.0 / (Exponr (Left, Integer'Last) * Left));
+
   when others =>
  return Num'Machine (Expon (Left, Right));
end case;


diff --git a/gcc/ada/urealp.adb b/gcc/ada/urealp.adb
--- a/gcc/ada/urealp.adb
+++ b/gcc/ada/urealp.adb
@@ -270,23 +270,21 @@ package body Urealp is
 15 => (Num =>  53_385_559, Den =>   45_392_361),  -- 1.176091259055681
 16 => (Num =>  78_897_839, Den =>   65_523_237)); -- 1.204119982655924
 
-  function Scale (X : Int; R : Ratio) return Int;
+  function Scale (X : Uint; R : Ratio) return Int;
   --  Compute the value of X scaled by R
 
   ---
   -- Scale --
   ---
 
-  function Scale (X : Int; R : Ratio) return Int is
- type Wide_Int is range -2**63 .. 2**63 - 1;
-
+  function Scale (X : Uint; R : Ratio) return Int is
   begin
- return Int (Wide_Int (X) * Wide_Int (R.Num) / Wide_Int (R.Den));
+ return UI_To_Int (X * R.Num / R.Den);
   end Scale;
 
begin
   pragma Assert (U.Rbase /= 0);
-  return Scale (UI_To_Int (U.Den), Logs (U.Rbase));
+  return Scale (U.Den, Logs (U.Rbase));
end Equivalent_Decimal_Exponent;
 





[Ada] Fix detection of volatile expressions in restricted contexts

2021-06-16 Thread Pierre-Marie de Rodat
Detection of volatile expressions, i.e. references to volatile objects
and allocators, is done in two steps: first when analysing entity names
and allocators themselves (except when they occur within actual
parameters of subprogram calls) and then after the subprogram call has
been resolved (so that we know if such volatile expressions are allowed
by the type of the corresponding formal parameter).

However, conditions used in each of these steps were duplicated and thus
inconsistent. This is fixed by this patch, so now all the conditions are
in just one place (i.e. in Is_OK_Volatile_Context, whose new parameter
Check_Actuals controls whether to examine expressions within subprogram
call parameters).
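
A sketch of why two steps are needed: whether a volatile reference in an actual is legal depends on the corresponding formal, which is only known once the call has been resolved (names are invented):

```ada
package Demo is
   X : Integer with Volatile, Async_Readers, Async_Writers;

   procedure P (A : Integer);
end Demo;

--  In a call "P (X)", the reference to the volatile object X occurs
--  within an actual parameter.  Its legality depends on the formal A,
--  so it is checked only after the call to P has been resolved
--  (the second step), not when the name X itself is analyzed.
```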

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_res.adb (Flag_Effectively_Volatile_Objects): Detect also
allocators within restricted contexts and not just entity names.
(Resolve_Actuals): Remove duplicated code for detecting
restricted contexts; it is now exclusively done in
Is_OK_Volatile_Context.
(Resolve_Entity_Name): Adapt to new parameter of
Is_OK_Volatile_Context.
* sem_util.ads, sem_util.adb (Is_OK_Volatile_Context): Adapt to
handle contexts both inside and outside of subprogram call
actual parameters.
(Within_Subprogram_Call): Remove; now handled by
Is_OK_Volatile_Context itself and its parameter.

diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -3755,19 +3755,18 @@ package body Sem_Res is
 
  begin
 case Nkind (N) is
-
-   --  Do not consider object name appearing in the prefix of
-   --  attribute Address as a read.
-
-   when N_Attribute_Reference =>
-
-  --  Prefix of attribute Address denotes an object, program
-  --  unit, or label; none of them needs to be flagged here.
-
-  if Attribute_Name (N) = Name_Address then
- return Skip;
+   when N_Allocator =>
+  if not Is_OK_Volatile_Context (Context   => Parent (N),
+ Obj_Ref   => N,
+ Check_Actuals => True)
+  then
+ Error_Msg_N
+   ("allocator cannot appear in this context"
+& " (SPARK RM 7.1.3(10))", N);
   end if;
 
+  return Skip;
+
--  Do not consider nested function calls because they have
--  already been processed during their own resolution.
 
@@ -3780,6 +3779,10 @@ package body Sem_Res is
   if Present (Id)
 and then Is_Object (Id)
 and then Is_Effectively_Volatile_For_Reading (Id)
+and then
+  not Is_OK_Volatile_Context (Context   => Parent (N),
+  Obj_Ref   => N,
+  Check_Actuals => True)
   then
  Error_Msg_N
("volatile object cannot appear in this context"
@@ -3789,10 +3792,8 @@ package body Sem_Res is
   return Skip;
 
when others =>
-  null;
+  return OK;
 end case;
-
-return OK;
  end Flag_Object;
 
  procedure Flag_Objects is new Traverse_Proc (Flag_Object);
@@ -4962,40 +4963,14 @@ package body Sem_Res is
 
 if SPARK_Mode = On and then Comes_From_Source (A) then
 
-   --  An effectively volatile object for reading may act as an
-   --  actual when the corresponding formal is of a non-scalar
-   --  effectively volatile type for reading (SPARK RM 7.1.3(10)).
+   --  Inspect the expression and flag each effectively volatile
+   --  object for reading as illegal because it appears within
+   --  an interfering context. Note that this is usually done
+   --  in Resolve_Entity_Name, but when the effectively volatile
+   --  object for reading appears as an actual in a call, the call
+   --  must be resolved first.
 
-   if not Is_Scalar_Type (F_Typ)
- and then Is_Effectively_Volatile_For_Reading (F_Typ)
-   then
-  null;
-
-   --  An effectively volatile object for reading may act as an
-   --  actual in a call to an instance of Unchecked_Conversion.
-   --  (SPARK RM 7.1.3(10)).
-
-   elsif Is_Unchecked_Conversion_Instance (Nam) then
-  null;
-
-   --  The actual denotes an object
-
-   elsif Is_Effectively_Volatile_Object_For_Reading (A) then
-  Error_Ms

[Ada] Fix Is_Volatile_Function for functions declared in protected bodies

2021-06-16 Thread Pierre-Marie de Rodat
A function declared immediately within a protected body is not a
protected function; the exact definition is RM 9.5.1(1): "A protected
subprogram is a subprogram declared immediately within a protected
definition."

Consequently, functions declared immediately within a protected body are
not volatile.
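
The distinction, per RM 9.5.1(1), in a short sketch:

```ada
protected type PT is
   function Get return Integer;
   --  Declared in the protected definition: a protected function,
   --  hence volatile for SPARK purposes
end PT;

protected body PT is
   function Helper return Integer is (0);
   --  Declared immediately within the protected BODY: not a
   --  protected function, hence not volatile

   function Get return Integer is (Helper);
end PT;
```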

This fix primarily affects SPARK legality checking; for compilation it
only affects a warning about infinite loops.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Is_Volatile_Function): Follow the exact wording
of SPARK (regarding volatile functions) and Ada (regarding
protected functions).

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -21105,9 +21105,11 @@ package body Sem_Util is
begin
   pragma Assert (Ekind (Func_Id) in E_Function | E_Generic_Function);
 
-  --  A function declared within a protected type is volatile
+  --  A protected function is volatile
 
-  if Is_Protected_Type (Scope (Func_Id)) then
+  if Nkind (Parent (Unit_Declaration_Node (Func_Id))) =
+   N_Protected_Definition
+  then
  return True;
 
   --  An instance of Ada.Unchecked_Conversion is a volatile function if




[Ada] Ignore volatile restrictions in preanalysis

2021-06-16 Thread Pierre-Marie de Rodat
When detecting references to volatile objects in the expressions of
expression functions, we couldn't determine the enclosing function. This
was because we examined a copy of the expression made for preanalysis
and this copy is not properly decorated. Consequently, we wrongly
rejected valid references like:

   Data : Integer
 with Atomic, Async_Readers => True, Async_Writers => True;

   function F return Integer is (Data) with Volatile_Function;

This patch effectively disables the detection of references to volatile
objects in preanalysis by assuming all such references to be legal.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_util.adb (Is_OK_Volatile_Context): All references to
volatile objects are legal in preanalysis.
(Within_Volatile_Function): Previously it was wrongly called on
Empty entities; now it is only called on E_Return_Statement,
which allows the body to be greatly simplified.

diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -18871,27 +18871,14 @@ package body Sem_Util is
   --
 
   function Within_Volatile_Function (Id : Entity_Id) return Boolean is
- Func_Id : Entity_Id;
+ pragma Assert (Ekind (Id) = E_Return_Statement);
 
-  begin
- --  Traverse the scope stack looking for a [generic] function
-
- Func_Id := Id;
- while Present (Func_Id) and then Func_Id /= Standard_Standard loop
-if Ekind (Func_Id) in E_Function | E_Generic_Function then
-
-   --  ??? This routine could just use Return_Applies_To, but it
-   --  is currently wrongly called by unanalyzed return statements
-   --  coming from expression functions.
-   pragma Assert (Func_Id = Return_Applies_To (Id));
+ Func_Id : constant Entity_Id := Return_Applies_To (Id);
 
-   return Is_Volatile_Function (Func_Id);
-end if;
-
-Func_Id := Scope (Func_Id);
- end loop;
+  begin
+ pragma Assert (Ekind (Func_Id) in E_Function | E_Generic_Function);
 
- return False;
+ return Is_Volatile_Function (Func_Id);
   end Within_Volatile_Function;
 
   --  Local variables
@@ -18901,6 +1,15 @@ package body Sem_Util is
--  Start of processing for Is_OK_Volatile_Context
 
begin
+  --  Ignore context restriction when doing preanalysis, e.g. on a copy of
+  --  an expression function, because this copy is not fully decorated and
+  --  it is not possible to reliably decide the legality of the context.
+  --  Any violations will be reported anyway when doing the full analysis.
+
+  if not Full_Analysis then
+ return True;
+  end if;
+
   --  For actual parameters within explicit parameter associations switch
   --  the context to the corresponding subprogram call.
 




[PATCH] testsuite: aarch64: Add zero-high-half tests for narrowing shifts

2021-06-16 Thread Jonathan Wright via Gcc-patches
Hi,

This patch adds tests to verify that Neon narrowing-shift instructions
clear the top half of the result vector. It is sufficient to show that a
subsequent combine with a zero-vector is optimized away - leaving
just the narrowing-shift instruction.
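The property these tests rely on can be modeled in portable scalar code: a narrowing shift writes its results into the low half of the destination vector and leaves the high half zero, so a subsequent combine with a zero vector adds nothing. A minimal sketch of the per-lane semantics (a scalar model with a hypothetical helper name, not the actual test):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of a Neon narrowing shift on 16-bit lanes (e.g. SHRN #4):
// the narrowed lanes land in the low half of the result and the high
// half is zeroed, which is what the new tests assert the compiler knows.
std::array<uint8_t, 16>
shrn_zero_high (const std::array<uint16_t, 8> &a, unsigned shift)
{
  std::array<uint8_t, 16> r{};          // high half stays zero
  for (int i = 0; i < 8; i++)
    r[i] = static_cast<uint8_t> (a[i] >> shift);
  return r;
}
```

Because the high half is already zero, combining the narrowed result with a zero vector is redundant, and the test only needs to check that the combine disappears from the generated assembly.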

Ok for master?

Thanks,
Jonathan

---

gcc/testsuite/ChangeLog:

2021-06-15  Jonathan Wright  

* gcc.target/aarch64/narrow_zero_high_half.c: New test.


rb14569.patch
Description: rb14569.patch


[PATCH V2] aarch64: Model zero-high-half semantics of XTN instruction in RTL

2021-06-16 Thread Jonathan Wright via Gcc-patches
Hi,

Version 2 of this patch adds tests to verify the benefit of this change.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-11  Jonathan Wright  

* config/aarch64/aarch64-simd.md (aarch64_xtn_insn_le):
Define - modeling zero-high-half semantics.
(aarch64_xtn): Change to an expander that emits the
appropriate instruction depending on endianness.
(aarch64_xtn_insn_be): Define - modeling zero-high-half
semantics.
(aarch64_xtn2_le): Rename to...
(aarch64_xtn2_insn_le): This.
(aarch64_xtn2_be): Rename to...
(aarch64_xtn2_insn_be): This.
(vec_pack_trunc_): Emit truncation instruction instead
of aarch64_xtn.
* config/aarch64/iterators.md (Vnarrowd): Add Vnarrowd mode
attribute iterator.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.


From: Gcc-patches  on 
behalf of Jonathan Wright via Gcc-patches 
Sent: 15 June 2021 10:45
To: gcc-patches@gcc.gnu.org 
Subject: [PATCH] aarch64: Model zero-high-half semantics of XTN instruction in 
RTL 
 
Hi,

Modeling the zero-high-half semantics of the XTN narrowing
instruction in RTL indicates to the compiler that this is a totally
destructive operation. This enables more RTL simplifications and also
prevents some register allocation issues.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-11  Jonathan Wright  

    * config/aarch64/aarch64-simd.md (aarch64_xtn_insn_le):
    Define - modeling zero-high-half semantics.
    (aarch64_xtn): Change to an expander that emits the
    appropriate instruction depending on endianness.
    (aarch64_xtn_insn_be): Define - modeling zero-high-half
    semantics.
    (aarch64_xtn2_le): Rename to...
    (aarch64_xtn2_insn_le): This.
    (aarch64_xtn2_be): Rename to...
    (aarch64_xtn2_insn_be): This.
    (vec_pack_trunc_): Emit truncation instruction instead
    of aarch64_xtn.
    * config/aarch64/iterators.md (Vnarrowd): Add Vnarrowd mode
    attribute iterator.

rb14563.patch
Description: rb14563.patch


Re: [PATCH] tree-sra: Do not refresh readonly decls (PR 100453)

2021-06-16 Thread Martin Jambor
Hi Richi,

On Tue, Jun 15 2021, Richard Biener wrote:
> On June 15, 2021 5:09:40 PM GMT+02:00, Martin Jambor  wrote:
>>Hi,
>>
>>When SRA transforms an assignment where the RHS is an aggregate decl
>>that it creates replacements for, the (least efficient) fallback method
>>of dealing with them is to store all the replacements back into the
>>original decl and then let the original assignment take its course.
>>
>>That of course should not need to be done for TREE_READONLY bases which
>>cannot change contents.  The SRA code handled this situation only for
>>DECL_IN_CONSTANT_POOL const decls, this patch modifies the check so
>>that
>>it tests for TREE_READONLY and I also looked at all other callers of
>>generate_subtree_copies and added checks to another one dealing with
>>the
>>same exact situation and one which deals with it in a non-assignment
>>context.
>>
>>This behavior also means that SRA has to disqualify any candidate decl
>>that is read-only and written to.  I plan to continue to hunt down at
>>least some of such occurrences.
>>
>>Bootstrapped and tested on x86_64-linux, i686-linux and aarch64-linux
>>(this time With Ada enabled on all three platforms).  OK for trunk?
>
> Ok. 
>
> Thanks, 
> Richard. 
>

Thanks for a quick approval.  However, when looking for sources of
additional non-read-only TREE_READONLY decls, I found the following code
and comment in setup_one_parameter() in tree-inline.c, and the last
comment sentence made me wonder if my patch is perhaps too strict:

  /* Even if P was TREE_READONLY, the new VAR should not be.
 In the original code, we would have constructed a
 temporary, and then the function body would have never
 changed the value of P.  However, now, we will be
 constructing VAR directly.  The constructor body may
 change its value multiple times as it is being
 constructed.  Therefore, it must not be TREE_READONLY;
 the back-end assumes that TREE_READONLY variable is
 assigned to only once.  */
  if (TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (p)))
TREE_READONLY (var) = 0;

Is the last sentence in the comment true?  Do we want it to be true?  It
contradicts the description of TREE_READONLY in tree.h.  (Would the
described property ever be useful in the middle-end or back-end?)
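The situation the comment describes is easy to reproduce at the source level: when a by-value parameter of a type with a constructor is inlined, the variable standing in for the parameter is constructed in place and may be stored to more than once, even though the parameter itself is never modified. A minimal illustration of the source pattern (not GCC internals):

```cpp
#include <cassert>

struct S
{
  int v;
  S (int x) : v (0) { v = x; }  // the constructor body stores to v twice
};

// 'p' is never changed inside f, so its decl could be TREE_READONLY;
// but once f is inlined, the variable replacing 'p' is constructed
// directly and therefore sees multiple assignments.
inline int
f (const S p)
{
  return p.v;
}
```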

Thanks,

Martin

>>Thanks,
>>
>>Martin
>>
>>
>>gcc/ChangeLog:
>>
>>2021-06-11  Martin Jambor  
>>
>>  PR tree-optimization/100453
>>  * tree-sra.c (create_access): Disqualify any const candidates
>>  which are written to.
>>  (sra_modify_expr): Do not store sub-replacements back to a const base.
>>  (handle_unscalarized_data_in_subtree): Likewise.
>>  (sra_modify_assign): Likewise.  Earlier, use TREE_READONLY test
>>  instead of constant_decl_p.
>>
>>gcc/testsuite/ChangeLog:
>>
>>2021-06-11  Martin Jambor  
>>
>>  PR tree-optimization/100453
>>  * gcc.dg/tree-ssa/pr100453.c: New test.


[PATCH V2] aarch64: Model zero-high-half semantics of SQXTUN instruction in RTL

2021-06-16 Thread Jonathan Wright via Gcc-patches
Hi,

Version 2 of the patch adds tests to verify the benefit of this change.

Ok for master?

Thanks,
Jonathan

---
gcc/ChangeLog:

2021-06-14  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Split generator
for aarch64_sqmovun builtins into scalar and vector variants.
* config/aarch64/aarch64-simd.md (aarch64_sqmovun):
Split into scalar and vector variants. Change vector variant
to an expander that emits the correct instruction depending
on endianness.
(aarch64_sqmovun_insn_le): Define.
(aarch64_sqmovun_insn_be): Define.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.

From: Gcc-patches  on 
behalf of Jonathan Wright via Gcc-patches 
Sent: 15 June 2021 10:52
To: gcc-patches@gcc.gnu.org 
Subject: [PATCH] aarch64: Model zero-high-half semantics of SQXTUN instruction 
in RTL 
 
Hi,

As subject, this patch first splits the aarch64_sqmovun pattern
into separate scalar and vector variants. It then further splits the vector
pattern into big/little endian variants that model the zero-high-half
semantics of the underlying instruction. Modeling these semantics
allows for better RTL combinations while also removing some register
allocation issues as the compiler now knows that the operation is
totally destructive.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-14  Jonathan Wright  

    * config/aarch64/aarch64-simd-builtins.def: Split generator
    for aarch64_sqmovun builtins into scalar and vector variants.
    * config/aarch64/aarch64-simd.md (aarch64_sqmovun):
    Split into scalar and vector variants. Change vector variant
    to an expander that emits the correct instruction depending
    on endianness.
    (aarch64_sqmovun_insn_le): Define.
    (aarch64_sqmovun_insn_be): Define.

rb14564.patch
Description: rb14564.patch


[PATCH V2] aarch64: Model zero-high-half semantics of [SU]QXTN instructions

2021-06-16 Thread Jonathan Wright via Gcc-patches
Hi,

Version 2 of the patch adds tests to verify the benefit of this change.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-14  Jonathan Wright  

* config/aarch64/aarch64-simd-builtins.def: Split generator
for aarch64_qmovn builtins into scalar and vector
variants.
* config/aarch64/aarch64-simd.md (aarch64_qmovn_insn_le):
Define.
(aarch64_qmovn_insn_be): Define.
(aarch64_qmovn): Split into scalar and vector
variants. Change vector variant to an expander that emits the
correct instruction depending on endianness.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.


From: Gcc-patches  on 
behalf of Jonathan Wright via Gcc-patches 
Sent: 15 June 2021 10:59
To: gcc-patches@gcc.gnu.org 
Subject: [PATCH] aarch64: Model zero-high-half semantics of [SU]QXTN 
instructions 
 
Hi,

As subject, this patch first splits the aarch64_qmovn
pattern into separate scalar and vector variants. It then further splits
the vector RTL pattern into big/little endian variants that model the
zero-high-half semantics of the underlying instruction. Modeling
these semantics allows for better RTL combinations while also
removing some register allocation issues as the compiler now knows
that the operation is totally destructive.
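The per-lane semantics being modeled are simple to state in scalar code: each wide lane is saturated into the narrow range, and the high half of the destination is zeroed. A sketch of the signed case (a scalar model of one SQXTN lane, not the intrinsic itself):

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one SQXTN lane: saturate a signed 16-bit value
// into the signed 8-bit range.
int8_t
sqxtn_lane (int16_t x)
{
  if (x > INT8_MAX)
    return INT8_MAX;
  if (x < INT8_MIN)
    return INT8_MIN;
  return static_cast<int8_t> (x);
}
```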

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-14  Jonathan Wright  

    * config/aarch64/aarch64-simd-builtins.def: Split generator
    for aarch64_qmovn builtins into scalar and vector
    variants.
    * config/aarch64/aarch64-simd.md (aarch64_qmovn_insn_le):
    Define.
    (aarch64_qmovn_insn_be): Define.
    (aarch64_qmovn): Split into scalar and vector
    variants. Change vector variant to an expander that emits the
    correct instruction depending on endianness.

rb14565.patch
Description: rb14565.patch


[PATCH V2] aarch64: Model zero-high-half semantics of ADDHN/SUBHN instructions

2021-06-16 Thread Jonathan Wright via Gcc-patches
Hi,

Version 2 of this patch adds tests to verify the benefit of this change.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-14  Jonathan Wright  

* config/aarch64/aarch64-simd.md (aarch64_hn):
Change to an expander that emits the correct instruction
depending on endianness.
(aarch64_hn_insn_le): Define.
(aarch64_hn_insn_be): Define.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.

From: Gcc-patches  on 
behalf of Jonathan Wright via Gcc-patches 
Sent: 15 June 2021 11:02
To: gcc-patches@gcc.gnu.org 
Subject: [PATCH] aarch64: Model zero-high-half semantics of ADDHN/SUBHN 
instructions 
 
Hi,

As subject, this patch models the zero-high-half semantics of the
narrowing arithmetic Neon instructions in the
aarch64_hn RTL pattern. Modeling these
semantics allows for better RTL combinations while also removing
some register allocation issues as the compiler now knows that the
operation is totally destructive.

Regression tested and bootstrapped on aarch64-none-linux-gnu - no
issues.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-06-14  Jonathan Wright  

    * config/aarch64/aarch64-simd.md (aarch64_hn):
    Change to an expander that emits the correct instruction
    depending on endianness.
    (aarch64_hn_insn_le): Define.
    (aarch64_hn_insn_be): Define.

rb14566.patch
Description: rb14566.patch


Re: [PATCH] gcc/configure.ac: fix register issue for global_load assembler functions

2021-06-16 Thread Marcel Vollweiler

Changed the variable "gcc_cv_as_global_load_fixed" into
"gcc_cv_as_gcn_global_load_fixed" in order to have the "gcn" substring
also in the config.patch file.


Am 09.06.2021 um 16:47 schrieb Marcel Vollweiler:

This patch fixes an issue with global_load assembler functions leading
to an "invalid operand for instruction" error since in different LLVM
versions those functions use either one or two registers.

In this patch a compatibility check is added to the configure.ac.

Marcel
-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
Frank Thürauf


gcc/configure.ac: Adapt configuration according to assembler fix of global_load 
functions.

gcc/ChangeLog:

* config.in: Regenerate.
* config/gcn/gcn.c (print_operand_address): Fix for global_load
assembler functions.
* configure: Regenerate.
* configure.ac: Fix for global_load assembler functions. 

diff --git a/gcc/config.in b/gcc/config.in
index e54f59c..18e6271 100644
--- a/gcc/config.in
+++ b/gcc/config.in
@@ -1431,6 +1431,12 @@
 #endif
 
 
+/* Define if your assembler has fixed global_load functions. */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GCN_ASM_GLOBAL_LOAD_FIXED
+#endif
+
+
 /* Define to 1 if you have the `getchar_unlocked' function. */
 #ifndef USED_FOR_TARGET
 #undef HAVE_GETCHAR_UNLOCKED
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 283a91f..2d27296 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -5481,13 +5481,24 @@ print_operand_address (FILE *file, rtx mem)
  if (vgpr_offset == NULL_RTX)
/* In this case, the vector offset is zero, so we use the first
   lane of v1, which is initialized to zero.  */
-   fprintf (file, "v[1:2]");
+   {
+#if HAVE_GCN_ASM_GLOBAL_LOAD_FIXED == 1
+   fprintf (file, "v1"); 
+#else
+   fprintf (file, "v[1:2]");
+#endif
+   }
  else if (REG_P (vgpr_offset)
   && VGPR_REGNO_P (REGNO (vgpr_offset)))
{
- fprintf (file, "v[%d:%d]",
-  REGNO (vgpr_offset) - FIRST_VGPR_REG,
-  REGNO (vgpr_offset) - FIRST_VGPR_REG + 1);
+#if HAVE_GCN_ASM_GLOBAL_LOAD_FIXED == 1
+   fprintf (file, "v%d",
+REGNO (vgpr_offset) - FIRST_VGPR_REG);
+#else
+   fprintf (file, "v[%d:%d]",
+REGNO (vgpr_offset) - FIRST_VGPR_REG,
+REGNO (vgpr_offset) - FIRST_VGPR_REG + 1);
+#endif
}
  else
output_operand_lossage ("bad ADDR_SPACE_GLOBAL address");
diff --git a/gcc/configure b/gcc/configure
index 4a9e4fa..8843a8f 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -28909,6 +28909,36 @@ case "$target" in
 ;;
 esac
 
+# This tests if the assembler supports two registers for global_load functions
+# (like in LLVM versions <12) or one register (like in LLVM 12).
+case "$target" in
+  amdgcn-* | gcn-*)
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking assembler fix for 
global_load functions" >&5
+$as_echo_n "checking assembler fix for global_load functions... " >&6; }
+gcc_cv_as_gcn_global_load_fixed=yes
+if test x$gcc_cv_as != x; then
+  cat > conftest.s < /dev/null 2>&1; then
+gcc_cv_as_gcn_global_load_fixed=no
+  fi
+  rm -f conftest.s conftest.o conftest
+fi
+if test x$gcc_cv_as_gcn_global_load_fixed = xyes; then
+
+$as_echo "#define HAVE_GCN_ASM_GLOBAL_LOAD_FIXED 1" >>confdefs.h
+
+else
+
+$as_echo "#define HAVE_GCN_ASM_GLOBAL_LOAD_FIXED 0" >>confdefs.h
+
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: 
$gcc_cv_as_gcn_global_load_fixed" >&5
+$as_echo "$gcc_cv_as_gcn_global_load_fixed" >&6; }
+;;
+esac
+
 # ??? Not all targets support dwarf2 debug_line, even within a version
 # of gas.  Moreover, we need to emit a valid instruction to trigger any
 # info to the output file.  So, as supported targets are added to gas 2.11,
diff --git a/gcc/configure.ac b/gcc/configure.ac
index d9fc3c2..e179ce1 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -5357,6 +5357,30 @@ case "$target" in
 ;;
 esac
 
+# This tests if the assembler supports two registers for global_load functions
+# (like in LLVM versions <12) or one register (like in LLVM 12).
+case "$target" in
+  amdgcn-* | gcn-*)
+AC_MSG_CHECKING(assembler fix for global_load functions)
+gcc_cv_as_gcn_global_load_fixed=yes
+if test x$gcc_cv_as != x; then
+  cat > conftest.s < /dev/null 2>&1; then
+gcc_cv_as_gcn_global_load_fixed=no
+  fi
+  rm -f conftest.s conftest.o conftest
+fi
+if 

Re: [PATCH] tree-optimization PR/101014 - Limit new value calculations to first order effects.

2021-06-16 Thread Maxim Kuvyrkov via Gcc-patches


> On 15 Jun 2021, at 00:07, Andrew MacLeod via Gcc-patches 
>  wrote:
> 
> As mentioned in the Text from the PR:
> 
> "When a range is being calculated for an ssa-name, the propagation process 
> often goes along back edges. These back edges sometimes require other 
> ssa-names which have not been processed yet. These are flagged as "poor values" 
> and when propagation is done, we visit the list of poor values, calculate 
> them, and see if that may result if a better range for the original ssa-name.
> 
> The problem is that calculating these poor values may also spawn another set 
> of requests since the block at the far end of the back edge has not been 
> processed yet... it's highly likely that some additional unprocessed ssa-names 
> are used in the calculation of that name, but typically they do not affect 
> the current range in a significant way.
> 
> Thus we mostly care about the first order effect only.  It turns out to be 
> very rare that a 2nd order effect on a back edge affects anything that we 
> don't catch later.
> 
> This patch turns off poor-value tagging when looking up the first order 
> values, thus avoiding the 2nd order and beyond cascading effects.
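A hypothetical example of the pattern being described (an illustration, not a reproducer from the PR): when computing the range of a name used across a loop back edge, the ranges of other names reachable from that edge may not be known yet, and chasing them recursively rarely improves the first-order answer.

```cpp
#include <cassert>

// Range propagation for 'i' follows the loop back edge; the value of
// 'n' there may not have been processed yet (a "poor value").
// Resolving 'n' once (first order) is usually enough; chasing the
// names 'n' itself depends on would be a 2nd-order effect.
int
count_up (int n)
{
  int i = 0;
  while (i < n)
    i++;
  return i;   // i == max(n, 0)
}
```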
> 
> I haven't found a test case we miss yet because of this change, yet it 
> probably resolves a number of the outstanding compilation problems in a 
> significant way.
> 
> I think this will probably apply to gcc 11 in some form as well, so I'll look 
> at an equivalent patch for there."
> 
> 
> This patch simplifies the enable_new_value routines, replacing the 
> enable/disable with an enable-with-flag routine, which returns the previous 
> value.  This lets us change the mode and then set it back to what it was 
> before.  Seems better in general.
> 
> Then disables new values for 2nd+ order effects. GCC11 patch forthcoming.
> 
> Bootstraps on x86_64-pc-linux-gnu, no regressions.  pushed.
> 
> Andrew

Hi Andrew,

This causes bootstrap-with-ubsan failure on at least aarch64-linux-gnu, likely, 
others:

# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 48, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 48, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 32, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 48, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 32, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 48, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 32, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 32, which is not a valid value for type 'bool'

> 
> @@ -748,21 +748,15 @@ ranger_cache::dump (FILE *f)
>fprintf (f, "\n");
>  }
>  
> -// Allow the cache to flag and query new values when propagation is forced
> -// to use an unknown value.
> +// Allow or disallow the cache to flag and query new values when propagation
> +// is forced to use an unknown value.  The previous state is returned.
>  
> -void
> -ranger_cache::enable_new_values ()
> -{
> -  m_new_value_p = true;
> -}
> -
> -// Disable new value querying.
> -
> -void
> -ranger_cache::disable_new_values ()
> +bool
> +ranger_cache::enable_new_values (bool state)
>  {
> -  m_new_value_p = false;
> +  bool ret = m_new_value_p;

I think changing this to
  bool ret = (bool) m_new_value_p;
might be enough, but you know this code better.

Would you please take a look at this?
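For what it's worth, the UBSan report says the byte backing `m_new_value_p` holds values like 32 or 48 when first loaded, which points at a missing initialization rather than at the cast (a cast on the load would not change what UBSan observes). A minimal sketch of the save/restore pattern with the member explicitly initialized (a hypothetical class, not the actual ranger_cache):

```cpp
#include <cassert>

class value_cache
{
  // Explicit initialization guarantees the bool always holds 0 or 1,
  // so loads of it are well defined under -fsanitize=undefined.
  bool m_new_value_p = true;

public:
  // Set the new-value mode and return the previous state so the
  // caller can restore it afterwards.
  bool
  enable_new_values (bool state)
  {
    bool ret = m_new_value_p;
    m_new_value_p = state;
    return ret;
  }
};
```

A caller would then write `bool old = cache.enable_new_values (false); ... cache.enable_new_values (old);` to scope the mode change.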

> +  m_new_value_p = state;
> +  return ret;
>  }
>  
>  // Dump the caches for basic block BB to file F.

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org


Re: [PATCH] stor-layout: Create DECL_BIT_FIELD_REPRESENTATIVE even for bitfields in unions [PR101062]

2021-06-16 Thread Richard Biener
On Wed, 16 Jun 2021, Jakub Jelinek wrote:

> Hi!
> 
> The following testcase is miscompiled on x86_64-linux, the bitfield store
> is implemented as a RMW 64-bit operation at d+24 when the d variable has
> size of only 28 bytes and scheduling moves in between the R and W part
> a store to a different variable that happens to be right after the d
> variable.
> 
> The reason for this is that we weren't creating
> DECL_BIT_FIELD_REPRESENTATIVEs for bitfields in unions.
> 
> The following patch does create them, but treats all such bitfields as if
> they were in a structure where the particular bitfield is the only field.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2021-06-16  Jakub Jelinek  
> 
>   PR middle-end/101062
>   * stor-layout.c (finish_bitfield_representative): For fields in unions
>   assume nextf is always NULL.
>   (finish_bitfield_layout): Compute bit field representatives also in
>   unions, but handle it as if each bitfield was the only field in the
>   aggregate.
> 
>   * gcc.dg/pr101062.c: New test.
> 
> --- gcc/stor-layout.c.jj  2021-03-30 18:11:52.537092233 +0200
> +++ gcc/stor-layout.c 2021-06-15 10:58:59.244353965 +0200
> @@ -2072,9 +2072,14 @@ finish_bitfield_representative (tree rep
>bitsize = (bitsize + BITS_PER_UNIT - 1) & ~(BITS_PER_UNIT - 1);
>  
>/* Now nothing tells us how to pad out bitsize ...  */
> -  nextf = DECL_CHAIN (field);
> -  while (nextf && TREE_CODE (nextf) != FIELD_DECL)
> -nextf = DECL_CHAIN (nextf);
> +  if (TREE_CODE (DECL_CONTEXT (field)) == RECORD_TYPE)
> +{
> +  nextf = DECL_CHAIN (field);
> +  while (nextf && TREE_CODE (nextf) != FIELD_DECL)
> + nextf = DECL_CHAIN (nextf);
> +}
> +  else
> +nextf = NULL_TREE;
>if (nextf)
>  {
>tree maxsize;
> @@ -2167,13 +2172,6 @@ finish_bitfield_layout (tree t)
>tree field, prev;
>tree repr = NULL_TREE;
>  
> -  /* Unions would be special, for the ease of type-punning optimizations
> - we could use the underlying type as hint for the representative
> - if the bitfield would fit and the representative would not exceed
> - the union in size.  */
> -  if (TREE_CODE (t) != RECORD_TYPE)
> -return;
> -
>for (prev = NULL_TREE, field = TYPE_FIELDS (t);
> field; field = DECL_CHAIN (field))
>  {
> @@ -2233,7 +2231,13 @@ finish_bitfield_layout (tree t)
>if (repr)
>   DECL_BIT_FIELD_REPRESENTATIVE (field) = repr;
>  
> -  prev = field;
> +  if (TREE_CODE (t) == RECORD_TYPE)
> + prev = field;
> +  else if (repr)
> + {
> +   finish_bitfield_representative (repr, field);
> +   repr = NULL_TREE;
> + }
>  }
>  
>if (repr)
> --- gcc/testsuite/gcc.dg/pr101062.c.jj2021-06-15 10:42:58.642919880 
> +0200
> +++ gcc/testsuite/gcc.dg/pr101062.c   2021-06-15 10:42:40.897171191 +0200
> @@ -0,0 +1,29 @@
> +/* PR middle-end/101062 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -fno-toplevel-reorder -frename-registers" } */
> +
> +union U { signed b : 5; };
> +int c;
> +volatile union U d[7] = { { 8 } };
> +short e = 1;
> +
> +__attribute__((noipa)) void
> +foo ()
> +{
> +  d[6].b = 0;
> +  d[6].b = 0;
> +  d[6].b = 0;
> +  d[6].b = 0;
> +  d[6].b = 0;
> +  e = 0;
> +  c = 0;
> +}
> +
> +int
> +main ()
> +{
> +  foo ();
> +  if (e != 0)
> +__builtin_abort ();
> +  return 0;
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-sra: Do not refresh readonly decls (PR 100453)

2021-06-16 Thread Richard Biener
On Wed, 16 Jun 2021, Martin Jambor wrote:

> Hi Richi,
> 
> On Tue, Jun 15 2021, Richard Biener wrote:
> > On June 15, 2021 5:09:40 PM GMT+02:00, Martin Jambor  
> > wrote:
> >>Hi,
> >>
> >>When SRA transforms an assignment where the RHS is an aggregate decl
> >>that it creates replacements for, the (least efficient) fallback method
> >>of dealing with them is to store all the replacements back into the
> >>original decl and then let the original assignment take its course.
> >>
> >>That of course should not need to be done for TREE_READONLY bases which
> >>cannot change contents.  The SRA code handled this situation only for
> >>DECL_IN_CONSTANT_POOL const decls, this patch modifies the check so
> >>that
> >>it tests for TREE_READONLY and I also looked at all other callers of
> >>generate_subtree_copies and added checks to another one dealing with
> >>the
> >>same exact situation and one which deals with it in a non-assignment
> >>context.
> >>
> >>This behavior also means that SRA has to disqualify any candidate decl
> >>that is read-only and written to.  I plan to continue to hunt down at
> >>least some of such occurrences.
> >>
> >>Bootstrapped and tested on x86_64-linux, i686-linux and aarch64-linux
> >>(this time With Ada enabled on all three platforms).  OK for trunk?
> >
> > Ok. 
> >
> > Thanks, 
> > Richard. 
> >
> 
> Thanks for a quick approval.  However, when looking for sources of
> additional non-read-only TREE_READONLY decls, I found the following code
> and comment in setup_one_parameter() in tree-inline.c, and the last
> comment sentence made me wonder if my patch is perhaps too strict:
> 
>   /* Even if P was TREE_READONLY, the new VAR should not be.
>  In the original code, we would have constructed a
>  temporary, and then the function body would have never
>  changed the value of P.  However, now, we will be
>  constructing VAR directly.  The constructor body may
>  change its value multiple times as it is being
>  constructed.  Therefore, it must not be TREE_READONLY;
>  the back-end assumes that TREE_READONLY variable is
>  assigned to only once.  */
>   if (TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (p)))
> TREE_READONLY (var) = 0;
> 
> Is the last sentence in the comment true?  Do we want it to be true?  It
> contradicts the description of TREE_READONLY in tree.h.  (Would the
> described property ever be useful in the middle-end or back-end?)

I think the last sentence refers to RTX_UNCHANGING_P which we thankfully
removed.  Now, that means we need to clear TREE_READONLY unconditionally
here I think (unless we can prove it's uninitialized in the caller,
but I guess we don't need to prematurely optimize that case).

Richard.

> Thanks,
> 
> Martin
> 
> >>Thanks,
> >>
> >>Martin
> >>
> >>
> >>gcc/ChangeLog:
> >>
> >>2021-06-11  Martin Jambor  
> >>
> >>PR tree-optimization/100453
> >>* tree-sra.c (create_access): Disqualify any const candidates
> >>which are written to.
> >>(sra_modify_expr): Do not store sub-replacements back to a const base.
> >>(handle_unscalarized_data_in_subtree): Likewise.
> >>(sra_modify_assign): Likewise.  Earlier, use TREE_READONLY test
> >>instead of constant_decl_p.
> >>
> >>gcc/testsuite/ChangeLog:
> >>
> >>2021-06-11  Martin Jambor  
> >>
> >>PR tree-optimization/100453
> >>* gcc.dg/tree-ssa/pr100453.c: New test.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH, V2] auto_vec copy/move improvements

2021-06-16 Thread Richard Biener via Gcc-patches
On Wed, Jun 16, 2021 at 5:18 AM Trevor Saunders  wrote:
>
> - Unfortunately using_auto_storage () needs to handle m_vec being null.
> - Handle self move of an auto_vec to itself.
> - Make sure auto_vec defines the classes move constructor and assignment
>   operator, as well as ones taking vec, so the compiler does not generate
> them for us.  Per https://en.cppreference.com/w/cpp/language/move_constructor
> the ones taking vec do not count as the classes move constructor or
> assignment operator, but we want them as well to assign a plain vec to a
> auto_vec.
> - Explicitly delete auto_vec's copy constructor and assignment operator.  This
>   prevents unintentional expensive copies of the vector and makes it clear
> when copies are needed that that is what is intended.  When it is necessary
> to copy a vector, copy () can be used.
>
> Signed-off-by: Trevor Saunders 
>
> This time without the changes to the inline storage version of auto_vec as
> requested.  bootstrap andregtest on x86_64-linux-gnu with the other patches in
> the series ongoing, ok if that passes?

OK.

Thanks,
Richard.

> Thanks
>
> Trev
>
> gcc/ChangeLog:
>
> * vec.h (vl_ptr>::using_auto_storage): Handle null m_vec.
> (auto_vec::auto_vec): Define move constructor, and delete copy
> constructor.
> (auto_vec::operator=): Define move assignment and delete copy
> assignment.
> ---
>  gcc/vec.h | 31 ++-
>  1 file changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/vec.h b/gcc/vec.h
> index 193377cb69c..30ef9a69473 100644
> --- a/gcc/vec.h
> +++ b/gcc/vec.h
> @@ -1570,14 +1570,43 @@ public:
>this->m_vec = r.m_vec;
>r.m_vec = NULL;
>  }
> +
> +  auto_vec (auto_vec &&r)
> +{
> +  gcc_assert (!r.using_auto_storage ());
> +  this->m_vec = r.m_vec;
> +  r.m_vec = NULL;
> +}
> +
>auto_vec& operator= (vec&& r)
>  {
> +   if (this == &r)
> +   return *this;
> +
> +  gcc_assert (!r.using_auto_storage ());
> +  this->release ();
> +  this->m_vec = r.m_vec;
> +  r.m_vec = NULL;
> +  return *this;
> +}
> +
> +  auto_vec& operator= (auto_vec &&r)
> +{
> +   if (this == &r)
> +   return *this;
> +
>gcc_assert (!r.using_auto_storage ());
>this->release ();
>this->m_vec = r.m_vec;
>r.m_vec = NULL;
>return *this;
>  }
> +
> +  // You probably don't want to copy a vector, so these are deleted to 
> prevent
> +  // unintentional use.  If you really need a copy of the vectors contents 
> you
> +  // can use copy ().
> +  auto_vec(const auto_vec &) = delete;
> +  auto_vec &operator= (const auto_vec &) = delete;
>  };
>
>
> @@ -2147,7 +2176,7 @@ template
>  inline bool
>  vec::using_auto_storage () const
>  {
> -  return m_vec->m_vecpfx.m_using_auto_storage;
> +  return m_vec ? m_vec->m_vecpfx.m_using_auto_storage : false;
>  }
>
>  /* Release VEC and call release of all element vectors.  */
> --
> 2.20.1
>
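The ownership rules the patch establishes can be sketched with a standalone move-only wrapper (simplified and hypothetical; the real auto_vec manages raw vec storage, not a std::vector): moving steals the pointer, self-move is handled explicitly, and copying is deleted so deep copies must go through an explicit copy ().

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

template <typename T>
class auto_vec
{
  std::vector<T> *m_vec;

public:
  auto_vec () : m_vec (new std::vector<T> ()) {}
  ~auto_vec () { delete m_vec; }

  // Move constructor: steal the storage, leave the source empty.
  auto_vec (auto_vec &&r) : m_vec (r.m_vec) { r.m_vec = nullptr; }

  auto_vec &
  operator= (auto_vec &&r)
  {
    if (this == &r)     // self-move must be a no-op
      return *this;
    delete m_vec;
    m_vec = r.m_vec;
    r.m_vec = nullptr;
    return *this;
  }

  // Deleted so unintentional expensive copies fail to compile;
  // an explicit copy () member would be provided instead.
  auto_vec (const auto_vec &) = delete;
  auto_vec &operator= (const auto_vec &) = delete;

  void push (const T &x) { m_vec->push_back (x); }
  std::size_t length () const { return m_vec ? m_vec->size () : 0; }
};
```

As in the patch, a moved-from vector must still be usable (here, `length ()` tolerates a null `m_vec`, mirroring the `using_auto_storage ()` null-handling fix).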


[PATCH] tree-optimization/101088 - fix SM invalidation issue

2021-06-16 Thread Richard Biener
When we face a sm_ord vs sm_unord for the same ref during
store sequence merging we assert that the ref is already marked
unsupported.  But it can be that it will only be marked so
during the ongoing merging, so instead of asserting, mark it here.

Also apply some optimization to not waste resources to search
for already unsupported refs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-06-16  Richard Biener  

PR tree-optimization/101088
* tree-ssa-loop-im.c (sm_seq_valid_bb): Only look for
supported refs on edges.  Do not assert same ref but
different kind stores are unsupported but mark them so.
(hoist_memory_references): Only look for supported refs
on exits.

* gcc.dg/torture/pr101088.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr101088.c | 45 +
 gcc/tree-ssa-loop-im.c  | 21 +---
 2 files changed, 61 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr101088.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr101088.c 
b/gcc/testsuite/gcc.dg/torture/pr101088.c
new file mode 100644
index 000..00fce39d2f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr101088.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+
+int bh, on, h0;
+
+void
+qw (int n2)
+{
+  int *e5;
+
+  if (n2 == 0)
+{
+  n2 = 1;
+  while (n2 != 0)
+   for (n2 = 0; n2 < 1; ++n2)
+ {
+ }
+
+  e5 = &n2;
+}
+  else
+e5 = &on;
+
+  while (h0 < 1)
+{
+  if (on == 0)
+   {
+ ++*e5;
+ bh = 0;
+   }
+  else
+   {
+ bh = 0;
+ ++on;
+ *e5 = on;
+ h0 = *e5;
+ if (h0 == 0)
+   {
+ *e5 = 0;
+ ++h0;
+   }
+   }
+
+  ++h0;
+}
+}
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index 1c865b28fd6..7de47edbcb3 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -2340,7 +2340,13 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree 
vdef,
  tree vuse = gimple_phi_arg_def (phi, i);
  edge e = gimple_phi_arg_edge (phi, i);
  auto_vec edge_seq;
- bitmap_copy (tem_refs_not_in_seq, refs_not_in_seq);
+ bitmap_and_compl (tem_refs_not_in_seq,
+   refs_not_in_seq, refs_not_supported);
+ /* If we've marked all refs we search for as unsupported
+we can stop processing and use the sequence as before
+the PHI.  */
+ if (bitmap_empty_p (tem_refs_not_in_seq))
+   return 1;
  eret = sm_seq_valid_bb (loop, e->src, vuse, edge_seq,
  tem_refs_not_in_seq, refs_not_supported,
  true, fully_visited);
@@ -2379,9 +2385,9 @@ sm_seq_valid_bb (class loop *loop, basic_block bb, tree 
vdef,
  /* sm_other prevails.  */
  else if (first_edge_seq[i].second != edge_seq[i].second)
{
- /* This is just an optimization.  */
- gcc_assert (bitmap_bit_p (refs_not_supported,
-   first_edge_seq[i].first));
+ /* Make sure the ref is marked as not supported.  */
+ bitmap_set_bit (refs_not_supported,
+ first_edge_seq[i].first);
  first_edge_seq[i].second = sm_other;
  first_edge_seq[i].from = NULL_TREE;
}
@@ -2533,7 +2539,12 @@ hoist_memory_references (class loop *loop, bitmap 
mem_refs,
   vec seq;
   seq.create (4);
   auto_bitmap refs_not_in_seq (&lim_bitmap_obstack);
-  bitmap_copy (refs_not_in_seq, mem_refs);
+  bitmap_and_compl (refs_not_in_seq, mem_refs, refs_not_supported);
+  if (bitmap_empty_p (refs_not_in_seq))
+   {
+ seq.release ();
+ break;
+   }
   auto_bitmap fully_visited;
   int res = sm_seq_valid_bb (loop, e->src, NULL_TREE,
 seq, refs_not_in_seq,
-- 
2.26.2


Re: [ARM] PR97906 - Missed lowering abs(a) >= abs(b) to vacge

2021-06-16 Thread Prathamesh Kulkarni via Gcc-patches
On Mon, 14 Jun 2021 at 16:15, Kyrylo Tkachov  wrote:
>
>
>
> > -Original Message-
> > From: Prathamesh Kulkarni 
> > Sent: 14 June 2021 08:58
> > To: gcc Patches ; Kyrylo Tkachov
> > 
> > Subject: Re: [ARM] PR97906 - Missed lowering abs(a) >= abs(b) to vacge
> >
> > On Mon, 7 Jun 2021 at 12:46, Prathamesh Kulkarni
> >  wrote:
> > >
> > > On Tue, 1 Jun 2021 at 16:03, Prathamesh Kulkarni
> > >  wrote:
> > > >
> > > > Hi,
> > > > As mentioned in PR, for following test-case:
> > > >
> > > > #include 
> > > >
> > > > uint32x2_t f1(float32x2_t a, float32x2_t b)
> > > > {
> > > >   return vabs_f32 (a) >= vabs_f32 (b);
> > > > }
> > > >
> > > > uint32x2_t f2(float32x2_t a, float32x2_t b)
> > > > {
> > > >   return (uint32x2_t) __builtin_neon_vcagev2sf (a, b);
> > > > }
> > > >
> > > > We generate vacge for f2, but with -ffast-math, we generate following
> > for f1:
> > > > f1:
> > > > vabs.f32d1, d1
> > > > vabs.f32d0, d0
> > > > vcge.f32d0, d0, d1
> > > > bx  lr
> > > >
> > > > This happens because, the middle-end inverts the comparison to b <= a,
> > > > .optimized dump:
> > > >  _8 = __builtin_neon_vabsv2sf (a_4(D));
> > > >   _7 = __builtin_neon_vabsv2sf (b_5(D));
> > > >   _1 = _7 <= _8;
> > > >   _2 = VIEW_CONVERT_EXPR(_1);
> > > >   _6 = VIEW_CONVERT_EXPR(_2);
> > > >   return _6;
> > > >
> > > > and combine fails to match the following pattern:
> > > > (set (reg:V2SI 121)
> > > > (neg:V2SI (le:V2SI (abs:V2SF (reg:V2SF 123))
> > > > (abs:V2SF (reg:V2SF 122)
> > > >
> > > > because neon_vca pattern has GTGE code iterator.
> > > > The attached patch adjusts the neon_vca patterns to use GLTE instead
> > > > similar to neon_vca_fp16insn, and removes
> > NEON_VACMP iterator.
> > > > Code-gen with patch:
> > > > f1:
> > > > vacle.f32   d0, d1, d0
> > > > bx  lr
> > > >
> > > > Bootstrapped + tested on arm-linux-gnueabihf and cross-tested on arm*-
> > *-*.
> > > > OK to commit ?
>
> Is that inversion guaranteed to happen (is it a canonicalization rule)?
I think it follows the following rule for canonicalization from
tree_swap_operands_p:
  /* It is preferable to swap two SSA_NAME to ensure a canonical form
 for commutative and comparison operators.  Ensuring a canonical
 form allows the optimizers to find additional redundancies without
 having to explicitly check for both orderings.  */
  if (TREE_CODE (arg0) == SSA_NAME
  && TREE_CODE (arg1) == SSA_NAME
  && SSA_NAME_VERSION (arg0) > SSA_NAME_VERSION (arg1))
return 1;

For the above test-case, it's ccp1 that inverts the comparison.
The input to ccp1 pass is:
  _12 = __builtin_neon_vabsv2sf (a_6(D));
  _14 = _12;
  _1 = _14;
  _11 = __builtin_neon_vabsv2sf (b_8(D));
  _16 = _11;
  _2 = _16;
  _3 = _1 >= _2;
  _4 = VEC_COND_EXPR <_3, { -1, -1 }, { 0, 0 }>;
  _10 = VIEW_CONVERT_EXPR(_4);
  return _10;

_3 = _1 >= _2 is folded into:
_3 = _12 >= _11

Since _12 has a higher SSA version than _11, the comparison is canonicalized to:
_3 = _11 <= _12.
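The rule quoted from tree_swap_operands_p can be condensed into a stand-alone sketch (hypothetical types, not GCC's tree machinery): put the operand with the lower SSA version first and flip the comparison code accordingly.

```cpp
#include <cassert>
#include <utility>

// A comparison "a CODE b" between two SSA names is canonicalized so the
// name with the lower version number comes first; GE becomes LE (and vice
// versa) when the operands are swapped.  This is how _12 >= _11 turns
// into _11 <= _12 in the ccp1 dump above.
struct ssa_name { int version; };
enum cmp_code { CMP_GE, CMP_LE };

cmp_code
canonicalize_cmp (ssa_name &a, ssa_name &b, cmp_code code)
{
  if (a.version > b.version)
    {
      std::swap (a, b);
      code = (code == CMP_GE) ? CMP_LE : CMP_GE;
    }
  return code;
}
```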

Thanks,
Prathamesh
> If so, ok.
> Thanks,
> Kyrill
>
>
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > > Thanks,
> > > > Prathamesh


Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-06-16 Thread Andre Vieira (lists) via Gcc-patches



On 14/06/2021 11:57, Richard Biener wrote:

On Mon, 14 Jun 2021, Richard Biener wrote:


Indeed. For example a simple
int a[1024], b[1024], c[1024];

void foo(int n)
{
   for (int i = 0; i < n; ++i)
 a[i+1] += c[i+i] ? b[i+1] : 0;
}

should usually see peeling for alignment (though on x86 you need
exotic -march= since cost models generally have equal aligned and
unaligned access costs).  For example with -mavx2 -mtune=atom
we'll see an alignment peeling prologue, a AVX2 vector loop,
a SSE2 vectorized epilogue and a scalar epilogue.  It also
shows the original scalar loop being used in the scalar prologue
and epilogue.

We're not even trying to make the counting IV easily used
across loops (we're not counting scalar iterations in the
vector loops).

Specifically we see

 [local count: 94607391]:
niters_vector_mult_vf.10_62 = bnd.9_61 << 3;
_67 = niters_vector_mult_vf.10_62 + 7;
_64 = (int) niters_vector_mult_vf.10_62;
tmp.11_63 = i_43 + _64;
if (niters.8_45 == niters_vector_mult_vf.10_62)
   goto ; [12.50%]
else
   goto ; [87.50%]

after the main vect loop, recomputing the original IV (i) rather
than using the inserted canonical IV.  And then the vectorized
epilogue header check doing

 [local count: 93293400]:
# i_59 = PHI 
# _66 = PHI <_67(33), 0(18)>
_96 = (unsigned int) n_10(D);
niters.26_95 = _96 - _66;
_108 = (unsigned int) n_10(D);
_109 = _108 - _66;
_110 = _109 + 4294967295;
if (_110 <= 3)
   goto ; [10.00%]
else
   goto ; [90.00%]

re-computing everything from scratch again (also notice how
the main vect loop guard jumps around the alignment prologue
as well and lands here - and the vectorized epilogue using
unaligned accesses - good!).

That is, I'd expect _much_ easier jobs if we'd manage to
track the number of performed scalar iterations (or the
number of scalar iterations remaining) using the canonical
IV we add to all loops across all of the involved loops.

Richard.



So I am now looking at using an IV that counts scalar iterations rather 
than vector iterations and reusing that through all loops (prologue, 
main loop, vect_epilogue and scalar epilogue). The first is easy, since 
that's what we already do for partial vectors or non-constant VFs. The 
latter requires some plumbing and removing a lot of the code in there 
that creates new IVs going from [0, niters - previous iterations]. I 
don't yet have a clear-cut view of how to do this; I first thought of 
keeping track of the 'control' IV in the loop_vinfo, but the prologue 
and scalar epilogues won't have one. 'loop' keeps a control_ivs struct, 
but that is used for overflow detection and only keeps track of what 
looks like a constant 'base' and 'step'. Not quite sure how all that 
works, but intuitively doesn't seem like the right thing to reuse.


I'll go hack around and keep you posted on progress.

Regards,
Andre



Re: [RFC] Using main loop's updated IV as base_address for epilogue vectorization

2021-06-16 Thread Richard Biener
On Wed, 16 Jun 2021, Andre Vieira (lists) wrote:

> 
> On 14/06/2021 11:57, Richard Biener wrote:
> > On Mon, 14 Jun 2021, Richard Biener wrote:
> >
> >> Indeed. For example a simple
> >> int a[1024], b[1024], c[1024];
> >>
> >> void foo(int n)
> >> {
> >>for (int i = 0; i < n; ++i)
> >>  a[i+1] += c[i+i] ? b[i+1] : 0;
> >> }
> >>
> >> should usually see peeling for alignment (though on x86 you need
> >> exotic -march= since cost models generally have equal aligned and
> >> unaligned access costs).  For example with -mavx2 -mtune=atom
> >> we'll see an alignment peeling prologue, a AVX2 vector loop,
> >> a SSE2 vectorized epilogue and a scalar epilogue.  It also
> >> shows the original scalar loop being used in the scalar prologue
> >> and epilogue.
> >>
> >> We're not even trying to make the counting IV easily used
> >> across loops (we're not counting scalar iterations in the
> >> vector loops).
> > Specifically we see
> >
> >  [local count: 94607391]:
> > niters_vector_mult_vf.10_62 = bnd.9_61 << 3;
> > _67 = niters_vector_mult_vf.10_62 + 7;
> > _64 = (int) niters_vector_mult_vf.10_62;
> > tmp.11_63 = i_43 + _64;
> > if (niters.8_45 == niters_vector_mult_vf.10_62)
> >goto ; [12.50%]
> > else
> >goto ; [87.50%]
> >
> > after the main vect loop, recomputing the original IV (i) rather
> > than using the inserted canonical IV.  And then the vectorized
> > epilogue header check doing
> >
> >  [local count: 93293400]:
> > # i_59 = PHI 
> > # _66 = PHI <_67(33), 0(18)>
> > _96 = (unsigned int) n_10(D);
> > niters.26_95 = _96 - _66;
> > _108 = (unsigned int) n_10(D);
> > _109 = _108 - _66;
> > _110 = _109 + 4294967295;
> > if (_110 <= 3)
> >goto ; [10.00%]
> > else
> >goto ; [90.00%]
> >
> > re-computing everything from scratch again (also notice how
> > the main vect loop guard jumps around the alignment prologue
> > as well and lands here - and the vectorized epilogue using
> > unaligned accesses - good!).
> >
> > That is, I'd expect _much_ easier jobs if we'd manage to
> > track the number of performed scalar iterations (or the
> > number of scalar iterations remaining) using the canonical
> > IV we add to all loops across all of the involved loops.
> >
> > Richard.
> 
> 
> So I am now looking at using an IV that counts scalar iterations rather than
> vector iterations and reusing that through all loops, (prologue, main loop,
> vect_epilogue and scalar epilogue). The first is easy, since that's what we
> already do for partial vectors or non-constant VFs. The latter requires some
> plumbing and removing a lot of the code in there that creates new IV's going
> from [0, niters - previous iterations]. I don't yet have a clear cut view of
> how to do this, I first thought of keeping track of the 'control' IV in the
> loop_vinfo, but the prologue and scalar epilogues won't have one. 'loop' keeps
> a control_ivs struct, but that is used for overflow detection and only keeps
> track of what looks like a constant 'base' and 'step'. Not quite sure how all
> that works, but intuitively doesn't seem like the right thing to reuse.

Maybe it's enough to maintain this [remaining] scalar iterations counter
between loops, thus after the vector loop do

  remain_scalar_iter -= vector_iters * vf;

etc., this should make it possible to do some first order cleanups,
avoiding some repeated computations.  It does involve placing
additional PHIs for this remain_scalar_iter var of course (I'd be
hesitant to rely on the SSA renamer for this due to its expense).

I think that for all later jump-around tests tracking remaining
scalar iters is more convenient than tracking performed scalar iters.

> I'll go hack around and keep you posted on progress.

Thanks - it's an iffy area ...
Richard.
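As a scalar sketch of the counter scheme described above (hypothetical code, not the vectorizer's actual IV handling): one remaining-scalar-iterations counter is decremented by vf per "vector" iteration, and the epilogue guard simply tests that counter instead of recomputing niters from n.

```cpp
#include <cassert>

// One remain_scalar_iter counter threaded through a "vector" main loop
// (vf scalar iterations per step) and the scalar epilogue.  After the
// main loop the counter already holds niters - vector_iters * vf, so the
// epilogue needs no fresh computation from n.
int
sum_with_counter (const int *a, int n)
{
  const int vf = 4;		// assumed vectorization factor
  int remain_scalar_iter = n;
  int i = 0, sum = 0;

  for (; remain_scalar_iter >= vf; remain_scalar_iter -= vf)
    for (int l = 0; l < vf; ++l)
      sum += a[i++];		// stands in for one vector iteration

  for (; remain_scalar_iter > 0; --remain_scalar_iter)
    sum += a[i++];		// scalar epilogue, guarded by the counter

  return sum;
}
```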


[committed] testsuite: Use noipa attribute instead of noinline, noclone

2021-06-16 Thread Jakub Jelinek via Gcc-patches
Hi!

I've noticed this test now on various arches sometimes FAILs, sometimes
PASSes (the line 12 test in particular).

The problem is that the a = 0; initialization in the caller no longer happens
before the f(&a) call, as what the argument points to is only used in
debug info.

Making the function noipa forces the caller to initialize it and still
tests what the test wants to test, namely that we don't consider *p as
valid location for the c variable at line 18 (after it has been overwritten
with *p = 1;).

Tested on x86_64-linux, committed to trunk as obvious.

Wonder if we shouldn't somehow mark PARM_DECLs that point (directly or
indirectly) into memory that IPA optimizations (modref?) decided is
unnecessary to initialize, to tell var-tracking that it is unsafe
to use MEMs based on those parameters in debug insns...

2021-06-16  Jakub Jelinek  

* gcc.dg/guality/pr49888.c (f): Use noipa attribute instead of
noinline, noclone.

--- gcc/testsuite/gcc.dg/guality/pr49888.c.jj   2020-01-14 20:02:47.308601970 
+0100
+++ gcc/testsuite/gcc.dg/guality/pr49888.c  2021-06-16 12:58:54.636184065 
+0200
@@ -4,7 +4,7 @@
 
 static int v __attribute__((used));
 
-static void __attribute__((noinline, noclone))
+static void __attribute__((noipa))
 f (int *p)
 {
   int c = *p;


Jakub



Re: [PATCH] tree-sra: Do not refresh readonly decls (PR 100453)

2021-06-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 15, 2021 at 06:11:27PM +0200, Richard Biener wrote:
> >--- a/gcc/tree-sra.c
> >+++ b/gcc/tree-sra.c
> >@@ -915,6 +915,12 @@ create_access (tree expr, gimple *stmt, bool
> >write)
> >if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID
> >(base)))
> > return NULL;
> > 
> >+  if (write && TREE_READONLY (base))
> >+{
> >+  disqualify_candidate (base, "Encountered a store to a read-only
> >decl.");

Wouldn't this be a useful point to also emit some warning (with
some TREE_NO_WARNING prevention) that some particular statement modifies
a const decl?
I guess it can be warned elsewhere though.
As testcases one could use 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100994#c4
and #c5.  Though it would be nice if we diagnosed that even without those
-fno-* options.

Jakub



Re: [PATCH V2] aarch64: Model zero-high-half semantics of XTN instruction in RTL

2021-06-16 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Version 2 of this patch adds tests to verify the benefit of this change.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-06-11  Jonathan Wright  
>
> * config/aarch64/aarch64-simd.md (aarch64_xtn_insn_le):
> Define - modeling zero-high-half semantics.
> (aarch64_xtn): Change to an expander that emits the
> appropriate instruction depending on endianness.
> (aarch64_xtn_insn_be): Define - modeling zero-high-half
> semantics.
> (aarch64_xtn2_le): Rename to...
> (aarch64_xtn2_insn_le): This.
> (aarch64_xtn2_be): Rename to...
> (aarch64_xtn2_insn_be): This.
> (vec_pack_trunc_): Emit truncation instruction instead
> of aarch64_xtn.
> * config/aarch64/iterators.md (Vnarrowd): Add Vnarrowd mode
> attribute iterator.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.

OK, thanks.
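As a portable illustration of the zero-high-half semantics the ChangeLog above refers to (plain C++ arrays standing in for Neon registers, which is an assumption, not the generated code): XTN truncates each wide lane into the low half of the destination and zeroes the high half.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Narrow four 32-bit lanes to 16 bits into the low half of an 8-lane
// result, leaving the high half zero -- the effect the new
// aarch64_xtn_insn_le/_be patterns describe explicitly, and what
// vcombine_u16 (vmovn_u32 (a), vdup_n_u16 (0)) writes with intrinsics.
std::array<std::uint16_t, 8>
xtn_zero_high (const std::array<std::uint32_t, 4> &a)
{
  std::array<std::uint16_t, 8> r {};	// value-init: high half is zero
  for (int i = 0; i < 4; ++i)
    r[i] = static_cast<std::uint16_t> (a[i]);	// truncating narrow
  return r;
}
```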

> From: Gcc-patches  
> on behalf of Jonathan Wright via Gcc-patches 
> Sent: 15 June 2021 10:45
> To: gcc-patches@gcc.gnu.org 
> Subject: [PATCH] aarch64: Model zero-high-half semantics of XTN instruction 
> in RTL
>
> Hi,
>
> Modeling the zero-high-half semantics of the XTN narrowing
> instruction in RTL indicates to the compiler that this is a totally
> destructive operation. This enables more RTL simplifications and also
> prevents some register allocation issues.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-06-11  Jonathan Wright  
>
> * config/aarch64/aarch64-simd.md (aarch64_xtn_insn_le):
> Define - modeling zero-high-half semantics.
> (aarch64_xtn): Change to an expander that emits the
> appropriate instruction depending on endianness.
> (aarch64_xtn_insn_be): Define - modeling zero-high-half
> semantics.
> (aarch64_xtn2_le): Rename to...
> (aarch64_xtn2_insn_le): This.
> (aarch64_xtn2_be): Rename to...
> (aarch64_xtn2_insn_be): This.
> (vec_pack_trunc_): Emit truncation instruction instead
> of aarch64_xtn.
> * config/aarch64/iterators.md (Vnarrowd): Add Vnarrowd mode
> attribute iterator.
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> e750faed1dbd940cdfa216d858b98f3bc25bba42..b23556b551cbbef420950007e9714acf190a534d
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1690,17 +1690,48 @@
>  
>  ;; Narrowing operations.
>  
> -;; For doubles.
> +(define_insn "aarch64_xtn_insn_le"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (truncate: (match_operand:VQN 1 "register_operand" "w"))
> +   (match_operand: 2 "aarch64_simd_or_scalar_imm_zero")))]
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "xtn\\t%0., %1."
> +  [(set_attr "type" "neon_move_narrow_q")]
> +)
>  
> -(define_insn "aarch64_xtn"
> -  [(set (match_operand: 0 "register_operand" "=w")
> - (truncate: (match_operand:VQN 1 "register_operand" "w")))]
> -  "TARGET_SIMD"
> +(define_insn "aarch64_xtn_insn_be"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (match_operand: 2 "aarch64_simd_or_scalar_imm_zero")
> +   (truncate: (match_operand:VQN 1 "register_operand" "w"]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
>"xtn\\t%0., %1."
>[(set_attr "type" "neon_move_narrow_q")]
>  )
>  
> -(define_insn "aarch64_xtn2_le"
> +(define_expand "aarch64_xtn"
> +  [(set (match_operand: 0 "register_operand")
> + (truncate: (match_operand:VQN 1 "register_operand")))]
> +  "TARGET_SIMD"
> +  {
> +rtx tmp = gen_reg_rtx (mode);
> +if (BYTES_BIG_ENDIAN)
> +  emit_insn (gen_aarch64_xtn_insn_be (tmp, operands[1],
> + CONST0_RTX (mode)));
> +else
> +  emit_insn (gen_aarch64_xtn_insn_le (tmp, operands[1],
> + CONST0_RTX (mode)));
> +
> +/* The intrinsic expects a narrow result, so emit a subreg that will get
> +   optimized away as appropriate.  */
> +emit_move_insn (operands[0], lowpart_subreg (mode, tmp,
> +  mode));
> +DONE;
> +  }
> +)
> +
> +(define_insn "aarch64_xtn2_insn_le"
>[(set (match_operand: 0 "register_operand" "=w")
>   (vec_concat:
> (match_operand: 1 "register_operand" "0")
> @@ -1710,7 +1741,7 @@
>[(set_attr "type" "neon_move_narrow_q")]
>  )
>  
> -(define_insn "aarch64_xtn2_be"
> +(define_insn "aarch64_xtn2_insn_be"
>[(set (match_operand: 0 "register_operand" "=w")
>   (vec_concat:
> (truncate: (match_operand:VQN 2 "register_operand" "w"))
> @@ -1727,15 +1758,17 @@
>"TARGET_SIMD"
>{
>  if (BYTES_BIG_ENDIAN)
> -  emit_insn (gen_aarch64_xtn2_be (operands[0], o

Re: [PATCH V2] aarch64: Model zero-high-half semantics of SQXTUN instruction in RTL

2021-06-16 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Version 2 of the patch adds tests to verify the benefit of this change.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
> gcc/ChangeLog:
>
> 2021-06-14  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Split generator
> for aarch64_sqmovun builtins into scalar and vector variants.
> * config/aarch64/aarch64-simd.md (aarch64_sqmovun):
> Split into scalar and vector variants. Change vector variant
> to an expander that emits the correct instruction depending
> on endianness.
> (aarch64_sqmovun_insn_le): Define.
> (aarch64_sqmovun_insn_be): Define.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.

OK, thanks.

> From: Gcc-patches  
> on behalf of Jonathan Wright via Gcc-patches 
> Sent: 15 June 2021 10:52
> To: gcc-patches@gcc.gnu.org 
> Subject: [PATCH] aarch64: Model zero-high-half semantics of SQXTUN 
> instruction in RTL
>
> Hi,
>
> As subject, this patch first splits the aarch64_sqmovun pattern
> into separate scalar and vector variants. It then further splits the vector
> pattern into big/little endian variants that model the zero-high-half
> semantics of the underlying instruction. Modeling these semantics
> allows for better RTL combinations while also removing some register
> allocation issues as the compiler now knows that the operation is
> totally destructive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-06-14  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Split generator
> for aarch64_sqmovun builtins into scalar and vector variants.
> * config/aarch64/aarch64-simd.md (aarch64_sqmovun):
> Split into scalar and vector variants. Change vector variant
> to an expander that emits the correct instruction depending
> on endianness.
> (aarch64_sqmovun_insn_le): Define.
> (aarch64_sqmovun_insn_be): Define.
>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 18baa6720b09b2ebda8577b809f8a8683f8b44f0..2adb4b127527794d19b2bbd4859f089d3da47763
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -263,7 +263,9 @@
>BUILTIN_VQ_HSI (TERNOP, smlal_hi_n, 0, NONE)
>BUILTIN_VQ_HSI (TERNOPU, umlal_hi_n, 0, NONE)
>  
> -  BUILTIN_VSQN_HSDI (UNOPUS, sqmovun, 0, NONE)
> +  /* Implemented by aarch64_sqmovun.  */
> +  BUILTIN_VQN (UNOPUS, sqmovun, 0, NONE)
> +  BUILTIN_SD_HSDI (UNOPUS, sqmovun, 0, NONE)
>  
>/* Implemented by aarch64_sqxtun2.  */
>BUILTIN_VQN (BINOP_UUS, sqxtun2, 0, NONE)
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> b23556b551cbbef420950007e9714acf190a534d..59779b851fbeecb17cd2cddbb0ed8770a22762b5
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4870,17 +4870,6 @@
>[(set_attr "type" "neon_qadd")]
>  )
>  
> -;; sqmovun
> -
> -(define_insn "aarch64_sqmovun"
> -  [(set (match_operand: 0 "register_operand" "=w")
> - (unspec: [(match_operand:VSQN_HSDI 1 "register_operand" "w")]
> -UNSPEC_SQXTUN))]
> -   "TARGET_SIMD"
> -   "sqxtun\\t%0, %1"
> -   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
> -)
> -
>  ;; sqmovn and uqmovn
>  
>  (define_insn "aarch64_qmovn"
> @@ -4931,6 +4920,61 @@
>}
>  )
>  
> +;; sqmovun
> +
> +(define_insn "aarch64_sqmovun"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (unspec: [(match_operand:SD_HSDI 1 "register_operand" "w")]
> +UNSPEC_SQXTUN))]
> +   "TARGET_SIMD"
> +   "sqxtun\\t%0, %1"
> +   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
> +)
> +
> +(define_insn "aarch64_sqmovun_insn_le"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (unspec: [(match_operand:VQN 1 "register_operand" "w")]
> +  UNSPEC_SQXTUN)
> +   (match_operand: 2 "aarch64_simd_or_scalar_imm_zero")))]
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "sqxtun\\t%0, %1"
> +  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
> +)
> +
> +(define_insn "aarch64_sqmovun_insn_be"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (match_operand: 2 "aarch64_simd_or_scalar_imm_zero")
> +   (unspec: [(match_operand:VQN 1 "register_operand" "w")]
> +  UNSPEC_SQXTUN)))]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
> +  "sqxtun\\t%0, %1"
> +  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
> +)
> +
> +(define_expand "aarch64_sqmovun"
> +  [(set (match_operand: 0 "register_operand")
> + (unspec: [(match_operand:VQN 1 "register_operand")]
> +UNSPEC_SQXTUN))]
> +  "TARGET_SIMD"
> +  

Re: [PATCH V2] aarch64: Model zero-high-half semantics of [SU]QXTN instructions

2021-06-16 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Version 2 of the patch adds tests to verify the benefit of this change.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-06-14  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Split generator
> for aarch64_qmovn builtins into scalar and vector
> variants.
> * config/aarch64/aarch64-simd.md (aarch64_qmovn_insn_le):
> Define.
> (aarch64_qmovn_insn_be): Define.
> (aarch64_qmovn): Split into scalar and vector
> variants. Change vector variant to an expander that emits the
> correct instruction depending on endianness.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.

OK, thanks.

> From: Gcc-patches  
> on behalf of Jonathan Wright via Gcc-patches 
> Sent: 15 June 2021 10:59
> To: gcc-patches@gcc.gnu.org 
> Subject: [PATCH] aarch64: Model zero-high-half semantics of [SU]QXTN 
> instructions
>
> Hi,
>
> As subject, this patch first splits the aarch64_qmovn
> pattern into separate scalar and vector variants. It then further splits
> the vector RTL pattern into big/little endian variants that model the
> zero-high-half semantics of the underlying instruction. Modeling
> these semantics allows for better RTL combinations while also
> removing some register allocation issues as the compiler now knows
> that the operation is totally destructive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-06-14  Jonathan Wright  
>
> * config/aarch64/aarch64-simd-builtins.def: Split generator
> for aarch64_qmovn builtins into scalar and vector
> variants.
> * config/aarch64/aarch64-simd.md (aarch64_qmovn_insn_le):
> Define.
> (aarch64_qmovn_insn_be): Define.
> (aarch64_qmovn): Split into scalar and vector
> variants. Change vector variant to an expander that emits the
> correct instruction depending on endianness.
>
> diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def 
> b/gcc/config/aarch64/aarch64-simd-builtins.def
> index 
> 2adb4b127527794d19b2bbd4859f089d3da47763..ac5d4fc7ff1e61d404e66193b629986382ee4ffd
>  100644
> --- a/gcc/config/aarch64/aarch64-simd-builtins.def
> +++ b/gcc/config/aarch64/aarch64-simd-builtins.def
> @@ -271,8 +271,10 @@
>BUILTIN_VQN (BINOP_UUS, sqxtun2, 0, NONE)
>  
>/* Implemented by aarch64_qmovn.  */
> -  BUILTIN_VSQN_HSDI (UNOP, sqmovn, 0, NONE)
> -  BUILTIN_VSQN_HSDI (UNOP, uqmovn, 0, NONE)
> +  BUILTIN_VQN (UNOP, sqmovn, 0, NONE)
> +  BUILTIN_SD_HSDI (UNOP, sqmovn, 0, NONE)
> +  BUILTIN_VQN (UNOP, uqmovn, 0, NONE)
> +  BUILTIN_SD_HSDI (UNOP, uqmovn, 0, NONE)
>  
>/* Implemented by aarch64_qxtn2.  */
>BUILTIN_VQN (BINOP, sqxtn2, 0, NONE)
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 59779b851fbeecb17cd2cddbb0ed8770a22762b5..2b75e57eb77a0dea449f2c13bd77a88f48c4cea5
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4875,10 +4875,54 @@
>  (define_insn "aarch64_qmovn"
>[(set (match_operand: 0 "register_operand" "=w")
>   (SAT_TRUNC:
> -(match_operand:VSQN_HSDI 1 "register_operand" "w")))]
> +   (match_operand:SD_HSDI 1 "register_operand" "w")))]
>"TARGET_SIMD"
>"qxtn\\t%0, %1"
> -   [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
> +  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
> +)
> +
> +(define_insn "aarch64_qmovn_insn_le"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (SAT_TRUNC:
> + (match_operand:VQN 1 "register_operand" "w"))
> +   (match_operand: 2 "aarch64_simd_or_scalar_imm_zero")))]
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "qxtn\\t%0, %1"
> +  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
> +)
> +
> +(define_insn "aarch64_qmovn_insn_be"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (match_operand: 2 "aarch64_simd_or_scalar_imm_zero")
> +   (SAT_TRUNC:
> + (match_operand:VQN 1 "register_operand" "w"]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
> +  "qxtn\\t%0, %1"
> +  [(set_attr "type" "neon_sat_shift_imm_narrow_q")]
> +)
> +
> +(define_expand "aarch64_qmovn"
> +  [(set (match_operand: 0 "register_operand")
> + (SAT_TRUNC:
> +   (match_operand:VQN 1 "register_operand")))]
> +  "TARGET_SIMD"
> +  {
> +rtx tmp = gen_reg_rtx (mode);
> +if (BYTES_BIG_ENDIAN)
> +  emit_insn (gen_aarch64_qmovn_insn_be (tmp, operands[1],
> + CONST0_RTX (mode)));
> +else
> +  emit_insn (gen_aarch64_qmovn_insn_le (tmp, operands[1],
> + CONST0_RTX (mode)));
> +
> +/* The intrinsic expects a narrow result, so emit a subreg that will get
> +   optimized away as appropriate.  */
> + 

Re: [PATCH V2] aarch64: Model zero-high-half semantics of ADDHN/SUBHN instructions

2021-06-16 Thread Richard Sandiford via Gcc-patches
Jonathan Wright  writes:
> Hi,
>
> Version 2 of this patch adds tests to verify the benefit of this change.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-06-14  Jonathan Wright  
>
> * config/aarch64/aarch64-simd.md (aarch64_hn):
> Change to an expander that emits the correct instruction
> depending on endianness.
> (aarch64_hn_insn_le): Define.
> (aarch64_hn_insn_be): Define.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/narrow_zero_high_half.c: Add new tests.

OK, thanks.

> From: Gcc-patches  
> on behalf of Jonathan Wright via Gcc-patches 
> Sent: 15 June 2021 11:02
> To: gcc-patches@gcc.gnu.org 
> Subject: [PATCH] aarch64: Model zero-high-half semantics of ADDHN/SUBHN 
> instructions
>
> Hi,
>
> As subject, this patch models the zero-high-half semantics of the
> narrowing arithmetic Neon instructions in the
> aarch64_hn RTL pattern. Modeling these
> semantics allows for better RTL combinations while also removing
> some register allocation issues as the compiler now knows that the
> operation is totally destructive.
>
> Regression tested and bootstrapped on aarch64-none-linux-gnu - no
> issues.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-06-14  Jonathan Wright  
>
> * config/aarch64/aarch64-simd.md (aarch64_hn):
> Change to an expander that emits the correct instruction
> depending on endianness.
> (aarch64_hn_insn_le): Define.
> (aarch64_hn_insn_be): Define.
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> 2b75e57eb77a0dea449f2c13bd77a88f48c4cea5..540244cf0a919b3ea1d6ebf5929be50fed395179
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4661,16 +4661,53 @@
>  
>  ;; hn.
>  
> -(define_insn "aarch64_hn"
> -  [(set (match_operand: 0 "register_operand" "=w")
> -(unspec: [(match_operand:VQN 1 "register_operand" "w")
> - (match_operand:VQN 2 "register_operand" "w")]
> -   ADDSUBHN))]
> -  "TARGET_SIMD"
> +(define_insn "aarch64_hn_insn_le"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (unspec: [(match_operand:VQN 1 "register_operand" "w")
> +   (match_operand:VQN 2 "register_operand" "w")]
> +  ADDSUBHN)
> +   (match_operand: 3 "aarch64_simd_or_scalar_imm_zero")))]
> +  "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> +  "hn\\t%0., %1., %2."
> +  [(set_attr "type" "neon__halve_narrow_q")]
> +)
> +
> +(define_insn "aarch64_hn_insn_be"
> +  [(set (match_operand: 0 "register_operand" "=w")
> + (vec_concat:
> +   (match_operand: 3 "aarch64_simd_or_scalar_imm_zero")
> +   (unspec: [(match_operand:VQN 1 "register_operand" "w")
> +   (match_operand:VQN 2 "register_operand" "w")]
> +  ADDSUBHN)))]
> +  "TARGET_SIMD && BYTES_BIG_ENDIAN"
>"hn\\t%0., %1., %2."
>[(set_attr "type" "neon__halve_narrow_q")]
>  )
>  
> +(define_expand "aarch64_hn"
> +  [(set (match_operand: 0 "register_operand")
> + (unspec: [(match_operand:VQN 1 "register_operand")
> + (match_operand:VQN 2 "register_operand")]
> +ADDSUBHN))]
> +  "TARGET_SIMD"
> +  {
> +rtx tmp = gen_reg_rtx (mode);
> +if (BYTES_BIG_ENDIAN)
> +  emit_insn (gen_aarch64_hn_insn_be (tmp, operands[1],
> + operands[2], CONST0_RTX (mode)));
> +else
> +  emit_insn (gen_aarch64_hn_insn_le (tmp, operands[1],
> + operands[2], CONST0_RTX (mode)));
> +
> +/* The intrinsic expects a narrow result, so emit a subreg that will get
> +   optimized away as appropriate.  */
> +emit_move_insn (operands[0], lowpart_subreg (mode, tmp,
> +  mode));
> +DONE;
> +  }
> +)
> +
>  (define_insn "aarch64_hn2_insn_le"
>[(set (match_operand: 0 "register_operand" "=w")
>   (vec_concat:
> diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c 
> b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> index 
> 3061c15eb8aa6fe30a509cd46b959cf44edcdb73..97342de58bb5586a8317f1b4c92dcb9d6db83733
>  100644
> --- a/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> +++ b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> @@ -74,6 +74,42 @@ TEST_UNARY (vqmovn, uint8x16_t, uint16x8_t, u16, u8)
>  TEST_UNARY (vqmovn, uint16x8_t, uint32x4_t, u32, u16)
>  TEST_UNARY (vqmovn, uint32x4_t, uint64x2_t, u64, u32)
>  
> +#define TEST_ARITH(name, rettype, intype, fs, rs) \
> +  rettype test_ ## name ## _ ## fs ## _zero_high \
> + (intype a, intype b) \
> + { \
> + return vcombine_ ## rs (name ## _ ## fs (a, b), \
> + vdup_n_ ## rs (0)); \
> + }
> +
> +TEST_ARITH (vad

Re: [PATCH] make rich_location safe to copy

2021-06-16 Thread David Malcolm via Gcc-patches
On Tue, 2021-06-15 at 19:48 -0600, Martin Sebor wrote:

Thanks for writing the patch.

> While debugging locations I noticed the semi_embedded_vec template
> in line-map.h doesn't declare a copy ctor or copy assignment, but
> is being copied in a couple of places in the C++ parser (via
> gcc_rich_location).  It gets away with it most likely because it
> never grows beyond the embedded buffer.

Where are these places?  I wasn't aware of this.

> 
> The attached patch defines the copy ctor and also copy assignment
> and adds the corresponding move functions.

Note that rich_location::m_fixit_hints "owns" the fixit_hint instances,
manually deleting them in rich_location's dtor, so simply doing a
shallow copy of it would be wrong.

Also, a rich_location stores other pointers (to range_labels and
diagnostic_path), which are borrowed pointers, where their lifetime is
assumed to outlive any (non-dtor) calls to the rich_location.  So I'm
nervous about code that copies rich_location instances.

I think I'd prefer to forbid copying them; what's the use-case for
copying them?  Am I missing something here?

> 
> Tested on x86_64-linux.
> 
> Martin

Thanks
Dave



Re: [PATCH] testsuite: aarch64: Add zero-high-half tests for narrowing shifts

2021-06-16 Thread Richard Sandiford via Gcc-patches
Jonathan Wright via Gcc-patches  writes:
> Hi,
>
> This patch adds tests to verify that Neon narrowing-shift instructions
> clear the top half of the result vector. It is sufficient to show that a
> subsequent combine with a zero-vector is optimized away - leaving
> just the narrowing-shift instruction.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/testsuite/ChangeLog:
>
> 2021-06-15  Jonathan Wright  
>
>   * gcc.target/aarch64/narrow_zero_high_half.c: New test.
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c 
> b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> new file mode 100644
> index 
> ..27fa0e640ab2b37781376c40ce4ca37602c72393
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/narrow_zero_high_half.c
> @@ -0,0 +1,60 @@
> +/* { dg-skip-if "" { arm*-*-* } } */
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#include 
> +
> +#define TEST_SHIFT(name, rettype, intype, fs, rs) \
> +  rettype test_ ## name ## _ ## fs ## _zero_high \
> + (intype a) \
> + { \
> + return vcombine_ ## rs (name ## _ ## fs (a, 4), \
> + vdup_n_ ## rs (0)); \
> + }
> +
> +TEST_SHIFT (vshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vrshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vrshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vrshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vrshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vrshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vrshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vqshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vqshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vqshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vqshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vqshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqrshrn_n, int8x16_t, int16x8_t, s16, s8)
> +TEST_SHIFT (vqrshrn_n, int16x8_t, int32x4_t, s32, s16)
> +TEST_SHIFT (vqrshrn_n, int32x4_t, int64x2_t, s64, s32)
> +TEST_SHIFT (vqrshrn_n, uint8x16_t, uint16x8_t, u16, u8)
> +TEST_SHIFT (vqrshrn_n, uint16x8_t, uint32x4_t, u32, u16)
> +TEST_SHIFT (vqrshrn_n, uint32x4_t, uint64x2_t, u64, u32)
> +
> +TEST_SHIFT (vqshrun_n, uint8x16_t, int16x8_t, s16, u8)
> +TEST_SHIFT (vqshrun_n, uint16x8_t, int32x4_t, s32, u16)
> +TEST_SHIFT (vqshrun_n, uint32x4_t, int64x2_t, s64, u32)
> +
> +TEST_SHIFT (vqrshrun_n, uint8x16_t, int16x8_t, s16, u8)
> +TEST_SHIFT (vqrshrun_n, uint16x8_t, int32x4_t, s32, u16)
> +TEST_SHIFT (vqrshrun_n, uint32x4_t, int64x2_t, s64, u32)
> +
> +/* { dg-final { scan-assembler-not "dup\\t" } } */
> +
> +/* { dg-final { scan-assembler-times "\\trshrn\\tv" 6} }  */
> +/* { dg-final { scan-assembler-times "\\tshrn\\tv" 6} }  */
> +/* { dg-final { scan-assembler-times "\\tsqshrun\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tsqrshrun\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tsqshrn\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tuqshrn\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tsqrshrn\\tv" 3} }  */
> +/* { dg-final { scan-assembler-times "\\tuqrshrn\\tv" 3} }  */

Very minor, but it would be good to keep the scans in the same
order as the functions, to make comparisons easier.

OK with or without that change, thanks.

Richard


[PATCH] Vectorization of BB reductions

2021-06-16 Thread Richard Biener
This adds a simple reduction vectorization capability to the
non-loop vectorizer.  Simple meaning it lacks any of the fancy
ways to generate the reduction epilogue but only supports
those we can handle via a direct internal function reducing
a vector to a scalar.  One of the main reasons is to avoid
massive refactoring at this point but also that more complex
epilogue operations are hardly profitable.

Mixed-sign reductions are fended off for now, and I'm not finally
settled on whether we want an explicit SLP node for the
reduction epilogue operation.  Handling mixed signs could be
done by multiplying with a { 1, -1, .. } vector.  Also fended
off are reductions with non-internal operands (constants
or register parameters, for example).

Costing is done by accounting the original scalar participating
stmts for the scalar cost and log2 permutes and operations for
the vectorized epilogue.

--

SPEC CPU 2017 FP with rate workload measurements show (picked
fastest runs of three) regressions for 507.cactuBSSN_r (1.5%),
508.namd_r (2.5%), 511.povray_r (2.5%), 526.blender_r (0.5%) and
527.cam4_r (2.5%) and improvements for 510.parest_r (5%) and
538.imagick_r (1.5%).  This is with -Ofast -march=znver2 on a Zen2.

Statistics on CPU 2017 shows that the overwhelming number of seeds
we find are reductions of two lanes (well - that's basically every
associative operation).  That means we put a quite high pressure
on the SLP discovery process this way.

In total we find 583218 seeds we put to SLP discovery out of which
66205 pass that and only 6185 of those make it through
code generation checks. 796 of those are discarded because the reduction
is part of a larger SLP instance.  4195 of the remaining
are deemed not profitable to vectorize and 1194 are finally
vectorized.  That's a poor 0.2% rate.

Of the 583218 seeds 486826 (83%) have two lanes, 60912 have three (10%),
28181 four (5%), 4808 five, 909 six and there are instances up to 120
lanes.

There's a set of 54086 candidate seeds we reject because
they contain a constant or invariant (not implemented yet) but still
have two or more lanes that could be put to SLP discovery.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
built and tested SPEC CPU 2017 with -Ofast -march=znver2 successfully.

I do think this is good enough(TM) for this point, please speak up
if you disagree and/or like to see changes.

Thanks,
Richard.

2021-06-16  Richard Biener   

PR tree-optimization/54400
* tree-vectorizer.h (enum slp_instance_kind): Add
slp_inst_kind_bb_reduc.
(reduction_fn_for_scalar_code): Declare.
* tree-vect-data-refs.c (vect_slp_analyze_instance_dependence):
Check SLP_INSTANCE_KIND instead of looking at the
representative.
(vect_slp_analyze_instance_alignment): Likewise.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Export.
* tree-vect-slp.c (vect_slp_linearize_chain): Split out
chain linearization from vect_build_slp_tree_2 and generalize
for the use of BB reduction vectorization.
(vect_build_slp_tree_2): Adjust accordingly.
(vect_optimize_slp): Elide permutes at the root of BB reduction
instances.
(vectorizable_bb_reduc_epilogue): New function.
(vect_slp_prune_covered_roots): Likewise.
(vect_slp_analyze_operations): Use them.
(vect_slp_check_for_constructors): Recognize associatable
chains for BB reduction vectorization.
(vectorize_slp_instance_root_stmt): Generate code for the
BB reduction epilogue.

* gcc.dg/vect/bb-slp-pr54400.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c |  43 +++
 gcc/tree-vect-data-refs.c  |   9 +-
 gcc/tree-vect-loop.c   |   2 +-
 gcc/tree-vect-slp.c| 383 +
 gcc/tree-vectorizer.h  |   2 +
 5 files changed, 367 insertions(+), 72 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c 
b/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c
new file mode 100644
index 000..6b427aac774
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c
@@ -0,0 +1,43 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_float} */
+/* { dg-additional-options "-w -Wno-psabi -ffast-math" } */
+
+#include "tree-vect.h"
+
+typedef float v4sf __attribute__((vector_size(sizeof(float)*4)));
+
+float __attribute__((noipa))
+f(v4sf v)
+{
+  return v[0]+v[1]+v[2]+v[3];
+}
+
+float __attribute__((noipa))
+g(float *v)
+{
+  return v[0]+v[1]+v[2]+v[3];
+}
+
+float __attribute__((noipa))
+h(float *v)
+{
+  return 2*v[0]+3*v[1]+4*v[2]+5*v[3];
+}
+
+int
+main ()
+{
+  check_vect ();
+  v4sf v = (v4sf) { 1.f, 3.f, 4.f, 2.f };
+  if (f (v) != 10.f)
+abort ();
+  if (g (&v[0]) != 10.f)
+abort ();
+  if (h (&v[0]) != 37.f)
+abort ();
+  return 0;
+}
+
+/* We are lacking an 

Re: [PATCH 5/6] make get_domminated_by_region return a auto_vec

2021-06-16 Thread Richard Sandiford via Gcc-patches
Richard Biener via Gcc-patches  writes:
> On Tue, Jun 15, 2021 at 8:02 AM Trevor Saunders  wrote:
>>
>> This makes it clear the caller owns the vector, and ensures it is cleaned up.
>>
>> Signed-off-by: Trevor Saunders 
>>
>> bootstrapped and regtested on x86_64-linux-gnu, ok?
>
> OK.
>
> Btw, are "standard API" returns places we can use 'auto'?  That would avoid
> excessive indent for
>
> -  dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
> -bbs.address (),
> -bbs.length ());
> +  auto_vec dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
> +  bbs.address (),
> +  bbs.length ());
>
> and just uses
>
>   auto dom_bbs = get_dominated_by_region (...
>
> Not asking you to do this, just a question for the audience.

Personally I think this would be surprising for something that doesn't
have copy semantics.  (Not that I'm trying to reopen that debate here :-)
FWIW, I agree not having copy semantics is probably the most practical
way forward for now.)

Thanks,
Richard

> Thanks,
> Richard.
>
>> gcc/ChangeLog:
>>
>> * dominance.c (get_dominated_by_region): Return 
>> auto_vec.
>> * dominance.h (get_dominated_by_region): Likewise.
>> * tree-cfg.c (gimple_duplicate_sese_region): Adjust.
>> (gimple_duplicate_sese_tail): Likewise.
>> (move_sese_region_to_fn): Likewise.
>> ---
>>  gcc/dominance.c |  4 ++--
>>  gcc/dominance.h |  2 +-
>>  gcc/tree-cfg.c  | 18 +++---
>>  3 files changed, 10 insertions(+), 14 deletions(-)
>>
>> diff --git a/gcc/dominance.c b/gcc/dominance.c
>> index 0e464cb7282..4943102ff1d 100644
>> --- a/gcc/dominance.c
>> +++ b/gcc/dominance.c
>> @@ -906,13 +906,13 @@ get_dominated_by (enum cdi_direction dir, basic_block 
>> bb)
>> direction DIR) by some block between N_REGION ones stored in REGION,
>> except for blocks in the REGION itself.  */
>>
>> -vec
>> +auto_vec
>>  get_dominated_by_region (enum cdi_direction dir, basic_block *region,
>>  unsigned n_region)
>>  {
>>unsigned i;
>>basic_block dom;
>> -  vec doms = vNULL;
>> +  auto_vec doms;
>>
>>for (i = 0; i < n_region; i++)
>>  region[i]->flags |= BB_DUPLICATED;
>> diff --git a/gcc/dominance.h b/gcc/dominance.h
>> index 515a369aacf..c74ad297c6a 100644
>> --- a/gcc/dominance.h
>> +++ b/gcc/dominance.h
>> @@ -47,7 +47,7 @@ extern basic_block get_immediate_dominator (enum 
>> cdi_direction, basic_block);
>>  extern void set_immediate_dominator (enum cdi_direction, basic_block,
>>  basic_block);
>>  extern auto_vec get_dominated_by (enum cdi_direction, 
>> basic_block);
>> -extern vec get_dominated_by_region (enum cdi_direction,
>> +extern auto_vec get_dominated_by_region (enum cdi_direction,
>>  basic_block *,
>>  unsigned);
>>  extern vec get_dominated_to_depth (enum cdi_direction,
>> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
>> index 6bdd1a561fd..c9403deed19 100644
>> --- a/gcc/tree-cfg.c
>> +++ b/gcc/tree-cfg.c
>> @@ -6495,7 +6495,6 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>>bool free_region_copy = false, copying_header = false;
>>class loop *loop = entry->dest->loop_father;
>>edge exit_copy;
>> -  vec doms = vNULL;
>>edge redirected;
>>profile_count total_count = profile_count::uninitialized ();
>>profile_count entry_count = profile_count::uninitialized ();
>> @@ -6549,9 +6548,9 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>>
>>/* Record blocks outside the region that are dominated by something
>>   inside.  */
>> +  auto_vec doms;
>>if (update_dominance)
>>  {
>> -  doms.create (0);
>>doms = get_dominated_by_region (CDI_DOMINATORS, region, n_region);
>>  }
>>
>> @@ -6596,7 +6595,6 @@ gimple_duplicate_sese_region (edge entry, edge exit,
>>set_immediate_dominator (CDI_DOMINATORS, entry->dest, entry->src);
>>doms.safe_push (get_bb_original (entry->dest));
>>iterate_fix_dominators (CDI_DOMINATORS, doms, false);
>> -  doms.release ();
>>  }
>>
>>/* Add the other PHI node arguments.  */
>> @@ -6662,7 +6660,6 @@ gimple_duplicate_sese_tail (edge entry, edge exit,
>>class loop *loop = exit->dest->loop_father;
>>class loop *orig_loop = entry->dest->loop_father;
>>basic_block switch_bb, entry_bb, nentry_bb;
>> -  vec doms;
>>profile_count total_count = profile_count::uninitialized (),
>> exit_count = profile_count::uninitialized ();
>>edge exits[2], nexits[2], e;
>> @@ -6705,7 +6702,8 @@ gimple_duplicate_sese_tail (edge entry, edge exit,
>>
>>/* Record blocks outside the region that are dominated by something
>>   inside.  */
>> -  doms = get_dominated_by_region 

Re: [PATCH] Vectorization of BB reductions

2021-06-16 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This adds a simple reduction vectorization capability to the
> non-loop vectorizer.  Simple meaning it lacks any of the fancy
> ways to generate the reduction epilogue but only supports
> those we can handle via a direct internal function reducing
> a vector to a scalar.  One of the main reasons is to avoid
> massive refactoring at this point but also that more complex
> epilogue operations are hardly profitable.
>
> Mixed-sign reductions are fended off for now, and I'm not finally
> settled on whether we want an explicit SLP node for the
> reduction epilogue operation.  Handling mixed signs could be
> done by multiplying with a { 1, -1, .. } vector.  Also fended
> off are reductions with non-internal operands (constants
> or register parameters, for example).
>
> Costing is done by accounting the original scalar participating
> stmts for the scalar cost and log2 permutes and operations for
> the vectorized epilogue.

It would be good if we have had a standard way of asking for this
cost for both loops and SLP, perhaps based on the internal function.
E.g. for aarch64 we have a cost table that gives a more precise cost
(and log2 of the scalar op isn't always it :-)).

I don't have any specific suggestion how though.  And I guess it
can be a follow-on patch anyway.

> SPEC CPU 2017 FP with rate workload measurements show (picked
> fastest runs of three) regressions for 507.cactuBSSN_r (1.5%),
> 508.namd_r (2.5%), 511.povray_r (2.5%), 526.blender_r (0.5%) and
> 527.cam4_r (2.5%) and improvements for 510.parest_r (5%) and
> 538.imagick_r (1.5%).  This is with -Ofast -march=znver2 on a Zen2.
>
> Statistics on CPU 2017 shows that the overwhelming number of seeds
> we find are reductions of two lanes (well - that's basically every
> associative operation).  That means we put a quite high pressure
> on the SLP discovery process this way.
>
> In total we find 583218 seeds we put to SLP discovery out of which
> 66205 pass that and only 6185 of those make it through
> code generation checks. 796 of those are discarded because the reduction
> is part of a larger SLP instance.  4195 of the remaining
> are deemed not profitable to vectorize and 1194 are finally
> vectorized.  That's a poor 0.2% rate.

Oof.

> Of the 583218 seeds 486826 (83%) have two lanes, 60912 have three (10%),
> 28181 four (5%), 4808 five, 909 six and there are instances up to 120
> lanes.
>
> There's a set of 54086 candidate seeds we reject because
> they contain a constant or invariant (not implemented yet) but still
> have two or more lanes that could be put to SLP discovery.

It looks like the patch doesn't explicitly forbid 2-element reductions
and instead relies on the cost model.  Is that right?

> Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
> built and tested SPEC CPU 2017 with -Ofast -march=znver2 successfully.
>
> I do think this is good enough(TM) for this point, please speak up
> if you disagree and/or like to see changes.

No objection from me FWIW.  Looks like a nice feature :-)

Thanks,
Richard

>
> Thanks,
> Richard.
>
> 2021-06-16  Richard Biener   
>
>   PR tree-optimization/54400
>   * tree-vectorizer.h (enum slp_instance_kind): Add
>   slp_inst_kind_bb_reduc.
>   (reduction_fn_for_scalar_code): Declare.
>   * tree-vect-data-refs.c (vect_slp_analyze_instance_dependence):
>   Check SLP_INSTANCE_KIND instead of looking at the
>   representative.
>   (vect_slp_analyze_instance_alignment): Likewise.
>   * tree-vect-loop.c (reduction_fn_for_scalar_code): Export.
>   * tree-vect-slp.c (vect_slp_linearize_chain): Split out
>   chain linearization from vect_build_slp_tree_2 and generalize
>   for the use of BB reduction vectorization.
>   (vect_build_slp_tree_2): Adjust accordingly.
>   (vect_optimize_slp): Elide permutes at the root of BB reduction
>   instances.
>   (vectorizable_bb_reduc_epilogue): New function.
>   (vect_slp_prune_covered_roots): Likewise.
>   (vect_slp_analyze_operations): Use them.
>   (vect_slp_check_for_constructors): Recognize associatable
>   chains for BB reduction vectorization.
>   (vectorize_slp_instance_root_stmt): Generate code for the
>   BB reduction epilogue.
>
>   * gcc.dg/vect/bb-slp-pr54400.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c |  43 +++
>  gcc/tree-vect-data-refs.c  |   9 +-
>  gcc/tree-vect-loop.c   |   2 +-
>  gcc/tree-vect-slp.c| 383 +
>  gcc/tree-vectorizer.h  |   2 +
>  5 files changed, 367 insertions(+), 72 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-pr54400.c


Re: [committed] libstdc++: Make ranges CPOs final and not addressable

2021-06-16 Thread Jonathan Wakely via Gcc-patches
On Tue, 15 Jun 2021 at 22:33, Jonathan Wakely wrote:
>
> On Tue, 15 Jun 2021 at 21:32, Tim Song wrote:
> >
> > CPOs are specified as actual semiregular function objects that can be
> > copied and constructed freely, so it seems a bit hostile to make them
> > final/non-addressable? (It's debatable whether the type of a CPO is a
> > type "specified in the C++ standard library" for which [derivation]/4
> > would apply.)
>
> I noticed that libstdc++ was failing some libc++ tests, but that was
> only for ranges::advance etc and not the CPOs. I guess I got a bit
> carried away, and it shouldn't apply to the CPOs, only the
> [range.iter.ops] "function templates" (which are not really function
> templates).

This reverts the changes to the [range.access] CPOs, and improves some
tests slightly.

Tested powerpc64le-linux. Pushed to trunk.
commit b9e35ee6d64bc9f82b8fe641aa8ac12a9e259fe8
Author: Jonathan Wakely 
Date:   Wed Jun 16 12:34:52 2021

libstdc++: Revert final/non-addressable changes to ranges CPOs

In r12-1489-g8b93548778a487f31f21e0c6afe7e0bde9711fc4 I made the
[range.access] CPO types final and non-addressable. Tim Song pointed out
this is wrong. Only the [range.iter.ops] functions should be final and
non-addressable. Revert the changes to the [range.access] objects.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/ranges_base.h (ranges::begin, ranges::end)
(ranges::cbegin, ranges::cend, ranges::rbegin, ranges::rend)
(ranges::crbegin, ranges::crend, ranges::size, ranges::ssize)
(ranges::empty, ranges::data, ranges::cdata): Remove final
keywords and deleted operator& overloads.
* testsuite/24_iterators/customization_points/iter_move.cc: Use
new is_customization_point_object function.
* testsuite/24_iterators/customization_points/iter_swap.cc:
Likewise.
* testsuite/std/concepts/concepts.lang/concept.swappable/swap.cc:
Likewise.
* testsuite/std/ranges/access/begin.cc: Likewise.
* testsuite/std/ranges/access/cbegin.cc: Likewise.
* testsuite/std/ranges/access/cdata.cc: Likewise.
* testsuite/std/ranges/access/cend.cc: Likewise.
* testsuite/std/ranges/access/crbegin.cc: Likewise.
* testsuite/std/ranges/access/crend.cc: Likewise.
* testsuite/std/ranges/access/data.cc: Likewise.
* testsuite/std/ranges/access/empty.cc: Likewise.
* testsuite/std/ranges/access/end.cc: Likewise.
* testsuite/std/ranges/access/rbegin.cc: Likewise.
* testsuite/std/ranges/access/rend.cc: Likewise.
* testsuite/std/ranges/access/size.cc: Likewise.
* testsuite/std/ranges/access/ssize.cc: Likewise.
* testsuite/util/testsuite_iterators.h
(is_customization_point_object): New function.

diff --git a/libstdc++-v3/include/bits/ranges_base.h 
b/libstdc++-v3/include/bits/ranges_base.h
index e392c370fcd..25af4b742a6 100644
--- a/libstdc++-v3/include/bits/ranges_base.h
+++ b/libstdc++-v3/include/bits/ranges_base.h
@@ -91,7 +91,7 @@ namespace ranges
 using std::ranges::__detail::__maybe_borrowed_range;
 using std::__detail::__range_iter_t;
 
-struct _Begin final
+struct _Begin
 {
 private:
   template
@@ -106,8 +106,6 @@ namespace ranges
return noexcept(__decay_copy(begin(std::declval<_Tp&>(;
}
 
-  void operator&() const = delete;
-
 public:
   template<__maybe_borrowed_range _Tp>
requires is_array_v> || __member_begin<_Tp>
@@ -144,7 +142,7 @@ namespace ranges
  { __decay_copy(end(__t)) } -> sentinel_for<__range_iter_t<_Tp>>;
};
 
-struct _End final
+struct _End
 {
 private:
   template
@@ -159,8 +157,6 @@ namespace ranges
return noexcept(__decay_copy(end(std::declval<_Tp&>(;
}
 
-  void operator&() const = delete;
-
 public:
   template<__maybe_borrowed_range _Tp>
requires is_bounded_array_v>
@@ -193,7 +189,7 @@ namespace ranges
  return static_cast(__t);
   }
 
-struct _CBegin final
+struct _CBegin
 {
   template
constexpr auto
@@ -203,8 +199,6 @@ namespace ranges
{
  return _Begin{}(__cust_access::__as_const<_Tp>(__e));
}
-
-  void operator&() const = delete;
 };
 
 struct _CEnd final
@@ -217,8 +211,6 @@ namespace ranges
{
  return _End{}(__cust_access::__as_const<_Tp>(__e));
}
-
-  void operator&() const = delete;
 };
 
 template
@@ -244,7 +236,7 @@ namespace ranges
  { _End{}(__t) } -> same_as;
};
 
-struct _RBegin final
+struct _RBegin
 {
 private:
   template
@@ -268,8 +260,6 @@ namespace ranges
}
}
 
-  void operator&() const = delete;
-
 public:
  

Re: [committed] libstdc++: Use function object for __decay_copy helper

2021-06-16 Thread Jonathan Wakely via Gcc-patches
On Tue, 15 Jun 2021 at 19:28, Jonathan Wakely wrote:
>
> By changing __cust_access::__decay_copy from a function template to a
> function object we avoid ADL. That means it's fine to call it
> unqualified (the compiler won't waste time doing ADL in associated
> namespaces, and won't try to complete associated types).
>
> This also makes some other minor simplications to other concepts for the
> [range.access] CPOs.
>
> Signed-off-by: Jonathan Wakely 
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/iterator_concepts.h (__cust_access::__decay_copy):
> Replace with function object.
> (__cust_access::__member_begin, ___cust_access::_adl_begin): Use
> __decay_copy unqualified.
> * include/bits/ranges_base.h (__member_end, __adl_end):
> Likewise. Use __range_iter_t for type of ranges::begin.
> (__member_rend): Use correct value category for rbegin argument.
> (__member_data): Use __decay_copy unqualified.
> (__begin_data): Use __range_iter_t for type of ranges::begin.

That change makes it impossible to import the header in a module.
Fixed by this patch.

Tested powerpc64le-linux. Pushed to trunk.
commit c25e3bf87975280a603ff18fba387c6707ce4a95
Author: Jonathan Wakely 
Date:   Wed Jun 16 12:47:32 2021

libstdc++: Use named struct for __decay_copy

In r12-1486-gcb326a6442f09cb36b05ce556fc91e10bfeb0cf6 I changed
__decay_copy to be a function object of unnamed class type. This causes
problems when importing the library headers:

error: conflicting global module declaration 'constexpr const 
std::ranges::__cust_access:: 
std::ranges::__cust_access::__decay_copy'

The fix is to use a named struct instead of an anonymous one.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/iterator_concepts.h (__decay_copy): Name type.

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h 
b/libstdc++-v3/include/bits/iterator_concepts.h
index d18ae32bf20..11748e5ed7b 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -930,7 +930,8 @@ namespace ranges
   {
 using std::__detail::__class_or_enum;
 
-struct {
+struct _Decay_copy final
+{
   template
constexpr decay_t<_Tp>
operator()(_Tp&& __t) const


Re: [PATCH] Vectorization of BB reductions

2021-06-16 Thread Richard Biener
On Wed, 16 Jun 2021, Richard Sandiford wrote:

> Richard Biener  writes:
> > This adds a simple reduction vectorization capability to the
> > non-loop vectorizer.  Simple meaning it lacks any of the fancy
> > ways to generate the reduction epilogue but only supports
> > those we can handle via a direct internal function reducing
> > a vector to a scalar.  One of the main reasons is to avoid
> > massive refactoring at this point but also that more complex
> > epilogue operations are hardly profitable.
> >
> > Mixed-sign reductions are fended off for now, and I'm not finally
> > settled on whether we want an explicit SLP node for the
> > reduction epilogue operation.  Handling mixed signs could be
> > done by multiplying with a { 1, -1, .. } vector.  Also fended
> > off are reductions with non-internal operands (constants
> > or register parameters, for example).
> >
> > Costing is done by accounting the original scalar participating
> > stmts for the scalar cost and log2 permutes and operations for
> > the vectorized epilogue.
> 
> It would be good if we have had a standard way of asking for this
> cost for both loops and SLP, perhaps based on the internal function.
> E.g. for aarch64 we have a cost table that gives a more precise cost
> (and log2 of the scalar op isn't always it :-)).
> 
> I don't have any specific suggestion how though.  And I guess it
> can be a follow-on patch anyway.

Yeah, the only idea I came up with that would work "now" would
be to build a fake gimple stmt and feed that to the add_stmt_cost
hook ...

In the end the cost hook will be passed the SLP nodes, but then
at the moment the reduction op doesn't have any but it's
implicit in the SLP instance kind "root" handling.  We would
need to add a lane-reducing SLP node kind, I'd rather not
abuse VEC_PERM_EXPR nodes directly here.  Theres some PR
asking for straight-line vectorization use of SAD which
usually ends up doing a 16xchar -> 4xint "reduction" - we'd
need some way to lay out input / output lanes for such
operation (and of course specify the reduction operation).
The root SLP node for a reduction-to-scalar operation would
then be .IFN_REDUC_PLUS_SCAL and it would have a single output
lane.  But as said I would want it to handle "partial" reductions
as well, like for SAD pattern detection.  Maybe treating it
as black-box is good enough though - who knows ;)

Thus for now I went with a conservative estimate - I hope
it _is_ conservative in all cases, is it?  (not exactly as can
be seen below)

Well, if all fails adding some entry to vect_cost_for_stmt would
work as well I guess.  We do pass the last stmt of the reduction
and with say vector_reduc_to_scalar the backend could second-guess
the actual operation carried out - but then it would have to,
since the default hook implementations are not selective on
new cost kinds.

> > SPEC CPU 2017 FP with rate workload measurements show (picked
> > fastest runs of three) regressions for 507.cactuBSSN_r (1.5%),
> > 508.namd_r (2.5%), 511.povray_r (2.5%), 526.blender_r (0.5%) and
> > 527.cam4_r (2.5%) and improvements for 510.parest_r (5%) and
> > 538.imagick_r (1.5%).  This is with -Ofast -march=znver2 on a Zen2.
> >
> > Statistics on CPU 2017 shows that the overwhelming number of seeds
> > we find are reductions of two lanes (well - that's basically every
> > associative operation).  That means we put a quite high pressure
> > on the SLP discovery process this way.
> >
> > In total we find 583218 seeds we put to SLP discovery out of which
> > 66205 pass that and only 6185 of those make it through
> > code generation checks. 796 of those are discarded because the reduction
> > is part of a larger SLP instance.  4195 of the remaining
> > are deemed not profitable to vectorize and 1194 are finally
> > vectorized.  That's a poor 0.2% rate.
> 
> Oof.

Yeah, that's of course because every single 'plus' is a reduction
of two lanes...

> > Of the 583218 seeds 486826 (83%) have two lanes, 60912 have three (10%),
> > 28181 four (5%), 4808 five, 909 six and there are instances up to 120
> > lanes.
> >
> > There's a set of 54086 candidate seeds we reject because
> > they contain a constant or invariant (not implemented yet) but still
> > have two or more lanes that could be put to SLP discovery.
> 
> It looks like the patch doesn't explicitly forbid 2-element reductions
> and instead relies on the cost model.  Is that right?

Yes, because it's the only "seed" we'd start trying to vectorize
like (x[0] * (b[0] + d[0])) + (x[1] * (b[1] + d[1])).  But yes,
I do hope that for plain x[0] + x[1] we either say costing isn't
worthwhile or we generate reasonable code.  On x86_64 it produces
(with generic costing):

movupd  (%rdi), %xmm1
movapd  %xmm1, %xmm0
unpckhpd%xmm1, %xmm0
addpd   %xmm1, %xmm0

compared to

movsd   (%rdi), %xmm0
addsd   8(%rdi), %xmm0

:/  But this also feels like some missed optimization on RTL.
The x86 backend expands .REDUC_PLUS to f
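For context, the two-lane seed being discussed is, in plain C++ (function names hypothetical):

```cpp
// The minimal two-lane reduction seed: every such 'plus' of two lanes
// is a candidate, which is why they dominate the statistics above.
double reduc2 (const double *x)
{
  return x[0] + x[1];
}

// The kind of seed actually worth vectorizing as a group, per the
// example above: x[0] * (b[0] + d[0]) + x[1] * (b[1] + d[1]).
double reduc_mul (const double *x, const double *b, const double *d)
{
  return x[0] * (b[0] + d[0]) + x[1] * (b[1] + d[1]);
}
```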

Re: [committed] libstdc++: Add noexcept specifiers to some range adaptors

2021-06-16 Thread Jonathan Wakely via Gcc-patches
On Tue, 15 Jun 2021 at 19:32, Jonathan Wakely wrote:
>
> Signed-off-by: Jonathan Wakely 
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/ranges_util.h (view_interface): Add noexcept to
> empty, operator bool, data and size members.
> (subrange): Add noexcept to constructors.
> * include/std/ranges (single_view, ref_view): Add noexcept to
> constructors.
> (views::single, views::all): Add noexcept.
> * testsuite/std/ranges/adaptors/all.cc: Check noexcept.
> * testsuite/std/ranges/single_view.cc: Likewise.
>
> Tested powerpc64le-linux. Committed to trunk.

This one also breaks modules, but seems to be a bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101095

If the compiler bug isn't fixed quickly I'll probably have to remove
the new noexcept-specifiers from the subrange constructors.



Re: [PATCH 2/1] libstdc++: Non-triv-copyable extra args aren't simple [PR100940]

2021-06-16 Thread Jonathan Wakely via Gcc-patches
On Tue, 15 Jun 2021 at 20:29, Patrick Palka via Libstdc++
 wrote:
>
> On Tue, 15 Jun 2021, Patrick Palka wrote:
>
> > This force-enables perfect forwarding call wrapper semantics whenever
> > the extra arguments of a partially applied range adaptor aren't all
> > trivially copyable, so as to avoid incurring unnecessary copies of
> > potentially expensive-to-copy objects (such as std::function objects)
> > when invoking the adaptor.
> >
> > Tested on x86_64-pc-linux-gnu, does this look OK for trunk/11?
> >
> >   PR libstdc++/100940
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/std/ranges (__adaptor::__adaptor_has_simple_extra_args):
> >   Also require that the extra arguments are trivially copyable.
> >   * testsuite/std/ranges/adaptors/100577.cc (test04): New test.
> > ---
> >  libstdc++-v3/include/std/ranges   |  6 --
> >  .../testsuite/std/ranges/adaptors/100577.cc   | 19 +++
> >  2 files changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/std/ranges 
> > b/libstdc++-v3/include/std/ranges
> > index 856975c6934..e858df88088 100644
> > --- a/libstdc++-v3/include/std/ranges
> > +++ b/libstdc++-v3/include/std/ranges
> > @@ -818,8 +818,10 @@ namespace views::__adaptor
> >// True if the behavior of the range adaptor non-closure _Adaptor is
> >// independent of the value category of its extra arguments _Args.
> >template
> > -concept __adaptor_has_simple_extra_args = _Adaptor::_S_has_simple_extra_args
> > -  || _Adaptor::template _S_has_simple_extra_args<_Args...>;
> > +concept __adaptor_has_simple_extra_args
> > +  = (_Adaptor::_S_has_simple_extra_args
> > +  || _Adaptor::template _S_has_simple_extra_args<_Args...>)
> > + && (is_trivially_copyable_v<_Args> && ...);
>
> On second thought, perhaps it'd be cleaner to leave this concept alone
> and instead encode the trivial-copyability requirement as a separate
> constraint on the relevant partial specializations of _Partial?
> Something like:

OK for trunk and 11, thanks.


>
> -- >8 --
>
>
> PR libstdc++/100940
>
> libstdc++-v3/ChangeLog:
>
> * include/std/ranges (__adaptor::_Partial): For the "simple"
> forwarding partial specializations, also require that
> the extra arguments are trivially copyable.
> * testsuite/std/ranges/adaptors/100577.cc (test04): New test.
> ---
>  libstdc++-v3/include/std/ranges|  8 +---
>  .../testsuite/std/ranges/adaptors/100577.cc| 14 ++
>  2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
> index 856975c6934..24411124580 100644
> --- a/libstdc++-v3/include/std/ranges
> +++ b/libstdc++-v3/include/std/ranges
> @@ -892,11 +892,12 @@ namespace views::__adaptor
>  };
>
>// Partial specialization of the primary template for the case where the 
> extra
> -  // arguments of the adaptor can always be safely forwarded by const 
> reference.
> -  // This lets us get away with a single operator() overload, which makes
> -  // overload resolution failure diagnostics more concise.
> +  // arguments of the adaptor can always be safely and efficiently forwarded 
> by
> +  // const reference.  This lets us get away with a single operator() 
> overload,
> +  // which makes overload resolution failure diagnostics more concise.
>template
>  requires __adaptor_has_simple_extra_args<_Adaptor, _Args...>
> +  && (is_trivially_copyable_v<_Args> && ...)
>  struct _Partial<_Adaptor, _Args...> : _RangeAdaptorClosure
>  {
>tuple<_Args...> _M_args;
> @@ -926,6 +927,7 @@ namespace views::__adaptor
>// where _Adaptor accepts a single extra argument.
>template
>  requires __adaptor_has_simple_extra_args<_Adaptor, _Arg>
> +  && is_trivially_copyable_v<_Arg>
>  struct _Partial<_Adaptor, _Arg> : _RangeAdaptorClosure
>  {
>_Arg _M_arg;
> diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/100577.cc 
> b/libstdc++-v3/testsuite/std/ranges/adaptors/100577.cc
> index 8ef084621f9..06be4980ddb 100644
> --- a/libstdc++-v3/testsuite/std/ranges/adaptors/100577.cc
> +++ b/libstdc++-v3/testsuite/std/ranges/adaptors/100577.cc
> @@ -21,6 +21,7 @@
>  // PR libstdc++/100577
>
>  #include 
> +#include 
>
>  namespace ranges = std::ranges;
>  namespace views = std::ranges::views;
> @@ -113,4 +114,17 @@ test03()
>x | std::views::drop(S{});
>  }
>
> +void
> +test04()
> +{
> +  // Non-trivially-copyable extra arguments make a closure not simple.
> +  using F = std::function;
> +  static_assert(!std::is_trivially_copyable_v);
> +  using views::__adaptor::__closure_has_simple_call_op;
> +  
> static_assert(!__closure_has_simple_call_op()))>);
> +  
> static_assert(!__closure_has_simple_call_op()))>);
> +  
> static_assert(!__closure_has_simple_call_op()))>);
> +  
> static_assert(!__closure_has_simple_call_op()))>);
> +}

Re: [PATCH 2/2] libstdc++: Use template form for pretty-printing tuple elements

2021-06-16 Thread Jonathan Wakely via Gcc-patches
On Mon, 14 Jun 2021 at 19:14, Paul Smith via Libstdc++
 wrote:
>
> std::tuple elements are retrieved via std::get<> (template) not
> [] (array); have the generated output string match this.

Both of your patches seem to be based on the idea that the output is
supposed to correspond to how you access the tuple, but that isn't
meant to be the case. The fact we show [1] isn't supposed to mean you
can access that element as tup[1]. For example, the std::set printer
shows:

$1 = std::set with 3 elements = {[0] = 1, [1] = 2, [2] = 3}

This isn't supposed to imply that you can access the member as s[0].
However, it does use a zero-based index! I think using a zero-based
index for tuples makes sense too, although your patch will cause
testsuite failures, won't it? The test needs to change too.



[PATCH] i386: Add missing two element 64bit vector permutations [PR89021]

2021-06-16 Thread Uros Bizjak via Gcc-patches
In addition to V8QI permutations, several other missing permutations are
added for 64bit vector modes for TARGET_SSSE3 and TARGET_SSE4_1 targets.

2021-06-16  Uroš Bizjak  

gcc/
PR target/89021
* config/i386/i386-expand.c (expand_vec_perm_2perm_pblendv):
Handle 64bit modes for TARGET_SSE4_1.
(expand_vec_perm_pshufb2): Handle 64bit modes for TARGET_SSSE3.
(expand_vec_perm_even_odd_pack): Handle V4HI mode.
(expand_vec_perm_even_odd_1) : Expand via
expand_vec_perm_pshufb2 for TARGET_SSSE3 and via
expand_vec_perm_even_odd_pack for TARGET_SSE4_1.
* config/i386/mmx.md (mmx_packusdw): New insn pattern.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
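As a scalar model (illustrative only), a two-operand permutation over four-element vectors selects each result element from the eight-element concatenation of the two operands; perm = {0,2,4,6} and {1,3,5,7} are the even/odd extractions these expanders handle:

```cpp
#include <array>

// Scalar model of a two-operand permutation for the V4HI case:
// element i of the result is element perm[i] of the concatenation
// {op0, op1}.  expand_vec_perm_even_odd_1 handles the even ({0,2,4,6})
// and odd ({1,3,5,7}) index vectors.
std::array<short, 4>
vec_perm_v4hi (std::array<short, 4> op0, std::array<short, 4> op1,
               std::array<int, 4> perm)
{
  std::array<short, 4> res{};
  for (int i = 0; i < 4; ++i)
    res[i] = perm[i] < 4 ? op0[perm[i]] : op1[perm[i] - 4];
  return res;
}
```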
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index dee3df2e3a0..eb6f9b0684e 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -18972,7 +18974,8 @@ expand_vec_perm_2perm_pblendv (struct expand_vec_perm_d 
*d, bool two_insn)
 ;
   else if (TARGET_AVX && (vmode == V4DFmode || vmode == V8SFmode))
 ;
-  else if (TARGET_SSE4_1 && GET_MODE_SIZE (vmode) == 16)
+  else if (TARGET_SSE4_1 && (GET_MODE_SIZE (vmode) == 16
+|| GET_MODE_SIZE (vmode) == 8))
 ;
   else
 return false;
@@ -19229,14 +19232,31 @@ expand_vec_perm_pshufb2 (struct expand_vec_perm_d *d)
 {
   rtx rperm[2][16], vperm, l, h, op, m128;
   unsigned int i, nelt, eltsz;
+  machine_mode mode;
+  rtx (*gen) (rtx, rtx, rtx);
 
-  if (!TARGET_SSSE3 || GET_MODE_SIZE (d->vmode) != 16)
+  if (!TARGET_SSSE3 || (GET_MODE_SIZE (d->vmode) != 16
+   && GET_MODE_SIZE (d->vmode) != 8))
 return false;
   gcc_assert (!d->one_operand_p);
 
   if (d->testing_p)
 return true;
 
+  switch (GET_MODE_SIZE (d->vmode))
+{
+case 8:
+  mode = V8QImode;
+  gen = gen_mmx_pshufbv8qi3;
+  break;
+case 16:
+  mode = V16QImode;
+  gen = gen_ssse3_pshufbv16qi3;
+  break;
+default:
+  gcc_unreachable ();
+}
+
   nelt = d->nelt;
   eltsz = GET_MODE_UNIT_SIZE (d->vmode);
 
@@ -19247,7 +19267,7 @@ expand_vec_perm_pshufb2 (struct expand_vec_perm_d *d)
   m128 = GEN_INT (-128);
   for (i = 0; i < nelt; ++i)
 {
-  unsigned j, e = d->perm[i];
+  unsigned j, k, e = d->perm[i];
   unsigned which = (e >= nelt);
   if (e >= nelt)
e -= nelt;
@@ -19257,26 +19277,29 @@ expand_vec_perm_pshufb2 (struct expand_vec_perm_d *d)
  rperm[which][i*eltsz + j] = GEN_INT (e*eltsz + j);
  rperm[1-which][i*eltsz + j] = m128;
}
+
+  for (k = i*eltsz + j; k < 16; ++k)
+   rperm[0][k] = rperm[1][k] = m128;
 }
 
   vperm = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, rperm[0]));
   vperm = force_reg (V16QImode, vperm);
 
-  l = gen_reg_rtx (V16QImode);
-  op = gen_lowpart (V16QImode, d->op0);
-  emit_insn (gen_ssse3_pshufbv16qi3 (l, op, vperm));
+  l = gen_reg_rtx (mode);
+  op = gen_lowpart (mode, d->op0);
+  emit_insn (gen (l, op, vperm));
 
   vperm = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, rperm[1]));
   vperm = force_reg (V16QImode, vperm);
 
-  h = gen_reg_rtx (V16QImode);
-  op = gen_lowpart (V16QImode, d->op1);
-  emit_insn (gen_ssse3_pshufbv16qi3 (h, op, vperm));
+  h = gen_reg_rtx (mode);
+  op = gen_lowpart (mode, d->op1);
+  emit_insn (gen (h, op, vperm));
 
   op = d->target;
-  if (d->vmode != V16QImode)
-op = gen_reg_rtx (V16QImode);
-  emit_insn (gen_iorv16qi3 (op, l, h));
+  if (d->vmode != mode)
+op = gen_reg_rtx (mode);
+  emit_insn (gen_rtx_SET (op, gen_rtx_IOR (mode, l, h)));
   if (op != d->target)
 emit_move_insn (d->target, gen_lowpart (d->vmode, op));
 
@@ -19455,6 +19478,17 @@ expand_vec_perm_even_odd_pack (struct 
expand_vec_perm_d *d)
 
   switch (d->vmode)
 {
+case E_V4HImode:
+  /* Required for "pack".  */
+  if (!TARGET_SSE4_1)
+   return false;
+  c = 0xffff;
+  s = 16;
+  half_mode = V2SImode;
+  gen_and = gen_andv2si3;
+  gen_pack = gen_mmx_packusdw;
+  gen_shift = gen_lshrv2si3;
+  break;
 case E_V8HImode:
   /* Required for "pack".  */
   if (!TARGET_SSE4_1)
@@ -19507,7 +19541,7 @@ expand_vec_perm_even_odd_pack (struct expand_vec_perm_d 
*d)
   end_perm = true;
   break;
 default:
-  /* Only V8QI, V8HI, V16QI, V16HI and V32QI modes
+  /* Only V4HI, V8QI, V8HI, V16QI, V16HI and V32QI modes
 are more profitable than general shuffles.  */
   return false;
 }
@@ -19698,18 +19732,25 @@ expand_vec_perm_even_odd_1 (struct expand_vec_perm_d 
*d, unsigned odd)
   break;
 
 case E_V4HImode:
-  if (d->testing_p)
-   break;
-  /* We need 2*log2(N)-1 operations to achieve odd/even
-with interleave. */
-  t1 = gen_reg_rtx (V4HImode);
-  emit_insn (gen_mmx_punpckhwd (t1, d->op0, d->op1));
-  emit_insn (gen_mmx_punpcklwd (d->target, d->op0, d->op1));
-  if (odd)
-   t2 = g

Re: [RFC] ldist: Recognize rawmemchr loop patterns

2021-06-16 Thread Richard Biener via Gcc-patches
On Mon, Jun 14, 2021 at 7:26 PM Stefan Schulze Frielinghaus
 wrote:
>
> On Thu, May 20, 2021 at 08:37:24PM +0200, Stefan Schulze Frielinghaus wrote:
> [...]
> > > but we won't ever arrive here because of the niters condition.  But
> > > yes, doing the pattern matching in the innermost loop processing code
> > > looks good to me - for the specific case it would be
> > >
> > >   /* Don't distribute loop if niters is unknown.  */
> > >   tree niters = number_of_latch_executions (loop);
> > >   if (niters == NULL_TREE || niters == chrec_dont_know)
> > > ---> here?
> > > continue;
> >
> > Right, please find attached a new version of the patch where everything
> > is included in the loop distribution pass.  I will do a bootstrap and
> > regtest on IBM Z over night.  If you give me green light I will also do
> > the same on x86_64.
>
> Meanwhile I gave it a shot on x86_64 where the testsuite runs fine (at
> least the ldist-strlen testcase).  If you are Ok with the patch, then I
> would rebase and run the testsuites again and post a patch series
> including the rawmemchr implementation for IBM Z.

@@ -3257,6 +3261,464 @@ find_seed_stmts_for_distribution (class loop
*loop, vec *work_list)
   return work_list->length () > 0;
 }

+static void
+generate_rawmemchr_builtin (loop_p loop, tree reduction_var,
+   data_reference_p store_dr, tree base, tree pattern,
+   location_t loc)
+{

this new function needs a comment.  Applies to all of the new ones, btw.

+  gcc_checking_assert (POINTER_TYPE_P (TREE_TYPE (base))
+  && TREE_TYPE (TREE_TYPE (base)) == TREE_TYPE (pattern));

this looks fragile and is probably unnecessary as well.

+  gcc_checking_assert (TREE_TYPE (reduction_var) == TREE_TYPE (base));

in general you want types_compatible_p () checks which for pointers means
all pointers are compatible ...

(skipping stuff)

@@ -3321,10 +3783,20 @@ loop_distribution::execute (function *fun)
  && !optimize_loop_for_speed_p (loop)))
continue;

-  /* Don't distribute loop if niters is unknown.  */
+  /* If niters is unknown don't distribute loop but rather try to transform
+it to a call to a builtin.  */
   tree niters = number_of_latch_executions (loop);
   if (niters == NULL_TREE || niters == chrec_dont_know)
-   continue;
+   {
+ if (transform_reduction_loop (loop))
+   {
+ changed = true;
+ loops_to_be_destroyed.safe_push (loop);
+ if (dump_file)
+   fprintf (dump_file, "Loop %d transformed into a
builtin.\n", loop->num);
+   }
+ continue;
+   }

please look at

  if (nb_generated_loops + nb_generated_calls > 0)
{
  changed = true;
  if (dump_enabled_p ())
dump_printf_loc (MSG_OPTIMIZED_LOCATIONS,
 loc, "Loop%s %d distributed: split to
%d loops "
 "and %d library calls.\n", str, loop->num,
 nb_generated_loops, nb_generated_calls);

and follow the use of dump_* and MSG_OPTIMIZED_LOCATIONS so the
transforms are reported with -fopt-info-loop

+
+  return transform_reduction_loop_1 (loop, load_dr, store_dr, reduction_var);
+}

what's the point in tail-calling here and visually splitting the
function in half?

(sorry for picking random pieces now ;))

+  for (gphi_iterator bsi = gsi_start_phis (bb); !gsi_end_p (bsi);
+  gsi_next (&bsi), ++ninsns)
+   {

this counts debug insns, I guess you want gsi_next_nondebug at least.
not sure why you are counting PHIs at all btw - for the loops you match
you are expecting at most two, one IV and eventually one for the virtual
operand of the store?

+ if (gimple_has_volatile_ops (phi))
+   return false;

PHIs never have volatile ops.

+ if (gimple_clobber_p (phi))
+   continue;

or are clobbers.

Btw, can you factor out a helper from find_single_drs working on a
stmt to reduce code duplication?

+  tree reduction_var;
+  switch (gimple_code (reduction_stmt))
+{
+case GIMPLE_PHI:
+  reduction_var = gimple_phi_result (reduction_stmt);
+  break;
+case GIMPLE_ASSIGN:
+  reduction_var = gimple_assign_lhs (reduction_stmt);
+  break;
+default:
+  /* Bail out e.g. for GIMPLE_CALL.  */
+  return false;

gimple_get_lhs (reduction_stmt); would work for both PHIs
and assigns.

+  if (reduction_var == NULL)
+return false;

it can never be NULL here.

+  /* Bail out if this is a bitfield memory reference.  */
+  if (TREE_CODE (DR_REF (load_dr)) == COMPONENT_REF
+  && DECL_BIT_FIELD (TREE_OPERAND (DR_REF (load_dr), 1)))
+return false;
...

I see this is again quite some code copied from find_single_drs, please
see how to avoid this much duplication by splitting out helpers.

+static bool
+transform_reduction_loop_1 (loop_p loop,
+ 

Re: [PATCH] tree-sra: Do not refresh readonly decls (PR 100453)

2021-06-16 Thread Jeff Law via Gcc-patches




On 6/16/2021 4:00 AM, Richard Biener wrote:

On Wed, 16 Jun 2021, Martin Jambor wrote:


Hi Richi,

On Tue, Jun 15 2021, Richard Biener wrote:

On June 15, 2021 5:09:40 PM GMT+02:00, Martin Jambor  wrote:

Hi,

When SRA transforms an assignment where the RHS is an aggregate decl
that it creates replacements for, the (least efficient) fallback method
of dealing with them is to store all the replacements back into the
original decl and then let the original assignment take its course.

That of course should not need to be done for TREE_READONLY bases which
cannot change contents.  The SRA code handled this situation only for
DECL_IN_CONSTANT_POOL const decls, this patch modifies the check so
that it tests for TREE_READONLY and I also looked at all other callers
of generate_subtree_copies and added checks to another one dealing with
the same exact situation and one which deals with it in a non-assignment
context.

This behavior also means that SRA has to disqualify any candidate decl
that is read-only and written to.  I plan to continue to hunt down at
least some of such occurrences.

Bootstrapped and tested on x86_64-linux, i686-linux and aarch64-linux
(this time with Ada enabled on all three platforms).  OK for trunk?

Ok.

Thanks,
Richard.


Thanks for a quick approval.  However, when looking for sources of
additional non-read-only TREE_READONLY decls, I found the following code
and comment in setup_one_parameter() in tree-inline.c, and the last
comment sentence made me wonder if my patch is perhaps too strict:

   /* Even if P was TREE_READONLY, the new VAR should not be.
  In the original code, we would have constructed a
  temporary, and then the function body would have never
  changed the value of P.  However, now, we will be
  constructing VAR directly.  The constructor body may
  change its value multiple times as it is being
  constructed.  Therefore, it must not be TREE_READONLY;
  the back-end assumes that TREE_READONLY variable is
  assigned to only once.  */
   if (TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (p)))
 TREE_READONLY (var) = 0;

Is the last sentence in the comment true?  Do we want it to be true?  It
contradicts the description of TREE_READONLY in tree.h.  (Would the
described property ever be useful in the middle-end or back-end?)

I think the last sentence refers to RTX_UNCHANGING_P which we thankfully
removed.  Now, that means we need to clear TREE_READONLY unconditionally
here I think (unless we can prove it's uninitialized in the caller,
but I guess we don't need to prematurely optimize that case).
Yea, I suspect that TREE_READONLY would morph into RTX_UNCHANGING_P 
which we did assume was written only once and it was nothing but trouble.

jeff



Re: [PATCH] make rich_location safe to copy

2021-06-16 Thread Martin Sebor via Gcc-patches

On 6/16/21 6:38 AM, David Malcolm wrote:

On Tue, 2021-06-15 at 19:48 -0600, Martin Sebor wrote:

Thanks for writing the patch.


While debugging locations I noticed the semi_embedded_vec template
in line-map.h doesn't declare a copy ctor or copy assignment, but
is being copied in a couple of places in the C++ parser (via
gcc_rich_location).  It gets away with it most likely because it
never grows beyond the embedded buffer.


Where are these places?  I wasn't aware of this.


They're in the attached file along with the diff to reproduce
the errors.

I was seeing strange behavior in my tests that led me to rich_location
and the m_ranges member.  The problem turned out to be unrelated but
before I figured it out I noticed the missing copy ctor and deleted
it to see if it was being used.  Since that's such a pervasive bug
in GCC code (and likely elsewhere as well) I'm thinking I should take
the time to develop the warning I've been thinking about to detect it.



The attached patch defines the copy ctor and also copy assignment
and adds the corresponding move functions.


Note that rich_location::m_fixit_hints "owns" the fixit_hint instances,
manually deleting them in rich_location's dtor, so simply doing a
shallow copy of it would be wrong.

Also, a rich_location stores other pointers (to range_labels and
diagnostic_path), which are borrowed pointers, where their lifetime is
assumed to outlive any (non-dtor) calls to the rich_location.  So I'm
nervous about code that copies rich_location instances.

I think I'd prefer to forbid copying them; what's the use-case for
copying them?  Am I missing something here?
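The ownership hazard being described can be sketched with simplified, hypothetical types:

```cpp
#include <cstddef>
#include <vector>

// Simplified model: like rich_location with m_fixit_hints, the class
// owns heap objects through raw pointers and deletes them in its
// destructor.  A compiler-generated (shallow) copy would leave two
// owners deleting the same hints, so copying is forbidden outright.
struct hint { int pos; };

class location_sketch
{
public:
  location_sketch () = default;
  location_sketch (const location_sketch &) = delete;   // shallow copy => double free
  location_sketch &operator= (const location_sketch &) = delete;
  ~location_sketch () { for (hint *h : m_hints) delete h; }

  void add_hint (int pos) { m_hints.push_back (new hint{pos}); }
  std::size_t num_hints () const { return m_hints.size (); }

private:
  std::vector<hint *> m_hints;   // owning raw pointers
};
```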


I noticed and fixed just the one problem I uncovered by accident with
the missing copy ctor.  If there are others I don't know about them.
Preventing code from copying rich_location might make sense
independently of fixing the vec class to be safely copyable.

Martin





Tested on x86_64-linux.

Martin


Thanks
Dave



/src/gcc/master/gcc/cp/parser.c: In function ‘tree_node* cp_parser_selection_statement(cp_parser*, bool*, vec*)’:
/src/gcc/master/gcc/cp/parser.c:12358:39: error: use of deleted function ‘gcc_rich_location::gcc_rich_location(gcc_rich_location&&)’
  gcc_rich_location richloc = tok->location;
   ^~~~
In file included from /src/gcc/master/gcc/cp/parser.c:44:
/src/gcc/master/gcc/gcc-rich-location.h:25:7: note: ‘gcc_rich_location::gcc_rich_location(gcc_rich_location&&)’ is implicitly deleted because the default definition would be ill-formed:
 class gcc_rich_location : public rich_location
   ^
/src/gcc/master/gcc/gcc-rich-location.h:25:7: error: use of deleted function ‘rich_location::rich_location(rich_location&)’
In file included from /src/gcc/master/gcc/input.h:24,
 from /src/gcc/master/gcc/coretypes.h:482,
 from /src/gcc/master/gcc/cp/parser.c:24:
/src/gcc/master/gcc/../libcpp/include/line-map.h:1664:7: note: ‘rich_location::rich_location(rich_location&)’ is implicitly deleted because the default definition would be ill-formed:
 class rich_location
   ^
/src/gcc/master/gcc/../libcpp/include/line-map.h:1664:7: error: use of deleted function ‘semi_embedded_vec::semi_embedded_vec(semi_embedded_vec&) [with T = location_range; int NUM_EMBEDDED = 3]’
/src/gcc/master/gcc/../libcpp/include/line-map.h:1379:3: note: declared here
   semi_embedded_vec (semi_embedded_vec &) = delete;
   ^
/src/gcc/master/gcc/../libcpp/include/line-map.h:1664:7: error: use of deleted function ‘semi_embedded_vec::semi_embedded_vec(semi_embedded_vec&) [with T = fixit_hint*; int NUM_EMBEDDED = 2]’
 class rich_location
   ^
/src/gcc/master/gcc/../libcpp/include/line-map.h:1379:3: note: declared here
   semi_embedded_vec (semi_embedded_vec &) = delete;
   ^
In file included from /src/gcc/master/gcc/cp/parser.c:44:
/src/gcc/master/gcc/gcc-rich-location.h:31:3: note:   after user-defined conversion: ‘gcc_rich_location::gcc_rich_location(location_t, const range_label*)’
   gcc_rich_location (location_t loc, const range_label *label = NULL)
   ^
/src/gcc/master/gcc/cp/parser.c:12386:46: error: use of deleted function ‘gcc_rich_location::gcc_rich_location(gcc_rich_location&&)’
   gcc_rich_location else_richloc = else_tok->location;
  ^~~~
In file included from /src/gcc/master/gcc/cp/parser.c:44:
/src/gcc/master/gcc/gcc-rich-location.h:31:3: note:   after user-defined conversion: ‘gcc_rich_location::gcc_rich_location(location_t, const range_label*)’
   gcc_rich_location (location_t loc, const range_label *label = NULL)
   ^


diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 7d964172469..5bedb5708ca 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -1376,6 +1376,9 @@ class semi_embedded_vec
   semi_embedded_vec ();
   ~semi_embedded_vec ();
 

Re: [PATCH] tree-sra: Do not refresh readonly decls (PR 100453)

2021-06-16 Thread Martin Sebor via Gcc-patches

On 6/16/21 5:48 AM, Jakub Jelinek wrote:

On Tue, Jun 15, 2021 at 06:11:27PM +0200, Richard Biener wrote:

--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -915,6 +915,12 @@ create_access (tree expr, gimple *stmt, bool
write)
if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID
(base)))
 return NULL;

+  if (write && TREE_READONLY (base))
+{
+  disqualify_candidate (base, "Encountered a store to a read-only
decl.");


Wouldn't this be a useful point to also emit some warning (with
some TREE_NO_WARNING prevention) that some particular statement modifies
a const decl?
I guess it can be warned elsewhere though.
As testcases one could use 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100994#c4
and #c5.  Though would be nice if we diagnose that even without those -fno-*
options.


I didn't finish my patch to diagnose these bugs in time for GCC 11
but I'm hoping to finish and submit it for GCC 12.  (It's being
tracked in PR 90404).  My approach is the same as for similar
warnings such as -Wfree-nonheap-object or -Wstringop-overflow.
It depends on no particular optimization options (with all
the expected consequences at -O0).

Martin




Jakub





Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-16 Thread Qing Zhao via Gcc-patches
Hi, Richard,

On Jun 16, 2021, at 1:19 AM, Richard Biener wrote:

+/* Expand the IFN_DEFERRED_INIT function according to its second
argument.  */
+static void
+expand_DEFERRED_INIT (internal_fn, gcall *stmt)
+{
+  tree var = gimple_call_lhs (stmt);
+  tree init = NULL_TREE;
+  enum auto_init_type init_type
+= (enum auto_init_type) TREE_INT_CST_LOW (gimple_call_arg (stmt, 1));
+
+  switch (init_type)
+{
+default:
+  gcc_unreachable ();
+case AUTO_INIT_PATTERN:
+  init = build_pattern_cst_for_auto_init (TREE_TYPE (var));
+  expand_assignment (var, init, false);
+  break;
+case AUTO_INIT_ZERO:
+  init = build_zero_cst (TREE_TYPE (var));
+  expand_assignment (var, init, false);
+  break;
+}

I think actually building build_pattern_cst_for_auto_init can generate
massive garbage and for big auto vars code size is also a concern and
ideally on x86 you'd produce rep movq.  So I don't think going
via expand_assignment is good.  Instead you possibly want to lower
.DEFERRED_INIT to MEMs following expand_builtin_memset and
eventually enhance that to allow storing pieces larger than a byte.

Due to “BOOLEAN_TYPE” and “POINTER_TYPE”, we cannot always have a
repeated byte pattern for variables that include BOOLEAN_TYPE or
POINTER_TYPE fields.  Therefore, lowering the .DEFERRED_INIT for “PATTERN”
initialization through “memset” is not always possible.

Let me know if I miss anything in the above. Do you have other suggestions?

The main point is that you need to avoid building the explicit initializer
only to have it consumed by assignment expansion.  If you want to keep
all the singing and dancing (as opposed to maybe initializing with a
0x1 byte pattern) then I think for efficiency you still want to
block-initialize the variable and then only fixup the special fields.

Yes, this is a good idea.

We can memset the whole structure with the repeated pattern “0xAA” first,
then fix up BOOLEAN_TYPE and POINTER_TYPE fields for 32-bit platforms.
That might be more efficient.
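A sketch of that block-initialize-then-fix-up idea (illustrative only; the struct and the choice of fix-up values are hypothetical):

```cpp
#include <cstring>

// Pattern-initialize a whole object with one cheap block store, then
// patch only the fields whose pattern must differ (e.g. a bool must
// not hold the byte 0xAA, and a pointer may need a different,
// platform-specific pattern).
struct S { int i; bool b; void *p; };

S pattern_init ()
{
  S s;
  std::memset (&s, 0xAA, sizeof s);   // one block store of the pattern
  s.b = false;                        // fix-up: a valid bool value
  s.p = nullptr;                      // fix-up: pointer pattern differs
  return s;
}
```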

However, after more consideration, I feel that this might be a more
general optimization for “store_constructor” itself:

I.e., if the “constructor” includes a repeated byte value “0xAA” or any
other value over a certain threshold, e.g. 70% of the total size, then we
might need to use a call to memset first, and then emit some additional
single-field stores to fix up the fields that have different
initialization values?

Just like the current handling of “zeroes” in “store_constructor”: if
“zeroes” occupy most of the constructor, then clear the whole structure
first, then emit additional single-field stores to fix up other fields
that do not hold zeros.

So, I think that it might be better to keep the current
“expand_assignment” for “Pattern initialization” as it is in this patch.

And then, later we can add a separate patch to add this more general
optimization in “store_constructor” to improve the run time performance
and code size in general?

What’s your opinion on this?

My point is that _building_ the constructor is what we want to avoid
since that involves a lot of overhead memory-wise, it also requires
yet another complex structure field walk with much room for errors.

Block-initializing the object is so much easier and more efficient.
Implementing block initialization with a block size different from
a single byte should be also reasonably possible.  I mean there's
wmemset (not in GCC), so such block initialization would have other
uses as well.

If the pattern of the value that is used to initialize is repeatable, then
Block-initializing is ideal. However, Since the patterns of the values that
are used to initialize might not be completely repeatable due to BOOLEAN (0),
POINTER_TYPE at 32-bit platform (0x00AA) and FLOATING TYPE (NaN),
After block initializing of the whole object, we still need to add
additional fix-up stores of these different patterns to the
corresponding fields.

But that's a bug with the pattern used then.  You can never be sure that
an object is used only as its declared type but you are initializing it
as if it were.  Also all uninit uses invoke undefined behavior so I don't
see why you need to pay special attention here.  After all this makes
pattern init so much more fragile than zero-init which makes me question
it even more ...

Yes, you are right.  The major reason for the complexity of the code
handling pattern initialization is that multiple different patterns are
assigned to different types.

This is for the compatibility with CLANG. -:). (https://reviews.llvm.org/D54604)

For reference, I copied the part for pattern initialization from CLANG’s patch 
below:


1. Pattern initialization

  This is the recommended initialization approach. Pattern initialization's
  goal is to initialize automatic variables with values which will likely
  transform logic bugs into crashes down the line, are easily recognizable in
  a crash dump, wit

Re: [PATCH] make rich_location safe to copy

2021-06-16 Thread David Malcolm via Gcc-patches
On Wed, 2021-06-16 at 08:52 -0600, Martin Sebor wrote:
> On 6/16/21 6:38 AM, David Malcolm wrote:
> > On Tue, 2021-06-15 at 19:48 -0600, Martin Sebor wrote:
> > 
> > Thanks for writing the patch.
> > 
> > > While debugging locations I noticed the semi_embedded_vec template
> > > in line-map.h doesn't declare a copy ctor or copy assignment, but
> > > is being copied in a couple of places in the C++ parser (via
> > > gcc_rich_location).  It gets away with it most likely because it
> > > never grows beyond the embedded buffer.
> > 
> > Where are these places?  I wasn't aware of this.
> 
> They're in the attached file along with the diff to reproduce
> the errors.

Thanks.

Looks like:

   gcc_rich_location richloc = tok->location;

is implicitly constructing a gcc_rich_location, then copying it to
richloc.  This should instead be simply:

   gcc_rich_location richloc (tok->location);

which directly constructs the richloc in place, as I understand it.
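The distinction can be seen with a minimal sketch (hypothetical types; under the C++11/14 rules GCC is built with, the copy-initialization form also requires an accessible copy or move constructor):

```cpp
#include <cassert>

struct Loc { int v; };

struct Rich
{
  Rich (Loc l) : v (l.v) {}        // converting constructor
  Rich (const Rich &) = delete;    // copying forbidden, as proposed
  int v;
};

int
demo ()
{
  // "Rich r = Loc{42};" is copy-initialization: it converts Loc to a
  // temporary Rich and, before C++17, then copies it -- ill-formed
  // with the deleted copy constructor.  Direct-initialization
  // constructs the object in place and works under any standard:
  Rich r (Loc{42});
  return r.v;
}
```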

Dave

> 
> I was seeing strange behavior in my tests that led me to rich_location
> and the m_ranges member.  The problem turned out to be unrelated but
> before I figured it out I noticed the missing copy ctor and deleted
> it to see if it was being used.  Since that's such a pervasive bug
> in GCC code (and likely elsewhere as well) I'm thinking I should take
> the time to develop the warning I've been thinking about to detect it.
> 
> 
> > > The attached patch defines the copy ctor and also copy assignment
> > > and adds the corresponding move functions.
> > 
> > Note that rich_location::m_fixit_hints "owns" the fixit_hint
> > instances,
> > manually deleting them in rich_location's dtor, so simply doing a
> > shallow copy of it would be wrong.
> > 
> > Also, a rich_location stores other pointers (to range_labels and
> > diagnostic_path), which are borrowed pointers, where their lifetime
> > is
> > assumed to outlive any (non-dtor) calls to the rich_location.  So I'm
> > nervous about code that copies rich_location instances.
> > 
> > I think I'd prefer to forbid copying them; what's the use-case for
> > copying them?  Am I missing something here?
> 
> I noticed and fixed just the one problem I uncovered by accident with
> the missing copy ctor.  If there are others I don't know about them.
> Preventing code from copying rich_location might make sense
> independently of fixing the vec class to be safely copyable.
> 
> Martin
> 
> > 
> > > 
> > > Tested on x86_64-linux.
> > > 
> > > Martin
> > 
> > Thanks
> > Dave
> > 
> 




PING: [Patch, fortran] PR fortran/96724 - Bogus warnings with the repeat intrinsic and the flag -Wconversion-extra

2021-06-16 Thread José Rui Faustino de Sousa via Gcc-patches

*PING*


 Forwarded Message 
Subject: [Patch, fortran] PR fortran/96724 - Bogus warnings with the 
repeat intrinsic and the flag -Wconversion-extra

Date: Thu, 20 Aug 2020 16:52:10 +
From: José Rui Faustino de Sousa 
To: fort...@gcc.gnu.org, gcc-patches@gcc.gnu.org

Hi all!

Proposed patch to PR96724 - Bogus warnings with the repeat intrinsic and 
the flag -Wconversion-extra.


Patch tested only on x86_64-pc-linux-gnu.

Add code to force conversion to the default wider integer type before 
multiplication.
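As an aside, the widen-before-multiply idea generalizes beyond Fortran.  A minimal C++ sketch of the same principle (the function name and types here are invented for illustration, not part of the patch): convert both operands to the wide index type first, so the multiplication itself never mixes kinds.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative analogue of the fix: both operands are converted to the
// wide type *before* the multiply, so no narrowing or mixed-kind
// arithmetic (the source of the bogus warnings) happens inside it.
std::int64_t repeat_length (std::int32_t char_len, std::int8_t ncopies)
{
  std::int64_t wide_len = static_cast<std::int64_t> (char_len);
  std::int64_t wide_n = static_cast<std::int64_t> (ncopies);
  return wide_len * wide_n;
}
```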


Thank you very much.

Best regards,
José Rui


2020-08-20  José Rui Faustino de Sousa  

  PR fortran/96724
  * iresolve.c (gfc_resolve_repeat): Force conversion to
  gfc_index_integer_kind before the call to gfc_multiply.

2020-08-20  José Rui Faustino de Sousa  

  PR fortran/96724
	* gfortran.dg/repeat_8.f90: New test.

diff --git a/gcc/fortran/iresolve.c b/gcc/fortran/iresolve.c
index 7376961..74075a7 100644
--- a/gcc/fortran/iresolve.c
+++ b/gcc/fortran/iresolve.c
@@ -2332,7 +2332,22 @@ gfc_resolve_repeat (gfc_expr *f, gfc_expr *string,
 }
 
   if (tmp)
-f->ts.u.cl->length = gfc_multiply (tmp, gfc_copy_expr (ncopies));
+{
+  gfc_expr *e = gfc_copy_expr (ncopies);
+
+  /* Force-convert to index_kind so that we don't need
+	 so many runtime variations.  */
+  if (e->ts.kind != gfc_index_integer_kind)
+	{
+	  gfc_typespec ts = e->ts;
+
+	  ts.kind = gfc_index_integer_kind;
+	  gfc_convert_type_warn (e, &ts, 2, 0);
+	}
+  if (tmp->ts.kind != gfc_index_integer_kind)
+	gfc_convert_type_warn (tmp, &e->ts, 2, 0);
+  f->ts.u.cl->length = gfc_multiply (tmp, e);
+}
 }
 
 
diff --git a/gcc/testsuite/gfortran.dg/repeat_8.f90 b/gcc/testsuite/gfortran.dg/repeat_8.f90
new file mode 100644
index 000..6876af9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/repeat_8.f90
@@ -0,0 +1,88 @@
+! { dg-do compile }
+! { dg-additional-options "-Wconversion-extra" }
+!
+! Test fix for PR96724
+!
+
+program repeat_p
+
+  use, intrinsic :: iso_fortran_env, only: &
+int8, int16, int32, int64
+  
+  implicit none
+
+  integer, parameter :: n = 20
+
+  integer(kind=int8),  parameter :: p08 = int(n, kind=int8)
+  integer(kind=int16), parameter :: p16 = int(n, kind=int16)
+  integer(kind=int32), parameter :: p32 = int(n, kind=int32)
+  integer(kind=int64), parameter :: p64 = int(n, kind=int64)
+  
+  integer(kind=int8)  :: i08
+  integer(kind=int16) :: i16
+  integer(kind=int32) :: i32
+  integer(kind=int64) :: i64
+  
+  character(len=n) :: c
+
+  i08 = p08
+  c = repeat('X', 20_int8)
+  c = repeat('X', i08)
+  c = repeat('X', p08)
+  c = repeat('X', len08(c))
+  i16 = p16
+  c = repeat('X', 20_int16)
+  c = repeat('X', i16)
+  c = repeat('X', p16)
+  c = repeat('X', len16(c))
+  i32 = p32
+  c = repeat('X', 20_int32)
+  c = repeat('X', i32)
+  c = repeat('X', p32)
+  c = repeat('X', len32(c))
+  i64 = p64
+  c = repeat('X', 20_int64)
+  c = repeat('X', i64)
+  c = repeat('X', p64)
+  c = repeat('X', len64(c))
+  stop
+
+contains
+
+  function len08(x) result(l)
+character(len=*), intent(in) :: x
+
+integer(kind=int8) :: l
+
+l = int(len(x), kind=int8)
+return
+  end function len08
+  
+  function len16(x) result(l)
+character(len=*), intent(in) :: x
+
+integer(kind=int16) :: l
+
+l = int(len(x), kind=int16)
+return
+  end function len16
+  
+  function len32(x) result(l)
+character(len=*), intent(in) :: x
+
+integer(kind=int32) :: l
+
+l = int(len(x), kind=int32)
+return
+  end function len32
+  
+  function len64(x) result(l)
+character(len=*), intent(in) :: x
+
+integer(kind=int64) :: l
+
+l = int(len(x), kind=int64)
+return
+  end function len64
+  
+end program repeat_p



PING: [Patch, fortran] PR fortran/96870 - Class name on error message

2021-06-16 Thread José Rui Faustino de Sousa via Gcc-patches

*PING*


 Forwarded Message 
Subject: [Patch, fortran] PR fortran/96870 - Class name on error message
Date: Mon, 31 Aug 2020 16:09:32 +
From: José Rui Faustino de Sousa 
To: fort...@gcc.gnu.org, gcc-patches@gcc.gnu.org

Hi all!

Proposed patch to PR96870 - Class name on error message.

Patch tested only on x86_64-pc-linux-gnu.

Make the error message more intelligible for the average user.

Thank you very much.

Best regards,
José Rui


2020-08-21  José Rui Faustino de Sousa  

gcc/fortran/ChangeLog:

PR fortran/96870
* misc.c (gfc_typename): Use the class name instead of the internal
name in error messages.

gcc/testsuite/ChangeLog:

PR fortran/96870
* gfortran.dg/PR96870.f90: New test.



diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
index 65bcfa6..43edfd8 100644
--- a/gcc/fortran/misc.c
+++ b/gcc/fortran/misc.c
@@ -184,8 +184,11 @@ gfc_typename (gfc_typespec *ts, bool for_hash)
 	  break;
 	}
   ts1 = ts->u.derived->components ? &ts->u.derived->components->ts : NULL;
-  if (ts1 && ts1->u.derived && ts1->u.derived->attr.unlimited_polymorphic)
-	sprintf (buffer, "CLASS(*)");
+  if (ts1 && ts1->u.derived)
+	if (ts1->u.derived->attr.unlimited_polymorphic)
+	  sprintf (buffer, "CLASS(*)");
+	else
+	  sprintf (buffer, "CLASS(%s)", ts1->u.derived->name);
   else
 	sprintf (buffer, "CLASS(%s)", ts->u.derived->name);
   break;
diff --git a/gcc/testsuite/gfortran.dg/PR96870.f90 b/gcc/testsuite/gfortran.dg/PR96870.f90
new file mode 100644
index 000..c1b321e
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR96870.f90
@@ -0,0 +1,41 @@
+! { dg-do compile }
+!
+! Test fix for PR96870
+!
+
+Program main_p
+
+  implicit none
+  
+  Type :: t0
+  End Type t0
+  
+  Type, extends(t0) :: t1
+  End Type t1
+  
+  type(t0),   target :: x
+  class(t0), pointer :: p
+
+  p => x
+  Call sub_1(x) ! { dg-error "Type mismatch in argument .p. at .1.; passed TYPE\\(t0\\) to CLASS\\(t1\\)" }
+  Call sub_1(p) ! { dg-error "Type mismatch in argument .p. at .1.; passed CLASS\\(t0\\) to CLASS\\(t1\\)" }
+  Call sub_2(x) ! { dg-error "Type mismatch in argument .p. at .1.; passed TYPE\\(t0\\) to TYPE\\(t1\\)" }
+  Call sub_2(p) ! { dg-error "Type mismatch in argument .p. at .1.; passed CLASS\\(t0\\) to TYPE\\(t1\\)" }
+  stop
+  
+Contains
+  
+  Subroutine sub_1(p)
+class(t1), Intent(In) :: p
+
+return
+  End Subroutine sub_1
+  
+  Subroutine sub_2(p)
+type(t1), Intent(In) :: p
+
+return
+  End Subroutine sub_2
+  
+End Program main_p
+



[pushed] libcpp: location comparison within macro [PR100796]

2021-06-16 Thread Jason Merrill via Gcc-patches
The patch for 96391 changed linemap_compare_locations to give up on
comparing locations from macro expansions if we don't have column
information.  But in this testcase, the BOILERPLATE macro is multiple lines
long, so we do want to compare locations within the macro.  So this patch
moves the LINE_MAP_MAX_LOCATION_WITH_COLS check inside the block, to use it
for failing gracefully.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/100796
PR preprocessor/96391

libcpp/ChangeLog:

* line-map.c (linemap_compare_locations): Only use comparison with
LINE_MAP_MAX_LOCATION_WITH_COLS to avoid abort.

gcc/testsuite/ChangeLog:

* g++.dg/plugin/location-overflow-test-pr100796.c: New test.
* g++.dg/plugin/plugin.exp: Run it.
---
 .../plugin/location-overflow-test-pr100796.c  | 25 +++
 libcpp/line-map.c | 20 ---
 gcc/testsuite/g++.dg/plugin/plugin.exp|  3 ++-
 3 files changed, 38 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/plugin/location-overflow-test-pr100796.c

diff --git a/gcc/testsuite/g++.dg/plugin/location-overflow-test-pr100796.c b/gcc/testsuite/g++.dg/plugin/location-overflow-test-pr100796.c
new file mode 100644
index 000..7fa964c07e5
--- /dev/null
+++ b/gcc/testsuite/g++.dg/plugin/location-overflow-test-pr100796.c
@@ -0,0 +1,25 @@
+// PR c++/100796
+// { dg-additional-options "-Wsuggest-override -fplugin-arg-location_overflow_plugin-value=0x6001" }
+// Passing LINE_MAP_MAX_LOCATION_WITH_COLS meant we stopped distinguishing between lines in a macro.
+
+#define DO_PRAGMA(text)   _Pragma(#text)
+#define WARNING_PUSH  DO_PRAGMA(GCC diagnostic push)
+#define WARNING_POP   DO_PRAGMA(GCC diagnostic pop)
+#define WARNING_DISABLE(text) DO_PRAGMA(GCC diagnostic ignored text)
+#define NO_OVERRIDE_WARNING   WARNING_DISABLE("-Wsuggest-override")
+
+#define BOILERPLATE\
+  WARNING_PUSH \
+  NO_OVERRIDE_WARNING  \
+  void f();\
+  WARNING_POP
+
+struct B
+{
+  virtual void f();
+};
+
+struct D: B
+{
+  BOILERPLATE
+};
diff --git a/libcpp/line-map.c b/libcpp/line-map.c
index a03d6760a8e..1a6902acdb7 100644
--- a/libcpp/line-map.c
+++ b/libcpp/line-map.c
@@ -1421,23 +1421,25 @@ linemap_compare_locations (line_maps *set,
 
   if (l0 == l1
   && pre_virtual_p
-  && post_virtual_p
-  && l0 <= LINE_MAP_MAX_LOCATION_WITH_COLS)
+  && post_virtual_p)
 {
   /* So pre and post represent two tokens that are present in a
 same macro expansion.  Let's see if the token for pre was
 before the token for post in that expansion.  */
-  unsigned i0, i1;
   const struct line_map *map =
first_map_in_common (set, pre, post, &l0, &l1);
 
   if (map == NULL)
-   /* This should not be possible.  */
-   abort ();
-
-  i0 = l0 - MAP_START_LOCATION (map);
-  i1 = l1 - MAP_START_LOCATION (map);
-  return i1 - i0;
+   /* This should not be possible while we have column information, but if
+  we don't, the tokens could be from separate macro expansions on the
+  same line.  */
+   gcc_assert (l0 > LINE_MAP_MAX_LOCATION_WITH_COLS);
+  else
+   {
+ unsigned i0 = l0 - MAP_START_LOCATION (map);
+ unsigned i1 = l1 - MAP_START_LOCATION (map);
+ return i1 - i0;
+   }
 }
 
   if (IS_ADHOC_LOC (l0))
diff --git a/gcc/testsuite/g++.dg/plugin/plugin.exp b/gcc/testsuite/g++.dg/plugin/plugin.exp
index 5cd4b4bff90..74e12df207c 100644
--- a/gcc/testsuite/g++.dg/plugin/plugin.exp
+++ b/gcc/testsuite/g++.dg/plugin/plugin.exp
@@ -73,7 +73,8 @@ set plugin_test_list [list \
  ../../gcc.dg/plugin/diagnostic-test-string-literals-3.c \
  ../../gcc.dg/plugin/diagnostic-test-string-literals-4.c } \
 { ../../gcc.dg/plugin/location_overflow_plugin.c \
- location-overflow-test-pr96391.c } \
+ location-overflow-test-pr96391.c \
+  location-overflow-test-pr100796.c } \
 { show_template_tree_color_plugin.c \
  show-template-tree-color.C \
  show-template-tree-color-labels.C \

base-commit: 3155d51bfd1de8b6c4645dcb2292248a8d7cc3c9
-- 
2.27.0



Re: [PATCH] tree-sra: Do not refresh readonly decls (PR 100453)

2021-06-16 Thread Martin Jambor
Hi,

On Wed, Jun 16 2021, Jakub Jelinek wrote:
> On Tue, Jun 15, 2021 at 06:11:27PM +0200, Richard Biener wrote:
>> >--- a/gcc/tree-sra.c
>> >+++ b/gcc/tree-sra.c
>> >@@ -915,6 +915,12 @@ create_access (tree expr, gimple *stmt, bool
>> >write)
>> >if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID
>> >(base)))
>> > return NULL;
>> > 
>> >+  if (write && TREE_READONLY (base))
>> >+{
>> >+  disqualify_candidate (base, "Encountered a store to a read-only
>> >decl.");
>
> Wouldn't this be a useful point to also emit some warning (with
> some TREE_NO_WARNING prevention) that some particular statement modifies
> a const decl?
> I guess it can be warned elsewhere though.

I would prefer it to be elsewhere.

> As testcases one could use 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100994#c4
> and #c5.  Though would be nice if we diagnose that even without those -fno-*
> options.
>

My holy grail would be to actually add a condition to the gimple
verifier that TREE_READONLY decl is never on a LHS of an assignment or a
call (or generally not seen from the store callback of
walk_stmt_load_store_addr_ops).

Of course, in order to do that, either the gimplifier or the pass that
propagated a TREE_READONLY decl into a store would have to drop the flag
and at that point could emit a warning.

But seeing things like the addition in a0d371a2514 I am afraid that
getting there might be difficult.

Martin


Re: [PATCH 5/6] make get_dominated_by_region return an auto_vec

2021-06-16 Thread Martin Sebor via Gcc-patches

On 6/16/21 6:46 AM, Richard Sandiford via Gcc-patches wrote:

Richard Biener via Gcc-patches  writes:

On Tue, Jun 15, 2021 at 8:02 AM Trevor Saunders  wrote:


This makes it clear the caller owns the vector, and ensures it is cleaned up.

Signed-off-by: Trevor Saunders 

bootstrapped and regtested on x86_64-linux-gnu, ok?
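The ownership semantics the patch establishes can be sketched with a standard container standing in for auto_vec (an illustrative analogy, not GCC code): returning the owning vector by value hands ownership to the caller, and move semantics keep the return cheap.

```cpp
#include <cassert>
#include <vector>

// std::vector stands in for GCC's auto_vec.  Returning by value makes the
// caller the owner; the buffer is moved out (or the move is elided), and the
// caller's destructor cleans it up -- nothing for the caller to forget.
std::vector<int> collect_region_sketch ()
{
  std::vector<int> doms;
  doms.push_back (1);
  doms.push_back (2);
  return doms;  // ownership transfers to the caller
}
```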


OK.

Btw, are "standard API" returns places we can use 'auto'?  That would avoid
excessive indent for

-  dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
-bbs.address (),
-bbs.length ());
+  auto_vec<basic_block> dom_bbs = get_dominated_by_region (CDI_DOMINATORS,
+  bbs.address (),
+  bbs.length ());

and just uses

   auto dom_bbs = get_dominated_by_region (...

Not asking you to do this, just a question for the audience.


Personally I think this would be surprising for something that doesn't
have copy semantics.  (Not that I'm trying to reopen that debate here :-)
FWIW, I agree not having copy semantics is probably the most practical
way forward for now.)


But you did open the door for me to reiterate my strong disagreement
with that.  The best C++ practice going back to the early 1990's is
to make types safely copyable and assignable.  It is the default for
all types, in both C++ and C, and so natural and expected.

Preventing copying is appropriate in special and rare circumstances
(e.g, a mutex may not be copyable, or a file or iostream object may
not be because they represent a unique physical resource.)

In the absence of such special circumstances preventing copying is
unexpected, and in the case of an essential building block such as
a container, makes the type difficult to use.

The only argument for disabling copying that has been given is
that it could be surprising(*).  But because all types are copyable
by default the "surprise" is usually when one can't be.

I think Richi's "surprising" has to do with the fact that it lets
one inadvertently copy a large amount of data, thus leading to
an inefficiency.  But by analogy, there are infinitely many ways
to end up with inefficient code (e.g., deep recursion, or heap
allocation in a loop), and they are not a reason to ban the coding
constructs that might lead to it.

IIUC, Jason's comment about surprising effects was about implicit
conversion from auto_vec to vec.  I share that concern, and agree
that it should be addressed by preventing the conversion (as Jason
suggested).

Martin



Thanks,
Richard


Thanks,
Richard.


gcc/ChangeLog:

 * dominance.c (get_dominated_by_region): Return auto_vec.
 * dominance.h (get_dominated_by_region): Likewise.
 * tree-cfg.c (gimple_duplicate_sese_region): Adjust.
 (gimple_duplicate_sese_tail): Likewise.
 (move_sese_region_to_fn): Likewise.
---
  gcc/dominance.c |  4 ++--
  gcc/dominance.h |  2 +-
  gcc/tree-cfg.c  | 18 +++---
  3 files changed, 10 insertions(+), 14 deletions(-)

diff --git a/gcc/dominance.c b/gcc/dominance.c
index 0e464cb7282..4943102ff1d 100644
--- a/gcc/dominance.c
+++ b/gcc/dominance.c
@@ -906,13 +906,13 @@ get_dominated_by (enum cdi_direction dir, basic_block bb)
 direction DIR) by some block between N_REGION ones stored in REGION,
 except for blocks in the REGION itself.  */

-vec<basic_block>
+auto_vec<basic_block>
  get_dominated_by_region (enum cdi_direction dir, basic_block *region,
  unsigned n_region)
  {
unsigned i;
basic_block dom;
-  vec<basic_block> doms = vNULL;
+  auto_vec<basic_block> doms;

for (i = 0; i < n_region; i++)
  region[i]->flags |= BB_DUPLICATED;
diff --git a/gcc/dominance.h b/gcc/dominance.h
index 515a369aacf..c74ad297c6a 100644
--- a/gcc/dominance.h
+++ b/gcc/dominance.h
@@ -47,7 +47,7 @@ extern basic_block get_immediate_dominator (enum cdi_direction, basic_block);
  extern void set_immediate_dominator (enum cdi_direction, basic_block,
  basic_block);
 extern auto_vec<basic_block> get_dominated_by (enum cdi_direction, basic_block);
-extern vec<basic_block> get_dominated_by_region (enum cdi_direction,
+extern auto_vec<basic_block> get_dominated_by_region (enum cdi_direction,
  basic_block *,
  unsigned);
 extern vec<basic_block> get_dominated_to_depth (enum cdi_direction,
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index 6bdd1a561fd..c9403deed19 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -6495,7 +6495,6 @@ gimple_duplicate_sese_region (edge entry, edge exit,
bool free_region_copy = false, copying_header = false;
class loop *loop = entry->dest->loop_father;
edge exit_copy;
-  vec<basic_block> doms = vNULL;
edge redirected;
profile_count total_count = profile_count::uninitialized ();
profile_count entry_count = profile_count::uninitialized ();
@@ -6549,9 +6548,9 @@ gimple_duplicate_sese_region (edge entry, e

Re: [PATCH] gcc/configure.ac: fix register issue for global_load assembler functions

2021-06-16 Thread Julian Brown
At the risk of overstepping my GCN backend review remit...

On Wed, 16 Jun 2021 11:34:53 +0200
Marcel Vollweiler  wrote:

> index d9fc3c2..e179ce1 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -5357,6 +5357,30 @@ case "$target" in
>  ;;
>  esac
>  
> +# This tests if the assembler supports two registers for global_load functions
> +# (like in LLVM versions <12) or one register (like in LLVM 12).
> +case "$target" in
> +  amdgcn-* | gcn-*)
> +AC_MSG_CHECKING(assembler fix for global_load functions)
> +gcc_cv_as_gcn_global_load_fixed=yes
> +if test x$gcc_cv_as != x; then
> +  cat > conftest.s <<EOF
> +	global_store_dwordx2	v[[1:2]], v[[4:5]], s[[14:15]]
> +EOF
> +  if $gcc_cv_as -triple=amdgcn--amdhsa -filetype=obj -mcpu=gfx900 -o conftest.o conftest.s > /dev/null 2>&1; then
> +gcc_cv_as_gcn_global_load_fixed=no
> +  fi
> +  rm -f conftest.s conftest.o conftest
> +fi
> +if test x$gcc_cv_as_gcn_global_load_fixed = xyes; then
> +  AC_DEFINE(HAVE_GCN_ASM_GLOBAL_LOAD_FIXED, 1, [Define if your assembler has fixed global_load functions.])
> +else
> +  AC_DEFINE(HAVE_GCN_ASM_GLOBAL_LOAD_FIXED, 0, [Define if your assembler has fixed global_load functions.])
> +fi
> +AC_MSG_RESULT($gcc_cv_as_gcn_global_load_fixed)
> +;;
> +esac

I think the more-common idiom seems to be just having a single
AC_DEFINE if the feature is present -- like (as a random example)
HAVE_AS_IX86_REP_LOCK_PREFIX, which omits the "define ... 0" case you
have here. (You'd use "#ifdef ..." instead of "#if ... == 1" to check
the feature then, of course).
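A minimal sketch of the define-only-if-present idiom Julian suggests (the macro name is reused from the patch purely for illustration, and the leading #define stands in for what configure would emit when the feature is detected):

```cpp
#include <cassert>

// Assumption for this sketch: configure detected the feature, so the macro
// is defined.  With the define-only-if-present idiom there is no
// "defined to 0" case, so consumers test existence with #ifdef rather
// than comparing the value to 1.
#define HAVE_GCN_ASM_GLOBAL_LOAD_FIXED 1

#ifdef HAVE_GCN_ASM_GLOBAL_LOAD_FIXED
static const bool gcn_global_load_fixed = true;
#else
static const bool gcn_global_load_fixed = false;
#endif
```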

Then OK with that change (as long as a global maintainer doesn't object
in, say, the next 24 hours?) -- but please watch the mailing list for
configuration problems that might spring up on other targets.

Thanks,

Julian


Re: [PATCH] tree-optimization PR/101014 - Limit new value calculations to first order effects.

2021-06-16 Thread Andrew MacLeod via Gcc-patches

On 6/16/21 5:41 AM, Maxim Kuvyrkov wrote:

On 15 Jun 2021, at 00:07, Andrew MacLeod via Gcc-patches 
 wrote:

As mentioned in the Text from the PR:

"When a range is being calculated for an ssa-name, the propagation process often goes 
along back edges. These back edges sometimes require other ssa-names which have not been 
processed yet. These are flagged as "poor values" and when propagation is done, we 
visit the list of poor values, calculate them, and see if that may result in a better range 
for the original ssa-name.

The problem is that calculating these poor values may also spawn another set of 
requests since the block at the far end of the back edge has not been processed 
yet... it's highly likely that some additional unprocessed ssa-names are used in 
the calculation of that name, but typically they do not affect the current 
range in a significant way.

Thus we care mostly about the first-order effect only.  It turns out to be 
very rare that a 2nd order effect on a back edge affects anything that we don't 
catch later.

This patch turns off poor-value tagging when looking up the first order values, 
thus avoiding the 2nd order and beyond cascading effects.

I haven't yet found a test case we miss because of this change, but it probably 
resolves a number of the outstanding compilation problems in a significant way.

I think this will probably apply to gcc 11 in some form as well, so I'll look at an 
equivalent patch there."


This patch also simplifies the enable_new_value routines, replacing the separate 
enable/disable functions with a single enable routine that takes a flag and returns 
the previous value.  This lets us change the mode and then set it back to what it 
was before.  Seems better in general.

Then disables new values for 2nd+ order effects. GCC11 patch forthcoming.

Bootstraps on x86_64-pc-linux-gnu, no regressions.  pushed.

Andrew

Hi Andrew,

This causes bootstrap-with-ubsan failure on at least aarch64-linux-gnu, and 
likely others:

# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 48, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 48, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 32, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 48, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 32, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 48, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 32, which is not a valid value for type 'bool'
# 00:42:32 
/home/tcwg-buildslave/workspace/tcwg_gnu_0/abe/snapshots/gcc.git~master/gcc/gimple-range-cache.cc:757:8:
 runtime error: load of value 32, which is not a valid value for type 'bool'


@@ -748,21 +748,15 @@ ranger_cache::dump (FILE *f)
fprintf (f, "\n");
  }
  
-// Allow the cache to flag and query new values when propagation is forced

-// to use an unknown value.
+// Allow or disallow the cache to flag and query new values when propagation
+// is forced to use an unknown value.  The previous state is returned.
  
-void

-ranger_cache::enable_new_values ()
-{
-  m_new_value_p = true;
-}
-
-// Disable new value querying.
-
-void
-ranger_cache::disable_new_values ()
+bool
+ranger_cache::enable_new_values (bool state)
  {
-  m_new_value_p = false;
+  bool ret = m_new_value_p;

I think changing this to
   bool ret = (bool) m_new_value_p;
might be enough, but you know this code better.

Would you please take a look at this?


+  m_new_value_p = state;
+  return ret;
  }
  
  // Dump the caches for basic block BB to file F.

Thanks,


Huh, not sure why that would matter since m_new_value_p is a bool.

My guess (and this bugged me after I checked it in, just haven't 
gotten to it yet) is that this is initialized in the constructor with a 
call, and the return value ignored.  Which means there is technically a 
load of an uninitialized value, which is then ignored.  But the load may 
happen.  I'm going to guess that's the issue.  It needs fixing anyway.


I'm testing this fix, which I will check in.  See if that solves the 
ubsan issue.
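A minimal sketch of the suspected pattern and the fix (the type and member names mimic ranger_cache but are illustrative assumptions, not the actual code): give the member a defined value before the constructor calls the routine that reads it, so the "load of the previous state" never touches uninitialized storage.

```cpp
#include <cassert>

struct cache_sketch
{
  bool m_new_value_p;

  bool enable_new_values (bool state)
  {
    bool ret = m_new_value_p;   // reads the *previous* state
    m_new_value_p = state;
    return ret;
  }

  cache_sketch ()
  {
    // The fix: assign the member a defined value first, so the read
    // inside enable_new_values() below is no longer an uninitialized
    // load (which ubsan can flag even if the result is discarded).
    m_new_value_p = false;
    enable_new_values (true);   // return value intentionally ignored
  }
};
```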




@@ -727,7 +727,7 @@ ranger_cache::ranger_cache (gimple_ranger &q) : 
query (q)

   if (bb)
   

Re: [PATCH] make rich_location safe to copy

2021-06-16 Thread Martin Sebor via Gcc-patches

On 6/16/21 9:06 AM, David Malcolm wrote:

On Wed, 2021-06-16 at 08:52 -0600, Martin Sebor wrote:

On 6/16/21 6:38 AM, David Malcolm wrote:

On Tue, 2021-06-15 at 19:48 -0600, Martin Sebor wrote:

Thanks for writing the patch.


While debugging locations I noticed the semi_embedded_vec template
in line-map.h doesn't declare a copy ctor or copy assignment, but
is being copied in a couple of places in the C++ parser (via
gcc_rich_location).  It gets away with it most likely because it
never grows beyond the embedded buffer.


Where are these places?  I wasn't aware of this.


They're in the attached file along with the diff to reproduce
the errors.


Thanks.

Looks like:

gcc_rich_location richloc = tok->location;

is implicitly constructing a gcc_rich_location, then copying it to
richloc.  This should instead be simply:

gcc_rich_location richloc (tok->location);

which directly constructs the richloc in place, as I understand it.


I see, tok->location is location_t here, and the gcc_rich_location
ctor that takes it is not declared explicit (that would prevent
the implicit conversion).
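The effect of `explicit` can be sketched with a standalone type (invented for illustration, not the real gcc_rich_location): an explicit converting constructor rejects copy-initialization from the argument type while direct-initialization still works.

```cpp
#include <cassert>

struct rich_loc_sketch
{
  explicit rich_loc_sketch (unsigned loc) : m_loc (loc) {}
  unsigned m_loc;
};

// rich_loc_sketch bad = 42u;    // ill-formed: implicit conversion blocked
// rich_loc_sketch good (42u);   // OK: direct-initialization
```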

The attached patch solves the rich_location problem by a) making
the ctor explicit, b) disabling the rich_location copy ctor, c)
changing the parser to use direct initialization.  (I CC Jason
as a heads up on the C++ FE bits.)

The semi_embedded_vec should be fixed as well, regardless of this
particular use and patch.  Let me know if it's okay to commit (I'm
not open to disabling its copy ctor).

Martin



Dave



I was seeing strange behavior in my tests that led me to rich_location
and the m_ranges member.  The problem turned out to be unrelated but
before I figured it out I noticed the missing copy ctor and deleted
it to see if it was being used.  Since that's such a pervasive bug
in GCC code (and likely elsewhere as well) I'm thinking I should take
the time to develop the warning I've been thinking about to detect it.



The attached patch defines the copy ctor and also copy assignment
and adds the corresponding move functions.


Note that rich_location::m_fixit_hints "owns" the fixit_hint
instances,
manually deleting them in rich_location's dtor, so simply doing a
shallow copy of it would be wrong.

Also, a rich_location stores other pointers (to range_labels and
diagnostic_path), which are borrowed pointers, where their lifetime
is
assumed to outlive any (non-dtor) calls to the rich_location.  So I'm
nervous about code that copies rich_location instances.

I think I'd prefer to forbid copying them; what's the use-case for
copying them?  Am I missing something here?


I noticed and fixed just the one problem I uncovered by accident with
the missing copy ctor.  If there are others I don't know about them.
Preventing code from copying rich_location might make sense
independently of fixing the vec class to be safely copyable.

Martin





Tested on x86_64-linux.

Martin


Thanks
Dave








gcc/cp/ChangeLog:

	* parser.c (cp_parser_selection_statement): Use direct initialization
	instead of copy.

gcc/ChangeLog:

	* gcc-rich-location.h (gcc_rich_location): Make ctor explicit.

libcpp/ChangeLog:

	* include/line-map.h (class rich_location): Disable copying and
	assignment.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d57ddc4560d..848e4823258 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -12355,7 +12355,7 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
 	IF_STMT_CONSTEVAL_P (statement) = true;
 	condition = finish_if_stmt_cond (boolean_false_node, statement);
 
-	gcc_rich_location richloc = tok->location;
+	gcc_rich_location richloc (tok->location);
 	bool non_compound_stmt_p = false;
 	if (cp_lexer_next_token_is_not (parser->lexer, CPP_OPEN_BRACE))
 	  {
@@ -12383,7 +12383,7 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
 		RID_ELSE))
 	  {
 		cp_token *else_tok = cp_lexer_peek_token (parser->lexer);
-		gcc_rich_location else_richloc = else_tok->location;
+		gcc_rich_location else_richloc (else_tok->location);
 		guard_tinfo = get_token_indent_info (else_tok);
 		/* Consume the `else' keyword.  */
 		cp_lexer_consume_token (parser->lexer);
diff --git a/gcc/gcc-rich-location.h b/gcc/gcc-rich-location.h
index 00747631025..2a9e5db65d5 100644
--- a/gcc/gcc-rich-location.h
+++ b/gcc/gcc-rich-location.h
@@ -21,14 +21,16 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_RICH_LOCATION_H
 
 /* A gcc_rich_location is libcpp's rich_location with additional
-   helper methods for working with gcc's types.  */
+   helper methods for working with gcc's types.  The class is not
+   copyable or assignable because rich_location isn't. */
+
 class gcc_rich_location : public rich_location
 {
  public:
   /* Constructors.  */
 
   /* Constructing from a location.  */
-  gcc_rich_location (location_t loc, const range_label *label = NULL)
+  explicit gcc_rich_location (location_t loc, const range_label *label = NULL)
   : rich_locat

Re: [PATCH] make rich_location safe to copy

2021-06-16 Thread Jason Merrill via Gcc-patches

On 6/16/21 12:11 PM, Martin Sebor wrote:

On 6/16/21 9:06 AM, David Malcolm wrote:

On Wed, 2021-06-16 at 08:52 -0600, Martin Sebor wrote:

On 6/16/21 6:38 AM, David Malcolm wrote:

On Tue, 2021-06-15 at 19:48 -0600, Martin Sebor wrote:

Thanks for writing the patch.


While debugging locations I noticed the semi_embedded_vec template
in line-map.h doesn't declare a copy ctor or copy assignment, but
is being copied in a couple of places in the C++ parser (via
gcc_rich_location).  It gets away with it most likely because it
never grows beyond the embedded buffer.


Where are these places?  I wasn't aware of this.


They're in the attached file along with the diff to reproduce
the errors.


Thanks.

Looks like:

    gcc_rich_location richloc = tok->location;

is implicitly constructing a gcc_rich_location, then copying it to
richloc.  This should instead be simply:

    gcc_rich_location richloc (tok->location);

which directly constructs the richloc in place, as I understand it.


I see, tok->location is location_t here, and the gcc_rich_location
ctor that takes it is not declared explicit (that would prevent
the implicit conversion).

The attached patch solves the rich_location problem by a) making
the ctor explicit, b) disabling the rich_location copy ctor, c)
changing the parser to use direct initialization.  (I CC Jason
as a heads up on the C++ FE bits.)


The C++ bits are fine.


The semi_embedded_vec should be fixed as well, regardless of this
particular use and patch.  Let me know if it's okay to commit (I'm
not open to disabling its copy ctor).



+  /* Not copyable or assignable.  */


This comment needs a rationale.

Jason



[Patch, fortran V2] PR fortran/100097 PR fortran/100098 - [Unlimited] polymorphic pointers and allocatables have incorrect rank

2021-06-16 Thread José Rui Faustino de Sousa via Gcc-patches

Hi All!

Proposed patch to:

PR100097 - Unlimited polymorphic pointers and allocatables have 
incorrect rank

PR100098 - Polymorphic pointers and allocatables have incorrect rank

Patch tested only on x86_64-pc-linux-gnu.

Version 2 no longer re-initializes explicitly initialized variables, which
are taken care of elsewhere.


Pointers and allocatables must carry TKR information even when
undefined. The patch adds code to initialize the class descriptor's
element size, rank, and type, for both pointers and allocatables, as
early as possible.


Thank you very much.

Best regards,
José Rui

Fortran: Add missing TKR initialization to class variables [PR100097, 
PR100098]


gcc/fortran/ChangeLog:

PR fortran/100097
PR fortran/100098
* trans-array.c (gfc_trans_class_array): New function to
initialize the class descriptor's TKR information.
* trans-array.h (gfc_trans_class_array): Add function prototype.
* trans-decl.c (gfc_trans_deferred_vars): Add calls to the new
function for both pointers and allocatables.

gcc/testsuite/ChangeLog:

PR fortran/100097
* gfortran.dg/PR100097.f90: New test.

PR fortran/100098
* gfortran.dg/PR100098.f90: New test.
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index a6bcd2b..feec734 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -10786,6 +10786,57 @@ gfc_alloc_allocatable_for_assignment (gfc_loopinfo *loop,
 }
 
 
+/* Initialize class descriptor's TKR information.  */
+
+void
+gfc_trans_class_array (gfc_symbol * sym, gfc_wrapped_block * block)
+{
+  tree type, etype;
+  tree tmp;
+  tree descriptor;
+  stmtblock_t init;
+  locus loc;
+  int rank;
+
+  /* Make sure the frontend gets these right.  */
+  gcc_assert (sym->ts.type == BT_CLASS && CLASS_DATA (sym)
+	  && (CLASS_DATA (sym)->attr.class_pointer
+		  || CLASS_DATA (sym)->attr.allocatable));
+
+  gcc_assert (VAR_P (sym->backend_decl)
+	  || TREE_CODE (sym->backend_decl) == PARM_DECL);
+
+  if (sym->attr.dummy)
+return;
+
+  descriptor = gfc_class_data_get (sym->backend_decl);
+
+  /* Explicit initialization is done elsewhere.  */
+  if (sym->attr.save || TREE_STATIC (descriptor))
+return;
+  
+  type = TREE_TYPE (descriptor);
+
+  if (type == NULL || !GFC_DESCRIPTOR_TYPE_P (type))
+return;
+
+  gfc_save_backend_locus (&loc);
+  gfc_set_backend_locus (&sym->declared_at);
+  gfc_init_block (&init);
+
+  rank = CLASS_DATA (sym)->as ? CLASS_DATA (sym)->as->rank : 0;
+  gcc_assert (rank >= 0);
+  tmp = gfc_conv_descriptor_dtype (descriptor);
+  etype = gfc_get_element_type (type);
+  tmp = fold_build2_loc (input_location, MODIFY_EXPR, TREE_TYPE (tmp), tmp,
+			 gfc_get_dtype_rank_type (rank, etype));
+  gfc_add_expr_to_block (&init, tmp);
+
+  gfc_add_init_cleanup (block, gfc_finish_block (&init), NULL_TREE);
+  gfc_restore_backend_locus (&loc);
+}
+
+
 /* NULLIFY an allocatable/pointer array on function entry, free it on exit.
Do likewise, recursively if necessary, with the allocatable components of
derived types.  This function is also called for assumed-rank arrays, which
diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h
index e4d443d..d2768f1 100644
--- a/gcc/fortran/trans-array.h
+++ b/gcc/fortran/trans-array.h
@@ -67,6 +67,8 @@ tree gfc_check_pdt_dummy (gfc_symbol *, tree, int, gfc_actual_arglist *);
 
 tree gfc_alloc_allocatable_for_assignment (gfc_loopinfo*, gfc_expr*, gfc_expr*);
 
+/* Add initialization for class descriptors.  */
+void gfc_trans_class_array (gfc_symbol *, gfc_wrapped_block *);
 /* Add initialization for deferred arrays.  */
 void gfc_trans_deferred_array (gfc_symbol *, gfc_wrapped_block *);
 /* Generate an initializer for a static pointer or allocatable array.  */
diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index 479ba6f..659e973 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -4943,7 +4943,7 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
   else if ((!sym->attr.dummy || sym->ts.deferred)
 		&& (sym->ts.type == BT_CLASS
 		&& CLASS_DATA (sym)->attr.class_pointer))
-	continue;
+	gfc_trans_class_array (sym, block);
   else if ((!sym->attr.dummy || sym->ts.deferred)
 		&& (sym->attr.allocatable
 		|| (sym->attr.pointer && sym->attr.result)
@@ -5027,6 +5027,10 @@ gfc_trans_deferred_vars (gfc_symbol * proc_sym, gfc_wrapped_block * block)
 		  tmp = NULL_TREE;
 		}
 
+	  /* Initialize descriptor's TKR information.  */
+	  if (sym->ts.type == BT_CLASS)
+		gfc_trans_class_array (sym, block);
+
 	  /* Deallocate when leaving the scope. Nullifying is not
 		 needed.  */
 	  if (!sym->attr.result && !sym->attr.dummy && !sym->attr.pointer
diff --git a/gcc/testsuite/gfortran.dg/PR100097.f90 b/gcc/testsuite/gfortran.dg/PR100097.f90
new file mode 100644
index 000..926eb6c
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/PR100097.f90
@@ -0,0 +1,41 @@
+! { dg-do run }
+!
+! Te

Re: [PATCH, V2] auto_vec copy/move improvements

2021-06-16 Thread Martin Sebor via Gcc-patches

On 6/16/21 4:13 AM, Richard Biener via Gcc-patches wrote:

On Wed, Jun 16, 2021 at 5:18 AM Trevor Saunders  wrote:


- Unfortunately using_auto_storage () needs to handle m_vec being null.
- Handle self move of an auto_vec to itself.
- Make sure auto_vec defines the class's move constructor and assignment
   operator, as well as ones taking vec, so the compiler does not generate
them for us.  Per https://en.cppreference.com/w/cpp/language/move_constructor
the ones taking vec do not count as the class's move constructor or
assignment operator, but we want them as well to assign a plain vec to an
auto_vec.
- Explicitly delete auto_vec's copy constructor and assignment operator.  This
   prevents unintentional expensive copies of the vector and makes it clear,
when copies are needed, that that is what is intended.  When it is necessary to
copy a vector, copy () can be used.

Signed-off-by: Trevor Saunders 

This time without the changes to the inline storage version of auto_vec as
requested.  Bootstrap and regtest on x86_64-linux-gnu with the other patches in
the series ongoing, ok if that passes?


OK.


...

+
+  // You probably don't want to copy a vector, so these are deleted to prevent
+  // unintentional use.  If you really need a copy of the vector's contents you
+  // can use copy ().
+  auto_vec(const auto_vec &) = delete;
+  auto_vec &operator= (const auto_vec &) = delete;


To reiterate, I strongly disagree with this change as well as with
the comment.

Martin



  };


@@ -2147,7 +2176,7 @@ template
  inline bool
  vec::using_auto_storage () const
  {
-  return m_vec->m_vecpfx.m_using_auto_storage;
+  return m_vec ? m_vec->m_vecpfx.m_using_auto_storage : false;
  }

  /* Release VEC and call release of all element vectors.  */
--
2.20.1





[PATCH] Avoid loading an undefined value in the ranger_cache constructor.

2021-06-16 Thread Andrew MacLeod via Gcc-patches

On 6/16/21 5:41 AM, Maxim Kuvyrkov wrote:



+  m_new_value_p = state;
+  return ret;
  }
  
  // Dump the caches for basic block BB to file F.

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org


Let me know if the problem is resolved.

pushed as obvious.

Andrew


commit bdfc1207bd20cf1ad81fca121e4f7df4995cc0d6
Author: Andrew MacLeod 
Date:   Wed Jun 16 13:01:21 2021 -0400

Avoid loading an undefined value in the ranger_cache constructor.

Enable_new_values takes a boolean, returning the old value.  The constructor
for ranger_cache initialized the m_new_value_p field by calling this routine
and ignoring the result.  This potentially loads the old value uninitialized.

* gimple-range-cache.cc (ranger_cache::ranger_cache): Initialize
m_new_value_p directly.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index d9a57c294df..37e2acb19f9 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -727,7 +727,7 @@ ranger_cache::ranger_cache (gimple_ranger &q) : query (q)
   if (bb)
 	m_gori.exports (bb);
 }
-  enable_new_values (true);
+  m_new_value_p = true;
 }
 
 ranger_cache::~ranger_cache ()


Re: [PATCH] gcc/configure.ac: fix register issue for global_load assembler functions

2021-06-16 Thread Joseph Myers
On Wed, 16 Jun 2021, Julian Brown wrote:

> > +if test x$gcc_cv_as_gcn_global_load_fixed = xyes; then
> > +  AC_DEFINE(HAVE_GCN_ASM_GLOBAL_LOAD_FIXED, 1, [Define if your
> > assembler has fixed global_load functions.])
> > +else
> > +  AC_DEFINE(HAVE_GCN_ASM_GLOBAL_LOAD_FIXED, 0, [Define if your
> > assembler has fixed global_load functions.])
> > +fi
> > +AC_MSG_RESULT($gcc_cv_as_gcn_global_load_fixed)
> > +;;
> > +esac
> 
> I think the more-common idiom seems to be just having a single
> AC_DEFINE if the feature is present -- like (as a random example)
> HAVE_AS_IX86_REP_LOCK_PREFIX, which omits the "define ... 0" case you
> have here. (You'd use "#ifdef ..." instead of "#if ... == 1" to check
> the feature then, of course).

Actually I think what's preferable is the approach used with e.g. 
GATHER_STATISTICS - define to 0 or 1 using a single AC_DEFINE_UNQUOTED 
call (via a shell variable that's set to 0 or 1 as appropriate), then test 
in "if" conditions, not #if, as far as possible, so that both alternatives 
in the conditional code always get syntax-checked when compiling GCC (for 
this target).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] make rich_location safe to copy

2021-06-16 Thread Martin Sebor via Gcc-patches

On 6/16/21 10:35 AM, Jason Merrill wrote:

On 6/16/21 12:11 PM, Martin Sebor wrote:

On 6/16/21 9:06 AM, David Malcolm wrote:

On Wed, 2021-06-16 at 08:52 -0600, Martin Sebor wrote:

On 6/16/21 6:38 AM, David Malcolm wrote:

On Tue, 2021-06-15 at 19:48 -0600, Martin Sebor wrote:

Thanks for writing the patch.


While debugging locations I noticed the semi_embedded_vec template
in line-map.h doesn't declare a copy ctor or copy assignment, but
is being copied in a couple of places in the C++ parser (via
gcc_rich_location).  It gets away with it most likely because it
never grows beyond the embedded buffer.


Where are these places?  I wasn't aware of this.


They're in the attached file along with the diff to reproduce
the errors.


Thanks.

Looks like:

    gcc_rich_location richloc = tok->location;

is implicitly constructing a gcc_rich_location, then copying it to
richloc.  This should instead be simply:

    gcc_rich_location richloc (tok->location);

which directly constructs the richloc in place, as I understand it.


I see, tok->location is location_t here, and the gcc_rich_location
ctor that takes it is not declared explicit (that would prevent
the implicit conversion).

The attached patch solves the rich_location problem by a) making
the ctor explicit, b) disabling the rich_location copy ctor, c)
changing the parser to use direct initialization.  (I CC Jason
as a heads up on the C++ FE bits.)


The C++ bits are fine.


The semi_embedded_vec should be fixed as well, regardless of this
particular use and patch.  Let me know if it's okay to commit (I'm
not open to disabling its copy ctor).



+  /* Not copyable or assignable.  */


This comment needs a rationale.


Done in the attached patch.

Providing a rationale in each instance sounds like a good addition
to the coding conventions.  Let me propose a patch for that.

Martin
gcc/cp/ChangeLog:

	* parser.c (cp_parser_selection_statement): Use direct initialization
	instead of copy.

gcc/ChangeLog:

	* gcc-rich-location.h (gcc_rich_location): Make ctor explicit.

libcpp/ChangeLog:

	* include/line-map.h (class rich_location): Disable copying and
	assignment.

diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c
index d57ddc4560d..848e4823258 100644
--- a/gcc/cp/parser.c
+++ b/gcc/cp/parser.c
@@ -12355,7 +12355,7 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
 	IF_STMT_CONSTEVAL_P (statement) = true;
 	condition = finish_if_stmt_cond (boolean_false_node, statement);
 
-	gcc_rich_location richloc = tok->location;
+	gcc_rich_location richloc (tok->location);
 	bool non_compound_stmt_p = false;
 	if (cp_lexer_next_token_is_not (parser->lexer, CPP_OPEN_BRACE))
 	  {
@@ -12383,7 +12383,7 @@ cp_parser_selection_statement (cp_parser* parser, bool *if_p,
 		RID_ELSE))
 	  {
 		cp_token *else_tok = cp_lexer_peek_token (parser->lexer);
-		gcc_rich_location else_richloc = else_tok->location;
+		gcc_rich_location else_richloc (else_tok->location);
 		guard_tinfo = get_token_indent_info (else_tok);
 		/* Consume the `else' keyword.  */
 		cp_lexer_consume_token (parser->lexer);
diff --git a/gcc/gcc-rich-location.h b/gcc/gcc-rich-location.h
index 00747631025..2a9e5db65d5 100644
--- a/gcc/gcc-rich-location.h
+++ b/gcc/gcc-rich-location.h
@@ -21,14 +21,16 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_RICH_LOCATION_H
 
 /* A gcc_rich_location is libcpp's rich_location with additional
-   helper methods for working with gcc's types.  */
+   helper methods for working with gcc's types.  The class is not
+   copyable or assignable because rich_location isn't. */
+
 class gcc_rich_location : public rich_location
 {
  public:
   /* Constructors.  */
 
   /* Constructing from a location.  */
-  gcc_rich_location (location_t loc, const range_label *label = NULL)
+  explicit gcc_rich_location (location_t loc, const range_label *label = NULL)
   : rich_location (line_table, loc, label)
   {
   }
diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 7d964172469..464494bfb5b 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -1670,6 +1670,12 @@ class rich_location
   /* Destructor.  */
   ~rich_location ();
 
+  /* The class manages the memory pointed to by the elements of
+ the M_FIXIT_HINTS vector and is not meant to be copied or
+ assigned.  */
+  rich_location (const rich_location &) = delete;
+  void operator= (const rich_location &) = delete;
+
   /* Accessors.  */
   location_t get_loc () const { return get_loc (0); }
   location_t get_loc (unsigned int idx) const;


[PATCH] libstdc++: Fix for deadlock in std::counting_semaphore [PR100806]

2021-06-16 Thread Thomas Rodgers
This is an 'interim' fix. For now it forces all waiting threads to wake
on _M_release(). This isn't exactly efficient but resolves the issue
in the immediate term.

libstdc++-v3/ChangeLog:
PR libstdc++/100806
* include/bits/semaphore_base.h (__atomic_semaphore::_M_release()):
Force _M_release() to wake all waiting threads.
* testsuite/30_threads/semaphore/100806.cc: New test.
---
 libstdc++-v3/include/bits/semaphore_base.h|  4 +-
 .../testsuite/30_threads/semaphore/100806.cc  | 77 +++
 2 files changed, 80 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/30_threads/semaphore/100806.cc

diff --git a/libstdc++-v3/include/bits/semaphore_base.h 
b/libstdc++-v3/include/bits/semaphore_base.h
index 9a55978068f..c4565d7e560 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -256,7 +256,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (__update > 1)
__atomic_notify_address_bare(&_M_counter, true);
   else
-   __atomic_notify_address_bare(&_M_counter, false);
+   __atomic_notify_address_bare(&_M_counter, true);
+// FIXME - Figure out why this does not wake a waiting thread
+// __atomic_notify_address_bare(&_M_counter, false);
 }
 
   private:
diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc 
b/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc
new file mode 100644
index 000..483779caf0a
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc
@@ -0,0 +1,77 @@
+// Copyright (C) 2020-2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a -pthread" }
+// { dg-do run { target c++2a } }
+// { dg-require-effective-target pthread }
+// { dg-require-gthreads "" }
+// { dg-add-options libatomic }
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+std::counting_semaphore<4> semaphore{6};
+
+std::mutex mtx;
std::vector<std::string> results;
+
+void thread_main(size_t x)
+{
+  semaphore.acquire();
+  std::this_thread::sleep_for(std::chrono::milliseconds(100));
+  semaphore.release();
+  {
+std::ostringstream stm;
+stm << "Thread " << x << " finished.";
+std::lock_guard g{ mtx };
+results.push_back(stm.str());
+  }
+}
+
+int main()
+{
+
+constexpr auto nthreads = 10;
+
std::vector<std::thread> threads(nthreads);
+
+
+size_t counter{0};
+for(auto& t : threads)
+{
+t = std::thread(thread_main, counter++);
+}
+
+for(auto& t : threads)
+  {
+t.join();
+{
+  std::lock_guard g{ mtx };
+  for (auto&& r : results)
+std::cout << r << '\n';
+  std::cout.flush();
+  results.clear();
+}
+  }
+}
-- 
2.26.2



Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed by multiply [PR95176]

2021-06-16 Thread Victor Tong via Gcc-patches
Hi Richard,

Thanks for the feedback. From what you said, I can think of two possible 
solutions (though I'm not sure if either is feasible/fully correct):

Option 1: Have the new X * (Y / X) --> Y - (Y % X) optimization only run in 
scenarios that don't interfere with the existing X - (X / Y) * Y --> X % Y 
optimization. 

This would involve checking the expression one level up to see if there's a 
subtraction that would trigger the existing optimization. I looked through the 
match.pd file and couldn't find a bail condition like this. It doesn't seem 
like there's a link from an expression to its parent expression one level up. 
This also feels a bit counter-intuitive since it would be doing the opposite of 
the bottom-up expression matching where the compiler would like to match a 
larger expression rather than a smaller one.

Option 2: Add a new pattern to support scenarios that the existing nop_convert 
pattern bails out on.

Existing pattern:

(simplify
   (minus (nop_convert1? @0) (nop_convert2? (minus (nop_convert3? @@0) @1)))
   (view_convert @1))

New pattern to add:

  /* X - (X - Y) --> Y */
  (simplify
  (minus @0 (convert? (minus @@0 @1)))
  (if (INTEGRAL_TYPE_P (type) 
&& TYPE_OVERFLOW_UNDEFINED(type)
&& INTEGRAL_TYPE_P (TREE_TYPE(@1))
&& TYPE_OVERFLOW_UNDEFINED(TREE_TYPE(@1))
&& !TYPE_UNSIGNED (TREE_TYPE (@1))
&& !TYPE_UNSIGNED (type)
&& TYPE_PRECISION (TREE_TYPE (@1)) <= TYPE_PRECISION (type))
(convert @1)))

I think the truncation concerns that you brought up should be covered if the
external expression type's precision is greater than or equal to that of the
internal expression type. There may be a sign extension operation (which is why the
nop_convert check fails) but that shouldn't affect the value of the expression. 
And if the types involved are signed integers where overflow/underflow results 
in undefined behavior, the X - (X - Y) --> Y optimization should be legal.

Please correct me if I'm wrong with either one of these options, or if you can 
think of a better option to fix the regression.

Thanks,
Victor




From: Richard Biener 
Sent: Monday, June 7, 2021 1:25 AM
To: Victor Tong 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division 
followed by multiply [PR95176] 
 
On Wed, Jun 2, 2021 at 10:55 PM Victor Tong  wrote:
>
> Hi Richard,
>
> Thanks for reviewing my patch. I did a search online and you're right -- 
> there isn't a vector modulo instruction. I'll keep the X * (Y / X) --> Y -
> (Y % X) pattern and the existing X - (X / Y) * Y --> X % Y pattern from
> triggering on vector types.
>
> I looked into why the following pattern isn't triggering:
>
>   (simplify
>    (minus @0 (nop_convert1? (minus (nop_convert2? @0) @1)))
>    (view_convert @1))
>
> The nop_converts expand into tree_nop_conversion_p checks. In fn2() of the 
> testsuite/gcc.dg/fold-minus-6.c, the expression during generic matching looks 
> like:
>
> 42 - (long int) (42 - 42 % x)
>
> When looking at the right-hand side of the expression (the (long int) (42 - 
> 42 % x)), the tree_nop_conversion_p check fails because of the type precision 
> difference. The expression inside of the cast has a 32-bit precision and the 
> outer expression has a 64-bit precision.
>
> I looked around at other patterns and it seems like nop_convert and 
> view_convert are used because of underflow/overflow concerns. I'm not 
> familiar with the two constructs. What's the difference between using them 
> and checking TYPE_OVERFLOW_UNDEFINED? In the scenario above, since 
> TYPE_OVERFLOW_UNDEFINED is true, the second pattern that I added (X - (X - Y) 
> --> Y) gets triggered.

But TYPE_OVERFLOW_UNDEFINED is not a good condition here since the
conversion is the problematic one and
conversions have implementation defined behavior.  Now, the above does
not match because it wasn't designed to,
and for non-constant '42' it would have needed a (convert ...) around
the first @0 as well (matching of constants is
by value, not by value + type).

That said, your

+/* X - (X - Y) --> Y */
+(simplify
+ (minus (convert1? @0) (convert2? (minus @@0 @1)))
+ (if ((INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type)) &&
TYPE_OVERFLOW_UNDEFINED(type))
+  (convert @1)))

would match (int)x - (int)(x - y) where you assert the outer subtract
has undefined behavior
on overflow but the inner subtract could wrap and the (int) conversion
can be truncating
or widening.  Is that really always a valid transform then?

Richard.

> Thanks,
> Victor
>
>
> From: Richard Biener 
> Sent: Tuesday, April 27, 2021 1:29 AM
> To: Victor Tong 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: [EXTERNAL] Re: [PATCH] tree-optimization: Optimize division followed 
> by multiply [PR95176]
>
> On Thu, Apr 1, 2021 at 1:03 AM Victor Tong via Gcc-patches
>  wrote:
> >
> > Hello,
> >
> > This patch fixes PR tree-optimization/95176. A new pattern in match.pd was 
> > added to transform "a * (b / a)" -

Re: [PATCH] make rich_location safe to copy

2021-06-16 Thread David Malcolm via Gcc-patches
On Wed, 2021-06-16 at 11:21 -0600, Martin Sebor wrote:
> On 6/16/21 10:35 AM, Jason Merrill wrote:
> > On 6/16/21 12:11 PM, Martin Sebor wrote:
> > > On 6/16/21 9:06 AM, David Malcolm wrote:
> > > > On Wed, 2021-06-16 at 08:52 -0600, Martin Sebor wrote:
> > > > > On 6/16/21 6:38 AM, David Malcolm wrote:
> > > > > > On Tue, 2021-06-15 at 19:48 -0600, Martin Sebor wrote:
> > > > > > 
> > > > > > Thanks for writing the patch.
> > > > > > 
> > > > > > > While debugging locations I noticed the semi_embedded_vec
> > > > > > > template
> > > > > > > in line-map.h doesn't declare a copy ctor or copy
> > > > > > > assignment, but
> > > > > > > is being copied in a couple of places in the C++ parser
> > > > > > > (via
> > > > > > > gcc_rich_location).  It gets away with it most likely
> > > > > > > because it
> > > > > > > never grows beyond the embedded buffer.
> > > > > > 
> > > > > > Where are these places?  I wasn't aware of this.
> > > > > 
> > > > > They're in the attached file along with the diff to reproduce
> > > > > the errors.
> > > > 
> > > > Thanks.
> > > > 
> > > > Looks like:
> > > > 
> > > >     gcc_rich_location richloc = tok->location;
> > > > 
> > > > is implicitly constructing a gcc_rich_location, then copying it
> > > > to
> > > > richloc.  This should instead be simply:
> > > > 
> > > >     gcc_rich_location richloc (tok->location);
> > > > 
> > > > which directly constructs the richloc in place, as I understand
> > > > it.
> > > 
> > > I see, tok->location is location_t here, and the
> > > gcc_rich_location
> > > ctor that takes it is not declared explicit (that would prevent
> > > the implicit conversion).
> > > 
> > > The attached patch solves the rich_location problem by a) making
> > > the ctor explicit, b) disabling the rich_location copy ctor, c)
> > > changing the parser to use direct initialization.  (I CC Jason
> > > as a heads up on the C++ FE bits.)
> > 
> > The C++ bits are fine.
> > 
> > > The semi_embedded_vec should be fixed as well, regardless of this
> > > particular use and patch.  Let me know if it's okay to commit
> > > (I'm
> > > not open to disabling its copy ctor).
> > 
> > > +  /* Not copyable or assignable.  */
> > 
> > This comment needs a rationale.
> 
> Done in the attached patch.

LGTM; thanks

Dave

> 
> Providing a rationale in each instance sounds like a good addition
> to the coding conventions.  Let me propose a patch for that.
> 
> Martin




[PATCH] Add needed earlyclobber to fusion patterns

2021-06-16 Thread Aaron Sawdey via Gcc-patches
The add-logical and add-add fusion patterns all have constraint
alternatives "=0,1,&r,r" for the output (3). The inputs 0 and 1
are used in the first fusion instruction and then either may be
reused as a temp for the output of the first insn which is
input to the second. However, if input 2 is the same as 0 or 1,
it gets clobbered unexpectedly. So the first 2 alts need to be
"=&0,&1,&r,r" instead to indicate that in alts 0 and 1, the
register used for 3 is earlyclobber, hence can't be the same as
input 2.

This was actually encountered in the backport of the add-logical
fusion patch to gcc-11. Some code in go hit this case:

   :andc r30,r30,r9
r30 now (~(x|((x&c)+c)))&(~c) --> this is new x
   :b 
   :addi r31,r31,-1
r31 now m-1
   :srd r31,r30,r31
r31 now x>>(m-1)
   :subf r30,r31,r30
r30 now x-(x>>(m-1))
   :or r30,r30,r30   # mdoom
nop
   :not r3,r30
r3 now ~(x-(x>>(m-1))) -- WHOOPS

The or r30,r30,r30 was meant to be or-ing in the earlier value
of r30 which was overwritten by the output of the subf.

OK for trunk and backport to 11 if bootstrap and regtest pass?

Separately I will be updating the fusion regtests because this
change has again shifted which pattern alternatives get used 
and how many times.

Thanks!
Aaron

gcc/ChangeLog

* config/rs6000/genfusion.pl (gen_logical_addsubf): Add
earlyclobber to alts 0/1.
(gen_addadd): Add earlyclobber to alts 0/1.
* config/rs6000/fusion.md: Regenerate file.
---
 gcc/config/rs6000/fusion.md| 300 -
 gcc/config/rs6000/genfusion.pl |   4 +-
 2 files changed, 152 insertions(+), 152 deletions(-)

diff --git a/gcc/config/rs6000/fusion.md b/gcc/config/rs6000/fusion.md
index e642ff5f95f..516baa0bb0b 100644
--- a/gcc/config/rs6000/fusion.md
+++ b/gcc/config/rs6000/fusion.md
@@ -358,7 +358,7 @@ (define_insn_and_split "*lbz_cmpldi_cr0_QI_GPR_CCUNS_zero"
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; scalar and -> and
 (define_insn "*fuse_and_and"
-  [(set (match_operand:GPR 3 "gpc_reg_operand" "=0,1,&r,r")
+  [(set (match_operand:GPR 3 "gpc_reg_operand" "=&0,&1,&r,r")
 (and:GPR (and:GPR (match_operand:GPR 0 "gpc_reg_operand" "r,r,r,r")
   (match_operand:GPR 1 "gpc_reg_operand" "%r,r,r,r"))
  (match_operand:GPR 2 "gpc_reg_operand" "r,r,r,r")))
@@ -376,7 +376,7 @@ (define_insn "*fuse_and_and"
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; scalar andc -> and
 (define_insn "*fuse_andc_and"
-  [(set (match_operand:GPR 3 "gpc_reg_operand" "=0,1,&r,r")
+  [(set (match_operand:GPR 3 "gpc_reg_operand" "=&0,&1,&r,r")
 (and:GPR (and:GPR (not:GPR (match_operand:GPR 0 "gpc_reg_operand" 
"r,r,r,r"))
   (match_operand:GPR 1 "gpc_reg_operand" "r,r,r,r"))
  (match_operand:GPR 2 "gpc_reg_operand" "r,r,r,r")))
@@ -394,7 +394,7 @@ (define_insn "*fuse_andc_and"
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; scalar eqv -> and
 (define_insn "*fuse_eqv_and"
-  [(set (match_operand:GPR 3 "gpc_reg_operand" "=0,1,&r,r")
+  [(set (match_operand:GPR 3 "gpc_reg_operand" "=&0,&1,&r,r")
 (and:GPR (not:GPR (xor:GPR (match_operand:GPR 0 "gpc_reg_operand" 
"r,r,r,r")
   (match_operand:GPR 1 "gpc_reg_operand" "r,r,r,r")))
  (match_operand:GPR 2 "gpc_reg_operand" "r,r,r,r")))
@@ -412,7 +412,7 @@ (define_insn "*fuse_eqv_and"
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; scalar nand -> and
 (define_insn "*fuse_nand_and"
-  [(set (match_operand:GPR 3 "gpc_reg_operand" "=0,1,&r,r")
+  [(set (match_operand:GPR 3 "gpc_reg_operand" "=&0,&1,&r,r")
 (and:GPR (ior:GPR (not:GPR (match_operand:GPR 0 "gpc_reg_operand" 
"r,r,r,r"))
   (not:GPR (match_operand:GPR 1 "gpc_reg_operand" 
"r,r,r,r")))
  (match_operand:GPR 2 "gpc_reg_operand" "r,r,r,r")))
@@ -430,7 +430,7 @@ (define_insn "*fuse_nand_and"
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; scalar nor -> and
 (define_insn "*fuse_nor_and"
-  [(set (match_operand:GPR 3 "gpc_reg_operand" "=0,1,&r,r")
+  [(set (match_operand:GPR 3 "gpc_reg_operand" "=&0,&1,&r,r")
 (and:GPR (and:GPR (not:GPR (match_operand:GPR 0 "gpc_reg_operand" 
"r,r,r,r"))
   (not:GPR (match_operand:GPR 1 "gpc_reg_operand" 
"r,r,r,r")))
  (match_operand:GPR 2 "gpc_reg_operand" "r,r,r,r")))
@@ -448,7 +448,7 @@ (define_insn "*fuse_nor_and"
 ;; logical-logical fusion pattern generated by gen_logical_addsubf
 ;; scalar or -> and
 (define_insn "*fuse_or_and"
-  [(set (match_operand:GPR 3 "gpc_reg_operand" "=0,1,&r,r")
+  [(set (match_operand:GPR 3 "gpc_reg_operand" "=&0,&1,&r,r")
 (and:GPR (ior:GPR (match_operand:GPR 0 "gpc_reg_operand" "r,r,r,r")
   (match_operand:GPR 1 "gpc_reg_opera

Re: [PATCH][version 3]add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-06-16 Thread Qing Zhao via Gcc-patches
So, the major question now is:

Is one single repeatable pattern enough for pattern initialization for all 
different types of auto variables?

If YES, then the implementation for pattern initialization will be much easier 
and simpler
  as you pointed out. And will save me a lot of pain to implement this part.
If NO, then we have to keep the current complicated implementation since it 
provides us
  the flexibility to assign different patterns to different types.

Honestly, I don’t have a good justification on this question myself.

The previous references I have so far are the current behavior of CLANG and 
Microsoft compiler.

For your reference,
. CLANG uses different patterns for INTEGER  (0x) and FLOAT 
(0x) and 32-bit pointer (0x00AA)
https://reviews.llvm.org/D54604
. Microsoft uses different patterns for INTEGERS ( 0xE2), FLOAT (1.0)
https://msrc-blog.microsoft.com/2020/05/13/solving-uninitialized-stack-memory-on-windows/

My understanding from CLANG's comment is that these patterns are more likely
to crash the program for the given type, and therefore make it easier to catch
potential bugs.
Don’t know why Microsoft chose the pattern like this.

So, For GCC, what should we do on the pattern initializations, shall we choose 
one single repeatable pattern for all the types as you suggested,
Or chose different patterns for different types as Clang and Microsoft 
compiler’s behavior?

Kees, do you have any comment on this?

How did Linux Kernel use -ftrivial-auto-var-init=pattern feature of CLANG?

Thanks.

Qing





[PATCH] libstdc++: Fix for deadlock in std::counting_semaphore [PR100806]

2021-06-16 Thread Thomas Rodgers
Same as previous version except removing the copyright notice from the
test.

libstdc++-v3/ChangeLog:
PR libstdc++/100806
* include/bits/semaphore_base.h (__atomic_semaphore::_M_release()):
Force _M_release() to wake all waiting threads.
* testsuite/30_threads/semaphore/100806.cc: New test.
---
 libstdc++-v3/include/bits/semaphore_base.h|  4 +-
 .../testsuite/30_threads/semaphore/100806.cc  | 60 +++
 2 files changed, 63 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/30_threads/semaphore/100806.cc

diff --git a/libstdc++-v3/include/bits/semaphore_base.h 
b/libstdc++-v3/include/bits/semaphore_base.h
index 9a55978068f..c4565d7e560 100644
--- a/libstdc++-v3/include/bits/semaphore_base.h
+++ b/libstdc++-v3/include/bits/semaphore_base.h
@@ -256,7 +256,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   if (__update > 1)
__atomic_notify_address_bare(&_M_counter, true);
   else
-   __atomic_notify_address_bare(&_M_counter, false);
+   __atomic_notify_address_bare(&_M_counter, true);
+// FIXME - Figure out why this does not wake a waiting thread
+// __atomic_notify_address_bare(&_M_counter, false);
 }
 
   private:
diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc 
b/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc
new file mode 100644
index 000..938c2793be1
--- /dev/null
+++ b/libstdc++-v3/testsuite/30_threads/semaphore/100806.cc
@@ -0,0 +1,60 @@
+// { dg-options "-std=gnu++2a -pthread" }
+// { dg-do run { target c++2a } }
+// { dg-require-effective-target pthread }
+// { dg-require-gthreads "" }
+// { dg-add-options libatomic }
+
+#include <semaphore>
+#include <mutex>
+#include <thread>
+
+#include <chrono>
+#include <vector>
+#include <string>
+#include <sstream>
+#include <iostream>
+
+std::counting_semaphore<4> semaphore{6};
+
+std::mutex mtx;
+std::vector<std::string> results;
+
+void thread_main(size_t x)
+{
+  semaphore.acquire();
+  std::this_thread::sleep_for(std::chrono::milliseconds(100));
+  semaphore.release();
+  {
+std::ostringstream stm;
+stm << "Thread " << x << " finished.";
+std::lock_guard g{ mtx };
+results.push_back(stm.str());
+  }
+}
+
+int main()
+{
+
+constexpr auto nthreads = 10;
+
+std::vector<std::thread> threads(nthreads);
+
+
+size_t counter{0};
+for(auto& t : threads)
+{
+t = std::thread(thread_main, counter++);
+}
+
+for(auto& t : threads)
+  {
+t.join();
+{
+  std::lock_guard g{ mtx };
+  for (auto&& r : results)
+std::cout << r << '\n';
+  std::cout.flush();
+  results.clear();
+}
+  }
+}
-- 
2.26.2



Re: [PATCH] Add needed earlyclobber to fusion patterns

2021-06-16 Thread Segher Boessenkool
On Wed, Jun 16, 2021 at 01:51:16PM -0500, Aaron Sawdey wrote:
> The add-logical and add-add fusion patterns all have constraint
> alternatives "=0,1,&r,r" for the output (3). The inputs 0 and 1
> are used in the first fusion instruction and then either may be
> reused as a temp for the output of the first insn which is
> input to the second. However, if input 2 is the same as 0 or 1,
> it gets clobbered unexpectedly. So the first 2 alts need to be
> "=&0,&1,&r,r" instead to indicate that in alts 0 and 1, the
> register used for 3 is earlyclobber, hence can't be the same as
> input 2.

>   * config/rs6000/genfusion.pl (gen_logical_addsubf): Add
>   earlyclobber to alts 0/1.

You can break that line later, after "to" even.  Just FYI :-)

This is okay for trunk and backport to 10.  Thanks!


Segher


[pushed] c++: static memfn from non-dependent base [PR101078]

2021-06-16 Thread Jason Merrill via Gcc-patches
After my patch for PR91706, or before that with the qualified call,
tsubst_baselink returned a BASELINK with BASELINK_BINFO indicating a base of
a still-dependent derived class.  We need to look up the relevant base binfo
in the substituted class.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/101078
PR c++/91706

gcc/cp/ChangeLog:

* pt.c (tsubst_baselink): Update binfos in non-dependent case.

gcc/testsuite/ChangeLog:

* g++.dg/template/access39.C: New test.
---
 gcc/cp/pt.c  | 15 +--
 gcc/testsuite/g++.dg/template/access39.C | 17 +
 2 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/access39.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index d4bb5cc5eaf..15947b2c812 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -16249,8 +16249,19 @@ tsubst_baselink (tree baselink, tree object_type,
fns = BASELINK_FUNCTIONS (baselink);
 }
   else
-/* We're going to overwrite pieces below, make a duplicate.  */
-baselink = copy_node (baselink);
+{
+  /* We're going to overwrite pieces below, make a duplicate.  */
+  baselink = copy_node (baselink);
+
+  if (qualifying_scope != BINFO_TYPE (BASELINK_ACCESS_BINFO (baselink)))
+   {
+ /* The decl we found was from non-dependent scope, but we still need
+to update the binfos for the instantiated qualifying_scope.  */
+ BASELINK_ACCESS_BINFO (baselink) = TYPE_BINFO (qualifying_scope);
+ BASELINK_BINFO (baselink) = lookup_base (qualifying_scope, binfo_type,
+  ba_unique, nullptr, complain);
+   }
+}
 
   /* If lookup found a single function, mark it as used at this point.
  (If lookup found multiple functions the one selected later by
diff --git a/gcc/testsuite/g++.dg/template/access39.C 
b/gcc/testsuite/g++.dg/template/access39.C
new file mode 100644
index 000..d94177e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/access39.C
@@ -0,0 +1,17 @@
+// PR c++/101078
+
+struct A {
+  static void f();
+};
+
+template
+struct B : private A {
+  struct C {
+void g() { f(); }
+void g2() { B::f(); }
+  };
+};
+
+int main() {
+  B::C().g();
+}

base-commit: 9e64426dae129cca5b62355ef6c5a3bd6137e830
-- 
2.27.0



[pushed] correct -Wmismatched-dealloc default

2021-06-16 Thread Martin Sebor via Gcc-patches

A user pointed out to me that the manual documented the wrong
defaults for the -Wmismatched-dealloc and -Wmismatched-new-delete
options.  I pushed the correction in r12-1544 (also attached).

Martin
commit 139564821dd2e4a9cbb08677ff12cf291d7d0218
Author: Martin Sebor 
Date:   Wed Jun 16 16:49:56 2021 -0600

gcc/ChangeLog:

* doc/invoke.texi (-Wmismatched-dealloc, -Wmismatched-new-delete):
Correct documented defaults.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 510f24e55ab..fe812cbd512 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -255,7 +255,7 @@ in the following sections.
 -Wno-inherited-variadic-ctor  -Wno-init-list-lifetime @gol
 -Winvalid-imported-macros @gol
 -Wno-invalid-offsetof  -Wno-literal-suffix @gol
--Wno-mismatched-new-delete -Wmismatched-tags @gol
+-Wmismatched-new-delete -Wmismatched-tags @gol
 -Wmultiple-inheritance  -Wnamespaces  -Wnarrowing @gol
 -Wnoexcept  -Wnoexcept-type  -Wnon-virtual-dtor @gol
 -Wpessimizing-move  -Wno-placement-new  -Wplacement-new=@var{n} @gol
@@ -3963,7 +3963,7 @@ The warning is inactive inside a system header file, such as the STL, so
 one can still use the STL.  One may also instantiate or specialize
 templates.
 
-@item -Wno-mismatched-new-delete @r{(C++ and Objective-C++ only)}
+@item -Wmismatched-new-delete @r{(C++ and Objective-C++ only)}
 @opindex Wmismatched-new-delete
 @opindex Wno-mismatched-new-delete
 Warn for mismatches between calls to @code{operator new} or @code{operator
@@ -3995,7 +3995,7 @@ The related option @option{-Wmismatched-dealloc} diagnoses mismatches
 involving allocation and deallocation functions other than @code{operator
 new} and @code{operator delete}.
 
-@option{-Wmismatched-new-delete} is enabled by default.
+@option{-Wmismatched-new-delete} is included in @option{-Wall}.
 
 @item -Wmismatched-tags @r{(C++ and Objective-C++ only)}
 @opindex Wmismatched-tags
@@ -5539,6 +5539,8 @@ Options} and @ref{Objective-C and Objective-C++ Dialect Options}.
 -Wmemset-elt-size @gol
 -Wmemset-transposed-args @gol
 -Wmisleading-indentation @r{(only for C/C++)} @gol
+-Wmismatched-dealloc @gol
+-Wmismatched-new-delete @r{(only for C/C++)} @gol
 -Wmissing-attributes @gol
 -Wmissing-braces @r{(only for C/ObjC)} @gol
 -Wmultistatement-macros  @gol
@@ -6435,7 +6437,7 @@ Ignoring the warning can result in poorly optimized code.
 disable the warning, but this is not recommended and should be done only
 when non-existent profile data is justified.
 
-@item -Wno-mismatched-dealloc
+@item -Wmismatched-dealloc
 @opindex Wmismatched-dealloc
 @opindex Wno-mismatched-dealloc
 
@@ -6468,7 +6470,7 @@ void f (void)
 In C++, the related option @option{-Wmismatched-new-delete} diagnoses
 mismatches involving either @code{operator new} or @code{operator delete}.
 
-Option @option{-Wmismatched-dealloc} is enabled by default.
+Option @option{-Wmismatched-dealloc} is included in @option{-Wall}.
 
 @item -Wmultistatement-macros
 @opindex Wmultistatement-macros
@@ -7958,9 +7960,9 @@ Warnings controlled by the option can be disabled either by specifying
 Disable @option{-Wframe-larger-than=} warnings.  The option is equivalent
 to @option{-Wframe-larger-than=}@samp{SIZE_MAX} or larger.
 
-@item -Wno-free-nonheap-object
-@opindex Wno-free-nonheap-object
+@item -Wfree-nonheap-object
 @opindex Wfree-nonheap-object
+@opindex Wno-free-nonheap-object
 Warn when attempting to deallocate an object that was either not allocated
 on the heap, or by using a pointer that was not returned from a prior call
 to the corresponding allocation function.  For example, because the call
@@ -7977,7 +7979,7 @@ void f (char *p)
 @}
 @end smallexample
 
-@option{-Wfree-nonheap-object} is enabled by default.
+@option{-Wfree-nonheap-object} is included in @option{-Wall}.
 
 @item -Wstack-usage=@var{byte-size}
 @opindex Wstack-usage


[PATCH] Range_on_edge in ranger_cache should return true for all ranges.

2021-06-16 Thread Andrew MacLeod via Gcc-patches
This patch mostly affects an internal routine in the ranger_cache.  When
doing the cache rework to also be a range_query object, I implemented
range_on_edge to always return a range, but only return TRUE when the
range was actually changed on the edge.  This was a well-intentioned but
misguided attempt to combine has_edge_range_p () and
outgoing_edge_range_p ().


With the GORI engine now using the ranger_cache as a "read-only" range
query to access the cache and fold statements, range_on_edge returning
FALSE meant that the fold_using_range code would turn constants and
other ssa_name ranges into VARYING.


This corrects that mistake, and changes range_on_edge to return TRUE 
whenever any range is calculated.


Bootstraps on  x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

commit 8a22a10c7879109c3906e1b865c50fe236b89f41
Author: Andrew MacLeod 
Date:   Wed Jun 16 11:14:36 2021 -0400

Range_on_edge in ranger_cache should return true for all ranges.

Range_on_edge was implemented in the cache to always return a range, but
only returned true when the edge actually changed the range.
Return true with any range that can be calculated.

* gimple-range-cache.cc (ranger_cache::range_on_edge): Always return
true when a range can be calculated.
* gimple-range.cc (gimple_ranger::dump_bb): Check has_edge_range_p.

diff --git a/gcc/gimple-range-cache.cc b/gcc/gimple-range-cache.cc
index 37e2acb19f9..cc2b7092dad 100644
--- a/gcc/gimple-range-cache.cc
+++ b/gcc/gimple-range-cache.cc
@@ -978,8 +978,7 @@ ranger_cache::range_of_expr (irange &r, tree name, gimple *stmt)
 }
 
 
-// Implement range_on_edge. Return TRUE if the edge generates a range,
-// otherwise false.. but still return a range.
+// Implement range_on_edge.  Always return the best available range.
 
  bool
  ranger_cache::range_on_edge (irange &r, edge e, tree expr)
@@ -989,14 +988,11 @@ ranger_cache::range_of_expr (irange &r, tree name, gimple *stmt)
   exit_range (r, expr, e->src);
   int_range_max edge_range;
   if (m_gori.outgoing_edge_range_p (edge_range, e, expr, *this))
-	{
-	  r.intersect (edge_range);
-	  return true;
-	}
+	r.intersect (edge_range);
+  return true;
 }
-  else
-get_tree_range (r, expr, NULL);
-  return false;
+
+  return get_tree_range (r, expr, NULL);
 }
 
 
diff --git a/gcc/gimple-range.cc b/gcc/gimple-range.cc
index 481b89b2b80..efb919f1595 100644
--- a/gcc/gimple-range.cc
+++ b/gcc/gimple-range.cc
@@ -1394,7 +1394,8 @@ gimple_ranger::dump_bb (FILE *f, basic_block bb)
   for (x = 1; x < num_ssa_names; x++)
 	{
 	  tree name = gimple_range_ssa_p (ssa_name (x));
-	  if (name && m_cache.range_on_edge (range, e, name))
+	  if (name && gori ().has_edge_range_p (name, e)
+	  && m_cache.range_on_edge (range, e, name))
 	{
 	  gimple *s = SSA_NAME_DEF_STMT (name);
 	  // Only print the range if this is the def block, or


[PATCH] Add recomputation to outgoing_edge_range.

2021-06-16 Thread Andrew MacLeod via Gcc-patches

This change adds first degree recomputation to the gori engine.

"Exports" are any ssa_name which is in the definition chain of one of 
the SSA_NAMES on the condition.


   a_2 = b_3 +6
   if (a_2 > 10 && a_2 <20)    // a_2 has the range [11, 19]

The condition tells us that the range of a_2 is [11, 19] on the true
edge, and since b_3 is used in the definition chain of a_2, the
gori_compute engine can solve the equation [11, 19] = b_3 + 6 to
deduce that b_3 has a range of [5, 13] on the true edge.


These export chains are built for each basic block with a condition, and
in this block, a_2 and b_3 are in the export list.  If an SSA_NAME is
not in the export list, we cannot calculate a range for it.  If we
tweak the original case:


   x_5 = b_3 * 2
   a_2 = b_3 +6
   if (a_2 > 10 && a_2 <20)    // a_2 has the range [11, 19]

Following just definition chains, there is no use/def relationship
between x_5 and a_2.  Using the export list, we know that a_2 and b_3
are both exports from this block.  The new GORI rework also contains
dependency summaries: each statement has up to 2 "direct dependencies"
cached.  These are ssa_names which occur in the statement and could
change its value.


This patch adds the ability to recompute values by checking whether one of
these direct dependencies is an export, and recomputing the original
statement using the values as they appear on this edge.


In this latter example, b_3 is a direct dependency of x_5's statement,
and x_5 is therefore determined to be recomputable.  Using
fold_using_range, we ask for the range of x_5 to be calculated as if it
were on that edge.  It can determine that b_3 is [5, 13] on the edge and
will now calculate x_5 as [10, 26] on the edge.


The nature of SSA makes this a safe operation (assuming the statement
has no side effects).  It also removes any need for x_5 = b_3 * 2 to
occur in this block; it can be anywhere in the IL and this still works.


This is integrated directly in outgoing_edge_range_p and picks up quite 
a few new opportunities.


Bootstraps on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew



commit 786188e8b8cab47cb31177c6f4ab1a1578a607c3
Author: Andrew MacLeod 
Date:   Wed Jun 16 18:08:03 2021 -0400

Add recomputation to outgoing_edge_range.

The gori engine can calculate outgoing ranges for exported values.  This
change allows 1st degree recomputation.  If a name is not exported from a
block, but one of the ssa_names used directly in computing it is, then
we can recompute the ssa_name on the edge using the edge values for its
operands.

* gimple-range-gori.cc (gori_compute::has_edge_range_p): Check with
may_recompute_p.
(gori_compute::may_recompute_p): New.
(gori_compute::outgoing_edge_range_p): Perform recomputations.
* gimple-range-gori.h (class gori_compute): Add prototype.

diff --git a/gcc/gimple-range-gori.cc b/gcc/gimple-range-gori.cc
index 09dcd694319..647f4964769 100644
--- a/gcc/gimple-range-gori.cc
+++ b/gcc/gimple-range-gori.cc
@@ -972,16 +972,18 @@ gori_compute::compute_operand1_and_operand2_range (irange &r,
   r.intersect (op_range);
   return true;
 }
-// Return TRUE if a range can be calcalated for NAME on edge E.
+// Return TRUE if a range can be calculated or recomputed for NAME on edge E.
 
 bool
 gori_compute::has_edge_range_p (tree name, edge e)
 {
-  // If no edge is specified, check if NAME is an export on any edge.
-  if (!e)
-return is_export_p (name);
+  // Check if NAME is an export or can be recomputed.
+  if (e)
+return is_export_p (name, e->src) || may_recompute_p (name, e);
 
-  return is_export_p (name, e->src);
+  // If no edge is specified, check if NAME can have a range calculated
+  // on any edge.
+  return is_export_p (name) || may_recompute_p (name);
 }
 
 // Dump what is known to GORI computes to listing file F.
@@ -992,6 +994,32 @@ gori_compute::dump (FILE *f)
   gori_map::dump (f);
 }
 
+// Return TRUE if NAME can be recomputed on edge E.  If any direct dependant
+// is exported on edge E, it may change the computed value of NAME.
+
+bool
+gori_compute::may_recompute_p (tree name, edge e)
+{
+  tree dep1 = depend1 (name);
+  tree dep2 = depend2 (name);
+
+  // If the first dependency is not set, there is no recompuation.
+  if (!dep1)
+return false;
+
+  // Don't recalculate PHIs or statements with side_effects.
+  gimple *s = SSA_NAME_DEF_STMT (name);
+  if (is_a<gphi *> (s) || gimple_has_side_effects (s))
+return false;
+
+  // If edge is specified, check if NAME can be recalculated on that edge.
+  if (e)
+return ((is_export_p (dep1, e->src))
+	|| (dep2 && is_export_p (dep2, e->src)));
+
+  return (is_export_p (dep1)) || (dep2 && is_export_p (dep2));
+}
+
 // Calculate a range on edge E and return it in R.  Try to evaluate a
 // range for NAME on this edge.  Return FALSE if this is either not a
 // control edge or NAME is not define

Ping: [PATCH 1/3] Add IEEE 128-bit min/max support on PowerPC.

2021-06-16 Thread Michael Meissner via Gcc-patches
Ping patch.  In particular, we would like to get this into the GCC 11.2
backport as it is power10 enablement.

| Date: Tue, 8 Jun 2021 20:21:25 -0400
| Subject: [PATCH 1/3] Add IEEE 128-bit min/max support on PowerPC.
| Message-ID: <20210609002125.ga18...@ibm-toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Ping: [PATCH 2/3] Fix IEEE 128-bit min/max test.

2021-06-16 Thread Michael Meissner via Gcc-patches
Ping patch.  In particular, we like to get this into GCC 11.2 as part of
power10 enablement.

| Date: Tue, 8 Jun 2021 20:22:40 -0400
| Subject: [PATCH 2/3] Fix IEEE 128-bit min/max test.
| Message-ID: <20210609002240.gb18...@ibm-toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Ping: [PATCH 3/3] Add IEEE 128-bit fp conditional move on PowerPC.

2021-06-16 Thread Michael Meissner via Gcc-patches
Ping patch.  In particular, we would like to get this patch into GCC 11.2 for
power10 enablement.

| Date: Tue, 8 Jun 2021 20:24:47 -0400
| Subject: [PATCH 3/3] Add IEEE 128-bit fp conditional move on PowerPC.
| Message-ID: <20210609002447.gc18...@ibm-toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Ping [PATCH]: Generate 128-bit divide/modulus

2021-06-16 Thread Michael Meissner via Gcc-patches
Ping patch.  In particular, we would like to get this to GCC 11.2 because it is
power10 enablement.

| Date: Fri, 4 Jun 2021 11:10:37 -0400
| Subject: Generate 128-bit divide/modulus
| Message-ID: <20210604151037.ga27...@ibm-toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


RE: [EXTERNAL] [PATCH] gcov: Use system IO buffering

2021-06-16 Thread Eugene Rozenfeld via Gcc-patches
The commit from this patch 
(https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=23eb66d1d46a34cb28c4acbdf8a1deb80a7c5a05)
 changed the semantics of gcov_read_string and gcov_write_string. Before this 
change the string storage was as described in gcov-io.h:

"Strings are
   padded with 1 to 4 NUL bytes, to bring the length up to a multiple
   of 4. The number of 4 bytes is stored, followed by the padded
   string."

After this change the number before the string indicates the number of bytes 
(not words) and there is no padding.

Was this file format change intentional? It breaks AutoFDO because create_gcov 
produces strings in the format specified in gcov-io.h.

Thanks,

Eugene

-Original Message-
From: Gcc-patches  On Behalf Of Martin Liška
Sent: Wednesday, April 21, 2021 12:52 AM
To: gcc-patches@gcc.gnu.org
Subject: [EXTERNAL] [PATCH] gcov: Use system IO buffering

Hey.

I/O buffering in gcov seems duplicative of what a modern C library can provide.
The patch is a simplification and can provide an easier interface for systems
that don't have a filesystem and would like to use GCOV.

I'm going to install the patch after 11.1 if there are no objections.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thanks,
Martin

gcc/ChangeLog:

* gcov-io.c (gcov_write_block): Remove.
(gcov_write_words): Likewise.
(gcov_read_words): Re-implement using gcov_read_bytes.
(gcov_allocate): Remove.
(GCOV_BLOCK_SIZE): Likewise.
(struct gcov_var): Remove most of the fields.
(gcov_position): Implement with ftell.
(gcov_rewrite): Remove setting of start and offset fields.
(from_file): Re-format.
(gcov_open): Remove setbuf call. It should not be needed.
(gcov_close): Remove internal buffer handling.
(gcov_magic): Use __builtin_bswap32.
(gcov_write_counter): Use directly gcov_write_unsigned.
(gcov_write_string): Use direct fwrite and do not round
to 4 bytes.
(gcov_seek): Use directly fseek.
(gcov_write_tag): Use gcov_write_unsigned directly.
(gcov_write_length): Likewise.
(gcov_write_tag_length): Likewise.
(gcov_read_bytes): Use directly fread.
(gcov_read_unsigned): Use gcov_read_words.
(gcov_read_counter): Likewise.
(gcov_read_string): Use gcov_read_bytes.
* gcov-io.h (GCOV_WORD_SIZE): Adjust to reflect
that size is not in bytes, but words (4B).
(GCOV_TAG_FUNCTION_LENGTH): Likewise.
(GCOV_TAG_ARCS_LENGTH): Likewise.
(GCOV_TAG_ARCS_NUM): Likewise.
(GCOV_TAG_COUNTER_LENGTH): Likewise.
(GCOV_TAG_COUNTER_NUM): Likewise.
(GCOV_TAG_SUMMARY_LENGTH): Likewise.

libgcc/ChangeLog:

* libgcov-driver.c: Fix GNU coding style.
---
 gcc/gcov-io.c   | 282 +---
 gcc/gcov-io.h   |  17 ++-
 libgcc/libgcov-driver.c |   6 +-
 3 files changed, 76 insertions(+), 229 deletions(-)

diff --git a/gcc/gcov-io.c b/gcc/gcov-io.c
index 80c9082a649..bd2316dedab 100644
--- a/gcc/gcov-io.c
+++ b/gcc/gcov-io.c
@@ -27,40 +27,14 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 /* Routines declared in gcov-io.h.  This file should be #included by
another source file, after having #included gcov-io.h.  */
 
-#if !IN_GCOV
-static void gcov_write_block (unsigned);
-static gcov_unsigned_t *gcov_write_words (unsigned);
-#endif
-static const gcov_unsigned_t *gcov_read_words (unsigned);
-#if !IN_LIBGCOV
-static void gcov_allocate (unsigned);
-#endif
-
-/* Optimum number of gcov_unsigned_t's read from or written to disk.  */
-#define GCOV_BLOCK_SIZE (1 << 10)
+static gcov_unsigned_t *gcov_read_words (void *buffer, unsigned);
 
 struct gcov_var
 {
   FILE *file;
-  gcov_position_t start;   /* Position of first byte of block */
-  unsigned offset; /* Read/write position within the block.  */
-  unsigned length; /* Read limit in the block.  */
-  unsigned overread;   /* Number of words overread.  */
   int error;   /* < 0 overflow, > 0 disk error.  */
-  int mode;/* < 0 writing, > 0 reading */
+  int mode;/* < 0 writing, > 0 reading.  */
   int endian;  /* Swap endianness.  */
-#if IN_LIBGCOV
-  /* Holds one block plus 4 bytes, thus all coverage reads & writes
- fit within this buffer and we always can transfer GCOV_BLOCK_SIZE
- to and from the disk. libgcov never backtracks and only writes 4
- or 8 byte objects.  */
- gcov_unsigned_t buffer[GCOV_BLOCK_SIZE + 1];
-#else
-  /* Holds a variable length block, as the compiler can write
- strings and needs to backtrack.  */
-  size_t alloc;
-  gcov_unsigned_t *buffer;
-#endif
 } gcov_var;
 
/* Save the current position in the gcov file.  */
@@ -71,8 +45,7 @@ static inline
 gcov_position_t
 gcov_position (void)
 {
-  gcov_nonruntime_assert (gcov_var.mode 

Unable to build Ada on fedora

2021-06-16 Thread Andrew MacLeod via Gcc-patches

I've been unable to build ada on my fedora box since:

commit abcf5174979bcb91ac4c921eaa19a5b37f231ae4
Author: Arnaud Charlet 
Date:   Wed Jan 13 08:49:15 2021 -0500

    [Ada] Use runtime from base compiler during stage1

    gcc/ada/

    * Make-generated.in: Add rule to copy runtime files needed
    during stage1.
    * raise.c: Remove obsolete symbols used during bootstrap.
    * gcc-interface/Make-lang.in: Do not use libgnat sources during
    stage1.
    (GNAT_ADA_OBJS, GNATBIND_OBJS): Split in two parts, the common
    part and the part only used outside of stage1.
    (ADA_GENERATED_FILES): Add runtime files needed during 
bootstrap

    when recent APIs are needed.
    (ada/b_gnatb.adb): Remove prerequisite.
    * gcc-interface/system.ads: Remove obsolete entries.

It fails on fedora 28 and fedora 31, configured with

--verbose --enable-languages=c,c++,fortran,ada,obj-c++,jit 
--enable-host-shared


I'm going to guess it's got something to do with --enable-host-shared,
since the error I'm seeing is:


ada/gnat1drv.o ada/b_gnat1.o libbackend.a main.o libcommon-target.a 
libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a attribs.o -g \
  libcommon-target.a libcommon.a ../libcpp/libcpp.a 
../libbacktrace/.libs/libbacktrace.a ../libiberty/pic/libiberty.a 
../libdecnumber/libdecnumber.a   -lisl -lmpc -lmpfr -lgmp -rdynamic 
-ldl  -L./../zlib -lz 
/usr/lib/gcc/x86_64-redhat-linux/9/adalib/libgnat.a -ldl
g++: error: /usr/lib/gcc/x86_64-redhat-linux/9/adalib/libgnat.a: No such 
file or directory
make[3]: *** 
[/opt/notnfs/amacleod/master/gcc/gcc/ada/gcc-interface/Make-lang.in:740: 
gnat1] Error 1

make[3]: Leaving directory '/opt/notnfs/amacleod/master/build/gcc'
make[2]: *** [Makefile:4762: all-stage1-gcc] Error 2
make[2]: Leaving directory '/opt/notnfs/amacleod/master/build'
make[1]: *** [Makefile:26446: stage1-bubble] Error 2
make[1]: Leaving directory '/opt/notnfs/amacleod/master/build'
make: *** [Makefile:1000: all] Error 2

It can't find /usr/lib/gcc/x86_64-redhat-linux/9/adalib/libgnat.a

but there are .so files:

ls /usr/lib/gcc/x86_64-redhat-linux/9/adalib/libgnat*
/usr/lib/gcc/x86_64-redhat-linux/9/adalib/libgnat-9.so 
/usr/lib/gcc/x86_64-redhat-linux/9/adalib/libgnat_pic.a 
/usr/lib/gcc/x86_64-redhat-linux/9/adalib/libgnat.so


Andrew



Re: Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-06-16 Thread guojiufu via Gcc-patches

On 2021-06-15 12:57, guojiufu via Gcc-patches wrote:

On 2021-06-14 17:16, Jan Hubicka wrote:



On 5/6/2021 8:36 PM, guojiufu via Gcc-patches wrote:
> Gentle ping.
>
> Original message:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html
I think you need a more aggressive ping  :-)

OK for the trunk.  Sorry for the long delay.  I kept hoping someone else
would step in and look at it.

Sorry, the patch was on my todo list to think through for a while :(
It seems to me that both the old and new code need a bit more work.  First,
the loop exit frequency is set to

 prob = profile_probability::always ().apply_scale (1, new_est_niter + 1);


which is only correct if the estimated number of iterations is accurate.
If we do not have profile feedback and the trip count is not known precisely,
in most cases it won't be.  We estimate loops to iterate about 3 times,
and then niter_for_unrolled_loop will apply the capping to 5 iterations,
which is completely arbitrary.

Forcing the exit probability to be precise may then disable further loop
optimizations, since after the change we will think we know the loop
iterates 5 times and thus that it is not worth optimizing (which is quite
the opposite of the fact that we are just unrolling it thinking it is hot).


Thanks, I understand your concern: both the new and old code assume that
the number of iterations is accurate.
Maybe we could add code to reset the exit probability for the case
where "!count_in.reliable_p ()".



Old code does
 1) scale body down so only one iteration is done
 2) set exit edge probability to be 1/(new_est_iter+1) precisely
 3) scale up according to 1/new_nonexit_prob
which would be correct if the nonexit probability was updated to
1-exit_probability, but that does not seem to happen.

New code does

Yes, this is intended: we know that the enter-count should be
equal to the exit-count of a loop, and that
"loop-body-count * exit-probability = exit-count".
Also, the entry count of the loop is not changed by
an optimization (or changes only slightly, e.g. by a peeling count).

Based on this, we can adjust the loop body count according to the
exit-count (or, equivalently, the enter-count) and the exit-probability,
when the exit-probability is easy to estimate.


 1) give up when there are multiple exits.
I wonder how common this is - we do outer loop vectorization


The computation in the new code is based on a single exit.  This is
also a requirement of the old code, and it will be true when we reach here.


To support multiple exits, I'm thinking about how to calculate the
count/probability for each basic_block and each exit edge.  It seems
the count/prob may not scale up by the same ratio.  This is another reason
I gave up on the cases with multiple exits.

Any suggestions about supporting these cases?


BR,
Jiufu Guo




 2) adjust loop body count according to the exit
 3) update profile of the BB after the exit edge.





Why do you need:
+  if (current_ir_type () != IR_GIMPLE)
+update_br_prob_note (exit->src);

It is tree_transform_and_unroll_loop, so I think we should always have
IR_GIMPLE?


These two lines are added to "recompute_loop_frequencies", which can be used
in RTL, like the second patch of this:
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555872.html
Oh, maybe these two lines of code should be put into
tree_transform_and_unroll_loop
instead of the common code recompute_loop_frequencies.

Thanks a lot for the review in your busy time!

BR.
Jiufu Guo


Honza


jeff


[pushed] c++: Tweak PR101029 fix

2021-06-16 Thread Jason Merrill via Gcc-patches
The case of an initializer with side effects for a zero-length array seems
extremely unlikely, but we should still return the right type in that case.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/101029

gcc/cp/ChangeLog:

* init.c (build_vec_init): Preserve the type of base.
---
 gcc/cp/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/cp/init.c b/gcc/cp/init.c
index 622d6e9d0c5..4bd942f3f74 100644
--- a/gcc/cp/init.c
+++ b/gcc/cp/init.c
@@ -4226,7 +4226,7 @@ build_vec_init (tree base, tree maxindex, tree init,
 {
   /* Shortcut zero element case to avoid unneeded constructor synthesis.  */
   if (init && TREE_SIDE_EFFECTS (init))
-   base = build2 (COMPOUND_EXPR, void_type_node, init, base);
+   base = build2 (COMPOUND_EXPR, ptype, init, base);
   return base;
 }
 

base-commit: 6816a44dfe1b5fa9414490a18a4aa723b6f38f18
-- 
2.27.0


