[Ping #3, Patch V2] Add conversions between _Float128 and Decimal.

2021-02-16 Thread Michael Meissner via Gcc-patches
Ping patch

| Subject: [PATCH, V2] Add conversions between _Float128 and Decimal.
| Message-ID: <20210209073505.ga11...@ibm-toto.the-meissners.org>
https://gcc.gnu.org/pipermail/gcc-patches/2021-February/565009.html

I was asked to resubmit the patch with the copyright dates fixed, and Will's
comments addressed.  I have done this (Will's comments other than the copyright
dates were all about the check-in message, and not the patch itself).  I have
attempted to make the check-in message clearer.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: Patch for PR analyzer/98797

2021-02-16 Thread brian.sobulefsky via Gcc-patches
David,

Two items I needs answered to get you the patch.

1) The get_field_at_bit_offset function is static in region.cc, so I cannot use 
it outside of the file (you want me to use it in store.cc) without updating 
that definition to extern. I am assuming you want this.

2) I am updating my direct access of the tree fields. I am using 
int_bit_position for the position of a field in a struct as you requested. What 
do I use for the size of the struct. I had been using 
-decl_common.size-int_cst[0].


Brian


Sent with ProtonMail Secure Email.

‐‐‐ Original Message ‐‐‐
On Saturday, February 13, 2021 9:53 AM, brian.sobulefsky 
 wrote:

 Hi, answers below. Note, do you want my resubmitted patch to be with
 commit --amend, and therefore relative to "real" commits in the history,
 or do you want me to chain it onto the last submission?

 I have already sent to assign at gnu for the form.

 I will update the commit message to call it s3 now.

 The case did not arise "in practice." After solving the xfail from the
 test suite, I tried dividing the array to have more than one field and
 found that this still did not work, so I tracked down why. I would
 argue that the case of a struct made of only a single array as its one
 field is a lot less likely to arise in practice. For what its worth,
 the single field example basically works "for the wrong
 reason."
 When get_binding_recursive goes up the parent chain, it finds
 a binding
 in the parent cast region because that cast is a struct made of
 only one
 field, and so its binding_key is the same as that field. The
 "right"
 logic would check for a field covering the desired region.

 I'll fix the ChangeLog formatting parentheses. I found the formatting
 script, but I did not have one of the imported python files so I just 
copied others.

 I can add the helper routine you have requested. Where would you like it,
 I am thinking analyzer.h so it is visible basically everywhere?

 I can add the check for size matching, which helper routine is easiest.

 Yes, in the event of *p == 'A', *p, having a constant zerop offset
 would get sent to get_cast_region, where it would become a cast region.
 This defeats the new logic in maybe_fold_sub_svalue. I think in the case
 where we are pointing to an array of the requested type, it is much more
 correct to return the offset region with a zero offset, as arr[0] should 
not
 be different than arr[1]. Sometimes the 0 tree would be of pointer_type,
 and I was not sure if this would defeat it or not, so I made sure it was
 of integer_type. This may just be a matter of my being new and not
 knowing for sure how everything works and so erring on the side of safety.

 As of now, the routine immediately rejects any case other than where
 reg is a field_region and parent is a cast_region. I will think if there
 is a C like syntax for the function, really it is checking if the original
 form of parent had a field covering the requested field.

 I will remove the first guard, leave the second, and try to reformat
 the latters into a similar rejection style.

 Currently, get_field_at_bit_offset is not an externally visible function.
 I am taking your instruction to reuse it to mean a prototype should
 be added to the relevant header (it would be region.h I think, as long as
 both region-model-manager.cc and store.cc include those).

 As you know, I am very new to gcc and so was happy when I could hack my
 way to this working at all. I already assumed my direct access was not
 correct. Most of what I did is based on "RTFS" since the manual does not
 really cover the right way to do these things. I will try the routine
 or otherwise macro you say.

 I will change the testfile to leave pre exisiting lines in place.

 Sent with ProtonMail Secure Email.

 ‐‐‐ Original Message ‐‐‐
 On Friday, February 12, 2021 4:16 PM, David Malcolm dmalc...@redhat.com 
wrote:

  On Wed, 2021-02-10 at 19:42 +, brian.sobulefsky wrote:
  Hi Brian
  Thanks for the patch.
  The patch is large enough to count as legally significant; I've sent
  you information on copyright assignment separately; the patch can't be
  committed until that paperwork is in place.
  In the meantime, I've added some review notes inline below throughout:
 
   Address the bug found in test file 
gcc/testsuite/gcc.dg/analyzer/casts-1.c, as
   recorded by the XFAIL, and some simpler and more complex 
versions found during
   the investigation of it. PR analyzer/98797 was opened for 
these bugs.
  
   The simplest manifestation of this bug can be replicated 
with:
  
   char arr[] = {'A', 'B', 'C', 'D'};
   char *pt = ar;
   __analyzer_eval(*(pt + 0) == 'A');
   __analyzer_eval(*(pt + 1) == 'B');
   //etc
  
   The result of the eval call is "UNKNOWN" because the 
analyzer is unable to
   determine the value at the pointer. Note that array access 
(such as arr[0]) is
   correctly handled. The code responsible for this is in 

[PATCH] handle VLA of zero length arrays and vice versa (PR 99121)

2021-02-16 Thread Martin Sebor via Gcc-patches

-Warray-bounds makes a couple of assumptions about arrays that hold
neither for VLAs of zero-length arrays nor for zero-length arrays
of VLAs.  The attached patch removes these assumptions and simplifies
the code that deals with them in the process.

This resolves a P2 GCC 9-11 regression so I'm looking for approval
to commit this fix to both release branches as well as the trunk.

Tested on x86_64-linux.

Martin
PR tree-optimization/99121 - ICE in -Warray-bounds on a VLA of zero-length array

gcc/ChangeLog:

	PR tree-optimization/99121
	* gimple-array-bounds.cc (array_bounds_checker::check_mem_ref):
	Avoid assuming array element size is constant.  Handle zero-length
	arrays of VLAs.

gcc/testsuite/ChangeLog:

	PR tree-optimization/99121
	* c-c++-common/Warray-bounds-9.c: New test.
	* gcc.dg/Warray-bounds-71.c: New test.

diff --git a/gcc/gimple-array-bounds.cc b/gcc/gimple-array-bounds.cc
index 2576556f76b..25b91f6e215 100644
--- a/gcc/gimple-array-bounds.cc
+++ b/gcc/gimple-array-bounds.cc
@@ -594,34 +594,32 @@ array_bounds_checker::check_mem_ref (location_t location, tree ref,
 	  || (DECL_EXTERNAL (arg) && array_at_struct_end_p (ref
 	return false;
 
-  /* FIXME: Should this be 1 for Fortran?  */
   arrbounds[0] = 0;
 
   if (TREE_CODE (reftype) == ARRAY_TYPE)
 	{
-	  /* Set to the size of the array element (and adjust below).  */
-	  eltsize = wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (reftype)));
-	  /* Use log2 of size to convert the array byte size in to its
-	 upper bound in elements.  */
-	  const offset_int eltsizelog2 = wi::floor_log2 (eltsize);
-	  if (tree dom = TYPE_DOMAIN (reftype))
+	  tree nelts = array_type_nelts (reftype);
+	  if (integer_all_onesp (nelts))
+	/* Zero length array.  */
+	eltsize = 0;
+	  else
 	{
-	  tree bnds[] = { TYPE_MIN_VALUE (dom), TYPE_MAX_VALUE (dom) };
-	  if (TREE_CODE (arg) == COMPONENT_REF)
-		{
-		  offset_int size = maxobjsize;
-		  if (tree fldsize = component_ref_size (arg))
-		size = wi::to_offset (fldsize);
-		  arrbounds[1] = wi::lrshift (size, eltsizelog2);
-		}
-	  else if (array_at_struct_end_p (arg) || !bnds[0] || !bnds[1])
-		arrbounds[1] = wi::lrshift (maxobjsize, eltsizelog2);
-	  else
-		arrbounds[1] = (wi::to_offset (bnds[1]) - wi::to_offset (bnds[0])
-+ 1) * eltsize;
+	  tree esz = TYPE_SIZE_UNIT (TREE_TYPE (reftype));
+	  if (TREE_CODE (esz) == INTEGER_CST)
+		/* Array element is not a VLA.  */
+		eltsize = wi::to_offset (esz);
 	}
+
+	  if (!array_at_struct_end_p (arg)
+	  && TREE_CODE (nelts) == INTEGER_CST)
+	arrbounds[1] = (wi::to_offset (nelts) + 1) * eltsize;
 	  else
-	arrbounds[1] = wi::lrshift (maxobjsize, eltsizelog2);
+	{
+	  /* Use log2 of size to convert the array byte size in to its
+		 upper bound in elements.  */
+	  const offset_int eltsizelog2 = wi::floor_log2 (eltsize);
+	  arrbounds[1] = wi::lrshift (maxobjsize, eltsizelog2);
+	}
 
 	  /* Determine a tighter bound of the non-array element type.  */
 	  tree eltype = TREE_TYPE (reftype);
diff --git a/gcc/testsuite/c-c++-common/Warray-bounds-9.c b/gcc/testsuite/c-c++-common/Warray-bounds-9.c
new file mode 100644
index 000..48ad16828f2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Warray-bounds-9.c
@@ -0,0 +1,55 @@
+/* PR tree-optimization/99121 - ICE in -Warray-bounds on a multidimensional
+   VLA
+   { dg-do compile }
+   { dg-options "-O2 -Wall -ftrack-macro-expansion=0" } */
+
+#define NOIPA __attribute__ ((noipa))
+
+void sink (void*, ...);
+#define T(a, x) sink (a, x)
+
+
+NOIPA void a_0_n (int n)
+{
+  int a[0][n];
+
+  sink (a);
+
+  T (a, ((int *) a)[0]);  // { dg-warning "\\\[-Warray-bounds" }
+  T (a, ((char *) a)[1]); // { dg-warning "\\\[-Warray-bounds" }
+  T (a, ((float *) a)[n]);// { dg-warning "\\\[-Warray-bounds" }
+}
+
+NOIPA void a_n_0 (int n)
+{
+  int a[n][0];
+
+  sink (a);
+
+  T (a, ((int *) a)[0]);  // { dg-warning "\\\[-Warray-bounds" }
+  T (a, ((char *) a)[1]); // { dg-warning "\\\[-Warray-bounds" }
+  T (a, ((float *) a)[n]);// { dg-warning "\\\[-Warray-bounds" }
+}
+
+
+NOIPA void f_2_n_0 (int n)
+{
+  int a[2][n][0];
+
+  sink (a);
+
+  T (a, ((int *) a)[0]);  // { dg-warning "\\\[-Warray-bounds" }
+  T (a, ((char *) a)[1]); // { dg-warning "\\\[-Warray-bounds" }
+  T (a, ((float *) a)[n]);// { dg-warning "\\\[-Warray-bounds" }
+}
+
+NOIPA void f_n_n_0 (int n)
+{
+  int a[n][n][0];
+
+  sink (a);
+
+  T (a, ((int *) a)[0]);  // { dg-warning "\\\[-Warray-bounds" }
+  T (a, ((char *) a)[1]); // { dg-warning "\\\[-Warray-bounds" }
+  T (a, ((float *) a)[n]);// { dg-warning "\\\[-Warray-bounds" }
+}
diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-71.c b/gcc/testsuite/gcc.dg/Warray-bounds-71.c
new file mode 100644
index 000..c2af9bef78c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Warray-bounds-71.c
@@ -0,0 +1,53 @@
+/* PR tree-optimization/99121 - ICE in -Warray-bounds on a 

[committed] c++: Revert EXPR_LOCATION change to build_aggr_init_expr [PR96997]

2021-02-16 Thread Patrick Palka via Gcc-patches
My change in r10-7718 to make build_aggr_init_expr set EXPR_LOCATION
(mimicking build_target_expr) causes the debuginfo regression PR96997.
Given that this change is mostly independent of the rest of the commit,
and that the only fallout of reverting it is a less accurate error
message location in a testcase introduced in the same commit, it seems
the best way forward is to just revert this part of the commit.

Bootstrapped and regtested on x86_64-pc-linux-gnu, committed to trunk
under the "obvious" rule, backport to the 10 branch pending.

gcc/cp/ChangeLog:

PR debug/96997
PR c++/94034
* tree.c (build_aggr_init_expr): Revert r10-7718 change.

gcc/testsuite/ChangeLog:

PR debug/96997
PR c++/94034
* g++.dg/cpp1y/constexpr-nsdmi7b.C:  Adjust expected location of
"call to non-'constexpr' function" error message.
---
 gcc/cp/tree.c  | 3 ---
 gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C | 4 ++--
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index e6ced274959..3c469750e9d 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -674,9 +674,6 @@ build_aggr_init_expr (tree type, tree init)
   else
 rval = init;
 
-  if (location_t loc = EXPR_LOCATION (init))
-SET_EXPR_LOCATION (rval, loc);
-
   return rval;
 }
 
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
index 86d8ab4e759..ec10ddd2be8 100644
--- a/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-nsdmi7b.C
@@ -20,8 +20,8 @@ bar()
 {
   A a = foo();
   a.p->n = 5;
-  return a; // { dg-error "non-.constexpr." }
-}
+  return a;
+} // { dg-error "non-.constexpr." }
 
 constexpr int
 baz()
-- 
2.30.1.489.g328c109303



[patch, fortran] PR96686 Namelist group objects shall be defined before appearing in namelist

2021-02-16 Thread Jerry DeLisle

Hi all,

Attached patch adds to checks.  In the case of IMPLICIT typing it checks 
to see if the objects listed in the NAMELIST have defined types andf if 
not, sets them to the default implicit types.


In the case of IMPLICIT NONE, the types are required be declared before 
the NAMELIST.   If an object type is found to not be declared already, 
an error is issued.


One new test case added and one modified to pass.

Regression tested.

OK for trunk?

Regards,

Jerry

fortran: Object types should be declared before use in NAMELIST.

gcc/fortran/ChangeLog:

    PR fortran/98686
    * match.c (gfc_match_namelist): Add checks for IMPLICIT NONE and
    whether the type for each namelist object has been defined before
    the namelist declaration.  For IMPLICIT, set the types so that
    any subsequent use of objects will have their types confirmed.

gcc/testsuite/ChangeLog:

    PR fortran/98686
    * gfortran.dg/namelist_4.f90: Modify.
    * gfortran.dg/namelist_98.f90: New test.

diff --git a/gcc/fortran/match.c b/gcc/fortran/match.c
index 2df6191d7e6..3a06f308812 100644
--- a/gcc/fortran/match.c
+++ b/gcc/fortran/match.c
@@ -5536,6 +5536,27 @@ gfc_match_namelist (void)
 	  if (m == MATCH_ERROR)
 	goto error;
 
+	  if (!gfc_current_ns->seen_implicit_none)
+	{
+	  /* If the type is not set already, we set it here to the
+		 implicit default type.  It is not allowed to set it
+		 later to any other type.  */
+	  if (sym->ts.type == BT_UNKNOWN)
+		gfc_set_default_type (sym, 0, gfc_current_ns);
+	}
+	  else
+	{
+	  /* It is required that members of a namelist be declared
+		 before the namelist.  We check this by checking if the
+		 symbol has a defined type for IMPLICIT NONE.  */
+	  if (sym->ts.type == BT_UNKNOWN)
+		{
+		  gfc_error ("Symbol %qs in namelist %qs at %C must be "
+			 "declared before the namelist is declared.",
+			 sym->name, group_name->name);
+		  gfc_error_check ();
+		}
+	}
 	  if (sym->attr.in_namelist == 0
 	  && !gfc_add_in_namelist (>attr, sym->name, NULL))
 	goto error;
diff --git a/gcc/testsuite/gfortran.dg/namelist_4.f90 b/gcc/testsuite/gfortran.dg/namelist_4.f90
index 538bceaa4b6..4e021253f01 100644
--- a/gcc/testsuite/gfortran.dg/namelist_4.f90
+++ b/gcc/testsuite/gfortran.dg/namelist_4.f90
@@ -27,14 +27,14 @@ END module M1
 program P1
 CONTAINS
 ! This has the additional wrinkle of a reference to the object.
+  INTEGER FUNCTION F2()
+F2=1
+  END FUNCTION
   INTEGER FUNCTION F1()
 NAMELIST /NML3/ F2 ! { dg-error "PROCEDURE attribute conflicts" }
 ! Used to ICE here
-f2 = 1 ! { dg-error "is not a VALUE" }
+f2 = 1 ! { dg-error "is not a variable" }
 F1=1
   END FUNCTION
-  INTEGER FUNCTION F2()
-F2=1
-  END FUNCTION
 END
 
diff --git a/gcc/testsuite/gfortran.dg/namelist_98.f90 b/gcc/testsuite/gfortran.dg/namelist_98.f90
new file mode 100644
index 000..19a7e869f92
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/namelist_98.f90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+! pr98686
+  implicit none
+  real:: x, m
+  namelist /NML/ x, m, q ! { dg-error "must be declared before the namelist*" }
+  integer :: q
+  x = 1.0
+  m = 2.0
+  q = 3
+  write(*, nml=NML)
+end


Re: [PATCH] RISC-V: Add implementation for builtin overflow

2021-02-16 Thread Jim Wilson
On Thu, Jan 21, 2021 at 2:25 AM Levy  wrote:

> Added implementation for builtin overflow detection, new patterns are
> listed below.
>

For rv32 SImode and rv64 DImode, the unsigned add/sub and signed/unsigned
mul patterns seem to give the same result as the default code generation.
That has me wondering if we really need the patterns.

For rv64 SImode, the signed add/sub patterns are generating worse code. Signed
add overflow without the pattern is
add a1,a0,a1
sext.w  a0,a1
bne a1,a0,.L23
and with the pattern is
mv  a5,a0
addwa0,a0,a1
slt a5,a0,a5
sltia1,a1,0
bne a1,a5,.L27
ignoring the register allocation issue we have one more instruction using the
pattern which is undesirable.  This needs a fix.  We could just use X instead
of GPR.  We could then optionally add rv64 SImode patterns.  If we don't
add them it looks like a bug, so I think we should add another pattern for
this.

For rv64 SImode, the unsigned add/sub patterns are generating better code.  We
have unnecessary zero extends without the patterns.  This suggests that the
unsigned add/sub patterns really are useful.  Only 1 of the 3 expansions is
useful, but if we only implement one of the 3 expansions that looks like a
bug so I think this is OK as is.

The signed/unsigned mul patterns only support rv32 SImode and rv64 DImode.
The rv64 SImode support is missing.  But even though there is no pattern
for it we can still get worse code for it.  An unpatched tree for signed
mul gives
mul a1,a0,a1
sext.w  a0,a1
bne a1,a0,.L37
and a patched tree gives
mul a0,a0,a1
sraiw   a4,a0,31
sraia5,a0,32
bne a4,a5,.L43
On the other hand, an unpatched tree for unsigned mul gives
sllia0,a0,32
sllia1,a1,32
srlia0,a0,32
srlia1,a1,32
mul a5,a0,a1
sllia4,a5,32
srlia4,a4,32
bne a5,a4,.L61
and a patched tree gives
sllia0,a0,32
sllia1,a1,32
srlia0,a0,32
srlia1,a1,32
mul a0,a0,a1
sraia5,a0,32
bne a5,zero,.L69
which is one instruction shorter.  In both cases, the difference is due to
the hook to set min precision to 32.  Presumably we need this hook for
add/sub, so we need to add the missing rv64 SImode signed mul pattern.
Since we have to have one of them, we may as well have all of them for
completeness.

There are a few minor style issues.

In the define_expands, the patterns aren't formatted properly.  An open
parenthesis should be aligned to an open paren at the same depth on the
line above.  If you are an emacs user, you can use M-x emacs-lisp-mode and
hit tab to align them.

In a define_expand the constraints aren't useful and shouldn't be
included.  In a define_expand that always emits its own RTL the pattern
isn't useful either.  Only the operand modes, numbers, and predicates are
useful.  This is why some define_expands only list the operands and don't
include the pattern.  But including the pattern is OK, it just isn't
required.  So the only real fix we need here is to drop the constraints.

I attached the testcase I used for testing purposes.  Every function should
be same size or smaller with the patch.

Jim
#include 

int
sub_add_p (int i, int j)
{
  int k;
  return __builtin_add_overflow_p (i, j, k);
}

int
sub_sub_p (int i, int j)
{
  int k;
  return __builtin_sub_overflow_p (i, j, k);
}

int
sub_mul_p (int i, int j)
{
  int k;
  return __builtin_mul_overflow_p (i, j, k);
}

long
sub_add_p_long (long i, long j)
{
  long k;
  return __builtin_add_overflow_p (i, j, k);
}

long
sub_sub_p_long (long i, long j)
{
  long k;
  return __builtin_sub_overflow_p (i, j, k);
}

long
sub_mul_p_long (long i, long j)
{
  long k;
  return __builtin_mul_overflow_p (i, j, k);
}

int
sub_add (int i, int j)
{
  int k;
  if (__builtin_sadd_overflow (i, j, ))
abort ();
  return k;
}

int
sub_sub (int i, int j)
{
  int k;
  if (__builtin_ssub_overflow (i, j, ))
abort ();
  return k;
}

int
sub_mul (int i, int j)
{
  int k;
  if (__builtin_smul_overflow (i, j, ))
abort ();
  return k;
}

int
sub_uadd (int i, int j)
{
  int k;
  if (__builtin_uadd_overflow (i, j, ))
abort ();
  return k;
}

int
sub_usub (int i, int j)
{
  int k;
  if (__builtin_usub_overflow (i, j, ))
abort ();
  return k;
}

int
sub_umul (int i, int j)
{
  int k;
  if (__builtin_umul_overflow (i, j, ))
abort ();
  return k;
}

long
sub_add_long (long i, long j)
{
  long k;
  if (__builtin_saddl_overflow (i, j, ))
abort ();
  return k;
}

long
sub_sub_long (long i, long j)
{
  long k;
  if (__builtin_ssubl_overflow (i, j, ))
abort ();
  return k;
}

long
sub_mul_long (long i, long j)
{
  long k;
  if (__builtin_smull_overflow (i, j, ))
abort ();
  return k;
}

long
sub_uadd_long (long i, long j)
{
  long k;
  if (__builtin_uaddl_overflow 

Re: [PATCH] Change semantics of -frecord-gcc-switches and add -frecord-gcc-switches-format.

2021-02-16 Thread Qing Zhao via Gcc-patches
Hello,

What’s the status of this patch now? Is there any major technical issue with 
the patch?

Our company has been waiting for this patch for almost one year, we need it for 
our important application.

Could this one be approved and committed to gcc11?

Thanks.

Qing

> On Feb 5, 2021, at 3:34 AM, Martin Liška  wrote:
> 
> Hello.
> 
> Based on discussion with Richi, I'm re-sending the patch. Note that the patch
> has been waiting for a review for almost one year and I would like to see
> it in GCC 11.1.
> 
> Thank you,
> Martin
> <0001-Change-semantics-of-frecord-gcc-switches-and-add-fre.patch>



Go patch committed: Unalias receiver type in export data

2021-02-16 Thread Ian Lance Taylor via Gcc-patches
This patch to the Go frontend unaliases the receiver type when
outputting export data for a method.  This avoids a crash when
importing such a package.  The test case for this is
https://golang.org/cl/292009.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
91dbe2eee631df74bd96153aaf34d6f0da4b1a02
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 3175ba0e005..eed9ce01905 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-a5d7c4225fbbd06b97db6fa424cc0cb5191082d4
+c406de0594782b1d6782a732a50f5b76387852dc
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/gogo.cc b/gcc/go/gofrontend/gogo.cc
index 62b06be827b..38a2f6f8f9e 100644
--- a/gcc/go/gofrontend/gogo.cc
+++ b/gcc/go/gofrontend/gogo.cc
@@ -5898,7 +5898,7 @@ Function::export_func_with_type(Export* exp, const 
Named_object* no,
   exp->write_name(receiver->name());
   exp->write_escape(receiver->note());
   exp->write_c_string(" ");
-  exp->write_type(receiver->type());
+  exp->write_type(receiver->type()->unalias());
   exp->write_c_string(") ");
 }
 


Re: [PATCH] profiling: fix streaming of TOPN counters

2021-02-16 Thread Richard Biener via Gcc-patches
On February 16, 2021 7:32:16 PM GMT+01:00, "Martin Liška"  
wrote:
>Hello.
>
>As Honza noticed, when using GCC 11 during train run of Clang leads to
>very slow training run. Apparently, it's caused by a corrupted TOP N
>profile.
>The patch fixed that by preallocating a buffer that is latter stream
>out.
>
>With the patch, I can profiledbootstrap GCC and the instrumented clang
>finished
>in ~6 minutes its training run.
>Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
>Ready to be installed?

Looks like this can only shrink the race window. Don't you need atomic accesses 
or locking here? Can the number of counters change? IIRC the counters are 
dynamically allocated. Are they ever reallocated? 

>Thanks,
>Martin
>
>libgcc/ChangeLog:
>
>   PR gcov-profile/99105
>   * libgcov-driver.c (write_top_counters): Rename to ...
>   (write_topn_counters): ... this.
>   Make a temporary allocation of counters before streaming. That
>   helps in multi-threaded environment.
>   (write_one_data): Use renamed function.
>   * libgcov-merge.c (__gcov_merge_topn): Add assert about tracked
>   number of counters.
>---
>  libgcc/libgcov-driver.c | 40 ++--
>  libgcc/libgcov-merge.c  |  1 +
>  2 files changed, 31 insertions(+), 10 deletions(-)
>
>diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
>index e474e032b54..74c68ab04e2 100644
>--- a/libgcc/libgcov-driver.c
>+++ b/libgcc/libgcov-driver.c
>@@ -337,7 +337,7 @@ read_error:
>  /* Store all TOP N counters where each has a dynamic length.  */
>  
>  static void
>-write_top_counters (const struct gcov_ctr_info *ci_ptr,
>+write_topn_counters (const struct gcov_ctr_info *ci_ptr,
>   unsigned t_ix,
>   gcov_unsigned_t n_counts)
>  {
>@@ -347,22 +347,42 @@ write_top_counters (const struct gcov_ctr_info
>*ci_ptr,
>for (unsigned i = 0; i < counters; i++)
>  pair_total += ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 1];
>unsigned disk_size = GCOV_TOPN_DISK_COUNTERS * counters + 2 *
>pair_total;
>-  gcov_write_tag_length (GCOV_TAG_FOR_COUNTER (t_ix),
>-   GCOV_TAG_COUNTER_LENGTH (disk_size));
>+
>+  /* It can happen in multi-threaded environment that number of
>counters is smaller.
>+ Because of that we build a buffer which we stream latter in this
>function.  */
>+  gcov_type *buffer
>+= (gcov_type *) xmalloc (disk_size * sizeof (gcov_type));
>+  gcov_type *ptr = buffer;
>  
>for (unsigned i = 0; i < counters; i++)
>  {
>-  gcov_type pair_count = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i
>+ 1];
>-  gcov_write_counter (ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i]);
>-  gcov_write_counter (pair_count);
>+  (*ptr++) = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i];
>+  /* Skip pair count for now.  */
>+  gcov_type *pair_count_ptr = ptr;
>+  ++ptr;
>+
>+  unsigned pair_count = 0;
>  gcov_type start = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 2];
>   for (struct gcov_kvp *node = (struct gcov_kvp *)(intptr_t)start;
>- node != NULL; node = node->next)
>+ node != NULL; node = node->next, ++pair_count)
>   {
>-gcov_write_counter (node->value);
>-gcov_write_counter (node->count);
>+(*ptr++) = node->value;
>+(*ptr++) = node->count;
>   }
>+
>+  *pair_count_ptr = pair_count;
>  }
>+
>+  unsigned int length = ptr - buffer;
>+  gcc_assert (length <= disk_size);
>+  gcov_write_tag_length (GCOV_TAG_FOR_COUNTER (t_ix),
>+   GCOV_TAG_COUNTER_LENGTH (length));
>+
>+  /* FIXME: use a block store  */
>+  for (unsigned i = 0; i < length; i++)
>+gcov_write_counter (buffer[i]);
>+
>+  free (buffer);
>  }
>  
>  /* Write counters in GI_PTR and the summary in PRG to a gcda file. In
>@@ -425,7 +445,7 @@ write_one_data (const struct gcov_info *gi_ptr,
> n_counts = ci_ptr->num;
>  
> if (t_ix == GCOV_COUNTER_V_TOPN || t_ix == GCOV_COUNTER_V_INDIR)
>-  write_top_counters (ci_ptr, t_ix, n_counts);
>+  write_topn_counters (ci_ptr, t_ix, n_counts);
> else
>   {
> /* Do not stream when all counters are zero.  */
>diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
>index 7db188a4f4c..3379b688128 100644
>--- a/libgcc/libgcov-merge.c
>+++ b/libgcc/libgcov-merge.c
>@@ -109,6 +109,7 @@ __gcov_merge_topn (gcov_type *counters, unsigned
>n_counters)
>  /* First value is number of total executions of the profiler.  */
>gcov_type all = gcov_get_counter_ignore_scaling (-1);
>gcov_type n = gcov_get_counter_ignore_scaling (-1);
>+  gcc_assert (n <= GCOV_TOPN_MAXIMUM_TRACKED_VALUES);
>  
>unsigned full = all < 0;
>gcov_type *total = [GCOV_TOPN_MEM_COUNTERS * i];



c++: directives-only preprocessing and include translation [PR 99050]

2021-02-16 Thread Nathan Sidwell
We make	sure files end in \n by	placing	one at the limit of the	buffer 
(just past the end of what is read).  We need to do the	same for buffers 
generated via include-translation.  Fortunately	they have space.


libcpp/
* files.c (_cpp_stack_file): Make buffers end in unread \n.
gcc/testsuite/
* g++.dg/modules/pr99050_a.H: New.
* g++.dg/modules/pr99050_b.C: New.

--
Nathan Sidwell
diff --git c/gcc/testsuite/g++.dg/modules/pr99050_a.H w/gcc/testsuite/g++.dg/modules/pr99050_a.H
new file mode 100644
index 000..137e37f567f
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/pr99050_a.H
@@ -0,0 +1,4 @@
+// PR c++/99050 ICE with directives only
+// { dg-additional-options -fmodule-header }
+// { dg-module-cmi {} }
+void f ();
diff --git c/gcc/testsuite/g++.dg/modules/pr99050_b.C w/gcc/testsuite/g++.dg/modules/pr99050_b.C
new file mode 100644
index 000..439e216eb16
--- /dev/null
+++ w/gcc/testsuite/g++.dg/modules/pr99050_b.C
@@ -0,0 +1,7 @@
+// { dg-do preprocess }
+// { dg-additional-options {-fdirectives-only -fmodules-ts} }
+#include "pr99050_a.H"
+
+int main () {}
+
+// { dg-final { scan-file pr99050_b.i {import  "[^\n]*99050_a.H" \[\[__translated\]\];\n} }  }
diff --git c/libcpp/files.c w/libcpp/files.c
index 5ea3f8e1bf3..3a35f7c9743 100644
--- c/libcpp/files.c
+++ w/libcpp/files.c
@@ -918,13 +918,17 @@ _cpp_stack_file (cpp_reader *pfile, _cpp_file *file, include_type type,
 	 because we don't usually need that location (we're popping an
 	 include file).  However in this case we do want to do the
 	 increment.  So push a writable buffer of two newlines to acheive
-	 that.  */
-  static uchar newlines[] = "\n\n";
+	 that.  (We also need an extra newline, so this looks like a regular
+	 file, which we do that to to make sure we don't fall off the end in the
+	 middle of a line.  */
+  static uchar newlines[] = "\n\n\n";
   cpp_push_buffer (pfile, newlines, 2, true);
 
+  size_t len = strlen (buf);
+  buf[len] = '\n'; /* See above  */
   cpp_buffer *buffer
 	= cpp_push_buffer (pfile, reinterpret_cast (buf),
-			   strlen (buf), true);
+			   len, true);
   buffer->to_free = buffer->buf;
 
   file->header_unit = +1;


[PATCH v2] Add -fgnu-retain to place used symbols in SHF_GNU_RETAIN section

2021-02-16 Thread H.J. Lu via Gcc-patches
On Tue, Feb 16, 2021 at 8:45 AM Jakub Jelinek  wrote:
>
> On Mon, Feb 15, 2021 at 02:35:07PM -0800, H.J. Lu via Gcc-patches wrote:
> > When building Linux kernel, ld in bninutils 2.36 with GCC 11 generates
> > thousands of
> >
> > ld: warning: orphan section `.data.event_initcall_finish' from 
> > `init/main.o' being placed in section `.data.event_initcall_finish'
> > ld: warning: orphan section `.data.event_initcall_start' from `init/main.o' 
> > being placed in section `.data.event_initcall_start'
> > ld: warning: orphan section `.data.event_initcall_level' from `init/main.o' 
> > being placed in section `.data.event_initcall_level'
> >
> > Since these sections are marked with SHF_GNU_RETAIN, they are placed in
> > separate sections.  They become orphan sections since they aren't expected
> > in the Linux kernel linker script. But orphan sections normally don't work
> > well with the Linux kernel linker script and the resulting kernel crashed.
> >
> > Add -fgnu-retain to disable SHF_GNU_RETAIN for Linux kernel build with
> > -fno-gnu-retain.
>
> I'd say this shows that the changing of meaning of "used" attribute wasn't
> really a good idea, the Linux kernel certainly won't be the only problem
> and people use "used" attribute for many reasons and don't necessarily want
> the different sections or different behavior of those sections if they use
> it.
>
> So, can't we instead:
> 1) restore the old behavior of "used" by default
> 2) add "retain" attribute that implies the SHF_GNU_RETAIN stuff and fails
>if the configured assembler/linker doesn't support those
> 3) add a non-default option through which one could opt in for "used"
>attribute to imply retain attribute too for projects that want that?
>
> Jakub
>

Here is the updated patch to #1 and #3.  I will leave #2 to Jozef.

-- 
H.J.
From 1775ec951577325871b624d57a3341a983260cac Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Mon, 15 Feb 2021 11:31:12 -0800
Subject: [PATCH v2] Add -fgnu-retain to place used symbols in SHF_GNU_RETAIN
 section

When building Linux kernel, ld in bninutils 2.36 with GCC 11 generates
thousands of

ld: warning: orphan section `.data.event_initcall_finish' from `init/main.o' being placed in section `.data.event_initcall_finish'
ld: warning: orphan section `.data.event_initcall_start' from `init/main.o' being placed in section `.data.event_initcall_start'
ld: warning: orphan section `.data.event_initcall_level' from `init/main.o' being placed in section `.data.event_initcall_level'

Since these sections are marked with SHF_GNU_RETAIN, they are placed in
separate sections.  They become orphan sections since they aren't expected
in the Linux kernel linker script. But orphan sections normally don't work
well with the Linux kernel linker script and the resulting kernel crashed.

Add -fgnu-retain to place used symbols in SHF_GNU_RETAIN section and
issue a warning if SHF_GNU_RETAIN is not supported.

gcc/

	PR target/99113
	* common.opt: Add -fgnu-retain.
	* toplev.c (process_options): Issue a warning for -fgnu-retain
	without SUPPORTS_SHF_GNU_RETAIN.
	* varasm.c (get_section): Replace SUPPORTS_SHF_GNU_RETAIN with
	flag_gnu_retain.
	(resolve_unique_section): Likewise.
	(get_variable_section): Likewise.
	(switch_to_section): Likewise.
	* doc/invoke.texi: Document -fgnu-retain/-fno-gnu-retain.

gcc/testsuite/

	* c-c++-common/attr-used.c: Pass -fgnu-retain if supported.
	* c-c++-common/attr-used-2.c: Likewise.
	* c-c++-common/attr-used-3.c: Likewise.
	* c-c++-common/attr-used-4.c: Likewise.
	* c-c++-common/attr-used-5.c: Likewise.
	* c-c++-common/attr-used-6.c: Likewise.
	* c-c++-common/attr-used-7.c: Likewise.
	* c-c++-common/attr-used-8.c: Likewise.
	* c-c++-common/attr-used-9.c: Likewise.
	* gcc.c-torture/compile/attr-used-retain-1.c: Pass -fgnu-retain.
	* gcc.c-torture/compile/attr-used-retain-2.c: Likewise.
	* c-c++-common/pr99113.c: New test.
---
 gcc/common.opt   | 4 
 gcc/doc/invoke.texi  | 5 +
 gcc/testsuite/c-c++-common/attr-used-2.c | 1 +
 gcc/testsuite/c-c++-common/attr-used-3.c | 1 +
 gcc/testsuite/c-c++-common/attr-used-4.c | 1 +
 gcc/testsuite/c-c++-common/attr-used-5.c | 1 +
 gcc/testsuite/c-c++-common/attr-used-6.c | 1 +
 gcc/testsuite/c-c++-common/attr-used-7.c | 1 +
 gcc/testsuite/c-c++-common/attr-used-8.c | 1 +
 gcc/testsuite/c-c++-common/attr-used-9.c | 1 +
 gcc/testsuite/c-c++-common/attr-used.c   | 1 +
 gcc/testsuite/c-c++-common/pr99113.c | 7 +++
 gcc/testsuite/gcc.c-torture/compile/attr-used-retain-1.c | 1 +
 gcc/testsuite/gcc.c-torture/compile/attr-used-retain-2.c | 2 +-
 gcc/toplev.c | 7 +++
 gcc/varasm.c | 8 
 16 files changed, 38 insertions(+), 

Re: [PATCH 1/3] MIPS: add -mcompact-branches=prefer option

2021-02-16 Thread Maciej W. Rozycki
On Tue, 16 Feb 2021, Jeff Law via Gcc-patches wrote:

> I think this will be OK once the wording in patch 2/3 of this series is
> fixed.

 As I noted with 2/3 I would like to know what this extra complication is 
exactly needed for, and then whether we can't reuse the existing options.  
Once settled I think it would best be placed in the change description for 
posterity, so that discussions do not have to be chased like I had to with 
the original change (when we had a different policy on commit messages).

  Maciej


Re: [PATCH 2/3] MIPS: add builtime option for -mcompact-branches

2021-02-16 Thread Maciej W. Rozycki
On Tue, 16 Feb 2021, Jeff Law via Gcc-patches wrote:

> > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> > index 4c38244ae58..6b9520569ba 100644
> > --- a/gcc/doc/install.texi
> > +++ b/gcc/doc/install.texi
> > @@ -1464,6 +1464,29 @@ systems that support conditional traps).
> >  Division by zero checks use the break instruction.
> >  @end table
> >  
> > +@item --with-compact-branches=@var{policy}
> > +Specify how the compiler should generate code for checking for
> > +division by zero.  This option is only supported on the MIPS target.
> > +The possibilities for @var{type} are:
> Is this really correct -- I would expect that the change affects
> branches in general, not just checks for division by zero.

 I wonder if we need to multiply the options here at all.  The original 
change:  was 
discussed here:  
in this respect:

On Mon, 17 Aug 2015, Matthew Fortune wrote:

> > > Compact branches are used based on a branch policy. The polices are:
> > >
> > > never: Only use delay slot branches
> > > optimal: Do whatever is best for the current architecture.  This will
> > >  generally mean that delay slot branches will be used if the delay
> > >  slot gets filled but otherwise a compact branch will be used. A
> > >  special case here is that JAL and J will not be used in R6 code
> > >  regardless of whether the delay slot could be filled.
> > > always: Never emit a delay slot form of a branch if a compact form exists.
> > > This policy cannot apply 100% as FP branches (and MSA branches 
> > > when
> > > committed) only have delay slot forms.
> > >
> > > These user choices are combined with the features available in the chosen
> > > architecture and, in particular, the optimal form will get handled like
> > > 'never' when there are no compact branches available and will get handled
> > > like 'always' when there are no delay slot branches available.
> > >
> > 
> > Why did you choose to make this a user-selectable option?  Why not always 
> > generated
> > optimal?
> > I don't have a strong opinion about it, but the options seem to complicate 
> > things and I'm
> > interested in your rationale.
> 
> This is down to micro-architecture decisions that different implementers may 
> make.
> Honestly, I have not absorbed all of the rationale behind choosing one form 
> over
> the other but our arch team have made enough comments about this to mean the 
> support
> in the compiler is worth the extra bit of effort. I can attempt a write-up of 
> some
> of the pipeline possibilities if you would like more detail but I'd probably 
> have to
> refresh my mind on this with our hardware teams.

 My understanding therefore is that the original assumption that `optimal' 
will serve people best is no longer true.

 First, I think it would be good if we knew why.  I find proliferating 
variants of defaults, especially for the less obvious cases, will cause 
user confusion as one won't know what code model to expect, especially as 
(please correct me if I am wrong) we don't actually provide a way to dump 
the compiler's overridable configuration defaults.

 Second, I wonder if it makes sense to just keep things simple, and rather 
than adding `prefer' (to stand for "`always' if available"), and possibly 
`avoid' (to stand for "`never' if available"), whether we shouldn't just 
relax the checks for `always' and `never', and let them through whether 
the architecture selected provides for the option chosen or not.

 Please note that in the discussion quoted Catherine has already raised a 
concern I agree with of adding a complication here, and now we would 
complicate it even further by not only adding a fourth choice, but another 
overridable configuration default as well.

  Maciej


Re: [PATCH 1/3] MIPS: add -mcompact-branches=prefer option

2021-02-16 Thread Jeff Law via Gcc-patches



On 2/5/21 2:53 AM, YunQiang Su wrote:
> For MIPSr6, we may wish to use compact-branches only.
> Currently, we have to use `always' option, while it is mark as conflict
> with pre-R6.
>   cc1: error: unsupported combination: ‘mips32r2’ -mcompact-branches=always
>
> It make some trouble for distributions to make -mcompact-branches=always
> default for R6 only.
>
> The new added `prefer' option:
>just ignored by pre-R6 target.
>do the same as `always' for R6+.
> ---
>  gcc/config/mips/mips-opts.h|  3 ++-
>  gcc/config/mips/mips.h |  6 --
>  gcc/config/mips/mips.opt   |  3 +++
>  gcc/doc/invoke.texi|  6 ++
>  gcc/testsuite/gcc.target/mips/compact-branches-8.c | 10 ++
>  gcc/testsuite/gcc.target/mips/compact-branches-9.c | 10 ++
>  6 files changed, 35 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/mips/compact-branches-8.c
>  create mode 100644 gcc/testsuite/gcc.target/mips/compact-branches-9.c
I think this will be OK once the wording in patch 2/3 of this series is
fixed.

It would be helpful if you could also include a ChangeLog entry.

Jeff



Re: [PATCH 1/2] MIPS: unaligned load: use SImode for SUBREG if OK (PR98996)

2021-02-16 Thread Jeff Law via Gcc-patches



On 2/14/21 6:33 PM, YunQiang Su wrote:
> It is found by ada s-pack96.adb ftbfs, due to 96bit load: 96 = 64 + 32.
> While the 32bit pair of l r is mark as SUBREG, so they are
> not in SImode, make it fail to find suitable insn.
>
> gcc/ChangeLog:
>
>* config/mips/mips.c (mips_expand_ext_as_unaligned_load):
>If TARGET_64BIT and dest is SUBREG, we check the width, if it
>equal to SImode, we use SImode operation, just like what we are
>doing for REG one.
> ---
>  gcc/ChangeLog  | 8 
>  gcc/config/mips/mips.c | 2 +-
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index ddf4c7f92d7..fb12eeb971d 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,11 @@
> +2021-02-15  YunQiang Su  
> +
> + PR target/98996
> + * config/mips/mips.c (mips_expand_ext_as_unaligned_load):
> + If TARGET_64BIT and dest is SUBREG, we check the width, if it
> + equal to SImode, we use SImode operation, just like what we are
> + doing for REG one.
Do you need to do any checking on the contents of the SUBREG?  ie, do
you need to know if you've got (subreg (reg)) vs (subreg (mem))

Similarly I'd expect you may need to look at the mode of the inner
object.  You could have a true subreg (outer mode is smaller than inner
mode), but you might also be able to have a paradoxical subreg (outer
mode is larger than inner mode).

Jeff



Re: [PATCH v2] c-family: ICE with assume_aligned attribute [PR99062]

2021-02-16 Thread Jeff Law via Gcc-patches



On 2/16/21 12:04 PM, Martin Sebor via Gcc-patches wrote:
> On 2/10/21 5:24 PM, Marek Polacek via Gcc-patches wrote:
>> On Wed, Feb 10, 2021 at 03:54:52PM -0700, Martin Sebor via
>> Gcc-patches wrote:
>>> On 2/10/21 3:33 PM, Marek Polacek wrote:
 We ICE in handle_assume_aligned_attribute since r271338 which added

 @@ -2935,8 +2936,8 @@ handle_assume_aligned_attribute (tree *node,
 tree name, tree args, int,
     /* The misalignment specified by the second argument
    must be non-negative and less than the alignment.  */
     warning (OPT_Wattributes,
 -  "%qE attribute argument %E is not in the range
 [0, %E)",
 -  name, val, align);
 +  "%qE attribute argument %E is not in the range
 [0, %wu]",
 +  name, val, tree_to_uhwi (align) - 1);
     *no_add_attrs = true;
     return NULL_TREE;
   }
 because align is INT_MIN and tree_to_uhwi asserts tree_fits_uhwi_p
 -- which
 ALIGN does not and the prior tree_fits_shwi_p check is fine with
 it, as
 well as the integer_pow2p check.

 Since neither of the arguments to assume_aligned can be negative, I've
 hoisted the tree_int_cst_sgn check.  And add the missing "argument"
 word to an existing warning.

 Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?
>>>
>>> Thanks for taking this on!  As I mentioned in the bug, I should
>>> relax the warning to understand that [x, y) is a half-open range
>>> so that these changes aren't necessary.
>>>
>>> I'm surprised that integer_pow2p() returns true for negative values.
>>> That seems like a trap for the unwary.  The comment above the function
>>> says:
>>>
>>>    Return 1 if EXPR is an integer constant that is a power of 2
>>>    (i.e., has only one bit on), or a location wrapper for such
>>>    a constant.
>>>
>>> but an "integer power of 2" isn't the same as "has only one bit
>>> on."  I would suggest to rename the function (independently of
>>> the fix for the ICE).  There aren't too many uses of it so it
>>> shouldn't be too intrusive.  I can do that for GCC 12 if no-one
>>> objects.
>>>
>>> Just one comment on the patch:
>>>

 gcc/c-family/ChangeLog:

 PR c++/99062
 * c-attribs.c (handle_assume_aligned_attribute): Check that the
 alignment argument is non-negative.  Tweak a warning message.

 gcc/testsuite/ChangeLog:

 PR c++/99062
 * gcc.dg/attr-assume_aligned-4.c: Adjust dg-warning.
 * g++.dg/ext/attr-assume-aligned.C: New test.
 ---
    gcc/c-family/c-attribs.c   | 12 ++--
    gcc/testsuite/g++.dg/ext/attr-assume-aligned.C |  5 +
    gcc/testsuite/gcc.dg/attr-assume_aligned-4.c   |  4 ++--
    3 files changed, 17 insertions(+), 4 deletions(-)
    create mode 100644 gcc/testsuite/g++.dg/ext/attr-assume-aligned.C

 diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
 index 84ec86b2091..e343429f934 100644
 --- a/gcc/c-family/c-attribs.c
 +++ b/gcc/c-family/c-attribs.c
 @@ -3536,7 +3536,15 @@ handle_assume_aligned_attribute (tree *node,
 tree name, tree args, int,
  if (!tree_fits_shwi_p (val))
    {
  warning (OPT_Wattributes,
 -   "%qE attribute %E is not an integer constant",
 +   "%qE attribute argument %E is not an integer constant",
 +   name, val);
 +  *no_add_attrs = true;
 +  return NULL_TREE;
 +    }
 +  else if (tree_int_cst_sgn (val) < 0)
 +    {
 +  warning (OPT_Wattributes,
 +   "%qE attribute argument %E must be non-negative",
   name, val);
>>>
>>> The phrasing here doesn't sound quite right.  For the test case it
>>> will print:
>>>
>>>    warning: 'assume_aligned' attribute argument -1 must be non-negative
>>>
>>> which isn't possible: -1 can't be non-negative.  I'd suggest either
>>> making that descriptive rather than prescriptive (along the lines
>>> of the other warnings):
>>>
>>>  warning (OPT_Wattributes,
>>>    "%qE attribute argument %E is not positive",
>>>    name, val);
>>>
>>> or referring to the positional argument instead.
>>
>> Good point, that was very poor wording.  Fixed in v2:
>>
>> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?
> Thanks, the rewording looks good to me!
And with that, this is fine for the trunk.

Thanks Martin & Marek.

jeff



Re: [PATCH v2] c-family: ICE with assume_aligned attribute [PR99062]

2021-02-16 Thread Martin Sebor via Gcc-patches

On 2/10/21 5:24 PM, Marek Polacek via Gcc-patches wrote:

On Wed, Feb 10, 2021 at 03:54:52PM -0700, Martin Sebor via Gcc-patches wrote:

On 2/10/21 3:33 PM, Marek Polacek wrote:

We ICE in handle_assume_aligned_attribute since r271338 which added

@@ -2935,8 +2936,8 @@ handle_assume_aligned_attribute (tree *node, tree name, 
tree args, int,
/* The misalignment specified by the second argument
   must be non-negative and less than the alignment.  */
warning (OPT_Wattributes,
-  "%qE attribute argument %E is not in the range [0, %E)",
-  name, val, align);
+  "%qE attribute argument %E is not in the range [0, %wu]",
+  name, val, tree_to_uhwi (align) - 1);
*no_add_attrs = true;
return NULL_TREE;
  }
because align is INT_MIN and tree_to_uhwi asserts tree_fits_uhwi_p -- which
ALIGN does not and the prior tree_fits_shwi_p check is fine with it, as
well as the integer_pow2p check.

Since neither of the arguments to assume_aligned can be negative, I've
hoisted the tree_int_cst_sgn check.  And add the missing "argument"
word to an existing warning.

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?


Thanks for taking this on!  As I mentioned in the bug, I should
relax the warning to understand that [x, y) is a half-open range
so that these changes aren't necessary.

I'm surprised that integer_pow2p() returns true for negative values.
That seems like a trap for the unwary.  The comment above the function
says:

   Return 1 if EXPR is an integer constant that is a power of 2
   (i.e., has only one bit on), or a location wrapper for such
   a constant.

but an "integer power of 2" isn't the same as "has only one bit
on."  I would suggest to rename the function (independently of
the fix for the ICE).  There aren't too many uses of it so it
shouldn't be too intrusive.  I can do that for GCC 12 if no-one
objects.

Just one comment on the patch:



gcc/c-family/ChangeLog:

PR c++/99062
* c-attribs.c (handle_assume_aligned_attribute): Check that the
alignment argument is non-negative.  Tweak a warning message.

gcc/testsuite/ChangeLog:

PR c++/99062
* gcc.dg/attr-assume_aligned-4.c: Adjust dg-warning.
* g++.dg/ext/attr-assume-aligned.C: New test.
---
   gcc/c-family/c-attribs.c   | 12 ++--
   gcc/testsuite/g++.dg/ext/attr-assume-aligned.C |  5 +
   gcc/testsuite/gcc.dg/attr-assume_aligned-4.c   |  4 ++--
   3 files changed, 17 insertions(+), 4 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/ext/attr-assume-aligned.C

diff --git a/gcc/c-family/c-attribs.c b/gcc/c-family/c-attribs.c
index 84ec86b2091..e343429f934 100644
--- a/gcc/c-family/c-attribs.c
+++ b/gcc/c-family/c-attribs.c
@@ -3536,7 +3536,15 @@ handle_assume_aligned_attribute (tree *node, tree name, 
tree args, int,
 if (!tree_fits_shwi_p (val))
{
  warning (OPT_Wattributes,
-  "%qE attribute %E is not an integer constant",
+  "%qE attribute argument %E is not an integer constant",
+  name, val);
+ *no_add_attrs = true;
+ return NULL_TREE;
+   }
+  else if (tree_int_cst_sgn (val) < 0)
+   {
+ warning (OPT_Wattributes,
+  "%qE attribute argument %E must be non-negative",
   name, val);


The phrasing here doesn't sound quite right.  For the test case it
will print:

   warning: 'assume_aligned' attribute argument -1 must be non-negative

which isn't possible: -1 can't be non-negative.  I'd suggest either
making that descriptive rather than prescriptive (along the lines
of the other warnings):

  warning (OPT_Wattributes,
   "%qE attribute argument %E is not positive",
   name, val);

or referring to the positional argument instead.


Good point, that was very poor wording.  Fixed in v2:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/10?

Thanks, the rewording looks good to me!

Martin



-- >8 --
We ICE in handle_assume_aligned_attribute since r271338 which added

@@ -2935,8 +2936,8 @@ handle_assume_aligned_attribute (tree *node, tree name, 
tree args, int,
   /* The misalignment specified by the second argument
  must be non-negative and less than the alignment.  */
   warning (OPT_Wattributes,
-  "%qE attribute argument %E is not in the range [0, %E)",
-  name, val, align);
+  "%qE attribute argument %E is not in the range [0, %wu]",
+  name, val, tree_to_uhwi (align) - 1);
   *no_add_attrs = true;
   return NULL_TREE;
 }
because align is INT_MIN and tree_to_uhwi asserts tree_fits_uhwi_p -- which
ALIGN does not and the prior tree_fits_shwi_p check is fine with it, as
well as the integer_pow2p check.

Since neither of the 

Re: [PATCH] Make switchconv smarter.

2021-02-16 Thread Martin Liška

On 2/16/21 12:30 AM, Jeff Law wrote:

Given this is "just" a missed optimization, are we deferring to gcc-12?


Yes.

Thanks,
Martin


Re: [PATCH] libiberty: autogenerate aclocal.m4

2021-02-16 Thread Jeff Law via Gcc-patches



On 2/13/21 7:32 PM, Mike Frysinger via Gcc-patches wrote:
> Move custom macros to acinclude.m4 so we can autogenerate aclocal.m4
> with aclocal.  This matches every other project in the tree.
>
> libiberty/ChangeLog:
>
>   * Makefile.in (ACLOCAL, ACLOCAL_AMFLAGS, $(srcdir)/aclocal.m4): Define.
>   (configure_deps): Rename to ...
>   (aclocal_deps): ... this.  Replace aclocal.m4 with acinclude.m4.
>   ($(srcdir)/configure): Replace $(configure_deps) with
>   $(srcdir)/aclocal.m4.
>   * aclocal.m4: Move libiberty macros to acinclude.m4, then regenerate.
>   * acinclude.m4: New file.
>   * configure: Regenerate.
OK for the trunk.

Do you have commit privs or do you need someone to commit for you?

Thanks,
Jeff



Re: [PATCH RFA] cgraph: flatten and same_body aliases [PR96078]

2021-02-16 Thread Jeff Law via Gcc-patches



On 2/11/21 10:18 PM, Jason Merrill via Gcc-patches wrote:
> The patch for PR92372 made us start warning about a flatten attribute on an
> alias.  But in the case of C++ 'tor base/complete variants, the user didn't
> create the alias, so we shouldn't warn.
>
> I could also remove the attribute in maybe_clone_body, but here seems a bit
> better.
>
> Tested x86_64-pc-linux-gnu.  OK for trunk?
>
> gcc/ChangeLog:
>
>   PR c++/96078
>   * cgraph.c (cgraph_node::create_same_body_alias): Remove flatten
>   attribute from alias.
>
> gcc/testsuite/ChangeLog:
>
>   PR c++/96078
>   * g++.dg/ext/attr-flatten1.C: New test.
But shouldn't we validate that we've got a C++ ctor/dtor rather than
blindly removing the attribute from all aliases?  ISTM like the patch
as-is would always suppress warnings when we create a same-body alias
regardless of why we created a same-body alias.

jeff



Re: [PATCH 2/3] MIPS: add builtime option for -mcompact-branches

2021-02-16 Thread Jeff Law via Gcc-patches



On 2/5/21 2:53 AM, YunQiang Su wrote:
> For R6+, it allows to configure gcc to use compact branches only.
> ---
>  gcc/config.gcc   | 18 +-
>  gcc/doc/install.texi | 23 +++
>  2 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 17fea83b2e4..7d50e7995d2 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -4743,7 +4743,7 @@ case "${target}" in
>   ;;
>  
>   mips*-*-*)
> - supported_defaults="abi arch arch_32 arch_64 float fpu nan 
> fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 
> madd4"
> + supported_defaults="abi arch arch_32 arch_64 float fpu nan 
> fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 
> madd4 compact-branches"
>  
>   case ${with_float} in
>   "" | soft | hard)
> @@ -4896,6 +4896,22 @@ case "${target}" in
>   exit 1
>   ;;
>   esac
> +
> + case ${with_compact_branches} in
> + never | always | optimal | prefer)
> + if test "$with_compact_branches" = "always" -a \
> + "$default_mips_arch" != "mips32r6" -a  \
> + "$default_mips_arch" != "mips64r6";then
> + echo "Compact-branch=always is not allowed for 
> pre-R6" 1>&2
> + exit 1
> + fi
> + with_compact_branches=${with_compact_branches}
> + ;;
> + *)
> + echo "Unknown compact-branches policy used in 
> --with-compact-branches" 1>&2
> + exit 1
> + ;;
> + esac
>   ;;
>  
>   nds32*-*-*)
> diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
> index 4c38244ae58..6b9520569ba 100644
> --- a/gcc/doc/install.texi
> +++ b/gcc/doc/install.texi
> @@ -1464,6 +1464,29 @@ systems that support conditional traps).
>  Division by zero checks use the break instruction.
>  @end table
>  
> +@item --with-compact-branches=@var{policy}
> +Specify how the compiler should generate code for checking for
> +division by zero.  This option is only supported on the MIPS target.
> +The possibilities for @var{type} are:
Is this really correct -- I would expect that the change affects
branches in general, not just checks for division by zero.

jeff



Re: [PATCH 3/3] MIPS: fix compact-branches test FAIL for PIC default configuration

2021-02-16 Thread Jeff Law via Gcc-patches



On 2/5/21 2:53 AM, YunQiang Su wrote:
> GCC may be configured to use PIC by default, then the test with
> -mno-abicall may fail. Just add -fno-PIC option for it.
THanks.  Installed on the trunk.

jeff



[PATCH] profiling: fix streaming of TOPN counters

2021-02-16 Thread Martin Liška

Hello.

As Honza noticed, when using GCC 11 during train run of Clang leads to
very slow training run. Apparently, it's caused by a corrupted TOP N profile.
The patch fixed that by preallocating a buffer that is latter stream out.

With the patch, I can profiledbootstrap GCC and the instrumented clang finished
in ~6 minutes its training run.
Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

libgcc/ChangeLog:

PR gcov-profile/99105
* libgcov-driver.c (write_top_counters): Rename to ...
(write_topn_counters): ... this.
Make a temporary allocation of counters before streaming. That
helps in multi-threaded environment.
(write_one_data): Use renamed function.
* libgcov-merge.c (__gcov_merge_topn): Add assert about tracked
number of counters.
---
 libgcc/libgcov-driver.c | 40 ++--
 libgcc/libgcov-merge.c  |  1 +
 2 files changed, 31 insertions(+), 10 deletions(-)

diff --git a/libgcc/libgcov-driver.c b/libgcc/libgcov-driver.c
index e474e032b54..74c68ab04e2 100644
--- a/libgcc/libgcov-driver.c
+++ b/libgcc/libgcov-driver.c
@@ -337,7 +337,7 @@ read_error:
 /* Store all TOP N counters where each has a dynamic length.  */
 
 static void

-write_top_counters (const struct gcov_ctr_info *ci_ptr,
+write_topn_counters (const struct gcov_ctr_info *ci_ptr,
unsigned t_ix,
gcov_unsigned_t n_counts)
 {
@@ -347,22 +347,42 @@ write_top_counters (const struct gcov_ctr_info *ci_ptr,
   for (unsigned i = 0; i < counters; i++)
 pair_total += ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 1];
   unsigned disk_size = GCOV_TOPN_DISK_COUNTERS * counters + 2 * pair_total;
-  gcov_write_tag_length (GCOV_TAG_FOR_COUNTER (t_ix),
-GCOV_TAG_COUNTER_LENGTH (disk_size));
+
+  /* It can happen in multi-threaded environment that number of counters is 
smaller.
+ Because of that we build a buffer which we stream latter in this 
function.  */
+  gcov_type *buffer
+= (gcov_type *) xmalloc (disk_size * sizeof (gcov_type));
+  gcov_type *ptr = buffer;
 
   for (unsigned i = 0; i < counters; i++)

 {
-  gcov_type pair_count = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 1];
-  gcov_write_counter (ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i]);
-  gcov_write_counter (pair_count);
+  (*ptr++) = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i];
+  /* Skip pair count for now.  */
+  gcov_type *pair_count_ptr = ptr;
+  ++ptr;
+
+  unsigned pair_count = 0;
   gcov_type start = ci_ptr->values[GCOV_TOPN_MEM_COUNTERS * i + 2];
   for (struct gcov_kvp *node = (struct gcov_kvp *)(intptr_t)start;
-  node != NULL; node = node->next)
+  node != NULL; node = node->next, ++pair_count)
{
- gcov_write_counter (node->value);
- gcov_write_counter (node->count);
+ (*ptr++) = node->value;
+ (*ptr++) = node->count;
}
+
+  *pair_count_ptr = pair_count;
 }
+
+  unsigned int length = ptr - buffer;
+  gcc_assert (length <= disk_size);
+  gcov_write_tag_length (GCOV_TAG_FOR_COUNTER (t_ix),
+GCOV_TAG_COUNTER_LENGTH (length));
+
+  /* FIXME: use a block store  */
+  for (unsigned i = 0; i < length; i++)
+gcov_write_counter (buffer[i]);
+
+  free (buffer);
 }
 
 /* Write counters in GI_PTR and the summary in PRG to a gcda file. In

@@ -425,7 +445,7 @@ write_one_data (const struct gcov_info *gi_ptr,
  n_counts = ci_ptr->num;
 
 	  if (t_ix == GCOV_COUNTER_V_TOPN || t_ix == GCOV_COUNTER_V_INDIR)

-   write_top_counters (ci_ptr, t_ix, n_counts);
+   write_topn_counters (ci_ptr, t_ix, n_counts);
  else
{
  /* Do not stream when all counters are zero.  */
diff --git a/libgcc/libgcov-merge.c b/libgcc/libgcov-merge.c
index 7db188a4f4c..3379b688128 100644
--- a/libgcc/libgcov-merge.c
+++ b/libgcc/libgcov-merge.c
@@ -109,6 +109,7 @@ __gcov_merge_topn (gcov_type *counters, unsigned n_counters)
   /* First value is number of total executions of the profiler.  */
   gcov_type all = gcov_get_counter_ignore_scaling (-1);
   gcov_type n = gcov_get_counter_ignore_scaling (-1);
+  gcc_assert (n <= GCOV_TOPN_MAXIMUM_TRACKED_VALUES);
 
   unsigned full = all < 0;

   gcov_type *total = [GCOV_TOPN_MEM_COUNTERS * i];
--
2.30.0



Re: [committed][nvptx] Set -misa=sm_35 by default

2021-02-16 Thread Thomas Schwinge
Hi!

Tom, ping?


Also, I have now pushed to devel/omp/gcc-10 branch cherry-picks of Tom's
"[nvptx] Set -misa=sm_35 by default" in commit
d4b73f42df11282c7c170b55eb29cd8ff6b998d8, and Tobias' "nvptx -
invoke.texi: Update default of -misa" in commit
c77b42af4fa245ea87aee0ad55fa2b76611afb47, see attached, too.


Grüße
 Thomas


On 2021-02-05T17:10:01+0100, Thomas Schwinge  wrote:
> Hi Tom!
>
> Ping.
>
>
> Grüße
>  Thomas
>
>
> On 2021-01-22T16:23:57+0100, I wrote:
>> Hi Tom!
>>
>> Ping.
>>
>>
>> Grüße
>>  Thomas
>>
>>
>> On 2021-01-13T12:59:14+0100, I wrote:
>>> Hi Tom!
>>>
>>> On 2020-10-09T13:56:09+0200, Tom de Vries  wrote:
 The nvptx-as assembler verifies the ptx code using ptxas, if there's any
 in the PATH.

>>>
>>> After quite some digression to first add a testsuite to nvptx-tools (see
>>>  or just
>>> ), which
>>> I found advisable generally, and then given the kinds of changes we're
>>> now doing :-) -- I've now prepared nvptx-as code changes as discussed in
>>>  "nvptx-as
>>> should not assume a default sm version".  (Currently testing.)
>>>
 The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of 
 the
 latest cuda release (11.1) no longer supports sm_30.

 Consequently we cannot build gcc against that release (although we should
 still be able to build without any cuda release).

 Fix this by setting -misa=sm_35 by default.

 Tested check-gcc on nvptx.

 Tested libgomp on x86_64-linux with nvpx accelerator.

 Both build against cuda 9.1.

 Committed to trunk.
>>>
>>> ACK.
>>>
>>> What is your opinion about backporting that (plus Tobias' documentation
>>> update, plus corresponding web 'changes.html' updates?) to release
>>> branches, so that nvptx offloading users may use GCC 10/9/8 with CUDA
>>> 11.0+?
>>>
>>> I don't think losing sm_30 support is a major concern: support for sm_35
>>> has been introduced with PTX ISA 3.1, CUDA 5.0, driver r302.
>>>
>>>
>>> Further:
>>>
 [nvptx] Set -misa=sm_35 by default
>>>
 PR target/97348
 * config/nvptx/nvptx.h (ASM_SPEC): Also pass -m to nvptx-as if
 default is used.
 * config/nvptx/nvptx.opt (misa): Init with PTX_ISA_SM35.
>>>
 --- a/gcc/config/nvptx/nvptx.h
 +++ b/gcc/config/nvptx/nvptx.h
>>>
 -#define ASM_SPEC "%{misa=*:-m %*}"
 +/* Default needs to be in sync with default for misa in nvptx.opt.
 +   We add a default here to work around a hard-coded sm_30 default in
 +   nvptx-as.  */
 +#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
>>>
 --- a/gcc/config/nvptx/nvptx.opt
 +++ b/gcc/config/nvptx/nvptx.opt
>>>
 +; Default needs to be in sync with default in ASM_SPEC in nvptx.h.
  misa=
 -Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
 Init(PTX_ISA_SM30)
 +Target RejectNegative ToLower Joined Enum(ptx_isa) Var(ptx_isa_option) 
 Init(PTX_ISA_SM35)
  Specify the version of the ptx ISA to use.
>>>
>>> As I'd suggested in
>>>  "nvptx-as
>>> should not assume a default sm version", I'd then push the attached
>>> "[nvptx] Let nvptx-as figure out the target architecture [PR97348]" to
>>> GCC master branch, OK?  (Currently testing.)
>>>
>>> That one I wouldn't backport to GCC release branches, so that we don't
>>> require users to update nvptx-tools for these builds.
>>>
>>>
>>> Grüße
>>>  Thomas


-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
>From eac0d3458f38cd5bb4c930b2887a547b64b046ef Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Wed, 13 Jan 2021 09:04:47 +0100
Subject: [PATCH] [nvptx] Let nvptx-as figure out the target architecture
 [PR97348]

... now that it has been enhanced to do so.

This is a follow-up to PR97348 commit 383400a6078d75bbfa1216c9af2c37f7e88740c9
"[nvptx] Set -misa=sm_35 by default".

	gcc/
	PR target/97348
	* config/nvptx/nvptx.h (ASM_SPEC): Don't set.
	* config/nvptx/nvptx.opt (misa): Adjust comment.
---
 gcc/config/nvptx/nvptx.h   | 5 -
 gcc/config/nvptx/nvptx.opt | 1 -
 2 files changed, 6 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.h b/gcc/config/nvptx/nvptx.h
index 2451703e77f..1a61e6207f6 100644
--- a/gcc/config/nvptx/nvptx.h
+++ b/gcc/config/nvptx/nvptx.h
@@ -29,11 +29,6 @@
 
 #define STARTFILE_SPEC "%{mmainkernel:crt0.o}"
 
-/* Default needs to be in sync with default for misa in nvptx.opt.
-   We add a default here to work around a hard-coded sm_30 default in
-   nvptx-as.  */
-#define ASM_SPEC "%{misa=*:-m %*; :-m sm_35}"
-
 #define TARGET_CPU_CPP_BUILTINS()		\
   do		\
 {		\
diff --git a/gcc/config/nvptx/nvptx.opt 

Re: [PATCH] Add -fgnu-retain/-fno-gnu-retain

2021-02-16 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 15, 2021 at 02:35:07PM -0800, H.J. Lu via Gcc-patches wrote:
> When building Linux kernel, ld in bninutils 2.36 with GCC 11 generates
> thousands of
> 
> ld: warning: orphan section `.data.event_initcall_finish' from `init/main.o' 
> being placed in section `.data.event_initcall_finish'
> ld: warning: orphan section `.data.event_initcall_start' from `init/main.o' 
> being placed in section `.data.event_initcall_start'
> ld: warning: orphan section `.data.event_initcall_level' from `init/main.o' 
> being placed in section `.data.event_initcall_level'
> 
> Since these sections are marked with SHF_GNU_RETAIN, they are placed in
> separate sections.  They become orphan sections since they aren't expected
> in the Linux kernel linker script. But orphan sections normally don't work
> well with the Linux kernel linker script and the resulting kernel crashed.
> 
> Add -fgnu-retain to disable SHF_GNU_RETAIN for Linux kernel build with
> -fno-gnu-retain.

I'd say this shows that the changing of meaning of "used" attribute wasn't
really a good idea, the Linux kernel certainly won't be the only problem
and people use "used" attribute for many reasons and don't necessarily want
the different sections or different behavior of those sections if they use
it.

So, can't we instead:
1) restore the old behavior of "used" by default
2) add "retain" attribute that implies the SHF_GNU_RETAIN stuff and fails
   if the configured assembler/linker doesn't support those
3) add a non-default option through which one could opt in for "used"
   attribute to imply retain attribute too for projects that want that?

Jakub



Re: [PATCH] Fix producer string memory leaks

2021-02-16 Thread Tom Tromey
> "Martin" == Martin Sebor via Gcc-patches  writes:

Martin> FWIW, I have prototyped
Martin> a simple string class over the weekend (based on auto_vec) that I'm
Martin> willing to contribute if std::string turns out to be out of favor.

I wonder whether GDB and GCC can or should collaborate in this area.

GDB has been using C++ for a while now, and we've encountered many of
these same transitional problems.  My sense is that, unlike GCC, GDB has
been a bit more aggressive about adopting C++11 style though.

For this particular area, GDB uses std::string pretty freely -- but not
universally, both due to its history as a C code base, but also because
std::string can be too heavy for some uses.  Not all code needs to carry
around the length, etc.

Pedro adapted the C++17 string_view from libstdc++, and this is used in
some spots in GDB.  See gdbsupport/gdb_string_view.h.

GDB also uses a unique_ptr specialization that wraps xmalloc/xfree.
See gdbsupport/gdb_unique_ptr.h.

So in GDB, instead of writing a new string-ish class, I suppose we'd
simply hook string_view up to unique_xmalloc_ptr or something
along those lines.

Martin> There is std::unique_ptr that we could use rather than rolling our
Martin> own.  That said, I don't think using std::unique_ptr over a string
Martin> class would be appropriate for things like local (string) variables
Martin> or return types of functions returning strings.

In GDB we do this kind of thing all the time.  The main idea is to
indicate ownership transfer via the type system.  This helps eliminate
comments like "the caller must free the result" -- the return of a
unique_ptr explains this directly.

>> But then there's the issue of introducing lifetime bugs because
>> you definitely need to have the pointer escape at points like
>> the printf ...

This is a valid concern in C++, but hasn't been a big practical issue in
GDB.

Tom


Re: [PATCH] Fix producer string memory leaks

2021-02-16 Thread Martin Sebor via Gcc-patches

On 2/15/21 7:39 AM, Richard Biener wrote:

On Mon, Feb 15, 2021 at 2:46 PM Martin Liška  wrote:


On 2/12/21 5:56 PM, Martin Sebor wrote:

On 2/12/21 2:09 AM, Richard Biener via Gcc-patches wrote:

On Thu, Feb 11, 2021 at 6:41 PM Martin Liška  wrote:


Hello.

This fixes 2 memory leaks I noticed.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?


OK.


Thanks,
Martin

gcc/ChangeLog:

  * opts-common.c (decode_cmdline_option): Release werror_arg.
  * opts.c (gen_producer_string): Release output of
  gen_command_line_string.
---
gcc/opts-common.c | 1 +
gcc/opts.c| 7 +--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/gcc/opts-common.c b/gcc/opts-common.c
index 6cb5602896d..5e10edeb4cf 100644
--- a/gcc/opts-common.c
+++ b/gcc/opts-common.c
@@ -766,6 +766,7 @@ decode_cmdline_option (const char *const *argv, unsigned 
int lang_mask,
  werror_arg[0] = 'W';

  size_t warning_index = find_opt (werror_arg, lang_mask);
+  free (werror_arg);


Sorry to butt in here, but since we're having a discussion on this
same subject in another review of a fix for a memory leak, I thought
I'd pipe up: I would suggest a better (more in line with C++ and more
future-proof) solution is to replace the call to xstrdup (and the need
to subsequently call free) with std::string.


Hello.

To be honest, I like the suggested approach using std::string. On the other 
hand,
I don't want to mix existing C API (char *) with std::string.


That's one of the main problems.


I'm sending a patch that fixed the remaining leaks.


OK.




  if (warning_index != OPT_SPECIAL_unknown)
  {
const struct cl_option *warning_option
diff --git a/gcc/opts.c b/gcc/opts.c
index fc5f61e13cc..24bb64198c8 100644
--- a/gcc/opts.c
+++ b/gcc/opts.c
@@ -3401,8 +3401,11 @@ char *
gen_producer_string (const char *language_string, cl_decoded_option 
*options,
   unsigned int options_count)
{
-  return concat (language_string, " ", version_string, " ",
-gen_command_line_string (options, options_count), NULL);
+  char *cmdline = gen_command_line_string (options, options_count);
+  char *combined = concat (language_string, " ", version_string, " ",
+  cmdline, NULL);
+  free (cmdline);
+  return combined;
}


Here, changing gen_command_line_string() to return std::string instead
of a raw pointer would similarly avoid having to remember to free
the pointer (and forgetting).  The function has two other callers,
both in gcc/toplev.c, and both also leak the memory for the same
reason as here.


Btw. have we made a general consensus that using std::string is fine in the
GCC internals?


No, we didn't.  But if Martin likes RAII adding sth like


It's not so much what I like as about using established C++ idioms
to help us avoid common memory management mistakes (leaks, dangling
pointers, etc.)

With respect to the C++ standard library, the GCC Coding Conventions
say:

  Use of the standard library is permitted.  Note, however, that it
  is currently not usable with garbage collected data.

So as a return value of a function, or as a local variable, using
std::string seems entirely appropriate, and I have been using it
that way.

Richard, if that's not in line with your understanding of
the intent of the text in the conventions or with your expectations,
please propose a change for everyone's consideration.  If a consensus
emerges not to use std::string in GCC (or any other part of the C++
library) let's update the coding conventions.  FWIW, I have prototyped
a simple string class over the weekend (based on auto_vec) that I'm
willing to contribute if std::string turns out to be out of favor.


template 
class auto_free
{
auto_free (T *ptr) : m_ptr (ptr) {};
   ~auto_free () { m_ptr->~T (); free (m_ptr); }
   T  *m_ptr;
};

with appropriate move CTORs to support returning this
should be straight-forward.  You should then be able to
change the return type from char * to auto_free or so.


There is std::unique_ptr that we could use rather than rolling our
own.  That said, I don't think using std::unique_ptr over a string
class would be appropriate for things like local (string) variables
or return types of functions returning strings.  (GCC garbage
collection means std::string might not be suitable as a member of
persistent data structures).

Martin



But then there's the issue of introducing lifetime bugs because
you definitely need to have the pointer escape at points like
the printf ...

Richard.


Martin



Martin



#if CHECKING_P
--
2.30.0









Re: [Patch] Fortran: %re/%im fixes for OpenMP/OpenACC + gfc_is_simplify_contiguous

2021-02-16 Thread Paul Richard Thomas via Gcc-patches
Hi Tobias,

I think that constitutes an 'obvious' patch. OK for mainline.

Thanks

Paul


On Mon, 8 Feb 2021 at 18:53, Tobias Burnus  wrote:

> Found when looking at Julian's 3/4 OpenACC patch, which has not
> yet been committed (and needs still to be revised a bit.)
>
> The fix (a) avoids an ICE when Julian's patch has been applied.
> The patch (b) just makes one error message less confusing.
>
> The testcase shows that only %re/%im run reach the new code as
> %kind/%len are != EXPR_VARIABLE. (The error message is slightly
> misleading if the 'list item'/'var' is not a variable.)
>
>
> (a) We need to handle REF_INQUIRY gfc_is_simplify_contiguous.
>
> That function is used for 'is_contiguous(a(:)%re)', but it works
> without this patch and simplifies already to .false.
> And it is used in openmp.c, which can ICE without this patch.
>
> (b) Just makes the error message nicer - as only EXPR_VARIABLE
> reaches that code, only %re and %im are mentioned in the
> error message.
>
> (Actually, it is not completely clear whether %re/%im are invalid
> or not; I think they should be – but one can argue otherwise.
> For OpenMP I just saw that it is tacked internally in Issue 2661,
> for OpenACC it is tracked as Issue 346 – but neither has been
> discussed, yet.)
>
> OK for mainline?
>
> Tobias
>
> PS: '%re'/'%im' permit accessing the real/imaginary part of a
> complex variable as lvalue (in the C++ sense) and also permit
> "var(:)%re = 1.0", which real(z) or imag(z) does not permit.
>
> var%kind == kind(var), but derived types also permit
> parameterized derived types (PDT), which can use '%foo' for respective
> 'len' and 'kind' components.
>
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank
> Thürauf
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


[OG10] Reverted patch on OG10 branch

2021-02-16 Thread Moore, Catherine
FYI - I reverted this patch:

commit ea43db9372133e84f95bdb2c6934a5cc3db3d530
Author: Catherine Moore 
Date:   Sat Feb 13 09:04:44 2021 -0800

Revert "Enable gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF 
(INDIRECT_REF ...)) map clauses."

This reverts commit bf8605f14ec33ea31233a3567f3184fee667b695.

Regressions were reported for BZ 9574 - C++ class member mapping.


[PATCH] aarch64: Add internal tune flag to minimise VL-based scalar ops

2021-02-16 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

This patch introduces an internal tune flag to break up VL-based scalar ops
into a GP-reg scalar op with the VL read kept separate. This can be preferable 
on some CPUs.

I went for a tune param rather than extending the rtx costs as our RTX costs 
tables aren't set up to track
this intricacy.

I've confirmed that on the simple loop:
void vadd (int *dst, int *op1, int *op2, int count)
{
  for (int i = 0; i < count; ++i)
dst[i] = op1[i] + op2[i];
}

we now split the incw into a cntw outside the loop and the add inside.

+   cntwx5
...
loop:
-   incwx4
+   add x4, x4, x5

Bootstrapped and tested on aarch64-none-linux-gnu.
This is a minimally invasive fix to help the performance of just 
-mcpu=neoverse-v1 for
GCC 11 so I'd like to have it in stage4 if possible,
but I'd appreciate some feedback on the risk assessment of it.

Thanks,
Kyrill

gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (cse_sve_vl_constants):
Define.
* config/aarch64/aarch64.md (add3): Force CONST_POLY_INT 
immediates
into a register when the above is enabled.
* config/aarch64/aarch64.c (neoversev1_tunings): 
AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.
(aarch64_rtx_costs): Use AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.

gcc/testsuite/

* gcc.target/aarch64/sve/cse_sve_vl_constants_1.c: New test.


cse_vl_constants.patch
Description: cse_vl_constants.patch


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Tue, Feb 16, 2021 at 01:09:43PM +, Richard Sandiford wrote:
>> Can I put in a plea to put this in recog.[hc], and possibly also make
>> it a copy constructor for recog_data_d?  I can't think of any legitimate
>> cases in which we'd want to copy the whole structure, instead of just
>> the active parts.
>
> Good idea.  So like this?
>
> 2021-02-16  Jakub Jelinek  
>
>   PR target/99104
>   * recog.h (recog_data_d::recog_data_d ()): New defaulted default ctor.
>   (recog_data_d::recog_data_d (const recog_data_d &)): Declare.
>   (recog_data_d::operator = (const recog_data_d &)): Declare.
>   * recog.c (recog_data_d::operator = (const recog_data_d &)): New
>   assignment operator.
>   (recog_data_d::recog_data_d (const recog_data_d &)): New copy ctor.

OK, thanks.

Richard

>   * config/i386/i386.c (distance_non_agu_define): Don't call
>   extract_insn_cached here.
>   (ix86_lea_outperforms): Save and restore recog_data around call
>   to distance_non_agu_define and distance_agu_use.
>   (ix86_ok_to_clobber_flags): Remove.
>   (ix86_avoid_lea_for_add): Don't call ix86_ok_to_clobber_flags.
>   (ix86_avoid_lea_for_addr): Likewise.  Adjust function comment.
>   * config/i386/i386.m (*lea): Change from define_insn_and_split
>   into define_insn.  Move the splitting to define_peephole2 and
>   check there using peep2_regno_dead_p if FLAGS_REG is dead.
>
>   * gcc.dg/pr99104.c: New test.
>
> --- gcc/recog.h.jj2021-01-15 19:04:42.436445517 +0100
> +++ gcc/recog.h   2021-02-16 14:26:00.384934711 +0100
> @@ -372,6 +372,12 @@ struct recog_data_d
>  
>/* In case we are caching, hold insn data was generated for.  */
>rtx_insn *insn;
> +
> +  recog_data_d () = default;
> +  /* Copy constructor and assignment operator, so that when copying the
> + structure we copy only the active elements.  */
> +  recog_data_d (const recog_data_d &);
> +  recog_data_d  = (const recog_data_d &);
>  };
>  
>  extern struct recog_data_d recog_data;
> --- gcc/recog.c.jj2021-02-16 08:57:21.277960594 +0100
> +++ gcc/recog.c   2021-02-16 14:26:42.235464536 +0100
> @@ -109,7 +109,41 @@ init_recog (void)
>  {
>volatile_ok = 1;
>  }
> +
> +/* recog_data_d assignment operator, so that recog_data_d copying
> +   only copies the active elements.  */
> +
> +recog_data_d &
> +recog_data_d::operator = (const recog_data_d )
> +{
> +  n_operands = src.n_operands;
> +  n_dups = src.n_dups;
> +  n_alternatives = src.n_alternatives;
> +  is_asm = src.is_asm;
> +  insn = src.insn;
> +  for (int i = 0; i < src.n_operands; i++)
> +{
> +  operand[i] = src.operand[i];
> +  operand_loc[i] = src.operand_loc[i];
> +  constraints[i] = src.constraints[i];
> +  is_operator[i] = src.is_operator[i];
> +  operand_mode[i] = src.operand_mode[i];
> +  operand_type[i] = src.operand_type[i];
> +}
> +  for (int i = 0; i < src.n_dups; i++)
> +{
> +  dup_loc[i] = src.dup_loc[i];
> +  dup_num[i] = src.dup_num[i];
> +}
> +  return *this;
> +}
>  
> +/* recog_data_d copy constructor, so that recog_data_d construction
> +   only copies the active elements.  */
> +recog_data_d::recog_data_d (const recog_data_d )
> +{
> +  *this = src;
> +}
>  
>  /* Return true if labels in asm operands BODY are LABEL_REFs.  */
>  
> --- gcc/config/i386/i386.c.jj 2021-02-16 08:58:55.197890195 +0100
> +++ gcc/config/i386/i386.c2021-02-16 14:20:19.061764891 +0100
> @@ -14754,11 +14754,6 @@ distance_non_agu_define (unsigned int re
>   }
>  }
>  
> -  /* get_attr_type may modify recog data.  We want to make sure
> - that recog data is valid for instruction INSN, on which
> - distance_non_agu_define is called.  INSN is unchanged here.  */
> -  extract_insn_cached (insn);
> -
>if (!found)
>  return -1;
>  
> @@ -14928,17 +14923,15 @@ ix86_lea_outperforms (rtx_insn *insn, un
>return true;
>  }
>  
> -  rtx_insn *rinsn = recog_data.insn;
> +  /* Remember recog_data content.  */
> +  struct recog_data_d recog_data_save = recog_data;
>  
>dist_define = distance_non_agu_define (regno1, regno2, insn);
>dist_use = distance_agu_use (regno0, insn);
>  
> -  /* distance_non_agu_define can call extract_insn_cached.  If this function
> - is called from define_split conditions, that can break insn splitting,
> - because split_insns works by clearing recog_data.insn and then modifying
> - recog_data.operand array and match the various split conditions.  */
> -  if (recog_data.insn != rinsn)
> -recog_data.insn = NULL;
> +  /* distance_non_agu_define can call get_attr_type which can call
> + recog_memoized, restore recog_data back to previous content.  */
> +  recog_data = recog_data_save;
>  
>if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
>  {
> @@ -14968,38 +14961,6 @@ ix86_lea_outperforms (rtx_insn *insn, un
>return dist_define >= dist_use;
>  }
>  
> 

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 16, 2021 at 01:09:43PM +, Richard Sandiford wrote:
> Can I put in a plea to put this in recog.[hc], and possibly also make
> it a copy constructor for recog_data_d?  I can't think of any legitimate
> cases in which we'd want to copy the whole structure, instead of just
> the active parts.

Good idea.  So like this?

2021-02-16  Jakub Jelinek  

PR target/99104
* recog.h (recog_data_d::recog_data_d ()): New defaulted default ctor.
(recog_data_d::recog_data_d (const recog_data_d &)): Declare.
(recog_data_d::operator = (const recog_data_d &)): Declare.
* recog.c (recog_data_d::operator = (const recog_data_d &)): New
assignment operator.
(recog_data_d::recog_data_d (const recog_data_d &)): New copy ctor.
* config/i386/i386.c (distance_non_agu_define): Don't call
extract_insn_cached here.
(ix86_lea_outperforms): Save and restore recog_data around call
to distance_non_agu_define and distance_agu_use.
(ix86_ok_to_clobber_flags): Remove.
(ix86_avoid_lea_for_add): Don't call ix86_ok_to_clobber_flags.
(ix86_avoid_lea_for_addr): Likewise.  Adjust function comment.
* config/i386/i386.m (*lea): Change from define_insn_and_split
into define_insn.  Move the splitting to define_peephole2 and
check there using peep2_regno_dead_p if FLAGS_REG is dead.

* gcc.dg/pr99104.c: New test.

--- gcc/recog.h.jj  2021-01-15 19:04:42.436445517 +0100
+++ gcc/recog.h 2021-02-16 14:26:00.384934711 +0100
@@ -372,6 +372,12 @@ struct recog_data_d
 
   /* In case we are caching, hold insn data was generated for.  */
   rtx_insn *insn;
+
+  recog_data_d () = default;
+  /* Copy constructor and assignment operator, so that when copying the
+ structure we copy only the active elements.  */
+  recog_data_d (const recog_data_d &);
+  recog_data_d  = (const recog_data_d &);
 };
 
 extern struct recog_data_d recog_data;
--- gcc/recog.c.jj  2021-02-16 08:57:21.277960594 +0100
+++ gcc/recog.c 2021-02-16 14:26:42.235464536 +0100
@@ -109,7 +109,41 @@ init_recog (void)
 {
   volatile_ok = 1;
 }
+
+/* recog_data_d assignment operator, so that recog_data_d copying
+   only copies the active elements.  */
+
+recog_data_d &
+recog_data_d::operator = (const recog_data_d )
+{
+  n_operands = src.n_operands;
+  n_dups = src.n_dups;
+  n_alternatives = src.n_alternatives;
+  is_asm = src.is_asm;
+  insn = src.insn;
+  for (int i = 0; i < src.n_operands; i++)
+{
+  operand[i] = src.operand[i];
+  operand_loc[i] = src.operand_loc[i];
+  constraints[i] = src.constraints[i];
+  is_operator[i] = src.is_operator[i];
+  operand_mode[i] = src.operand_mode[i];
+  operand_type[i] = src.operand_type[i];
+}
+  for (int i = 0; i < src.n_dups; i++)
+{
+  dup_loc[i] = src.dup_loc[i];
+  dup_num[i] = src.dup_num[i];
+}
+  return *this;
+}
 
+/* recog_data_d copy constructor, so that recog_data_d construction
+   only copies the active elements.  */
+recog_data_d::recog_data_d (const recog_data_d )
+{
+  *this = src;
+}
 
 /* Return true if labels in asm operands BODY are LABEL_REFs.  */
 
--- gcc/config/i386/i386.c.jj   2021-02-16 08:58:55.197890195 +0100
+++ gcc/config/i386/i386.c  2021-02-16 14:20:19.061764891 +0100
@@ -14754,11 +14754,6 @@ distance_non_agu_define (unsigned int re
}
 }
 
-  /* get_attr_type may modify recog data.  We want to make sure
- that recog data is valid for instruction INSN, on which
- distance_non_agu_define is called.  INSN is unchanged here.  */
-  extract_insn_cached (insn);
-
   if (!found)
 return -1;
 
@@ -14928,17 +14923,15 @@ ix86_lea_outperforms (rtx_insn *insn, un
   return true;
 }
 
-  rtx_insn *rinsn = recog_data.insn;
+  /* Remember recog_data content.  */
+  struct recog_data_d recog_data_save = recog_data;
 
   dist_define = distance_non_agu_define (regno1, regno2, insn);
   dist_use = distance_agu_use (regno0, insn);
 
-  /* distance_non_agu_define can call extract_insn_cached.  If this function
- is called from define_split conditions, that can break insn splitting,
- because split_insns works by clearing recog_data.insn and then modifying
- recog_data.operand array and match the various split conditions.  */
-  if (recog_data.insn != rinsn)
-recog_data.insn = NULL;
+  /* distance_non_agu_define can call get_attr_type which can call
+ recog_memoized, restore recog_data back to previous content.  */
+  recog_data = recog_data_save;
 
   if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
 {
@@ -14968,38 +14961,6 @@ ix86_lea_outperforms (rtx_insn *insn, un
   return dist_define >= dist_use;
 }
 
-/* Return true if it is legal to clobber flags by INSN and
-   false otherwise.  */
-
-static bool
-ix86_ok_to_clobber_flags (rtx_insn *insn)
-{
-  basic_block bb = BLOCK_FOR_INSN (insn);
-  df_ref use;
-  bitmap live;
-
-  while (insn)
-{
-  if 

*ping* – Re: [Patch] Fortran: %re/%im fixes for OpenMP/OpenACC + gfc_is_simplify_contiguous

2021-02-16 Thread Tobias Burnus

*PING*

this avoids an ICE in gfc_is_simplify_contiguous, which can be at least
triggered with OpenMP/OpenACC.

For %re/%im it outputs a nicer error than an ICE or misleading message.
[Whether %re may be used is still unclear, recently opened OpenMP +
OpenACC spec issues still have to be discussed.]

Tobias

On 09.02.21 12:41, Tobias Burnus wrote:

Hi Thomas, hi all

Updated patch. Changes: Testcases split + updated/extended.
OK for mainline?

Regarding the comments:

On 09.02.21 10:45, Thomas Schwinge wrote:

Thanks for filing/locating these discussion items for OpenACC/OpenMP
upstream.  May also put these references into the testcases, so that
once
these get addressed, we have something to 'grep' for in GCC?


Actually, they are already in the file. Alternative is to add them as
link, but I am not sure that's better. (I moved them now to the top.)


I note that 'zz' variants (see below) are not being checked for OpenMP


I have now added them; I had them before but as many checks triggered,
I thought the tests were not really worthwhile.


+!$acc update self(zz%re)
+!$acc update self(zz%im)
+end

And for OpenACC, the 'zz' variants do not emit this error message here.
(That's not immediately obvious to me.)


Answer: 'git add' missing. The reason is that with and without
Julian's patch, the error message is different. Without his patch,
the error is:
!$acc update self(zz%re) ! { dg-error "not a proper array section" }


I can see how data mapping of '[...]%re' etc. are problematic (we're
constructing an "incomplete object"?), but 'update' etc. I'd have
expected to work: would just copy the respective "part".

Granted. The array(:)%re access might update too much, but that's not
different to array with strides or with contiguous arrays sections
which contain component reference (and more than one component).

But that's more a question for the spec committee – if it is supposed
to work, the code needs to be updated.

Tobias




-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Uros Bizjak via Gcc-patches
On Tue, Feb 16, 2021 at 1:53 PM Jakub Jelinek  wrote:
>
> On Tue, Feb 16, 2021 at 12:39:39PM +0100, Uros Bizjak wrote:
> > > @@ -5197,7 +5209,7 @@ (define_insn_and_split "*lea"
> > >
> > >/* ix86_avoid_lea_for_addr re-recognizes insn and may
> > >   change operands[] array behind our back.  */
> > > -  pat = PATTERN (curr_insn);
> > > +  pat = PATTERN (peep2_next_insn (0));
> >
> > I remember there were several problems with the insn, so I introduced
> > the above hack. Probably, there is better solution for this...
>
> I'm afraid the define_peephole2 conditions are called with the same problem as
> define_split conditions, operands point into recog_data.operands.
>
> I wonder if instead of doing hacks like that we just couldn't save/restore
> recog_data variable around the
>   dist_define = distance_non_agu_define (regno1, regno2, insn);
>   dist_use = distance_agu_use (regno0, insn);
> calls.
>
> As in the following updated patch:
>
> 2021-02-16  Jakub Jelinek  
>
> PR target/99104
> * config/i386/i386.c (distance_non_agu_define): Don't call
> extract_insn_cached here.
> (copy_recog_data): New function.
> (ix86_lea_outperforms): Save and restore recog_data around call
> to distance_non_agu_define and distance_agu_use.
> (ix86_ok_to_clobber_flags): Remove.
> (ix86_avoid_lea_for_add): Don't call ix86_ok_to_clobber_flags.
> (ix86_avoid_lea_for_addr): Likewise.  Adjust function comment.
> * config/i386/i386.m (*lea): Change from define_insn_and_split
> into define_insn.  Move the splitting to define_peephole2 and
> check there using peep2_regno_dead_p if FLAGS_REG is dead.
>
> * gcc.dg/pr99104.c: New test.

IMO copy_recog_data fits into generic code (and I'm sure generic code
use this new helper).

OK w/ or w/o the above change.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2021-02-16 08:58:55.197890195 +0100
> +++ gcc/config/i386/i386.c  2021-02-16 13:47:15.500969539 +0100
> @@ -14754,11 +14754,6 @@ distance_non_agu_define (unsigned int re
> }
>  }
>
> -  /* get_attr_type may modify recog data.  We want to make sure
> - that recog data is valid for instruction INSN, on which
> - distance_non_agu_define is called.  INSN is unchanged here.  */
> -  extract_insn_cached (insn);
> -
>if (!found)
>  return -1;
>
> @@ -14897,6 +14892,32 @@ distance_agu_use (unsigned int regno0, r
>return distance >> 1;
>  }
>
> +/* Copy recog_data_d from SRC to DEST.  */
> +
> +static void
> +copy_recog_data (recog_data_d *dest, recog_data_d *src)
> +{
> +  dest->n_operands = src->n_operands;
> +  dest->n_dups = src->n_dups;
> +  dest->n_alternatives = src->n_alternatives;
> +  dest->is_asm = src->is_asm;
> +  dest->insn = src->insn;
> +  for (int i = 0; i < src->n_operands; i++)
> +{
> +  dest->operand[i] = src->operand[i];
> +  dest->operand_loc[i] = src->operand_loc[i];
> +  dest->constraints[i] = src->constraints[i];
> +  dest->is_operator[i] = src->is_operator[i];
> +  dest->operand_mode[i] = src->operand_mode[i];
> +  dest->operand_type[i] = src->operand_type[i];
> +}
> +  for (int i = 0; i < src->n_dups; i++)
> +{
> +  dest->dup_loc[i] = src->dup_loc[i];
> +  dest->dup_num[i] = src->dup_num[i];
> +}
> +}
> +
>  /* Define this macro to tune LEA priority vs ADD, it take effect when
> there is a dilemma of choosing LEA or ADD
> Negative value: ADD is more preferred than LEA
> @@ -14928,17 +14949,16 @@ ix86_lea_outperforms (rtx_insn *insn, un
>return true;
>  }
>
> -  rtx_insn *rinsn = recog_data.insn;
> +  /* Remember recog_data content.  */
> +  struct recog_data_d recog_data_copy;
> +  copy_recog_data (_data_copy, _data);
>
>dist_define = distance_non_agu_define (regno1, regno2, insn);
>dist_use = distance_agu_use (regno0, insn);
>
> -  /* distance_non_agu_define can call extract_insn_cached.  If this function
> - is called from define_split conditions, that can break insn splitting,
> - because split_insns works by clearing recog_data.insn and then modifying
> - recog_data.operand array and match the various split conditions.  */
> -  if (recog_data.insn != rinsn)
> -recog_data.insn = NULL;
> +  /* distance_non_agu_define can call get_attr_type which can call
> + recog_memoized, restore recog_data back to previous content.  */
> +  copy_recog_data (_data, _data_copy);
>
>if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
>  {
> @@ -14968,38 +14988,6 @@ ix86_lea_outperforms (rtx_insn *insn, un
>return dist_define >= dist_use;
>  }
>
> -/* Return true if it is legal to clobber flags by INSN and
> -   false otherwise.  */
> -
> -static bool
> -ix86_ok_to_clobber_flags (rtx_insn *insn)
> -{
> -  basic_block bb = BLOCK_FOR_INSN (insn);
> -  df_ref use;
> -  bitmap live;
> -
> -  while (insn)
> -{
> -  if (NONDEBUG_INSN_P (insn))
> -   {
> - 

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches  writes:
> @@ -14897,6 +14892,32 @@ distance_agu_use (unsigned int regno0, r
>return distance >> 1;
>  }
>  
> +/* Copy recog_data_d from SRC to DEST.  */
> +
> +static void
> +copy_recog_data (recog_data_d *dest, recog_data_d *src)
> +{
> +  dest->n_operands = src->n_operands;
> +  dest->n_dups = src->n_dups;
> +  dest->n_alternatives = src->n_alternatives;
> +  dest->is_asm = src->is_asm;
> +  dest->insn = src->insn;
> +  for (int i = 0; i < src->n_operands; i++)
> +{
> +  dest->operand[i] = src->operand[i];
> +  dest->operand_loc[i] = src->operand_loc[i];
> +  dest->constraints[i] = src->constraints[i];
> +  dest->is_operator[i] = src->is_operator[i];
> +  dest->operand_mode[i] = src->operand_mode[i];
> +  dest->operand_type[i] = src->operand_type[i];
> +}
> +  for (int i = 0; i < src->n_dups; i++)
> +{
> +  dest->dup_loc[i] = src->dup_loc[i];
> +  dest->dup_num[i] = src->dup_num[i];
> +}
> +}
> +
>  /* Define this macro to tune LEA priority vs ADD, it take effect when
> there is a dilemma of choosing LEA or ADD
> Negative value: ADD is more preferred than LEA

Can I put in a plea to put this in recog.[hc], and possibly also make
it a copy constructor for recog_data_d?  I can't think of any legitimate
cases in which we'd want to copy the whole structure, instead of just
the active parts.

Thanks,
Richard


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 16, 2021 at 12:39:39PM +0100, Uros Bizjak wrote:
> > @@ -5197,7 +5209,7 @@ (define_insn_and_split "*lea"
> >
> >/* ix86_avoid_lea_for_addr re-recognizes insn and may
> >   change operands[] array behind our back.  */
> > -  pat = PATTERN (curr_insn);
> > +  pat = PATTERN (peep2_next_insn (0));
> 
> I remember there were several problems with the insn, so I introduced
> the above hack. Probably, there is better solution for this...

I'm afraid the define_peephole2 conditions are called with the same problem as
define_split conditions, operands point into recog_data.operands.

I wonder if instead of doing hacks like that we just couldn't save/restore
recog_data variable around the
  dist_define = distance_non_agu_define (regno1, regno2, insn);
  dist_use = distance_agu_use (regno0, insn);
calls.

As in the following updated patch:

2021-02-16  Jakub Jelinek  

PR target/99104
* config/i386/i386.c (distance_non_agu_define): Don't call
extract_insn_cached here.
(copy_recog_data): New function.
(ix86_lea_outperforms): Save and restore recog_data around call
to distance_non_agu_define and distance_agu_use.
(ix86_ok_to_clobber_flags): Remove.
(ix86_avoid_lea_for_add): Don't call ix86_ok_to_clobber_flags.
(ix86_avoid_lea_for_addr): Likewise.  Adjust function comment.
* config/i386/i386.m (*lea): Change from define_insn_and_split
into define_insn.  Move the splitting to define_peephole2 and
check there using peep2_regno_dead_p if FLAGS_REG is dead.

* gcc.dg/pr99104.c: New test.

--- gcc/config/i386/i386.c.jj   2021-02-16 08:58:55.197890195 +0100
+++ gcc/config/i386/i386.c  2021-02-16 13:47:15.500969539 +0100
@@ -14754,11 +14754,6 @@ distance_non_agu_define (unsigned int re
}
 }
 
-  /* get_attr_type may modify recog data.  We want to make sure
- that recog data is valid for instruction INSN, on which
- distance_non_agu_define is called.  INSN is unchanged here.  */
-  extract_insn_cached (insn);
-
   if (!found)
 return -1;
 
@@ -14897,6 +14892,32 @@ distance_agu_use (unsigned int regno0, r
   return distance >> 1;
 }
 
+/* Copy recog_data_d from SRC to DEST.  */
+
+static void
+copy_recog_data (recog_data_d *dest, recog_data_d *src)
+{
+  dest->n_operands = src->n_operands;
+  dest->n_dups = src->n_dups;
+  dest->n_alternatives = src->n_alternatives;
+  dest->is_asm = src->is_asm;
+  dest->insn = src->insn;
+  for (int i = 0; i < src->n_operands; i++)
+{
+  dest->operand[i] = src->operand[i];
+  dest->operand_loc[i] = src->operand_loc[i];
+  dest->constraints[i] = src->constraints[i];
+  dest->is_operator[i] = src->is_operator[i];
+  dest->operand_mode[i] = src->operand_mode[i];
+  dest->operand_type[i] = src->operand_type[i];
+}
+  for (int i = 0; i < src->n_dups; i++)
+{
+  dest->dup_loc[i] = src->dup_loc[i];
+  dest->dup_num[i] = src->dup_num[i];
+}
+}
+
 /* Define this macro to tune LEA priority vs ADD, it take effect when
there is a dilemma of choosing LEA or ADD
Negative value: ADD is more preferred than LEA
@@ -14928,17 +14949,16 @@ ix86_lea_outperforms (rtx_insn *insn, un
   return true;
 }
 
-  rtx_insn *rinsn = recog_data.insn;
+  /* Remember recog_data content.  */
+  struct recog_data_d recog_data_copy;
+  copy_recog_data (_data_copy, _data);
 
   dist_define = distance_non_agu_define (regno1, regno2, insn);
   dist_use = distance_agu_use (regno0, insn);
 
-  /* distance_non_agu_define can call extract_insn_cached.  If this function
- is called from define_split conditions, that can break insn splitting,
- because split_insns works by clearing recog_data.insn and then modifying
- recog_data.operand array and match the various split conditions.  */
-  if (recog_data.insn != rinsn)
-recog_data.insn = NULL;
+  /* distance_non_agu_define can call get_attr_type which can call
+ recog_memoized, restore recog_data back to previous content.  */
+  copy_recog_data (_data, _data_copy);
 
   if (dist_define < 0 || dist_define >= LEA_MAX_STALL)
 {
@@ -14968,38 +14988,6 @@ ix86_lea_outperforms (rtx_insn *insn, un
   return dist_define >= dist_use;
 }
 
-/* Return true if it is legal to clobber flags by INSN and
-   false otherwise.  */
-
-static bool
-ix86_ok_to_clobber_flags (rtx_insn *insn)
-{
-  basic_block bb = BLOCK_FOR_INSN (insn);
-  df_ref use;
-  bitmap live;
-
-  while (insn)
-{
-  if (NONDEBUG_INSN_P (insn))
-   {
- FOR_EACH_INSN_USE (use, insn)
-   if (DF_REF_REG_USE_P (use) && DF_REF_REGNO (use) == FLAGS_REG)
- return false;
-
- if (insn_defines_reg (FLAGS_REG, INVALID_REGNUM, insn))
-   return true;
-   }
-
-  if (insn == BB_END (bb))
-   break;
-
-  insn = NEXT_INSN (insn);
-}
-
-  live = df_get_live_out(bb);
-  return !REGNO_REG_SET_P (live, FLAGS_REG);
-}
-
 /* Return true if we need to 

[PATCH] tree-optimization/38474 - improve PTA varinfo sorting

2021-02-16 Thread Richard Biener
This improves a previous heuristic to sort address-taken variables
first (because those appear in points-to bitmaps) by tracking which
variables appear in ADDRESSOF constraints (there's also
graph->address_taken but that's computed only later).

This shaves off 30s worth of compile-time for the full testcase in
PR38474 (which then still takes 965s to compile at -O2).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

2021-02-16  Richard Biener  

PR tree-optimization/38474
* tree-ssa-structalias.c (variable_info::address_taken): New.
(new_var_info): Initialize address_taken.
(process_constraint): Set address_taken.
(solve_constraints): Use the new address_taken flag rather
than is_reg_var for sorting variables.
(dump_constraint): Dump the variable number if the name
is just NULL.
---
 gcc/tree-ssa-structalias.c | 25 ++---
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/gcc/tree-ssa-structalias.c b/gcc/tree-ssa-structalias.c
index d6b2661fa5e..529ec3a5b80 100644
--- a/gcc/tree-ssa-structalias.c
+++ b/gcc/tree-ssa-structalias.c
@@ -280,6 +280,9 @@ struct variable_info
   /* True if this represents a IPA function info.  */
   unsigned int is_fn_info : 1;
 
+  /* True if this appears as RHS in a ADDRESSOF constraint.  */
+  unsigned int address_taken : 1;
+
   /* ???  Store somewhere better.  */
   unsigned short ruid;
 
@@ -393,6 +396,7 @@ new_var_info (tree t, const char *name, bool add_id)
   ret->is_global_var = (t == NULL_TREE);
   ret->is_ipa_escape_point = false;
   ret->is_fn_info = false;
+  ret->address_taken = false;
   if (t && DECL_P (t))
 ret->is_global_var = (is_global_var (t)
  /* We have to treat even local register variables
@@ -674,7 +678,10 @@ dump_constraint (FILE *file, constraint_t c)
 fprintf (file, "&");
   else if (c->lhs.type == DEREF)
 fprintf (file, "*");
-  fprintf (file, "%s", get_varinfo (c->lhs.var)->name);
+  if (dump_file)
+fprintf (file, "%s", get_varinfo (c->lhs.var)->name);
+  else
+fprintf (file, "V%d", c->lhs.var);
   if (c->lhs.offset == UNKNOWN_OFFSET)
 fprintf (file, " + UNKNOWN");
   else if (c->lhs.offset != 0)
@@ -684,7 +691,10 @@ dump_constraint (FILE *file, constraint_t c)
 fprintf (file, "&");
   else if (c->rhs.type == DEREF)
 fprintf (file, "*");
-  fprintf (file, "%s", get_varinfo (c->rhs.var)->name);
+  if (dump_file)
+fprintf (file, "%s", get_varinfo (c->rhs.var)->name);
+  else
+fprintf (file, "V%d", c->rhs.var);
   if (c->rhs.offset == UNKNOWN_OFFSET)
 fprintf (file, " + UNKNOWN");
   else if (c->rhs.offset != 0)
@@ -3101,6 +3111,8 @@ process_constraint (constraint_t t)
   else
 {
   gcc_assert (rhs.type != ADDRESSOF || rhs.offset == 0);
+  if (rhs.type == ADDRESSOF)
+   get_varinfo (get_varinfo (rhs.var)->head)->address_taken = true;
   constraints.safe_push (t);
 }
 }
@@ -7288,15 +7300,14 @@ solve_constraints (void)
   unsigned int *map = XNEWVEC (unsigned int, varmap.length ());
   for (unsigned i = 0; i < integer_id + 1; ++i)
 map[i] = i;
-  /* Start with non-register vars (as possibly address-taken), followed
- by register vars as conservative set of vars never appearing in
- the points-to solution bitmaps.  */
+  /* Start with address-taken vars, followed by not address-taken vars
+ to move vars never appearing in the points-to solution bitmaps last.  */
   unsigned j = integer_id + 1;
   for (unsigned i = integer_id + 1; i < varmap.length (); ++i)
-if (! varmap[i]->is_reg_var)
+if (varmap[varmap[i]->head]->address_taken)
   map[i] = j++;
   for (unsigned i = integer_id + 1; i < varmap.length (); ++i)
-if (varmap[i]->is_reg_var)
+if (! varmap[varmap[i]->head]->address_taken)
   map[i] = j++;
   /* Shuffle varmap according to map.  */
   for (unsigned i = integer_id + 1; i < varmap.length (); ++i)
-- 
2.26.2


Re: [RFC] test builtin ratio for loop distribution

2021-02-16 Thread Richard Biener via Gcc-patches
On Tue, Feb 16, 2021 at 11:48 AM Alexandre Oliva  wrote:
>
> On Feb 16, 2021, Alexandre Oliva  wrote:
>
> >> So I wonder whether we should instead re-run CCP after loop opts which
> >> computes nonzero bits as well instead of the above "hack".
>
> That works.  It takes care of both the dest alignment and the len ctz.
>
> Explicitly masking out the len tz from nonzero bits also works, but I
> suppose the ccp pass does a better job, and it takes care of memcpy and
> memmove as well.

It's certainly more general, yes.

> I noticed that ccp releases CDI_DOMINATORS information.  It looks like
> the last copy_prop pass, now replaced with ccp in my tree, was late
> enough that it doesn't seem to matter, but at first I thought you meant
> an earlier copy_prop, that runs before graphite, and dropping dominator
> info at that point was a problem for e.g. cunroll.

We indeed shouldn't release CDI_DOMINATORS info, but if it is a problem
elsewhere then that pass fails to compute dominator info.  But yes,
in the loop pipeline we expect dominator info to be present (but fast
query can disable itself leading to issues if doms are not recomputed).

Richard.

> --
> Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
>Free Software Activist GNU Toolchain Engineer
> Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [Patch] Fortran: Reject DT as fmt in I/O statments [PR99111]

2021-02-16 Thread Paul Richard Thomas via Gcc-patches
Hi Tobias,

The fix and the message are just fine for me. I am trying to imagine how
few people will ever encounter this!

OK for 10- and 11-branches

Thanks

Paul


On Tue, 16 Feb 2021 at 11:28, Tobias Burnus  wrote:

> GCC started to accept as legacy extension array-valued
> non-characters, which works for integer (+ real/logical).
>
> Some other – mostly unrelated patch regarding resolving
> array constructors (r10-5607-gde89b5748d68b76b, PR92996)
> turned this into an ICE.
>
> While it might be in theory fixable for derived types,
> just rejecting DT + CLASS + c_(fun)ptr makes more sense.
>
> I am not quite sure about the error-message wording.
> Better suggestions? In the 'rank > 0' branch, Hollerith
> should not occur and (as legacy extension) integer etc.
> are permitted. — Thoughts?
>
> Otherwise: OK for GCC 11 and GCC 10?
> (ICE is new in GCC 10.)
>
> Tobias
>
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank
> Thürauf
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Uros Bizjak via Gcc-patches
On Tue, Feb 16, 2021 at 12:05 PM Jakub Jelinek  wrote:
>
> On Tue, Feb 16, 2021 at 11:47:51AM +0100, Uros Bizjak wrote:
> > > In this case the match_scratch wouldn't work, since CC_REGNUM is fixed.
> > > But as you said on irc, there's peep2_regno_dead_p instead.
> > >
> > > Haven't tried it and so don't know whether it would work with only
> > > one construct in the first […].  But it seems like it would be a better
> > > fit, since peephole2 is designed to have up-to-date register information.
> > >
> > > Uros, does that sound reasonable, or is it a non-starter?
> >
> > We have some splits after peephole2 on x86, e.g. "convert impossible
> > pushes of immediate". There we try to get scratch register with
> > define_peephole2 and if this fails (or when peephole2 pass is not ran
> > at all with -O0), define_split after peephole2 splits the impossible
> > insn to some less optimal sequence.
> >
> > Another usage of late split pass is SSE partial register dependency
> > stall that should run after allocated registers won't change anymore.
> >
> > Please also note that peephole2 pass doesn't run with -O0.
>
> True, but it seems these ix86_avoid_lea_for_add{,r} splitters are only
> an optimization (mostly for the Atom/Bonell like CPUs).
> And actually looking at it, there are 2 separate cases.
>
> One is ix86_avoid_lea_for_add where I see no reason to check that
> flags are dead, because it is used in splitters:
> (define_split
>   [(set (match_operand:SWI48 0 "register_operand")
> (plus:SWI48 (match_operand:SWI48 1 "register_operand")
> (match_operand:SWI48 2 "x86_64_nonmemory_operand")))
>(clobber (reg:CC FLAGS_REG))]
>   "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
> ...
> (define_split
>   [(set (match_operand:DI 0 "register_operand")
> (zero_extend:DI
>   (plus:SI (match_operand:SI 1 "register_operand")
>(match_operand:SI 2 "x86_64_nonmemory_operand"
>(clobber (reg:CC FLAGS_REG))]
>   "TARGET_64BIT
>&& reload_completed && ix86_avoid_lea_for_add (insn, operands)"
> where the splitter will only match if the flags reg is dead at that point.
> So, I think we can just keep those as splitters and just don't call the
> function that makes no sense in that case.

Agreed.

> And then there is the ix86_avoid_lea_for_addr case, where indeed we
> need to check if flags reg is dead at that point (i.e. if we can clobber
> it).  But the define_peephole2 pass seems to have much better infrastructure
> for that, the walking of the rest of the bb can be quadratic.
>
> The splitting in that case when optimizing (nobody should care about code
> performance microoptimizations with -O0) could happen during split2 and
> split3 passes (when people don't disable sched2 or enable sel sched - the
> latter only with a few days old trunk):
>
>   NEXT_PASS (pass_split_after_reload);
>   NEXT_PASS (pass_ree);
>   NEXT_PASS (pass_compare_elim_after_reload);
>   NEXT_PASS (pass_thread_prologue_and_epilogue);
>   NEXT_PASS (pass_rtl_dse2);
>   NEXT_PASS (pass_stack_adjustments);
>   NEXT_PASS (pass_jump2);
>   NEXT_PASS (pass_duplicate_computed_gotos);
>   NEXT_PASS (pass_sched_fusion);
>   NEXT_PASS (pass_peephole2);
>   NEXT_PASS (pass_if_after_reload);
>   NEXT_PASS (pass_regrename);
>   NEXT_PASS (pass_cprop_hardreg);
>   NEXT_PASS (pass_fast_rtl_dce);
>   NEXT_PASS (pass_reorder_blocks);
>   NEXT_PASS (pass_leaf_regs);
>   NEXT_PASS (pass_split_before_sched2);
>   NEXT_PASS (pass_sched2);
>
> I think the passes between split2 and peephole2 won't really benefit
> from the splitting being done earlier, and I doubt the passes between
> peephole2 and split3 will often create new opportunities to split.
> And during/after sched2 we weren't splitting anymore.

OK.

> So, I think doing following would be best:
>
> 2021-02-16  Jakub Jelinek  
>
> PR target/99104
> * config/i386/i386.c (ix86_ok_to_clobber_flags): Remove.
> (ix86_avoid_lea_for_add): Don't call ix86_ok_to_clobber_flags.
> (ix86_avoid_lea_for_addr): Likewise.  Adjust function comment.
> * config/i386/i386.m (*lea): Change from define_insn_and_split
> into define_insn.  Move the splitting to define_peephole2 and
> check there using peep2_regno_dead_p if FLAGS_REG is dead.
>
> * gcc.dg/pr99104.c: New test.

OK. Please see a small comment inline.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.c.jj   2021-02-16 08:58:55.197890195 +0100
> +++ gcc/config/i386/i386.c  2021-02-16 11:45:44.233204555 +0100
> @@ -14968,38 +14968,6 @@ ix86_lea_outperforms (rtx_insn *insn, un
>return dist_define >= dist_use;
>  }
>
> -/* Return true if it is legal to clobber flags by INSN and
> -   false otherwise.  */
> -
> -static bool
> -ix86_ok_to_clobber_flags (rtx_insn *insn)
> -{
> -  

[Patch] Fortran: Reject DT as fmt in I/O statments [PR99111]

2021-02-16 Thread Tobias Burnus

GCC started to accept as legacy extension array-valued
non-characters, which works for integer (+ real/logical).

Some other – mostly unrelated patch regarding resolving
array constructors (r10-5607-gde89b5748d68b76b, PR92996)
turned this into an ICE.

While it might be in theory fixable for derived types,
just rejecting DT + CLASS + c_(fun)ptr makes more sense.

I am not quite sure about the error-message wording.
Better suggestions? In the 'rank > 0' branch, Hollerith
should not occur and (as legacy extension) integer etc.
are permitted. — Thoughts?

Otherwise: OK for GCC 11 and GCC 10?
(ICE is new in GCC 10.)

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
Fortran: Reject DT as fmt in I/O statments [PR99111]

gcc/fortran/ChangeLog:

	PR fortran/99111
	* io.c (resolve_tag_format): Reject BT_DERIVED/CLASS/VOID
	as (array-valued) FORMAT tag.

gcc/testsuite/ChangeLog:

	PR fortran/99111
	* gfortran.dg/fmt_nonchar_1.f90: New test.
	* gfortran.dg/fmt_nonchar_2.f90: New test.

 gcc/fortran/io.c|  7 +
 gcc/testsuite/gfortran.dg/fmt_nonchar_1.f90 | 46 +
 gcc/testsuite/gfortran.dg/fmt_nonchar_2.f90 | 22 ++
 3 files changed, 75 insertions(+)

diff --git a/gcc/fortran/io.c b/gcc/fortran/io.c
index da6ad177ec3..40cd76eb585 100644
--- a/gcc/fortran/io.c
+++ b/gcc/fortran/io.c
@@ -1762,6 +1762,13 @@ resolve_tag_format (gfc_expr *e)
  It may be assigned an Hollerith constant.  */
   if (e->ts.type != BT_CHARACTER)
 {
+  if (e->ts.type == BT_DERIVED || e->ts.type == BT_CLASS
+	  || e->ts.type == BT_VOID)
+	{
+	  gfc_error ("Non-character non-Hollerith in FORMAT tag at %L",
+		 >where);
+	  return false;
+	}
   if (!gfc_notify_std (GFC_STD_LEGACY, "Non-character in FORMAT tag "
 			   "at %L", >where))
 	return false;
diff --git a/gcc/testsuite/gfortran.dg/fmt_nonchar_1.f90 b/gcc/testsuite/gfortran.dg/fmt_nonchar_1.f90
new file mode 100644
index 000..431b61569e2
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/fmt_nonchar_1.f90
@@ -0,0 +1,46 @@
+! { dg-do compile }
+!
+! PR fortran/99111
+!
+program p
+   use iso_c_binding
+   implicit none
+   type t
+  integer :: a(1)
+   end type
+   type(t), parameter :: x(3) = [t(transfer('("he', 1)), &
+ t(transfer('llo ', 1)), &
+ t(transfer('W1")', 1))]
+   type t2
+ procedure(), pointer, nopass :: ppt
+   end type t2
+   type(t2) :: ppcomp(1)
+   interface
+ function fptr()
+   procedure(), pointer :: fptr
+ end function
+   end interface
+   class(t), allocatable :: cl(:)
+   type(c_ptr) :: cptr(1)
+   type(c_funptr) :: cfunptr(1)
+   procedure(), pointer :: proc
+   external proc2
+
+   print x ! { dg-error "Non-character non-Hollerith in FORMAT tag" }
+   print cl ! { dg-error "Non-character non-Hollerith in FORMAT tag" }
+   print cptr ! { dg-error "Non-character non-Hollerith in FORMAT tag" }
+   print cfunptr ! { dg-error "Non-character non-Hollerith in FORMAT tag" }
+
+   print proc ! { dg-error "Syntax error in PRINT statement" }
+   print proc2 ! { dg-error "Syntax error in PRINT statement" }
+   print ppcomp%ppt ! { dg-error "Syntax error in PRINT statement" }
+
+   print fptr() ! { dg-error "must be of type default-kind CHARACTER or of INTEGER" }
+
+   call bar(1)
+contains
+   subroutine bar (xx)
+ type(*) :: xx
+ print xx  ! { dg-error "Assumed-type variable xx at ... may only be used as actual argument" }
+   end
+end
diff --git a/gcc/testsuite/gfortran.dg/fmt_nonchar_2.f90 b/gcc/testsuite/gfortran.dg/fmt_nonchar_2.f90
new file mode 100644
index 000..7c0f524c3c9
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/fmt_nonchar_2.f90
@@ -0,0 +1,22 @@
+! { dg-do run }
+!
+! PR fortran/99111
+!
+program p
+   implicit none
+   type t
+  integer :: a(1)
+   end type
+   type(t), parameter :: x(3) = [t(transfer('("he', 1)), &
+ t(transfer('llo ', 1)), &
+ t(transfer('W1")', 1))]
+
+   integer, parameter :: y(3) = transfer('("hello W2")', 1, size=3)
+   real, parameter :: z(3) = transfer('("hello W3")', 1.0, size=3)
+
+   print y  ! { dg-warning "Legacy Extension: Non-character in FORMAT" }
+   print z  ! { dg-warning "Legacy Extension: Non-character in FORMAT" }
+   print x%a(1) ! { dg-warning "Legacy Extension: Non-character in FORMAT" }
+end
+
+! { dg-output "hello W2(\n|\r\n|\r)hello W3(\n|\r\n|\r)hello W1" }


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 16, 2021 at 11:47:51AM +0100, Uros Bizjak wrote:
> > In this case the match_scratch wouldn't work, since CC_REGNUM is fixed.
> > But as you said on irc, there's peep2_regno_dead_p instead.
> >
> > Haven't tried it and so don't know whether it would work with only
> > one construct in the first […].  But it seems like it would be a better
> > fit, since peephole2 is designed to have up-to-date register information.
> >
> > Uros, does that sound reasonable, or is it a non-starter?
> 
> We have some splits after peephole2 on x86, e.g. "convert impossible
> pushes of immediate". There we try to get scratch register with
> define_peephole2 and if this fails (or when peephole2 pass is not ran
> at all with -O0), define_split after peephole2 splits the impossible
> insn to some less optimal sequence.
> 
> Another usage of late split pass is SSE partial register dependency
> stall that should run after allocated registers won't change anymore.
> 
> Please also note that peephole2 pass doesn't run with -O0.

True, but it seems these ix86_avoid_lea_for_add{,r} splitters are only
an optimization (mostly for the Atom/Bonell like CPUs).
And actually looking at it, there are 2 separate cases.

One is ix86_avoid_lea_for_add where I see no reason to check that
flags are dead, because it is used in splitters:
(define_split
  [(set (match_operand:SWI48 0 "register_operand")
(plus:SWI48 (match_operand:SWI48 1 "register_operand")
(match_operand:SWI48 2 "x86_64_nonmemory_operand")))
   (clobber (reg:CC FLAGS_REG))]
  "reload_completed && ix86_avoid_lea_for_add (insn, operands)"
...
(define_split
  [(set (match_operand:DI 0 "register_operand")
(zero_extend:DI
  (plus:SI (match_operand:SI 1 "register_operand")
   (match_operand:SI 2 "x86_64_nonmemory_operand"
   (clobber (reg:CC FLAGS_REG))]
  "TARGET_64BIT
   && reload_completed && ix86_avoid_lea_for_add (insn, operands)"
where the splitter will only match if the flags reg is dead at that point.
So, I think we can just keep those as splitters and just don't call the
function that makes no sense in that case.

And then there is the ix86_avoid_lea_for_addr case, where indeed we
need to check if flags reg is dead at that point (i.e. if we can clobber
it).  But the define_peephole2 pass seems to have much better infrastructure
for that, the walking of the rest of the bb can be quadratic.

The splitting in that case when optimizing (nobody should care about code
performance microoptimizations with -O0) could happen during split2 and
split3 passes (when people don't disable sched2 or enable sel sched - the
latter only with a few days old trunk):

  NEXT_PASS (pass_split_after_reload);
  NEXT_PASS (pass_ree);
  NEXT_PASS (pass_compare_elim_after_reload);
  NEXT_PASS (pass_thread_prologue_and_epilogue);
  NEXT_PASS (pass_rtl_dse2);
  NEXT_PASS (pass_stack_adjustments);
  NEXT_PASS (pass_jump2);
  NEXT_PASS (pass_duplicate_computed_gotos);
  NEXT_PASS (pass_sched_fusion);
  NEXT_PASS (pass_peephole2);
  NEXT_PASS (pass_if_after_reload);
  NEXT_PASS (pass_regrename);
  NEXT_PASS (pass_cprop_hardreg);
  NEXT_PASS (pass_fast_rtl_dce);
  NEXT_PASS (pass_reorder_blocks);
  NEXT_PASS (pass_leaf_regs);
  NEXT_PASS (pass_split_before_sched2);
  NEXT_PASS (pass_sched2);

I think the passes between split2 and peephole2 won't really benefit
from the splitting being done earlier, and I doubt the passes between
peephole2 and split3 will often create new opportunities to split.
And during/after sched2 we weren't splitting anymore.

So, I think doing following would be best:

2021-02-16  Jakub Jelinek  

PR target/99104
* config/i386/i386.c (ix86_ok_to_clobber_flags): Remove.
(ix86_avoid_lea_for_add): Don't call ix86_ok_to_clobber_flags.
(ix86_avoid_lea_for_addr): Likewise.  Adjust function comment.
* config/i386/i386.m (*lea): Change from define_insn_and_split
into define_insn.  Move the splitting to define_peephole2 and
check there using peep2_regno_dead_p if FLAGS_REG is dead.

* gcc.dg/pr99104.c: New test.

--- gcc/config/i386/i386.c.jj   2021-02-16 08:58:55.197890195 +0100
+++ gcc/config/i386/i386.c  2021-02-16 11:45:44.233204555 +0100
@@ -14968,38 +14968,6 @@ ix86_lea_outperforms (rtx_insn *insn, un
   return dist_define >= dist_use;
 }
 
-/* Return true if it is legal to clobber flags by INSN and
-   false otherwise.  */
-
-static bool
-ix86_ok_to_clobber_flags (rtx_insn *insn)
-{
-  basic_block bb = BLOCK_FOR_INSN (insn);
-  df_ref use;
-  bitmap live;
-
-  while (insn)
-{
-  if (NONDEBUG_INSN_P (insn))
-   {
- FOR_EACH_INSN_USE (use, insn)
-   if (DF_REF_REG_USE_P (use) && DF_REF_REGNO (use) == FLAGS_REG)
- return false;
-
- if (insn_defines_reg (FLAGS_REG, 

Re: Committed: gcc.misc-tests/outputs.exp (outest): Fix typo "is_target"

2021-02-16 Thread Hans-Peter Nilsson via Gcc-patches
> From: Bernd Edlinger 
> Date: Tue, 16 Feb 2021 08:35:04 +0100

> Oops,
> 
> thanks for fixing this problem.
> 
> To my excuse I would like to note,
> that the script error does not happen on x86_64-pc-linux-gnu,
> probably it would only happen when a file is left over.

Sorry but that just sounds to me like bad coverage when
testing this patch.

AFAIU unfortunately errors including tcl syntax errors do
not cause any difference in error return codes for "make
check" (compared to the usual unclean run; with some errors).

> Since usually this is never executed because $outs is empty:
> ...
> Can you say which target was this, where you found this bug?

A simulator target: cris-elf:
'make check RUNTESTFLAGS=--target_board=cris-sim'

> Which file was in $outs?

I added:

--- outputs.exp~2021-01-11 17:44:41.0 +0100
+++ outputs.exp 2021-02-16 11:41:32.130512321 +0100
@@ -187,6 +187,7 @@ proc outest { test sources opts dirs out
 }
 
 set outb {}
+send_log "for test <$test>, outs is <$outs>\n"
 foreach f $outs {
file delete $f
# collect2 may create .cdtor* files in -save-temps link tests,

and ran "make check-gcc-c 'RUNTESTFLAGS=--target_board=cris-sim outputs.exp'"
which resulted in this in gcc.log:
...
PASS: outputs exe savetmp namedb: outputs-outputs-0.i
PASS: outputs exe savetmp namedb: outputs-outputs-0.s
PASS: outputs exe savetmp namedb: outputs-outputs-0.o
PASS: outputs exe savetmp namedb: outputs.args.0
PASS: outputs exe savetmp namedb: outputs.ld1_args
PASS: outputs exe savetmp namedb: outputs.exe
for test , outs is 
ERROR: (DejaGnu) proc "is_target hppa*-*-hpux*" does not exist.
...

> I am also a bit surprised that a script error in one test aborts
> the whole gcc.target tests.  How can that be?

TBH, I'm not really surprised by dejagnu exiting due to an
undefined function/proc, but maybe for penance you can look
into this and wrap key points (like each .exp file) into
try-clauses? ;-]

brgds, H-P



Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Uros Bizjak via Gcc-patches
On Tue, Feb 16, 2021 at 11:28 AM Richard Sandiford
 wrote:
>
> Jakub Jelinek via Gcc-patches  writes:
> > On Tue, Feb 16, 2021 at 09:55:40AM +, Richard Sandiford wrote:
> >> I assume that's because pass_df_initialize_no_opt is slightly after
> >> the first pass_split_all_insns?  Seems like it should just be a case
> >> of moving it up.
> >>
> >> > And for noflow where the cfg doesn't really exit I wouldn't even attempt 
> >> > to
> >> > do df_analyze.
> >>
> >> What stops ix86_ok_to_clobber_flags being called in that case?
> >
> > split5 is never enabled on i386.
> >   virtual bool gate (function *)
> > {
> >   /* The placement of the splitting that we do for shorten_branches
> >  depends on whether regstack is used by the target or not.  */
> > #if HAVE_ATTR_length && !defined (STACK_REGS)
> >   return true;
> > #else
> >   return false;
> > #endif
> > }
>
> OK.  I guess not running df_analyze in that pass is “safe” because any
> attempt to use DF should hopefully blow up spectacularly.
>
> But following up from what we discussed on IRC, could the splits in
> question use define_peephole2 instead?  There's an example in the docs
> of using it for one instruction (plus one scratch register):
>
> @smallexample
> (define_peephole2
>   [(match_scratch:SI 2 "r")
>(parallel [(set (match_operand:SI 0 "register_operand" "")
>(match_operator:SI 3 "arith_or_logical_operator"
>  [(match_dup 0)
>   (match_operand:SI 1 "memory_operand" "")]))
>   (clobber (reg:CC 17))])]
>   "! optimize_size && ! TARGET_READ_MODIFY"
>   [(set (match_dup 2) (match_dup 1))
>(parallel [(set (match_dup 0)
>(match_op_dup 3 [(match_dup 0) (match_dup 2)]))
>   (clobber (reg:CC 17))])]
>   "")
> @end smallexample
>
> In this case the match_scratch wouldn't work, since CC_REGNUM is fixed.
> But as you said on irc, there's peep2_regno_dead_p instead.
>
> Haven't tried it and so don't know whether it would work with only
> one construct in the first […].  But it seems like it would be a better
> fit, since peephole2 is designed to have up-to-date register information.
>
> Uros, does that sound reasonable, or is it a non-starter?

We have some splits after peephole2 on x86, e.g. "convert impossible
pushes of immediate". There we try to get scratch register with
define_peephole2 and if this fails (or when peephole2 pass is not ran
at all with -O0), define_split after peephole2 splits the impossible
insn to some less optimal sequence.

Another usage of late split pass is SSE partial register dependency
stall that should run after allocated registers won't change anymore.

Please also note that peephole2 pass doesn't run with -O0.

Uros.


Re: [RFC] test builtin ratio for loop distribution

2021-02-16 Thread Alexandre Oliva
On Feb 16, 2021, Alexandre Oliva  wrote:

>> So I wonder whether we should instead re-run CCP after loop opts which
>> computes nonzero bits as well instead of the above "hack".

That works.  It takes care of both the dest alignment and the len ctz.

Explicitly masking out the len tz from nonzero bits also works, but I
suppose the ccp pass does a better job, and it takes care of memcpy and
memmove as well.


I noticed that ccp releases CDI_DOMINATORS information.  It looks like
the last copy_prop pass, now replaced with ccp in my tree, was late
enough that it doesn't seem to matter, but at first I thought you meant
an earlier copy_prop, that runs before graphite, and dropping dominator
info at that point was a problem for e.g. cunroll.

-- 
Alexandre Oliva, happy hacker  https://FSFLA.org/blogs/lxo/
   Free Software Activist GNU Toolchain Engineer
Vim, Vi, Voltei pro Emacs -- GNUlius Caesar


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches  writes:
> On Tue, Feb 16, 2021 at 09:55:40AM +, Richard Sandiford wrote:
>> I assume that's because pass_df_initialize_no_opt is slightly after
>> the first pass_split_all_insns?  Seems like it should just be a case
>> of moving it up.
>> 
>> > And for noflow where the cfg doesn't really exit I wouldn't even attempt to
>> > do df_analyze.
>> 
>> What stops ix86_ok_to_clobber_flags being called in that case?
>
> split5 is never enabled on i386.
>   virtual bool gate (function *)
> {
>   /* The placement of the splitting that we do for shorten_branches
>  depends on whether regstack is used by the target or not.  */
> #if HAVE_ATTR_length && !defined (STACK_REGS)
>   return true;
> #else
>   return false;
> #endif
> }

OK.  I guess not running df_analyze in that pass is “safe” because any
attempt to use DF should hopefully blow up spectacularly.

But following up from what we discussed on IRC, could the splits in
question use define_peephole2 instead?  There's an example in the docs
of using it for one instruction (plus one scratch register):

@smallexample
(define_peephole2
  [(match_scratch:SI 2 "r")
   (parallel [(set (match_operand:SI 0 "register_operand" "")
   (match_operator:SI 3 "arith_or_logical_operator"
 [(match_dup 0)
  (match_operand:SI 1 "memory_operand" "")]))
  (clobber (reg:CC 17))])]
  "! optimize_size && ! TARGET_READ_MODIFY"
  [(set (match_dup 2) (match_dup 1))
   (parallel [(set (match_dup 0)
   (match_op_dup 3 [(match_dup 0) (match_dup 2)]))
  (clobber (reg:CC 17))])]
  "")
@end smallexample

In this case the match_scratch wouldn't work, since CC_REGNUM is fixed.
But as you said on irc, there's peep2_regno_dead_p instead.

Haven't tried it and so don't know whether it would work with only
one construct in the first […].  But it seems like it would be a better
fit, since peephole2 is designed to have up-to-date register information.

Uros, does that sound reasonable, or is it a non-starter?

Thanks,
Richard


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 16, 2021 at 09:55:40AM +, Richard Sandiford wrote:
> I assume that's because pass_df_initialize_no_opt is slightly after
> the first pass_split_all_insns?  Seems like it should just be a case
> of moving it up.
> 
> > And for noflow where the cfg doesn't really exit I wouldn't even attempt to
> > do df_analyze.
> 
> What stops ix86_ok_to_clobber_flags being called in that case?

split5 is never enabled on i386.
  virtual bool gate (function *)
{
  /* The placement of the splitting that we do for shorten_branches
 depends on whether regstack is used by the target or not.  */
#if HAVE_ATTR_length && !defined (STACK_REGS)
  return true;
#else
  return false;
#endif
}

Jakub



Re: [PATCH 2/2] openacc: Strided array sections and components of derived-type arrays

2021-02-16 Thread Tobias Burnus

On 12.02.21 16:46, Julian Brown wrote:

This patch disallows selecting components of array sections in update
directives for OpenACC, as specified in OpenACC 3.0, "2.14.4. Update
Directive", "Restrictions":

   "In Fortran, members of variables of derived type may appear, including
a subarray of a member. Members of subarrays of derived type may
not appear."

The diagnostic for attempting to use the same construct on other
directives has also been improved.

OK for mainline?


LGTM.

Tobias


gcc/fortran/
  * openmp.c (resolve_omp_clauses): Disallow selecting components of
  arrays of derived type.

gcc/testsuite/
  * gfortran.dg/goacc/array-with-dt-2.f90: Remove expected errors.
  * gfortran.dg/goacc/array-with-dt-6.f90: New test.
  * gfortran.dg/goacc/mapping-tests-2.f90: Update expected error.

libgomp/
  * testsuite/libgomp.oacc-fortran/array-stride-dt-1.f90: Remove expected
  errors.
---
  gcc/fortran/openmp.c  | 55 +++
  .../gfortran.dg/goacc/array-with-dt-2.f90 |  5 +-
  .../gfortran.dg/goacc/array-with-dt-6.f90 | 10 
  .../gfortran.dg/goacc/mapping-tests-2.f90 |  4 +-
  .../array-stride-dt-1.f90 |  5 +-
  5 files changed, 48 insertions(+), 31 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-6.f90

diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index aab17f0589f..9bcb1bf62ca 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -5174,17 +5174,29 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
   "are allowed on ORDERED directive at %L",
   >where);
}
- gfc_ref *array_ref = NULL;
+ gfc_ref *lastref = NULL, *lastslice = NULL;
  bool resolved = false;
  if (n->expr)
{
- array_ref = n->expr->ref;
+ lastref = n->expr->ref;
  resolved = gfc_resolve_expr (n->expr);

  /* Look through component refs to find last array
 reference.  */
  if (resolved)
{
+ for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next)
+   if (ref->type == REF_COMPONENT)
+ lastref = ref;
+   else if (ref->type == REF_ARRAY)
+ {
+   for (int i = 0; i < ref->u.ar.dimen; i++)
+ if (ref->u.ar.dimen_type[i] == DIMEN_RANGE)
+   lastslice = ref;
+
+   lastref = ref;
+ }
+
  /* The "!$acc cache" directive allows rectangular
 subarrays to be specified, with some restrictions
 on the form of bounds (not implemented).
@@ -5192,45 +5204,42 @@ resolve_omp_clauses (gfc_code *code, gfc_omp_clauses 
*omp_clauses,
 array isn't contiguous.  An expression such as
 arr(-n:n,-n:n) could be contiguous even if it looks
 like it may not be.  */
- if (list != OMP_LIST_CACHE
+ if (code->op != EXEC_OACC_UPDATE
+ && list != OMP_LIST_CACHE
  && list != OMP_LIST_DEPEND
  && !gfc_is_simply_contiguous (n->expr, false, true)
- && gfc_is_not_contiguous (n->expr))
+ && gfc_is_not_contiguous (n->expr)
+ && !(lastslice
+  && (lastslice->next
+  || lastslice->type != REF_ARRAY)))
gfc_error ("Array is not contiguous at %L",
   >where);
-
- while (array_ref
-&& (array_ref->type == REF_COMPONENT
-|| (array_ref->type == REF_ARRAY
-&& array_ref->next
-&& (array_ref->next->type
-== REF_COMPONENT
-   array_ref = array_ref->next;
}
}
- if (array_ref
+ if (lastref
  || (n->expr
  && (!resolved || n->expr->expr_type != EXPR_VARIABLE)))
{
- if (array_ref
- && (array_ref->type == REF_SUBSTRING
- || (array_ref->next
- && array_ref->next->type == REF_SUBSTRING)))
+ if (lastref
+ && (lastref->type == REF_SUBSTRING
+ || (lastref->next
+ && lastref->next->type == REF_SUBSTRING)))
 

Re: [PATCH 1/2] openacc: Fix lowering for derived-type mappings through array elements

2021-02-16 Thread Tobias Burnus

On 12.02.21 16:46, Julian Brown wrote:


This patch fixes lowering of derived-type mappings which select elements
of arrays of derived types, and similar. These would previously lead
to ICEs.

With this change, OpenACC directives can pass through constructs that
are no longer recognized by the gimplifier, hence alterations are needed
there also.

OK for mainline?


LGTM.

Thanks,

Tobias


gcc/fortran/
  * trans-openmp.c (gfc_trans_omp_clauses): Handle element selection
  for arrays of derived types.

gcc/
  * gimplify.c (gimplify_scan_omp_clauses): Handle ATTACH_DETACH
  for non-decls.

gcc/testsuite/
  * gfortran.dg/goacc/array-with-dt-1.f90: New test.
  * gfortran.dg/goacc/array-with-dt-3.f90: Likewise.
  * gfortran.dg/goacc/array-with-dt-4.f90: Likewise.
  * gfortran.dg/goacc/array-with-dt-5.f90: Likewise.
  * gfortran.dg/goacc/derived-chartypes-1.f90: Re-enable test.
  * gfortran.dg/goacc/derived-chartypes-2.f90: Likewise.
  * gfortran.dg/goacc/derived-classtypes-1.f95: Uncomment
  previously-broken directives.

libgomp/
  * testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90: New test.
  * testsuite/libgomp.oacc-fortran/update-dt-array.f90: Likewise.
---
  gcc/fortran/trans-openmp.c| 192 ++
  gcc/gimplify.c|  12 ++
  .../gfortran.dg/goacc/array-with-dt-1.f90 |  11 +
  .../gfortran.dg/goacc/array-with-dt-3.f90 |  14 ++
  .../gfortran.dg/goacc/array-with-dt-4.f90 |  18 ++
  .../gfortran.dg/goacc/array-with-dt-5.f90 |  12 ++
  .../gfortran.dg/goacc/derived-chartypes-1.f90 |   3 -
  .../gfortran.dg/goacc/derived-chartypes-2.f90 |   3 -
  .../goacc/derived-classtypes-1.f95|   8 +-
  .../derivedtypes-arrays-1.f90 | 109 ++
  .../libgomp.oacc-fortran/update-dt-array.f90  |  53 +
  11 files changed, 344 insertions(+), 91 deletions(-)
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-1.f90
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-3.f90
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-4.f90
  create mode 100644 gcc/testsuite/gfortran.dg/goacc/array-with-dt-5.f90
  create mode 100644 
libgomp/testsuite/libgomp.oacc-fortran/derivedtypes-arrays-1.f90
  create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/update-dt-array.f90

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 249b3de2cfd..67e370f8b57 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -2675,6 +2675,32 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
tree decl = gfc_trans_omp_variable (n->sym, false);
if (DECL_P (decl))
  TREE_ADDRESSABLE (decl) = 1;
+
+   gfc_ref *lastref = NULL;
+
+   if (n->expr)
+ for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next)
+   if (ref->type == REF_COMPONENT || ref->type == REF_ARRAY)
+ lastref = ref;
+
+   bool allocatable = false, pointer = false;
+
+   if (lastref && lastref->type == REF_COMPONENT)
+ {
+   gfc_component *c = lastref->u.c.component;
+
+   if (c->ts.type == BT_CLASS)
+ {
+   pointer = CLASS_DATA (c)->attr.class_pointer;
+   allocatable = CLASS_DATA (c)->attr.allocatable;
+ }
+   else
+ {
+   pointer = c->attr.pointer;
+   allocatable = c->attr.allocatable;
+ }
+ }
+
if (n->expr == NULL
|| (n->expr->ref->type == REF_ARRAY
&& n->expr->ref->u.ar.type == AR_FULL))
@@ -2911,74 +2937,79 @@ gfc_trans_omp_clauses (stmtblock_t *block, 
gfc_omp_clauses *clauses,
  }
else if (n->expr
 && n->expr->expr_type == EXPR_VARIABLE
-&& n->expr->ref->type == REF_COMPONENT)
+&& n->expr->ref->type == REF_ARRAY
+&& !n->expr->ref->next)
  {
-   gfc_ref *lastcomp;
-
-   for (gfc_ref *ref = n->expr->ref; ref; ref = ref->next)
- if (ref->type == REF_COMPONENT)
-   lastcomp = ref;
-
-   symbol_attribute sym_attr;
-
-   if (lastcomp->u.c.component->ts.type == BT_CLASS)
- sym_attr = CLASS_DATA (lastcomp->u.c.component)->attr;
-   else
- sym_attr = lastcomp->u.c.component->attr;
-
+   /* An array element or array section which is not part of a
+  derived type, etc.  */
+   bool element = n->expr->ref->u.ar.type == AR_ELEMENT;
+   gfc_trans_omp_array_section (block, n, decl, element,
+GOMP_MAP_POINTER, node, node2,
+   

Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Tue, Feb 16, 2021 at 09:16:40AM +, Richard Sandiford wrote:
>> But doing it on demand like this seems fragile.  And the targets aren't
>> a fixed… target.  I think we need to design the interface so that things
>> are unlikely to go wrong in future rather than design it on the basis
>> that things will stay the way they are now.
>
> By making it possible but not required to use df in split passes, at least
> if we document it, I don't see why it should be fragile.
>
> In order to benchmark it, I've tried to do release checking build without
> and with the following patch.  Except with the patch bootstrap fails
> miserably, df_analyze in split1 pass doesn't work at -O0 at all.

I assume that's because pass_df_initialize_no_opt is slightly after
the first pass_split_all_insns?  Seems like it should just be a case
of moving it up.

> And for noflow where the cfg doesn't really exit I wouldn't even attempt to
> do df_analyze.

What stops ix86_ok_to_clobber_flags being called in that case?

Thanks,
Richard

> Plus many backends just call split_all_insns on their own
> at various places...
>
> diff --git a/gcc/recog.c b/gcc/recog.c
> index abbc49f3f9b..e960d4ba706 100644
> --- a/gcc/recog.c
> +++ b/gcc/recog.c
> @@ -4380,8 +4380,9 @@ public:
>opt_pass * clone () { return new pass_split_all_insns (m_ctxt); }
>virtual unsigned int execute (function *)
>  {
> +  df_analyze ();
>split_all_insns ();
> -  return 0;
> +  return TODO_df_finish;
>  }
>  
>  }; // class pass_split_all_insns
> @@ -4425,8 +4426,9 @@ public:
>  
>virtual unsigned int execute (function *)
>  {
> +  df_analyze ();
>split_all_insns ();
> -  return 0;
> +  return TODO_df_finish;
>  }
>  
>  }; // class pass_split_after_reload
> @@ -4479,8 +4481,9 @@ public:
>  
>virtual unsigned int execute (function *)
>  {
> +  df_analyze ();
>split_all_insns ();
> -  return 0;
> +  return TODO_df_finish;
>  }
>  
>  }; // class pass_split_before_sched2
> @@ -4519,8 +4522,9 @@ public:
>virtual bool gate (function *);
>virtual unsigned int execute (function *)
>  {
> +  df_analyze ();
>split_all_insns ();
> -  return 0;
> +  return TODO_df_finish;
>  }
>  
>  }; // class pass_split_before_regstack
>
>
>   Jakub


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 16, 2021 at 09:16:40AM +, Richard Sandiford wrote:
> But doing it on demand like this seems fragile.  And the targets aren't
> a fixed… target.  I think we need to design the interface so that things
> are unlikely to go wrong in future rather than design it on the basis
> that things will stay the way they are now.

By making it possible but not required to use df in split passes, at least
if we document it, I don't see why it should be fragile.

In order to benchmark it, I've tried to do release checking build without
and with the following patch.  Except with the patch bootstrap fails
miserably, df_analyze in split1 pass doesn't work at -O0 at all.
And for noflow where the cfg doesn't really exit I wouldn't even attempt to
do df_analyze.  Plus many backends just call split_all_insns on their own
at various places...

diff --git a/gcc/recog.c b/gcc/recog.c
index abbc49f3f9b..e960d4ba706 100644
--- a/gcc/recog.c
+++ b/gcc/recog.c
@@ -4380,8 +4380,9 @@ public:
   opt_pass * clone () { return new pass_split_all_insns (m_ctxt); }
   virtual unsigned int execute (function *)
 {
+  df_analyze ();
   split_all_insns ();
-  return 0;
+  return TODO_df_finish;
 }
 
 }; // class pass_split_all_insns
@@ -4425,8 +4426,9 @@ public:
 
   virtual unsigned int execute (function *)
 {
+  df_analyze ();
   split_all_insns ();
-  return 0;
+  return TODO_df_finish;
 }
 
 }; // class pass_split_after_reload
@@ -4479,8 +4481,9 @@ public:
 
   virtual unsigned int execute (function *)
 {
+  df_analyze ();
   split_all_insns ();
-  return 0;
+  return TODO_df_finish;
 }
 
 }; // class pass_split_before_sched2
@@ -4519,8 +4522,9 @@ public:
   virtual bool gate (function *);
   virtual unsigned int execute (function *)
 {
+  df_analyze ();
   split_all_insns ();
-  return 0;
+  return TODO_df_finish;
 }
 
 }; // class pass_split_before_regstack


Jakub



Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Tue, Feb 16, 2021 at 09:42:22AM +0100, Richard Biener wrote:
>> Just to get an idea whether it's worth doing the extra df_analyze.
>> Since we have possibly 5 split passes it's a lot of churn for things
>> like that WRF ltrans unit that already spends 40% of its time in DF ...
>
> Yeah, df_analyze can be fairly expensive and most of the targets don't
> really need it at all.
>
> If I grep for df_get.*_out, I find:
> config/aarch64/aarch64.c:  bitmap live1 = df_get_live_out 
> (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> config/arc/arc.c:   && REGNO_REG_SET_P (df_get_live_out 
> (loop->incoming_src),
> config/arm/arm.c:  bitmap prologue_live_out = df_get_live_out 
> (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> config/arm/arm.c:  return REGNO_REG_SET_P (df_get_live_out 
> (ENTRY_BLOCK_PTR_FOR_FN (cfun)), 3);
> config/arm/arm.c: = REGNO_REG_SET_P (df_get_live_out 
> (ENTRY_BLOCK_PTR_FOR_FN (cfun)),
> config/arm/arm.c: = REGNO_REG_SET_P (df_get_live_out 
> (ENTRY_BLOCK_PTR_FOR_FN (cfun)),
> config/bfin/bfin.c:   && !REGNO_REG_SET_P (df_get_live_out (bb_in), i))
> config/i386/i386.c:  return REGNO_REG_SET_P (df_get_live_out 
> (ENTRY_BLOCK_PTR_FOR_FN (cfun)), 0);
> config/i386/i386.c:  live = df_get_live_out(bb);
> config/i386/i386-features.c:  bitmap_copy (live_regs, df_get_live_out 
> (bb));
> where aarch64, arc and bfin are ok, i386 has this known issue and
> arm uses most of the calls during pro/epilogue expansion (fine), but seems
> to use it also (indirectly) in USE_RETURN_INSN which is used not in
> splitters, but in insn conditions (so matched at any time).  But it seems to
> use a cache so that once computed it remembers it, so probably it is ok too.
>
> So, adding unconditional df_analyze / TODO_df_finish would slow down all the
> targets, but only help a single one and even on that one it is better to do
> it only if it will be really needed (e.g. it is never needed during split1
> (!reload_completed doesn't call those at all) and even in split2+ one really
> needs to trigger the right patterns in the IL (it can trigger quite
> frequently with the atom/bonell tunings, but otherwise only very rarely).

But doing it on demand like this seems fragile.  And the targets aren't
a fixed… target.  I think we need to design the interface so that things
are unlikely to go wrong in future rather than design it on the basis
that things will stay the way they are now.

This kind of thing isn't needed for other uses of DF and I don't think
we should expect anyone who adds a new use of DF in splitters to
remember that this is needed.

The fact that the postorder is recomputed on each df_analyze is IMO
a separate issue and could/should be fixed separately if it's a
significant overhead.

(There are other potential savings like that too.  E.g. at the moment
we don't try to keep dominance information up-to-date for RTL passes,
so every pass that computes it has to free it too.)

Thanks,
Richard


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 16, 2021 at 08:56:13AM +, Richard Sandiford wrote:
> Yeah, but the updates are incremental.  I think in many cases we'll

df_analyze e.g. computes postorder, so it has some costs even if everything
is really up to date.

Jakub



Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
On Tue, Feb 16, 2021 at 09:42:22AM +0100, Richard Biener wrote:
> Just to get an idea whether it's worth doing the extra df_analyze.
> Since we have possibly 5 split passes it's a lot of churn for things
> like that WRF ltrans unit that already spends 40% of its time in DF ...

Yeah, df_analyze can be fairly expensive and most of the targets don't
really need it at all.

If I grep for df_get.*_out, I find:
config/aarch64/aarch64.c:  bitmap live1 = df_get_live_out 
(ENTRY_BLOCK_PTR_FOR_FN (cfun));
config/arc/arc.c: && REGNO_REG_SET_P (df_get_live_out 
(loop->incoming_src),
config/arm/arm.c:  bitmap prologue_live_out = df_get_live_out 
(ENTRY_BLOCK_PTR_FOR_FN (cfun));
config/arm/arm.c:  return REGNO_REG_SET_P (df_get_live_out 
(ENTRY_BLOCK_PTR_FOR_FN (cfun)), 3);
config/arm/arm.c:   = REGNO_REG_SET_P (df_get_live_out 
(ENTRY_BLOCK_PTR_FOR_FN (cfun)),
config/arm/arm.c:   = REGNO_REG_SET_P (df_get_live_out 
(ENTRY_BLOCK_PTR_FOR_FN (cfun)),
config/bfin/bfin.c: && !REGNO_REG_SET_P (df_get_live_out (bb_in), i))
config/i386/i386.c:  return REGNO_REG_SET_P (df_get_live_out 
(ENTRY_BLOCK_PTR_FOR_FN (cfun)), 0);
config/i386/i386.c:  live = df_get_live_out(bb);
config/i386/i386-features.c:  bitmap_copy (live_regs, df_get_live_out (bb));
where aarch64, arc and bfin are ok, i386 has this known issue and
arm uses most of the calls during pro/epilogue expansion (fine), but seems
to use it also (indirectly) in USE_RETURN_INSN which is used not in
splitters, but in insn conditions (so matched at any time).  But it seems to
use a cache so that once computed it remembers it, so probably it is ok too.

So, adding unconditional df_analyze / TODO_df_finish would slow down all the
targets, but only help a single one and even on that one it is better to do
it only if it will be really needed (e.g. it is never needed during split1
(!reload_completed doesn't call those at all) and even in split2+ one really
needs to trigger the right patterns in the IL (it can trigger quite
frequently with the atom/bonell tunings, but otherwise only very rarely).

Jakub



Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, 16 Feb 2021, Richard Sandiford wrote:
>
>> Jakub Jelinek via Gcc-patches  writes:
>> > Hi!
>> >
>> > The following testcase started ICEing with my recent changes to enable
>> > split4 after sel-sched, but it seems the bug is more general.
>> > Some of the i386 splitter condition functions use and rely on df, but
>> > the split passes don't really df_analyze/df_finish_pass, so the DF info
>> > may be stale or not computed at all - the particular ICE is because
>> > there is a new bb and df_get_live_out (bb) returns NULL on it as the
>> > live or lr problem has not been computed yet.
>> >
>> > The following patch makes it possible to call df_analyze during split
>> > on demand (on the first occassion where it is needed) and changes the
>> > i386 backend to do so.  I needed a target hook that signifies the end
>> > of the split pass so that the backend can reset its flag and return
>> > the todo flags (TODO_df_finish in this case) to the caller if needed.
>> 
>> This seems like it could be a recurring source of bugs.  Not everything
>> is necessarily going to be aware that it's using DF information.
>> And even if the code is aware, it'll be easy to forget that this
>> extra step is necessary.
>> 
>> The bug only showed up here because the failure was noisy.  In other
>> situations, using out-of-date DF information could compile but lead to
>> wrong code.
>> 
>> If we want it to be valid for define_splits to use DF information,
>> I think we should just make the split passes use DF unconditionally.
>
> On IRC I was suggesting to assert that the problems are clean in
> df_get_live_out, via checking df_{lr,live}->solutions_dirty.  That would
> trip on them.  I guess no backend can rely on anything else than
> df-scan which should be up-to-date or df_live/lr, and without extra
> dances it would use df_get_live_out to choose from the two.
>
> Just to get an idea whether it's worth doing the extra df_analyze.
> Since we have possibly 5 split passes it's a lot of churn for things
> like that WRF ltrans unit that already spends 40% of its time in DF ...

Yeah, but the updates are incremental.  I think in many cases we'll
be doing the same work, just slightly earlier than before.  E.g. in the
example above, something is going to recalculate df_get_live_out
eventually.  And splits don't normally change liveness at block
boundaries (unless they're buggy).

So I think the question is: can we see a noticeable increase in compile
time if we do make the split passes use DF?  This should be measured on
a release build since I don't think the extra checking overhead should
count.

Thanks,
Richard


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Biener
On Tue, 16 Feb 2021, Richard Sandiford wrote:

> Jakub Jelinek via Gcc-patches  writes:
> > Hi!
> >
> > The following testcase started ICEing with my recent changes to enable
> > split4 after sel-sched, but it seems the bug is more general.
> > Some of the i386 splitter condition functions use and rely on df, but
> > the split passes don't really df_analyze/df_finish_pass, so the DF info
> > may be stale or not computed at all - the particular ICE is because
> > there is a new bb and df_get_live_out (bb) returns NULL on it as the
> > live or lr problem has not been computed yet.
> >
> > The following patch makes it possible to call df_analyze during split
> > on demand (on the first occassion where it is needed) and changes the
> > i386 backend to do so.  I needed a target hook that signifies the end
> > of the split pass so that the backend can reset its flag and return
> > the todo flags (TODO_df_finish in this case) to the caller if needed.
> 
> This seems like it could be a recurring source of bugs.  Not everything
> is necessarily going to be aware that it's using DF information.
> And even if the code is aware, it'll be easy to forget that this
> extra step is necessary.
> 
> The bug only showed up here because the failure was noisy.  In other
> situations, using out-of-date DF information could compile but lead to
> wrong code.
> 
> If we want it to be valid for define_splits to use DF information,
> I think we should just make the split passes use DF unconditionally.

On IRC I was suggesting to assert that the problems are clean in
df_get_live_out, via checking df_{lr,live}->solutions_dirty.  That would
trip on them.  I guess no backend can rely on anything else than
df-scan which should be up-to-date or df_live/lr, and without extra
dances it would use df_get_live_out to choose from the two.

Just to get an idea whether it's worth doing the extra df_analyze.
Since we have possibly 5 split passes it's a lot of churn for things
like that WRF ltrans unit that already spends 40% of its time in DF ...

Richard.

> Thanks,
> Richard
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek via Gcc-patches  writes:
> Hi!
>
> The following testcase started ICEing with my recent changes to enable
> split4 after sel-sched, but it seems the bug is more general.
> Some of the i386 splitter condition functions use and rely on df, but
> the split passes don't really df_analyze/df_finish_pass, so the DF info
> may be stale or not computed at all - the particular ICE is because
> there is a new bb and df_get_live_out (bb) returns NULL on it as the
> live or lr problem has not been computed yet.
>
> The following patch makes it possible to call df_analyze during split
> on demand (on the first occassion where it is needed) and changes the
> i386 backend to do so.  I needed a target hook that signifies the end
> of the split pass so that the backend can reset its flag and return
> the todo flags (TODO_df_finish in this case) to the caller if needed.

This seems like it could be a recurring source of bugs.  Not everything
is necessarily going to be aware that it's using DF information.
And even if the code is aware, it'll be easy to forget that this
extra step is necessary.

The bug only showed up here because the failure was noisy.  In other
situations, using out-of-date DF information could compile but lead to
wrong code.

If we want it to be valid for define_splits to use DF information,
I think we should just make the split passes use DF unconditionally.

Thanks,
Richard


[PATCH] cfgrtl: Fix up fixup_partitions caused ICE [PR99085]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
Hi!

fixup_partitions sometimes changes some basic blocks from hot partition to
cold partition, in particular if after unreachable block removal or other
optimizations a hot partition block is dominated by cold partition block(s).
It fixes up the edges and jumps on those edges, but when after reorder
blocks and in rtl (non-cfglayout) mode that is clearly not enough, because
it keeps the block order the same and so we can end up with more than
1 hot/cold section transition in the same function.

So, this patch fixes that up too.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-02-15  Jakub Jelinek  

PR target/99085
* cfgrtl.c (fixup_partitions): When changing some bbs from hot to cold
partitions, if in non-layout mode after reorder_blocks also move
affected blocks to ensure a single partition transition.

* gcc.dg/graphite/pr99085.c: New test.

--- gcc/cfgrtl.c.jj 2021-02-10 19:27:33.0 +0100
+++ gcc/cfgrtl.c2021-02-15 16:47:41.625798717 +0100
@@ -2414,8 +2414,6 @@ find_partition_fixes (bool flag_only)
 void
 fixup_partitions (void)
 {
-  basic_block bb;
-
   if (!crtl->has_bb_partition)
 return;
 
@@ -2436,10 +2434,61 @@ fixup_partitions (void)
   /* Do the partition fixup after all necessary blocks have been converted to
  cold, so that we only update the region crossings the minimum number of
  places, which can require forcing edges to be non fallthru.  */
-  while (! bbs_to_fix.is_empty ())
+  if (! bbs_to_fix.is_empty ())
 {
-  bb = bbs_to_fix.pop ();
-  fixup_new_cold_bb (bb);
+  do
+   {
+ basic_block bb = bbs_to_fix.pop ();
+ fixup_new_cold_bb (bb);
+   }
+  while (! bbs_to_fix.is_empty ());
+
+  /* Fix up hot cold block grouping if needed.  */
+  if (crtl->bb_reorder_complete && current_ir_type () == IR_RTL_CFGRTL)
+   {
+ basic_block bb, first = NULL, second = NULL;
+ int current_partition = BB_UNPARTITIONED;
+
+ FOR_EACH_BB_FN (bb, cfun)
+   {
+ if (current_partition != BB_UNPARTITIONED
+ && BB_PARTITION (bb) != current_partition)
+   {
+ if (first == NULL)
+   first = bb;
+ else if (second == NULL)
+   second = bb;
+ else
+   {
+ /* If we switch partitions for the 3rd, 5th etc. time,
+move bbs first (inclusive) .. second (exclusive) right
+before bb.  */
+ basic_block prev_first = first->prev_bb;
+ basic_block prev_second = second->prev_bb;
+ basic_block prev_bb = bb->prev_bb;
+ prev_first->next_bb = second;
+ second->prev_bb = prev_first;
+ prev_second->next_bb = bb;
+ bb->prev_bb = prev_second;
+ prev_bb->next_bb = first;
+ first->prev_bb = prev_bb;
+ rtx_insn *prev_first_insn = PREV_INSN (BB_HEAD (first));
+ rtx_insn *prev_second_insn
+   = PREV_INSN (BB_HEAD (second));
+ rtx_insn *prev_bb_insn = PREV_INSN (BB_HEAD (bb));
+ SET_NEXT_INSN (prev_first_insn) = BB_HEAD (second);
+ SET_PREV_INSN (BB_HEAD (second)) = prev_first_insn;
+ SET_NEXT_INSN (prev_second_insn) = BB_HEAD (bb);
+ SET_PREV_INSN (BB_HEAD (bb)) = prev_second_insn;
+ SET_NEXT_INSN (prev_bb_insn) = BB_HEAD (first);
+ SET_PREV_INSN (BB_HEAD (first)) = prev_bb_insn;
+ second = NULL;
+   }
+   }
+ current_partition = BB_PARTITION (bb);
+   }
+ gcc_assert (!second);
+   }
 }
 }
 
--- gcc/testsuite/gcc.dg/graphite/pr99085.c.jj  2021-02-15 16:52:59.313169888 
+0100
+++ gcc/testsuite/gcc.dg/graphite/pr99085.c 2021-02-15 16:51:54.589910609 
+0100
@@ -0,0 +1,20 @@
+/* PR target/99085 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fgraphite-identity -fsel-sched-pipelining 
-fselective-scheduling2" } */
+
+void
+foo (int m, int n, int o, int i)
+{
+  long double a2[m];
+  int c2[n][o];
+  int j, k;
+
+  while (i < m)
+a2[i++] = 13.132L;
+
+  for (j = 0; j < n; j++)
+for (k = 0; k < o; k++)
+  c2[j][k] = 1;
+
+  __builtin_abort ();
+}

Jakub



[PATCH] split, i386: Fix up df uses in i386 splitters [PR99104]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase started ICEing with my recent changes to enable
split4 after sel-sched, but it seems the bug is more general.
Some of the i386 splitter condition functions use and rely on df, but
the split passes don't really df_analyze/df_finish_pass, so the DF info
may be stale or not computed at all - the particular ICE is because
there is a new bb and df_get_live_out (bb) returns NULL on it as the
live or lr problem has not been computed yet.

The following patch makes it possible to call df_analyze during split
on demand (on the first occassion where it is needed) and changes the
i386 backend to do so.  I needed a target hook that signifies the end
of the split pass so that the backend can reset its flag and return
the todo flags (TODO_df_finish in this case) to the caller if needed.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-02-15  Jakub Jelinek  

PR target/99104
* target.def: Add split_pass_finish hook.
* doc/tm.texi.in: Add TARGET_SPLIT_PASS_FINISH.
* doc/tm.texi: Regenerated.
* rtl.h (split_all_insns): Change return type from void to
unsigned int.
* recog.c (split_all_insns): Likewise.  If targetm.split_pass_finish,
call that hook and return what it returned, otherwise return 0.
 (split_all_insns_noflow): If targetm.split_pass_finish,
call that hook and return what it returned.
* config/i386/i386.c (df_analyzed_during_split): New variable.
(ix86_split_pass_finish): New hook.
(ix86_ok_to_clobber_flags): Call df_analyze if it hasn't been
called yet during current split pass.  Formatting fix.
(ix86_avoid_lea_for_add): Formatting fix.
(TARGET_SPLIT_PASS_FINISH): Redefine.

* gcc.dg/pr99104.c: New test.

--- gcc/target.def.jj   2021-01-04 10:25:38.997231967 +0100
+++ gcc/target.def  2021-02-15 17:36:51.997016704 +0100
@@ -3209,6 +3209,13 @@ DEFHOOK
  bool, (ao_ref *ref),
  default_ref_may_alias_errno)
 
+DEFHOOK
+(split_pass_finish,
+ "Define this to a return TODO flags from the end of a instruction\
+  splitting pass.",
+ unsigned, (void),
+ NULL)
+
 /* Support for named address spaces.  */
 #undef HOOK_PREFIX
 #define HOOK_PREFIX "TARGET_ADDR_SPACE_"
--- gcc/doc/tm.texi.in.jj   2021-01-04 10:25:50.624100311 +0100
+++ gcc/doc/tm.texi.in  2021-02-15 17:50:18.887773367 +0100
@@ -3361,6 +3361,8 @@ stack.
 
 @hook TARGET_REF_MAY_ALIAS_ERRNO
 
+@hook TARGET_SPLIT_PASS_FINISH
+
 @hook TARGET_TRANSLATE_MODE_ATTRIBUTE
 
 @hook TARGET_SCALAR_MODE_SUPPORTED_P
--- gcc/doc/tm.texi.jj  2021-01-04 10:25:50.599100594 +0100
+++ gcc/doc/tm.texi 2021-02-15 17:50:45.754465606 +0100
@@ -4303,6 +4303,10 @@ hook returns true for both @code{ptr_mod
 Define this to return nonzero if the memory reference @var{ref}  may alias 
with the system C library errno location.  The default  version of this hook 
assumes the system C library errno location  is either a declaration of type 
int or accessed by dereferencing  a pointer to int.
 @end deftypefn
 
+@deftypefn {Target Hook} unsigned TARGET_SPLIT_PASS_FINISH (void)
+Define this to a return TODO flags from the end of a instruction  splitting 
pass.
+@end deftypefn
+
 @deftypefn {Target Hook} machine_mode TARGET_TRANSLATE_MODE_ATTRIBUTE 
(machine_mode @var{mode})
 Define this hook if during mode attribute processing, the port should
 translate machine_mode @var{mode} to another mode.  For example, rs6000's
--- gcc/rtl.h.jj2021-01-04 10:25:38.778234447 +0100
+++ gcc/rtl.h   2021-02-15 17:52:30.411266758 +0100
@@ -3747,7 +3747,7 @@ extern enum reg_class reg_allocno_class
 extern void setup_reg_classes (int, enum reg_class, enum reg_class,
   enum reg_class);
 
-extern void split_all_insns (void);
+extern unsigned int split_all_insns (void);
 extern unsigned int split_all_insns_noflow (void);
 
 #define MAX_SAVED_CONST_INT 64
--- gcc/recog.c.jj  2021-02-13 16:08:20.793016638 +0100
+++ gcc/recog.c 2021-02-15 17:38:19.500014240 +0100
@@ -3406,7 +3406,7 @@ split_insn (rtx_insn *insn)
 
 /* Split all insns in the function.  If UPD_LIFE, update life info after.  */
 
-void
+unsigned int
 split_all_insns (void)
 {
   bool changed;
@@ -3488,6 +3488,9 @@ split_all_insns (void)
 }
 
   checking_verify_flow_info ();
+  if (targetm.split_pass_finish)
+return targetm.split_pass_finish ();
+  return 0;
 }
 
 /* Same as split_all_insns, but do not expect CFG to be available.
@@ -3523,6 +3526,8 @@ split_all_insns_noflow (void)
split_insn (insn);
}
 }
+  if (targetm.split_pass_finish)
+return targetm.split_pass_finish ();
   return 0;
 }
 
@@ -4380,8 +4385,7 @@ public:
   opt_pass * clone () { return new pass_split_all_insns (m_ctxt); }
   virtual unsigned int execute (function *)
 {
-  split_all_insns ();
-  return 0;
+  return split_all_insns ();
 }
 
 }; // class pass_split_all_insns
@@ -4425,8 +4429,7 @@ 

[committed] openmp: Fix up vectorization simd call badness computation [PR99100]

2021-02-16 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, ix86_simd_clone_usable didn't make it more desirable
to use 'e' mangled AVX512F entrypoints over 'd' mangled ones (AVX2) with the
same simdlen.  This patch fixes that.  I have tweaked the generic code too
to make more room for these target specific badness factors.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2021-02-16  Jakub Jelinek  

PR target/99100
* tree-vect-stmts.c (vectorizable_simd_clone_call): For num_calls != 1
multiply by 4096 and for inbranch by 8192.
* config/i386/i386.c (ix86_simd_clone_usable): For TARGET_AVX512F,
return 3, 2 or 1 for mangle letters 'b', 'c' or 'd'.

* gcc.target/i386/pr99100.c: New test.

--- gcc/tree-vect-stmts.c.jj2021-02-11 17:19:52.371826169 +0100
+++ gcc/tree-vect-stmts.c   2021-02-15 10:32:27.434790444 +0100
@@ -3914,9 +3914,9 @@ vectorizable_simd_clone_call (vec_info *
|| n->simdclone->nargs != nargs)
  continue;
if (num_calls != 1)
- this_badness += exact_log2 (num_calls) * 1024;
+ this_badness += exact_log2 (num_calls) * 4096;
if (n->simdclone->inbranch)
- this_badness += 2048;
+ this_badness += 8192;
int target_badness = targetm.simd_clone.usable (n);
if (target_badness < 0)
  continue;
--- gcc/config/i386/i386.c.jj   2021-02-09 12:28:14.092323006 +0100
+++ gcc/config/i386/i386.c  2021-02-15 10:33:17.550216579 +0100
@@ -22657,15 +22657,15 @@ ix86_simd_clone_usable (struct cgraph_no
return -1;
   if (!TARGET_AVX)
return 0;
-  return TARGET_AVX2 ? 2 : 1;
+  return TARGET_AVX512F ? 3 : TARGET_AVX2 ? 2 : 1;
 case 'c':
   if (!TARGET_AVX)
return -1;
-  return TARGET_AVX2 ? 1 : 0;
+  return TARGET_AVX512F ? 2 : TARGET_AVX2 ? 1 : 0;
 case 'd':
   if (!TARGET_AVX2)
return -1;
-  return 0;
+  return TARGET_AVX512F ? 1 : 0;
 case 'e':
   if (!TARGET_AVX512F)
return -1;
--- gcc/testsuite/gcc.target/i386/pr99100.c.jj  2021-02-15 10:36:54.879727962 
+0100
+++ gcc/testsuite/gcc.target/i386/pr99100.c 2021-02-15 10:36:27.185045092 
+0100
@@ -0,0 +1,22 @@
+/* PR target/99100 */
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mavx512f -fopenmp-simd -mprefer-vector-width=512" } */
+/* { dg-final { scan-assembler "_ZGVeN8v_myfunc" } } */
+/* { dg-final { scan-assembler "_ZGVeN8v_sin" } } */
+
+#pragma omp declare simd notinbranch
+double sin (double x);
+#pragma omp declare simd simdlen(8) notinbranch
+__attribute__((const)) double myfunc (double x);
+
+#define N 1024
+__attribute__((__aligned__ (256))) double a[N], b[N], c[N];
+
+void
+foo ()
+{
+  for (int i = 0; i < N; i++)
+a[i] = myfunc (b[i]);
+  for (int i = 0; i < N; i++)
+c[i] = sin (b[i]);
+}


Jakub