[Patch, fortran] PR96624 - A segment fault occurred when using the reshape function result to assign a variable

2020-08-26 Thread Paul Richard Thomas via Gcc-patches
Hi All,

Here is another of Steve Kargl's patches.

Before the patch is applied, the following code is generated:
atmp.0.span = 4;
atmp.0.data = 0B;
atmp.0.offset = 0;
(*(integer(kind=4)[0] * restrict) atmp.0.data)[0] = 1;
(*(integer(kind=4)[0] * restrict) atmp.0.data)[1] = 2;

which causes a segfault at run time. The test case counts the number of
occurrences of 'data' to check that the bad assignments have gone.

Regtests OK on FC31/x86_64 - OK for aster?

This patch fixes PR96624.

2020-08-27  Paul Thomas  

gcc/fortran
PR fortran/96624
* simplify.c (gfc_simplify_reshape): Detect zero shape and
clear index if found.

gcc/testsuite/
PR fortran/96624
* gfortran.dg/reshape_8.f90 : New test.
diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index 074b50c2e68..8e0d2f97a60 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -6398,7 +6398,7 @@ gfc_simplify_is_contiguous (gfc_expr *array)
 
   if (gfc_is_not_contiguous (array))
 return gfc_get_logical_expr (gfc_default_logical_kind, &array->where, 0);
-
+
   return NULL;
 }
 
@@ -6725,6 +6725,7 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
   unsigned long j;
   size_t nsource;
   gfc_expr *e, *result;
+  bool zerosize = false;
 
   /* Check that argument expression types are OK.  */
   if (!is_constant_array_expr (source)
@@ -6847,7 +6848,14 @@ gfc_simplify_reshape (gfc_expr *source, gfc_expr *shape_exp,
   result->rank = rank;
   result->shape = gfc_get_shape (rank);
   for (i = 0; i < rank; i++)
-mpz_init_set_ui (result->shape[i], shape[i]);
+{
+  mpz_init_set_ui (result->shape[i], shape[i]);
+  if (shape[i] == 0)
+	zerosize = true;
+}
+
+  if (zerosize)
+goto sizezero;
 
   while (nsource > 0 || npad > 0)
 {
@@ -6897,6 +6905,8 @@ inc:
   break;
 }
 
+sizezero:
+
   mpz_clear (index);
 
   return result;
! { dg-do compile }
! { dg-options "-fdump-tree-original" }
!
! Test the fix for PR96624 in which an attempt was made to assign
! to the zero length temporary created by reshape, resulting in a segfault.
!
! Contributed by Dong Shenpo  
!
program test
  integer :: a(2,0)
  a = reshape([1,2,3,4], [2,0])
  print *, a
end
! { dg-final { scan-tree-dump-times "data" 3 "original" } }


RE: [PATCH] fix testcase gcc.target/aarch64/insv_1.c

2020-08-26 Thread Qian, Jianhua
Hi Richard

I found that some instructions are using '#' before immediate value,
and others are not. For example
(define_insn "insv_imm"
  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
  (const_int 16)
  (match_operand:GPI 1 "const_int_operand" "n"))
(match_operand:GPI 2 "const_int_operand" "n"))]
  "UINTVAL (operands[1]) < GET_MODE_BITSIZE (mode)
   && UINTVAL (operands[1]) % 16 == 0"
  "movk\\t%0, %X2, lsl %1"
  [(set_attr "type" "mov_imm")]
)

Are there any standards for this?

Regards
Qian

-Original Message-
From: Richard Sandiford  
Sent: Wednesday, August 26, 2020 6:09 PM
To: Qian, Jianhua/钱 建华 
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] fix testcase gcc.target/aarch64/insv_1.c

Qian Jianhua  writes:
> There are three failures in gcc.target/aarch64/insv_1.c.
>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler bfi\tx[0-9]+, 
> x[0-9]+, 0, 8
>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler bfi\tx[0-9]+, 
> x[0-9]+, 16, 5
>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler movk\tx[0-9]+, 
> 0x1d6b, lsl 32
>
> This patch fix the third failure which was missed "#" before immediate 
> value in scan-assembler.

Thanks, pushed to master.

Richard

> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/insv_1.c: Add '#' in scan-assembler
>
> ---
>  gcc/testsuite/gcc.target/aarch64/insv_1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c 
> b/gcc/testsuite/gcc.target/aarch64/insv_1.c
> index 360a9892ad9..9efa22e649d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/insv_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c
> @@ -32,7 +32,7 @@ bfi2 (bitfield a)
>  bitfield
>  movk (bitfield a)
>  {
> -  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } 
> } */
> +  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, #0x1d6b, lsl 32" } 
> + } */
>a.sixteen = 7531;
>return a;
>  }






RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-26 Thread xiezhiheng
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Wednesday, August 26, 2020 6:14 PM
> To: xiezhiheng 
> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 

Cut...

> 
> Thanks, looks great.  Pushed to master.
> 
> Richard

I made two separate patches for these two groups for review purposes.

Note: Patch for min/max intrinsics should be applied before the patch for 
rounding intrinsics

Bootstrapped and tested on aarch64 Linux platform.

Thanks,
Xie Zhiheng


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f6605eae08c..939aae71ecd 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-27  Zhiheng Xie  
+
+   * config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
+   for min/max intrinsics.
+


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index f6605eae08c..b0d3ec6cf19 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2020-08-27  Zhiheng Xie  
+
+   * config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
+   for rounding intrinsics.
+


min_max-v1.patch
Description: min_max-v1.patch


rounding-v1.patch
Description: rounding-v1.patch


[PATCH 4/4] PowerPC: Add power10 xscmp{eq,gt,ge}qp support

2020-08-26 Thread Michael Meissner via Gcc-patches
PowerPC: Add power10 xscmp{eq,gt,ge}qp support.

This patch adds the conditional move support.  In adding the conditional move
support, the optimizers will be able to convert things like:

a = (b > c) ? b : c;

into the instructions.  This patch merges together the scalar SF/DF conditional
move with the scalar KF/TF conditional move.  It extends the optimization that
was previously used for SFmode and DFmode to allow the comparison to be a
different scalar floating point mode than the move.  I.e.

__float128 a, b, c;
float x, y;

/* ... */

a = (x == y) ? b : c;

I did have to add an XXPERMDI if the comparison mode was SFmode or DFmode, and
the move mode is KFmode or TFmode (the XSCMP{EQ,GT,GE}DP instructions
explicitly set the bottom 64 bits of the vector register to 0).

I have built compilers on a little endian power9 Linux system with all 4
patches applied.  I did bootstrap builds and ran the testsuite, with no
regressions.  Previous versions of the patch was also tested on a little endian
power8 Linux system.  I would like to check this patch into the master branch
for GCC 11.  At this time, I do not anticipate needing to backport these
changes to GCC 10.3.

gcc/
2020-08-26  Michael Meissner  

* config/rs6000/predicates.md (fpmask_normal_or_invert_operator):
New predicate.
* config/rs6000/rs6000.c (have_compare_and_set_mask): Add IEEE
128-bit floating point types.
* config/rs6000/rs6000.md (FSCALAR2): New iterator for floating
point conditional moves.
(movcc_p9): Replace with
mov.
(movcc_invert_p9): Replace with
mov.
(mov): Combine both
movcc_p9 and
movcc_invert_p9 patterns.  Add ISA 3.1
support for IEEE 128-bit conditional moves.  Always use an
earlyclobber register for the mask.  Use XXPERMDI to extend the
mask if we have a 64-bit comparison and 128-bit move.
register for the mask.
(fpmask, xxsel): Add ISA 3.1 support for IEEE 128-bit
conditional moves.  Enable the generator functionality so
mov can call it.  Update constraints
for 128-bit operations.

gcc/testsuite/
2020-08-26  Michael Meissner  

* gcc.target/powerpc/float128-cmove.c: New test.
* gcc.target/powerpc/float128-minmax-3.c: New test.

---
 gcc/config/rs6000/predicates.md   |   5 +
 gcc/config/rs6000/rs6000.c|   4 +
 gcc/config/rs6000/rs6000.md   | 154 ++
 .../gcc.target/powerpc/float128-cmove.c   |  93 +++
 .../gcc.target/powerpc/float128-minmax-3.c|  15 ++
 5 files changed, 200 insertions(+), 71 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-cmove.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-3.c

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 2709e46f7e5..60b45601e9b 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1225,6 +1225,11 @@ (define_predicate "fpmask_comparison_operator"
 (define_predicate "invert_fpmask_comparison_operator"
   (match_code "ne,unlt,unle"))
 
+;; Return 1 if OP is either a fpmask_comparison_operator or
+;; invert_fpmask_comparsion_operator.
+(define_predicate "fpmask_normal_or_invert_operator"
+  (match_code "eq,gt,ge,ne,unlt,unle"))
+
 ;; Return 1 if OP is a comparison operation suitable for integer vector/scalar
 ;; comparisons that generate a -1/0 mask.
 (define_predicate "vecint_comparison_operator"
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 05eb141a2cd..403897926c5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15177,6 +15177,10 @@ have_compare_and_set_mask (machine_mode mode)
 case DFmode:
   return TARGET_P9_MINMAX;
 
+case KFmode:
+case TFmode:
+  return FLOAT128_IEEE_MINMAX_P (mode);
+
 default:
   break;
 }
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 006e60f09bc..147c635994c 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -795,6 +795,15 @@ (define_mode_iterator FSCALAR [SF
   (KF "FLOAT128_IEEE_MINMAX_P (KFmode)")
   (TF "FLOAT128_IEEE_MINMAX_P (TFmode)")])
 
+;; Secondary iterator for scalar binary floating point operations.  This is
+;; used for the conditional moves when we have a compare and set mask
+;; instruction.  Using this instruction allows us to do a conditional move
+;; where the comparison type might be different from the values being moved.
+(define_mode_iterator FSCALAR2 [SF
+   DF
+   (KF "FLOAT128_IEEE_MINMAX_P (KFmode)")
+   (TF "FLOAT128_IEEE_MINMAX_P (TFmode)")])
+
 ;; Constraints to use for scalar FP operations
 (define_mode_attr Fm [(SF "wa")
 

[PATCH 3/4] PowerPC: Add power10 xsmaxcqp/xsmincqp support

2020-08-26 Thread Michael Meissner via Gcc-patches
PowerPC: Add power10 xsmaxcqp/xsmincqp support.

This patch adds support for the ISA 3.1 (power10) IEEE 128-bit "C" minimum and
maximum functions.  Because of the NaN differences, the built-in functions will
only generate these instructions if -ffast-math is used until the conditional
move support is added in the next patch.

I have built compilers on a little endian power9 Linux system with all 4
patches applied.  I did bootstrap builds and ran the testsuite, with no
regressions.  Previous versions of the patch was also tested on a little endian
power8 Linux system.  I would like to check this patch into the master branch
for GCC 11.  At this time, I do not anticipate needing to backport these
changes to GCC 10.3.


gcc/
2020-08-26  Michael Meissner  

* config/rs6000/rs6000.h (FLOAT128_IEEE_MINMAX_P): New helper
macro.
* config/rs6000/rs6000.md (FSCALAR): New mode iterator for floating
point scalars.
(Fm): New mode attribute for floating point scalars.
(s): Add support for the ISA 3.1 IEEE 128-bit
minimum and maximum instructions.
(s3_vsx): Add support for the ISA 3.1 IEEE 128-bit
minimum and maximum instructions.

gcc/testsuite/
2020-08-26  Michael Meissner  

* gcc.target/powerpc/float128-minmax-2.c: New test.

---
 gcc/config/rs6000/rs6000.c|  3 +-
 gcc/config/rs6000/rs6000.h|  4 +++
 gcc/config/rs6000/rs6000.md   | 28 +++
 .../gcc.target/powerpc/float128-minmax-2.c| 15 ++
 4 files changed, 43 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/float128-minmax-2.c

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6324f930628..05eb141a2cd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15445,7 +15445,8 @@ rs6000_emit_minmax (rtx dest, enum rtx_code code, rtx 
op0, rtx op1)
   /* VSX/altivec have direct min/max insns.  */
   if ((code == SMAX || code == SMIN)
   && (VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)
- || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode
+ || (mode == SFmode && VECTOR_UNIT_VSX_P (DFmode))
+ || FLOAT128_IEEE_MINMAX_P (mode)))
 {
   emit_insn (gen_rtx_SET (dest, gen_rtx_fmt_ee (code, mode, op0, op1)));
   return;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index bbd8060e143..b504aaa0199 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -345,6 +345,10 @@ extern const char *host_detect_local_cpu (int argc, const 
char **argv);
|| ((MODE) == TDmode)   \
|| (!TARGET_FLOAT128_TYPE && FLOAT128_IEEE_P (MODE)))
 
+/* Macro whether the float128 min/max instructions are enabled.  */
+#define FLOAT128_IEEE_MINMAX_P(MODE)   \
+  (TARGET_POWER10 && TARGET_FLOAT128_HW && FLOAT128_IEEE_P (MODE))
+
 /* Return true for floating point that does not use a vector register.  */
 #define SCALAR_FLOAT_MODE_NOT_VECTOR_P(MODE)   \
   (SCALAR_FLOAT_MODE_P (MODE) && !FLOAT128_VECTOR_P (MODE))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 43b620ae1c0..006e60f09bc 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -789,6 +789,18 @@ (define_code_attr minmax   [(smin "min")
 (define_code_attr SMINMAX  [(smin "SMIN")
 (smax "SMAX")])
 
+;; Mode iterator for scalar binary floating point operations
+(define_mode_iterator FSCALAR [SF
+  DF
+  (KF "FLOAT128_IEEE_MINMAX_P (KFmode)")
+  (TF "FLOAT128_IEEE_MINMAX_P (TFmode)")])
+
+;; Constraints to use for scalar FP operations
+(define_mode_attr Fm [(SF "wa")
+ (DF "wa")
+ (TF "v")
+ (KF "v")])
+
 ;; Iterator to optimize the following cases:
 ;; D-form load to FPR register & move to Altivec register
 ;; Move Altivec register to FPR register and store
@@ -5142,9 +5154,9 @@ (define_insn "copysign3_fcpsgn"
 ;; to allow either DF/SF to use only traditional registers.
 
 (define_expand "s3"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand")
-   (fp_minmax:SFDF (match_operand:SFDF 1 "gpc_reg_operand")
-   (match_operand:SFDF 2 "gpc_reg_operand")))]
+  [(set (match_operand:FSCALAR 0 "gpc_reg_operand")
+   (fp_minmax:FSCALAR (match_operand:FSCALAR 1 "gpc_reg_operand")
+  (match_operand:FSCALAR 2 "gpc_reg_operand")))]
   "TARGET_MINMAX"
 {
   rs6000_emit_minmax (operands[0], , operands[1], operands[2]);
@@ -5152,11 +5164,15 @@ (define_expand "s3"
 })
 
 (define_insn "*s3_vsx"
-  [(set (match_operand:SFDF 0 "vsx_register_operand" "=")
-   (fp_minmax:SFDF (match_operand:SFDF 1 "vsx_register_operand" "")
-   (match_operand:SF

[PATCH 2/4] PowerPC: Rename functions for min, max, cmove

2020-08-26 Thread Michael Meissner via Gcc-patches
PowerPC: Rename functions for min, max, cmove.

This patch renames the functions that generate the ISA 3.0 C minimum, C
maximum, and conditional move instructions to use a better name than just using
a _p9 suffix.  Because the functions can fail, the names use "maybe_emit"
instead of "generate_" in the name.

I have built compilers on a little endian power9 Linux system with all 4
patches applied.  I did bootstrap builds and ran the testsuite, with no
regressions.  Previous versions of the patch was also tested on a little endian
power8 Linux system.  I would like to check this patch into the master
branch for GCC 11.  At this time, I do not anticipate needing to backport these
changes to GCC 10.3.

gcc/
2020-08-26  Michael Meissner  

* config/rs6000/rs6000.c (rs6000_emit_p9_fp_minmax): Rename to
maybe_emit_fp_c_minmax.
(maybe_emit_fp_c_minmax): Rename rs6000_emit_p9_fp_minmax.  Return
bool instead of int.
(rs6000_emit_p9_fp_cmove): Rename to maybe_emit_fp_cmove.
(maybe_emit_fp_cmove): Rename rs6000_emit_p9_fp_cmove.  Return
bool instead of int.
(have_compare_and_set_mask): New helper function.
(rs6000_emit_cmove): Rework support of ISA 3.0 functions to
generate "C" minimum, "C" maximum, and conditional move
instructions for scalar floating point.

---
 gcc/config/rs6000/rs6000.c | 77 ++
 1 file changed, 53 insertions(+), 24 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bac50c2bcf6..6324f930628 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15056,13 +15056,17 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, 
rtx op_false,
   return 1;
 }
 
-/* ISA 3.0 (power9) minmax subcase to emit a XSMAXCDP or XSMINCDP instruction
-   for SF/DF scalars.  Move TRUE_COND to DEST if OP of the operands of the last
-   comparison is nonzero/true, FALSE_COND if it is zero/false.  Return 0 if the
-   hardware has no such operation.  */
+/* Possibly emit the C variant of the minimum or maximum instruction for
+   floating point scalars (xsmincdp, xsmaxcdp, etc.).
 
-static int
-rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
+   Move TRUE_COND to DEST if OP of the operands of the last comparison is
+   nonzero/true, FALSE_COND if it is zero/false.
+
+   Return false if we can't generate the appropriate minimum or maximum, and
+   true if we can did the minimum or maximum.  */
+
+static bool
+maybe_emit_fp_c_minmax (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
@@ -15072,14 +15076,14 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx 
true_cond, rtx false_cond)
   bool max_p = false;
 
   if (result_mode != compare_mode)
-return 0;
+return false;
 
   if (code == GE || code == GT)
 max_p = true;
   else if (code == LE || code == LT)
 max_p = false;
   else
-return 0;
+return false;
 
   if (rtx_equal_p (op0, true_cond) && rtx_equal_p (op1, false_cond))
 ;
@@ -15092,19 +15096,23 @@ rs6000_emit_p9_fp_minmax (rtx dest, rtx op, rtx 
true_cond, rtx false_cond)
 max_p = !max_p;
 
   else
-return 0;
+return false;
 
   rs6000_emit_minmax (dest, max_p ? SMAX : SMIN, op0, op1);
-  return 1;
+  return true;
 }
 
-/* ISA 3.0 (power9) conditional move subcase to emit XSCMP{EQ,GE,GT,NE}DP and
-   XXSEL instructions for SF/DF scalars.  Move TRUE_COND to DEST if OP of the
-   operands of the last comparison is nonzero/true, FALSE_COND if it is
-   zero/false.  Return 0 if the hardware has no such operation.  */
+/* Possibly emit a floating point conditional move by generating a compare that
+   sets a mask instruction and a XXSEL select instruction.
 
-static int
-rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
+   Move TRUE_COND to DEST if OP of the operands of the last comparison is
+   nonzero/true, FALSE_COND if it is zero/false.
+
+   Return false if the operation cannot be generated, and true if we could
+   generate the instruction.  */
+
+static bool
+maybe_emit_fp_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
   rtx op0 = XEXP (op, 0);
@@ -15132,7 +15140,7 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx 
true_cond, rtx false_cond)
   break;
 
 default:
-  return 0;
+  return false;
 }
 
   /* Generate: [(parallel [(set (dest)
@@ -15152,7 +15160,28 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx 
true_cond, rtx false_cond)
   emit_insn (gen_rtx_PARALLEL (VOIDmode,
   gen_rtvec (2, cmove_rtx, clobber_rtx)));
 
-  return 1;
+  return true;
+}
+
+/* Helper function to return true if the target has instructions to do a
+   compare and set mask instruction that can be used with XXSEL to implement a
+   conditional move.  It is also assumed that such a target also supports the

[PATCH 1/4] PowerPC: Change cmove function return to bool

2020-08-26 Thread Michael Meissner via Gcc-patches
PowerPC: Change cmove function return to bool.

In doing the other work for adding ISA 3.1 128-bit minimum, maximum, and
conditional move support, I noticed the two functions that process conditional
moves return 'int' instead of 'bool'.  This patch changes these functions to
return 'bool'.

I have built compilers on a little endian power9 Linux system with all 4
patches applied.  I did bootstrap builds and ran the testsuite, with no
regressions.  Previous versions of the patch was also tested on a little endian
power8 Linux system.  I would like to check this patch into the master
branch for GCC 11.  At this time, I do not anticipate needing to backport these
changes to GCC 10.3.

gcc/
2020-08-26  Michael Meissner  

* config/rs6000/rs6000-protos.h (rs6000_emit_cmove): Change return
type to bool.
(rs6000_emit_int_cmove): Change return type to bool.
* config/rs6000/rs6000.c (rs6000_emit_cmove): Change return type
to bool.
(rs6000_emit_int_cmove): Change return type to bool.

---
 gcc/config/rs6000/rs6000-protos.h |  4 ++--
 gcc/config/rs6000/rs6000.c| 32 +++
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 28e859f4381..02e4d71922f 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -119,8 +119,8 @@ extern char * output_cbranch (rtx, const char *, int, 
rtx_insn *);
 extern const char * output_probe_stack_range (rtx, rtx, rtx);
 extern void rs6000_emit_dot_insn (rtx dst, rtx src, int dot, rtx ccreg);
 extern bool rs6000_emit_set_const (rtx, rtx);
-extern int rs6000_emit_cmove (rtx, rtx, rtx, rtx);
-extern int rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
+extern bool rs6000_emit_cmove (rtx, rtx, rtx, rtx);
+extern bool rs6000_emit_int_cmove (rtx, rtx, rtx, rtx);
 extern int rs6000_emit_vector_cond_expr (rtx, rtx, rtx, rtx, rtx, rtx);
 extern void rs6000_emit_minmax (rtx, enum rtx_code, rtx, rtx);
 extern void rs6000_expand_atomic_compare_and_swap (rtx op[]);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 1c1caa90ede..bac50c2bcf6 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -15159,7 +15159,7 @@ rs6000_emit_p9_fp_cmove (rtx dest, rtx op, rtx 
true_cond, rtx false_cond)
operands of the last comparison is nonzero/true, FALSE_COND if it
is zero/false.  Return 0 if the hardware has no such operation.  */
 
-int
+bool
 rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, rtx false_cond)
 {
   enum rtx_code code = GET_CODE (op);
@@ -15175,11 +15175,11 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
rtx false_cond)
   /* In the isel case however, we can use a compare immediate, so
 op1 may be a small constant.  */
   && (!TARGET_ISEL || !short_cint_operand (op1, VOIDmode)))
-return 0;
+return false;
   if (GET_MODE (true_cond) != result_mode)
-return 0;
+return false;
   if (GET_MODE (false_cond) != result_mode)
-return 0;
+return false;
 
   /* See if we can use the ISA 3.0 (power9) min/max/compare functions.  */
   if (TARGET_P9_MINMAX
@@ -15187,16 +15187,16 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, 
rtx false_cond)
   && (result_mode == SFmode || result_mode == DFmode))
 {
   if (rs6000_emit_p9_fp_minmax (dest, op, true_cond, false_cond))
-   return 1;
+   return true;
 
   if (rs6000_emit_p9_fp_cmove (dest, op, true_cond, false_cond))
-   return 1;
+   return true;
 }
 
   /* Don't allow using floating point comparisons for integer results for
  now.  */
   if (FLOAT_MODE_P (compare_mode) && !FLOAT_MODE_P (result_mode))
-return 0;
+return false;
 
   /* First, work out if the hardware can do this at all, or
  if it's too slow  */
@@ -15204,7 +15204,7 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, rtx 
false_cond)
 {
   if (TARGET_ISEL)
return rs6000_emit_int_cmove (dest, op, true_cond, false_cond);
-  return 0;
+  return false;
 }
 
   is_against_zero = op1 == CONST0_RTX (compare_mode);
@@ -15216,7 +15216,7 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, rtx 
false_cond)
  generated.  */
   if (SCALAR_FLOAT_MODE_P (compare_mode)
   && flag_trapping_math && ! is_against_zero)
-return 0;
+return false;
 
   /* Eliminate half of the comparisons by switching operands, this
  makes the remaining code simpler.  */
@@ -15232,7 +15232,7 @@ rs6000_emit_cmove (rtx dest, rtx op, rtx true_cond, rtx 
false_cond)
   /* UNEQ and LTGT take four instructions for a comparison with zero,
  it'll probably be faster to use a branch here too.  */
   if (code == UNEQ && HONOR_NANS (compare_mode))
-return 0;
+return false;
 
   /* We're going to try to implement comparisons by performing
  a subtract, then comparing against zero.  Unfortunately,
@@ -15247,14 +15247,14 

[PATCH] Add power10 IEEE 128-bit minimum, maximum, and compare with mask instructions

2020-08-26 Thread Michael Meissner via Gcc-patches
The following patches are a rewrite of the previous set of patches to add
support for the power10 IEEE 128-bit C minimum, C maximum, and compare/set mask
instructions that are similar to the instructions added in power9.

There are 4 patches in this series.

The first patch is a cosmetic patch.  In the previous patches, Segher said the
new functions should return bool instead of int.  In adding the support, I
noticed the two existing functions processing conditional moves
(rs6000_emit_cmove and rs6000_emit_int_cmove) returned int rather than bool.
The first patch just changes the return type and return statements to now
return bool.

The second patch renames the functions that generate the ISA 3.0 C minimum, C
maximum, and conditional move instructions to use a better name than just using
a _p9 suffix.  As Segher suggested, the names should be of the form
"maybe_emit" instead of "generate_", since both functions can fail.

The third patch adds the minimum and maximum support without adding the
conditional move support (the 4th patch will add the conditional move support).
Because of the NaN differences, the built-in functions will only generate these
instructions if -ffast-math is used.

The fourth patch adds the conditional move support.  In adding the conditional
move support, the optimizers will be able to convert things like:

a = (b > c) ? b : c;

into the instructions.  Unlike the previous set of patches, this patch merges
together the scalar SF/DF conditional move with the scalar KF/TF conditional
move.  It extends the optimization that was previously used for SFmode and
DFmode to allow the comparison to be a different scalar floating point mode
than the move.  I.e.

__float128 a, b, c;
float x, y;

/* ... */

a = (x == y) ? b : c;

I did have to add an XXPERMDI if the comparison mode was SFmode or DFmode, and
the move mode is KFmode or TFmode (the XSCMP{EQ,GT,GE}DP instructions
explicitly set the bottom 64 bits of the vector register to 0).

I have built compilers on a little endian power9 Linux system with all 4
patches applied.  I did bootstrap builds and ran the testsuite, with no
regressions.  Previous versions of the patch was also tested on a little endian
power8 Linux system.  I would like to check these patches into the master
branch for GCC 11.  At this time, I do not anticipate needing to backport these
changes to GCC 10.3.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797


Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-26 Thread Vladimir Makarov via Gcc-patches

On 2020-08-26 11:15 a.m., Alex Coplan wrote:

Thanks for the review, both.


Please find a reworked version of the patch attached incorporating
Richard's feedback.

Testing:
  * Bootstrap and regtest on aarch64-none-linux-gnu,
arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.

Is the updated patch OK for master?


Yes.  Thank you, Alex.



[PATCH] testsuite: add -fno-tree-fre in gcc.dg/guality

2020-08-26 Thread Hu Jiangping
This patch add -fno-tree-fre to dg-options in gcc.dg/guality/sra-1.c,
to make the following testcases passed.

FAIL: gcc.dg/guality/sra-1.c  -Og -DPREVENT_OPTIMIZATION  line 43 a.i == 4  
 
FAIL: gcc.dg/guality/sra-1.c  -Og -DPREVENT_OPTIMIZATION  line 43 a.j == 14 
 
FAIL: gcc.dg/guality/sra-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 21 a.j == 14   
 
FAIL: gcc.dg/guality/sra-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 32 a[1] == 14  
 
FAIL: gcc.dg/guality/sra-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects -DPREVENT_OPTIMIZATION line 43 a.j == 14   
 

The detail error logs like this:
> Spawning: gdb -nx -nw -quiet -batch -x sra-1.gdb ./sra-1.exe  
> 
> spawn gdb -nx -nw -quiet -batch -x sra-1.gdb ./sra-1.exe  
>   
> Breakpoint 1 at 0x40074c: file 
> /home/build_gcc/gcc-a64fx-1-a/gcc/testsuite/gcc.dg/guality/sra-1.c, line 43. 
>  
> Breakpoint 1, f3 (k=k@entry=7) at 
> /home/build_gcc/gcc-a64fx-1-a/gcc/testsuite/gcc.dg/guality/sra-1.c:43
> 43bar (a.j);/* { dg-final { gdb-test . "a.j" "14" } } */
> $1 = 
> $2 = 4
>  != 4

Tested on aarch64.

Regards!
hujp

---
 gcc/testsuite/gcc.dg/guality/sra-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/guality/sra-1.c 
b/gcc/testsuite/gcc.dg/guality/sra-1.c
index 8ad57cf3f8e..94ea29dd727 100644
--- a/gcc/testsuite/gcc.dg/guality/sra-1.c
+++ b/gcc/testsuite/gcc.dg/guality/sra-1.c
@@ -1,6 +1,6 @@
 /* PR debug/43983 */
 /* { dg-do run } */
-/* { dg-options "-g -fno-ipa-icf" } */
+/* { dg-options "-g -fno-tree-fre -fno-ipa-icf" } */
 
 struct A { int i; int j; };
 struct B { int : 4; int i : 12; int j : 12; int : 4; };
-- 
2.17.1





Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread John David Anglin
On 2020-08-26 5:23 p.m., Roger Sayle wrote:
> These more accurate target rtx_costs are used by the 
> gimple-ssa-strength-reduction.c
> (via a call to mult_by_coeff_cost) to decide whether applying strength 
> reduction would
> be profitable.  This test case, slsr-13.c, assumes that two multiplications 
> by four are
> cheaper than two multiplications by five.   (I believe) This is not the case 
> on hppa which
> has a sh2add instruction, that performs a multiplication by five in one 
> cycle, or exactly
> the same cost as performing a left shift by two (i.e. a multiplication by 
> four).  Oddly, I
> also believe this isn't the case on x86_64, where the similar lea instruction 
> is (sometimes)
> as efficient as left shift by two bits.
This looks like a regression.

gcc-10 (prepatch):

    addl %r25,%r26,%r28
    sh2addl %r25,%r28,%r25
    sh2addl %r26,%r28,%r26
    addl %r26,%r28,%r28
    bv %r0(%r2)
    addl %r28,%r25,%r28

   [local count: 1073741824]:
  x1_4 = c_2(D) + s_3(D);
  slsr_11 = s_3(D) * 4;
  x2_6 = x1_4 + slsr_11;
  slsr_12 = c_2(D) * 4;
  x3_8 = x1_4 + slsr_12;
  _1 = x1_4 + x2_6;
  x_9 = _1 + x3_8;
  return x_9;

gcc-11 (with patch):

    addl %r25,%r26,%r19
    sh2addl %r26,%r26,%r28
    addl %r28,%r25,%r28
    sh2addl %r25,%r25,%r25
    addl %r28,%r19,%r28
    addl %r25,%r26,%r26
    bv %r0(%r2)
    addl %r28,%r26,%r28

   [local count: 1073741824]:
  x1_4 = c_2(D) + s_3(D);
  a2_5 = s_3(D) * 5;
  x2_6 = c_2(D) + a2_5;
  a3_7 = c_2(D) * 5;
  x3_8 = s_3(D) + a3_7;
  _1 = x1_4 + x2_6;
  x_9 = _1 + x3_8;
  return x_9;

Regards,
Dave

-- 
John David Anglin  dave.ang...@bell.net



Re: Duplicate .debug_lines (Was: [PATCH 5/5] Add --gdwarf-5 to ASM_SPEC)

2020-08-26 Thread H.J. Lu via Gcc-patches
On Wed, Aug 26, 2020 at 2:38 PM Mark Wielaard  wrote:
>
> Hi gcc hackers, binutils hackers,
>
> Nick, Jakub and I were discussing the gcc patch below and all the ways
> it is wrong. Most things can be fixed in the spec. Like only passing
> -gdwarf if we are generating DWARF and passing the right DWARF version
> as -gdwarf-N for the version that gcc itself creates. But whether or
> not we want gas to generate .debug_line info is a bit tricky. But when
> giving -gdwarf-N gas will always try to generate a .debug_line section
> and error out when there is already one.
>
> Would it be possible to have something like the following in gas, so
> that it doesn't try generating a .debug_line section if there already
> is one, even when -gdwarf-N is given (unless the assembly also
> contains .loc directives because that shows the user is really
> confused)?
>
> diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c
> index e4ba56d82ba..c0c09f4e9d0 100644
> --- a/gas/dwarf2dbg.c
> +++ b/gas/dwarf2dbg.c
> @@ -2626,7 +2626,7 @@ dwarf2_init (void)
>
>
>  /* Finish the dwarf2 debug sections.  We emit .debug.line if there
> -   were any .file/.loc directives, or --gdwarf2 was given, or if the
> +   were any .file/.loc directives, or --gdwarf2 was given, and if the
> file has a non-empty .debug_info section and an empty .debug_line
> section.  If we emit .debug_line, and the .debug_info section is
> empty, we also emit .debug_info, .debug_aranges and .debug_abbrev.
> @@ -2650,9 +2650,16 @@ dwarf2_finish (void)
>empty_debug_line = line_seg == NULL || !seg_not_empty_p (line_seg);
>
>/* We can't construct a new debug_line section if we already have one.
> - Give an error.  */
> + Give an error if we have seen any .loc, otherwise trust the user
> + knows what they are doing and want to generate the .debug_line
> + (and all other debug sections) themselves.  */
>if (all_segs && !empty_debug_line)
> -as_fatal ("duplicate .debug_line sections");
> +{
> +  if (dwarf2_loc_directive_seen)
> +   as_fatal ("duplicate .debug_line sections");
> +  else
> +   return;
> +}
>
>if ((!all_segs && emit_other_sections)
>|| (!emit_other_sections && !empty_debug_line))
>

I have run into this issue before.  "as -g" shouldn't silently
generate incorrect
debug info when input assembly codes already contain debug directives.
AS should either issue an error or ignore -g.  In either case, we need
a testcase
to verify it.

-- 
H.J.


libgo patch committed: Add -maix32 when calling GCC on 32-bit AIX

2020-08-26 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Clément Chigot adds -maix32 when running GCC on
32-bit AIX.  As gcc might now be compiled in 64bit, -maix32 must
always be added to ensure that created objects will be 32bit.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
0c223b02b6e4dfbdeac9f1dcd64ee9d1cd04a5a2
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 2f0865b7c80..780588aabc5 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-aac2b382839154d74eeef160522c0a5c1ea8aadf
+9aed2d2c5e9c69aa530bf09d72d33c66e497d720
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/cmd/cgo/gcc.go b/libgo/go/cmd/cgo/gcc.go
index 249cfe4675d..f774cbb9353 100644
--- a/libgo/go/cmd/cgo/gcc.go
+++ b/libgo/go/cmd/cgo/gcc.go
@@ -1573,6 +1573,10 @@ func (p *Package) gccMachine() []string {
if goos == "aix" {
return []string{"-maix64"}
}
+   case "ppc":
+   if goos == "aix" {
+   return []string{"-maix32"}
+   }
}
return nil
 }
@@ -1615,7 +1619,6 @@ func (p *Package) gccCmd() []string {
c = append(c, p.GccOptions...)
c = append(c, p.gccMachine()...)
if goos == "aix" {
-   c = append(c, "-maix64")
c = append(c, "-mcmodel=large")
}
c = append(c, "-") //read input from standard input
diff --git a/libgo/go/cmd/go/internal/work/exec.go 
b/libgo/go/cmd/go/internal/work/exec.go
index d610410a72c..65f3011adfa 100644
--- a/libgo/go/cmd/go/internal/work/exec.go
+++ b/libgo/go/cmd/go/internal/work/exec.go
@@ -2503,6 +2503,10 @@ func (b *Builder) gccArchArgs() []string {
if cfg.Goos == "aix" {
return []string{"-maix64"}
}
+   case "ppc":
+   if cfg.Goos == "aix" {
+   return []string{"-maix32"}
+   }
}
return nil
 }


libgo patch committed: Add FAT library support for AIX static libraries

2020-08-26 Thread Ian Lance Taylor via Gcc-patches
This libgo patch by Clément Chigot adds FAT library support for static
libraries on AIX.  Like shared libraries, AIX static libraries must
also have both 32 and 64 bit objects.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
a66f773796a78d61891f5e78d275aeb46260e7ca
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index e425f15285e..2f0865b7c80 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-823c91088bc6ac606362fc34b2880ce0de1624ad
+aac2b382839154d74eeef160522c0a5c1ea8aadf
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/Makefile.am b/libgo/Makefile.am
index 1112ee27df6..9ce0cab9d6a 100644
--- a/libgo/Makefile.am
+++ b/libgo/Makefile.am
@@ -1255,4 +1255,8 @@ add-aix-fat-library: all-multi
  arx=`echo $(AR) | sed -e 's/-X[^ ]*//g'`; \
  $${arx} -X$(AIX_EXTRA_ARCH) rc .libs/$(PACKAGE).a 
../ppc$(AIX_EXTRA_ARCH)/$(PACKAGE)/.libs/$(PACKAGE).so.$(MAJOR); \
  $${arx} -X$(AIX_EXTRA_ARCH) rc 
../pthread/$(PACKAGE)/.libs/$(PACKAGE).a 
../pthread/ppc$(AIX_EXTRA_ARCH)/$(PACKAGE)/.libs/$(PACKAGE).so.$(MAJOR); \
+ $${arx} -X$(AIX_EXTRA_ARCH) rc libgobegin.a 
../ppc$(AIX_EXTRA_ARCH)/$(PACKAGE)/$(libgobegin_a_OBJECTS); \
+ $${arx} -X$(AIX_EXTRA_ARCH) rc ../pthread/$(PACKAGE)/libgobegin.a 
../pthread/ppc$(AIX_EXTRA_ARCH)/$(PACKAGE)/$(libgobegin_a_OBJECTS); \
+ $${arx} -X$(AIX_EXTRA_ARCH) rc libgolibbegin.a 
../ppc$(AIX_EXTRA_ARCH)/$(PACKAGE)/$(libgolibbegin_a_OBJECTS); \
+ $${arx} -X$(AIX_EXTRA_ARCH) rc ../pthread/$(PACKAGE)/libgolibbegin.a 
../pthread/ppc$(AIX_EXTRA_ARCH)/$(PACKAGE)/$(libgolibbegin_a_OBJECTS); \
fi


Re: [PATCH] i386: Add c99 runtime requirement to math optimisation tests

2020-08-26 Thread Jeff Law via Gcc-patches
On Tue, 2020-08-18 at 18:18 -0400, Pat Bernardi wrote:
> A number of i386 math optimisation tests are looking assembly instructions
> that are only emitted when the compiler knows the target has a C99 libm
> available. Since targets like *-elf may not have such a libm, a C99 runtime
> requirement is added to these tests.
> 
> Tested on x86-elf and x86_64-elf hosted on x86_64-linux in addition to 
> x86_64-pc-linux-gnu
> 
> If approved, I'll need a maintainer to kindly commit on my behalf.
> 
> Thanks,
> 
> Pat Bernardi
> Senior Software Engineer, AdaCore
> 
> 2020-08-18  Pat Bernardi  
> 
> gcc/testsuite/ChangeLog
> 
>   * gcc.target/i386/387-7.c: Add dg-require-effective-target c99_runtime.
>   * gcc.target/i386/387-9.c: Likewise.
>   * gcc.target/i386/avx512bw-pr96246-1.c: Likewise.
>   * gcc.target/i386/avx512f-rint-sfix-vec-2.c: Likewise.
>   * gcc.target/i386/avx512f-rintf-sfix-vec-2.c: Likewise.
>   * gcc.target/i386/avx512vl-pr96246-1.c: Likewise.
>   * gcc.target/i386/pr61403.c: Likewise.
>   * gcc.target/i386/sse4_1-ceil-sfix-vec.c: Likewise.
>   * gcc.target/i386/sse4_1-ceilf-sfix-vec.c: Likewise.
>   * gcc.target/i386/sse4_1-floor-sfix-vec.c: Likewise.
>   * gcc.target/i386/sse4_1-floorf-sfix-vec.c: Likewise.
>   * gcc.target/i386/sse4_1-rint-sfix-vec.c: Likewise.
>   * gcc.target/i386/sse4_1-rintf-sfix-vec.c: Likewise.
>   * gcc.target/i386/sse4_1-round-sfix-vec.c: Likewise.
>   * gcc.target/i386/sse4_1-roundf-sfix-vec.c: Likewise.
THanks.  THe pr96246 tests moved into g++.target and needed to be applied by
hand.  I took care of that and committed the change.

Jeff



Re: [PATCH] gcov-profile: clarify profile-exclude-files documentation [PR96285]

2020-08-26 Thread Jeff Law via Gcc-patches
On Fri, 2020-07-24 at 15:42 +0200, Martin Liška wrote:
> On 7/24/20 1:02 PM, Göran Uddeborg wrote:
> > The wording of the description of -fprofile-exclude-files is easy to
> > misunderstand.  One can be led to believe a file is excluded only if
> > it matches all of the patterns, not just one.  This patch tries to
> > clarify the function.  It also adjusts the wording of
> > -fprofile-filter-files accordingly.
> 
> Hey.
> 
> The patch is fine. Thank you for the improvement.
> 
> Please install the patch.
I'm pretty sure Goran doesn't have write access.  So I committed the patch for
him.

Jeff



Duplicate .debug_lines (Was: [PATCH 5/5] Add --gdwarf-5 to ASM_SPEC)

2020-08-26 Thread Mark Wielaard
Hi gcc hackers, binutils hackers,

Nick, Jakub and I were discussing the gcc patch below and all the ways
it is wrong. Most things can be fixed in the spec. Like only passing
-gdwarf if we are generating DWARF and passing the right DWARF version
as -gdwarf-N for the version that gcc itself creates. But whether or
not we want gas to generate .debug_line info is a bit tricky. But when
giving -gdwarf-N gas will always try to generate a .debug_line section
and error out when there is already one.

Would it be possible to have something like the following in gas, so
that it doesn't try generating a .debug_line section if there already
is one, even when -gdwarf-N is given (unless the assembly also
contains .loc directives because that shows the user is really
confused)?

diff --git a/gas/dwarf2dbg.c b/gas/dwarf2dbg.c
index e4ba56d82ba..c0c09f4e9d0 100644
--- a/gas/dwarf2dbg.c
+++ b/gas/dwarf2dbg.c
@@ -2626,7 +2626,7 @@ dwarf2_init (void)
 
 
 /* Finish the dwarf2 debug sections.  We emit .debug.line if there
-   were any .file/.loc directives, or --gdwarf2 was given, or if the
+   were any .file/.loc directives, or --gdwarf2 was given, and if the
file has a non-empty .debug_info section and an empty .debug_line
section.  If we emit .debug_line, and the .debug_info section is
empty, we also emit .debug_info, .debug_aranges and .debug_abbrev.
@@ -2650,9 +2650,16 @@ dwarf2_finish (void)
   empty_debug_line = line_seg == NULL || !seg_not_empty_p (line_seg);
 
   /* We can't construct a new debug_line section if we already have one.
- Give an error.  */
+ Give an error if we have seen any .loc, otherwise trust the user
+ knows what they are doing and want to generate the .debug_line
+ (and all other debug sections) themselves.  */
   if (all_segs && !empty_debug_line)
-as_fatal ("duplicate .debug_line sections");
+{
+  if (dwarf2_loc_directive_seen)
+   as_fatal ("duplicate .debug_line sections");
+  else
+   return;
+}
 
   if ((!all_segs && emit_other_sections)
   || (!emit_other_sections && !empty_debug_line))

On Mon, Aug 24, 2020 at 02:56:58PM +0200, Mark Wielaard wrote:
> This is needed to get DWARF version 5 .debug_line tables.
> It is also obviously wrong. It needs a check for whether as supports
> --gdwarf- for all versions we support and it should only
> be added when generating DWARF debug information for the specific
> DWARF version we are generating.
> 
> It also needs some fixes to binutils, to make sure the line table is
> generated correctly:
> https://sourceware.org/pipermail/binutils/2020-August/112685.html
> And to make sure it can read the generated line table itself:
> https://sourceware.org/pipermail/binutils/2020-August/112966.html
> ---
>  gcc/gcc.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/gcc/gcc.c b/gcc/gcc.c
> index 10bc9881aed3..98b10e7cd154 100644
> --- a/gcc/gcc.c
> +++ b/gcc/gcc.c
> @@ -1882,6 +1882,11 @@ init_spec (void)
>}
>  #endif
>  
> +  static const char dv[] = "--gdwarf-5 ";
> +  obstack_grow (&obstack, dv, sizeof (dv) - 1);
> +  obstack_grow0 (&obstack, asm_spec, strlen (asm_spec));
> +  asm_spec = XOBFINISH (&obstack, const char *);
> +
>  #if defined LINK_EH_SPEC || defined LINK_BUILDID_SPEC || \
>  defined LINKER_HASH_STYLE
>  # ifdef LINK_BUILDID_SPEC
> -- 
> 2.18.4
> 


Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread Jeff Law via Gcc-patches
On Wed, 2020-08-26 at 22:23 +0100, Roger Sayle wrote:
> The failure of slsr-13.c is not caused by the patchh3.txt, but the previous 
> patchh2.txt
> that's now on mainline and the gcc-10 branch.  That change provided more 
> accurate
> rtx_costs for hppa, and solved the performance problems with synth_mult.
> 
> These more accurate target rtx_costs are used by the 
> gimple-ssa-strength-reduction.c
> (via a call to mult_by_coeff_cost) to decide whether applying strength 
> reduction would
> be profitable.  This test case, slsr-13.c, assumes that two multiplications 
> by four are
> cheaper than two multiplications by five.   (I believe) This is not the case 
> on hppa which
> has a sh2add instruction, that performs a multiplication by five in one 
> cycle, or exactly
> the same cost as performing a left shift by two (i.e. a multiplication by 
> four).  Oddly, I
> also believe this isn't the case on x86_64, where the similar lea instruction 
> is (sometimes)
> as efficient as left shift by two bits.
Yea, you can do a multiplication by 5 cheap on the PA.  While the x86 can too, I
don't think it's as clear cut a win as the PA, so they may not cost the same as 
a
multiply by 4 or left shift by 2.


> 
> I suspect that slsr-13.c should be expected to fail on some platforms 
> depending upon 
> a targets instruction set/timings.
Sounds like you're right since it depends on mult_by_coeff_cost under the hood 
:(
 I presume you or John will xfail it for the PA.


> 
> Unfortunately, to complicate things in our case, it appears that after RTL 
> optimizations,
> performing this strength reduction actually does results in fewer 
> instructions on the PA,
> so it's the right thing to do.  I'll need to study the logic in 
> gimple-ssa-strength to see
> how mult_by_coeff cost is being used; cost(x*4) == cost(x*5), but cost(x*4+y) 
> < cost(x*5+y).
Yea, x*4+y is cheaper than x*5+y on the PA.  The first is a single sh2add, the
second requires an additional add instruction.

Jeff



Re: [PATCH] [AVX512] [PR87767] Optimize memory broadcast for constant vector under AVX512

2020-08-26 Thread Jeff Law via Gcc-patches
On Tue, 2020-08-04 at 14:05 +0800, Hongtao Liu via Gcc-patches wrote:
> Update patch.
> 
> There are a lot of avx512 define_insns which lack corresponding memory
> broadcast version, i only add *avx512f_mul3_bcst and
> *avx512dq_mul3_bcst in this patch.
> 
> On Fri, Jul 24, 2020 at 10:37 AM Hongtao Liu  wrote:
> > On Thu, Jul 23, 2020 at 9:53 PM Hongtao Liu  wrote:
> > > On Thu, Jul 23, 2020 at 4:39 PM Jan Hubicka  wrote:
> > > > Hello,
> > > > sorry for taking so long to get to this.
> > > > > diff --git a/gcc/config/i386/i386-features.c 
> > > > > b/gcc/config/i386/i386-features.c
> > > > > index 535fc7e981d..8f81d101382 100644
> > > > > --- a/gcc/config/i386/i386-features.c
> > > > > +++ b/gcc/config/i386/i386-features.c
> > > > > @@ -2379,6 +2379,152 @@ make_pass_remove_partial_avx_dependency 
> > > > > (gcc::context *ctxt)
> > > > >return new pass_remove_partial_avx_dependency (ctxt);
> > > > >  }
> > > > > 
> > > > > +/* Replace all one-value const vector that are referenced by 
> > > > > SYMBOL_REFs in x
> > > > > +   with embedded broadcast. i.e.transform
> > > > > +
> > > > > + vpaddq .LC0(%rip), %zmm0, %zmm0
> > > > > + ret
> > > > > +  .LC0:
> > > > > +.quad 3
> > > > > +.quad 3
> > > > > +.quad 3
> > > > > +.quad 3
> > > > > +.quad 3
> > > > > +.quad 3
> > > > > +.quad 3
> > > > > +.quad 3
> > > > > +
> > > > > +to
> > > > > +
> > > > > + vpaddq .LC0(%rip){1to8}, %zmm0, %zmm0
> > > > 
> > > > It seems to me that having a special purpose pass for this is bit
> > > > overzelaous.  It seems to me that you can do same pattern matching via
> > > > splitter and fit it into the usual insn splitting pass?
> > > > 
> > > 
> > > From an implementation perspective, there could be lots of work, since
> > > memory embedding broadcast is available for nearly every instruction
> > > in AVX512. And for new added AVX512 instructions, we also need to add
> > > a define_split for them.
> > > 
> > 
> > 


> > +/* For const vector having one duplicated value, there's no need to put
> > > > > +   whole vector in the constant pool when target supports embedded 
> > > > > broadcast. */
> > > > > +static unsigned int
> > > > > +constant_pool_broadcast (void)
> > > > > +{
> > > > > +  timevar_push (TV_MACH_DEP);
> > > > > +  rtx_insn *insn;
> > > > > +
> > > > > +  for (insn = get_insns (); insn; insn = NEXT_INSN (insn))
> > > > > +{
> > > > > +  if (!INSN_P (insn))
> > > > > + continue;
> > > > > +
> > > > > +  /* Insns may appear inside a SEQUENCE.  Only check the 
> > > > > patterns of
> > > > > +  insns, not any notes that may be attached.  We don't want to 
> > > > > mark
> > > > > +  a constant just because it happens to appear in a REG_EQUIV 
> > > > > note.  */
Under what circumstances are we seeing a SEQUENCE in the x86 backend?  I'm
surprised we need to handle that case.

So your pass modifies the insn in place, which is fine.  But do we actually
remove the original constant pool entry if it's no longer used?  If not, does
this patch actually save anything (memory bandwidth perhaps?)

Is there an existing pass over the RTL chain where this would work so that it's
more compile-time efficient?

jeff



RE: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread Roger Sayle


The failure of slsr-13.c is not caused by the patchh3.txt, but the previous 
patchh2.txt
that's now on mainline and the gcc-10 branch.  That change provided more 
accurate
rtx_costs for hppa, and solved the performance problems with synth_mult.

These more accurate target rtx_costs are used by the 
gimple-ssa-strength-reduction.c
(via a call to mult_by_coeff_cost) to decide whether applying strength 
reduction would
be profitable.  This test case, slsr-13.c, assumes that two multiplications by 
four are
cheaper than two multiplications by five.   (I believe) This is not the case on 
hppa which
has a sh2add instruction, that performs a multiplication by five in one cycle, 
or exactly
the same cost as performing a left shift by two (i.e. a multiplication by 
four).  Oddly, I
also believe this isn't the case on x86_64, where the similar lea instruction 
is (sometimes)
as efficient as left shift by two bits.

I suspect that slsr-13.c should be expected to fail on some platforms depending 
upon 
a targets instruction set/timings.

Unfortunately, to complicate things in our case, it appears that after RTL 
optimizations,
performing this strength reduction actually does results in fewer instructions 
on the PA,
so it's the right thing to do.  I'll need to study the logic in 
gimple-ssa-strength to see
how mult_by_coeff cost is being used; cost(x*4) == cost(x*5), but cost(x*4+y) < 
cost(x*5+y).

My apologies for the inconvenience.

Roger
--

-Original Message-
From: John David Anglin  
Sent: 26 August 2020 21:34
To: l...@redhat.com; Roger Sayle ; 'GCC Patches' 

Subject: Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

On 2020-08-26 4:08 p.m., Jeff Law wrote:
> It 3-stages, but trips:
> Tests that now fail, but worked before (5 tests):
>
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2 
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2 
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0 
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
In my last Linux build, these failed:

FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " * 4" 2
FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " * 5" 0

The above didn't include patchh3.txt.  Seems to be a difference in quoting.  
I'll check.
Linux build is close to running this test.

These tests didn't fail on 32-bit hpux without patchh2.txt or in previous Linux 
build.

Dave

--
John David Anglin  dave.ang...@bell.net




Re: [PATCH] rs6000: Disable -fcaller-saves by default

2020-08-26 Thread Jeff Law via Gcc-patches
On Wed, 2020-08-26 at 15:58 -0500, Segher Boessenkool wrote:
> On Tue, Aug 25, 2020 at 02:35:51PM -0600, Jeff Law wrote:
> > I've gone back and forth on pre allocation splitting as well as 
> > post-allocating
> > splitting and re-allocation.   I could argue either side of that discussion 
> > --
> 
> If you end up wanting something split it is best to do it early. 
The problem with early splitting is you don't know if you want to split or not.
 You're making an educated guess how the split will interact with register
allocation.  If you guess wrong, often the allocator will save you and assign 
the
split pseudos to the same hard reg, but not always.  Been there more than once.



>  But if
> you cannot predict well if you want it split eventually, there is no
> good way to do this afaics :-(
Precisely.  In the old LRS bits (Meissner with a little help from me, circa 
1995)
we actually recorded the original pseudo as well as all its split versions in a
new RTL node.  We could then undo the splitting if it wasn't helpful.  IIRC we
also use that node to improve debugging of objects that were subject to LRS.

It worked reasonably well, but was fairly hard to maintain.  I think we managed
to drop in some of the infrastructure into EGCS and shipped to to our customer,
but I don't think we ever got the main pass submitted.  Eventually it got 
dropped
as it was fairly painful to maintain.

jeff



Re: [PATCH 3/3] vec: use inexact growth where possible.

2020-08-26 Thread Jeff Law via Gcc-patches
On Tue, 2020-08-11 at 13:37 +0200, Martin Liška wrote:
> From cc1d41a469d76f2f8e4f44bed788ace77a1c6d62 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Mon, 10 Aug 2020 12:09:19 +0200
> Subject: [PATCH 3/3] vec: use inexact growth where possible.
> 
> gcc/ChangeLog:
> 
>   * cfgrtl.c (rtl_create_basic_block): Use default value for
>   growth vector function.
>   * gimple.c (gimple_set_bb): Likewise.
>   * symbol-summary.h: Likewise.
>   * tree-cfg.c (init_empty_tree_cfg_for_function): Likewise.
>   (build_gimple_cfg): Likewise.
>   (create_bb): Likewise.
>   (move_block_to_fn): Likewise.
I'll note that in some cases we were avoiding exponential growth in our new size
computations.  Presumably the inexact growth support will do something similar,
even if it's not exactly the same.  Right?  Assuming that's the case this is OK
too.

jeff



Re: [PATCH 2/3] vec: default exact = false in grow functions.

2020-08-26 Thread Jeff Law via Gcc-patches
On Tue, 2020-08-11 at 13:37 +0200, Martin Liška wrote:
> From 292532ea9e3d42ca164b9951674c1eccc86a1f11 Mon Sep 17 00:00:00 2001
> From: Martin Liska 
> Date: Mon, 10 Aug 2020 12:01:59 +0200
> Subject: [PATCH 2/3] vec: default exect = false in grow functions.
> 
> gcc/ChangeLog:
> 
>   * vec.h (vec_safe_grow): Change default of exact to false.
>   (vec_safe_grow_cleared): Likewise.
And this is fine when you're ready to install it.
jeff



Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-26 Thread Jeff Law via Gcc-patches
On Tue, 2020-08-11 at 13:36 +0200, Martin Liška wrote:
> Hello.
> 
> All right, I did it in 3 steps:
> 1) - new exact argument is added (no default value) - I tested the on 
> x86_64-linux-gnu
> and I build all cross targets.
> 2) set default value of exact = false
> 3) change places which calculate its own growth to use the default argument
> 
> I would like to install first 1) and then wait some time before the rest is 
> installed.
> 
> Thoughts?
Given this is how Richi wanted the series to play out.  #1 is fine with me as is
the plan for the kit as a whole.

Jeff
> 



Re: [PATCH] rs6000: Disable -fcaller-saves by default

2020-08-26 Thread Segher Boessenkool
On Tue, Aug 25, 2020 at 02:35:51PM -0600, Jeff Law wrote:
> I've gone back and forth on pre allocation splitting as well as 
> post-allocating
> splitting and re-allocation.   I could argue either side of that discussion --

If you end up wanting something split it is best to do it early.  But if
you cannot predict well if you want it split eventually, there is no
good way to do this afaics :-(

> given we've got a bit of special code for splitting to help shrink wrapping,

Only very minimal stuff for splitting copies from a hard reg (argument reg)
to a pseudo.  And then only for all together.

> I'd probably start by making sure the cost computation is sane though.

Yeah.


Segher


Re: [PATCH 3/3] libstdc++: Implement remaining piece of LWG 3448

2020-08-26 Thread Jonathan Wakely via Gcc-patches

On 26/08/20 16:44 -0400, Patrick Palka via Libstdc++ wrote:

Almost all of the proposed resolution for LWG 3448 is already
implemented; the only part left is to adjust the return type of
transform_view::sentinel::operator-.

libstdc++-v3/ChangeLog:

LWG 3448
PR libstdc++/95322
* include/std/ranges (transform_view::__distance_from): Give
this a deduced return type.
(transform_view::sentinel::operator-): Adjust the return type so
that it's based on the constness of the iterator rather than
that of the sentinel.
* testsuite/std/ranges/adaptors/95322.cc: Refer to LWG 3488.


Also OK, thanks for fixing these.




Re: [PATCH 2/3] libstdc++: elements_view's sentinel and iterator not comparable [LWG 3406]

2020-08-26 Thread Jonathan Wakely via Gcc-patches

On 26/08/20 16:44 -0400, Patrick Palka via Libstdc++ wrote:

This implements the proposed resolution for LWG 3406 and adds a testcase
for the example from P1994R1.

libstdc++-v3/ChangeLog:

LWG 3406
* include/std/ranges (elements_view::begin): Adjust constraints.
(elements_view::end): Likewise.
(elements_view::_Sentinel::operator==): Templatize to take both
_Iterator and _Iterator.
(elements_view::_Sentinel::operator-): Likewise.
* testsuite/std/ranges/adaptors/elements.cc: Add testcase for
the example from P1994R1.
* testsuite/std/range/adaptors/lwg3406.cc: New test.


OK, thanks.



Re: [PATCH 1/3] libstdc++: Implement P1994R1 changes to ranges::elements_view

2020-08-26 Thread Jonathan Wakely via Gcc-patches

On 26/08/20 16:44 -0400, Patrick Palka via Libstdc++ wrote:

The example from the paper doesn't compile without the proposed
resolution for LWG 3406, so we'll add a testcase for this once the
proposed resolution is in.

libstdc++-v3/ChangeLog:

P1994R1: elements_view needs its own sentinel.
* include/std/ranges (elements_view::end): Replace these two
overloads with four new overloads.
(elements_view::_Iterator::operator==): Remove.
(elements_view::_Iterator::operator-): Likewise.
(elements_view::_Sentinel): Define.


OK, thanks.




Re: [PATCH] rs6000, restrict bfloat convert intrinsic to Power 10. Fix BU_P10V macro definitions.

2020-08-26 Thread Segher Boessenkool
Hi!

(All that Will says included by reference ;-) )

On Mon, Aug 24, 2020 at 02:39:41PM -0700, Carl Love wrote:
> I attempted to do the backport however the patch doesn't even come
> close to applying.  The names XVCVBF16SPN and XVCVSPBF16 are the only
> two builtin names that exist in GCC 10.  The other issue is there is no
> Power 10 builtin macro definitions in GCC 10.  So basically, I can fix
> the issue with XVCVBF16SPN and XVCVSPBF16 to be restricted to Power 10
> by adding the needed Power 10 macro definition.  

Okay, that is what we should do I agree.

> This is a whole new patch so I figure it needs to be reviewed to make
> sure we want to make this change to GCC 10.  I did run the regression
> tests again using a Power 9 machine to verify it complies and there are
> no regression test failures.  
> 
> Please let me know if this is OK for the GCC 10 tree.  Thanks.

The patch looks fine.  It now is impossible to write a correct
changelog for a backport like this, so I won't review that part.

Please make it clear that this is a partial backport in the commit
message (and commit of what ofc).  Okay for trunk with that.  Thanks!


Segher


[PATCH 3/3] libstdc++: Implement remaining piece of LWG 3448

2020-08-26 Thread Patrick Palka via Gcc-patches
Almost all of the proposed resolution for LWG 3448 is already
implemented; the only part left is to adjust the return type of
transform_view::sentinel::operator-.

libstdc++-v3/ChangeLog:

LWG 3448
PR libstdc++/95322
* include/std/ranges (transform_view::__distance_from): Give
this a deduced return type.
(transform_view::sentinel::operator-): Adjust the return type so
that it's based on the constness of the iterator rather than
that of the sentinel.
* testsuite/std/ranges/adaptors/95322.cc: Refer to LWG 3488.
---
 libstdc++-v3/include/std/ranges| 18 +-
 .../testsuite/std/ranges/adaptors/95322.cc |  2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 42028354b81..2d0017f1750 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1865,7 +1865,7 @@ namespace views
  using _Base = __detail::__maybe_const_t<_Const, _Vp>;
 
  template
-   constexpr range_difference_t<_Base>
+   constexpr auto
__distance_from(const _Iterator<_Const2>& __i) const
{ return _M_end - __i._M_current; }
 
@@ -1902,17 +1902,17 @@ namespace views
operator==(const _Iterator<_Const2>& __x, const _Sentinel& __y)
{ return __y.__equal(__x); }
 
- template
-   requires sized_sentinel_for,
-  iterator_t<__detail::__maybe_const_t<_Const2, _Vp>>>
-   friend constexpr range_difference_t<_Base>
+ template>
+   requires sized_sentinel_for, iterator_t<_Base2>>
+   friend constexpr range_difference_t<_Base2>
operator-(const _Iterator<_Const2>& __x, const _Sentinel& __y)
{ return -__y.__distance_from(__x); }
 
- template
-   requires sized_sentinel_for,
-  iterator_t<__detail::__maybe_const_t<_Const2, _Vp>>>
-   friend constexpr range_difference_t<_Base>
+ template>
+   requires sized_sentinel_for, iterator_t<_Base2>>
+   friend constexpr range_difference_t<_Base2>
operator-(const _Sentinel& __y, const _Iterator<_Const2>& __x)
{ return __y.__distance_from(__x); }
 
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/95322.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/95322.cc
index 9279d5520a8..67bc7d33917 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/95322.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/95322.cc
@@ -26,7 +26,7 @@ using __gnu_test::test_forward_range;
 void
 test01()
 {
-  // PR libstdc++/95322
+  // PR libstdc++/95322 and LWG 3488
   int a[2]{1, 2};
   test_forward_range v{a};
   auto view1 = v | std::views::take(2);
-- 
2.28.0.337.ge9b77c84a0



[PATCH 2/3] libstdc++: elements_view's sentinel and iterator not comparable [LWG 3406]

2020-08-26 Thread Patrick Palka via Gcc-patches
This implements the proposed resolution for LWG 3406 and adds a testcase
for the example from P1994R1.

libstdc++-v3/ChangeLog:

LWG 3406
* include/std/ranges (elements_view::begin): Adjust constraints.
(elements_view::end): Likewise.
(elements_view::_Sentinel::operator==): Templatize to take both
_Iterator and _Iterator.
(elements_view::_Sentinel::operator-): Likewise.
* testsuite/std/ranges/adaptors/elements.cc: Add testcase for
the example from P1994R1.
* testsuite/std/range/adaptors/lwg3406.cc: New test.
---
 libstdc++-v3/include/std/ranges   | 39 ---
 .../testsuite/std/ranges/adaptors/elements.cc | 22 +
 .../testsuite/std/ranges/adaptors/lwg3406.cc  | 48 +++
 3 files changed, 93 insertions(+), 16 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/ranges/adaptors/lwg3406.cc

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index efa8d2cf9f4..42028354b81 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3362,15 +3362,15 @@ namespace views
   { return _Iterator(ranges::begin(_M_base)); }
 
   constexpr auto
-  begin() const requires __detail::__simple_view<_Vp>
+  begin() const requires range
   { return _Iterator(ranges::begin(_M_base)); }
 
   constexpr auto
-  end()
+  end() requires (!__detail::__simple_view<_Vp> && !common_range<_Vp>)
   { return _Sentinel{ranges::end(_M_base)}; }
 
   constexpr auto
-  end() requires common_range<_Vp>
+  end() requires (!__detail::__simple_view<_Vp> && common_range<_Vp>)
   { return _Iterator{ranges::end(_M_base)}; }
 
   constexpr auto
@@ -3576,19 +3576,26 @@ namespace views
  base() const
  { return _M_end; }
 
- friend constexpr bool
- operator==(const _Iterator<_Const>& __x, const _Sentinel& __y)
- { return __y._M_equal(__x); }
-
- friend constexpr range_difference_t<_Base>
- operator-(const _Iterator<_Const>& __x, const _Sentinel& __y)
-   requires sized_sentinel_for, iterator_t<_Base>>
- { return __x._M_current - __y._M_end; }
-
- friend constexpr range_difference_t<_Base>
- operator-(const _Sentinel& __x, const _Iterator<_Const>& __y)
-   requires sized_sentinel_for, iterator_t<_Base>>
- { return __x._M_end - __y._M_current; }
+ template
+   requires sentinel_for,
+  iterator_t<__detail::__maybe_const_t<_Const2, _Vp>>>
+   friend constexpr bool
+   operator==(const _Iterator<_Const2>& __x, const _Sentinel& __y)
+   { return __y._M_equal(__x); }
+
+ template>
+   requires sized_sentinel_for, iterator_t<_Base2>>
+   friend constexpr range_difference_t<_Base2>
+   operator-(const _Iterator<_Const2>& __x, const _Sentinel& __y)
+   { return __x._M_current - __y._M_end; }
+
+ template>
+   requires sized_sentinel_for, iterator_t<_Base2>>
+   friend constexpr range_difference_t<_Base>
+   operator-(const _Sentinel& __x, const _Iterator<_Const2>& __y)
+   { return __x._M_end - __y._M_current; }
 
  friend _Sentinel;
};
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
index d846c4cf33e..af5c2d43bb6 100644
--- a/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/elements.cc
@@ -45,8 +45,30 @@ test01()
   VERIFY( ranges::equal(v1, x | views::values) );
 }
 
+struct S
+{
+  friend bool
+  operator==(std::input_iterator auto const& i, S)
+  { return std::get<1>(*i) == 0; }
+};
+
+void
+test02()
+{
+  // This verifies that P1994R1 (and LWG3406) is implemented.
+  std::pair, long> x[]
+= {{{1,2},3l}, {{1,0},2l}, {{1,2},0l}};
+  ranges::subrange r{ranges::begin(x), S{}};
+
+  auto v = r | views::keys;
+  VERIFY( ranges::equal(v, (std::pair[]){{1,2},{1,0}}) );
+  ranges::subrange v2 = {ranges::begin(v), S{}};
+  VERIFY( ranges::equal(v2, (std::pair[]){{1,2}}) );
+}
+
 int
 main()
 {
   test01();
+  test02();
 }
diff --git a/libstdc++-v3/testsuite/std/ranges/adaptors/lwg3406.cc 
b/libstdc++-v3/testsuite/std/ranges/adaptors/lwg3406.cc
new file mode 100644
index 000..d36db22aad5
--- /dev/null
+++ b/libstdc++-v3/testsuite/std/ranges/adaptors/lwg3406.cc
@@ -0,0 +1,48 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied wa

[PATCH 1/3] libstdc++: Implement P1994R1 changes to ranges::elements_view

2020-08-26 Thread Patrick Palka via Gcc-patches
The example from the paper doesn't compile without the proposed
resolution for LWG 3406, so we'll add a testcase for this once the
proposed resolution is in.

libstdc++-v3/ChangeLog:

P1994R1: elements_view needs its own sentinel.
* include/std/ranges (elements_view::end): Replace these two
overloads with four new overloads.
(elements_view::_Iterator::operator==): Remove.
(elements_view::_Iterator::operator-): Likewise.
(elements_view::_Sentinel): Define.
---
 libstdc++-v3/include/std/ranges | 74 ++---
 1 file changed, 60 insertions(+), 14 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 9d22b138082..efa8d2cf9f4 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -3366,12 +3366,20 @@ namespace views
   { return _Iterator(ranges::begin(_M_base)); }
 
   constexpr auto
-  end() requires (!__detail::__simple_view<_Vp>)
-  { return ranges::end(_M_base); }
+  end()
+  { return _Sentinel{ranges::end(_M_base)}; }
 
   constexpr auto
-  end() const requires __detail::__simple_view<_Vp>
-  { return ranges::end(_M_base); }
+  end() requires common_range<_Vp>
+  { return _Iterator{ranges::end(_M_base)}; }
+
+  constexpr auto
+  end() const requires range
+  { return _Sentinel{ranges::end(_M_base)}; }
+
+  constexpr auto
+  end() const requires common_range
+  { return _Iterator{ranges::end(_M_base)}; }
 
   constexpr auto
   size() requires sized_range<_Vp>
@@ -3382,6 +3390,9 @@ namespace views
   { return ranges::size(_M_base); }
 
 private:
+  template
+   struct _Sentinel;
+
   template
struct _Iterator
{
@@ -3484,10 +3495,6 @@ namespace views
requires equality_comparable>
  { return __x._M_current == __y._M_current; }
 
- friend constexpr bool
- operator==(const _Iterator& __x, const sentinel_t<_Base>& __y)
- { return __x._M_current == __y; }
-
  friend constexpr bool
  operator<(const _Iterator& __x, const _Iterator& __y)
requires random_access_range<_Base>
@@ -3536,15 +3543,54 @@ namespace views
requires random_access_range<_Base>
  { return __x._M_current - __y._M_current; }
 
- friend constexpr difference_type
- operator-(const _Iterator<_Const>& __x, const sentinel_t<_Base>& __y)
+ friend _Sentinel<_Const>;
+   };
+
+  template
+   struct _Sentinel
+   {
+   private:
+ constexpr bool
+ _M_equal(const _Iterator<_Const>& __x) const
+ { return __x._M_current == _M_end; }
+
+ using _Base = __detail::__maybe_const_t<_Const, _Vp>;
+ sentinel_t<_Base> _M_end = sentinel_t<_Base>();
+
+   public:
+ _Sentinel() = default;
+
+ constexpr explicit
+ _Sentinel(sentinel_t<_Base> __end)
+   : _M_end(std::move(__end))
+ { }
+
+ constexpr
+ _Sentinel(_Sentinel __other)
+   requires _Const
+ && convertible_to, sentinel_t<_Base>>
+   : _M_end(std::move(__other._M_end))
+ { }
+
+ constexpr sentinel_t<_Base>
+ base() const
+ { return _M_end; }
+
+ friend constexpr bool
+ operator==(const _Iterator<_Const>& __x, const _Sentinel& __y)
+ { return __y._M_equal(__x); }
+
+ friend constexpr range_difference_t<_Base>
+ operator-(const _Iterator<_Const>& __x, const _Sentinel& __y)
requires sized_sentinel_for, iterator_t<_Base>>
- { return __x._M_current - __y; }
+ { return __x._M_current - __y._M_end; }
 
- friend constexpr difference_type
- operator-(const sentinel_t<_Base>& __x, const _Iterator<_Const>& __y)
+ friend constexpr range_difference_t<_Base>
+ operator-(const _Sentinel& __x, const _Iterator<_Const>& __y)
requires sized_sentinel_for, iterator_t<_Base>>
- { return -(__y - __x); }
+ { return __x._M_end - __y._M_current; }
+
+ friend _Sentinel;
};
 
   _Vp _M_base = _Vp();
-- 
2.28.0.337.ge9b77c84a0



Re: [PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2020-08-26 Thread François Dumont via Gcc-patches
Deeply sorry, I indeed didn't sent the patch I wanted to commit. It was 
in 3 commits on my side and it looks like I had miss the last one 
regarding the changes for _ExtractKey.


But moreover I had change the ebo helper index wrongly, I missed the abi 
change here, thanks for fixing it.



On 26/08/20 6:45 pm, Jonathan Wakely wrote:

On 26/08/20 16:58 +0100, Jonathan Wakely wrote:

On 26/08/20 16:55 +0100, Jonathan Wakely wrote:

On 26/08/20 16:30 +0100, Jonathan Wakely wrote:

I'm seeing new FAILures with this:

FAIL: 20_util/function_objects/searchers.cc (test for excess errors)
UNRESOLVED: 20_util/function_objects/searchers.cc compilation 
failed to produce executable

FAIL: experimental/functional/searchers.cc (test for excess errors)
UNRESOLVED: experimental/functional/searchers.cc compilation failed 
to produce executable


It looks like what you committed is not what you sent for review. The
patch sent for review has:

/// Specialization: hash function and range-hashing function, no
/// caching of hash codes.
/// Provides typedef and accessor required by C++ 11.
template
-    struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
+  typename _Hash, typename _RangeHash, typename _Unused>
+    struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, 
_RangeHash,

+  _Unused, false>
  : private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  private _Hashtable_ebo_helper<1, _Hash>
  {


But what you committed has:

/// Specialization: hash function and range-hashing function, no
/// caching of hash codes.
/// Provides typedef and accessor required by C++ 11.
template
-    struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
-    : private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  typename _Hash, typename _RangeHash, typename _Unused>
+    struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, 
_RangeHash,

+  _Unused, false>
+    : private _Hashtable_ebo_helper<0, _Hash>
  {


Note that you've changed the type of the base class from:

+  private _Hashtable_ebo_helper<1, _Hash>

to

+  private _Hashtable_ebo_helper<0, _Hash>

This causes an ambiguity:

/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/hashtable_policy.h:1706: 
error: 'std::__detail::_Hashtable_ebo_helper<0, test03()::struct>, true>' is an ambiguous base of 
'std::__detail::_Hashtable_baseint>, std::__detail::_Select1st, test03()::, 
test03()::, std::__detail::_Mod_range_hashing, 
std::__detail::_Default_ranged_hash, 
std::__detail::_Hashtable_traits >'


However, what I don't understand is why we are storing that _Hash type
more than once as a base class. That seems wrong (but not something we
can change without ABI impact).


Ah, we're not storing it more than once.

The problem is:

template
struct _Hashtable_base
: public _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2, _Hash,
   _Traits::__hash_cached::value>,
  private _Hashtable_ebo_helper<0, _Equal>

This has a base of _Hashtable_ebo_helper<0, _Equal> so it used to
have these bases:

_Hashtable_ebo_helper<0, _ExtractKey>
_Hashtable_ebo_helper<1, _Hash>
_Hashtable_ebo_helper<2, _RangeHash>
_Hashtable_ebo_helper<0, _Equal>

but after your change it has these bases:

_Hashtable_ebo_helper<0, _Hash>
_Hashtable_ebo_helper<0, _Equal>

In the case
where _Equal and _Hash are the same type (which is what I was testing
in the test that fail, because I'm sneaky) that means:

_Hashtable_ebo_helper<0, T>
_Hashtable_ebo_helper<0, T>

which is obviously ambiguous.

I think the _hash_code_base should still use the index 1 for its base
class, i.e. _Hashtable_ebo_helper<1, _Hash>. That way we have these:

_Hashtable_ebo_helper<1, _Hash>
_Hashtable_ebo_helper<0, _Equal>

which works even if they're the same types.


Here's a minimal test:

#include 

struct Evil : std::hash, std::equal_to
{
};

int main()
{
 std::unordered_map h;
 h.key_eq();
}

This fails on current trunk.


Fixed by the attached patch.

Tested powerpc64le-linux, committed to trunk.






Re: [PATCH] separate reading past the end from -Wstringop-overflow

2020-08-26 Thread Jeff Law via Gcc-patches
On Tue, 2020-06-23 at 20:05 -0600, Martin Sebor wrote:
> -Wstringop-overflow is issued for both writing and reading past
> the end, even though the latter problem is often viewed as being
> lower severity than the former, or at least sufficiently different
> to triage separately.  In CWE, for example, each of the two kinds
> of problems has its own classification (CWE-787: Out-of-bounds
> Write, and CWE-125: Out-of-bounds Read).
> 
> To make this easier with GCC, the attached patch introduces
> a new option called -Wstringop-overread and splits the current
> indiscriminate warning into the respective two categories.  Other
> than the new option the improvements also exposed a few instances
> of reading past the end that GCC doesn't detect that the new code
> does thanks to more consistent checking.
> 
> As usual, I tested the patch by building Binutils/GDB, Glibc, and
> the Linux kernel with no unexpected (or any, in fact) instances
> of the two warnings.
> 
> The work involved more churn that I expected, and maintaining
> the expected precedence of warnings (missing terminating nul,
> excessive bounds, etc.) turned out to be more delicate and time
> consuming than I would have liked.  I think the result is cleaner
> code, but I'm quite sure it can still stand to be made better.
> That's also my plan for the next step when I'd like to move this
> checking out of builtins.c and calls.c and into a file of its own.
> Ultimately, Jeff and have would like to integrate it into the path
> isolation pass.
> 
> Martin
> 
> PS There's a FIXME comment in expand_builtin_memory_chk as
> a reminder of a future change.  It currently has no bearing
> on anything.

> Add -Wstringop-overread for reading past the end by string functions.
> 
> gcc/ChangeLog:
> 
>   * attribs.c (init_attr_rdwr_indices): Use global access_mode.
>   * attribs.h (struct attr_access): Same.
>   * builtins.c (fold_builtin_strlen): Add argument.
>   (compute_objsize): Declare.
>   (get_range): Declare.
>   (check_read_access): New function.
>   (access_ref::access_ref): Define ctor.
>   (warn_string_no_nul): Add arguments.  Handle -Wstrintop-overread.
>   (check_nul_terminated_array): Handle source strings of different
>   ranges of sizes.
>   (expand_builtin_strlen): Remove warning code, call check_read_access
>   instead.  Declare locals closer to their initialization.
>   (expand_builtin_strnlen): Same.
>   (maybe_warn_for_bound): New function.
>   (warn_for_access): Remove argument.  Handle -Wstrintop-overread.
>   (inform_access): Change argument type.
>   (get_size_range): New function.
>   (check_access): Remove unused arguments.  Add new arguments.  Handle
>   -Wstrintop-overread.  Move warning code to helpers and call them.
>   Call check_nul_terminated_array.
>   (check_memop_access): Remove unnecessary and provide additional
>   arguments in calls.
>   (expand_builtin_memchr): Call check_read_access.
>   (expand_builtin_strcat): Remove unnecessary and provide additional
>   arguments in calls.
>   (expand_builtin_strcpy): Same.
>   (expand_builtin_strcpy_args): Same.  Avoid testing no-warning bit.
>   (expand_builtin_stpcpy_1): Remove unnecessary and provide additional
>   arguments in calls.
>   (expand_builtin_stpncpy): Same.
>   (check_strncat_sizes): Same.
>   (expand_builtin_strncat): Remove unnecessary and provide additional
>   arguments in calls.  Adjust comments.
>   (expand_builtin_strncpy): Remove unnecessary and provide additional
>   arguments in calls.
>   (expand_builtin_memcmp): Remove warning code.  Call check_access.
>   (expand_builtin_strcmp): Call check_access instead of
>   check_nul_terminated_array.
>   (expand_builtin_strncmp): Handle -Wstrintop-overread.
>   (expand_builtin_fork_or_exec): Call check_access instead of
>   check_nul_terminated_array.
>   (expand_builtin): Same.
>   (fold_builtin_1): Pass additional argument.
>   (fold_builtin_n): Same.
>   (fold_builtin_strpbrk): Remove calls to check_nul_terminated_array.
>   (expand_builtin_memory_chk): Add comments.
>   (maybe_emit_chk_warning): Remove unnecessary and provide additional
>   arguments in calls.
>   (maybe_emit_sprintf_chk_warning): Same.  Adjust comments.
>   * builtins.h (warn_string_no_nul): Add arguments.
>   (struct access_ref): Add member and ctor argument.
>   (struct access_data): Add members and ctor.
>   (check_access): Adjust signature.
>   * calls.c (maybe_warn_nonstring_arg): Return an indication of
>   whether a warning was issued.  Issue -Wstrintop-overread instead
>   of -Wstringop-overflow.
>   (append_attrname): Adjust to naming changes.
>   (maybe_warn_rdwr_sizes): Same.  Remove unnecessary and provide
>   additional arguments in calls.
>   * calls.h (maybe_warn_nonstring_arg): Return bool.
> 

Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread John David Anglin
On 2020-08-26 4:08 p.m., Jeff Law wrote:
> It 3-stages, but trips:
> Tests that now fail, but worked before (5 tests):
>
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
> gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
In my last Linux build, these failed:

FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " * 4" 2
FAIL: gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " * 5" 0

The above didn't include patchh3.txt.  Seems to be a difference in quoting.  
I'll check.
Linux build is close to running this test.

These tests didn't fail on 32-bit hpux without patchh2.txt or in previous Linux 
build.

Dave

-- 
John David Anglin  dave.ang...@bell.net



RE: [Patch 4/5] rs6000, Test 128-bit shifts for just the int128 type.

2020-08-26 Thread Carl Love via Gcc-patches
Segher:

On Thu, 2020-08-20 at 16:50 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 11, 2020 at 12:23:05PM -0700, Carl Love wrote:
> > +;; 128-bit int modes
> > +(define_mode_iterator VEC_I128 [V1TI TI])
> 
> We already have VSX_TI for this (in vsx.md).  Rename that to
> something
> without VSX, and move it to vector.md or such?  Maybe name it VEC_TI
> or anyTI.
> 
> Do that renaming as a separate patch before this one?  It is
> logically
> separate, and it is boring stuff, so putting it in a separate patch
> makes the non-boring stuff stand out more.
> 
> (It would be better if we could just get rid of V1TI, but that isn't
> going to happen soon).
> 
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -367,7 +367,7 @@
> > UNSPEC_INSERTR
> > UNSPEC_REPLACE_ELT
> > UNSPEC_REPLACE_UN
> > -   UNSPEC_XXSWAPD_V1TI
> > +   UNSPEC_XXSWAPD_VEC_I128
> 
> Why not just UNSPEC_XXSWAPD?  And, why an unspec at all?

I am trying to figure out how to specify this without using an unpsec
per your last comment.  I changed the definition to:

;; Swap upper/lower 64-bit values in V1TI or TI type
   
(define_insn "xxswapd_"   
   
  [(set (match_operand:VEC_I128 0 "vsx_register_operand" "=v")  
   
(vec_select:VEC_I128
   
  (match_operand:VEC_I128 1 "vsx_register_operand" "v") 
   
  (parallel [(const_int 0)])))] 
   
  "TARGET_POWER10"  
   
;; AIX does not support extended mnemonic xxswapd.  Use the basic   
   
;; mnemonic xxpermdi instead.   
   
  "xxpermdi %x0,%x1,%x1,2"  
   
  [(set_attr "type" "vecperm")])


All of the swap definitions that I can see are based on using
vec_select which seems to be the issue here.  Not seeing anyway to do
this without using unspec.  Any thoughts?

  Carl 



Re: [PATCH] hppa: Improve expansion of ashldi3 when !TARGET_64BIT

2020-08-26 Thread Jeff Law via Gcc-patches
On Sun, 2020-08-23 at 00:24 +0100, Roger Sayle wrote:
> Hi Dave,
> I actually think using plus_xor_ior operator is useful.  It means that if 
> combine,
> inlining or some other RTL simplification generates these variants, these 
> forms
> will still be recognized by the backend.  It's more typing, but the compiler 
> produces
> better code.
> 
> Here's what I have so far, but please feel free to modify anything.  I'll 
> leave the
> rest to you.
> 
> With this patch:
> 
> unsigned long long rotl4(unsigned long long x)
> {
>   return (x<<4) | (x>>60);
> }
> 
> unsigned long long rotr4(unsigned long long x)
> {
>   return (x<<60) | (x>>4);
> }
> 
> which previously generated:
> 
> rotl4:depd,z %r26,59,60,%r28
>   extrd,u %r26,3,4,%r26
>   bve (%r2)
>   or %r26,%r28,%r28
> 
> rotr4:extrd,u %r26,59,60,%r28
>   depd,z %r26,3,4,%r26
>   bve (%r2)
>   or %r26,%r28,%r28
> 
> now produces:
> 
> rotl4:bve (%r2)
>   shrpd %r26,%r26,60,%r28
> 
> rotr4:bve (%r2)
>   shrpd %r26,%r26,4,%r28
> 
> 
> I'm guessing this is very similar to what you were thinking (or what I 
> described previously).
> 
> Many thanks again for trying out these patches/suggestions for me.
So I put this one into the tester overnight. 

It 3-stages, but trips:
Tests that now fail, but worked before (5 tests):

gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 4" 2
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
gcc.dg/tree-ssa/slsr-13.c scan-tree-dump-times optimized " \\* 5" 0
gcc.target/hppa/shadd-2.c scan-assembler-times sh.add 2

I think we've already discussed shadd-2.  It's not immediately clear if the slsr
failure is due to this change or something different -- the latter seems like a
definite possibility as we're changing things as we leave gimple, but the test 
is
checking the result of the gimple optimizers.

Your call on how to proceed.  I can put additional patches into the tester
easily, so if you've got something you want investigated, don't hesitate to 
reach
out.

jeff



Re: [PATCH] x86: Reject target("no-general-regs-only")

2020-08-26 Thread Uros Bizjak via Gcc-patches
> Reject target("no-general-regs-only") pragma and attribute.
>
> gcc/
>
> PR target/96802
> * config/i386/i386-options.c (ix86_valid_target_attribute_inner_p):
> Reject target("no-general-regs-only").
>
> gcc/testsuite/
>
> PR target/96802
> * gcc.target/i386/pr96802-1.c: New test.
> * gcc.target/i386/pr96802-2.c: Likewise.

OK.

Thanks,
Uros.


[PATCH] x86: Reject target("no-general-regs-only")

2020-08-26 Thread H.J. Lu via Gcc-patches
Reject target("no-general-regs-only") pragma and attribute.

gcc/

PR target/96802
* config/i386/i386-options.c (ix86_valid_target_attribute_inner_p):
Reject target("no-general-regs-only").

gcc/testsuite/

PR target/96802
* gcc.target/i386/pr96802-1.c: New test.
* gcc.target/i386/pr96802-2.c: Likewise.
---
 gcc/config/i386/i386-options.c|  7 +++
 gcc/testsuite/gcc.target/i386/pr96802-1.c | 12 
 gcc/testsuite/gcc.target/i386/pr96802-2.c | 16 
 3 files changed, 35 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr96802-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr96802-2.c

diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c
index e0fc68c27bf..b93c338346f 100644
--- a/gcc/config/i386/i386-options.c
+++ b/gcc/config/i386/i386-options.c
@@ -1189,6 +1189,13 @@ ix86_valid_target_attribute_inner_p (tree fndecl, tree 
args, char *p_strings[],
{
  if (mask == OPTION_MASK_GENERAL_REGS_ONLY)
{
+ if (!opt_set_p)
+   {
+ error_at (loc, "pragma or attribute %  "
+   "does not allow a negated form", p);
+ return false;
+   }
+
  if (type != ix86_opt_ix86_yes)
gcc_unreachable ();
 
diff --git a/gcc/testsuite/gcc.target/i386/pr96802-1.c 
b/gcc/testsuite/gcc.target/i386/pr96802-1.c
new file mode 100644
index 000..e6ceb95d238
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr96802-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+/* Reject the negated form of non-negatable attributes.  */
+
+__attribute__ ((target ("no-general-regs-only")))
+int
+foo (int a)
+{
+  return a + 1;
+}
+
+/* { dg-error "does not allow a negated form" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/i386/pr96802-2.c 
b/gcc/testsuite/gcc.target/i386/pr96802-2.c
new file mode 100644
index 000..515f5673777
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr96802-2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+/* Reject the negated form of non-negatable pragma target.  */
+
+#pragma GCC push_options
+#pragma GCC target("no-general-regs-only")
+
+int
+foo (int a)
+{
+  return a + 1;
+}
+
+#pragma GCC pop_options
+
+/* { dg-error "does not allow a negated form" "" { target *-*-* } 0 } */
-- 
2.26.2



Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-26 Thread Jeff Law via Gcc-patches
On Wed, 2020-08-26 at 09:02 -0300, Alexandre Oliva wrote:
> On Aug 25, 2020, Jeff Law  wrote:
> 
> > On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
> > > On Aug 24, 2020, Richard Biener  wrote:
> > > 
> > > > since the option is quite elaborate on what (sub-)set of regs is
> > > > supposed to be cleared I'm not sure an implementation not involving
> > > > any target hook is possible?
> > > 
> > > I don't think this follows.  Machine-independent code has a pretty good
> > > notion of what registers are call-saved or call-clobbered, which ones
> > > could be changed in this regard for function-specific calling
> > > conventions, which ones may be used by a function to hold its return
> > > value, which ones are used within a function...
> > > 
> > > It *should* be possible to introduce this in machine-independent code,
> > > emitting insns to set registers to zero and regarding them as holding
> > > values to be returned from the function.  Machine-specific code could
> > > use more efficient insns to get the same result, but I can't see good
> > > reason to not have a generic fallback implementation with at least a
> > > best-effort attempt to offer the desired feature.
> > I think part of the problem here is you have to worry about stubs which can
> > change caller-saved registers.  Return path stubs aren't particularly 
> > common, but
> > they do exist -- 32 bit hpux for example :(
> 
> This suggests that such targets might have to implement the
> target-specific hook to deal with this, but does it detract in any way
> from the notion of having generic code to fall back to on targets that
> do NOT require any special handling?
Agreed.  Sorry if I wasn't clear that generic code + a hook should be 
sufficient.

jeff



[committed] libstdc++: Use correct argument type for __use_alloc [PR 96803]

2020-08-26 Thread Jonathan Wakely via Gcc-patches
The _Tuple_impl constructor for allocator-extended construction from a
different tuple type uses the _Tuple_impl's own _Head type in the
__use_alloc test. That is incorrect, because the argument tuple could
have a different type. Using the wrong type might select the
leading-allocator convention when it should use the trailing-allocator
convention, or vice versa.

libstdc++-v3/ChangeLog:

PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl&)):
Replace parameter pack with a type parameter and a pack and pass
the first type to __use_alloc.
* testsuite/20_util/tuple/cons/96803.cc: New test.

Tested powerpc64le-linux. Committed to trunk.

This isn't a regression, but would be safe to backport and fixes a
conformance bug, albeit one that nobody has ever noticed.

commit 5494edae83ad33c769bd1ebc98f0c492453a6417
Author: Jonathan Wakely 
Date:   Wed Aug 26 19:32:30 2020

libstdc++: Use correct argument type for __use_alloc [PR 96803]

The _Tuple_impl constructor for allocator-extended construction from a
different tuple type uses the _Tuple_impl's own _Head type in the
__use_alloc test. That is incorrect, because the argument tuple could
have a different type. Using the wrong type might select the
leading-allocator convention when it should use the trailing-allocator
convention, or vice versa.

libstdc++-v3/ChangeLog:

PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl&)):
Replace parameter pack with a type parameter and a pack and pass
the first type to __use_alloc.
* testsuite/20_util/tuple/cons/96803.cc: New test.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index d4a35f0fe7f..7be9943e34a 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -338,14 +338,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _Base(__use_alloc<_Head, _Alloc, _Head>(__a),
std::forward<_Head>(_M_head(__in))) { }
 
-  template
+  template
_GLIBCXX20_CONSTEXPR
_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
-   const _Tuple_impl<_Idx, _UElements...>& __in)
+   const _Tuple_impl<_Idx, _UHead, _UTails...>& __in)
: _Inherited(__tag, __a,
-_Tuple_impl<_Idx, _UElements...>::_M_tail(__in)),
- _Base(__use_alloc<_Head, _Alloc, _Head>(__a),
-   _Tuple_impl<_Idx, _UElements...>::_M_head(__in)) { }
+_Tuple_impl<_Idx, _UHead, _UTails...>::_M_tail(__in)),
+ _Base(__use_alloc<_Head, _Alloc, _UHead>(__a),
+   _Tuple_impl<_Idx, _UHead, _UTails...>::_M_head(__in)) { }
 
   template
_GLIBCXX20_CONSTEXPR
diff --git a/libstdc++-v3/testsuite/20_util/tuple/cons/96803.cc 
b/libstdc++-v3/testsuite/20_util/tuple/cons/96803.cc
new file mode 100644
index 000..9d3c07d55b2
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/tuple/cons/96803.cc
@@ -0,0 +1,41 @@
+// Copyright (C) 2020 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do compile { target c++11 } }
+
+#include 
+#include 
+
+struct X
+{
+  using allocator_type = std::allocator;
+
+  X(X&&) { }
+  X(std::allocator_arg_t, const allocator_type&, X&&) { }
+
+  explicit X(int) { }
+  explicit X(int, allocator_type) { }
+};
+
+void
+test01()
+{
+  // PR libstdc++/96803
+  // std::tuple chooses wrong constructor for uses-allocator construction
+  std::tuple o;
+  std::tuple nok(std::allocator_arg, std::allocator(), o);
+}


[committed] libstdc++: Whitespace changes in

2020-08-26 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/std/tuple (_Tuple_impl): Whitespaces changes for
consistent indentation. Also use __enable_if_t alias template.

Tested powerpc64le-linux. Committed to trunk.

commit af06acfc8de1ddcfd02a4de1200735b5479f086f
Author: Jonathan Wakely 
Date:   Wed Aug 26 19:32:30 2020

libstdc++: Whitespace changes in 

libstdc++-v3/ChangeLog:

* include/std/tuple (_Tuple_impl): Whitespaces changes for
consistent indentation. Also use __enable_if_t alias template.

diff --git a/libstdc++-v3/include/std/tuple b/libstdc++-v3/include/std/tuple
index 7be9943e34a..1c22d4db788 100644
--- a/libstdc++-v3/include/std/tuple
+++ b/libstdc++-v3/include/std/tuple
@@ -265,16 +265,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr _Tuple_impl()
   : _Inherited(), _Base() { }
 
-  explicit
-  constexpr _Tuple_impl(const _Head& __head, const _Tail&... __tail)
-  : _Inherited(__tail...), _Base(__head) { }
+  explicit constexpr
+  _Tuple_impl(const _Head& __head, const _Tail&... __tail)
+  : _Inherited(__tail...), _Base(__head)
+  { }
 
-  template::type>
-explicit
-constexpr _Tuple_impl(_UHead&& __head, _UTail&&... __tail)
+  template>
+   explicit constexpr
+   _Tuple_impl(_UHead&& __head, _UTail&&... __tail)
: _Inherited(std::forward<_UTail>(__tail)...),
- _Base(std::forward<_UHead>(__head)) { }
+ _Base(std::forward<_UHead>(__head))
+   { }
 
   constexpr _Tuple_impl(const _Tuple_impl&) = default;
 
@@ -285,58 +287,67 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   constexpr
   _Tuple_impl(_Tuple_impl&& __in)
   noexcept(__and_,
- is_nothrow_move_constructible<_Inherited>>::value)
+ is_nothrow_move_constructible<_Inherited>>::value)
   : _Inherited(std::move(_M_tail(__in))),
-   _Base(std::forward<_Head>(_M_head(__in))) { }
+   _Base(std::forward<_Head>(_M_head(__in)))
+  { }
 
   template
-constexpr _Tuple_impl(const _Tuple_impl<_Idx, _UElements...>& __in)
+   constexpr
+   _Tuple_impl(const _Tuple_impl<_Idx, _UElements...>& __in)
: _Inherited(_Tuple_impl<_Idx, _UElements...>::_M_tail(__in)),
- _Base(_Tuple_impl<_Idx, _UElements...>::_M_head(__in)) { }
+ _Base(_Tuple_impl<_Idx, _UElements...>::_M_head(__in))
+   { }
 
   template
-constexpr _Tuple_impl(_Tuple_impl<_Idx, _UHead, _UTails...>&& __in)
+   constexpr
+   _Tuple_impl(_Tuple_impl<_Idx, _UHead, _UTails...>&& __in)
: _Inherited(std::move
 (_Tuple_impl<_Idx, _UHead, _UTails...>::_M_tail(__in))),
  _Base(std::forward<_UHead>
-   (_Tuple_impl<_Idx, _UHead, _UTails...>::_M_head(__in))) { }
+   (_Tuple_impl<_Idx, _UHead, _UTails...>::_M_head(__in)))
+   { }
 
   template
_GLIBCXX20_CONSTEXPR
_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a)
: _Inherited(__tag, __a),
-  _Base(__tag, __use_alloc<_Head>(__a)) { }
+ _Base(__tag, __use_alloc<_Head>(__a))
+   { }
 
   template
_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
const _Head& __head, const _Tail&... __tail)
: _Inherited(__tag, __a, __tail...),
-  _Base(__use_alloc<_Head, _Alloc, _Head>(__a), __head) { }
+ _Base(__use_alloc<_Head, _Alloc, _Head>(__a), __head)
+   { }
 
   template::type>
+  typename = __enable_if_t>
_GLIBCXX20_CONSTEXPR
_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
-   _UHead&& __head, _UTail&&... __tail)
+   _UHead&& __head, _UTail&&... __tail)
: _Inherited(__tag, __a, std::forward<_UTail>(__tail)...),
-  _Base(__use_alloc<_Head, _Alloc, _UHead>(__a),
-   std::forward<_UHead>(__head)) { }
-
-  template
-   _GLIBCXX20_CONSTEXPR
-_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
-   const _Tuple_impl& __in)
-   : _Inherited(__tag, __a, _M_tail(__in)),
-  _Base(__use_alloc<_Head, _Alloc, _Head>(__a), _M_head(__in)) { }
+ _Base(__use_alloc<_Head, _Alloc, _UHead>(__a),
+   std::forward<_UHead>(__head))
+   { }
 
   template
_GLIBCXX20_CONSTEXPR
_Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
-   _Tuple_impl&& __in)
+   const _Tuple_impl& __in)
+   : _Inherited(__tag, __a, _M_tail(__in)),
+ _Base(__use_alloc<_Head, _Alloc, _Head>(__a), _M_head(__in))
+   { }
+
+  template
+   _GLIBCXX20_CONSTEXPR
+   _Tuple_impl(allocator_arg_t __tag, const _Alloc& __a,
+   _Tuple_impl&& __in)
: _Inherited(__tag, __a, std::move(_M_tail(__in))),
  _Base(__use_alloc<_Head, _Alloc, _Head>(__a),
-   std::forward<_Head>(_M_head(__in))) { }
+   s

RE: [Patch 3/5] rs6000, Add TI to TD (128-bit DFP) and TD to TI support

2020-08-26 Thread Carl Love via Gcc-patches
Segher:

On Wed, 2020-08-19 at 20:29 -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Aug 11, 2020 at 12:22:59PM -0700, Carl Love wrote:
> > +(define_insn "floattitd2"
> > +  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
> > +   (float:TD (match_operand:TI 1 "gpc_reg_operand" "v")))]
> > +  "TARGET_TI_VECTOR_OPS"
> > +  "dcffixqq %0,%1"
> > +  [(set_attr "type" "dfp")])
> 
> I wonder if this should just be TARGET_POWER10 now?  That goes for
> the
> whole series of course.
> 

> 
> Looks fine otherwise.  Okay for trunk, modulo whatever we do with
> YARGET_TI_VECTOR_OPS.  Thanks!
> 
> 
> Segher

You commented on the TARGET_TI_VECTOR_OPS in patch 2 as well, i.e.

   > > +  /* The -mti-vector-ops option requires ISA 3.1 support and -maltivec 
for
   > > + the 128-bit instructions.  Currently, TARGET_POWER10 is sufficient 
to
   > > + enable it by default.  */
   > > +  if (TARGET_POWER10)
   > > +{
   > > +  if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
   > > +   warning(0, ("%<-mno-altivec%> disables -mti-vector-ops (128-bit 
integer vector register operations)."));
   > > +  else
   > > +   rs6000_isa_flags |= OPTION_MASK_TI_VECTOR_OPS;
   > > +}
   > 
   > It seems odd here that -maltivec is explicitly called out here.  That
   > should be default on for quite a while at this point.

   And the actual check it for -mvsx anyway?  Not sure I follow what this
   does at all.

So this all goes way back, the following is from a discussion way back
in November.   We added it thinking it would be good to be able to
enable/disable the 128-bit vector support, as I recall.  I have to
admit that I am struggling a bit to remember all the details as we
discussed them back then. 

The following is from an old email.


   Hi Carl,

   On Mon, Nov 18, 2019 at 12:18:48PM -0800, Carl Love wrote:
   > Per your other note, I change the test suite check, with spelling fix,
   > to:
   >  
   > # Return 1 if the can generate the 128-bit integer operations in ISA 3.1.
   > # That means the following 128-bit instructions can be generated:
   > # vadduqm, vsubuqm, vmsumcud, vdivsq, vdivuq, vmodsq, vmoduq, vcmpuq, 
vcmpsq,
   > # vrlq, vslq, vsrq.  The -mti-vector-ops flag enables the needed support.
   > # The -mti-vector-ops is enabled by default for -mcpu=future.  Sufficient
   > # to test if the vadduqm instruction is generated for ISA 3.1 support.
   > proc check_effective_target_ppc_native_128bit { } {
 
   > return [check_no_messages_and_pattern_nocache \
   > ppc_native_128bit {\mvadduqm\M} assembly {
   > __int128_t test_add (__int128_t a, __int128_t b)
   > { return a + b; }   
   > } {-mti-vector-ops} ]
   > }

   Ah, very good, this will work fine :-)

   But without that -mti-vector-ops option?

   > > (We also need to test what happens with -mno-altivec, etc. --
   > > shouldn't
   > > be hard, should just do the same thing as it does on older CPUs).
   > 
   > So I tried compiling the test case with -mn-altivec and -mcpu=future
   > and I get a GCC crash.  :-(

   Think of it this way: it is good if developers get ICEs.  That way, users
   get fewer.  :-)

   > I updated the setting for rs6000_isa_flags in rs6000.c to:
   > 
   >   /* The -mti-vector-ops option requires ISA 3.1 support and -maltivec for
   >  the 128-bit instructions.  Currently, TARGET_FUTURE is sufficient to
   >  enable it by default.  */ 
   >   if (TARGET_FUTURE)   
 
   > {  
 
   >   if (rs6000_isa_flags_explicit & OPTION_MASK_VSX)
   > warning(0, ("%<-mno-altivec%> disables -mti-vector-ops (128-bit 
integer vector register operations)."));
   >   else 
 
   > rs6000_isa_flags |= OPTION_MASK_TI_VECTOR_OPS; 
 
   > }
   >  
   > Now, I get:
   >  
   > $GCC_INSTALL/bin/gcc -g -mcpu=future -mno-altivec  int_128bit-runnable.c 
-o int_128bit-runnable
   > cc1: warning: ‘-mno-altivec’ disables vsx
   > cc1: warning: ‘-mno-altivec’ disables -mti-vector-ops (128-bit integer 
vector register operations).
   > 
   > without -mno-altivec it compiles with no warning or errors.  The object
   > dump has the expected vadduqm and vsubuqm instructions.

   That will do the trick.  Maybe you want the message to be quiet, (or
   quieter anyway), if you get testsuite fallout?  You will find out.

   > > > +mti-vector-ops
   > > > +Target Report Mask(TI_VECTOR_OPS) Var(rs6000_isa_flags)
   > > > +Use integer 128-bit instructions for a future architecture.
   > > 
   > > For upstream we should make this an internal option (so
   > > Undocumented), but
   > > it may be handy to keep the option more visible 

Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-26 Thread Qing Zhao via Gcc-patches



> On Aug 26, 2020, at 7:02 AM, Alexandre Oliva  wrote:
> 
> On Aug 25, 2020, Jeff Law mailto:l...@redhat.com>> wrote:
> 
>> On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
>>> On Aug 24, 2020, Richard Biener  wrote:
>>> 
 since the option is quite elaborate on what (sub-)set of regs is
 supposed to be cleared I'm not sure an implementation not involving
 any target hook is possible?
>>> 
>>> I don't think this follows.  Machine-independent code has a pretty good
>>> notion of what registers are call-saved or call-clobbered, which ones
>>> could be changed in this regard for function-specific calling
>>> conventions, which ones may be used by a function to hold its return
>>> value, which ones are used within a function...
>>> 
>>> It *should* be possible to introduce this in machine-independent code,
>>> emitting insns to set registers to zero and regarding them as holding
>>> values to be returned from the function.  Machine-specific code could
>>> use more efficient insns to get the same result, but I can't see good
>>> reason to not have a generic fallback implementation with at least a
>>> best-effort attempt to offer the desired feature.
>> I think part of the problem here is you have to worry about stubs which can
>> change caller-saved registers.  Return path stubs aren't particularly 
>> common, but
>> they do exist -- 32 bit hpux for example :(
> 
> This suggests that such targets might have to implement the
> target-specific hook to deal with this, but does it detract in any way
> from the notion of having generic code to fall back to on targets that
> do NOT require any special handling?

There are two issues I can see with adding a default generator in middle end:

1. In order to determine where a target should not use the generic code to emit 
the zeroing sequence, 
a new target hook to determine this has to be added;

2. In order to avoid the generated zeroing insns (which are simply insns that 
set registers) being deleted, 
We have to define a new insn “pro_epilogue_use” in the target. 
So, any target that want to use the default generator in middle end, must 
provide such a new target hook.

Based on the above 2, I don’t think that adding the default generator in middle 
end is a good idea.

Qing

> 
> -- 
> Alexandre Oliva, happy hacker
> https://urldefense.com/v3/__https://FSFLA.org/blogs/lxo/__;!!GqivPVa7Brio!Pee3_l4yYpNOUbnymMqrEM68oDGk-2Q3zebqLnQ255SX5go78t8Sq1RmM72wJP3a$
>  
> 
>  
> Free Software Activist
> GNU Toolchain Engineer



Re: c++: template operator lookup caching

2020-08-26 Thread Nathan Sidwell

On 8/26/20 1:07 PM, Jason Merrill wrote:

On 8/26/20 12:56 PM, Nathan Sidwell wrote:

On 8/26/20 12:06 PM, Jason Merrill wrote:

On 8/26/20 8:30 AM, Nathan Sidwell wrote:





This still retains only recording on lambdas.  As the comment says, we
could enable for all templates.


If we're going to need it on all templates for modules, is there a 
reason not to go ahead and make that change now?


No reason.

Do we need a different approach for operator lookup in non-function 
template contexts, e.g. data member initializers?


Hm, I guess we do.


Perhaps represent the operation as a CALL_EXPR with 
CALL_EXPR_OPERATOR_SYNTAX set?


I think that'd work.  To be strictly correct we need it everywhere, 
because (crazy user/testsuite):


template void Foo (T t)
{
  { using namespace A;  !t; }
  { using namespace B;  !t; }
}

--
Nathan Sidwell


Re: c++: template operator lookup caching

2020-08-26 Thread Jason Merrill via Gcc-patches

On 8/26/20 12:56 PM, Nathan Sidwell wrote:

On 8/26/20 12:06 PM, Jason Merrill wrote:

On 8/26/20 8:30 AM, Nathan Sidwell wrote:


Jason's fix to retain operator lookups inside dependent lambda
functions turns out to be needed on the modules branch for all
template functions.  Because the context for that lookup no longer
exists in imports.  There were also a couple of shortcomings, which
this patch fixes.

(a) we conflate 'we found nothing' and 'we can redo this at
instantiation time'.  Fixed by making the former produce
error_mark_node.  That needs a fix in name-lookup to know that finding
a binding containing error_mark_node, means 'stop looking' you found
nothing.

(b) we'd continually do lookups for every operator, if nothing needed
to be retained.  Fixed by always caching that information, and then
dealing with it when pushing the bindings.

(c) if what we found was that find by a global namespace lookup, we'd
not cache that.  But that'd cause us to find decls declared
after the template, potentially hiding those we expected to find.  So
don't do that check.

This still retains only recording on lambdas.  As the comment says, we
could enable for all templates.


If we're going to need it on all templates for modules, is there a 
reason not to go ahead and make that change now?


No reason.

Do we need a different approach for operator lookup in non-function 
template contexts, e.g. data member initializers?


Hm, I guess we do.


Perhaps represent the operation as a CALL_EXPR with 
CALL_EXPR_OPERATOR_SYNTAX set?


Jason



Re: c++: template operator lookup caching

2020-08-26 Thread Nathan Sidwell

On 8/26/20 12:06 PM, Jason Merrill wrote:

On 8/26/20 8:30 AM, Nathan Sidwell wrote:


Jason's fix to retain operator lookups inside dependent lambda
functions turns out to be needed on the modules branch for all
template functions.  Because the context for that lookup no longer
exists in imports.  There were also a couple of shortcomings, which
this patch fixes.

(a) we conflate 'we found nothing' and 'we can redo this at
instantiation time'.  Fixed by making the former produce
error_mark_node.  That needs a fix in name-lookup to know that finding
a binding containing error_mark_node, means 'stop looking' you found
nothing.

(b) we'd continually do lookups for every operator, if nothing needed
to be retained.  Fixed by always caching that information, and then
dealing with it when pushing the bindings.

(c) if what we found was that find by a global namespace lookup, we'd
not cache that.  But that'd cause us to find decls declared
after the template, potentially hiding those we expected to find.  So
don't do that check.

This still retains only recording on lambdas.  As the comment says, we
could enable for all templates.


If we're going to need it on all templates for modules, is there a 
reason not to go ahead and make that change now?


No reason.

Do we need a different approach for operator lookup in non-function 
template contexts, e.g. data member initializers?


Hm, I guess we do.

nathan
--
Nathan Sidwell


[PATCH] libstdc++: Add compile-time checks to__glibcxx_assert [PR 71960]

2020-08-26 Thread Jonathan Wakely via Gcc-patches
This change evaluates __glibcxx_assert checks unconditionally when a
function is being constant evaluated (when std::is_constant_evaluated()
is true). If the check fails, compilation will fail with an error.

If the function isn't being constant evaluated, the normal runtime check
will be done if enabled by _GLIBCXX_ASSERTIONS or _GLIBCXX_DEBUG, the
same as before.

Tangentially, the __glibcxx_assert and _GLIBCXX_PARALLEL_ASSERT macros
are changed to expand to 'do { } while (false)' when assertions are
disabled, instead of expanding to nothing. This avoids -Wempty-body
warnings when a disabled assertion is used in an 'if' or 'else'
statement e.g.

  if constexpr (/* precondition is testable */)
__glibcxx_assert(precondition);

a.C:9:27: warning: suggest braces around empty body in an ‘if’ statement 
[-Wempty-body]
9 | __glibcxx_assert(precondition);
  |  ^

libstdc++-v3/ChangeLog:

PR libstdc++/71960
* include/bits/c++config (__glibcxx_assert_impl): Remove
do-while so that uses of the macro need to add it.
(__glibcxx_assert): Rename macro for runtime assertions
to __glibcxx_assert_2.
(__glibcxx_assert_1): Define macro for constexpr assertions.
(__glibcxx_assert): Define macro for constexpr and runtime
assertions.
* include/bits/range_access.h (ranges::advance): Remove
redundant precondition checks during constant evaluation.
* include/parallel/base.h (_GLIBCXX_PARALLEL_ASSERT): Always
use do-while in macro expansion.
* include/std/ranges (iota_view::iota_view(W, B)): Remove
redundant braces.

Not yet committed.

Tested powerpc64le-linux, normal and debug modes.

Thoughts?


commit 4e71993e73cabcba16d23f43c0d000778e812884
Author: Jonathan Wakely 
Date:   Wed Aug 26 17:45:36 2020

libstdc++: Add compile-time checks to__glibcxx_assert [PR 71960]

This change evaluates __glibcxx_assert checks unconditionally when a
function is being constant evaluated (when std::is_constant_evaluated()
is true). If the check fails, compilation will fail with an error.

If the function isn't being constant evaluated, the normal runtime check
will be done if enabled by _GLIBCXX_ASSERTIONS or _GLIBCXX_DEBUG, the
same as before.

Tangentially, the __glibcxx_assert and _GLIBCXX_PARALLEL_ASSERT macros
are changed to expand to 'do { } while (false)' when assertions are
disabled, instead of expanding to nothing. This avoids -Wempty-body
warnings when a disabled assertion is used in an 'if' or 'else'
statement e.g.

  if constexpr (/* precondition is testable */)
__glibcxx_assert(precondition);

a.C:9:27: warning: suggest braces around empty body in an ‘if’ statement 
[-Wempty-body]
9 | __glibcxx_assert(precondition);
  |  ^

libstdc++-v3/ChangeLog:

PR libstdc++/71960
* include/bits/c++config (__glibcxx_assert_impl): Remove
do-while so that uses of the macro need to add it.
(__glibcxx_assert): Rename macro for runtime assertions
to __glibcxx_assert_2.
(__glibcxx_assert_1): Define macro for constexpr assertions.
(__glibcxx_assert): Define macro for constexpr and runtime
assertions.
* include/bits/range_access.h (ranges::advance): Remove
redundant precondition checks during constant evaluation.
* include/parallel/base.h (_GLIBCXX_PARALLEL_ASSERT): Always
use do-while in macro expansion.
* include/std/ranges (iota_view::iota_view(W, B)): Remove
redundant braces.

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index de28acea6b7..badf9d01a04 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -477,19 +477,16 @@ namespace std
 __builtin_abort();
   }
 }
-#define __glibcxx_assert_impl(_Condition)   \
-  do\
-  { \
-if (! (_Condition))  \
-  std::__replacement_assert(__FILE__, __LINE__, __PRETTY_FUNCTION__, \
-   #_Condition);\
-  } while (false)
+#define __glibcxx_assert_impl(_Condition) \
+  if (!bool(_Condition))  \
+std::__replacement_assert(__FILE__, __LINE__, __PRETTY_FUNCTION__, \
+ #_Condition)
 #endif
 
 #if defined(_GLIBCXX_ASSERTIONS)
-# define __glibcxx_assert(_Condition) __glibcxx_assert_impl(_Condition)
+# define __glibcxx_assert_2(_Condition) __glibcxx_assert_impl(_Condition)
 #else
-# define __glibcxx_

Re: [PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2020-08-26 Thread Jonathan Wakely via Gcc-patches

On 26/08/20 16:58 +0100, Jonathan Wakely wrote:

On 26/08/20 16:55 +0100, Jonathan Wakely wrote:

On 26/08/20 16:30 +0100, Jonathan Wakely wrote:

I'm seeing new FAILures with this:

FAIL: 20_util/function_objects/searchers.cc (test for excess errors)
UNRESOLVED: 20_util/function_objects/searchers.cc compilation failed to produce 
executable
FAIL: experimental/functional/searchers.cc (test for excess errors)
UNRESOLVED: experimental/functional/searchers.cc compilation failed to produce 
executable

It looks like what you committed is not what you sent for review. The
patch sent for review has:

/// Specialization: hash function and range-hashing function, no
/// caching of hash codes.
/// Provides typedef and accessor required by C++ 11.
template
-struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
+  typename _Hash, typename _RangeHash, typename _Unused>
+struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
+  _Unused, false>
  : private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  private _Hashtable_ebo_helper<1, _Hash>
  {


But what you committed has:

/// Specialization: hash function and range-hashing function, no
/// caching of hash codes.
/// Provides typedef and accessor required by C++ 11.
template
-struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
-: private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  typename _Hash, typename _RangeHash, typename _Unused>
+struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
+  _Unused, false>
+: private _Hashtable_ebo_helper<0, _Hash>
  {


Note that you've changed the type of the base class from:

+  private _Hashtable_ebo_helper<1, _Hash>

to

+  private _Hashtable_ebo_helper<0, _Hash>

This causes an ambiguity:

/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/hashtable_policy.h:1706: error: 
'std::__detail::_Hashtable_ebo_helper<0, test03()::, true>' is an ambiguous base of 
'std::__detail::_Hashtable_base, std::__detail::_Select1st, 
test03()::, test03()::, std::__detail::_Mod_range_hashing, 
std::__detail::_Default_ranged_hash, std::__detail::_Hashtable_traits >'

However, what I don't understand is why we are storing that _Hash type
more than once as a base class. That seems wrong (but not something we
can change without ABI impact).


Ah, we're not storing it more than once.

The problem is:

template
struct _Hashtable_base
: public _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2, _Hash,
   _Traits::__hash_cached::value>,
  private _Hashtable_ebo_helper<0, _Equal>

This has a base of _Hashtable_ebo_helper<0, _Equal> so it used to
have these bases:

_Hashtable_ebo_helper<0, _ExtractKey>
_Hashtable_ebo_helper<1, _Hash>
_Hashtable_ebo_helper<2, _RangeHash>
_Hashtable_ebo_helper<0, _Equal>

but after your change it has these bases:

_Hashtable_ebo_helper<0, _Hash>
_Hashtable_ebo_helper<0, _Equal>

In the case
where _Equal and _Hash are the same type (which is what I was testing
in the test that fail, because I'm sneaky) that means:

_Hashtable_ebo_helper<0, T>
_Hashtable_ebo_helper<0, T>

which is obviously ambiguous.

I think the _hash_code_base should still use the index 1 for its base
class, i.e. _Hashtable_ebo_helper<1, _Hash>. That way we have these:

_Hashtable_ebo_helper<1, _Hash>
_Hashtable_ebo_helper<0, _Equal>

which works even if they're the same types.


Here's a minimal test:

#include 

struct Evil : std::hash, std::equal_to
{
};

int main()
{
 std::unordered_map h;
 h.key_eq();
}

This fails on current trunk.


Fixed by the attached patch.

Tested powerpc64le-linux, committed to trunk.


commit 9f9c0549dd42e85e2500ca67cef89dddb142c0c7
Author: Jonathan Wakely 
Date:   Wed Aug 26 17:30:31 2020

libstdc++: Fix regression in hash containers

A recent change altered the layout of EBO-helper base classes, resulting
in an ambiguity when the hash function and equality predicate are the
same type.

This modifies the type of one of the base classes, so that we don't get
two base classes of the same type.

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (_Hash_code_base): Change
index of _Hashtable_ebo_helper base class.
* testsuite/23_containers/unordered_map/dup_types.cc: New test.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 38bd5ae4e81..0109ef86a7b 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -1190,10 +1190,10 @@ namespace __detail
 	   typename

Re: [PATCH] libstdc++: Fix typo in chrono::year_month_weekday::operator==

2020-08-26 Thread Jonathan Wakely via Gcc-patches

On 26/08/20 12:35 -0400, Patrick Palka via Libstdc++ wrote:

Tested on x86_64-pc-linux-gnu, does this look OK to commit?


OK, thanks.



libstdc++-v3/ChangeLog:

* include/std/chrono (year_month_weekday::operator==): Compare
weekday_indexed instead of weekday.
* testsuite/std/time/year_month_weekday/1.cc: Augment testcase.
---
libstdc++-v3/include/std/chrono | 2 +-
libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc | 5 +
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 46086140d9d..417954d103b 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2657,7 +2657,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  {
return __x.year() == __y.year()
  && __x.month() == __y.month()
- && __x.weekday() == __y.weekday();
+ && __x.weekday_indexed() == __y.weekday_indexed();
  }

  template
diff --git a/libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc 
b/libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc
index 3781f1781b7..6924f947210 100644
--- a/libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc
+++ b/libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc
@@ -58,6 +58,11 @@ constexpr_year_month_weekday()
  static_assert(years{1} + January/Tuesday[2]/1900y == 
January/Tuesday[2]/1901y);
  static_assert(January/Tuesday[2]/1900y - years{1} == 
January/Tuesday[2]/1899y);

+  static_assert(January/Tuesday[1]/1900y != February/Tuesday[1]/1900y);
+  static_assert(January/Tuesday[1]/1900y != January/Wednesday[1]/1900y);
+  static_assert(January/Tuesday[1]/1900y != January/Tuesday[1]/1901y);
+  static_assert(January/Tuesday[1]/1900y != January/Tuesday[2]/1900y);
+
  // N.B. unix seems to be a macro somewhere!
  constexpr ymwd myunix(local_days{days{0}});
  static_assert(myunix.ok());
--
2.28.0.337.ge9b77c84a0





[PATCH] libstdc++: Fix typo in chrono::year_month_weekday::operator==

2020-08-26 Thread Patrick Palka via Gcc-patches
Tested on x86_64-pc-linux-gnu, does this look OK to commit?

libstdc++-v3/ChangeLog:

* include/std/chrono (year_month_weekday::operator==): Compare
weekday_indexed instead of weekday.
* testsuite/std/time/year_month_weekday/1.cc: Augment testcase.
---
 libstdc++-v3/include/std/chrono | 2 +-
 libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 46086140d9d..417954d103b 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2657,7 +2657,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   {
return __x.year() == __y.year()
  && __x.month() == __y.month()
- && __x.weekday() == __y.weekday();
+ && __x.weekday_indexed() == __y.weekday_indexed();
   }
 
   template
diff --git a/libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc 
b/libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc
index 3781f1781b7..6924f947210 100644
--- a/libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc
+++ b/libstdc++-v3/testsuite/std/time/year_month_weekday/1.cc
@@ -58,6 +58,11 @@ constexpr_year_month_weekday()
   static_assert(years{1} + January/Tuesday[2]/1900y == 
January/Tuesday[2]/1901y);
   static_assert(January/Tuesday[2]/1900y - years{1} == 
January/Tuesday[2]/1899y);
 
+  static_assert(January/Tuesday[1]/1900y != February/Tuesday[1]/1900y);
+  static_assert(January/Tuesday[1]/1900y != January/Wednesday[1]/1900y);
+  static_assert(January/Tuesday[1]/1900y != January/Tuesday[1]/1901y);
+  static_assert(January/Tuesday[1]/1900y != January/Tuesday[2]/1900y);
+
   // N.B. unix seems to be a macro somewhere!
   constexpr ymwd myunix(local_days{days{0}});
   static_assert(myunix.ok());
-- 
2.28.0.337.ge9b77c84a0



[Patch] Fortran: Fix absent-optional handling for nondescriptor arrays (PR94672)

2020-08-26 Thread Tobias Burnus

This fixes an issue caused by the patch for PR 94672, which
affects both GCC 10 and GCC 11.

Only 'sVal' of 'subroutine foo' was affected, the rest is
only a crosscheck that it worked for those code paths.

(I did check against the dump – which looks fine. I could
add dump tests as well. The 'foo' test was failing with
'stop 5' (absent argument) at runtime before the patch;
the report was for the 'stop 4' case, which is probably
harder to trigger as run-time fail as the stack memory
is likely zero-initialized. → -fdump-tree-original scan
test useful?)

OK for mainline and GCC 10?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander 
Walter
Fortran: Fix absent-optional handling for nondescriptor arrays (PR94672)

gcc/fortran/ChangeLog:

	PR fortran/94672
	* trans-array.c (gfc_trans_g77_array): Check against the parm decl and
	set the nonparm decl used for the is-present check to NULL if absent.

gcc/testsuite/ChangeLog:

	PR fortran/94672
	* gfortran.dg/optional_assumed_charlen_2.f90: New test.

 gcc/fortran/trans-array.c  | 10 -
 .../gfortran.dg/optional_assumed_charlen_2.f90 | 48 ++
 2 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index 0e3495d59cc..6566c47d4ae 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -6472,8 +6472,14 @@ gfc_trans_g77_array (gfc_symbol * sym, gfc_wrapped_block * block)
 
   if (sym->attr.optional || sym->attr.not_always_present)
 {
-  tmp = gfc_conv_expr_present (sym);
-  stmt = build3_v (COND_EXPR, tmp, stmt, build_empty_stmt (input_location));
+  tree nullify;
+  if (TREE_CODE (parm) != PARM_DECL)
+	nullify = fold_build2_loc (input_location, MODIFY_EXPR, void_type_node,
+   parm, null_pointer_node);
+  else
+	nullify = build_empty_stmt (input_location);
+  tmp = gfc_conv_expr_present (sym, true);
+  stmt = build3_v (COND_EXPR, tmp, stmt, nullify);
 }
 
   gfc_add_init_cleanup (block, stmt, NULL_TREE);
diff --git a/gcc/testsuite/gfortran.dg/optional_assumed_charlen_2.f90 b/gcc/testsuite/gfortran.dg/optional_assumed_charlen_2.f90
new file mode 100644
index 000..fa8cfd79038
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/optional_assumed_charlen_2.f90
@@ -0,0 +1,48 @@
+! { dg-do run }
+! PR fortran/94672
+!
+! Contributed by Tomáš Trnka
+!
+module m
+  implicit none (type,external)
+  type t
+integer :: i = 5
+  end type t
+contains
+subroutine bar(x, y, z, n)
+  integer, value :: n
+  type(t), intent(out), optional :: x(:), y(n), z(:)
+  allocatable :: z
+end subroutine bar
+
+subroutine foo (n, nFound, sVal)
+   integer,   value  :: n
+   integer,   intent(out)  :: nFound
+   character(*),optional, intent(out) :: sVal(n)
+
+   nFound = 0
+
+   if (present(sVal)) then
+  nFound = nFound + 1
+   end if
+end subroutine
+end
+
+use m
+implicit none (type,external)
+type(t) :: a(7), b(7), c(:)
+allocatable :: c
+integer :: nn, nf
+character(len=4) :: str
+
+allocate(c(7))
+call bar(a,b,c,7)
+if (any(a(:)%i /= 5)) stop 1
+if (any(b(:)%i /= 5)) stop 2
+if (allocated(c)) stop 3
+
+call foo(7, nf, str)
+if (nf /= 1) stop 4
+call foo(7, nf)
+if (nf /= 0) stop 5
+end


Re: [PATCH] hppa: PR middle-end/87256: Improved hppa_rtx_costs avoids synth_mult madness.

2020-08-26 Thread Jeff Law via Gcc-patches
On Wed, 2020-08-26 at 12:19 +0100, Roger Sayle wrote:
> Hi Dave,
> 
> > This change is also fine.
> > 
> > The gcc.target/hppa/shadd-2.c test now fails because there are two
> additional sh1add instructions.
> > However, the total number of instructions is the same as before.
> 
> Just to be on the safe side, I took a deeper look into this.
> We're now generating a slightly different sequence for multiply by 87.
> 
> Before:
> zdep %r28,28,29,%r20; r20 = r28*8
> ldo RR'default_target_regs-$global$(%r1),%r19
> sub %r20,%r28,%r20  ; r20 = r28*7
> sh2addl %r20,%r28,%r20  ; r20 = r28*29
> sh2addl %r20,%r19,%r19  ; r20 = addr+r28*116
> sub %r19,%r20,%r19  ; r20 = addr+r28*87
> 
> After:
> sh2addl %r28,%r28,%r19  ; r19 = r28*5
> ldo RR'default_target_regs-$global$(%r1),%r1
> sh2addl %r19,%r28,%r19  ; r19 = r28*21
> sh1addl %r19,%r28,%r19  ; r19 = r28*43
> sh1addl %r19,%r28,%r19  ; r19 = r28*87
> addl %r1,%r19,%r1   ; r1 = addr+r28*87
> 
> As you point out, both sequences have exactly the same length and rtx_costs.
> I suspect the middle-end has a minor preference for simple shifts and adds.
> If shadd-2.c is there to check we make use of sh?add instructions, then
> we're
> making even more use of them now.
> 
> I agree it's reasonable to increase the scan-assembler-times count for this
> test.
Yes, those tests were there to catch failure to create shadd insns after some
changes in the combiner.  If we're generating more, then that's good enough for
those tests :-)

jeff



Re: c++: template operator lookup caching

2020-08-26 Thread Jason Merrill via Gcc-patches

On 8/26/20 8:30 AM, Nathan Sidwell wrote:


Jason's fix to retain operator lookups inside dependent lambda
functions turns out to be needed on the modules branch for all
template functions.  Because the context for that lookup no longer
exists in imports.  There were also a couple of shortcomings, which
this patch fixes.

(a) we conflate 'we found nothing' and 'we can redo this at
instantiation time'.  Fixed by making the former produce
error_mark_node.  That needs a fix in name-lookup to know that finding
a binding containing error_mark_node, means 'stop looking' you found
nothing.

(b) we'd continually do lookups for every operator, if nothing needed
to be retained.  Fixed by always caching that information, and then
dealing with it when pushing the bindings.

(c) if what we found was that find by a global namespace lookup, we'd
not cache that.  But that'd cause us to find decls declared
after the template, potentially hiding those we expected to find.  So
don't do that check.

This still retains only recording on lambdas.  As the comment says, we
could enable for all templates.


If we're going to need it on all templates for modules, is there a 
reason not to go ahead and make that change now?


Do we need a different approach for operator lookup in non-function 
template contexts, e.g. data member initializers?


Jason



Re: [PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2020-08-26 Thread Jonathan Wakely via Gcc-patches

On 26/08/20 16:55 +0100, Jonathan Wakely wrote:

On 26/08/20 16:30 +0100, Jonathan Wakely wrote:

I'm seeing new FAILures with this:

FAIL: 20_util/function_objects/searchers.cc (test for excess errors)
UNRESOLVED: 20_util/function_objects/searchers.cc compilation failed to produce 
executable
FAIL: experimental/functional/searchers.cc (test for excess errors)
UNRESOLVED: experimental/functional/searchers.cc compilation failed to produce 
executable

It looks like what you committed is not what you sent for review. The
patch sent for review has:

 /// Specialization: hash function and range-hashing function, no
 /// caching of hash codes.
 /// Provides typedef and accessor required by C++ 11.
 template
-struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
+  typename _Hash, typename _RangeHash, typename _Unused>
+struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
+  _Unused, false>
   : private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  private _Hashtable_ebo_helper<1, _Hash>
   {


But what you committed has:

 /// Specialization: hash function and range-hashing function, no
 /// caching of hash codes.
 /// Provides typedef and accessor required by C++ 11.
 template
-struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
-: private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  typename _Hash, typename _RangeHash, typename _Unused>
+struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
+  _Unused, false>
+: private _Hashtable_ebo_helper<0, _Hash>
   {


Note that you've changed the type of the base class from:

+  private _Hashtable_ebo_helper<1, _Hash>

to

+  private _Hashtable_ebo_helper<0, _Hash>

This causes an ambiguity:

/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/hashtable_policy.h:1706: error: 
'std::__detail::_Hashtable_ebo_helper<0, test03()::, true>' is an ambiguous base of 
'std::__detail::_Hashtable_base, std::__detail::_Select1st, 
test03()::, test03()::, std::__detail::_Mod_range_hashing, 
std::__detail::_Default_ranged_hash, std::__detail::_Hashtable_traits >'

However, what I don't understand is why we are storing that _Hash type
more than once as a base class. That seems wrong (but not something we
can change without ABI impact).


Ah, we're not storing it more than once.

The problem is:

 template
 struct _Hashtable_base
 : public _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2, _Hash,
   _Traits::__hash_cached::value>,
   private _Hashtable_ebo_helper<0, _Equal>

This has a base of _Hashtable_ebo_helper<0, _Equal> so it used to
have these bases:

_Hashtable_ebo_helper<0, _ExtractKey>
_Hashtable_ebo_helper<1, _Hash>
_Hashtable_ebo_helper<2, _RangeHash>
_Hashtable_ebo_helper<0, _Equal>

but after your change it has these bases:

_Hashtable_ebo_helper<0, _Hash>
_Hashtable_ebo_helper<0, _Equal>

In the case
where _Equal and _Hash are the same type (which is what I was testing
in the test that fail, because I'm sneaky) that means:

_Hashtable_ebo_helper<0, T>
_Hashtable_ebo_helper<0, T>

which is obviously ambiguous.

I think the _hash_code_base should still use the index 1 for its base
class, i.e. _Hashtable_ebo_helper<1, _Hash>. That way we have these:

_Hashtable_ebo_helper<1, _Hash>
_Hashtable_ebo_helper<0, _Equal>

which works even if they're the same types.


Here's a minimal test:

#include 

struct Evil : std::hash, std::equal_to
{
};

int main()
{
  std::unordered_map h;
  h.key_eq();
}

This fails on current trunk.



Re: [PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2020-08-26 Thread Jonathan Wakely via Gcc-patches

On 26/08/20 16:30 +0100, Jonathan Wakely wrote:

On 25/08/20 15:30 +0100, Jonathan Wakely wrote:

On 17/08/20 19:13 +0200, François Dumont via Libstdc++ wrote:

Hi

    Here is the new proposal.

    As we can't remove template parameters I simply restore those 
that I tried to pass differently _H2 and _ExtractKey, so 
eventually I only remove usage of _Hash which I renamed in 
_Unused. Maybe I can keep the doc about it in hashtable.h and just 
add a remark saying that it is now unused.


    For _RangeHash, formerly _H2, and _ExtractKey I just stop 
maintaining any storage. When we need those I always use a value 
initialized instance. I kind of prefer the value initialization 
syntax because you can't confuse it with a function call but let 
me know if it is wrong and I should use _ExtractKey() or 
_RangeHash(). I also add some static assertions about those types 
regarding their noexcept qualifications.


    I also included in this patch the few changes left from 
[Hashtable 0/6] which are mostly _M_insert_unique_node and 
_M_insert_multi_node signature cleanup as the key part can be 
extracted from the inserted node.


    Tested under Linux x86_64, ok to commit ?

François

On 06/08/20 11:27 am, Jonathan Wakely wrote:

On 06/08/20 08:35 +0200, François Dumont wrote:

On 17/07/20 1:35 pm, Jonathan Wakely wrote:

I really like the general idea of getting rid of some of the
complexity and not supporting infinite customization. But we can do
that without changing mangled names of the _Hashtable specialiations.



I didn't thought we need to keep abi compatibility for extensions.


These aren't extensions though, they're part of std::unordered_map
etc.

Just because something like _Vector_base is an internal type rather
than something defined in the standard doesn't mean we can just change
its ABI, because that would change the ABI of std::vector. It the same
here.

Changing _Hashtable affects all users of std::unordered_map etc.







diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 7b772a475e3..1ba32a3c7e2 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -311,35 +303,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"Cache the hash code or qualify your functors involved"
" in hash code and bucket index computation with noexcept");

-  // When hash codes are cached local iterator inherits from H2 functor
-  // which must then be default constructible.
-  static_assert(__if_hash_cached>::value,
+  // To get bucket index we need _RangeHash not to throw.
+  static_assert(is_nothrow_default_constructible<_RangeHash>::value,
"Functor used to map hash code to bucket index"
-   " must be default constructible");
+   " is nothrow default constructible");


Please phrase this as "must be nothrow default constructible".


+  static_assert(noexcept(
+   std::declval()((std::size_t)0, (std::size_t)0)),
+   "Functor used to map hash code to bucket index is noexcept");


Same here, "must be noexcept".

Otherwise this looks great, thanks. Please push.


I'm seeing new FAILures with this:

FAIL: 20_util/function_objects/searchers.cc (test for excess errors)
UNRESOLVED: 20_util/function_objects/searchers.cc compilation failed to produce 
executable
FAIL: experimental/functional/searchers.cc (test for excess errors)
UNRESOLVED: experimental/functional/searchers.cc compilation failed to produce 
executable

It looks like what you committed is not what you sent for review. The
patch sent for review has:

  /// Specialization: hash function and range-hashing function, no
  /// caching of hash codes.
  /// Provides typedef and accessor required by C++ 11.
  template
-struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
+  typename _Hash, typename _RangeHash, typename _Unused>
+struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
+  _Unused, false>
: private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  private _Hashtable_ebo_helper<1, _Hash>
{


But what you committed has:

  /// Specialization: hash function and range-hashing function, no
  /// caching of hash codes.
  /// Provides typedef and accessor required by C++ 11.
  template
-struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
-: private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  typename _Hash, typename _RangeHash, typename _Unused>
+struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
+  _Unused, false>
+

Re: [PATCH][Hashtable 5/6] Remove H1/H2 template parameters

2020-08-26 Thread Jonathan Wakely via Gcc-patches

On 25/08/20 15:30 +0100, Jonathan Wakely wrote:

On 17/08/20 19:13 +0200, François Dumont via Libstdc++ wrote:

Hi

    Here is the new proposal.

    As we can't remove template parameters I simply restore those 
that I tried to pass differently _H2 and _ExtractKey, so eventually 
I only remove usage of _Hash which I renamed in _Unused. Maybe I can 
keep the doc about it in hashtable.h and just add a remark saying 
that it is now unused.


    For _RangeHash, formerly _H2, and _ExtractKey I just stop 
maintaining any storage. When we need those I always use a value 
initialized instance. I kind of prefer the value initialization 
syntax because you can't confuse it with a function call but let me 
know if it is wrong and I should use _ExtractKey() or _RangeHash(). 
I also add some static assertions about those types regarding their 
noexcept qualifications.


    I also included in this patch the few changes left from 
[Hashtable 0/6] which are mostly _M_insert_unique_node and 
_M_insert_multi_node signature cleanup as the key part can be 
extracted from the inserted node.


    Tested under Linux x86_64, ok to commit ?

François

On 06/08/20 11:27 am, Jonathan Wakely wrote:

On 06/08/20 08:35 +0200, François Dumont wrote:

On 17/07/20 1:35 pm, Jonathan Wakely wrote:

I really like the general idea of getting rid of some of the
complexity and not supporting infinite customization. But we can do
that without changing mangled names of the _Hashtable specialiations.



I didn't thought we need to keep abi compatibility for extensions.


These aren't extensions though, they're part of std::unordered_map
etc.

Just because something like _Vector_base is an internal type rather
than something defined in the standard doesn't mean we can just change
its ABI, because that would change the ABI of std::vector. It the same
here.

Changing _Hashtable affects all users of std::unordered_map etc.







diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index 7b772a475e3..1ba32a3c7e2 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -311,35 +303,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"Cache the hash code or qualify your functors involved"
" in hash code and bucket index computation with noexcept");

-  // When hash codes are cached local iterator inherits from H2 functor
-  // which must then be default constructible.
-  static_assert(__if_hash_cached>::value,
+  // To get bucket index we need _RangeHash not to throw.
+  static_assert(is_nothrow_default_constructible<_RangeHash>::value,
"Functor used to map hash code to bucket index"
-   " must be default constructible");
+   " is nothrow default constructible");


Please phrase this as "must be nothrow default constructible".


+  static_assert(noexcept(
+   std::declval()((std::size_t)0, (std::size_t)0)),
+   "Functor used to map hash code to bucket index is noexcept");


Same here, "must be noexcept".

Otherwise this looks great, thanks. Please push.


I'm seeing new FAILures with this:

FAIL: 20_util/function_objects/searchers.cc (test for excess errors)
UNRESOLVED: 20_util/function_objects/searchers.cc compilation failed to produce 
executable
FAIL: experimental/functional/searchers.cc (test for excess errors)
UNRESOLVED: experimental/functional/searchers.cc compilation failed to produce 
executable

It looks like what you committed is not what you sent for review. The
patch sent for review has:

   /// Specialization: hash function and range-hashing function, no
   /// caching of hash codes.
   /// Provides typedef and accessor required by C++ 11.
   template
-struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
+  typename _Hash, typename _RangeHash, typename _Unused>
+struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
+  _Unused, false>
 : private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  private _Hashtable_ebo_helper<1, _Hash>
 {


But what you committed has:

   /// Specialization: hash function and range-hashing function, no
   /// caching of hash codes.
   /// Provides typedef and accessor required by C++ 11.
   template
-struct _Hash_code_base<_Key, _Value, _ExtractKey, _H1, _H2,
-  _Default_ranged_hash, false>
-: private _Hashtable_ebo_helper<0, _ExtractKey>,
-  private _Hashtable_ebo_helper<1, _H1>,
-  private _Hashtable_ebo_helper<2, _H2>
+  typename _Hash, typename _RangeHash, typename _Unused>
+struct _Hash_code_base<_Key, _Value, _ExtractKey, _Hash, _RangeHash,
+  _Unused, false>
+: private _Hashtable_ebo_helper<0, 

Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-26 Thread Alex Coplan
Thanks for the review, both.

On 26/08/2020 09:19, Vladimir Makarov wrote:
> 
> On 2020-08-26 5:06 a.m., Richard Sandiford wrote:
> > Alex Coplan  writes:
> > 
> > Minor nit, should be formatted as:
> > 
> > static rtx
> > canonicalize_reload_addr (rtx addr)
> Sorry for missing this.  Alex, it should be fixed anyway.
> > 
> > I don't think we should we restrict this to (plus (mult X Y) Z),
> > since addresses can be more complicated than that.  One way to
> > search for all MULTs is:
> > 
> >subrtx_iterator::array_type array;
> >FOR_EACH_SUBRTX (iter, array, x, NONCONST)
> >  {
> >rtx x = *iter;
> >if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
> >  ...
> >  }
> > 
> > (Needs rtl-iter.h)
> 
> I am agree it would be nice to process a general case.  Alex, you can do
> this as a separate patch if you want.
> 
> Richard, thank you for looking at this patch too.
> 
> 

Please find a reworked version of the patch attached incorporating
Richard's feedback.

Testing:
 * Bootstrap and regtest on aarch64-none-linux-gnu,
   arm-none-linux-gnueabihf, and x86_64-pc-linux-gnu: no regressions.

Is the updated patch OK for master?

Thanks,
Alex
diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index 421c453997b..580da9c3ed6 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -131,6 +131,7 @@
 #include "lra-int.h"
 #include "print-rtl.h"
 #include "function-abi.h"
+#include "rtl-iter.h"
 
 /* Value of LRA_CURR_RELOAD_NUM at the beginning of BB of the current
insn.  Remember that LRA_CURR_RELOAD_NUM is the number of emitted
@@ -570,6 +571,33 @@ init_curr_insn_input_reloads (void)
   curr_insn_input_reloads_num = 0;
 }
 
+/* The canonical form of an rtx inside a MEM is not necessarily the same as the
+   canonical form of the rtx outside the MEM.  Fix this up in the case that
+   we're reloading an address (and therefore pulling it outside a MEM).  */
+static rtx
+canonicalize_reload_addr (rtx addr)
+{
+  subrtx_var_iterator::array_type array;
+  FOR_EACH_SUBRTX_VAR (iter, array, addr, NONCONST)
+{
+  rtx x = *iter;
+  if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
+   {
+ const HOST_WIDE_INT ci = INTVAL (XEXP (x, 1));
+ const int pwr2 = exact_log2 (ci);
+ if (pwr2 > 0)
+   {
+ /* Rewrite this to use a shift instead, which is canonical when
+outside of a MEM.  */
+ PUT_CODE (x, ASHIFT);
+ XEXP (x, 1) = GEN_INT (pwr2);
+   }
+   }
+}
+
+  return addr;
+}
+
 /* Create a new pseudo using MODE, RCLASS, ORIGINAL or reuse already
created input reload pseudo (only if TYPE is not OP_OUT).  Don't
reuse pseudo if IN_SUBREG_P is true and the reused pseudo should be
@@ -4362,12 +4390,19 @@ curr_insn_transform (bool check_only_p)
{
  rtx addr = *loc;
  enum rtx_code code = GET_CODE (addr);
- 
+ bool align_p = false;
+
  if (code == AND && CONST_INT_P (XEXP (addr, 1)))
-   /* (and ... (const_int -X)) is used to align to X bytes.  */
-   addr = XEXP (*loc, 0);
+   {
+ /* (and ... (const_int -X)) is used to align to X bytes.  */
+ align_p = true;
+ addr = XEXP (*loc, 0);
+   }
+ else
+   addr = canonicalize_reload_addr (addr);
+
  lra_emit_move (new_reg, addr);
- if (addr != *loc)
+ if (align_p)
emit_move_insn (new_reg, gen_rtx_AND (GET_MODE (new_reg), 
new_reg, XEXP (*loc, 1)));
}
  before = get_insns ();
diff --git a/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c 
b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
new file mode 100644
index 000..36beed497a0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/mem-shift-canonical.c
@@ -0,0 +1,27 @@
+/* This test is a copy of gcc.dg/torture/pr34330.c: here we are looking for
+   specific patterns being matched in the AArch64 backend.  */
+
+/* { dg-do compile } */
+/* { dg-options "-Os -ftree-vectorize -dp" } */
+
+
+struct T
+{
+  int t;
+  struct { short s1, s2, s3, s4; } *s;
+};
+
+void
+foo (int *a, int *b, int *c, int *d, struct T *e)
+{
+  int i;
+  for (i = 0; i < e->t; i++)
+{
+  e->s[i].s1 = a[i];
+  e->s[i].s2 = b[i];
+  e->s[i].s3 = c[i];
+  e->s[i].s4 = d[i];
+}
+}
+
+/* { dg-final { scan-assembler-times "add_lsl_di" 3 } } */


[PATCH] tree-optimization/96565 - improve DSE with paths ending in noreturn

2020-08-26 Thread Richard Biener
This improves DSEs stmt walking by not considering a DEF without
uses for further processing (and thus giving up when there's two
paths to follow).

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2020-08-26  Richard Biener  

PR tree-optimization/96565
* tree-ssa-dse.c (dse_classify_store): Remove defs with
no uses from further processing.

* gcc.dg/tree-ssa/ssa-dse-40.c: New testcase.
* gcc.dg/builtin-object-size-4.c: Adjust.
---
 gcc/testsuite/gcc.dg/builtin-object-size-4.c |  3 +++
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-40.c   | 16 
 gcc/tree-ssa-dse.c   | 11 +++
 3 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-40.c

diff --git a/gcc/testsuite/gcc.dg/builtin-object-size-4.c 
b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
index c22654dea2a..9f159e36a0f 100644
--- a/gcc/testsuite/gcc.dg/builtin-object-size-4.c
+++ b/gcc/testsuite/gcc.dg/builtin-object-size-4.c
@@ -170,6 +170,9 @@ test1 (void *q, int x)
   r = (char *) L"abcd\0efg";
   if (__builtin_object_size (r + 2, 3) != sizeof (L"abcd\0efg") - 2)
 abort ();
+  /* Prevent DSE from removing calls that prevent bad combining of
+ addresses and offsets.  */
+  asm volatile ("" : : "g" (&a));
 }
 
 size_t l1 = 1;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-40.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-40.c
new file mode 100644
index 000..36f69c05d5b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-40.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+_Bool g(void);
+
+void f(int x)
+{
+  char arr[x];
+
+  arr[0] = 0;
+
+  if (g())
+__builtin_abort();
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index cc93f559286..76eed06f17f 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -898,6 +898,17 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
*by_clobber_p = false;
  defs.unordered_remove (i);
}
+ /* If the path ends here we do not need to process it further.
+This for example happens with calls to noreturn functions.  */
+ else if (gimple_code (def) != GIMPLE_PHI
+  && has_zero_uses (gimple_vdef (def)))
+   {
+ /* But if the store is to global memory it is definitely
+not dead.  */
+ if (ref_may_alias_global_p (ref))
+   return DSE_STORE_LIVE;
+ defs.unordered_remove (i);
+   }
  /* In addition to kills we can remove defs whose only use
 is another def in defs.  That can only ever be PHIs of which
 we track a single for simplicity reasons (we fail for multiple
-- 
2.25.1


Re: RFQ: -R remark options

2020-08-26 Thread Martin Sebor via Gcc-patches

On 8/26/20 7:37 AM, Nathan Sidwell wrote:

Hi,
I had a need to add a new type of informative message on the modules 
branch, with an option to enable it.  The message is not a warning or an 
error, but just 'hey, you asked if X happens.  It happens just here'. 
This is emitted as a note.  I chose -fnote-$NAME for the option, but 
that wasn't a very satisfactory solution.


I have learnt that clang has -R$NAME options for exactly this kind of 
informative note.  Presumably with -Rno-$NAME negation.


Presumably we could also use this for optimization notes -- 'I failed to 
vectorize this loop because ...'


thoughts?


I think it would be useful to issue informational messages under
the control of a dedicated option.

At the same time, it would also be helpful to make it easier to
keep notes from issuing unintentionally, when a warning is disabled
but the inform() call neglects to check the result of the prior
warning() call.

Finally, another related problem to solve is the incidental
dependency of code generation on the effect of a warning that's
sometimes introduced if a codegen decision is inadvertently made
in the block controlled by the 'if (warning ())' statement.

One idea to solve all three problems is to add an option argument
to inform() to either associate the note with it and issue the note
only when the option is enabled.  A note-only option would then
control standalone notes.

With that, the usual sequence of warning-then-note would be coded
without any tests of return values, like so:

warning_at (warnloc, OPT_Wfoo, "this foo is bad");
/* ... */
inform (noteloc, OPT_Wfoo, "the last foo is here");

Admittedly, this isn't a foolproof solution as it wouldn't prevent
the note from being issued if the source code parsed in between
the calls changed the -Wfoo's disposition via a #pragma GCC
diagnostic.  To solve that, the paired warning() and inform() calls
would need to be associated more closely than by an option.  An idea
is to either change the functions to return and take a unique token
or handle as an argument.  An implementation of this scheme is to
use the auto_diagnostic_group class as this token:

  auto_diagnostic_group adg (OPT_Wfoo);
  adg.warning (warnloc, "this foo is bad");
  /* ... */
  adg.inform (noteloc, "the last foo is here"");

This approach could also work to issue just the notes (for note-
specific options).

Martin


Re: PING: Fwd: [PATCH] Adjust tree-ssa-strlen.c for irange API.

2020-08-26 Thread Aldy Hernandez via Gcc-patches

PING * 2

On 8/11/20 1:55 PM, Aldy Hernandez wrote:


-- Forwarded message -
From: *Aldy Hernandez* mailto:al...@redhat.com>>
Date: Tue, Aug 4, 2020, 13:34
Subject: [PATCH] Adjust tree-ssa-strlen.c for irange API.
To: mailto:gcc-patches@gcc.gnu.org>>
Cc: mailto:l...@redhat.com>>, >, Aldy Hernandez >



This patch adapts the strlen pass to use the irange API.

I wasn't able to remove the one annoying use of VR_ANTI_RANGE, because
I'm not sure what to do.  Perhaps Martin can shed some light.  The
current code has:

   else if (rng == VR_ANTI_RANGE)
         {
           wide_int maxobjsize = wi::to_wide (TYPE_MAX_VALUE 
(ptrdiff_type_node));

           if (wi::ltu_p (cntrange[1], maxobjsize))
             {
               cntrange[0] = cntrange[1] + 1;
               cntrange[1] = maxobjsize;

Suppose we have ~[10,20], won't the above set cntrange[] to [21,MAX]?  Won't
this ignore the 0..9 that is part of the range?  What should we do here?

Anyways, I've left the anti-range in place, but the rest of the patch still
stands.

OK?

gcc/ChangeLog:

         * tree-ssa-strlen.c (get_range): Adjust for irange API.
         (compare_nonzero_chars): Same.
         (dump_strlen_info): Same.
         (get_range_strlen_dynamic): Same.
         (set_strlen_range): Same.
         (maybe_diag_stxncpy_trunc): Same.
         (get_len_or_size): Same.
         (count_nonzero_bytes_addr): Same.
         (handle_integral_assign): Same.
---
  gcc/tree-ssa-strlen.c | 122 --
  1 file changed, 57 insertions(+), 65 deletions(-)

diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index fbaee745f7d..e6009874ee5 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -220,21 +220,25 @@ get_range (tree val, wide_int minmax[2], const 
vr_values *rvals /* = NULL */)

          GCC 11).  */
        const value_range *vr
         = (CONST_CAST (class vr_values *, rvals)->get_value_range (val));
-      value_range_kind rng = vr->kind ();
-      if (rng != VR_RANGE || !range_int_cst_p (vr))
+      if (vr->undefined_p () || vr->varying_p ())
         return NULL_TREE;

-      minmax[0] = wi::to_wide (vr->min ());
-      minmax[1] = wi::to_wide (vr->max ());
+      minmax[0] = vr->lower_bound ();
+      minmax[1] = vr->upper_bound ();
        return val;
      }

-  value_range_kind rng = get_range_info (val, minmax, minmax + 1);
-  if (rng == VR_RANGE)
-    return val;
+  value_range vr;
+  get_range_info (val, vr);
+  if (!vr.undefined_p () && !vr.varying_p ())
+    {
+      minmax[0] = vr.lower_bound ();
+      minmax[1] = vr.upper_bound ();
+      return val;
+    }

-  /* Do not handle anti-ranges and instead make use of the on-demand
-     VRP if/when it becomes available (hopefully in GCC 11).  */
+  /* We should adjust for the on-demand VRP if/when it becomes
+     available (hopefully in GCC 11).  */
    return NULL_TREE;
  }

@@ -278,16 +282,18 @@ compare_nonzero_chars (strinfo *si, unsigned 
HOST_WIDE_INT off,

      = (CONST_CAST (class vr_values *, rvals)
         ->get_value_range (si->nonzero_chars));

-  value_range_kind rng = vr->kind ();
-  if (rng != VR_RANGE || !range_int_cst_p (vr))
+  if (vr->undefined_p () || vr->varying_p ())
      return -1;

    /* If the offset is less than the minimum length or if the bounds
       of the length range are equal return the result of the comparison
       same as in the constant case.  Otherwise return a conservative
       result.  */
-  int cmpmin = compare_tree_int (vr->min (), off);
-  if (cmpmin > 0 || tree_int_cst_equal (vr->min (), vr->max ()))
+  tree type = TREE_TYPE (si->nonzero_chars);
+  tree tmin = wide_int_to_tree (type, vr->lower_bound ());
+  tree tmax = wide_int_to_tree (type, vr->upper_bound ());
+  int cmpmin = compare_tree_int (tmin, off);
+  if (cmpmin > 0 || tree_int_cst_equal (tmin, tmax))
      return cmpmin;

    return -1;
@@ -905,32 +911,14 @@ dump_strlen_info (FILE *fp, gimple *stmt, const 
vr_values *rvals)

                   print_generic_expr (fp, si->nonzero_chars);
                   if (TREE_CODE (si->nonzero_chars) == SSA_NAME)
                     {
-                     value_range_kind rng = VR_UNDEFINED;
-                     wide_int min, max;
+                     value_range vr;
                       if (rvals)
-                       {
-                         const value_range *vr
-                           = CONST_CAST (class vr_values *, rvals)
-                           ->get_value_range (si->nonzero_chars);
-                         rng = vr->kind ();
-                         if (range_int_cst_p (vr))
-                           {
-                             min = wi::to_wide (vr->min ());
-                             max = wi::to_wide (vr->max ());
-                           }
-                         else
-                           rng = VR_UNDEFINED;
-                       }
+                     

Re: PING: Fwd: [PATCH] Rewrite get_size_range for irange API.

2020-08-26 Thread Aldy Hernandez via Gcc-patches

PING * 2

On 8/11/20 1:56 PM, Aldy Hernandez wrote:


-- Forwarded message -
From: *Aldy Hernandez* mailto:al...@redhat.com>>
Date: Thu, Aug 6, 2020, 16:54
Subject: [PATCH] Rewrite get_size_range for irange API.
To: mailto:gcc-patches@gcc.gnu.org>>
Cc: mailto:mse...@redhat.com>>, Aldy Hernandez 
mailto:al...@redhat.com>>



[Martin, does this sound reasonable to you?]

The following patch converts get_size_range to the irange API, thus
removing the use of VR_ANTI_RANGE.

This was a bit tricky because of the gymnastics we do in get_size_range
to ignore negatives and all that.  I didn't convert the function for
multi-ranges.  The code still returns a pair of trees indicating the
massaged range.  But I do believe the code is cleaner and smaller.

I'm not sure the current code (or my adaptation) gets all cases, but
the goal was to keep to the existing functionality, nothing more.

OK?

gcc/ChangeLog:

         * calls.c (range_remove_non_positives): New.
         (set_bounds_from_range): New.
         (get_size_range): Rewrite for irange API.
         * tree-affine.c (expr_to_aff_combination): Call 
determine_value_range

         with a range.
         * tree-vrp.c (determine_value_range_1): Rename to...
         (determine_value_range): ...this.
         * tree-vrp.h (determine_value_range): Adjust prototype.
---
  gcc/calls.c       | 139 ++
  gcc/tree-affine.c |   5 +-
  gcc/tree-vrp.c    |  44 ++-
  gcc/tree-vrp.h    |   2 +-
  4 files changed, 73 insertions(+), 117 deletions(-)

diff --git a/gcc/calls.c b/gcc/calls.c
index 44401e6350d..4aeeb36a2be 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -57,6 +57,7 @@ along with GCC; see the file COPYING3.  If not see
  #include "attribs.h"
  #include "builtins.h"
  #include "gimple-fold.h"
+#include "range.h"

  /* Like PREFERRED_STACK_BOUNDARY but in units of bytes, not bits.  */
  #define STACK_BYTES (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT)
@@ -1237,6 +1238,31 @@ alloc_max_size (void)
    return alloc_object_size_limit;
  }

+// Remove non-positive numbers from a range.  ALLOW_ZERO is TRUE if 0
+// is considered positive.
+
+static void
+range_remove_non_positives (irange *vr, bool allow_zero)
+{
+  tree floor, type = vr->type ();
+  if (allow_zero)
+    floor = build_zero_cst (type);
+  else
+    floor = build_one_cst (type);
+  value_range positives (floor, TYPE_MAX_VALUE (type));
+  vr->intersect (positives);
+}
+
+// Set the extreme bounds of range VR into range[].
+
+static bool
+set_bounds_from_range (const irange *vr, tree range[2])
+{
+  range[0] = wide_int_to_tree (vr->type (), vr->lower_bound ());
+  range[1] = wide_int_to_tree (vr->type (), vr->upper_bound ());
+  return true;
+}
+
  /* Return true when EXP's range can be determined and set RANGE[] to it
     after adjusting it if necessary to make EXP a represents a valid size
     of object, or a valid size argument to an allocation function declared
@@ -1250,9 +1276,11 @@ alloc_max_size (void)
  bool
  get_size_range (tree exp, tree range[2], bool allow_zero /* = false */)
  {
-  if (!exp)
-    return false;
-
+  if (!exp || !INTEGRAL_TYPE_P (TREE_TYPE (exp)))
+    {
+      range[0] = range[1] = NULL_TREE;
+      return false;
+    }
    if (tree_fits_uhwi_p (exp))
      {
        /* EXP is a constant.  */
@@ -1261,91 +1289,30 @@ get_size_range (tree exp, tree range[2], bool 
allow_zero /* = false */)

      }

    tree exptype = TREE_TYPE (exp);
-  bool integral = INTEGRAL_TYPE_P (exptype);
-
-  wide_int min, max;
-  enum value_range_kind range_type;
-
-  if (integral)
-    range_type = determine_value_range (exp, &min, &max);
-  else
-    range_type = VR_VARYING;
-
-  if (range_type == VR_VARYING)
+  value_range vr;
+  determine_value_range (&vr, exp);
+  if (vr.num_pairs () == 1)
+    return set_bounds_from_range (&vr, range);
+
+  widest_irange positives (vr);
+  range_remove_non_positives (&positives, allow_zero);
+
+  // If all numbers are negative, let the caller sort it out.
+  if (positives.undefined_p ())
+    return set_bounds_from_range (&vr, range);
+
+  // Remove the unknown parts of a multi-range.
+  // This will transform [5,10][20,MAX] into [5,10].
+  int pairs = positives.num_pairs ();
+  if (pairs > 1
+      && positives.upper_bound () == wi::to_wide (TYPE_MAX_VALUE 
(exptype)))

      {
-      if (integral)
-       {
-         /* Use the full range of the type of the expression when
-            no value range information is available.  */
-         range[0] = TYPE_MIN_VALUE (exptype);
-         range[1] = TYPE_MAX_VALUE (exptype);
-         return true;
-       }
-
-      range[0] = NULL_TREE;
-      range[1] = NULL_TREE;
-      return false;
+      value_range last_range (exptype,
+                             positives.lower_bound (pairs - 1),
+                             positives.upper_bound (pairs - 1), 
VR_ANTI_RANGE);

+      positives.intersect (last_range);
      }
-
-  unsigned expprec = TYPE_P

Re: RFQ: -R remark options

2020-08-26 Thread Nathan Sidwell

On 8/26/20 9:40 AM, Richard Biener wrote:

On Wed, Aug 26, 2020 at 3:37 PM Nathan Sidwell  wrote:


Hi,
I had a need to add a new type of informative message on the modules
branch, with an option to enable it.  The message is not a warning or an
error, but just 'hey, you asked if X happens.  It happens just here'.
This is emitted as a note.  I chose -fnote-$NAME for the option, but
that wasn't a very satisfactory solution.

I have learnt that clang has -R$NAME options for exactly this kind of
informative note.  Presumably with -Rno-$NAME negation.

Presumably we could also use this for optimization notes -- 'I failed to
vectorize this loop because ...'

thoughts?


We have -fopt-info-X-Y for this.


ah, thanks.

 Perhaps -f-info-$NAME?
-flang-info-$NAME?





nathan

--
Nathan Sidwell



--
Nathan Sidwell


[PATCH] tree-optimization/96698 - fix ICE when vectorizing nested cycles

2020-08-26 Thread Richard Biener
This fixes vectorized PHI latch edge updating and delay it until
all of the loop is code generated to deal with the case that the
latch def is a PHI in the same block.

Boostrapped and tested on x86_64-unknown-linux-gnu, pushed.

2020-08-26  Richard Biener  

PR tree-optimization/96698
* tree-vectorizer.h (loop_vec_info::reduc_latch_defs): New.
(loop_vec_info::reduc_latch_slp_defs): Likewise.
* tree-vect-stmts.c (vect_transform_stmt): Only record
stmts to update PHI latches from, perform the update ...
* tree-vect-loop.c (vect_transform_loop): ... here after
vectorizing those PHIs.
(info_for_reduction): Properly handle non-reduction PHIs.

* gcc.dg/vect/pr96698.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr96698.c | 19 
 gcc/tree-vect-loop.c| 35 -
 gcc/tree-vect-stmts.c   | 29 +---
 gcc/tree-vectorizer.h   |  5 +
 4 files changed, 63 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr96698.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr96698.c 
b/gcc/testsuite/gcc.dg/vect/pr96698.c
new file mode 100644
index 000..1d141c1dfff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr96698.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+
+void test(int a, int* i)
+{
+  for (; a < 5; ++a)
+{
+  int b = 0;
+  int c = 0;
+  for (; b != -11; b--)
+   for (int d = 0; d ==0; d++)
+ {
+   *i += c & a;
+   c = b;
+ }
+}
+}
+
+/* We should be able to vectorize the inner cycle.  */
+/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" { target 
vect_int } } } */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index a92813eb2ac..50abb2b2f3c 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -4646,7 +4646,8 @@ info_for_reduction (vec_info *vinfo, stmt_vec_info 
stmt_info)
 {
   stmt_info = vect_orig_stmt (stmt_info);
   gcc_assert (STMT_VINFO_REDUC_DEF (stmt_info));
-  if (!is_a  (stmt_info->stmt))
+  if (!is_a  (stmt_info->stmt)
+  || !VECTORIZABLE_CYCLE_DEF (STMT_VINFO_DEF_TYPE (stmt_info)))
 stmt_info = STMT_VINFO_REDUC_DEF (stmt_info);
   gphi *phi = as_a  (stmt_info->stmt);
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
@@ -9031,6 +9032,38 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple 
*loop_vectorized_call)
}
}
 
+  /* Fill in backedge defs of reductions.  */
+  for (unsigned i = 0; i < loop_vinfo->reduc_latch_defs.length (); ++i)
+   {
+ stmt_vec_info stmt_info = loop_vinfo->reduc_latch_defs[i];
+ stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
+ vec &phi_info
+   = STMT_VINFO_VEC_STMTS (STMT_VINFO_REDUC_DEF (orig_stmt_info));
+ vec &vec_stmt
+   = STMT_VINFO_VEC_STMTS (stmt_info);
+ gcc_assert (phi_info.length () == vec_stmt.length ());
+ gphi *phi
+   = dyn_cast  (STMT_VINFO_REDUC_DEF (orig_stmt_info)->stmt);
+ edge e = loop_latch_edge (gimple_bb (phi_info[0])->loop_father);
+ for (unsigned j = 0; j < phi_info.length (); ++j)
+   add_phi_arg (as_a  (phi_info[j]),
+gimple_get_lhs (vec_stmt[j]), e,
+gimple_phi_arg_location (phi, e->dest_idx));
+   }
+  for (unsigned i = 0; i < loop_vinfo->reduc_latch_slp_defs.length (); ++i)
+   {
+ slp_tree slp_node = loop_vinfo->reduc_latch_slp_defs[i].first;
+ slp_tree phi_node = loop_vinfo->reduc_latch_slp_defs[i].second;
+ gphi *phi = as_a  (SLP_TREE_SCALAR_STMTS (phi_node)[0]->stmt);
+ e = loop_latch_edge (gimple_bb (phi)->loop_father);
+ gcc_assert (SLP_TREE_VEC_STMTS (phi_node).length ()
+ == SLP_TREE_VEC_STMTS (slp_node).length ());
+ for (unsigned j = 0; j < SLP_TREE_VEC_STMTS (phi_node).length (); ++j)
+   add_phi_arg (as_a  (SLP_TREE_VEC_STMTS (phi_node)[j]),
+vect_get_slp_vect_def (slp_node, j),
+e, gimple_phi_arg_location (phi, e->dest_idx));
+   }
+
   /* Stub out scalar statements that must not survive vectorization.
 Doing this here helps with grouped statements, or statements that
 are involved in patterns.  */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 65e30bac424..f6532558693 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -10912,8 +10912,8 @@ vect_transform_stmt (vec_info *vinfo,
   if (STMT_VINFO_TYPE (stmt_info) == store_vec_info_type)
 return is_store;
 
-  /* If this stmt defines a value used on a backedge, update the
- vectorized PHIs.  */
+  /* If this stmt defines a value used on a backedge, record it so
+ we can update the vectorized PHIs later.  */
   stmt_vec_info orig_stmt_info = vect_orig_stmt (stmt_info);
   stmt_vec_info reduc_i

Re: RFQ: -R remark options

2020-08-26 Thread Nathan Sidwell

On 8/26/20 9:50 AM, Rainer Orth wrote:

Hi Nathan,


Hi,
I had a need to add a new type of informative message on the modules
branch, with an option to enable it.  The message is not a warning or an
error, but just 'hey, you asked if X happens.  It happens just here'. This
is emitted as a note.  I chose -fnote-$NAME for the option, but that wasn't
a very satisfactory solution.

I have learnt that clang has -R$NAME options for exactly this kind of
informative note.  Presumably with -Rno-$NAME negation.

Presumably we could also use this for optimization notes -- 'I failed to
vectorize this loop because ...'

thoughts?


this would be very bad on Solaris: it's been using -R (the native form
of GNU ld -rpath) for ages; changing this now would cause much breakage.


oh gosh, I remember that :) thanks for the reminder!

nathan

--
Nathan Sidwell


Re: [PATCH 1/3] vec: add exact argument for various grow functions.

2020-08-26 Thread Martin Liška

On 8/12/20 2:28 PM, Martin Liška wrote:

I guess Richi can defend his strategy for this


Richi?

Martin


Re: RFQ: -R remark options

2020-08-26 Thread Rainer Orth
Hi Nathan,

> Hi,
> I had a need to add a new type of informative message on the modules
> branch, with an option to enable it.  The message is not a warning or an
> error, but just 'hey, you asked if X happens.  It happens just here'. This
> is emitted as a note.  I chose -fnote-$NAME for the option, but that wasn't
> a very satisfactory solution.
>
> I have learnt that clang has -R$NAME options for exactly this kind of
> informative note.  Presumably with -Rno-$NAME negation.
>
> Presumably we could also use this for optimization notes -- 'I failed to
> vectorize this loop because ...'
>
> thoughts?

this would be very bad on Solaris: it's been using -R (the native form
of GNU ld -rpath) for ages; changing this now would cause much breakage.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: RFQ: -R remark options

2020-08-26 Thread David Malcolm via Gcc-patches
On Wed, 2020-08-26 at 09:37 -0400, Nathan Sidwell wrote:
> Hi,
> I had a need to add a new type of informative message on the modules 
> branch, with an option to enable it.  The message is not a warning or
> an 
> error, but just 'hey, you asked if X happens.  It happens just
> here'. 
> This is emitted as a note.  I chose -fnote-$NAME for the option, but 
> that wasn't a very satisfactory solution.
> 
> I have learnt that clang has -R$NAME options for exactly this kind
> of 
> informative note.  Presumably with -Rno-$NAME negation.
> 
> Presumably we could also use this for optimization notes -- 'I failed
> to 
> vectorize this loop because ...'
> 
> thoughts?

I like the idea.

FWIW I implemented something like this as part of my revamping of
optinfo in gcc 9.

I wrote a patch that added "remarks" to the diagnostic framework, which
can be seen here:
  https://gcc.gnu.org/legacy-ml/gcc-patches/2018-07/msg00067.html
though that patch is somewhat coupled to the optinfo stuff.  IIRC it
was somehow eventually dropped; I think I ended up just
extending/updating the -fopt-info output, and the "remarks"
infrastructure never made it into trunk (possibly due to a desire to
avoid duplication; I can't quite remember the details).

Dave



Re: [PATCH 1/2] IPA symver: allow multiple symvers for a definition

2020-08-26 Thread Martin Liška

Hi.

There's one obvious fix that I've just noticed.

I'm going to install it.
Martin
>From a5d075d595d35cd5d6607bbd9c8b2eb7707ca920 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 26 Aug 2020 13:18:14 +0200
Subject: [PATCH] symver: fix attribute matching.

gcc/ChangeLog:

	* cgraphunit.c (process_symver_attribute): Match only symver
	TREE_PURPOSE.
---
 gcc/cgraphunit.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index fa3aec79a48..26d3995a0c0 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -727,6 +727,9 @@ process_symver_attribute (symtab_node *n)
 	  .symver foo, bar@V1
 	  .symver foo, baz@V2
   */
+  const char *purpose = IDENTIFIER_POINTER (TREE_PURPOSE (value));
+  if (strcmp (purpose, "symver") != 0)
+	continue;
 
   tree symver = get_identifier_with_length
 	(TREE_STRING_POINTER (TREE_VALUE (TREE_VALUE (value))),
-- 
2.28.0



Re: RFQ: -R remark options

2020-08-26 Thread Richard Biener via Gcc-patches
On Wed, Aug 26, 2020 at 3:37 PM Nathan Sidwell  wrote:
>
> Hi,
> I had a need to add a new type of informative message on the modules
> branch, with an option to enable it.  The message is not a warning or an
> error, but just 'hey, you asked if X happens.  It happens just here'.
> This is emitted as a note.  I chose -fnote-$NAME for the option, but
> that wasn't a very satisfactory solution.
>
> I have learnt that clang has -R$NAME options for exactly this kind of
> informative note.  Presumably with -Rno-$NAME negation.
>
> Presumably we could also use this for optimization notes -- 'I failed to
> vectorize this loop because ...'
>
> thoughts?

We have -fopt-info-X-Y for this.

>
> nathan
>
> --
> Nathan Sidwell


RFQ: -R remark options

2020-08-26 Thread Nathan Sidwell

Hi,
I had a need to add a new type of informative message on the modules 
branch, with an option to enable it.  The message is not a warning or an 
error, but just 'hey, you asked if X happens.  It happens just here'. 
This is emitted as a note.  I chose -fnote-$NAME for the option, but 
that wasn't a very satisfactory solution.


I have learnt that clang has -R$NAME options for exactly this kind of 
informative note.  Presumably with -Rno-$NAME negation.


Presumably we could also use this for optimization notes -- 'I failed to 
vectorize this loop because ...'


thoughts?

nathan

--
Nathan Sidwell


Re: [PATCH] Fortran : ICE for division by zero in declaration PR95882

2020-08-26 Thread Thomas Koenig via Gcc-patches

Hi Mark,


OK to commit and backport?


OK.  Thanks for the patch!

Best regards

Thomas


Re: [PATCH] Fortran : ICE on invalid code PR95398

2020-08-26 Thread Thomas Koenig via Gcc-patches

Hi Mark,

Please find attach a fix for PR95398.  The original patch was  by Steve 
Kargl.


OK to commit?


OK.  Thanks for taking this on!

Regards

Thomas


Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-26 Thread Vladimir Makarov via Gcc-patches



On 2020-08-26 5:06 a.m., Richard Sandiford wrote:

Alex Coplan  writes:

Minor nit, should be formatted as:

static rtx
canonicalize_reload_addr (rtx addr)

Sorry for missing this.  Alex, it should be fixed anyway.


I don't think we should we restrict this to (plus (mult X Y) Z),
since addresses can be more complicated than that.  One way to
search for all MULTs is:

   subrtx_iterator::array_type array;
   FOR_EACH_SUBRTX (iter, array, x, NONCONST)
 {
   rtx x = *iter;
   if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
 ...
 }

(Needs rtl-iter.h)


I am agree it would be nice to process a general case.  Alex, you can do 
this as a separate patch if you want.


Richard, thank you for looking at this patch too.




[PATCH] tree-optimization/96783 - fix vectorization of negative step SLP

2020-08-26 Thread Richard Biener
This appropriately uses VMAT_ELEMENTWISE when the vectors cannot be
filled from a single SLP group until we get more explicit support
for negative stride SLP.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

2020-08-26  Richard Biener  

PR tree-optimization/96783
* tree-vect-stmts.c (get_group_load_store_type): Use
VMAT_ELEMENTWISE for negative strides when we cannot
use VMAT_STRIDED_SLP.

* gcc.dg/vect/pr96783-1.c: New testcase.
* gcc.dg/vect/pr96783-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/vect/pr96783-1.c | 38 +++
 gcc/testsuite/gcc.dg/vect/pr96783-2.c | 29 
 gcc/tree-vect-stmts.c | 10 ++-
 3 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr96783-1.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr96783-2.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr96783-1.c 
b/gcc/testsuite/gcc.dg/vect/pr96783-1.c
new file mode 100644
index 000..55d1364f056
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr96783-1.c
@@ -0,0 +1,38 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (long *a, int off, int n)
+{
+  for (int i = 0; i < n; ++i)
+{
+  long tem1 = a[0];
+  long tem2 = a[1];
+  long tem3 = a[2];
+  long tem4 = a[off + 1];
+  a[0] = tem4;
+  long tem5 = a[off + 2];
+  a[1] = tem5;
+  long tem6 = a[off + 3];
+  a[2] = tem6;
+  a[off + 1] = tem1;
+  a[off + 2] = tem2;
+  a[off + 3] = tem3;
+  a -= 3;
+}
+}
+
+int main ()
+{
+  long a[3 * 9];
+  check_vect ();
+  for (int i = 0; i < 3 * 9; ++i)
+a[i] = i;
+  foo (a + 3 * 5, 6-1, 5);
+  const long b[3 * 8] = { 0, 1, 2, 21, 22, 23, 18, 19, 20, 3, 4, 5, 6, 7, 8, 
9, 10, 11, 12, 13, 14, 15, 16, 17 };
+  for (int i = 0; i < 3 * 8; ++i)
+if (a[i] != b[i])
+  __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/vect/pr96783-2.c 
b/gcc/testsuite/gcc.dg/vect/pr96783-2.c
new file mode 100644
index 000..33c37109e3a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr96783-2.c
@@ -0,0 +1,29 @@
+/* { dg-do run } */
+
+#include "tree-vect.h"
+
+long a[1024];
+long b[1024];
+
+void __attribute__((noipa)) foo ()
+{
+  for (int i = 0; i < 256; ++i)
+{
+  a[3*i] = b[1023 - 3*i - 2];
+  a[3*i + 1] = b[1023 - 3*i - 1];
+  a[3*i + 2] = b[1023 - 3*i];
+}
+}
+
+int main()
+{
+  for (int i = 0; i < 1024; ++i)
+b[i] = i;
+  foo ();
+  for (int i = 0; i < 256; ++i)
+if (a[3*i] != 1023 - 3*i - 2
+   || a[3*i+1] != 1023 - 3*i - 1
+   || a[3*i+2] != 1023 - 3*i)
+  __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 65e30bac424..224be018af9 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2192,7 +2192,15 @@ get_group_load_store_type (vec_info *vinfo, 
stmt_vec_info stmt_info,
*memory_access_type = get_negative_load_store_type
   (vinfo, stmt_info, vectype, vls_type, 1);
  else
-   *memory_access_type = VMAT_STRIDED_SLP;
+   {
+ /* Try to use consecutive accesses of DR_GROUP_SIZE elements,
+separated by the stride, until we have a complete vector.
+Fall back to scalar accesses if that isn't possible.  */
+ if (multiple_p (nunits, group_size))
+   *memory_access_type = VMAT_STRIDED_SLP;
+ else
+   *memory_access_type = VMAT_ELEMENTWISE;
+   }
}
  else
{
-- 
2.26.2


Re: [PATCH 2/2] IPA symver: support visibility and static symbols.

2020-08-26 Thread H.J. Lu via Gcc-patches
On Wed, Aug 26, 2020 at 5:24 AM Martin Liška  wrote:
>
> On 8/26/20 2:22 PM, H.J. Lu wrote:
> > What is the question?
>
> What are the scenarios where the new syntax:
>
>.symver foo, foo@VERS_1, local# Change foo to a local symbol.

foo is local to the file.  Use it there is no global reference to foo.  It is
usually used to define foo@VERS_1.

>.symver foo, foo@VERS_2, hidden   # Change foo to a hidden symbol.
>

foo is local to shared object or executable.  It provides internal reference
to foo.

The difference is internal reference to foo.

-- 
H.J.


Re: [PATCH] add move CTOR to auto_vec, use auto_vec for get_loop_exit_edges

2020-08-26 Thread Richard Biener
On Thu, 6 Aug 2020, Richard Biener wrote:

> On Thu, 6 Aug 2020, Richard Biener wrote:
> 
> > This adds a move CTOR to auto_vec and makes use of a
> > auto_vec return value for get_loop_exit_edges denoting
> > that lifetime management of the vector is handed to the caller.
> > 
> > The move CTOR prompted the hash_table change because it appearantly
> > makes the copy CTOR implicitely deleted (good) and hash-table
> > expansion of the odr_enum_map which is
> > hash_map  where odr_enum has an
> > auto_vec member triggers this.  Not sure if
> > there's a latent bug there before this (I think we're not
> > invoking DTORs, but we're invoking copy-CTORs).
> > 
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > 
> > Does this all look sensible and is it a good change
> > (the get_loop_exit_edges one)?
> 
> Regtest went OK, here's an update with a complete ChangeLog
> (how useful..) plus the move assign operator deleted, copy
> assign wouldn't work as auto-generated and at the moment
> there's no use of assigning.  I guess if we'd have functions
> that take an auto_vec<> argument meaning they will destroy
> the vector that will become useful and we can implement it.
> 
> OK for trunk?

Ping.

> Thanks,
> Richard.
> 
> 
> From d74c346e95ff967d930b7c83daabc26b0227aea3 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Thu, 6 Aug 2020 14:50:56 +0200
> Subject: [PATCH] add move CTOR to auto_vec, use auto_vec for
>  get_loop_exit_edges
> 
> This adds a move CTOR to auto_vec and makes use of a
> auto_vec return value for get_loop_exit_edges denoting
> that lifetime management of the vector is handed to the caller.
> 
> The move CTOR prompted the hash_table change because it appearantly
> makes the copy CTOR implicitely deleted (good) and hash-table
> expansion of the odr_enum_map which is
> hash_map  where odr_enum has an
> auto_vec member triggers this.  Not sure if
> there's a latent bug there before this (I think we're not
> invoking DTORs, but we're invoking copy-CTORs).
> 
> 2020-08-06  Richard Biener  
> 
>   * vec.h (auto_vec::auto_vec (auto_vec &&)): New move CTOR.
>   (auto_vec::operator=(auto_vec &&)): Delete.
>   * hash-table.h (hash_table::expand): Use std::move when expanding.
>   * cfgloop.h (get_loop_exit_edges): Return auto_vec.
>   * cfgloop.c (get_loop_exit_edges): Adjust.
>   * cfgloopmanip.c (fix_loop_placement): Likewise.
>   * ipa-fnsummary.c (analyze_function_body): Likewise.
>   * ira-build.c (create_loop_tree_nodes): Likewise.
>   (create_loop_tree_node_allocnos): Likewise.
>   (loop_with_complex_edge_p): Likewise.
>   * ira-color.c (ira_loop_edge_freq): Likewise.
>   * loop-unroll.c (analyze_insns_in_loop): Likewise.
>   * predict.c (predict_loops): Likewise.
>   * tree-predcom.c (last_always_executed_block): Likewise.
>   * tree-ssa-loop-ch.c (ch_base::copy_headers): Likewise.
>   * tree-ssa-loop-im.c (store_motion_loop): Likewise.
>   * tree-ssa-loop-ivcanon.c (loop_edge_to_cancel): Likewise.
>   (canonicalize_loop_induction_variables): Likewise.
>   * tree-ssa-loop-manip.c (get_loops_exits): Likewise.
>   * tree-ssa-loop-niter.c (find_loop_niter): Likewise.
>   (finite_loop_p): Likewise.
>   (find_loop_niter_by_eval): Likewise.
>   (estimate_numbers_of_iterations): Likewise.
>   * tree-ssa-loop-prefetch.c (emit_mfence_after_loop): Likewise.
>   (may_use_storent_in_loop_p): Likewise.
> ---
>  gcc/cfgloop.c|  4 ++--
>  gcc/cfgloop.h|  2 +-
>  gcc/cfgloopmanip.c   |  3 +--
>  gcc/hash-table.h |  2 +-
>  gcc/ipa-fnsummary.c  |  4 +---
>  gcc/ira-build.c  | 12 +++-
>  gcc/ira-color.c  |  4 +---
>  gcc/loop-unroll.c|  3 +--
>  gcc/predict.c|  9 ++---
>  gcc/tree-predcom.c   |  3 +--
>  gcc/tree-ssa-loop-ch.c   |  3 +--
>  gcc/tree-ssa-loop-im.c   |  3 +--
>  gcc/tree-ssa-loop-ivcanon.c  |  9 ++---
>  gcc/tree-ssa-loop-manip.c|  3 +--
>  gcc/tree-ssa-loop-niter.c| 20 +---
>  gcc/tree-ssa-loop-prefetch.c |  7 ++-
>  gcc/vec.h|  7 +++
>  17 files changed, 33 insertions(+), 65 deletions(-)
> 
> diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c
> index 7720e6e5d2c..33a26cca6a4 100644
> --- a/gcc/cfgloop.c
> +++ b/gcc/cfgloop.c
> @@ -1202,10 +1202,10 @@ release_recorded_exits (function *fn)
>  
>  /* Returns the list of the exit edges of a LOOP.  */
>  
> -vec 
> +auto_vec
>  get_loop_exit_edges (const class loop *loop, basic_block *body)
>  {
> -  vec edges = vNULL;
> +  auto_vec edges;
>edge e;
>unsigned i;
>edge_iterator ei;
> diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
> index 18b404e292f..f1687f37401 100644
> --- a/gcc/cfgloop.h
> +++ b/gcc/cfgloop.h
> @@ -383,7 +383,7 @@ extern basic_block *get_loop_body_in_custom_order (const 
> class loop *,
>  extern basic_block *get_loop_body_in_

c++: template operator lookup caching

2020-08-26 Thread Nathan Sidwell


Jason's fix to retain operator lookups inside dependent lambda
functions turns out to be needed on the modules branch for all
template functions.  Because the context for that lookup no longer
exists in imports.  There were also a couple of shortcomings, which
this patch fixes.

(a) we conflate 'we found nothing' and 'we can redo this at
instantiation time'.  Fixed by making the former produce
error_mark_node.  That needs a fix in name-lookup to know that finding
a binding containing error_mark_node, means 'stop looking' you found
nothing.

(b) we'd continually do lookups for every operator, if nothing needed
to be retained.  Fixed by always caching that information, and then
dealing with it when pushing the bindings.

(c) if what we found was that find by a global namespace lookup, we'd
not cache that.  But that'd cause us to find decls declared
after the template, potentially hiding those we expected to find.  So
don't do that check.

This still retains only recording on lambdas.  As the comment says, we
could enable for all templates.

gcc/cp/
* decl.c (poplevel): A local-binding tree list holds the name in
TREE_PURPOSE.
* name-lookup.c (update_local_overload): Add id to TREE_PURPOSE.
(lookup_name_1): Deal with local-binding error_mark_node marker.
(op_unqualified_lookup): Return error_mark_node for 'nothing
found'.  Retain global binding, check class binding here.
(maybe_save_operator_binding): Reimplement to always cache a
result.
(push_operator_bindings): Deal with 'ignore' marker.
gcc/testsuite/
* g++.dg/lookup/operator-1.C: New.
* g++.dg/lookup/operator-2.C: New.

pushed

nathan
--
Nathan Sidwell
diff --git c/gcc/cp/decl.c w/gcc/cp/decl.c
index d1af63e7300..5e17e4dc4b1 100644
--- c/gcc/cp/decl.c
+++ w/gcc/cp/decl.c
@@ -692,8 +692,18 @@ poplevel (int keep, int reverse, int functionbody)
   /* Remove declarations for all the DECLs in this level.  */
   for (link = decls; link; link = TREE_CHAIN (link))
 {
-  decl = TREE_CODE (link) == TREE_LIST ? TREE_VALUE (link) : link;
-  tree name = OVL_NAME (decl);
+  tree name;
+  if (TREE_CODE (link) == TREE_LIST)
+	{
+	  decl = TREE_VALUE (link);
+	  name = TREE_PURPOSE (link);
+	  gcc_checking_assert (name);
+	}
+  else
+	{
+	  decl = link;
+	  name = DECL_NAME (decl);
+	}
 
   /* Remove the binding.  */
   if (TREE_CODE (decl) == LABEL_DECL)
diff --git c/gcc/cp/name-lookup.c w/gcc/cp/name-lookup.c
index c68ea09e610..3c2ddc197e6 100644
--- c/gcc/cp/name-lookup.c
+++ w/gcc/cp/name-lookup.c
@@ -2305,7 +2305,7 @@ update_local_overload (cxx_binding *binding, tree newval)
 if (*d == binding->value)
   {
 	/* Stitch new list node in.  */
-	*d = tree_cons (NULL_TREE, NULL_TREE, TREE_CHAIN (*d));
+	*d = tree_cons (DECL_NAME (*d), NULL_TREE, TREE_CHAIN (*d));
 	break;
   }
 else if (TREE_CODE (*d) == TREE_LIST && TREE_VALUE (*d) == binding->value)
@@ -3239,7 +3239,7 @@ push_local_binding (tree id, tree decl, bool is_using)
   if (TREE_CODE (decl) == OVERLOAD || is_using)
 /* We must put the OVERLOAD or using into a TREE_LIST since we
cannot use the decl's chain itself.  */
-decl = build_tree_list (NULL_TREE, decl);
+decl = build_tree_list (id, decl);
 
   /* And put DECL on the list of things declared by the current
  binding level.  */
@@ -6539,13 +6539,20 @@ lookup_name_1 (tree name, LOOK_where where, LOOK_want want)
 		gcc_assert (TREE_CODE (binding) == TYPE_DECL);
 		continue;
 	  }
+
+	/* The saved lookups for an operator record 'nothing
+	   found' as error_mark_node.  We need to stop the search
+	   here, but not return the error mark node.  */
+	if (binding == error_mark_node)
+	  binding = NULL_TREE;
+
 	val = binding;
-	break;
+	goto found;
 	  }
   }
 
   /* Now lookup in namespace scopes.  */
-  if (!val && bool (where & LOOK_where::NAMESPACE))
+  if (bool (where & LOOK_where::NAMESPACE))
 {
   name_lookup lookup (name, want);
   if (lookup.search_unqualified
@@ -6553,6 +6560,8 @@ lookup_name_1 (tree name, LOOK_where where, LOOK_want want)
 	val = lookup.value;
 }
 
+ found:;
+
   /* If we have a single function from a using decl, pull it out.  */
   if (val && TREE_CODE (val) == OVERLOAD && !really_overloaded_fn (val))
 val = OVL_FUNCTION (val);
@@ -7598,23 +7607,38 @@ op_unqualified_lookup (tree fnname)
   cp_binding_level *l = binding->scope;
   while (l && !l->this_entity)
 	l = l->level_chain;
+
   if (l && uses_template_parms (l->this_entity))
 	/* Don't preserve decls from an uninstantiated template,
 	   wait until that template is instantiated.  */
 	return NULL_TREE;
 }
+
   tree fns = lookup_name (fnname);
-  if (fns && fns == get_global_binding (fnname))
-/* The instantiation can find these.  */
-return NULL_TREE;
+  if (!fns)
+/* Remember we found nothing!  */
+return error_mar

Re: [PATCH 2/2] IPA symver: support visibility and static symbols.

2020-08-26 Thread Martin Liška

On 8/26/20 2:22 PM, H.J. Lu wrote:

What is the question?


What are the scenarios where the new syntax:

  .symver foo, foo@VERS_1, local# Change foo to a local symbol.
  .symver foo, foo@VERS_2, hidden   # Change foo to a hidden symbol.

is useful?

Thanks,
Martin


Re: [PATCH 1/5] Don't enable -gvariable-location-views by default for DWARF5.

2020-08-26 Thread Alexandre Oliva
Hello, Mark,

On Aug 25, 2020, Mark Wielaard  wrote:

> On Tue, 2020-08-25 at 01:05 -0300, Alexandre Oliva wrote:
>> it would seem to
>> make more sense to adopt and promote the proposed extension,
>> implemented in =incompat5 in GCC, but it would need some
>> implementation work in consumers to at least ignore the extension.

> Is that what is described in:
> http://www.fsfla.org/~lxoliva/papers/sfn/dwarf6-sfn-lvu.txt

> Does it match the proposal in:
> http://www.dwarfstd.org/ShowIssue.php?issue=170427.1

Yes, the implementation of incompat5 is supposed to match the
enhancement to DWARF proposed in this issue.

> Should we try to introduce your extending loclists proposal:
> http://www.dwarfstd.org/ShowIssue.php?issue=170427.2

If it makes sense to you, sure.  I think it could be useful for loclist
compression, besides adding extensibility to an unusually rigid part of
DWARF specs.  I'm not attached to this particular formulation, though,
and I'm not aware of any implementations thereof.  I only felt the need
for extending loclists and found myself severely constrained.

> The main issue is that there is no formal way of extending the
> loclists.

*nod*

Thanks,

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH 2/2] IPA symver: support visibility and static symbols.

2020-08-26 Thread H.J. Lu via Gcc-patches
On Wed, Aug 26, 2020 at 4:34 AM Martin Liška  wrote:
>
> On 8/26/20 1:27 PM, Jan Hubicka wrote:
> >> On 8/26/20 11:22 AM, Jan Hubicka wrote:
>  On 8/25/20 8:46 PM, Jan Hubicka wrote:
> > What will happen here with protected visibility?
> 
>  I forgot about it. Should it be mapped also to "local"?
> 
>  +  const char *visibility = NULL;
>  +  if (!TREE_PUBLIC (origin_decl))
>  +visibility = "remove";
>  +  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_INTERNAL
>  +  || DECL_VISIBILITY (origin_decl) == VISIBILITY_PROTECTED)
>  +visibility = "local";
>  +  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_HIDDEN)
>  +visibility = "hidden";
> >>>
> >>> I have no idea (depends what gas will do), you need to check if the
> >>> resulting symbol will indeed have right visibility in all of the cases.
> >>> If some of them are not possible, I suppose we could just reject the
> >>> comination.
> >>
> >> Ok, so I experimented a bit with the .symver attribute I don't see how
> >> is the new syntax handy (I mean .symver foo, foo@VERS_2, hidden and
> >> .symver foo, foo@VERS_1, local)?
> >>
> >> For instance:
> >>
> >> $ cat vi2.c
> >> int
> >> __attribute__((visibility ("hidden")))
> >> hidden_object;
> >>
> >> extern int
> >> __attribute__ ((__symver__ ("foo@@VERS_2")))
> >> __attribute__ ((alias ("hidden_object")))
> >> symver_foo_v1;
> >>
> >> $ gcc vi2.c -S
> >> $ cat vi2.s | grep '\.symver'
> >>  .symver symver_foo_v1, foo@@VERS_2
> >>
> >> $ readelf -s vi2.o -W
> >> ...
> >>   8:  4 OBJECT  GLOBAL HIDDEN 3 hidden_object
> >>   9:  4 OBJECT  GLOBAL DEFAULT3 symver_foo_v1
> >>  10:  4 OBJECT  GLOBAL DEFAULT3 foo@@VERS_2
> >>
> >> Which seems fine to me. Similarly one can directly modify visibility of
> >> symver_foo_v1 with:
> >> .hidden  symver_foo_v1
> >>
> >> Or do I miss something?
> >
> > At low-level (and how GCC symtab should understand it), symver is just
> > a symbol with funny name.  Alias is just another symbol pointing to same
> > place and in general we want to handle versions as aliases with funny
> > names.
> >
> > But it is not how gas interpret its.
> > Writting .symver foo, foo@VERS_2 in my understnading does two things
> > 1) it makes gas to turn all apperances of references to "foo" to 
> > "foo@VERS_2"
> > this is necessary since gas syntax does not allow names with @ in it
> > so otherwise we have no way to reffer to it
> > 2) it either exports the foo symbol (and you can add visibility).
> > or makes foo@VERS_2 disappear depending how you declare visibilities
> > of foo.
> >
> > With this we lose way to refernece to actul symbol foo (and not
> > foo@VERS_2) which may be needed in LTO if one symbol declares foo and
> > other declares foo with symtab attribute.
> >
> > So I think we want symver attributes of symbol "foo" to produce a local
> > alias "foo.localalias" which will be renamed for GAS to "foo@VERS_2".
> > Symtab can understand this via syntactic aliases.  Then we have way to
> > refer to symbol "foo" if we want "foo" and "foo.localalias" if we want
> > "foo@VERS_2".  I had patch for that but I stopped on the situation that
> > there was no way to prevent gas from doing 2).
>
> Now I see how is this more complicated :) Can you please send the patch
> you have for this?
>
> >
> > So I see reason for .symbol X,Y, local
> > I do not see much morivation for others, that is why I stopped on
> > updating the patch for new gas syntax - wanted to take some time to
> > understand it (and the time did not materialized).
> >
> > So it seems to me that taking that old patch (or patch of yours) and add
> > the local alias machinery will do the trick, but I may be missing
> > something.
>
> Maybe H.J. can chime in what was motivation for implementation of the syntax?

What is the question?

-- 
H.J.


Re: PING [Patch][Middle-end]Add -fzero-call-used-regs=[skip|used-gpr|all-gpr|used|all]

2020-08-26 Thread Alexandre Oliva
On Aug 25, 2020, Jeff Law  wrote:

> On Tue, 2020-08-25 at 02:16 -0300, Alexandre Oliva wrote:
>> On Aug 24, 2020, Richard Biener  wrote:
>> 
>> > since the option is quite elaborate on what (sub-)set of regs is
>> > supposed to be cleared I'm not sure an implementation not involving
>> > any target hook is possible?
>> 
>> I don't think this follows.  Machine-independent code has a pretty good
>> notion of what registers are call-saved or call-clobbered, which ones
>> could be changed in this regard for function-specific calling
>> conventions, which ones may be used by a function to hold its return
>> value, which ones are used within a function...
>> 
>> It *should* be possible to introduce this in machine-independent code,
>> emitting insns to set registers to zero and regarding them as holding
>> values to be returned from the function.  Machine-specific code could
>> use more efficient insns to get the same result, but I can't see good
>> reason to not have a generic fallback implementation with at least a
>> best-effort attempt to offer the desired feature.
> I think part of the problem here is you have to worry about stubs which can
> change caller-saved registers.  Return path stubs aren't particularly common, 
> but
> they do exist -- 32 bit hpux for example :(

This suggests that such targets might have to implement the
target-specific hook to deal with this, but does it detract in any way
from the notion of having generic code to fall back to on targets that
do NOT require any special handling?

-- 
Alexandre Oliva, happy hacker
https://FSFLA.org/blogs/lxo/
Free Software Activist
GNU Toolchain Engineer


Re: [PATCH 2/2] IPA symver: support visibility and static symbols.

2020-08-26 Thread Martin Liška

On 8/26/20 1:27 PM, Jan Hubicka wrote:

On 8/26/20 11:22 AM, Jan Hubicka wrote:

On 8/25/20 8:46 PM, Jan Hubicka wrote:

What will happen here with protected visibility?


I forgot about it. Should it be mapped also to "local"?

+  const char *visibility = NULL;
+  if (!TREE_PUBLIC (origin_decl))
+visibility = "remove";
+  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_INTERNAL
+  || DECL_VISIBILITY (origin_decl) == VISIBILITY_PROTECTED)
+visibility = "local";
+  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_HIDDEN)
+visibility = "hidden";


I have no idea (depends what gas will do), you need to check if the
resulting symbol will indeed have right visibility in all of the cases.
If some of them are not possible, I suppose we could just reject the
comination.


Ok, so I experimented a bit with the .symver attribute I don't see how
is the new syntax handy (I mean .symver foo, foo@VERS_2, hidden and
.symver foo, foo@VERS_1, local)?

For instance:

$ cat vi2.c
int
__attribute__((visibility ("hidden")))
hidden_object;

extern int
__attribute__ ((__symver__ ("foo@@VERS_2")))
__attribute__ ((alias ("hidden_object")))
symver_foo_v1;

$ gcc vi2.c -S
$ cat vi2.s | grep '\.symver'
.symver symver_foo_v1, foo@@VERS_2

$ readelf -s vi2.o -W
...
  8:  4 OBJECT  GLOBAL HIDDEN 3 hidden_object
  9:  4 OBJECT  GLOBAL DEFAULT3 symver_foo_v1
 10:  4 OBJECT  GLOBAL DEFAULT3 foo@@VERS_2

Which seems fine to me. Similarly one can directly modify visibility of
symver_foo_v1 with:
.hidden symver_foo_v1

Or do I miss something?


At low-level (and how GCC symtab should understand it), symver is just
a symbol with funny name.  Alias is just another symbol pointing to same
place and in general we want to handle versions as aliases with funny
names.

But it is not how gas interpret its.
Writting .symver foo, foo@VERS_2 in my understnading does two things
1) it makes gas to turn all apperances of references to "foo" to "foo@VERS_2"
this is necessary since gas syntax does not allow names with @ in it
so otherwise we have no way to reffer to it
2) it either exports the foo symbol (and you can add visibility).
or makes foo@VERS_2 disappear depending how you declare visibilities
of foo.

With this we lose way to refernece to actul symbol foo (and not
foo@VERS_2) which may be needed in LTO if one symbol declares foo and
other declares foo with symtab attribute.

So I think we want symver attributes of symbol "foo" to produce a local
alias "foo.localalias" which will be renamed for GAS to "foo@VERS_2".
Symtab can understand this via syntactic aliases.  Then we have way to
refer to symbol "foo" if we want "foo" and "foo.localalias" if we want
"foo@VERS_2".  I had patch for that but I stopped on the situation that
there was no way to prevent gas from doing 2).


Now I see how is this more complicated :) Can you please send the patch
you have for this?



So I see reason for .symbol X,Y, local
I do not see much morivation for others, that is why I stopped on
updating the patch for new gas syntax - wanted to take some time to
understand it (and the time did not materialized).

So it seems to me that taking that old patch (or patch of yours) and add
the local alias machinery will do the trick, but I may be missing
something.


Maybe H.J. can chime in what was motivation for implementation of the syntax?

MArtin


Honza





Re: [PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-26 Thread Jakub Jelinek via Gcc-patches
On Wed, Aug 26, 2020 at 01:08:00PM +0200, Richard Biener via Gcc-patches wrote:
> You only need -fexceptions for that, then you can throw; from a signal handler
> for example.  If you want to be able to catch the exception somewhere up
> the call chain all intermediate code needs to be compiled so that unwinding
> from asynchronous events is possible - -fasynchronous-unwind-tables.
> 
> So -fasynchronous-unwind-tables is about unwinding.  -f[non-call]-exceptions
> is about throw/catch.  Clearly libgcc does neither throw nor catch but with
> async events we might need to unwind from inside it.

In C code -f{,non-call-}exceptions is also about whether cleanup attribute
will work or not.  But I think in libgcc we don't really use it, especially
not in the division/modulo code, not even indirectly from libc headers like
pthread_cleanup_* macros.

Jakub



Re: [PATCH 2/2] IPA symver: support visibility and static symbols.

2020-08-26 Thread Jan Hubicka
> On 8/26/20 11:22 AM, Jan Hubicka wrote:
> > > On 8/25/20 8:46 PM, Jan Hubicka wrote:
> > > > What will happen here with protected visibility?
> > > 
> > > I forgot about it. Should it be mapped also to "local"?
> > > 
> > > +  const char *visibility = NULL;
> > > +  if (!TREE_PUBLIC (origin_decl))
> > > +visibility = "remove";
> > > +  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_INTERNAL
> > > +  || DECL_VISIBILITY (origin_decl) == VISIBILITY_PROTECTED)
> > > +visibility = "local";
> > > +  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_HIDDEN)
> > > +visibility = "hidden";
> > 
> > I have no idea (depends what gas will do), you need to check if the
> > resulting symbol will indeed have right visibility in all of the cases.
> > If some of them are not possible, I suppose we could just reject the
> > comination.
> 
> Ok, so I experimented a bit with the .symver attribute I don't see how
> is the new syntax handy (I mean .symver foo, foo@VERS_2, hidden and
> .symver foo, foo@VERS_1, local)?
> 
> For instance:
> 
> $ cat vi2.c
> int
> __attribute__((visibility ("hidden")))
> hidden_object;
> 
> extern int
> __attribute__ ((__symver__ ("foo@@VERS_2")))
> __attribute__ ((alias ("hidden_object")))
> symver_foo_v1;
> 
> $ gcc vi2.c -S
> $ cat vi2.s | grep '\.symver'
>   .symver symver_foo_v1, foo@@VERS_2
> 
> $ readelf -s vi2.o -W
> ...
>  8:  4 OBJECT  GLOBAL HIDDEN 3 hidden_object
>  9:  4 OBJECT  GLOBAL DEFAULT3 symver_foo_v1
> 10:  4 OBJECT  GLOBAL DEFAULT3 foo@@VERS_2
> 
> Which seems fine to me. Similarly one can directly modify visibility of
> symver_foo_v1 with:
> .hidden   symver_foo_v1
> 
> Or do I miss something?

At low-level (and how GCC symtab should understand it), symver is just
a symbol with funny name.  Alias is just another symbol pointing to same
place and in general we want to handle versions as aliases with funny
names.

But it is not how gas interpret its.
Writting .symver foo, foo@VERS_2 in my understnading does two things
1) it makes gas to turn all apperances of references to "foo" to "foo@VERS_2"
   this is necessary since gas syntax does not allow names with @ in it
   so otherwise we have no way to reffer to it
2) it either exports the foo symbol (and you can add visibility).
   or makes foo@VERS_2 disappear depending how you declare visibilities
   of foo.

With this we lose way to refernece to actul symbol foo (and not
foo@VERS_2) which may be needed in LTO if one symbol declares foo and
other declares foo with symtab attribute.

So I think we want symver attributes of symbol "foo" to produce a local
alias "foo.localalias" which will be renamed for GAS to "foo@VERS_2".
Symtab can understand this via syntactic aliases.  Then we have way to
refer to symbol "foo" if we want "foo" and "foo.localalias" if we want
"foo@VERS_2".  I had patch for that but I stopped on the situation that
there was no way to prevent gas from doing 2).

So I see reason for .symbol X,Y, local
I do not see much morivation for others, that is why I stopped on
updating the patch for new gas syntax - wanted to take some time to
understand it (and the time did not materialized).

So it seems to me that taking that old patch (or patch of yours) and add
the local alias machinery will do the trick, but I may be missing
something.
Honza


RE: [PATCH] hppa: PR middle-end/87256: Improved hppa_rtx_costs avoids synth_mult madness.

2020-08-26 Thread Roger Sayle


Hi Dave,

> This change is also fine.
> 
> The gcc.target/hppa/shadd-2.c test now fails because there are two
additional sh1add instructions.
> However, the total number of instructions is the same as before.

Just to be on the safe side, I took a deeper look into this.
We're now generating a slightly different sequence for multiply by 87.

Before:
zdep %r28,28,29,%r20; r20 = r28*8
ldo RR'default_target_regs-$global$(%r1),%r19
sub %r20,%r28,%r20  ; r20 = r28*7
sh2addl %r20,%r28,%r20  ; r20 = r28*29
sh2addl %r20,%r19,%r19  ; r20 = addr+r28*116
sub %r19,%r20,%r19  ; r20 = addr+r28*87

After:
sh2addl %r28,%r28,%r19  ; r19 = r28*5
ldo RR'default_target_regs-$global$(%r1),%r1
sh2addl %r19,%r28,%r19  ; r19 = r28*21
sh1addl %r19,%r28,%r19  ; r19 = r28*43
sh1addl %r19,%r28,%r19  ; r19 = r28*87
addl %r1,%r19,%r1   ; r1 = addr+r28*87

As you point out, both sequences have exactly the same length and rtx_costs.
I suspect the middle-end has a minor preference for simple shifts and adds.
If shadd-2.c is there to check we make use of sh?add instructions, then
we're
making even more use of them now.

I agree it's reasonable to increase the scan-assembler-times count for this
test.

Cheers,
Roger
--




Re: [PATCH 2/2] IPA symver: support visibility and static symbols.

2020-08-26 Thread Martin Liška

On 8/26/20 11:22 AM, Jan Hubicka wrote:

On 8/25/20 8:46 PM, Jan Hubicka wrote:

What will happen here with protected visibility?


I forgot about it. Should it be mapped also to "local"?

+  const char *visibility = NULL;
+  if (!TREE_PUBLIC (origin_decl))
+visibility = "remove";
+  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_INTERNAL
+  || DECL_VISIBILITY (origin_decl) == VISIBILITY_PROTECTED)
+visibility = "local";
+  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_HIDDEN)
+visibility = "hidden";


I have no idea (depends what gas will do), you need to check if the
resulting symbol will indeed have right visibility in all of the cases.
If some of them are not possible, I suppose we could just reject the
comination.


Ok, so I experimented a bit with the .symver attribute I don't see how
is the new syntax handy (I mean .symver foo, foo@VERS_2, hidden and
.symver foo, foo@VERS_1, local)?

For instance:

$ cat vi2.c
int
__attribute__((visibility ("hidden")))
hidden_object;

extern int
__attribute__ ((__symver__ ("foo@@VERS_2")))
__attribute__ ((alias ("hidden_object")))
symver_foo_v1;

$ gcc vi2.c -S
$ cat vi2.s | grep '\.symver'
.symver symver_foo_v1, foo@@VERS_2

$ readelf -s vi2.o -W
...
 8:  4 OBJECT  GLOBAL HIDDEN 3 hidden_object
 9:  4 OBJECT  GLOBAL DEFAULT3 symver_foo_v1
10:  4 OBJECT  GLOBAL DEFAULT3 foo@@VERS_2

Which seems fine to me. Similarly one can directly modify visibility of
symver_foo_v1 with:
.hidden symver_foo_v1

Or do I miss something?

Martin



Honza


Thanks,
Martin




Re: [PATCH] lto: fix documentation about -fpie and -fpic options

2020-08-26 Thread Richard Biener via Gcc-patches
On Wed, Aug 26, 2020 at 10:13 AM Martin Liška  wrote:
>
> Hey.
>
> We should document how we currently merge pie and pie options
> as we fixed PR80838.
>
> Ready for master?

OK.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * doc/invoke.texi: Document how are pie and pic options merged.
> ---
>   gcc/doc/invoke.texi | 17 ++---
>   1 file changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4cf6b204b56..95d3699d1fa 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11569,9 +11569,20 @@ There are some code generation flags preserved by 
> GCC when
>   generating bytecodes, as they need to be used during the final link.
>   Currently, the following options and their settings are taken from
>   the first object file that explicitly specifies them:
> -@option{-fPIC}, @option{-fpic}, @option{-fpie}, @option{-fcommon},
> -@option{-fexceptions}, @option{-fnon-call-exceptions}, @option{-fgnu-tm}
> -and all the @option{-m} target flags.
> +@option{-fcommon}, @option{-fexceptions}, @option{-fnon-call-exceptions},
> +@option{-fgnu-tm} and all the @option{-m} target flags.
> +
> +The following options @option{-fPIC}, @option{-fpic}, @option{-fpie} and
> +@option{-fPIE} are combined based on the following scheme:
> +
> +@smallexample
> +@option{-fPIC} + @option{-fpic} = @option{-fpic}
> +@option{-fPIC} + @option{-fno-pic} = @option{-fno-pic}
> +@option{-fpic/-fPIC} + (no option) = (no option)
> +@option{-fPIC} + @option{-fPIE} = @option{-fPIE}
> +@option{-fpic} + @option{-fPIE} = @option{-fpie}
> +@option{-fPIC/-fpic} + @option{-fpie} = @option{-fpie}
> +@end smallexample
>
>   Certain ABI-changing flags are required to match in all compilation units,
>   and trying to override this at link time with a conflicting value
> --
> 2.28.0
>


Re: [PATCH v3] libgcc: Use `-fasynchronous-unwind-tables' for LIB2_DIVMOD_FUNCS

2020-08-26 Thread Richard Biener via Gcc-patches
On Tue, Aug 25, 2020 at 6:32 PM Maciej W. Rozycki  wrote:
>
> Hi Kito,
>
> > I just found the mail thread about div mod with -fnon-call-exceptions,
> > I think keeping the default LIB2_DIVMOD_EXCEPTION_FLAGS unchanged
> > should be the best way to go.
> >
> > Non-call exceptions and libcalls
> > https://gcc.gnu.org/legacy-ml/gcc/2001-06/msg01108.html
> >
> > Non-call exceptions and libcalls Part 2
> > https://gcc.gnu.org/legacy-ml/gcc/2001-07/msg00402.html
>
>  Thank you for your input.  I believe I had a look at these commits before
> I posted my original proposal.  Please note however that they both predate
> the addition of `-fasynchronous-unwind-tables', so clearly the option
> could not have been considered at the time the changes were accepted into
> GCC.
>
>  Please note that, as observed by Andreas and Richard here:
>  in no case we
> want to have full exception handling here, so we clearly need no
> `-fexceptions'; this libcall code won't itself ever call `throw'.
>
>  Now it might be a bit unclear from documentation as to whether we want
> `-fnon-call-exceptions' or `-fasynchronous-unwind-tables', as it says that
> the former option makes GCC:
>
> "Generate code that allows trapping instructions to throw
>  exceptions.  Note that this requires platform-specific runtime
>  support that does not exist everywhere.  Moreover, it only allows
>  _trapping_ instructions to throw exceptions, i.e. memory references
>  or floating-point instructions.  It does not allow exceptions to be
>  thrown from arbitrary signal handlers such as 'SIGALRM'."
>
> Note the observation that arbitrary signal handlers (invoked at more inner
> a frame level, and necessarily built with `-fexceptions') are still not
> allowed to throw exceptions.  For that, as far as I understand it, you
> actually need `-fasynchronous-unwind-tables', which makes GCC:
>
> "Generate unwind table in DWARF format, if supported by target
>  machine.  The table is exact at each instruction boundary, so it
>  can be used for stack unwinding from asynchronous events (such as
>  debugger or garbage collector)."
>
> and therefore allows arbitrary signal handlers to throw exceptions,
> effectively making the option a superset of `-fexceptions'.  As libcall
> code can generally be implicitly invoked everywhere, we want people not to
> be restrained by it and let a exception thrown by e.g. a user-supplied
> SIGALRM handler propagate through the relevant libcall's stack frame,
> rather than just those exceptions the libcall itself might indirectly
> cause.
>
>  Maybe I am missing something here, especially as `-fexceptions' mentions
> code generation, while `-fasynchronous-unwind-tables' only refers to
> unwind table generation, but then what would be the option to allow
> exceptions to be thrown from arbitrary signal handlers rather than those
> for memory references or floating-point instructions (where by a special
> provision integer division falls as well)?
>
>  My understanding has been it is `-fasynchronous-unwind-tables', but I'll
> be gladly straightened out otherwise.  If I am indeed right, then perhaps
> the documentation could be clarified and expanded a bit.
>
>  Barring evidence to the contrary I maintain the change I have proposed is
> correct, and not only removes the RISC-V `ld.so' build issue, but it fixes
> the handling of asynchronous events arriving in the middle of the relevant
> libcalls for all platforms as well.
>
>  Please let me know if you have any further questions, comments or
> concerns.

You only need -fexceptions for that, then you can throw; from a signal handler
for example.  If you want to be able to catch the exception somewhere up
the call chain all intermediate code needs to be compiled so that unwinding
from asynchronous events is possible - -fasynchronous-unwind-tables.

So -fasynchronous-unwind-tables is about unwinding.  -f[non-call]-exceptions
is about throw/catch.  Clearly libgcc does neither throw nor catch but with
async events we might need to unwind from inside it.

Now I don't know about the arm situation but if arm cannot do async unwinding
then even -fexceptions won't help it here - libgcc still does not throw.

Richard.

>
>   Maciej


Re: [PATCH] RX add control register PC

2020-08-26 Thread Darius Galis

Hello,

Thank you for adjusting the patch.

I don't have commit privileges, so if you could please commit it, that would be 
great.

Best regards,
Darius Galis

On 25-Aug-20 10:48 PM, Jeff Law wrote:

On Tue, 2020-08-18 at 16:05 +0300, Darius Galis wrote:

Hello,

The following patch is adding the PC control register.
It also updates the __builtin_rx_mvfc() function, since
according to the documentation, the PC register cannot be specified
as dest.

The regression was tested with the following command:

make -k check-gcc RUNTESTFLAGS=--target_board=rx-sim

There were no additionals failures.

Please let me know if this is OK, Thank you!
Darius Galis

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index b834a2c..3436c25 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2020-08-17  Darius Galis  
+
+   * config/rx/rx.md (CTRLREG_PC): Add.
+   * config/rx/rx.c (CTRLREG_PC): Add
+   (rx_expand_builtin_mvtc): Add warning: PC register cannot
+   be used as dest.
+
   2020-08-03  Jonathan Wakely  
   
   	* doc/cpp.texi (Variadic Macros): Use the exact ... token in

diff --git a/gcc/config/rx/rx.c b/gcc/config/rx/rx.c
index 151ad39..1cc88d9 100644
--- a/gcc/config/rx/rx.c
+++ b/gcc/config/rx/rx.c
@@ -639,6 +639,7 @@ rx_print_operand (FILE * file, rtx op, int letter)
 switch (INTVAL (op))
{
case CTRLREG_PSW:   fprintf (file, "psw"); break;
+   case CTRLREG_PC:fprintf (file, "pc"); break;
case CTRLREG_USP:   fprintf (file, "usp"); break;
case CTRLREG_FPSW:  fprintf (file, "fpsw"); break;
case CTRLREG_BPSW:  fprintf (file, "bpsw"); break;
@@ -2474,6 +2475,14 @@ rx_expand_builtin_mvtc (tree exp)
 if (! REG_P (arg2))
   arg2 = force_reg (SImode, arg2);
   
+  if (INTVAL(arg1) == 1/*PC*/)

We generally avoid comments on the same line as code.  And there should be a
space before the open paren of a function or macro argument.  So:

/* PC */
if (INTVAL (arg1) == 1)

With that change, this is OK to commit.  I can't recall if you have commit privs
or not...  If not, I can commit for you.


Thanks,
Jeff


--
Ing. Darius Galiș
Software Engineer - CyberTHOR Studios Ltd.



Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

2020-08-26 Thread Richard Sandiford
xiezhiheng  writes:
>> -Original Message-
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Tuesday, August 25, 2020 7:08 PM
>> To: xiezhiheng 
>> Cc: Richard Biener ; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng  writes:
>> >> -Original Message-
>> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> >> Sent: Friday, August 21, 2020 5:02 PM
>> >> To: xiezhiheng 
>> >> Cc: Richard Biener ;
>> gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> >> emitted at -O3
>> >
>> > Cut...
>> >
>> >> Looks like the saturating intrinsics might need a bit more thought.
>> >> Would you mind submitting the patch with just the other parts?
>> >> Those were uncontroversial and it would be a shame to hold them
>> >> up over this.
>> >
>> > Okay, I reorganized the existing patch and finished the first half of the
>> intrinsics
>> > except saturating intrinsics and load intrinsics.
>> >
>> > Bootstrapped and tested on aarch64 Linux platform.
>> 
>> I know this'll be frustrating, sorry, but could you post the
>> 2020-08-17 patch without the saturation changes?  It's going to be
>> easier to track and review if each patch deals with similar intrinsics.
>> The non-saturating part of the 2020-08-17 patch was good because it was
>> dealing purely with arithmetic operations.  Loads should really be a
>> separate change.
>> 
>> BTW, for something like this, it's OK to test and submit several patches
>> at once, so separating the patches doesn't need to mean longer test cycles.
>> It's just that for review purposes, it's easier if one patch does one thing.
>> 
>
> That's true.  And I finished the patch to add FLAG for add/sub arithmetic
> intrinsics except saturating intrinsics.  Later I will try to separate the 
> rest
> into several subsets to fix.
>
> Bootstrapped and tested on aarch64 Linux platform.

Thanks, looks great.  Pushed to master.

Richard


Re: [PATCH] fix testcase gcc.target/aarch64/insv_1.c

2020-08-26 Thread Richard Sandiford
Qian Jianhua  writes:
> There are three failures in gcc.target/aarch64/insv_1.c.
>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler bfi\tx[0-9]+, x[0-9]+, 0, 8
>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler bfi\tx[0-9]+, x[0-9]+, 16, 5
>  FAIL: gcc.target/aarch64/insv_1.c scan-assembler movk\tx[0-9]+, 0x1d6b, lsl 
> 32
>
> This patch fix the third failure which was missed "#" before immediate value
> in scan-assembler.

Thanks, pushed to master.

Richard

> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/insv_1.c: Add '#' in scan-assembler
>
> ---
>  gcc/testsuite/gcc.target/aarch64/insv_1.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c 
> b/gcc/testsuite/gcc.target/aarch64/insv_1.c
> index 360a9892ad9..9efa22e649d 100644
> --- a/gcc/testsuite/gcc.target/aarch64/insv_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c
> @@ -32,7 +32,7 @@ bfi2 (bitfield a)
>  bitfield
>  movk (bitfield a)
>  {
> -  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } } */
> +  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, #0x1d6b, lsl 32" } } */
>a.sixteen = 7531;
>return a;
>  }


[PATCH PR96757] aarch64: ICE during GIMPLE pass: vect

2020-08-26 Thread duanbo (C)
Hi, 

This is a fix for PR96757.
Before vect pass, the main statements of the test case will be optimized like:
_1 = d_10(D) > 3;
_2 = a_11(D) > m_12(D);
_18 = _1 > _2 ? _26 : 0; 
At the beginning of vectorization analysis, function 
vect_recog_mask_conversion_pattern 
processed the gimple  _18 = _1 > _2 ? _15 : 0, and produce:
patt_17 = _1 > _2
patt_3 = patt_17 ? _15 : 0
it didn't consider the situation that  _1, _2's vectype is different, which 
leading to an ICE in 
some special  mode , like V4HImode for this case.

This patch added the identification and handling for this situation in 
vect_recog_mask_conversion_pattern. 
With that _18 = _1 > _2 ? _15 : 0 will be transformed to:
patt_3 = () _2;
patt_17 = _1 > patt_3;
patt_20 = patt_17 ? _26 : 0;

More details in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96757.
Bootstrap and tested on both aarch64 and x86 Linux platform, no new regression 
witnessed.
Ok for trunk?

Thanks, 
Duan Bo



pr95767.patch
Description: pr95767.patch


Re: [PATCH 2/2] IPA symver: support visibility and static symbols.

2020-08-26 Thread Jan Hubicka
> On 8/25/20 8:46 PM, Jan Hubicka wrote:
> > What will happen here with protected visibility?
> 
> I forgot about it. Should it be mapped also to "local"?
> 
> +  const char *visibility = NULL;
> +  if (!TREE_PUBLIC (origin_decl))
> +visibility = "remove";
> +  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_INTERNAL
> +  || DECL_VISIBILITY (origin_decl) == VISIBILITY_PROTECTED)
> +visibility = "local";
> +  else if (DECL_VISIBILITY (origin_decl) == VISIBILITY_HIDDEN)
> +visibility = "hidden";

I have no idea (depends what gas will do), you need to check if the
resulting symbol will indeed have right visibility in all of the cases.
If some of them are not possible, I suppose we could just reject the
comination.

Honza
> 
> Thanks,
> Martin


Re: [PATCH] Implement no_stack_protect attribute.

2020-08-26 Thread Martin Liška

On 8/25/20 4:46 PM, Nick Desaulniers wrote:

This would solve a common pattern in the kernel where folks are using
`extern inline` with `gnu_inline` semantics or worse (empty `asm("");`
statements) in certain places where it would be much more preferable
to have this attribute.  Thank you very much Martin for writing it.


is direct equivalent of Clang's no_stack_protector.
Unlike Clang, I chose to name it no_stack_protect because we already
have stack_protect attribute (used with -fstack-protector-explicit).


That's pretty easy for us to work around the differences in the
kernel, but one final plea for the users; it would simplify users'
codebases not to have to shim this for differences between compilers.
If I had a dollar for every time I had to implement something in LLVM
where a different identifier or flag would be more consistent with
another part of the codebase...I'm sure there's many examples of this
on LLVM's side too, but I would prefer to stop the proliferation of
subtle differences like this that harm toolchain portability when
possible and when we can proactively address.



All right, I'm open to unify the flag name to what LLVM has ;)

Martin


Re: [PATCH] lra: Canonicalize mult to shift in address reloads

2020-08-26 Thread Richard Sandiford
Alex Coplan  writes:
> Hello,
>
> Inside a (mem) RTX, it is canonical to write multiplications by powers
> of two using a (mult) [0]. For example, given the following C function:
>
> long f(long *p, long x)
> {
> return p[x];
> }
>
> AArch64 GCC generates the following RTL insn (in final):
>
> (set (reg/i:DI 0 x0)
>  (mem:DI (plus:DI (mult:DI (reg:DI 1 x1 [101])
>  (const_int 8 [0x8]))
>  (reg:DI 0 x0 [100])) [1 *_3+0 S8 A64]))
>
> However, outside of a (mem), the canonical way to write multiplications
> by powers of two is using (ashift). E.g. for the following C function:
>
> long *g(long *p, long x)
> {
> return p + x;
> }
>
> AArch64 GCC generates:
>
> (set (reg/i:DI 0 x0)
>  (plus:DI (ashift:DI (reg:DI 1 x1 [100])
>  (const_int 3 [0x3]))
>  (reg:DI 0 x0 [99])))
>
> Now I observed that LRA does not quite respect this RTL canonicalization
> rule. When compiling gcc/testsuite/gcc.dg/torture/pr34330.c with -Os
> -ftree-vectorize, the RTL in the dump "281r.ira" has the insn:
>
> (set (reg:SI 111 [ MEM[base: b_25(D), index: ivtmp.9_35, step: 4, offset: 0B] 
> ])
>  (mem:SI (plus:DI (mult:DI (reg:DI 101 [ ivtmp.9 ])
>  (const_int 4 [0x4]))
>  (reg/v/f:DI 105 [ b ])) [2 MEM[base: b_25(D), index: ivtmp.9_35, 
> step: 4, offset: 0B]+0 S4 A32]))
>
> but LRA then proceeds to generate a reload, and we get the following
> non-canonical insn in "282r.reload":
>
> (set (reg:DI 7 x7 [121])
>  (plus:DI (mult:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
>  (const_int 4 [0x4]))
>  (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
>
> This patch fixes LRA to ensure that we generate canonical RTL in this
> case. After the patch, we get the following insn in "282r.reload":
>
> (set (reg:DI 7 x7 [121])
> (plus:DI (ashift:DI (reg:DI 5 x5 [orig:101 ivtmp.9 ] [101])
> (const_int 2 [0x2]))
> (reg/v/f:DI 1 x1 [orig:105 b ] [105])))
>
> The motivation here is to be able to remove several redundant patterns
> in the AArch64 backend. See the previous thread [1] for context.
>
> Testing:
>  * Bootstrapped and regtested on aarch64-none-linux-gnu,
>x86_64-pc-linux-gnu.
>  * New unit test which checks that we're using the shift pattern (rather
>than the problematic mult pattern) on AArch64.
>
> OK for master?
>
> Thanks,
> Alex
>
> [0] : https://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
> [1] : https://gcc.gnu.org/pipermail/gcc-patches/2020-August/552066.html

Thanks for doing this.

>
> ---
>
> gcc/ChangeLog:
>
>   * lra-constraints.c (canonicalize_reload_addr): New.
>   (curr_insn_transform): Use canonicalize_reload_addr to ensure we
>   generate canonical RTL for an address reload.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/mem-shift-canonical.c: New test.
>
> diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
> index 421c453997b..630a4f167cd 100644
> --- a/gcc/lra-constraints.c
> +++ b/gcc/lra-constraints.c
> @@ -570,6 +570,34 @@ init_curr_insn_input_reloads (void)
>curr_insn_input_reloads_num = 0;
>  }
>  
> +/* The canonical form of an rtx inside a MEM is not necessarily the same as 
> the
> +   canonical form of the rtx outside the MEM.  Fix this up in the case that
> +   we're reloading an address (and therefore pulling it outside a MEM).  */
> +static rtx canonicalize_reload_addr (rtx addr)

Minor nit, should be formatted as:

static rtx
canonicalize_reload_addr (rtx addr)

> +{
> +  const enum rtx_code code = GET_CODE (addr);
> +
> +  if (code == PLUS)
> +{
> +  rtx inner = XEXP (addr, 0);
> +  if (GET_CODE (inner) == MULT && CONST_INT_P (XEXP (inner, 1)))
> + {
> +   const HOST_WIDE_INT ci = INTVAL (XEXP (inner, 1));
> +   const int pwr2 = exact_log2 (ci);
> +   if (pwr2 > 0)
> + {
> +   /* Rewrite this to use a shift instead, which is canonical
> +  when outside of a MEM.  */
> +   XEXP (addr, 0) = gen_rtx_ASHIFT (GET_MODE (inner),
> +XEXP (inner, 0),
> +GEN_INT (pwr2));
> + }
> + }
> +}

I don't think we should we restrict this to (plus (mult X Y) Z),
since addresses can be more complicated than that.  One way to
search for all MULTs is:

  subrtx_iterator::array_type array;
  FOR_EACH_SUBRTX (iter, array, x, NONCONST)
{
  rtx x = *iter;
  if (GET_CODE (x) == MULT && CONST_INT_P (XEXP (x, 1)))
...
}

(Needs rtl-iter.h)

Thanks,
Richard


Re: [PATCH] dwarf2out: Fix up dwarf2out_next_real_insn caching [PR96729]

2020-08-26 Thread Richard Biener
On Wed, 26 Aug 2020, Jakub Jelinek wrote:

> Hi!
> 
> The addition of NOTE_INSN_BEGIN_STMT and NOTE_INSN_INLINE_ENTRY notes
> reintroduced quadratic behavior into dwarf2out_var_location.
> This function needs to know the next real instruction to which the var
> location note applies, but the way final_scan_insn is called outside of
> final.c main loop doesn't make it easy to look up the next real insn in
> there (and for non-dwarf it is even useless).  Usually next real insn is
> only a few notes away, but we can have hundreds of thousands of consecutive
> notes only followed by a real insn.  dwarf2out_var_location to avoid the
> quadratic behavior contains a cache, it remembers the next note and when it
> is called again on that loc_note, it can use the previously computed
> dwarf2out_next_real_insn result, rather than walking the insn chain once
> again.  But, for NOTE_INSN_{BEGIN_STMT,INLINE_ENTRY} dwarf2out_var_location
> is not called while the code puts into the cache those notes, which means if
> we have e.g. in the worst case NOTE_INSN_VAR_LOCATION and
> NOTE_INSN_BEGIN_STMT notes alternating, the cache is not really used.
> 
> The following patch fixes it by looking up the next NOTE_INSN_VAR_LOCATION
> if any.  While the lookup could be perhaps done together with looking for
> the next real insn once (e.g. in dwarf2out_next_real_insn or its copy),
> there are other dwarf2out_next_real_insn callers which don't need/want that
> behavior and if there are more than two NOTE_INSN_VAR_LOCATION notes
> followed by the same real insn, we need to do that "find next
> NOTE_INSN_VAR_LOCATION" walk anyway.
> 
> On the testcase from the PR this patch speeds it 2.8times, from 0m0.674s
> to 0m0.236s (why it takes for the reporter more than 60s is unknown).
> 
> Bootstrapped/regtested on i686-linux, ok for trunk?

LGTM.

Richard.

> 2020-08-26  Jakub Jelinek  
> 
>   PR debug/96729
>   * dwarf2out.c (dwarf2out_next_real_insn): Adjust function comment.
>   (dwarf2out_var_location): Look for next_note only if next_real is
>   non-NULL, in that case look for the first non-deleted
>   NOTE_INSN_VAR_LOCATION between loc_note and next_real, if any.
> 
> --- gcc/dwarf2out.c.jj2020-08-25 21:27:57.241094806 +0200
> +++ gcc/dwarf2out.c   2020-08-25 21:51:43.475775792 +0200
> @@ -27204,7 +27204,7 @@ static bool maybe_at_text_label_p = true
>  /* One above highest N where .LVLN label might be equal to .Ltext0 label.  */
>  static unsigned int first_loclabel_num_not_at_text_label;
>  
> -/* Look ahead for a real insn, or for a begin stmt marker.  */
> +/* Look ahead for a real insn.  */
>  
>  static rtx_insn *
>  dwarf2out_next_real_insn (rtx_insn *loc_note)
> @@ -27229,7 +27229,7 @@ dwarf2out_var_location (rtx_insn *loc_no
>  {
>char loclabel[MAX_ARTIFICIAL_LABEL_BYTES + 2];
>struct var_loc_node *newloc;
> -  rtx_insn *next_real, *next_note;
> +  rtx_insn *next_real;
>rtx_insn *call_insn = NULL;
>static const char *last_label;
>static const char *last_postcall_label;
> @@ -27254,7 +27254,6 @@ dwarf2out_var_location (rtx_insn *loc_no
> var_loc_p = false;
>  
> next_real = dwarf2out_next_real_insn (call_insn);
> -   next_note = NULL;
> cached_next_real_insn = NULL;
> goto create_label;
>   }
> @@ -27282,7 +27281,6 @@ dwarf2out_var_location (rtx_insn *loc_no
> var_loc_p = false;
>  
> next_real = dwarf2out_next_real_insn (call_insn);
> -   next_note = NULL;
> cached_next_real_insn = NULL;
> goto create_label;
>   }
> @@ -27311,22 +27309,28 @@ dwarf2out_var_location (rtx_insn *loc_no
>   next_real = NULL;
>  }
>  
> -  next_note = NEXT_INSN (loc_note);
> -  if (! next_note
> -  || next_note->deleted ()
> -  || ! NOTE_P (next_note)
> -  || (NOTE_KIND (next_note) != NOTE_INSN_VAR_LOCATION
> -   && NOTE_KIND (next_note) != NOTE_INSN_BEGIN_STMT
> -   && NOTE_KIND (next_note) != NOTE_INSN_INLINE_ENTRY))
> -next_note = NULL;
> -
>if (! next_real)
>  next_real = dwarf2out_next_real_insn (loc_note);
>  
> -  if (next_note)
> +  if (next_real)
>  {
> -  expected_next_loc_note = next_note;
> -  cached_next_real_insn = next_real;
> +  rtx_insn *next_note = NEXT_INSN (loc_note);
> +  while (next_note != next_real)
> + {
> +   if (! next_note->deleted ()
> +   && NOTE_P (next_note)
> +   && NOTE_KIND (next_note) == NOTE_INSN_VAR_LOCATION)
> + break;
> +   next_note = NEXT_INSN (next_note);
> + }
> +
> +  if (next_note == next_real)
> + cached_next_real_insn = NULL;
> +  else
> + {
> +   expected_next_loc_note = next_note;
> +   cached_next_real_insn = next_real;
> + }
>  }
>else
>  cached_next_real_insn = NULL;
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 

[PATCH] dwarf2out: Fix up dwarf2out_next_real_insn caching [PR96729]

2020-08-26 Thread Jakub Jelinek via Gcc-patches
Hi!

The addition of NOTE_INSN_BEGIN_STMT and NOTE_INSN_INLINE_ENTRY notes
reintroduced quadratic behavior into dwarf2out_var_location.
This function needs to know the next real instruction to which the var
location note applies, but the way final_scan_insn is called outside of
final.c main loop doesn't make it easy to look up the next real insn in
there (and for non-dwarf it is even useless).  Usually next real insn is
only a few notes away, but we can have hundreds of thousands of consecutive
notes only followed by a real insn.  dwarf2out_var_location to avoid the
quadratic behavior contains a cache, it remembers the next note and when it
is called again on that loc_note, it can use the previously computed
dwarf2out_next_real_insn result, rather than walking the insn chain once
again.  But, for NOTE_INSN_{BEGIN_STMT,INLINE_ENTRY} dwarf2out_var_location
is not called while the code puts into the cache those notes, which means if
we have e.g. in the worst case NOTE_INSN_VAR_LOCATION and
NOTE_INSN_BEGIN_STMT notes alternating, the cache is not really used.

The following patch fixes it by looking up the next NOTE_INSN_VAR_LOCATION
if any.  While the lookup could be perhaps done together with looking for
the next real insn once (e.g. in dwarf2out_next_real_insn or its copy),
there are other dwarf2out_next_real_insn callers which don't need/want that
behavior and if there are more than two NOTE_INSN_VAR_LOCATION notes
followed by the same real insn, we need to do that "find next
NOTE_INSN_VAR_LOCATION" walk anyway.

On the testcase from the PR this patch speeds it 2.8times, from 0m0.674s
to 0m0.236s (why it takes for the reporter more than 60s is unknown).

Bootstrapped/regtested on i686-linux, ok for trunk?

2020-08-26  Jakub Jelinek  

PR debug/96729
* dwarf2out.c (dwarf2out_next_real_insn): Adjust function comment.
(dwarf2out_var_location): Look for next_note only if next_real is
non-NULL, in that case look for the first non-deleted
NOTE_INSN_VAR_LOCATION between loc_note and next_real, if any.

--- gcc/dwarf2out.c.jj  2020-08-25 21:27:57.241094806 +0200
+++ gcc/dwarf2out.c 2020-08-25 21:51:43.475775792 +0200
@@ -27204,7 +27204,7 @@ static bool maybe_at_text_label_p = true
 /* One above highest N where .LVLN label might be equal to .Ltext0 label.  */
 static unsigned int first_loclabel_num_not_at_text_label;
 
-/* Look ahead for a real insn, or for a begin stmt marker.  */
+/* Look ahead for a real insn.  */
 
 static rtx_insn *
 dwarf2out_next_real_insn (rtx_insn *loc_note)
@@ -27229,7 +27229,7 @@ dwarf2out_var_location (rtx_insn *loc_no
 {
   char loclabel[MAX_ARTIFICIAL_LABEL_BYTES + 2];
   struct var_loc_node *newloc;
-  rtx_insn *next_real, *next_note;
+  rtx_insn *next_real;
   rtx_insn *call_insn = NULL;
   static const char *last_label;
   static const char *last_postcall_label;
@@ -27254,7 +27254,6 @@ dwarf2out_var_location (rtx_insn *loc_no
  var_loc_p = false;
 
  next_real = dwarf2out_next_real_insn (call_insn);
- next_note = NULL;
  cached_next_real_insn = NULL;
  goto create_label;
}
@@ -27282,7 +27281,6 @@ dwarf2out_var_location (rtx_insn *loc_no
  var_loc_p = false;
 
  next_real = dwarf2out_next_real_insn (call_insn);
- next_note = NULL;
  cached_next_real_insn = NULL;
  goto create_label;
}
@@ -27311,22 +27309,28 @@ dwarf2out_var_location (rtx_insn *loc_no
next_real = NULL;
 }
 
-  next_note = NEXT_INSN (loc_note);
-  if (! next_note
-  || next_note->deleted ()
-  || ! NOTE_P (next_note)
-  || (NOTE_KIND (next_note) != NOTE_INSN_VAR_LOCATION
- && NOTE_KIND (next_note) != NOTE_INSN_BEGIN_STMT
- && NOTE_KIND (next_note) != NOTE_INSN_INLINE_ENTRY))
-next_note = NULL;
-
   if (! next_real)
 next_real = dwarf2out_next_real_insn (loc_note);
 
-  if (next_note)
+  if (next_real)
 {
-  expected_next_loc_note = next_note;
-  cached_next_real_insn = next_real;
+  rtx_insn *next_note = NEXT_INSN (loc_note);
+  while (next_note != next_real)
+   {
+ if (! next_note->deleted ()
+ && NOTE_P (next_note)
+ && NOTE_KIND (next_note) == NOTE_INSN_VAR_LOCATION)
+   break;
+ next_note = NEXT_INSN (next_note);
+   }
+
+  if (next_note == next_real)
+   cached_next_real_insn = NULL;
+  else
+   {
+ expected_next_loc_note = next_note;
+ cached_next_real_insn = next_real;
+   }
 }
   else
 cached_next_real_insn = NULL;

Jakub



[committed] d: Fix no RVO when returning struct literals initialized with constructor.

2020-08-26 Thread Iain Buclaw via Gcc-patches
Hi,

This patch backports a change from upstream dmd that moves front-end
NRVO checking from ReturnStatement semantic to the end of
FuncDeclaration semantic.

In the codegen, retStyle has been partially implemented so that only
structs and static arrays return RETstack.  This isn't accurate, but
don't need to be for the purposes of semantic analysis.

If a function either has TREE_ADDRESSABLE or must return in memory, then
DECL_RESULT is set as the shidden field for the function.  This is used
in the codegen pass for ReturnStatement where it is now detected whether
a function is returning a struct literal or a constructor function, then
the DECL_RESULT is used to directly construct the return value, instead
of doing so via temporaries.

Regstrapped on x86_64-linux-gnu/-m32/-mx32, committed to mainline.

Regards
Iain

---
gcc/d/ChangeLog:

PR d/96156
* d-frontend.cc (retStyle): Only return RETstack for struct and static
array types.
* decl.cc (DeclVisitor::visit (FuncDeclaration *)): Use NRVO return
for all TREE_ADDRESSABLE types.  Set shidden to the RESULT_DECL.
* expr.cc (ExprVisitor::visit (CallExp *)): Force TARGET_EXPR if the
'this' pointer reference is a CONSTRUCTOR.
(ExprVisitor::visit (StructLiteralExp *)): Generate assignment to the
symbol to initialize with literal.
* toir.cc (IRVisitor::visit (ReturnStatement *)): Detect returning
struct literals and write directly into the RESULT_DECL.
* dmd/MERGE: Merge upstream dmd fe5f388d8.

gcc/testsuite/ChangeLog:

PR d/96156
* gdc.dg/pr96156.d: New test.
---
 gcc/d/d-frontend.cc | 12 +-
 gcc/d/decl.cc   | 25 ---
 gcc/d/dmd/MERGE |  2 +-
 gcc/d/dmd/declaration.h |  1 +
 gcc/d/dmd/func.c| 52 +--
 gcc/d/dmd/statementsem.c| 33 ---
 gcc/d/expr.cc   | 14 ++-
 gcc/d/toir.cc   | 56 ++---
 gcc/testsuite/gdc.dg/pr96156.d  | 33 +++
 gcc/testsuite/gdc.test/runnable/sdtor.d |  5 ++-
 gcc/testsuite/gdc.test/runnable/test8.d |  8 ++--
 11 files changed, 171 insertions(+), 70 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/pr96156.d

diff --git a/gcc/d/d-frontend.cc b/gcc/d/d-frontend.cc
index 3b9fc1aaf2a..da34e902275 100644
--- a/gcc/d/d-frontend.cc
+++ b/gcc/d/d-frontend.cc
@@ -143,12 +143,20 @@ Loc::equals (const Loc &loc)
hidden pointer to the caller's stack.  */
 
 RET
-retStyle (TypeFunction *)
+retStyle (TypeFunction *tf)
 {
   /* Need the backend type to determine this, but this is called from the
  frontend before semantic processing is finished.  An accurate value
  is not currently needed anyway.  */
-  return RETstack;
+  if (tf->isref)
+return RETregs;
+
+  Type *tn = tf->next->toBasetype ();
+
+  if (tn->ty == Tstruct || tn->ty == Tsarray)
+return RETstack;
+
+  return RETregs;
 }
 
 /* Determine if function FD is a builtin one that we can evaluate in CTFE.  */
diff --git a/gcc/d/decl.cc b/gcc/d/decl.cc
index 295f780170a..008a2bb16ab 100644
--- a/gcc/d/decl.cc
+++ b/gcc/d/decl.cc
@@ -944,20 +944,11 @@ public:
Implemented by overriding all the RETURN_EXPRs and replacing all
occurrences of VAR with the RESULT_DECL for the function.
This is only worth doing for functions that can return in memory.  */
-if (d->nrvo_can)
-  {
-   tree restype = TREE_TYPE (DECL_RESULT (fndecl));
-
-   if (!AGGREGATE_TYPE_P (restype))
- d->nrvo_can = 0;
-   else
- d->nrvo_can = aggregate_value_p (restype, fndecl);
-  }
+tree resdecl = DECL_RESULT (fndecl);
 
-if (d->nrvo_can)
+if (TREE_ADDRESSABLE (TREE_TYPE (resdecl))
+   || aggregate_value_p (TREE_TYPE (resdecl), fndecl))
   {
-   tree resdecl = DECL_RESULT (fndecl);
-
/* Return non-trivial structs by invisible reference.  */
if (TREE_ADDRESSABLE (TREE_TYPE (resdecl)))
  {
@@ -965,9 +956,12 @@ public:
DECL_BY_REFERENCE (resdecl) = 1;
TREE_ADDRESSABLE (resdecl) = 0;
relayout_decl (resdecl);
+   d->shidden = build_deref (resdecl);
  }
+   else
+ d->shidden = resdecl;
 
-   if (d->nrvo_var)
+   if (d->nrvo_can && d->nrvo_var)
  {
tree var = get_symbol_decl (d->nrvo_var);
 
@@ -976,12 +970,9 @@ public:
/* Don't forget that we take its address.  */
TREE_ADDRESSABLE (var) = 1;
 
-   if (DECL_BY_REFERENCE (resdecl))
- resdecl = build_deref (resdecl);
-
SET_DECL_VALUE_EXPR (var, resdecl);
DECL_HAS_VALUE_EXPR_P (var) = 1;
-   SET_DECL_LANG_NRVO (var, resdecl);
+   SET_DECL_LANG_NRVO (var, d->shidden);
  }
   }
 
diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/

[committed] d: Merge upstream dmd cb4a96fae

2020-08-26 Thread Iain Buclaw via Gcc-patches
Hi,

This patch merges the D front-end implementation with upstream dmd
cb4a96fae.  Fixes both a bug where compilation would hang, and an issue
where recursive template limits are hit too early.

Regstrapped on x86_64-linux-gnu/-m32/-mx32, committed to mainline, and
cherry-picked to the releases/gcc-10 branch.

Regards
Iain

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd cb4a96fae
---
 gcc/d/dmd/MERGE  |  2 +-
 gcc/d/dmd/dmacro.c   |  7 +++
 gcc/d/dmd/dtemplate.c| 19 +--
 gcc/d/dmd/expressionsem.c|  2 +-
 gcc/d/dmd/globals.h  |  2 ++
 gcc/d/dmd/mtype.c|  4 ++--
 gcc/d/dmd/optimize.c | 11 ++-
 gcc/testsuite/gdc.test/compilable/ice20092.d | 10 ++
 8 files changed, 42 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gdc.test/compilable/ice20092.d

diff --git a/gcc/d/dmd/MERGE b/gcc/d/dmd/MERGE
index 8445bfaf343..2dc2ef3d622 100644
--- a/gcc/d/dmd/MERGE
+++ b/gcc/d/dmd/MERGE
@@ -1,4 +1,4 @@
-2cc25c2191928f865e1b711f30b6a4268d6a0d9a
+cb4a96faecb6b521ac4749581bcfa39f61143db0
 
 The first line of this file holds the git revision number of the last
 merge done from the dlang/dmd repository.
diff --git a/gcc/d/dmd/dmacro.c b/gcc/d/dmd/dmacro.c
index 07187612616..cefd3b037aa 100644
--- a/gcc/d/dmd/dmacro.c
+++ b/gcc/d/dmd/dmacro.c
@@ -226,11 +226,10 @@ void Macro::expand(OutBuffer *buf, size_t start, size_t 
*pend,
 {
 // limit recursive expansion
 static int nest;
-static const int nestLimit = 1000;
-if (nest > nestLimit)
+if (nest > global.recursionLimit)
 {
-error(Loc(), "DDoc macro expansion limit exceeded; more than %d "
-"expansions.", nestLimit);
+error(Loc(), "DDoc macro expansion limit exceeded; more than %d 
expansions.",
+  global.recursionLimit);
 return;
 }
 nest++;
diff --git a/gcc/d/dmd/dtemplate.c b/gcc/d/dmd/dtemplate.c
index a35721cc9cd..a86daeee633 100644
--- a/gcc/d/dmd/dtemplate.c
+++ b/gcc/d/dmd/dtemplate.c
@@ -4337,6 +4337,13 @@ MATCH deduceType(RootObject *o, Scope *sc, Type *tparam, 
TemplateParameters *par
 
 void visit(ArrayLiteralExp *e)
 {
+// https://issues.dlang.org/show_bug.cgi?id=20092
+if (e->elements && e->elements->length &&
+e->type->toBasetype()->nextOf()->ty == Tvoid)
+{
+result = deduceEmptyArrayElement();
+return;
+}
 if ((!e->elements || !e->elements->length) &&
 e->type->toBasetype()->nextOf()->ty == Tvoid &&
 tparam->ty == Tarray)
@@ -5932,10 +5939,10 @@ void TemplateInstance::tryExpandMembers(Scope *sc2)
 static int nest;
 // extracted to a function to allow windows SEH to work without 
destructors in the same function
 //printf("%d\n", nest);
-if (++nest > 500)
+if (++nest > global.recursionLimit)
 {
 global.gag = 0; // ensure error message gets printed
-error("recursive expansion");
+error("recursive expansion exceeded allowed nesting limit");
 fatal();
 }
 
@@ -5949,10 +5956,10 @@ void TemplateInstance::trySemantic3(Scope *sc2)
 // extracted to a function to allow windows SEH to work without 
destructors in the same function
 static int nest;
 //printf("%d\n", nest);
-if (++nest > 300)
+if (++nest > global.recursionLimit)
 {
 global.gag = 0;// ensure error message gets printed
-error("recursive expansion");
+error("recursive expansion exceeded allowed nesting limit");
 fatal();
 }
 semantic3(sc2);
@@ -6353,7 +6360,7 @@ Lerror:
 while (ti && !ti->deferred && ti->tinst)
 {
 ti = ti->tinst;
-if (++nest > 500)
+if (++nest > global.recursionLimit)
 {
 global.gag = 0;// ensure error message gets printed
 error("recursive expansion");
@@ -8440,7 +8447,7 @@ void TemplateMixin::semantic(Scope *sc)
 
 static int nest;
 //printf("%d\n", nest);
-if (++nest > 500)
+if (++nest > global.recursionLimit)
 {
 global.gag = 0; // ensure error message gets printed
 error("recursive expansion");
diff --git a/gcc/d/dmd/expressionsem.c b/gcc/d/dmd/expressionsem.c
index 9f21dabb7e4..d2519969a2c 100644
--- a/gcc/d/dmd/expressionsem.c
+++ b/gcc/d/dmd/expressionsem.c
@@ -2896,7 +2896,7 @@ public:
 else
 {
 static int nest;
-if (++nest > 500)
+if (++nest > global.recursionLimit)
 {
 exp->error("recursive evaluation of %s", exp->toChars());
 --nest;
diff --git a/gcc/d/dmd/globals.h b/gcc/d/dmd/globals

  1   2   >