Re: [PATCH] PR c/104506: Tolerate error_mark_node in useless_type_conversion_p.

2022-02-14 Thread Andrew Pinski via Gcc-patches
On Mon, Feb 14, 2022 at 11:33 PM Richard Biener
 wrote:
>
> On Tue, Feb 15, 2022 at 12:58 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > On Mon, Feb 14, 2022 at 4:54 AM Roger Sayle  
> > wrote:
> > >
> > >
> > >
> > > This simple fix to the middle-end, resolves PR c/104506, by adding an
> > >
> > > explicit check for error_mark_node to useless_type_conversion_p.  I first
> > >
> > > trying fixing this in the C front-end, but the type is valid at the point
> > >
> > > that the NOP_EXPR is created, so the poisoned type leaks to the 
> > > middle-end.
> > >
> > > Returning either true or false from useless_type_conversion_p avoids the
> > >
> > > ICE-after-error.  Apologies to Andrew Pinski, I hadn't noticed that he'd
> > >
> > > assigned this PR to himself until after my regression testing had 
> > > finished.
> > >
> > >
> > >
> > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> > >
> > > make -k check with no new failures.  Ok for mainline?
> > >
> > >
> > >
> > >
> > >
> > > 2022-02-14  Roger Sayle  
> > >
> > >
> > >
> > > gcc/ChangeLog
> > >
> > > PR c/104506
> > >
> > > * gimple-expr.cc (useless_type_conversion_p): Add a check for
> > >
> > > error_mark_node.
> >
> > I came up with a different patch (attached) which just changes
> > tree_ssa_useless_type_conversion rather than useless_type_conversion_p
> > which I was going to submit but had an issue with my build machine.
> > I did it this way as it was similar to how
> > STRIP_NOPS/tree_nop_conversion was done already.
> >
> > Also from my description of the patch
> > STRIP_USELESS_TYPE_CONVERSION is mostly used inside the gimplifier
> > and the places where it is used outside of the gimplifier would not
> > be adding too much overhead.
> >
> > Though I think Richard Biener's patch is better really. It would be
> > interesting to see how the C++ front-end handles this case, I remember
> > it using integer_type_node in some locations after an error for a
> > type.
>
> If the fix to the C frontend doesn't work out I'd indeed prefer your variant.
> Nit:
>
> +  outer_type = TREE_TYPE (expr);
> +  inner_type = TREE_TYPE (TREE_OPERAND (expr, 0));
> +
> +  if (!inner_type || inner_type == error_mark_node)
> +return false;
>
> unless we get to a case where inner_type == NULL I would not bother
> checking that.

Understood, I will remove it, I was just copying exactly what was done
in tree_nop_conversion really. The history on the null check seems to
date to 2000 without any explanation really.

Thanks,
Andrew Pinski

>
> As said, that TREE_TYPE (error_mark_node) is not a type is IMHO bad
> for error recovery.  Maybe we really need ERROR_TYPE here.
>
> > Thanks,
> > Andrew Pinski
> >
> > >
> > >
> > >
> > > gcc/testsuite/ChangeLog
> > >
> > > PR c/104506
> > >
> > > * gcc.dg/pr104506.c: New test case.
> > >
> > >
> > >
> > >
> > >
> > > Roger
> > >
> > > --
> > >
> > >


Re: [committed][nvptx] Handle pre-sm_7x shared atomic store using atomic exchange

2022-02-14 Thread Thomas Schwinge
Hi Tom!

For my understanding:

On 2022-02-10T10:13:10+0100, Tom de Vries via Gcc-patches 
 wrote:
> The ptx isa specifies (for pre-sm_7x) that atomic operations on shared memory
> locations do not guarantee atomicity with respect to normal store instructions
> to the same address.
>
> This can be fixed by:
> - inserting barriers between normal stores and atomic operations to a common
>   address
> - using atom.exch to store to locations accessed by other atomic operations.
>
> It's not clearly spelled out which barriers are needed, and a barrier seem 
> more
> expensive than atomic exchange.
>
> Implement the pre-sm_7x shared atomic store using atomic exchange.
>
> That includes stores using generic addressing, since those may also point to
> shared memory.

It is expected that this changes, for example (similar elsewhere)
'nvptx-none/libatomic/store_4_.o', to use (a) 'atom.exch' (with a new
dummy register allocated) and yet (b) 'membar.sys' remains before as well
as after (a):

 .visible .func __atomic_store_4 (.param .u64 %in_ar0, .param .u32 %in_ar1, 
.param .u32 %in_ar2)
 {
 .reg .u64 %ar0;
 ld.param.u64 %ar0,[%in_ar0];
 .reg .u32 %ar1;
 ld.param.u32 %ar1,[%in_ar1];
 .reg .u32 %ar2;
 ld.param.u32 %ar2,[%in_ar2];
 .reg .u64 %r22;
 .reg .u32 %r23;
+.reg .u32 %r25;
 mov.u64 %r22,%ar0;
 mov.u32 %r23,%ar1;
 .loc 2 39 5
 membar.sys;
-st.u32 [%r22],%r23;
+atom.exch.b32 %r25,[%r22],%r23;
 membar.sys;
 .loc 2 40 1
 ret;
 }

And, for example (similar elsewhere) 'nvptx-none/libgomp/single.o', a
'ld' that previously was issued after 'membar.sys' now moves before that
one:

 .visible .func (.param .u64 %value_out) GOMP_single_copy_start
 {
[...]
 .reg .u64 %r33;
 .reg .u64 %r34;
[...]
 .reg .pred %r51;
 .reg .u64 %r50;
 .reg .u64 %r52;
[...]
 ld.shared.u64 %r50,[nvptx_thrs];
 add.u64 %r33,%r50,%r49;
 .loc 2 1317 32
 ld.u64 %r34,[%r33+32];
 .loc 2 1317 6
 setp.eq.u64 %r51,%r34,0;
 @ %r51 bra $L6;
 .loc 4 66 3
-membar.sys;
 ld.u64 %r52,[%r33+24];
-st.u64 [%r34+80],%r52;
+membar.sys;
+atom.exch.b64 %r53,[%r34+80],%r52;
 $L6:
[...]

(But I see there is another 'ld.u64 %r34,[%r33+32]' that previously also
already was issued before the 'membar.sys'.)


Grüße
 Thomas


> [nvptx] Handle pre-sm_7x shared atomic store using atomic exchange
>
> gcc/ChangeLog:
>
> 2022-02-02  Tom de Vries  
>
>   * config/nvptx/nvptx-protos.h (nvptx_mem_maybe_shared_p): Declare.
>   * config/nvptx/nvptx.cc (nvptx_mem_data_area): New static function.
>   (nvptx_mem_maybe_shared_p): New function.
>   * config/nvptx/nvptx.md (define_expand "atomic_store"): New
>   define_expand.
>
> gcc/testsuite/ChangeLog:
>
> 2022-02-02  Tom de Vries  
>
>   * gcc.target/nvptx/atomic-store-1.c: New test.
>   * gcc.target/nvptx/atomic-store-3.c: New test.
>   * gcc.target/nvptx/stack-atomics-run.c: Update.
>
> ---
>  gcc/config/nvptx/nvptx-protos.h|  1 +
>  gcc/config/nvptx/nvptx.cc  | 22 
>  gcc/config/nvptx/nvptx.md  | 30 
> ++
>  gcc/testsuite/gcc.target/nvptx/atomic-store-1.c| 26 +++
>  gcc/testsuite/gcc.target/nvptx/atomic-store-3.c| 25 ++
>  gcc/testsuite/gcc.target/nvptx/stack-atomics-run.c |  6 -
>  6 files changed, 109 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/nvptx/nvptx-protos.h b/gcc/config/nvptx/nvptx-protos.h
> index a846e341917..0bf9af406a2 100644
> --- a/gcc/config/nvptx/nvptx-protos.h
> +++ b/gcc/config/nvptx/nvptx-protos.h
> @@ -60,5 +60,6 @@ extern const char *nvptx_output_simt_exit (rtx);
>  extern const char *nvptx_output_red_partition (rtx, rtx);
>  extern const char *nvptx_output_atomic_insn (const char *, rtx *, int, int);
>  extern bool nvptx_mem_local_p (rtx);
> +extern bool nvptx_mem_maybe_shared_p (const_rtx);
>  #endif
>  #endif
> diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
> index 1b0227a2c31..5b26c0f4c7d 100644
> --- a/gcc/config/nvptx/nvptx.cc
> +++ b/gcc/config/nvptx/nvptx.cc
> @@ -76,6 +76,7 @@
>  #include "intl.h"
>  #include "opts.h"
>  #include "tree-pretty-print.h"
> +#include "rtl-iter.h"
>
>  /* This file should be included last.  */
>  #include "target-def.h"
> @@ -2787,6 +2788,27 @@ nvptx_print_operand_address (FILE *file, machine_mode 
> mode, rtx addr)
>nvptx_print_address_operand (file, addr, mode);
>  }
>
> +static nvptx_data_area
> +nvptx_mem_data_area (const_rtx x)
> +{
> +  gcc_assert (GET_CODE (x) == MEM);
> +
> +  const_rtx addr = XEXP (x, 0);
> +  subrtx_iterator::array_type array;
> +  FOR_EACH_SUBRTX (iter, array, addr, ALL)
> +if (SYMBOL_REF_P (*iter))
> +  return SYMBOL_DATA_AREA (*iter);
> +
> +  return DATA_AREA_GENERIC;
> +}
> +
> +bool
> +nvptx_mem_maybe_shared_p (const_rtx x)
> +{
> +  

Re: [PATCH] PR c/104506: Tolerate error_mark_node in useless_type_conversion_p.

2022-02-14 Thread Richard Biener via Gcc-patches
On Tue, Feb 15, 2022 at 12:58 AM Andrew Pinski via Gcc-patches
 wrote:
>
> On Mon, Feb 14, 2022 at 4:54 AM Roger Sayle  
> wrote:
> >
> >
> >
> > This simple fix to the middle-end, resolves PR c/104506, by adding an
> >
> > explicit check for error_mark_node to useless_type_conversion_p.  I first
> >
> > trying fixing this in the C front-end, but the type is valid at the point
> >
> > that the NOP_EXPR is created, so the poisoned type leaks to the middle-end.
> >
> > Returning either true or false from useless_type_conversion_p avoids the
> >
> > ICE-after-error.  Apologies to Andrew Pinski, I hadn't noticed that he'd
> >
> > assigned this PR to himself until after my regression testing had finished.
> >
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> >
> > make -k check with no new failures.  Ok for mainline?
> >
> >
> >
> >
> >
> > 2022-02-14  Roger Sayle  
> >
> >
> >
> > gcc/ChangeLog
> >
> > PR c/104506
> >
> > * gimple-expr.cc (useless_type_conversion_p): Add a check for
> >
> > error_mark_node.
>
> I came up with a different patch (attached) which just changes
> tree_ssa_useless_type_conversion rather than useless_type_conversion_p
> which I was going to submit but had an issue with my build machine.
> I did it this way as it was similar to how
> STRIP_NOPS/tree_nop_conversion was done already.
>
> Also from my description of the patch
> STRIP_USELESS_TYPE_CONVERSION is mostly used inside the gimplifier
> and the places where it is used outside of the gimplifier would not
> be adding too much overhead.
>
> Though I think Richard Biener's patch is better really. It would be
> interesting to see how the C++ front-end handles this case, I remember
> it using integer_type_node in some locations after an error for a
> type.

If the fix to the C frontend doesn't work out I'd indeed prefer your variant.
Nit:

+  outer_type = TREE_TYPE (expr);
+  inner_type = TREE_TYPE (TREE_OPERAND (expr, 0));
+
+  if (!inner_type || inner_type == error_mark_node)
+return false;

unless we get to a case where inner_type == NULL I would not bother
checking that.

As said, that TREE_TYPE (error_mark_node) is not a type is IMHO bad
for error recovery.  Maybe we really need ERROR_TYPE here.

> Thanks,
> Andrew Pinski
>
> >
> >
> >
> > gcc/testsuite/ChangeLog
> >
> > PR c/104506
> >
> > * gcc.dg/pr104506.c: New test case.
> >
> >
> >
> >
> >
> > Roger
> >
> > --
> >
> >


Re: [PATCH] c: Add diagnostic when operator= is used as truth cond [PR25689]

2022-02-14 Thread Jason Merrill via Gcc-patches

On 2/14/22 21:30, Zhao Wei Liew wrote:

On 14/02/2022, Jason Merrill  wrote:


+/* Returns true if EXPR is a reference to an implicit
+   call to operator=(). */
+static bool
+is_assignment_overload_ref_p (tree expr)
+{
+  if (expr == NULL_TREE || !REFERENCE_REF_P (expr))
+return false;


This will only warn about op= that returns a reference, which is not
required.



Ah I understand now. I added some new test cases for non-reference return types
and copied some code from extract_call_expr that seems to do what we want.


Good plan, but let's call extract_call_expr rather than copy from it.


+  return fndecl != NULL_TREE
+&& DECL_OVERLOADED_OPERATOR_P (fndecl)
+&& DECL_OVERLOADED_OPERATOR_IS (fndecl, NOP_EXPR)
+&& DECL_ASSIGNMENT_OPERATOR_P (fndecl);


This could be

&& DECL_ASSIGNMENT_OPERATOR_P (fndecl)
&& DECL_OVERLOADED_OPERATOR_IS (fndecl, NOP_EXPR);

without the separate DECL_OVERLOADED_OPERATOR_P test.



I see. I've adjusted the code as you suggested and it works. Thanks!

-- Everything below is the patch v4 --

When compiling the following code with g++ -Wparentheses, GCC does not
warn on the if statement. For example, there is no warning for this code:

struct A {
A& operator=(int);
operator bool();
};

void f(A a) {
if (a = 0); // no warning
}

This is because a = 0 is a call to operator=, which GCC does not handle.

This patch fixes this issue by handling calls to operator= when deciding
to warn.

Bootstrapped and tested on x86_64-pc-linux-gnu.

v3: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590310.html
Changes since v3:
1. Also handle COMPOUND_EXPRs and TARGET_EXPRs.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590236.html
Changes since v2:
1. Add more test cases in Wparentheses-31.C.
2. Refactor added logic to a function (is_assignment_overload_ref_p).
3. Use REFERENCE_REF_P instead of INDIRECT_REF_P.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590158.html
Changes since v1:
1. Use CALL_EXPR_OPERATOR_SYNTAX to avoid warnings for explicit
operator=() calls.
2. Use INDIRECT_REF_P to filter implicit operator=() calls.
3. Use cp_get_callee_fndecl_nofold.
4. Add spaces before (.

PR c/25689

gcc/cp/ChangeLog:

* semantics.cc (maybe_convert_cond): Handle the implicit
  operator=() case for -Wparentheses.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wparentheses-31.C: New test.
---
  gcc/cp/semantics.cc | 29 +-
  gcc/testsuite/g++.dg/warn/Wparentheses-31.C | 62 +
  2 files changed, 90 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wparentheses-31.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0cb17a6a8ab1c..9c3f613a1aa62 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -815,6 +815,33 @@ finish_goto_stmt (tree destination)
return add_stmt (build_stmt (input_location, GOTO_EXPR, destination));
  }

+/* Returns true if CALL is a (possibly wrapped) CALL_EXPR
+   to operator=() that is written as an operator expression. */
+static bool
+is_assignment_op_expr_p(tree call)


Missing space before (


+{
+  if (call == NULL_TREE)
+return false;
+
+  /* Unwrap the CALL_EXPR. */
+  while (TREE_CODE (call) == COMPOUND_EXPR)
+call = TREE_OPERAND (call, 1);
+  if (REFERENCE_REF_P (call))
+call = TREE_OPERAND (call, 0);
+  if (TREE_CODE (call) == TARGET_EXPR)
+call = TARGET_EXPR_INITIAL (call);
+
+  if (call == NULL_TREE
+  || TREE_CODE (call) != CALL_EXPR
+  || !CALL_EXPR_OPERATOR_SYNTAX (call))
+return false;
+
+  tree fndecl = cp_get_callee_fndecl_nofold (call);
+  return fndecl != NULL_TREE
+&& DECL_ASSIGNMENT_OPERATOR_P (fndecl)
+&& DECL_OVERLOADED_OPERATOR_IS (fndecl, NOP_EXPR);
+}
+
  /* COND is the condition-expression for an if, while, etc.,
 statement.  Convert it to a boolean value, if appropriate.
 In addition, verify sequence points if -Wsequence-point is enabled.  */
@@ -836,7 +863,7 @@ maybe_convert_cond (tree cond)
/* Do the conversion.  */
cond = convert_from_reference (cond);

-  if (TREE_CODE (cond) == MODIFY_EXPR
+  if ((TREE_CODE (cond) == MODIFY_EXPR || is_assignment_op_expr_p (cond))
&& warn_parentheses
&& !warning_suppressed_p (cond, OPT_Wparentheses)
&& warning_at (cp_expr_loc_or_input_loc (cond),
diff --git a/gcc/testsuite/g++.dg/warn/Wparentheses-31.C
b/gcc/testsuite/g++.dg/warn/Wparentheses-31.C
new file mode 100644
index 0..8d48ca5205782
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wparentheses-31.C
@@ -0,0 +1,62 @@
+/* Test that -Wparentheses warns for struct/class assignments,
+   except for explicit calls to operator= (). */
+/* PR c/25689 */
+/* { dg-options "-Wparentheses" }  */
+
+struct A
+{
+   A& operator= (int);
+   A operator= (double);
+   operator bool ();
+};
+
+struct B
+{
+   bool x;
+   B& operator= (int);

[pushed] c++: TTP in member alias template [PR104107]

2022-02-14 Thread Jason Merrill via Gcc-patches
In the first testcase, coerce_template_template_parms was adding too much of
outer_args when coercing to match P's template parameters, so that when
substituting into the 'const T&' parameter we got an unrelated template
argument for T.  We should only add outer_args when the argument template is
a nested template.

Tested x86_64-pc-linux-gnu, applying to trunk.

PR c++/104107
PR c++/95036

gcc/cp/ChangeLog:

* pt.cc (coerce_template_template_parms): Take full parms.
Avoid adding too much of outer_args.
(coerce_template_template_parm): Adjust.
(template_template_parm_bindings_ok_p): Adjust.
(convert_template_argument): Adjust.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-ttp2.C: New test.
* g++.dg/cpp1z/ttp2.C: New test.
---
 gcc/cp/pt.cc | 41 +++-
 gcc/testsuite/g++.dg/cpp0x/alias-decl-ttp2.C | 25 
 gcc/testsuite/g++.dg/cpp1z/ttp2.C| 21 ++
 3 files changed, 77 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-ttp2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/ttp2.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 1b18e2a7787..6dda66081bd 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -7731,8 +7731,8 @@ coerce_template_template_parm (tree parm,
 template  class> class TT>
 class C;  */
   {
-   tree parmparm = DECL_INNERMOST_TEMPLATE_PARMS (parm);
-   tree argparm = DECL_INNERMOST_TEMPLATE_PARMS (arg);
+   tree parmparm = DECL_TEMPLATE_PARMS (parm);
+   tree argparm = DECL_TEMPLATE_PARMS (arg);
 
if (!coerce_template_template_parms
(parmparm, argparm, complain, in_decl, outer_args))
@@ -8001,8 +8001,8 @@ unify_bound_ttp_args (tree tparms, tree targs, tree parm, 
tree& arg,
the parameters to A, and OUTER_ARGS contains A.  */
 
 static int
-coerce_template_template_parms (tree parm_parms,
-   tree arg_parms,
+coerce_template_template_parms (tree parm_parms_full,
+   tree arg_parms_full,
tsubst_flags_t complain,
tree in_decl,
tree outer_args)
@@ -8011,6 +8011,9 @@ coerce_template_template_parms (tree parm_parms,
   tree parm, arg;
   int variadic_p = 0;
 
+  tree parm_parms = INNERMOST_TEMPLATE_PARMS (parm_parms_full);
+  tree arg_parms = INNERMOST_TEMPLATE_PARMS (arg_parms_full);
+
   gcc_assert (TREE_CODE (parm_parms) == TREE_VEC);
   gcc_assert (TREE_CODE (arg_parms) == TREE_VEC);
 
@@ -8046,8 +8049,26 @@ coerce_template_template_parms (tree parm_parms,
 specialized as P, so they match.*/
   processing_template_decl_sentinel ptds (/*reset*/false);
   ++processing_template_decl;
+
   tree pargs = template_parms_level_to_args (parm_parms);
-  pargs = add_outermost_template_args (outer_args, pargs);
+
+  /* PARM, and thus the context in which we are passing ARG to it, may be
+at a deeper level than ARG; when trying to coerce to ARG_PARMS, we
+want to provide the right number of levels, so we reduce the number of
+levels in OUTER_ARGS before prepending them.  This is most important
+when ARG is a namespace-scope template, as in alias-decl-ttp2.C.
+
+ARG might also be deeper than PARM (ttp23).  In that case, we include
+all of OUTER_ARGS.  The missing levels seem potentially problematic,
+but I can't come up with a testcase that breaks.  */
+  if (int arg_outer_levs = TMPL_PARMS_DEPTH (arg_parms_full) - 1)
+   {
+ auto x = make_temp_override (TREE_VEC_LENGTH (outer_args));
+ if (TMPL_ARGS_DEPTH (outer_args) > arg_outer_levs)
+   TREE_VEC_LENGTH (outer_args) = arg_outer_levs;
+ pargs = add_to_template_args (outer_args, pargs);
+   }
+
   pargs = coerce_template_parms (arg_parms, pargs, NULL_TREE, tf_none,
 /*require_all*/true, /*use_default*/true);
   if (pargs != error_mark_node)
@@ -8186,16 +8207,16 @@ template_template_parm_bindings_ok_p (tree tparms, tree 
targs)
  /* Extract the template parameters from the template
 argument.  */
  if (TREE_CODE (targ) == TEMPLATE_DECL)
-   targ_parms = DECL_INNERMOST_TEMPLATE_PARMS (targ);
+   targ_parms = DECL_TEMPLATE_PARMS (targ);
  else if (TREE_CODE (targ) == TEMPLATE_TEMPLATE_PARM)
-   targ_parms = DECL_INNERMOST_TEMPLATE_PARMS (TYPE_NAME (targ));
+   targ_parms = DECL_TEMPLATE_PARMS (TYPE_NAME (targ));
 
  /* Verify that we can coerce the template template
 parameters from the template argument to the template
 parameter.  This requires an exact match.  */
  if (targ_parms
  && !coerce_template_template_parms
-  

Re: [PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694]

2022-02-14 Thread HAO CHEN GUI via Gcc-patches
Segher,
  Thanks for your comments. Here are my comments and questions.Thanks.

On 15/2/2022 上午 5:36, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Feb 09, 2022 at 10:43:17AM +0800, HAO CHEN GUI wrote:
>>   This patch removes TImode from mode iterator BOOL_128. Thus, bool 
>> operations (AND, IOR, XOR, NOT)
>> on TImode will be split to the relevant operations on word mode during 
>> expand (in optabs.c).
> 
> But we also want to allow TImode in VSRs.  This of course is a never-
> ending story, no choice works very well here.
> 
>> Potential
>> optimizations can be implemented after the split. The former practice splits 
>> it after the reload
>> pass which is too later for some optimizations. The new test case 
>> illustrates it.
> 
> All that are arguments for expanding to split form, not for removing
> TImode from the iterator.  And you leave PTImode, which *always* is in
> GPRs!
>From my understanding, PTImode has limitation that it needs to be assigned
with an even/odd register pair. So it can't be split before the reload pass.
Currently it is split after reload.>
> It may be that only leaving the "V" modes there works well; that needs
> testing though, more than just asaserting this.
> 
> Just doing it and handling the ICEs later is fine, but in stage 1.
> 
> (You'll also have to show it is *correct*, you need to prove (or show it
> really likely :-) ) that after this change there are no TImode things
> generated anywhere (anywhere!) that are no longer handled now).
> 
Yes, the TI may be generated after expand pass and causes ICEs. So how about
creating two mode iterators? One is for expand which doesn't include TImode,
another is for the split which include TImode and make TImode to be split
as early as possible?

> 
> Segher


Re: [PATCH] c: Add diagnostic when operator= is used as truth cond [PR25689]

2022-02-14 Thread Zhao Wei Liew via Gcc-patches
On 14/02/2022, Jason Merrill  wrote:
>>
>> +/* Returns true if EXPR is a reference to an implicit
>> +   call to operator=(). */
>> +static bool
>> +is_assignment_overload_ref_p (tree expr)
>> +{
>> +  if (expr == NULL_TREE || !REFERENCE_REF_P (expr))
>> +return false;
>
> This will only warn about op= that returns a reference, which is not
> required.
>

Ah I understand now. I added some new test cases for non-reference return types
and copied some code from extract_call_expr that seems to do what we want.

>> +  return fndecl != NULL_TREE
>> +&& DECL_OVERLOADED_OPERATOR_P (fndecl)
>> +&& DECL_OVERLOADED_OPERATOR_IS (fndecl, NOP_EXPR)
>> +&& DECL_ASSIGNMENT_OPERATOR_P (fndecl);
>
> This could be
>
>&& DECL_ASSIGNMENT_OPERATOR_P (fndecl)
>&& DECL_OVERLOADED_OPERATOR_IS (fndecl, NOP_EXPR);
>
> without the separate DECL_OVERLOADED_OPERATOR_P test.
>

I see. I've adjusted the code as you suggested and it works. Thanks!

-- Everything below is the patch v4 --

When compiling the following code with g++ -Wparentheses, GCC does not
warn on the if statement. For example, there is no warning for this code:

struct A {
A& operator=(int);
operator bool();
};

void f(A a) {
if (a = 0); // no warning
}

This is because a = 0 is a call to operator=, which GCC does not handle.

This patch fixes this issue by handling calls to operator= when deciding
to warn.

Bootstrapped and tested on x86_64-pc-linux-gnu.

v3: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590310.html
Changes since v3:
1. Also handle COMPOUND_EXPRs and TARGET_EXPRs.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590236.html
Changes since v2:
1. Add more test cases in Wparentheses-31.C.
2. Refactor added logic to a function (is_assignment_overload_ref_p).
3. Use REFERENCE_REF_P instead of INDIRECT_REF_P.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590158.html
Changes since v1:
1. Use CALL_EXPR_OPERATOR_SYNTAX to avoid warnings for explicit
   operator=() calls.
2. Use INDIRECT_REF_P to filter implicit operator=() calls.
3. Use cp_get_callee_fndecl_nofold.
4. Add spaces before (.

PR c/25689

gcc/cp/ChangeLog:

* semantics.cc (maybe_convert_cond): Handle the implicit
  operator=() case for -Wparentheses.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wparentheses-31.C: New test.
---
 gcc/cp/semantics.cc | 29 +-
 gcc/testsuite/g++.dg/warn/Wparentheses-31.C | 62 +
 2 files changed, 90 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wparentheses-31.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0cb17a6a8ab1c..9c3f613a1aa62 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -815,6 +815,33 @@ finish_goto_stmt (tree destination)
   return add_stmt (build_stmt (input_location, GOTO_EXPR, destination));
 }

+/* Returns true if CALL is a (possibly wrapped) CALL_EXPR
+   to operator=() that is written as an operator expression. */
+static bool
+is_assignment_op_expr_p(tree call)
+{
+  if (call == NULL_TREE)
+return false;
+
+  /* Unwrap the CALL_EXPR. */
+  while (TREE_CODE (call) == COMPOUND_EXPR)
+call = TREE_OPERAND (call, 1);
+  if (REFERENCE_REF_P (call))
+call = TREE_OPERAND (call, 0);
+  if (TREE_CODE (call) == TARGET_EXPR)
+call = TARGET_EXPR_INITIAL (call);
+
+  if (call == NULL_TREE
+  || TREE_CODE (call) != CALL_EXPR
+  || !CALL_EXPR_OPERATOR_SYNTAX (call))
+return false;
+
+  tree fndecl = cp_get_callee_fndecl_nofold (call);
+  return fndecl != NULL_TREE
+&& DECL_ASSIGNMENT_OPERATOR_P (fndecl)
+&& DECL_OVERLOADED_OPERATOR_IS (fndecl, NOP_EXPR);
+}
+
 /* COND is the condition-expression for an if, while, etc.,
statement.  Convert it to a boolean value, if appropriate.
In addition, verify sequence points if -Wsequence-point is enabled.  */
@@ -836,7 +863,7 @@ maybe_convert_cond (tree cond)
   /* Do the conversion.  */
   cond = convert_from_reference (cond);

-  if (TREE_CODE (cond) == MODIFY_EXPR
+  if ((TREE_CODE (cond) == MODIFY_EXPR || is_assignment_op_expr_p (cond))
   && warn_parentheses
   && !warning_suppressed_p (cond, OPT_Wparentheses)
   && warning_at (cp_expr_loc_or_input_loc (cond),
diff --git a/gcc/testsuite/g++.dg/warn/Wparentheses-31.C
b/gcc/testsuite/g++.dg/warn/Wparentheses-31.C
new file mode 100644
index 0..8d48ca5205782
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wparentheses-31.C
@@ -0,0 +1,62 @@
+/* Test that -Wparentheses warns for struct/class assignments,
+   except for explicit calls to operator= (). */
+/* PR c/25689 */
+/* { dg-options "-Wparentheses" }  */
+
+struct A
+{
+   A& operator= (int);
+   A operator= (double);
+   operator bool ();
+};
+
+struct B
+{
+   bool x;
+   B& operator= (int);
+   B operator= (double);
+   operator bool ();
+};
+
+struct C
+{
+   C& operator= (int);
+   

Re: [PATCH] PR c/104506: Tolerate error_mark_node in useless_type_conversion_p.

2022-02-14 Thread Andrew Pinski via Gcc-patches
On Mon, Feb 14, 2022 at 4:54 AM Roger Sayle  wrote:
>
>
>
> This simple fix to the middle-end, resolves PR c/104506, by adding an
>
> explicit check for error_mark_node to useless_type_conversion_p.  I first
>
> trying fixing this in the C front-end, but the type is valid at the point
>
> that the NOP_EXPR is created, so the poisoned type leaks to the middle-end.
>
> Returning either true or false from useless_type_conversion_p avoids the
>
> ICE-after-error.  Apologies to Andrew Pinski, I hadn't noticed that he'd
>
> assigned this PR to himself until after my regression testing had finished.
>
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
>
> make -k check with no new failures.  Ok for mainline?
>
>
>
>
>
> 2022-02-14  Roger Sayle  
>
>
>
> gcc/ChangeLog
>
> PR c/104506
>
> * gimple-expr.cc (useless_type_conversion_p): Add a check for
>
> error_mark_node.

I came up with a different patch (attached) which just changes
tree_ssa_useless_type_conversion rather than useless_type_conversion_p
which I was going to submit but had an issue with my build machine.
I did it this way as it was similar to how
STRIP_NOPS/tree_nop_conversion was done already.

Also from my description of the patch
STRIP_USELESS_TYPE_CONVERSION is mostly used inside the gimplifier
and the places where it is used outside of the gimplifier would not
be adding too much overhead.

Though I think Richard Biener's patch is better really. It would be
interesting to see how the C++ front-end handles this case, I remember
it using integer_type_node in some locations after an error for a
type.

Thanks,
Andrew Pinski

>
>
>
> gcc/testsuite/ChangeLog
>
> PR c/104506
>
> * gcc.dg/pr104506.c: New test case.
>
>
>
>
>
> Roger
>
> --
>
>
From 43d418042efacbb7efe8664fbb1176608470474a Mon Sep 17 00:00:00 2001
From: Andrew Pinski 
Date: Sun, 13 Feb 2022 00:09:39 +
Subject: [PATCH] c: [PR104506] Fix ICE after error due to change of type to
 error_mark_node

The problem here is we end up with an error_mark_node when calling
useless_type_conversion_p and that ICEs. STRIP_NOPS/tree_nop_conversion
has had a check for the inner type being an error_mark_node since g9a6bb3f78c96
(2000). This just adds the check also to tree_ssa_useless_type_conversion.
STRIP_USELESS_TYPE_CONVERSION is mostly used inside the gimplifier
and the places where it is used outside of the gimplifier would not
be adding too much overhead.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Thanks,
Andrew Pinski

PR c/104506

gcc/ChangeLog:

* tree-ssa.cc (tree_ssa_useless_type_conversion):
Check the inner type before calling useless_type_conversion_p.

gcc/testsuite/ChangeLog:

* gcc.dg/pr104506-1.c: New test.
* gcc.dg/pr104506-2.c: New test.
* gcc.dg/pr104506-3.c: New test.
---
 gcc/testsuite/gcc.dg/pr104506-1.c | 12 
 gcc/testsuite/gcc.dg/pr104506-2.c | 11 +++
 gcc/testsuite/gcc.dg/pr104506-3.c | 11 +++
 gcc/tree-ssa.cc   | 20 +---
 4 files changed, 47 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr104506-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr104506-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr104506-3.c

diff --git a/gcc/testsuite/gcc.dg/pr104506-1.c 
b/gcc/testsuite/gcc.dg/pr104506-1.c
new file mode 100644
index 000..5eb71911b71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104506-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu11" } */
+/* PR c/104506: we used to ICE after the error of
+   changing the type.  */
+
+void
+foo (double x)
+/* { dg-message "note: previous definition" "previous definition" { target 
*-*-* } .-1 } */
+{
+  (void)x;
+  int x; /* { dg-error "redeclared as different kind of symbol" } */
+}
diff --git a/gcc/testsuite/gcc.dg/pr104506-2.c 
b/gcc/testsuite/gcc.dg/pr104506-2.c
new file mode 100644
index 000..3c3c4f8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104506-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-std=gnu11" } */
+/* PR c/104506: we used to ICE after the error of
+   changing the type.  */
+void
+foo (double x)
+/* { dg-message "note: previous definition" "previous definition" { target 
*-*-* } .-1 } */
+{
+  x;
+  int x; /* { dg-error "redeclared as different kind of symbol" } */
+}
diff --git a/gcc/testsuite/gcc.dg/pr104506-3.c 
b/gcc/testsuite/gcc.dg/pr104506-3.c
new file mode 100644
index 000..b14deb5cf25
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104506-3.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* PR c/104506: we used to ICE after the error of
+   changing the type.  */
+double x;
+/* { dg-message "note: previous declaration" "previous declaration" { target 
*-*-* } .-1 } */
+void
+foo (void)
+{
+  x;
+}
+int x; /* { dg-error "conflicting types" } */
diff --git a/gcc/tree-ssa.cc b/gcc/tree-ssa.cc
index 8fe0682981d..98dada09647 100644
--- a/gcc/tree-ssa.cc
+++ 

Re: [PATCH] Update -Warray-bounds documentation [PR104355]

2022-02-14 Thread Martin Sebor via Gcc-patches

On 2/10/22 23:48, Richard Sandiford wrote:

Martin Sebor via Gcc-patches  writes:

The -Warray-bounds description in the manual is out of date in
a couple of ways.  First it claims that the option is only active
with optimization, which isn't entirely correct since at least one
instance is issued even without it.  Second, the description of
its level 2 suggests it controls the warning for all trailing
array members, when it only controls it for trailing one-element
arrays (this was made tighter in GCC 10 but we neglected to update
the manual).

In addition, the word "always" in the description of the option
is also interpreted by some as implying that every instance of
the warning is necessarily a true positive.  I've reworded
the description to hopefully avoid this misreading(*).

Finally, the generic text that talks about the interaction with
optimizations says that -Wmaybe-uninitialized is not issued unless
optimization is enabled.  That's also not accurate anymore since
at least one instance of the warning is independent of optimization
(passing uninitialized objects by reference to const arguments).

The attached changes correct these oversights.

Martin

[*] It should probably be made clearer in the generic text that
no instance of any warning, not just -Warray-bounds, should be
taken to be a definitive indication of a bug in the code.  I've
left that for later.


Yeah, maybe, but I guess it's unlikely to be useful in practice.
The chances of users happening to read a given bit of generic text
seem pretty low.

Doesn't mean we shouldn't do it of course (provided that we don't then
castigate users for having failed to notice it).


I'm sure you're right that few users go to the trouble of studying
the manual when interpreting a warning (and some not even before
submitting a problem report).




Update -Warray-bounds documentation [PR104355].

Resolves:
PR middle-end/104355 - Misleading and outdated -Warray-bounds documentation

gcc/ChangeLog:
* doc/invoke.texi (-Warray-bounds): Update documentation.


diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index b49ba22df89..b7b1f47a5ce 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5641,8 +5641,10 @@ warns that an unrecognized option is present.
  
  The effectiveness of some warnings depends on optimizations also being

  enabled. For example @option{-Wsuggest-final-types} is more effective
-with link-time optimization and @option{-Wmaybe-uninitialized} does not
-warn at all unless optimization is enabled.
+with link-time optimization and some instances of other warnings may
+not be issued at all unless optimization is enabled.  While optimization
+in general improves the efficacy of control and data flow sensitive
+warnings, in some cases it may also cause false positives.
  
  @table @gcctabopt

  @item -Wpedantic
@@ -7691,20 +7693,22 @@ void f (char c, int i)
  @itemx -Warray-bounds=@var{n}
  @opindex Wno-array-bounds
  @opindex Warray-bounds
-This option is only active when @option{-ftree-vrp} is active
-(default for @option{-O2} and above). It warns about subscripts to arrays
-that are always out of bounds. This warning is enabled by @option{-Wall}.
+Warn about out of bounds subscripts or offsets into arrays. This warning
+level is enabled by @option{-Wall}.  It is the most effective when


It's not clear to me which level is “this level” here.  How about
something like “This warning is enabled at level 1…”?


Yes, good catch, that was a mistake on my part.



“It is more effective when” seems more natural to me than “It is the most
effective when”, but maybe that's just me.


-Warray-bounds doesn't do much without -ftree-vrp (it detects out-of-
bounds offsets in calls to built-ins), and does almost nothing below
-O2 (all it diagnoses at -O0 is strlen() calls with past-the-end
offsets to string constants).  But since the rest of the sentence
mention that it's not completely silent even at -O0 your words sound
good to me too.



Looks good to me otherwise FWIW.  OK with those changes if you agree and
if no-one has further comments by Monday.


Pushed now in r12-7234.

Martin



Thanks,
Richard


+@option{-ftree-vrp} is active (the default for @option{-O2} and above)
+but a subset of instances are issued even without optimization.
  
  @table @gcctabopt

  @item -Warray-bounds=1
-This is the warning level of @option{-Warray-bounds} and is enabled
+This is the default warning level of @option{-Warray-bounds} and is enabled
  by @option{-Wall}; higher levels are not, and must be explicitly requested.
  
  @item -Warray-bounds=2

-This warning level also warns about out of bounds access for
-arrays at the end of a struct and for arrays accessed through
-pointers. This warning level may give a larger number of
-false positives and is deactivated by default.
+This warning level also warns out of bounds accesses to trailing struct
+members of one-element array types (@pxref{Zero Length}) and about
+the intermediate results of 

Re: [PATCH, rs6000] Remove TImode from mode iterator BOOL_128 [PR100694]

2022-02-14 Thread Segher Boessenkool
Hi!

On Wed, Feb 09, 2022 at 10:43:17AM +0800, HAO CHEN GUI wrote:
>   This patch removes TImode from mode iterator BOOL_128. Thus, bool 
> operations (AND, IOR, XOR, NOT)
> on TImode will be split to the relevant operations on word mode during expand 
> (in optabs.c).

But we also want to allow TImode in VSRs.  This of course is a never-
ending story, no choice works very well here.

> Potential
> optimizations can be implemented after the split. The former practice splits 
> it after the reload
> pass which is too later for some optimizations. The new test case illustrates 
> it.

All that are arguments for expanding to split form, not for removing
TImode from the iterator.  And you leave PTImode, which *always* is in
GPRs!

It may be that only leaving the "V" modes there works well; that needs
testing though, more than just asaserting this.

Just doing it and handling the ICEs later is fine, but in stage 1.

(You'll also have to show it is *correct*, you need to prove (or show it
really likely :-) ) that after this change there are no TImode things
generated anywhere (anywhere!) that are no longer handled now).


Segher


Re: [PATCH] reload: Adjust comment in find_reloads about subset, not intersection

2022-02-14 Thread Hans-Peter Nilsson via Gcc-patches
Rather than assuming it's seen and thought not worth the
bother, I'll go with not-seen, so:

Jeff: ping.  A little love for reload, comment-wise, before it's put down!

> From: Richard Sandiford 
> CC: "gcc-patches@gcc.gnu.org" , "j...@tachyum.com"
>   
> Date: Wed, 2 Feb 2022 16:16:14 +0100
> Old-Content-Type: multipart/alternative; boundary="_000_mpta6f9fge9fsfarmcom_"
> Content-Type: text/plain; charset=iso-8859-1
> 
> Richard Sandiford  writes:
> > Hans-Peter Nilsson  writes:
> >>> From: Richard Sandiford 
> >>> Hans-Peter Nilsson via Gcc-patches  writes:
> >>> > The mystery isn't so much that there's code mismatching comments or
> >>> > intent, but that this code has been there "forever".  There has been a
> >>> > function reg_classes_intersect_p, in gcc since r0-54, even *before*
> >>> > there was reload.c; r0-361, so why the "we don't have a way of forming
> >>> > the intersection" in the actual patch and why wasn't this fixed
> >>> > (perhaps not even noticed) when reload was state-of-the-art?
> >>> 
> >>> But doesn't the comment
> >>
> >> (the second, patched comment; removed in the patch)
> >>
> >>> mean that we have/had no way of getting
> >>> a *class* that is the intersection of preferred_class[i] and
> >>> this_alternative[i], to store as the new
> >>> this_alternative[i]?
> >>
> >> Yes, that's likely what's going on in the (second) comment
> >> and code; needing a s/intersection/a class for the
> >> intersection/, but the *first* comment is pretty clear that
> >> the intent is exactly to "override" this_alternative[i]: "If
> >> not (a subclass), but it intersects that class, use the
> >> preferred class instead".  But of course, that doesn't
> >> actually have to make sense as code!  And indeed it doesn't,
> >> as you say.
> >>
> >>> If the alternatives were register sets rather than classes,
> >>> I think the intended effect would be:
> >>> 
> >>>   this_alternative[i] &= preferred_class[i];
> >>> 
> >>> (i.e. AND_HARD_REG_SET in old money).
> >>> 
> >>> It looks like the patch would instead make this_alternative[i] include
> >>> registers that the alternative doesn't actually accept, which feels odd.
> >>
> >> Perhaps I put too much trust in the sanity of old comments.
> >>
> >> How about I actually commit this one instead?  Better get it
> >> right before reload is removed.
> >
> > :-)  LGTM, but I'd like to hear Jeff's opinion.
> 
> So it would be a good idea if I used the right email address.

Perhaps even better to use the address seen in mailing list
conversations, so I'm switching to that one.

> 
> >
> > Thanks,
> > Richard
> >
> >> 8< --- >8
> >> "reload: Adjust comment in find_reloads about subset, not intersection"
> >> gcc:
> >>
> >>* reload.cc (find_reloads): Align comment with code where
> >>considering the intersection of register classes then tweaking the
> >>regclass for the current alternative or rejecting it.
> >> ---
> >>  gcc/reload.cc | 15 +--
> >>  1 file changed, 9 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/gcc/reload.cc b/gcc/reload.cc
> >> index 664082a533d9..3ed901e39447 100644
> >> --- a/gcc/reload.cc
> >> +++ b/gcc/reload.cc
> >> @@ -3635,9 +3635,11 @@ find_reloads (rtx_insn *insn, int replace, int 
> >> ind_levels, int live_known,
> >> a hard reg and this alternative accepts some
> >> register, see if the class that we want is a subset
> >> of the preferred class for this register.  If not,
> >> -   but it intersects that class, use the preferred class
> >> -   instead.  If it does not intersect the preferred
> >> -   class, show that usage of this alternative should be
> >> +   but it intersects that class, we'd like to use the
> >> +   intersection, but the best we can do is to use the
> >> +   preferred class, if it is instead a subset of the
> >> +   class we want in this alternative.  If we can't use
> >> +   it, show that usage of this alternative should be
> >> discouraged; it will be discouraged more still if the
> >> register is `preferred or nothing'.  We do this
> >> because it increases the chance of reusing our spill
> >> @@ -3664,9 +3666,10 @@ find_reloads (rtx_insn *insn, int replace, int 
> >> ind_levels, int live_known,
> >>  if (! reg_class_subset_p (this_alternative[i],
> >>preferred_class[i]))
> >>{
> >> -/* Since we don't have a way of forming the intersection,
> >> -   we just do something special if the preferred class
> >> -   is a subset of the class we have; that's the most
> >> +/* Since we don't have a way of forming a register
> >> +   class for the intersection, we just do
> >> +   something special if the preferred class is a
> >> +   subset of the class we have; that's the most

Re: [PATCH] RISC-V: Add target machine headers as a dependency for riscv-sr.o

2022-02-14 Thread Maciej W. Rozycki
On Thu, 10 Feb 2022, Kito Cheng wrote:

> OK for release branches, thanks!

 Backported to GCC 11 & 10 then; not needed for GCC 9.

  Maciej


Re: [PATCH] ipa: Create LOAD references when necessary during inlining (PR 103171)

2022-02-14 Thread Martin Jambor
Hello and ping, please. Thanks!

Martin

On Fri, Jan 28 2022, Martin Jambor wrote:
> Hi,
>
> in r12-2523-g13586172d0b70c ipa-prop tracking of jump functions during
> inlining got the ability to remove ADDR references when inlining
> discovered that they were not necessary or turn them into LOAD
> references when we know that what was a function call argument passed
> by reference will end up as a load (one or more).
>
> Unfortunately, the code only creates the LOAD references when
> replacing removed ADDR references and PR 103171 showed that with some
> ordering of inlining, we need to add the LOAD reference before we know
> we can remove the ADDR one - or the reference will be lost, leading to
> link errors or even ICEs.
>
> Specifically in testcase gcc.dg/lto/pr103171_1.c added in this patch,
> if foo() is inlined to entry(), we need to create the LOAD reference
> so that when later bar() is inlined into foo() and we discover that
> the paameter is unused, we can remove the ADDR reference and still
> keep the varaible around for the load.
>
> Bootstrapped, LTO bootstrapped and tested on x86_64-linux.  OK for
> trunk?
>
> Thanks,
>
> Martin
>
>
>
> gcc/ChangeLog:
>
> 2022-01-28  Martin Jambor  
>
>   PR ipa/103171
>   * ipa-prop.cc (propagate_controlled_uses): Add a LOAD reference
>   always when an ADDR_EXPR constant is known to reach a load because
>   of inlining, not just when removing an ADDR reference.
>
> gcc/testsuite/ChangeLog:
>
> 2022-01-28  Martin Jambor  
>
>   PR ipa/103171
>   * gcc.dg/ipa/remref-6.c: Adjust dump scan string.
>   * gcc.dg/ipa/remref-7.c: New test.
>   * gcc.dg/lto/pr103171_0.c: New test.
>   * gcc.dg/lto/pr103171_1.c: Likewise.
> ---
>  gcc/ipa-prop.cc   | 30 ---
>  gcc/testsuite/gcc.dg/ipa/remref-6.c   |  2 +-
>  gcc/testsuite/gcc.dg/ipa/remref-7.c   | 33 +
>  gcc/testsuite/gcc.dg/lto/pr103171_0.c | 11 +
>  gcc/testsuite/gcc.dg/lto/pr103171_1.c | 35 +++
>  5 files changed, 96 insertions(+), 15 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/ipa/remref-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr103171_0.c
>  create mode 100644 gcc/testsuite/gcc.dg/lto/pr103171_1.c
>
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index e55fe2776f2..72aa3e2f60d 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -4181,6 +4181,20 @@ propagate_controlled_uses (struct cgraph_edge *cs)
> int d = ipa_get_controlled_uses (old_root_info, i);
> int c = rdesc->refcount;
> rdesc->refcount = combine_controlled_uses_counters (c, d);
> +   if (rdesc->refcount != IPA_UNDESCRIBED_USE
> +   && ipa_get_param_load_dereferenced (old_root_info, i))
> + {
> +   tree cst = ipa_get_jf_constant (jf);
> +   gcc_checking_assert (TREE_CODE (cst) == ADDR_EXPR
> +&& (TREE_CODE (TREE_OPERAND (cst, 0))
> +== VAR_DECL));
> +   symtab_node *n = symtab_node::get (TREE_OPERAND (cst, 0));
> +   new_root->create_reference (n, IPA_REF_LOAD, NULL);
> +   if (dump_file)
> + fprintf (dump_file, "ipa-prop: Address IPA constant will reach "
> +  "a load so adding LOAD reference from %s to %s.\n",
> +  new_root->dump_name (), n->dump_name ());
> + }
> if (rdesc->refcount == 0)
>   {
> tree cst = ipa_get_jf_constant (jf);
> @@ -4193,20 +4207,8 @@ propagate_controlled_uses (struct cgraph_edge *cs)
> symtab_node *n = symtab_node::get (TREE_OPERAND (cst, 0));
> if (n)
>   {
> -   struct cgraph_node *clone;
> -   bool removed = remove_described_reference (n, rdesc);
> -   /* The reference might have been removed by IPA-CP.  */
> -   if (removed
> -   && ipa_get_param_load_dereferenced (old_root_info, i))
> - {
> -   new_root->create_reference (n, IPA_REF_LOAD, NULL);
> -   if (dump_file)
> - fprintf (dump_file, "ipa-prop: ...replaced it with "
> -  "LOAD one from %s to %s.\n",
> -  new_root->dump_name (), n->dump_name ());
> - }
> -
> -   clone = cs->caller;
> +   remove_described_reference (n, rdesc);
> +   cgraph_node *clone = cs->caller;
> while (clone->inlined_to
>&& clone->ipcp_clone
>&& clone != rdesc->cs->caller)
> diff --git a/gcc/testsuite/gcc.dg/ipa/remref-6.c 
> b/gcc/testsuite/gcc.dg/ipa/remref-6.c
> index 7deae3114a4..f31f4c14319 100644
> --- a/gcc/testsuite/gcc.dg/ipa/remref-6.c
> +++ b/gcc/testsuite/gcc.dg/ipa/remref-6.c
> @@ -20,5 +20,5 @@ void entry()
>  }
>  
>  /* { dg-final { scan-ipa-dump 

[PATCH] ipa-cp: Do not create clones for values outside known value range (PR 102513)

2022-02-14 Thread Martin Jambor
Hi,

PR 102513 shows we emit bogus array access warnings when IPA-CP
creates clones specialized for values which it deduces from arithmetic
jump functions describing self-recursive calls.  Those can however be
avoided if we consult the IPA-VR information that the same pass also
has.

The patch below does that at the stage when normally values are only
examined for profitability.  It would be better not to create lattices
describing such bogus values in the first place, however that presents
an ordering problem, the pass currently propagates all information,
and so both constants and VR, in no particular order when processing
SCCs, and so this approach seemed much simpler.

I plan to rearrange the pass so that it clones in multiple passes over
the call graph (or rather the lattice dependence graph) and it feels
natural to only do propagation for these kinds of recursion in the
second or later passes, which would fix the issue more elegantly.

Bootstrapped and tested on x86_64, OK for trunk (and perhaps also for
GCC 11)?

Thanks,

Martin


gcc/ChangeLog:

2022-02-14  Martin Jambor  

PR ipa/102513
* ipa-cp.cc (decide_whether_version_node): Skip scalar values
which do not fit the known value_range.

gcc/testsuite/ChangeLog:

2022-02-14  Martin Jambor  

PR ipa/102513
* gcc.dg/ipa/pr102513.c: New test.
---
 gcc/ipa-cp.cc   | 28 ++--
 gcc/testsuite/gcc.dg/ipa/pr102513.c | 33 +
 2 files changed, 59 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr102513.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 453e9c93cc3..ec37632d487 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -6154,8 +6154,32 @@ decide_whether_version_node (struct cgraph_node *node)
{
  ipcp_value *val;
  for (val = lat->values; val; val = val->next)
-   ret |= decide_about_value (node, i, -1, val, ,
-  _gen_clones);
+   {
+ /* If some values generated for self-recursive calls with
+arithmetic jump functions fall outside of the known
+value_range for the parameter, we can skip them.  VR interface
+supports this only for integers now.  */
+ if (TREE_CODE (val->value) == INTEGER_CST
+ && !plats->m_value_range.bottom_p ()
+ && !plats->m_value_range.m_vr.contains_p (val->value))
+   {
+ /* This can happen also if a constant present in the source
+code falls outside of the range of parameter's type, so we
+cannot assert.  */
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, " - skipping%s value ",
+  val->self_recursion_generated_p ()
+  ? " self_recursion_generated" : "");
+ print_ipcp_constant_value (dump_file, val->value);
+ fprintf (dump_file, " because it is outside known "
+  "value range.\n");
+   }
+ continue;
+   }
+ ret |= decide_about_value (node, i, -1, val, ,
+_gen_clones);
+   }
}
 
   if (!plats->aggs_bottom)
diff --git a/gcc/testsuite/gcc.dg/ipa/pr102513.c 
b/gcc/testsuite/gcc.dg/ipa/pr102513.c
new file mode 100644
index 000..9ee5431b730
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr102513.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -Warray-bounds" } */
+
+extern int block2[7][256];
+
+static int encode_block(int block2[7][256], unsigned level)
+{
+int best_score = 0;
+
+for (unsigned x = 0; x < level; x++) {
+int v = block2[1][x];
+block2[level][x] = 0;
+best_score += v * v;
+}
+
+if (level > 0 && best_score > 64) {
+int score = 0;
+
+score += encode_block(block2, level - 1);
+score += encode_block(block2, level - 1);
+
+if (score < best_score) {
+best_score = score;
+}
+}
+
+return best_score;
+}
+
+int foo(void)
+{
+return encode_block(block2, 5);
+}
-- 
2.34.1



[committed] libstdc++: Fix stream extraction of IEEE128 long double [PR100912]

2022-02-14 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux with both IEEE/IBM ABIs. Pushed to trunk.

-- >8 --

The std::__convert_from_v helper that formats double and long double
values into a char buffer was not being duplicated for the two long
double ABIs. This resulted in an ODR violation inside the library, where
some callers needed it to use snprintf to format __ibm128 values and
other callers needed it to use __snprintfieee128 to format __ieee128
values. The linker discarded one of the definitions, leaving one set of
callers using the wrong code.

This puts __convert_from_v in the __gnu_cxx_ieee128 inline namespace
when long double is __ieee128, so that there are two different
definitions of the function.

The std::money_put::__do_put overload for __ibm128 values needs a
different fix, because that is defined when long double is __ieee128 and
so would call the one in the inline namespace. That can be fixed by just
inlining the code directly into the function and using an asm alias to
call the right version of snprintf for the __ibm128 format. The code to
do that can be simpler than __convert_from_v because if we're defining
the ALT128_COMPAT symbols we know that we have a recent glibc and so we
can assume that uselocale and snprintf are supported.

libstdc++-v3/ChangeLog:

PR libstdc++/100912
* config/locale/gnu/c_locale.h (__convert_from_v): Use inline
namespace for IEEE128 long double mode.
* config/os/gnu-linux/ldbl-ieee128-extra.ver: Add new symbol
version and export __gnu_cxx_ieee128::__convert_from_v.
* include/bits/locale_facets_nonio.tcc (money_put::__do_put):
Make __ibm128 overload use snprintf directly
* testsuite/util/testsuite_abi.cc: Add new symbol version.
Remove stable IEEE128/LDBL versions.
---
 libstdc++-v3/config/locale/gnu/c_locale.h | 10 
 .../os/gnu-linux/ldbl-ieee128-extra.ver   |  7 +-
 .../include/bits/locale_facets_nonio.tcc  | 23 ---
 libstdc++-v3/testsuite/util/testsuite_abi.cc  |  8 +++
 4 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/libstdc++-v3/config/locale/gnu/c_locale.h 
b/libstdc++-v3/config/locale/gnu/c_locale.h
index 176e022b85c..526b62215e1 100644
--- a/libstdc++-v3/config/locale/gnu/c_locale.h
+++ b/libstdc++-v3/config/locale/gnu/c_locale.h
@@ -61,6 +61,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   typedef __locale_t   __c_locale;
 
+#if defined _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT \
+  && defined __LONG_DOUBLE_IEEE128__
+namespace __gnu_cxx_ieee128 {
+#endif
+
   // Convert numeric value of type double and long double to string and
   // return length of string.  If vsnprintf is available use it, otherwise
   // fall back to the unsafe vsprintf which, in general, can be dangerous
@@ -108,6 +113,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 return __ret;
   }
 
+#if defined _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT \
+  && defined __LONG_DOUBLE_IEEE128__
+} // namespace __gnu_cxx_ieee128
+#endif
+
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace
 
diff --git a/libstdc++-v3/config/os/gnu-linux/ldbl-ieee128-extra.ver 
b/libstdc++-v3/config/os/gnu-linux/ldbl-ieee128-extra.ver
index 9b421b6f1f9..830cb8c40f4 100644
--- a/libstdc++-v3/config/os/gnu-linux/ldbl-ieee128-extra.ver
+++ b/libstdc++-v3/config/os/gnu-linux/ldbl-ieee128-extra.ver
@@ -2,7 +2,8 @@
 
 GLIBCXX_IEEE128_3.4.29 {
 
-  *__gnu_cxx_ieee128*;
+  _Z*St17__gnu_cxx_ieee1287num_get*;
+  _Z*St17__gnu_cxx_ieee1287num_put*;
 
   _ZNSt14numeric_limitsIu9__ieee128E*;
   _ZNSirsERu9__ieee128;
@@ -45,6 +46,10 @@ GLIBCXX_IEEE128_3.4.29 {
 
 } GLIBCXX_3.4.29;
 
+GLIBCXX_IEEE128_3.4.30 {
+  _ZNSt17__gnu_cxx_ieee12816__convert_from_vERKP15__locale_structPciPKcz;
+} GLIBCXX_3.4.30;
+
 CXXABI_IEEE128_1.3.13 {
 
   _ZT[IS]u9__ieee128;
diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.tcc 
b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
index 64737823374..98442418f51 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.tcc
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.tcc
@@ -635,6 +635,9 @@ _GLIBCXX_BEGIN_NAMESPACE_LDBL_OR_CXX11
 
 #if defined _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT \
   && defined __LONG_DOUBLE_IEEE128__
+extern "C"
+__typeof__(__builtin_snprintf) __glibcxx_snprintfibm128 __asm__("snprintf");
+
   template
 _OutIter
 money_put<_CharT, _OutIter>::
@@ -643,30 +646,24 @@ _GLIBCXX_BEGIN_NAMESPACE_LDBL_OR_CXX11
 {
   const locale __loc = __io.getloc();
   const ctype<_CharT>& __ctype = use_facet >(__loc);
-#if _GLIBCXX_USE_C99_STDIO
   // First try a buffer perhaps big enough.
   int __cs_size = 64;
   char* __cs = static_cast(__builtin_alloca(__cs_size));
+  const __c_locale __old = __gnu_cxx::__uselocale(_S_get_c_locale());
+
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 328. Bad sprintf format modifier in money_put<>::do_put()
-  int __len = std::__convert_from_v(_S_get_c_locale(), __cs, __cs_size,
-   

Re: [PATCH] PR fortran/104211 - ICE in find_array_section, at fortran/expr.cc:1720

2022-02-14 Thread Thomas Koenig via Gcc-patches

Hi Harald,


when referencing a bad array section after an erroneous previous
declaration we might hit an assert.  The assert can be replaced
by a more gracious error recovery.  Reported by Gerhard.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?


OK.

Thanks for the patch!

Best regards

Thomas


[OG11][committed] Fortran metadirective fixes

2022-02-14 Thread Kwok Cheung Yeung

Hello

I have backported and committed the recent two fixes for Fortran 
metadirective handling onto the devel/omp/gcc-11 development branch:


38ed9d83b893df0bbd098c7b44dbbeb56ed7dd1c  openmp: Eliminate non-matching 
metadirective variants early in Fortran front-end 
(https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590349.html)


8cb0121af50eacb63098a79ff8c6deae05883c6f  openmp: More Fortran front-end 
fixes for metadirectives 
(https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590350.html)


Kwok


Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Vladimir Makarov via Gcc-patches



On 2022-02-14 11:00, Richard Sandiford wrote:

Hi Vlad,

Vladimir Makarov via Gcc-patches  writes:


Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.

I think there is no need for this code and it is misleading.  If
'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]'
will help for any existing target.  As machine-dependent code for any
target most probably (for ppc64 darwin it is exactly the case) checks
address only in memory, it can wrongly accept wrong address by reloading
it into reg and use it in memory. So these are my arguments for the
remove this code from process_address_1.

I'm probably making too much of this, but:

I think the code is potentially useful in that existing targets do forbid
forbid lo_sum addresses in certain contexts (due to limited offset range)
while still wanting lo_sum to be used to be load the address.  If we
handle the high/lo_sum split in generic code then we have more chance
of being able to optimise things.  So it feels like this is setting an
unfortunate precedent.

I still don't understand what went wrong before though (the PR trail
was a bit too long to process :-)).  Is there a case where
(lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.

Sometimes it is hard to make a line where an RA bug is a bug in 
machine-dependent code or in RA itself.


For this case I would say it is a bug in the both parts.

Low-sum is generated by LRA and it does not know that it should be 
wrapped by unspec for darwin. Generally speaking we could avoid the 
change in LRA but it would require to do non-trivial analysis in machine 
dependent code to find cases when 'reg=low_sum ... mem[reg]' is 
incorrect code for darwin (PIC) target (and may be some other PIC 
targets too). Therefore I believe the change in LRA is a good solution 
even if the change can potentially result in less optimized code for 
some cases.  Taking your concern into account we could probably improve 
the patch by introducing a hook (I never liked such solutions as we 
already have too many hooks directing RA) or better to make the LRA 
change working only for PIC target. Something like this (it probably 
needs better recognition of pic target):


--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3616,21 +3616,21 @@ process_address_1 (int nop, bool check_only_p,
  if (HAVE_lo_sum)
    {
  /* addr => lo_sum (new_base, addr), case (2) above.  */
  insn = emit_insn (gen_rtx_SET
    (new_reg,
 gen_rtx_HIGH (Pmode, copy_rtx (addr;
  code = recog_memoized (insn);
  if (code >= 0)
    {
  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
- if (!valid_address_p (op, , cn))
+ if (!valid_address_p (op, , cn) && !flag_pic)
    {
  /* Try to put lo_sum into register.  */
  insn = emit_insn (gen_rtx_SET
    (new_reg,
 gen_rtx_LO_SUM (Pmode, 
new_reg, addr)));

  code = recog_memoized (insn);
  if (code >= 0)
    {
  *ad.inner = new_reg;
  if (!valid_address_p (op, , cn))



Re: [PATCH] c++: return-type-req in constraint using only outer tparms [PR104527]

2022-02-14 Thread Patrick Palka via Gcc-patches
On Mon, 14 Feb 2022, Patrick Palka wrote:

> Here the template context for the atomic constraint has two levels of
> template arguments, but since it depends only on the innermost argument
> T we use a single-level argument vector during substitution into the
> constraint (built by get_mapped_args).  We eventually pass this vector
> to do_auto_deduction as part of checking the return-type-requirement
> inside the atom, but do_auto_deduction expects outer_targs to be a full
> set of arguments for sake of satisfaction.
> 
> do_auto_deduction has a workaround in place to compensate for callers
> that pass only the innermost arguments as outer_targs, but here we're
> passing the _outermost_ arguments.  Since the former situation should
> now (after r12-7101) only occur with adc_unify callers and the latter

Whoops, this should be r12-6919 (after which we pass outer_targs
appropriately during adc_variable_type deduction of non-function-scope
variables), not r12-7101, sorry about that.

> only with adc_requirement callers, this patch conditions the existing
> workaround according to the auto_deduction_context: if the context is
> adc_requirement, we add dummy innermost levels, otherwise we add dummy
> outermost levels as before and also assert that the context is adc_unify.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu and tested on cmcstl2
> and range-v3, does this look OK for trunk?
> 
>   PR c++/104527
> 
> gcc/cp/ChangeLog:
> 
>   * pt.cc (do_auto_deduction): When template argument levels are
>   missing from outer_targs, fill in the innermost rather than the
>   outermost levels with dummy args if the context is
>   adc_requirement, otherwise also assert that the context is
>   adc_unify.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp2a/concepts-return-req4.C: New test.
> ---
>  gcc/cp/pt.cc  | 28 +--
>  .../g++.dg/cpp2a/concepts-return-req4.C   | 24 
>  2 files changed, 44 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C
> 
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 1b18e2a7787..4ff2710b8ba 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -30215,20 +30215,32 @@ do_auto_deduction (tree type, tree init, tree 
> auto_node,
>  
>tree full_targs = add_to_template_args (outer_targs, targs);
>  
> -  /* HACK: Compensate for callers not always communicating all levels of
> -  outer template arguments by filling in the outermost missing levels
> -  with dummy levels before checking satisfaction.  We'll still crash
> -  if the constraint depends on a template argument belonging to one of
> -  these missing levels, but this hack otherwise allows us to handle a
> -  large subset of possible constraints (including all non-dependent
> -  constraints).  */
>if (int missing_levels = (TEMPLATE_TYPE_ORIG_LEVEL (auto_node)
>   - TMPL_ARGS_DEPTH (full_targs)))
>   {
> tree dummy_levels = make_tree_vec (missing_levels);
> for (int i = 0; i < missing_levels; ++i)
>   TREE_VEC_ELT (dummy_levels, i) = make_tree_vec (0);
> -   full_targs = add_to_template_args (dummy_levels, full_targs);
> +   if (context == adc_requirement)
> + /* We're checking a requires-expr's return-type-requirement that's
> +part of an atomic constraint that doesn't depend on any innermost
> +template arguments, so OUTER_TARGS (built by get_mapped_args) is
> +missing at least one innermost level.  Fill in the innermost
> +levels of OUTER_TARGS with dummy levels.  */
> + full_targs = add_to_template_args
> +   (add_to_template_args (outer_targs, dummy_levels), targs);
> +   else
> + {
> +   /* Otherwise, fill in the _outermost_ levels with dummy levels.
> +  This compensates for adc_unify callers that only pass the
> +  innermost level of template arguments as OUTER_TARGS.  We'll
> +  still crash if the constraint depends on a template argument
> +  belonging to one of these missing levels, but this hack
> +  otherwise allows us to handle a large subset of possible
> +  constraints (including all non-dependent constraints).  */
> +   gcc_checking_assert (context == adc_unify);
> +   full_targs = add_to_template_args (dummy_levels, full_targs);
> + }
>   }
>  
>if (!constraints_satisfied_p (auto_node, full_targs))
> diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C 
> b/gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C
> new file mode 100644
> index 000..471946bc8eb
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C
> @@ -0,0 +1,24 @@
> +// PR c++/104527
> +// { dg-do compile { target c++20 } }
> +
> +template
> +concept is_same = __is_same(T, 

Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Richard Sandiford via Gcc-patches
Iain Sandoe  writes:
> Hi Richard,
>
> (hopefully, my take won’t cloud the issue ….)
>
>> On 14 Feb 2022, at 16:00, Richard Sandiford via Gcc-patches 
>>  wrote:
>> 
>> Hi Vlad,
>> 
>> Vladimir Makarov via Gcc-patches  writes:
>>> On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:
 Iain Sandoe via Gcc-patches  writes:
> Two issues resulted in this PR, which manifests when we force a constant 
> into
> memory in LRA (in PIC code on Darwin).  The presence of such forced 
> constants
> is quite dependent on other RTL optimisations, and it is easy for the 
> issue to
> become latent for a specific case.
> 
> First, in the Darwin-specific rs6000 backend code, we were not being 
> careful
> enough in rejecting invalid symbolic addresses.  Specifically, when 
> generating
> PIC code, we require a SYMBOL_REF to be wrapped in an 
> UNSPEC_MACHOPIC_OFFSET.
> 
> Second, LRA was attempting to load a register using an invalid lo_sum 
> address.
> 
> The LRA changes are approved in the PR by Vladimir, and the RS6000 
> changes are
> Darwin-specific (although, of course, any observations are welcome).
> 
> Tested on several lo_sum targets and x86_64 all languages except as noted:
> powerpc64-linux (m32/m64) -D
> powerpc64le-linux  -D
> powerpc64-aix -Ada -Go -D
> aarch64-linux -Ada -D
> x86_64-linux all langs -D
> powerpc-darwin9 (master and 11.2) -D -Go.
> 
> pushed to master, thanks,
> Iain
> 
> Signed-off-by: Iain Sandoe 
> Co-authored-by: Vladimir Makarov 
> 
>   PR target/104117
> 
> gcc/ChangeLog:
> 
>   * config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
>   Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
>   emitting PIC code.
>   (legitimate_lo_sum_address_p): Likewise.
>   * lra-constraints.cc (process_address_1): Do not attempt to emit a reg
>   load from an invalid lo_sum address.
> ---
>  gcc/config/rs6000/rs6000.cc | 38 +++--
>  gcc/lra-constraints.cc  | 17 ++---
>  2 files changed, 38 insertions(+), 17 deletions(-)
> 
> […]
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index fdff9e0720a..c700c3f4578 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
> *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
> if (!valid_address_p (op, , cn))
>   {
> -   /* Try to put lo_sum into register.  */
> -   insn = emit_insn (gen_rtx_SET
> - (new_reg,
> -  gen_rtx_LO_SUM (Pmode, new_reg, 
> addr)));
> -   code = recog_memoized (insn);
> -   if (code >= 0)
> - {
> -   *ad.inner = new_reg;
> -   if (!valid_address_p (op, , cn))
> - {
> -   *ad.inner = addr;
> -   code = -1;
> - }
> - }
> -
> +   *ad.inner = addr; /* Punt.  */
> +   code = -1;
>   }
>   }
> if (code < 0)
 Could you go into more details about this?  Why is it OK to continue
 to try:
 
   (lo_sum new_reg addr)
 
 directly as an address (the context at the top of the hunk), but not try
 moving the lo_sum into a register?  They should be semantically equivalent,
 so it seems that if one is wrong, the other would be too.
 
>>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>>> 
>>> I think there is no need for this code and it is misleading.  If 
>>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
>>> will help for any existing target.  As machine-dependent code for any 
>>> target most probably (for ppc64 darwin it is exactly the case) checks 
>>> address only in memory, it can wrongly accept wrong address by reloading 
>>> it into reg and use it in memory. So these are my arguments for the 
>>> remove this code from process_address_1.
>> 
>> I'm probably making too much of this, but:
>> 
>> I think the code is potentially useful in that existing targets do forbid
>> forbid lo_sum addresses in certain contexts (due to limited offset range)
>> while still wanting lo_sum to be used to be load the address.  If we
>> handle the high/lo_sum split in generic code then we have more chance
>> of being able to optimise things.  So it feels like this is setting an
>> unfortunate precedent.
>> 
>> I still don't understand what went wrong before though (the PR trail
>> was a bit too long to process :-)).  Is there a case where
>> 

[PATCH] c++: return-type-req in constraint using only outer tparms [PR104527]

2022-02-14 Thread Patrick Palka via Gcc-patches
Here the template context for the atomic constraint has two levels of
template arguments, but since it depends only on the innermost argument
T we use a single-level argument vector during substitution into the
constraint (built by get_mapped_args).  We eventually pass this vector
to do_auto_deduction as part of checking the return-type-requirement
inside the atom, but do_auto_deduction expects outer_targs to be a full
set of arguments for sake of satisfaction.

do_auto_deduction has a workaround in place to compensate for callers
that pass only the innermost arguments as outer_targs, but here we're
passing the _outermost_ arguments.  Since the former situation should
now (after r12-7101) only occur with adc_unify callers and the latter
only with adc_requirement callers, this patch conditions the existing
workaround according to the auto_deduction_context: if the context is
adc_requirement, we add dummy innermost levels, otherwise we add dummy
outermost levels as before and also assert that the context is adc_unify.

Bootstrapped and regtested on x86_64-pc-linux-gnu and tested on cmcstl2
and range-v3, does this look OK for trunk?

PR c++/104527

gcc/cp/ChangeLog:

* pt.cc (do_auto_deduction): When template argument levels are
missing from outer_targs, fill in the innermost rather than the
outermost levels with dummy args if the context is
adc_requirement, otherwise also assert that the context is
adc_unify.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-return-req4.C: New test.
---
 gcc/cp/pt.cc  | 28 +--
 .../g++.dg/cpp2a/concepts-return-req4.C   | 24 
 2 files changed, 44 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 1b18e2a7787..4ff2710b8ba 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -30215,20 +30215,32 @@ do_auto_deduction (tree type, tree init, tree 
auto_node,
 
   tree full_targs = add_to_template_args (outer_targs, targs);
 
-  /* HACK: Compensate for callers not always communicating all levels of
-outer template arguments by filling in the outermost missing levels
-with dummy levels before checking satisfaction.  We'll still crash
-if the constraint depends on a template argument belonging to one of
-these missing levels, but this hack otherwise allows us to handle a
-large subset of possible constraints (including all non-dependent
-constraints).  */
   if (int missing_levels = (TEMPLATE_TYPE_ORIG_LEVEL (auto_node)
- TMPL_ARGS_DEPTH (full_targs)))
{
  tree dummy_levels = make_tree_vec (missing_levels);
  for (int i = 0; i < missing_levels; ++i)
TREE_VEC_ELT (dummy_levels, i) = make_tree_vec (0);
- full_targs = add_to_template_args (dummy_levels, full_targs);
+ if (context == adc_requirement)
+   /* We're checking a requires-expr's return-type-requirement that's
+  part of an atomic constraint that doesn't depend on any innermost
+  template arguments, so OUTER_TARGS (built by get_mapped_args) is
+  missing at least one innermost level.  Fill in the innermost
+  levels of OUTER_TARGS with dummy levels.  */
+   full_targs = add_to_template_args
+ (add_to_template_args (outer_targs, dummy_levels), targs);
+ else
+   {
+ /* Otherwise, fill in the _outermost_ levels with dummy levels.
+This compensates for adc_unify callers that only pass the
+innermost level of template arguments as OUTER_TARGS.  We'll
+still crash if the constraint depends on a template argument
+belonging to one of these missing levels, but this hack
+otherwise allows us to handle a large subset of possible
+constraints (including all non-dependent constraints).  */
+ gcc_checking_assert (context == adc_unify);
+ full_targs = add_to_template_args (dummy_levels, full_targs);
+   }
}
 
   if (!constraints_satisfied_p (auto_node, full_targs))
diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C
new file mode 100644
index 000..471946bc8eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-return-req4.C
@@ -0,0 +1,24 @@
+// PR c++/104527
+// { dg-do compile { target c++20 } }
+
+template
+concept is_same = __is_same(T, U);
+
+template
+struct A {
+  template
+requires requires { { 0 } -> is_same; }
+  struct B {};
+
+  template
+requires requires { { 1 } -> is_same; }
+  static void f();
+};
+
+A::B<> a1;
+A::B<> a2; // { dg-error "constraint" }
+
+int main() {
+  A::f();
+  A::f(); // { dg-error "no match" }
+}
-- 
2.35.1.102.g2b9c120970



Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Iain Sandoe
Hi Richard,

(hopefully, my take won’t cloud the issue ….)

> On 14 Feb 2022, at 16:00, Richard Sandiford via Gcc-patches 
>  wrote:
> 
> Hi Vlad,
> 
> Vladimir Makarov via Gcc-patches  writes:
>> On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:
>>> Iain Sandoe via Gcc-patches  writes:
 Two issues resulted in this PR, which manifests when we force a constant 
 into
 memory in LRA (in PIC code on Darwin).  The presence of such forced 
 constants
 is quite dependent on other RTL optimisations, and it is easy for the 
 issue to
 become latent for a specific case.
 
 First, in the Darwin-specific rs6000 backend code, we were not being 
 careful
 enough in rejecting invalid symbolic addresses.  Specifically, when 
 generating
 PIC code, we require a SYMBOL_REF to be wrapped in an 
 UNSPEC_MACHOPIC_OFFSET.
 
 Second, LRA was attempting to load a register using an invalid lo_sum 
 address.
 
 The LRA changes are approved in the PR by Vladimir, and the RS6000 changes 
 are
 Darwin-specific (although, of course, any observations are welcome).
 
 Tested on several lo_sum targets and x86_64 all languages except as noted:
 powerpc64-linux (m32/m64) -D
 powerpc64le-linux  -D
 powerpc64-aix -Ada -Go -D
 aarch64-linux -Ada -D
 x86_64-linux all langs -D
 powerpc-darwin9 (master and 11.2) -D -Go.
 
 pushed to master, thanks,
 Iain
 
 Signed-off-by: Iain Sandoe 
 Co-authored-by: Vladimir Makarov 
 
PR target/104117
 
 gcc/ChangeLog:
 
* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
emitting PIC code.
(legitimate_lo_sum_address_p): Likewise.
* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
load from an invalid lo_sum address.
 ---
  gcc/config/rs6000/rs6000.cc | 38 +++--
  gcc/lra-constraints.cc  | 17 ++---
  2 files changed, 38 insertions(+), 17 deletions(-)
 
 […]
 diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
 index fdff9e0720a..c700c3f4578 100644
 --- a/gcc/lra-constraints.cc
 +++ b/gcc/lra-constraints.cc
 @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
  if (!valid_address_p (op, , cn))
{
 -/* Try to put lo_sum into register.  */
 -insn = emit_insn (gen_rtx_SET
 -  (new_reg,
 -   gen_rtx_LO_SUM (Pmode, new_reg, 
 addr)));
 -code = recog_memoized (insn);
 -if (code >= 0)
 -  {
 -*ad.inner = new_reg;
 -if (!valid_address_p (op, , cn))
 -  {
 -*ad.inner = addr;
 -code = -1;
 -  }
 -  }
 -
 +*ad.inner = addr; /* Punt.  */
 +code = -1;
}
}
  if (code < 0)
>>> Could you go into more details about this?  Why is it OK to continue
>>> to try:
>>> 
>>>   (lo_sum new_reg addr)
>>> 
>>> directly as an address (the context at the top of the hunk), but not try
>>> moving the lo_sum into a register?  They should be semantically equivalent,
>>> so it seems that if one is wrong, the other would be too.
>>> 
>> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>> 
>> I think there is no need for this code and it is misleading.  If 
>> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
>> will help for any existing target.  As machine-dependent code for any 
>> target most probably (for ppc64 darwin it is exactly the case) checks 
>> address only in memory, it can wrongly accept wrong address by reloading 
>> it into reg and use it in memory. So these are my arguments for the 
>> remove this code from process_address_1.
> 
> I'm probably making too much of this, but:
> 
> I think the code is potentially useful in that existing targets do forbid
> forbid lo_sum addresses in certain contexts (due to limited offset range)
> while still wanting lo_sum to be used to be load the address.  If we
> handle the high/lo_sum split in generic code then we have more chance
> of being able to optimise things.  So it feels like this is setting an
> unfortunate precedent.
> 
> I still don't understand what went wrong before though (the PR trail
> was a bit too long to process :-)).  Is there a case where
> (lo_sum (high X) X) != X?  If so, that seems like a target bug to me.

If X is an invalid address (in this case for 

Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Richard Sandiford via Gcc-patches
Hi Vlad,

Vladimir Makarov via Gcc-patches  writes:
> On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:
>> Iain Sandoe via Gcc-patches  writes:
>>> Two issues resulted in this PR, which manifests when we force a constant 
>>> into
>>> memory in LRA (in PIC code on Darwin).  The presence of such forced 
>>> constants
>>> is quite dependent on other RTL optimisations, and it is easy for the issue 
>>> to
>>> become latent for a specific case.
>>>
>>> First, in the Darwin-specific rs6000 backend code, we were not being careful
>>> enough in rejecting invalid symbolic addresses.  Specifically, when 
>>> generating
>>> PIC code, we require a SYMBOL_REF to be wrapped in an 
>>> UNSPEC_MACHOPIC_OFFSET.
>>>
>>> Second, LRA was attempting to load a register using an invalid lo_sum 
>>> address.
>>>
>>> The LRA changes are approved in the PR by Vladimir, and the RS6000 changes 
>>> are
>>> Darwin-specific (although, of course, any observations are welcome).
>>>
>>> Tested on several lo_sum targets and x86_64 all languages except as noted:
>>> powerpc64-linux (m32/m64) -D
>>> powerpc64le-linux  -D
>>> powerpc64-aix -Ada -Go -D
>>> aarch64-linux -Ada -D
>>> x86_64-linux all langs -D
>>> powerpc-darwin9 (master and 11.2) -D -Go.
>>>
>>> pushed to master, thanks,
>>> Iain
>>>
>>> Signed-off-by: Iain Sandoe 
>>> Co-authored-by: Vladimir Makarov 
>>>
>>> PR target/104117
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
>>> Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
>>> emitting PIC code.
>>> (legitimate_lo_sum_address_p): Likewise.
>>> * lra-constraints.cc (process_address_1): Do not attempt to emit a reg
>>> load from an invalid lo_sum address.
>>> ---
>>>   gcc/config/rs6000/rs6000.cc | 38 +++--
>>>   gcc/lra-constraints.cc  | 17 ++---
>>>   2 files changed, 38 insertions(+), 17 deletions(-)
>>>
>>> […]
>>> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
>>> index fdff9e0720a..c700c3f4578 100644
>>> --- a/gcc/lra-constraints.cc
>>> +++ b/gcc/lra-constraints.cc
>>> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
>>>   *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
>>>   if (!valid_address_p (op, , cn))
>>> {
>>> - /* Try to put lo_sum into register.  */
>>> - insn = emit_insn (gen_rtx_SET
>>> -   (new_reg,
>>> -gen_rtx_LO_SUM (Pmode, new_reg, 
>>> addr)));
>>> - code = recog_memoized (insn);
>>> - if (code >= 0)
>>> -   {
>>> - *ad.inner = new_reg;
>>> - if (!valid_address_p (op, , cn))
>>> -   {
>>> - *ad.inner = addr;
>>> - code = -1;
>>> -   }
>>> -   }
>>> -
>>> + *ad.inner = addr; /* Punt.  */
>>> + code = -1;
>>> }
>>> }
>>>   if (code < 0)
>> Could you go into more details about this?  Why is it OK to continue
>> to try:
>>
>>(lo_sum new_reg addr)
>>
>> directly as an address (the context at the top of the hunk), but not try
>> moving the lo_sum into a register?  They should be semantically equivalent,
>> so it seems that if one is wrong, the other would be too.
>>
> Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.
>
> I think there is no need for this code and it is misleading.  If 
> 'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
> will help for any existing target.  As machine-dependent code for any 
> target most probably (for ppc64 darwin it is exactly the case) checks 
> address only in memory, it can wrongly accept wrong address by reloading 
> it into reg and use it in memory. So these are my arguments for the 
> remove this code from process_address_1.

I'm probably making too much of this, but:

I think the code is potentially useful in that existing targets do forbid
forbid lo_sum addresses in certain contexts (due to limited offset range)
while still wanting lo_sum to be used to be load the address.  If we
handle the high/lo_sum split in generic code then we have more chance
of being able to optimise things.  So it feels like this is setting an
unfortunate precedent.

I still don't understand what went wrong before though (the PR trail
was a bit too long to process :-)).  Is there a case where
(lo_sum (high X) X) != X?  If so, that seems like a target bug to me.
Or does the target accept (set R1 (lo_sum R2 X)) for an X that cannot
be split into a HIGH/LO_SUM pair?  I'd argue that's a target bug too.

Thanks,
Richard


Re: [PATCH] c: Add diagnostic when operator= is used as truth cond [PR25689]

2022-02-14 Thread Jason Merrill via Gcc-patches

On 2/12/22 01:59, Zhao Wei Liew wrote:

On Fri, 11 Feb 2022 at 20:47, Jason Merrill  wrote:


On the other hand, for empty classes, it seems that a COMPOUND_EXPR
is built in build_over_call under the is_really_empty_class guard (line 9791).
I don't understand the tree structure that I should identify though.
Could you give me any further explanations on that?


True, that's not very recognizable.  I suppose we could use a
TREE_LANG_FLAG to mark that COMPOUND_EXPR, or we could leave empty
classes alone; neither assignment nor comparison of empty classes should
happen much in practice.


I agree. I'll leave them alone.



Got it. May I know why it's better to use *_nofold here?


To avoid trying to do constexpr evaluation of the function operand when
it's non-constant; in the interesting cases it won't be needed.

However, have you tested virtual operator=?



Yup, everything seems to work as expected for structs using virtual operator=.
I've updated the test suite to reflect the tests.

One thing to note: I've commented out 2 test statements that shouldn't
work. One of them
is caused by trivial assignments in empty classes being COMPOUND_EXPRs as we
discussed above. The other is an existing issue caused by trivial
assignments in non-empty
classes being MODIFY_EXPRs.


On an unrelated note, I adjusted the if condition to use INDIRECT_REF_P (cond)
instead of TREE_OPERAND_LENGTH (cond) >= 1.
I hope that's better for semantics.


Yes; REFERENCE_REF_P might be even better.



Alright, I've changed it to REFERENCE_REF_P and it seems to work fine as well.



-Everything below is the actual patch v3-



When compiling the following code with g++ -Wparentheses, GCC does not
warn on the if statement. For example, there is no warning for this code:

struct A {
A& operator=(int);
operator bool();
};

void f(A a) {
if (a = 0); // no warning
}

This is because a = 0 is a call to operator=, which GCC does not handle.

This patch fixes this issue by handling calls to operator= when deciding
to warn.

Bootstrapped and tested on x86_64-pc-linux-gnu.

v2: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590236.html
Changes since v2:
1. Add more test cases in Wparentheses-31.C.
2. Refactor added logic to a function (is_assignment_overload_ref_p).
3. Use REFERENCE_REF_P instead of INDIRECT_REF_P.

v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590158.html
Changes since v1:
1. Use CALL_EXPR_OPERATOR_SYNTAX to avoid warnings for explicit
operator=() calls.
2. Use INDIRECT_REF_P to filter implicit operator=() calls.
3. Use cp_get_callee_fndecl_nofold.
4. Add spaces before (.

PR c/25689

gcc/cp/ChangeLog:

* semantics.cc (maybe_convert_cond): Handle the implicit
  operator=() case for -Wparentheses.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wparentheses-31.C: New test.
---
  gcc/cp/semantics.cc | 21 +++-
  gcc/testsuite/g++.dg/warn/Wparentheses-31.C | 55 +
  2 files changed, 75 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wparentheses-31.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 0cb17a6a8ab..30ffb23a032 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -815,6 +815,25 @@ finish_goto_stmt (tree destination)
return add_stmt (build_stmt (input_location, GOTO_EXPR, destination));
  }

+/* Returns true if EXPR is a reference to an implicit
+   call to operator=(). */
+static bool
+is_assignment_overload_ref_p (tree expr)
+{
+  if (expr == NULL_TREE || !REFERENCE_REF_P (expr))
+return false;


This will only warn about op= that returns a reference, which is not 
required.



+  tree fn = TREE_OPERAND (expr, 0);
+  if (TREE_CODE (fn) != CALL_EXPR || !CALL_EXPR_OPERATOR_SYNTAX (fn))
+return false;
+
+  tree fndecl = cp_get_callee_fndecl_nofold (fn);
+  return fndecl != NULL_TREE
+&& DECL_OVERLOADED_OPERATOR_P (fndecl)
+&& DECL_OVERLOADED_OPERATOR_IS (fndecl, NOP_EXPR)
+&& DECL_ASSIGNMENT_OPERATOR_P (fndecl);


This could be

  && DECL_ASSIGNMENT_OPERATOR_P (fndecl)
  && DECL_OVERLOADED_OPERATOR_IS (fndecl, NOP_EXPR);

without the separate DECL_OVERLOADED_OPERATOR_P test.


+}
+
  /* COND is the condition-expression for an if, while, etc.,
 statement.  Convert it to a boolean value, if appropriate.
 In addition, verify sequence points if -Wsequence-point is enabled.  */
@@ -836,7 +855,7 @@ maybe_convert_cond (tree cond)
/* Do the conversion.  */
cond = convert_from_reference (cond);

-  if (TREE_CODE (cond) == MODIFY_EXPR
+  if ((TREE_CODE (cond) == MODIFY_EXPR || is_assignment_overload_ref_p (cond))
&& warn_parentheses
&& !warning_suppressed_p (cond, OPT_Wparentheses)
&& warning_at (cp_expr_loc_or_input_loc (cond),
diff --git a/gcc/testsuite/g++.dg/warn/Wparentheses-31.C
b/gcc/testsuite/g++.dg/warn/Wparentheses-31.C
new file mode 100644
index 000..7a5789fb7a1
--- 

[OG11][committed] amdgcn: Allow vector reductions on constants

2022-02-14 Thread Andrew Stubbs

On 14/02/2022 14:13, Andrew Stubbs wrote:
I've committed this fix for an ICE compiling sollve_vv testcase 
test_target_teams_distribute_defaultmap.c.


Somehow the optimizers result in a vector reduction on a vector of 
duplicated constants. This was a case the backend didn't handle, so we 
ended up with an unrecognised instruction ICE.


It might be better if constant reductions were evaluated at compile 
time, but this patch prevents the ICE right now.


Andrew

P.S. I'll backport this to OG11 shortly.


Backport to devel/omp/gcc-11 branch done.

Andrew


Re: [PATCH, OpenACC] Add support for gang local storage allocation in shared memory

2022-02-14 Thread Thomas Schwinge
Hi Julian!

Two more questions here, in context of 
"[12 Regression] ICE in expand_gimple_stmt_1, at cfgexpand.c:3932 since
r12-980-g29a2f51806c":

On 2019-06-03T17:02:45+0100, Julian Brown  wrote:
> This is a new version of the patch, rebased

The code as we've now got it in master branch has changed some more, but
I think the behavior I'm seeing may have been introduced here:

> and with a couple of
> additional bugfixes, as follows:
>
> Firstly, in mark_oacc_gangprivate, each decl is looked up (using
> maybe_lookup_decl) to apply the "oacc gangprivate" attribute to the
> innermost-nested copy of the decl.

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -137,6 +137,12 @@ struct omp_context

> +  /* Addressable variable decls in this context.  */
> +  vec *oacc_addressable_var_decls;
>  };

> +/* Record vars listed in private clauses in CLAUSES in CTX.  This information
> +   is used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_private_var_clauses (omp_context *ctx, tree clauses)
> +{
> +  tree c;
> +
> +  if (!ctx)
> +return;
> +
> +  for (c = clauses; c; c = OMP_CLAUSE_CHAIN (c))
> +if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_PRIVATE)
> +  {
> + tree decl = OMP_CLAUSE_DECL (c);
> + if (VAR_P (decl) && TREE_ADDRESSABLE (decl))
> +   ctx->oacc_addressable_var_decls->safe_push (decl);
> +  }
> +}

So, here we analyze 'OMP_CLAUSE_DECL (c)' (as is, without translation
through 'lookup_decl (decl, ctx)')...

> +/* Record addressable vars declared in BINDVARS in CTX.  This information is
> +   used to mark up variables that should be made private per-gang.  */
> +
> +static void
> +oacc_record_vars_in_bind (omp_context *ctx, tree bindvars)
> +{
> +  if (!ctx)
> +return;
> +
> +  for (tree v = bindvars; v; v = DECL_CHAIN (v))
> +if (VAR_P (v) && TREE_ADDRESSABLE (v))
> +  ctx->oacc_addressable_var_decls->safe_push (v);
> +}

..., and similarly here analyze 'v' (without 'lookup_decl (v, ctx)')...

> +/* Mark addressable variables which are declared implicitly or explicitly as
> +   gang private with a special attribute.  These may need to have their
> +   declarations altered later on in compilation (e.g. in
> +   execute_oacc_device_lower or the backend, depending on how the OpenACC
> +   execution model is implemented on a given target) to ensure that sharing
> +   semantics are correct.  */
> +
> +static void
> +mark_oacc_gangprivate (vec *decls, omp_context *ctx)
> +{
> +  int i;
> +  tree decl;
> +
> +  FOR_EACH_VEC_ELT (*decls, i, decl)
> +{
> +  for (omp_context *thisctx = ctx; thisctx; thisctx = thisctx->outer)
> + {
> +   tree inner_decl = maybe_lookup_decl (decl, thisctx);
> +   if (inner_decl)
> + {
> +   decl = inner_decl;
> +   break;
> + }
> + }
> +  if (!lookup_attribute ("oacc gangprivate", DECL_ATTRIBUTES (decl)))
> + {
> +   if (dump_file && (dump_flags & TDF_DETAILS))
> + {
> +   fprintf (dump_file,
> +"Setting 'oacc gangprivate' attribute for decl:");
> +   print_generic_decl (dump_file, decl, TDF_SLIM);
> +   fputc ('\n', dump_file);
> + }
> +   DECL_ATTRIBUTES (decl)
> + = tree_cons (get_identifier ("oacc gangprivate"),
> +  NULL, DECL_ATTRIBUTES (decl));
> + }
> +}
> +}

..., but here we action on the 'maybe_lookup_decl'-translated
'inner_decl', if applicable.  In certain cases that one may be different
from the original 'decl'.  (In particular (only?), when the OMP lowering
has made 'decl' "late 'TREE_ADDRESSABLE'".)  This assymetry I understand
to give rise to  "[12 Regression] ICE in
expand_gimple_stmt_1, at cfgexpand.c:3932 since r12-980-g29a2f51806c".

It makes sense to me that we do the OpenACC privatization on the
'lookup_decl' -- but shouldn't we then do that in the analysis phase,
too?  (This appears to work fine for OpenACC 'private' clauses (..., and
avoids marking a few as addressable/gang-private), and for those in
'gimple_bind_vars' it doesn't seem to make a difference (for the current
test cases and/or compiler transformations).)

And, second question: what case did you run into or foresee, that you
here need the 'thisctx' loop and 'maybe_lookup_decl', instead of a plain
'lookup_decl (decl, ctx)'?  Per my testing that's sufficient.

Unless you think this needs more consideration, I suggest to do these two
changes.  (I have a WIP patch in testing.)


Grüße
 Thomas
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [Backport gcc-11, Patch, Fortran] PR100337 Should be able to pass non-present optional arguments to CO_BROADCAST

2022-02-14 Thread Andre Vehreschild via Gcc-patches
Hi everyone,

sorry for missing out on the gcc-11 backport, but better late than never. 

Committed backport as ae57aae60d1.

Regards,
Andre

On Wed, 23 Jun 2021 11:21:45 +0200
Tobias Burnus  wrote:

> On 23.06.21 10:23, Andre Vehreschild wrote:
> 
> > Will wait two weeks for any errors introduced by this patch before
> > backporting to gcc-11, ok?  
> 
> Fine with me.
> 
> Thanks again for the patch.
> 
> Tobias
> 
> -
> Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank
> Thürauf


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH] vect+aarch64: Fix ldp_stp_* regressions

2022-02-14 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Mon, 14 Feb 2022, Richard Sandiford wrote:
>
>> ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
>> vectorisation was enabled at -O2.  In all three cases SLP is
>> generating vector code when scalar code would be better.
>> 
>> The problem is that the target costs do not model whether STP could
>> be used for the scalar or vector code, so the normal latency-based
>> costs for store-heavy code can be way off.  It would be good to fix
>> that ?properly? at some point, but it isn't easy; see the existing
>> discussion in aarch64_sve_adjust_stmt_cost for more details.
>> 
>> This patch therefore adds an on-the-side check for whether the
>> code is doing nothing more than set-up+stores.  It then applies
>> STP-based costs to those cases only, in addition to the normal
>> latency-based costs.  (That is, the vector code has to win on
>> both counts rather than on one count individually.)
>> 
>> However, at the moment, SLP costs one vector set-up instruction
>> for every vector in an SLP node, even if the contents are the
>> same as a previous vector in the same node.  Fixing the STP costs
>> without fixing that would regress other cases, tested in the patch.
>> 
>> The patch therefore makes the SLP costing code check for duplicates
>> within a node.  Ideally we'd check for duplicates more globally,
>> but that would require a more global approach to costs: the cost
>> of an initialisation should be amoritised across all trees that
>> use the initialisation, rather than fully counted against one
>> arbitrarily-chosen subtree.
>> 
>> Back on aarch64: an earlier version of the patch tried to apply
>> the new heuristic to constant stores.  However, that didn't work
>> too well in practice; see the comments for details.  The patch
>> therefore just tests the status quo for constant cases, leaving out
>> a match if the current choice is dubious.
>> 
>> ldp_stp_5.c was affected by the same thing.  The test would be
>> worth vectorising if we generated better vector code, but:
>> 
>> (1) We do a bad job of moving the { -1, 1 } constant, given that
>> we have { -1, -1 } and { 1, 1 } to hand.
>> 
>> (2) The vector code has 6 pairable stores to misaligned offsets.
>> We have peephole patterns to handle such misalignment for
>> 4 pairable stores, but not 6.
>> 
>> So the SLP decision isn't wrong as such.  It's just being let
>> down by later codegen.
>> 
>> The patch therefore adds -mstrict-align to preserve the original
>> intention of the test while adding ldp_stp_19.c to check for the
>> preferred vector code (XFAILed for now).
>> 
>> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
>> OK for the vectoriser bits?
>
> I'll look at the patch tomorrow but it reminded me of an old
> patch I'm still sitting on which reworked the SLP discovery
> cache to be based on defs rather than stmts which allows us to
> cache and re-use SLP nodes for invariants during SLP discovery.

Ah, yeah, that should help with the “more global” bit.  I think
in the end we need both though: reduce duplicate nodes, and remove
duplicate vectors (or at least duplicate vector costs) within a node.

Thanks,
Richard

> From 8df9c7003611e690bd08fd5cff0b624527c99bf4 Mon Sep 17 00:00:00 2001
> From: Richard Biener 
> Date: Fri, 20 Mar 2020 11:42:47 +0100
> Subject: [PATCH] rework SLP caching based on ops and CSE constants
> To: gcc-patches@gcc.gnu.org
>
> This reworks SLP caching so that it keys on the defs and not
> their defining stmts so we can use it to CSE SLP nodes for
> constants and invariants.
>
> 2020-03-19  Richard Biener  
>
>   * tree-vect-slp.c (): ...
> ---
>  gcc/tree-vect-slp.c | 222 +---
>  1 file changed, 149 insertions(+), 73 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index 1ffbf6f6af9..e545e34e353 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -129,36 +129,38 @@ vect_free_slp_instance (slp_instance instance, bool 
> final_p)
>free (instance);
>  }
>  
> -
> -/* Create an SLP node for SCALAR_STMTS.  */
> -
> -static slp_tree
> -vect_create_new_slp_node (vec scalar_stmts, unsigned nops)
> -{
> -  slp_tree node = new _slp_tree;
> -  SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
> -  SLP_TREE_CHILDREN (node).create (nops);
> -  SLP_TREE_DEF_TYPE (node) = vect_internal_def;
> -  SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
> -  SLP_TREE_LANES (node) = scalar_stmts.length ();
> -
> -  unsigned i;
> -  stmt_vec_info stmt_info;
> -  FOR_EACH_VEC_ELT (scalar_stmts, i, stmt_info)
> -STMT_VINFO_NUM_SLP_USES (stmt_info)++;
> -
> -  return node;
> -}
> -
>  /* Create an SLP node for OPS.  */
>  
>  static slp_tree
> -vect_create_new_slp_node (vec ops)
> +vect_create_new_slp_node (vec_info *vinfo,
> +   vec ops, unsigned nops = 0,
> +   vect_def_type def_type = vect_external_def)
>  {
>slp_tree node = new _slp_tree;
>

Re: [PATCH] c++: Don't reject GOTO_EXPRs to cdtor_label in potential_constant_expression_1 [PR104513]

2022-02-14 Thread Jason Merrill via Gcc-patches

On 2/14/22 04:54, Jakub Jelinek wrote:


return in ctors on targetm.cxx.cdtor_returns_this () target like arm
is emitted as GOTO_EXPR cdtor_label where at cdtor_label it emits
RETURN_EXPR with the this.
Similarly, in all dtors regardless of targetm.cxx.cdtor_returns_this ()
a return is emitted similarly.

potential_constant_expression_1 was rejecting these gotos and so we
incorrectly rejected these testcases, but actual cxx_eval* is apparently
handling these just fine.  I was a little bit worried that for the
destruction of bases we wouldn't evaluate something we should, but as the
testcase shows, that is evaluated through try ... finally and there is
nothing after the cdtor_label.  For arm there is RETURN_EXPR this; but we
don't really care about the return value from ctors and dtors during the
constexpr evaluation.

Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux
and armv7hl-linux-gnueabi, ok for trunk?


OK.


I must say I don't see much the point of cdtor_labels at all, I'd think
that with try ... finally around it for non-arm we could just RETURN_EXPR
instead of the GOTO_EXPR and the try/finally gimplification would DTRT,
and we could just add the right return value for the arm case.


Good point.  I suspect that it's a relic from before try/finally, seems 
like it should be simple to remove at this point.



2022-02-14  Jakub Jelinek  

PR c++/104513
* constexpr.cc (potential_constant_expression_1) :
Don't punt if returns (target).

* g++.dg/cpp1y/constexpr-104513.C: New test.
* g++.dg/cpp2a/constexpr-dtor12.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-02-11 13:52:32.697425776 +0100
+++ gcc/cp/constexpr.cc 2022-02-12 13:51:21.000274390 +0100
@@ -9364,7 +9364,7 @@ potential_constant_expression_1 (tree t,
{
tree *target = _OPERAND (t, 0);
-   /* Gotos representing break and continue are OK.  */
-   if (breaks (target) || continues (target))
+   /* Gotos representing break, continue and cdtor return are OK.  */
+   if (breaks (target) || continues (target) || returns (target))
  {
*jump_target = *target;
return true;
--- gcc/testsuite/g++.dg/cpp1y/constexpr-104513.C.jj2022-02-12 
19:05:14.374101202 +0100
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-104513.C   2022-02-12 
19:06:09.745332658 +0100
@@ -0,0 +1,10 @@
+// PR c++/104513
+// { dg-do compile { target c++14 } }
+
+struct A {
+  int a1;
+  short a2, a3;
+  long a4;
+  constexpr A() : a1(42), a2(42), a3(42), a4(42) { return; }
+};
+constexpr A a;
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor12.C.jj2022-02-12 
19:04:41.610555957 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor12.C   2022-02-12 
19:04:18.473877087 +0100
@@ -0,0 +1,13 @@
+// PR c++/104513
+// { dg-do compile { target c++20 } }
+
+struct S {
+  constexpr S () : s (nullptr) {}
+  constexpr ~S () { delete s; }
+  int *s;
+};
+struct T : S {
+  constexpr T () : S () {}
+  constexpr ~T () { s = new int (42); return; }
+};
+constexpr T t;

Jakub





Re: [PATCH] vect+aarch64: Fix ldp_stp_* regressions

2022-02-14 Thread Richard Biener via Gcc-patches
On Mon, 14 Feb 2022, Richard Sandiford wrote:

> ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
> vectorisation was enabled at -O2.  In all three cases SLP is
> generating vector code when scalar code would be better.
> 
> The problem is that the target costs do not model whether STP could
> be used for the scalar or vector code, so the normal latency-based
> costs for store-heavy code can be way off.  It would be good to fix
> that ?properly? at some point, but it isn't easy; see the existing
> discussion in aarch64_sve_adjust_stmt_cost for more details.
> 
> This patch therefore adds an on-the-side check for whether the
> code is doing nothing more than set-up+stores.  It then applies
> STP-based costs to those cases only, in addition to the normal
> latency-based costs.  (That is, the vector code has to win on
> both counts rather than on one count individually.)
> 
> However, at the moment, SLP costs one vector set-up instruction
> for every vector in an SLP node, even if the contents are the
> same as a previous vector in the same node.  Fixing the STP costs
> without fixing that would regress other cases, tested in the patch.
> 
> The patch therefore makes the SLP costing code check for duplicates
> within a node.  Ideally we'd check for duplicates more globally,
> but that would require a more global approach to costs: the cost
> of an initialisation should be amoritised across all trees that
> use the initialisation, rather than fully counted against one
> arbitrarily-chosen subtree.
> 
> Back on aarch64: an earlier version of the patch tried to apply
> the new heuristic to constant stores.  However, that didn't work
> too well in practice; see the comments for details.  The patch
> therefore just tests the status quo for constant cases, leaving out
> a match if the current choice is dubious.
> 
> ldp_stp_5.c was affected by the same thing.  The test would be
> worth vectorising if we generated better vector code, but:
> 
> (1) We do a bad job of moving the { -1, 1 } constant, given that
> we have { -1, -1 } and { 1, 1 } to hand.
> 
> (2) The vector code has 6 pairable stores to misaligned offsets.
> We have peephole patterns to handle such misalignment for
> 4 pairable stores, but not 6.
> 
> So the SLP decision isn't wrong as such.  It's just being let
> down by later codegen.
> 
> The patch therefore adds -mstrict-align to preserve the original
> intention of the test while adding ldp_stp_19.c to check for the
> preferred vector code (XFAILed for now).
> 
> Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
> OK for the vectoriser bits?

I'll look at the patch tomorrow but it reminded me of an old
patch I'm still sitting on which reworked the SLP discovery
cache to be based on defs rather than stmts which allows us to
cache and re-use SLP nodes for invariants during SLP discovery.

>From 8df9c7003611e690bd08fd5cff0b624527c99bf4 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Fri, 20 Mar 2020 11:42:47 +0100
Subject: [PATCH] rework SLP caching based on ops and CSE constants
To: gcc-patches@gcc.gnu.org

This reworks SLP caching so that it keys on the defs and not
their defining stmts so we can use it to CSE SLP nodes for
constants and invariants.

2020-03-19  Richard Biener  

* tree-vect-slp.c (): ...
---
 gcc/tree-vect-slp.c | 222 +---
 1 file changed, 149 insertions(+), 73 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 1ffbf6f6af9..e545e34e353 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -129,36 +129,38 @@ vect_free_slp_instance (slp_instance instance, bool 
final_p)
   free (instance);
 }
 
-
-/* Create an SLP node for SCALAR_STMTS.  */
-
-static slp_tree
-vect_create_new_slp_node (vec scalar_stmts, unsigned nops)
-{
-  slp_tree node = new _slp_tree;
-  SLP_TREE_SCALAR_STMTS (node) = scalar_stmts;
-  SLP_TREE_CHILDREN (node).create (nops);
-  SLP_TREE_DEF_TYPE (node) = vect_internal_def;
-  SLP_TREE_REPRESENTATIVE (node) = scalar_stmts[0];
-  SLP_TREE_LANES (node) = scalar_stmts.length ();
-
-  unsigned i;
-  stmt_vec_info stmt_info;
-  FOR_EACH_VEC_ELT (scalar_stmts, i, stmt_info)
-STMT_VINFO_NUM_SLP_USES (stmt_info)++;
-
-  return node;
-}
-
 /* Create an SLP node for OPS.  */
 
 static slp_tree
-vect_create_new_slp_node (vec ops)
+vect_create_new_slp_node (vec_info *vinfo,
+ vec ops, unsigned nops = 0,
+ vect_def_type def_type = vect_external_def)
 {
   slp_tree node = new _slp_tree;
   SLP_TREE_SCALAR_OPS (node) = ops;
-  SLP_TREE_DEF_TYPE (node) = vect_external_def;
   SLP_TREE_LANES (node) = ops.length ();
+  if (nops != 0
+  || (def_type != vect_external_def && def_type != vect_constant_def))
+{
+  if (nops != 0)
+   SLP_TREE_CHILDREN (node).create (nops);
+  SLP_TREE_DEF_TYPE (node) = vect_internal_def;
+
+  SLP_TREE_SCALAR_STMTS (node).create (ops.length ());
+  

[PATCH] vect+aarch64: Fix ldp_stp_* regressions

2022-02-14 Thread Richard Sandiford via Gcc-patches
ldp_stp_1.c, ldp_stp_4.c and ldp_stp_5.c have been failing since
vectorisation was enabled at -O2.  In all three cases SLP is
generating vector code when scalar code would be better.

The problem is that the target costs do not model whether STP could
be used for the scalar or vector code, so the normal latency-based
costs for store-heavy code can be way off.  It would be good to fix
that “properly” at some point, but it isn't easy; see the existing
discussion in aarch64_sve_adjust_stmt_cost for more details.

This patch therefore adds an on-the-side check for whether the
code is doing nothing more than set-up+stores.  It then applies
STP-based costs to those cases only, in addition to the normal
latency-based costs.  (That is, the vector code has to win on
both counts rather than on one count individually.)

However, at the moment, SLP costs one vector set-up instruction
for every vector in an SLP node, even if the contents are the
same as a previous vector in the same node.  Fixing the STP costs
without fixing that would regress other cases, tested in the patch.

The patch therefore makes the SLP costing code check for duplicates
within a node.  Ideally we'd check for duplicates more globally,
but that would require a more global approach to costs: the cost
of an initialisation should be amoritised across all trees that
use the initialisation, rather than fully counted against one
arbitrarily-chosen subtree.

Back on aarch64: an earlier version of the patch tried to apply
the new heuristic to constant stores.  However, that didn't work
too well in practice; see the comments for details.  The patch
therefore just tests the status quo for constant cases, leaving out
a match if the current choice is dubious.

ldp_stp_5.c was affected by the same thing.  The test would be
worth vectorising if we generated better vector code, but:

(1) We do a bad job of moving the { -1, 1 } constant, given that
we have { -1, -1 } and { 1, 1 } to hand.

(2) The vector code has 6 pairable stores to misaligned offsets.
We have peephole patterns to handle such misalignment for
4 pairable stores, but not 6.

So the SLP decision isn't wrong as such.  It's just being let
down by later codegen.

The patch therefore adds -mstrict-align to preserve the original
intention of the test while adding ldp_stp_19.c to check for the
preferred vector code (XFAILed for now).

Tested on aarch64-linux-gnu, aarch64_be-elf and x86_64-linux-gnu.
OK for the vectoriser bits?

Thanks,
Richard


gcc/
* tree-vectorizer.h (vect_scalar_ops_slice): New struct.
(vect_scalar_ops_slice_hash): Likewise.
(vect_scalar_ops_slice::op): New function.
* tree-vect-slp.cc (vect_scalar_ops_slice::all_same_p): New function.
(vect_scalar_ops_slice_hash::hash): Likewise.
(vect_scalar_ops_slice_hash::equal): Likewise.
(vect_prologue_cost_for_slp): Check for duplicate vectors.
* config/aarch64/aarch64.cc
(aarch64_vector_costs::m_stp_sequence_cost): New member variable.
(aarch64_aligned_constant_offset_p): New function.
(aarch64_stp_sequence_cost): Likewise.
(aarch64_vector_costs::add_stmt_cost): Handle new STP heuristic.
(aarch64_vector_costs::finish_cost): Likewise.

gcc/testsuite/
* gcc.target/aarch64/ldp_stp_5.c: Require -mstrict-align.
* gcc.target/aarch64/ldp_stp_14.h,
* gcc.target/aarch64/ldp_stp_14.c: New test.
* gcc.target/aarch64/ldp_stp_15.c: Likewise.
* gcc.target/aarch64/ldp_stp_16.c: Likewise.
* gcc.target/aarch64/ldp_stp_17.c: Likewise.
* gcc.target/aarch64/ldp_stp_18.c: Likewise.
* gcc.target/aarch64/ldp_stp_19.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc | 140 ++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c |  89 +++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h |  50 +++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c | 137 +
 gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c | 133 +
 gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c | 120 +++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c | 123 +++
 gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c |   6 +
 gcc/testsuite/gcc.target/aarch64/ldp_stp_5.c  |   2 +-
 gcc/tree-vect-slp.cc  |  75 ++
 gcc/tree-vectorizer.h |  35 +
 11 files changed, 884 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_14.h
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_15.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_16.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_17.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_18.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_stp_19.c

diff --git a/gcc/tree-vectorizer.h 

Re: [PATCH] PR c/104506: Tolerate error_mark_node in useless_type_conversion_p.

2022-02-14 Thread Richard Biener via Gcc-patches
On Mon, Feb 14, 2022 at 4:31 PM Richard Biener
 wrote:
>
> On Mon, Feb 14, 2022 at 1:54 PM Roger Sayle  
> wrote:
> >
> >
> >
> > This simple fix to the middle-end, resolves PR c/104506, by adding an
> >
> > explicit check for error_mark_node to useless_type_conversion_p.  I first
> >
> > trying fixing this in the C front-end, but the type is valid at the point
> >
> > that the NOP_EXPR is created, so the poisoned type leaks to the middle-end.
>
> Hmm, IMHO "fixing" something to error_mark after it has possibly be used
> looks broken.
>
> I don't like trying to paper over this in useless_type_conversion_p, the
> predicate should not be called on an error_mark_node type.
>
> Alternatively we might want to create an error_type_node that is at least
> a type (with main variant error_type_node and TREE_CODE ERROR_TYPE, etc.).
> But likely more complicated than avoiding to mess with the type of 'x' after
> the fact?

That is,

diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index c701f07befe..29658ade6d7 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -2964,14 +2964,12 @@ duplicate_decls (tree newdecl, tree olddecl)
 {
   /* Avoid `unused variable' and other warnings for OLDDECL.  */
   suppress_warning (olddecl, OPT_Wunused);
-  /* If the types are completely different, poison them both with
+  /* If the types are completely different, poison the new with
 error_mark_node.  */
   if (TREE_CODE (TREE_TYPE (newdecl)) != TREE_CODE (TREE_TYPE (olddecl))
  && olddecl != error_mark_node
  && seen_error ())
{
- if (TREE_CODE (olddecl) != FUNCTION_DECL)
-   TREE_TYPE (olddecl) = error_mark_node;
  if (TREE_CODE (newdecl) != FUNCTION_DECL)
TREE_TYPE (newdecl) = error_mark_node;
}

can you check what happens with that?

>
> Richard.
>
> >
> > Returning either true or false from useless_type_conversion_p avoids the
> >
> > ICE-after-error.  Apologies to Andrew Pinski, I hadn't noticed that he'd
> >
> > assigned this PR to himself until after my regression testing had finished.
> >
> >
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> >
> > make -k check with no new failures.  Ok for mainline?
> >
> >
> >
> >
> >
> > 2022-02-14  Roger Sayle  
> >
> >
> >
> > gcc/ChangeLog
> >
> > PR c/104506
> >
> > * gimple-expr.cc (useless_type_conversion_p): Add a check for
> >
> > error_mark_node.
> >
> >
> >
> > gcc/testsuite/ChangeLog
> >
> > PR c/104506
> >
> > * gcc.dg/pr104506.c: New test case.
> >
> >
> >
> >
> >
> > Roger
> >
> > --
> >
> >
> >


Re: [PATCH] PR c/104506: Tolerate error_mark_node in useless_type_conversion_p.

2022-02-14 Thread Richard Biener via Gcc-patches
On Mon, Feb 14, 2022 at 1:54 PM Roger Sayle  wrote:
>
>
>
> This simple fix to the middle-end, resolves PR c/104506, by adding an
>
> explicit check for error_mark_node to useless_type_conversion_p.  I first
>
> trying fixing this in the C front-end, but the type is valid at the point
>
> that the NOP_EXPR is created, so the poisoned type leaks to the middle-end.

Hmm, IMHO "fixing" something to error_mark after it has possibly be used
looks broken.

I don't like trying to paper over this in useless_type_conversion_p, the
predicate should not be called on an error_mark_node type.

Alternatively we might want to create an error_type_node that is at least
a type (with main variant error_type_node and TREE_CODE ERROR_TYPE, etc.).
But likely more complicated than avoiding to mess with the type of 'x' after
the fact?

Richard.

>
> Returning either true or false from useless_type_conversion_p avoids the
>
> ICE-after-error.  Apologies to Andrew Pinski, I hadn't noticed that he'd
>
> assigned this PR to himself until after my regression testing had finished.
>
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
>
> make -k check with no new failures.  Ok for mainline?
>
>
>
>
>
> 2022-02-14  Roger Sayle  
>
>
>
> gcc/ChangeLog
>
> PR c/104506
>
> * gimple-expr.cc (useless_type_conversion_p): Add a check for
>
> error_mark_node.
>
>
>
> gcc/testsuite/ChangeLog
>
> PR c/104506
>
> * gcc.dg/pr104506.c: New test case.
>
>
>
>
>
> Roger
>
> --
>
>
>


Re: [Backport, committed, PR103970, Fortran, Coarray] Multi-image co_broadcast of derived type with allocatable components fails^

2022-02-14 Thread Andre Vehreschild via Gcc-patches
Hi all,

two weeks have passed with no complains about the patch for PR103970. Therefore
backported and pushed to gcc-11 as 680ee9c3332.

Regards,
Andre

On Fri, 28 Jan 2022 12:39:17 +0100
Andre Vehreschild  wrote:

> Hi Tobias,
> 
> I don't know why that bootstrapped initially. I fixed the patch (naming a
> ```
> else 
>   /* Prevent warning.  */
>   cdesc = NULL_TREE;
> ```
> obvious) and rerun bootstrap making sure to purge everything beforehand. It
> did not break bootstrap on x86_64-linux/f35. Hope it doesn't elsewhere with
> submit 26e237fb5b8.
> 
> Thanks for pointing this out.
> 
> Regards,
>   Andre
> 
> On Fri, 28 Jan 2022 10:36:23 +0100
> Andre Vehreschild via Fortran  wrote:
> 
> > Hi Tobias,
> > 
> > ups, sorry, reverted immediately.
> > 
> > Regards,
> > Andre
> > 
> > On Fri, 28 Jan 2022 10:27:26 +0100
> > Tobias Burnus  wrote:
> >   
> > > Hi Andre,
> > > 
> > > your patch breaks bootstrapping:
> > > 
> > > ../../repos/gcc/gcc/fortran/trans-array.cc: In function ‘tree_node*
> > > structure_alloc_comps(gfc_symbol*, tree, tree, int, int, int,
> > > gfc_co_subroutines_args*)’:
> > > ../../repos/gcc/gcc/fortran/trans-array.cc:9200:42: error: ‘cdesc’ may be
> > > used uninitialized [-Werror=maybe-uninitialized] 9200 |
> > > gfc_conv_descriptor_data_set (, cdesc, comp); |
> > > ~^~~~
> > > ../../repos/gcc/gcc/fortran/trans-array.cc:9082:16: note: ‘cdesc’ was
> > > declared here 9082 |   tree cdesc; |^ cc1plus:
> > > all warnings being treated as errors make[3]: *** [Makefile:1143:
> > > fortran/trans-array.o] Error 1
> > > 
> > > Tobias
> > > 
> > > On 28.01.22 10:07, Andre Vehreschild via Fortran wrote:
> > > > Hi Harald,
> > > >
> > > > thanks for the fast review. I have submitted as c9c48ab7bad.
> > > >
> > > > Will wait for two weeks (reminder set :-)) before backporting to gcc-11.
> > > >
> > > > Thank you and regards,
> > > >   Andre
> > > >
> > > > On Tue, 25 Jan 2022 22:30:22 +0100
> > > > Harald Anlauf via Fortran  wrote:
> > > >  
> > > >> Hi Andre',
> > > >>
> > > >> Am 25.01.22 um 17:32 schrieb Andre Vehreschild via Fortran:  
> > > >>> Hi all,
> > > >>>
> > > >>> attached patch fixes wrong code generation when broadcasting a derived
> > > >>> type containing allocatable and non-allocatable scalars. Furthermore
> > > >>> does it prevent broadcasting of coarray-tokens, which are always local
> > > >>> this_image. Thus having them on a different image makes no sense.
> > > >>>
> > > >>> Bootstrapped and regtested ok on x86_64-linux/F35.
> > > >>>
> > > >>> Ok, for trunk and backport to 12 and 11-branch after decent time?
> > > >>>
> > > >>> I perceived that 12 is closed for this kind of bugfix, therefore
> > > >>> asking ok for 13.  
> > > >> I do not think that 12 is closed for bugfixing, especially not for
> > > >> fortran.  And if my cursory reading of the patch is not misleading,
> > > >> the impact of the patch is really limited to coarrays.
> > > >>
> > > >> You may want to wait for another 1-2 days for additional comments.
> > > >> If not, it is OK from my side.
> > > >>
> > > >> Thanks for the patch!
> > > >>
> > > >> Harald
> > > >>  
> > > >>> Regards,
> > > >>> Andre
> > > >>> --
> > > >>> Andre Vehreschild * Email: vehre ad gmx dot de  
> > > >>  
> > > >
> > > > --
> > > > Andre Vehreschild * Email: vehre ad gmx dot de  
> > > -
> > > Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
> > > 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
> > > Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
> > > Registergericht München, HRB 106955
> > 
> >   
> 
> 


-- 
Andre Vehreschild * Email: vehre ad gmx dot de 


Re: [PATCH 6/7] openmp, fortran: Add Fortran support for parsing metadirectives

2022-02-14 Thread Kwok Cheung Yeung
This patch (again, to be applied on top of the current set of 
metadirective patches) fixes two minor issues with metadirectives in the 
Fortran front-end.


- 'sorry' is called if a declarative OpenMP directive is found in a 
metadirective clause.
- An ICE that occurs with an empty metadirective (i.e. just '!$omp 
metadirective' with nothing else) is fixed.


Thanks

KwokFrom 153b8dbd19cf90b1869be7f409d55d1ab5ba81d5 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 11 Feb 2022 15:42:50 +
Subject: [PATCH 2/2] openmp: More Fortran front-end fixes for metadirectives

This adds a check for declarative OpenMP directives in metadirective
variants (already present in the C/C++ front-ends), and fixes an
ICE when an empty metadirective (i.e. just '!$omp metadirective')
is presented.

2022-02-11  Kwok Cheung Yeung  

gcc/fortran/
* gfortran.h (is_omp_declarative_stmt): New.
* openmp.cc (match_omp_metadirective): Reject declarative OpenMP
directives with 'sorry'.
* parse.cc (parse_omp_metadirective_body): Check that state stack head
is non-null before dereferencing.
(is_omp_declarative_stmt): New.

gcc/testsuite/
* gfortran.dg/gomp/metadirective-2.f90 (main): Test empty
metadirective.
---
 gcc/fortran/gfortran.h   |  1 +
 gcc/fortran/openmp.cc|  3 +++
 gcc/fortran/parse.cc | 16 +++-
 .../gfortran.dg/gomp/metadirective-2.f90 |  5 -
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index bdb4b0f6aa5..37eb039b6d4 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3852,6 +3852,7 @@ bool gfc_parse_file (void);
 void gfc_global_used (gfc_gsymbol *, locus *);
 gfc_namespace* gfc_build_block_ns (gfc_namespace *);
 gfc_statement match_omp_directive (void);
+bool is_omp_declarative_stmt (gfc_statement);
 
 /* dependency.cc */
 int gfc_dep_compare_functions (gfc_expr *, gfc_expr *, bool);
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index 5e87e18ce0d..0071484817d 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -5151,6 +5151,9 @@ match_omp_metadirective (bool begin_p)
   gfc_statement directive = match_omp_directive ();
   gfc_matching_omp_context_selector = false;
 
+  if (is_omp_declarative_stmt (directive))
+   sorry ("declarative directive variants are not supported");
+
   if (gfc_error_flag_test ())
{
  gfc_current_locus = old_loc;
diff --git a/gcc/fortran/parse.cc b/gcc/fortran/parse.cc
index cd18315697e..cb8acb3c68f 100644
--- a/gcc/fortran/parse.cc
+++ b/gcc/fortran/parse.cc
@@ -5841,7 +5841,8 @@ parse_omp_metadirective_body (gfc_statement omp_st)
 
   gfc_in_metadirective_body = old_in_metadirective_body;
 
-  *clause->code = *gfc_state_stack->head;
+  if (gfc_state_stack->head)
+   *clause->code = *gfc_state_stack->head;
   pop_state ();
 
   gfc_commit_symbols ();
@@ -7081,3 +7082,16 @@ is_oacc (gfc_state_data *sd)
   return false;
 }
 }
+
+/* Return true if ST is a declarative OpenMP statement.  */
+bool
+is_omp_declarative_stmt (gfc_statement st)
+{
+  switch (st)
+{
+  case_omp_decl:
+   return true;
+  default:
+   return false;
+}
+}
diff --git a/gcc/testsuite/gfortran.dg/gomp/metadirective-2.f90 
b/gcc/testsuite/gfortran.dg/gomp/metadirective-2.f90
index 06c324589d0..cdd5e85068e 100644
--- a/gcc/testsuite/gfortran.dg/gomp/metadirective-2.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/metadirective-2.f90
@@ -43,7 +43,7 @@ program main
 end do
   !$omp end metadirective
   
-  ! Test labels in the body
+  ! Test labels in the body.
   !$omp begin metadirective &
   !$omp&   when (device={arch("nvptx")}: parallel do) &
   !$omp&   when (device={arch("gcn")}: parallel)
@@ -56,4 +56,7 @@ program main
 20continue
 end do
   !$omp end metadirective
+
+  ! Test empty metadirective.
+  !$omp metadirective
 end program
-- 
2.25.1



Re: [PATCH 6/7] openmp, fortran: Add Fortran support for parsing metadirectives

2022-02-14 Thread Kwok Cheung Yeung

> This patch implements metadirective parsing in the Fortran frontend.

This patch (to be applied on top of the current set of metadirective 
patches) implements a feature that was present in the C and C++ 
front-ends but not in Fortran - the early culling of metadirective 
variants that can be eliminated during parsing because their selectors 
are resolvable at parse-time and still do not match. This is more 
efficient, and allows code with nested metadirectives like this (which 
works on other compilers) to compile:


!$omp metadirective when (implementation={vendor("ibm")}: &
!$omp&  target teams distribute)
  !$omp metadirective when (implementation={vendor("gnu")}: parallel do)

This would currently fail because when parsing the body of the 'target 
teams distribute', the parser sees the metadirective when it is 
expecting a loop nest. If the vendor("ibm") is eliminated early though, 
it would just evaluate to '!$omp nothing' and the following 
metadirective would not be incorrect. This doesn't work for selectors 
such as 'arch' that would need to be deferred until later passes though.


As the selector matching code (omp_context_selector_matches in 
omp-general.cc) works on Generic trees, I have allowed for a limited 
translation from the GFortran AST form to tree form during parsing, 
skipping over things like expression translation that must be done later.


I have also fixed another FE issue with nested metadirectives, that 
occurs when you have something like:


program P
  !$omp metadirective
!$omp metadirective
  !$omp metadirective

end program P

When gfc_match_end is called after parsing the do statement, it needs to 
drop down multiple levels from the innermost metadirective state to that 
 of 'program P' in order to find the proper end type, and not just one 
level as it currently does.


Thanks

KwokFrom 5a7b109a014422a5b43e43669df1dc0d59e830cf Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Fri, 11 Feb 2022 11:20:18 +
Subject: [PATCH 1/2] openmp: Eliminate non-matching metadirective variants
 early in Fortran front-end

This patch checks during parsing if a metadirective selector is both
resolvable and non-matching - if so, it is removed from further
consideration.  This is both more efficient, and avoids spurious
syntax errors caused by considering combinations of selectors that
lead to invalid combinations of OpenMP directives, when that
combination would never arise in the first place.

This exposes another bug - when metadirectives that are not of the
begin-end variety are nested, we might have to drill up through
multiple layers of the state stack to reach the state for the
next statement.  This is now fixed.

2022-02-11  Kwok Cheung Yeung  

gcc/
* omp-general.cc (DELAY_METADIRECTIVES_AFTER_LTO): Check that cfun is
non-null before derefencing.

gcc/fortran/
* decl.cc (gfc_match_end): Search for first previous state that is not
COMP_OMP_METADIRECTIVE.
* gfortran.h (gfc_skip_omp_metadirective_clause): Add prototype.
* openmp.cc (match_omp_metadirective): Skip clause if
result of gfc_skip_omp_metadirective_clause is true.
* trans-openmp.cc (gfc_trans_omp_set_selector): Add argument and
disable expression conversion if false.
(gfc_skip_omp_metadirective_clause): New.

gcc/testsuite/
* gfortran.dg/gomp/metadirective-8.f90: New.
---
 gcc/fortran/decl.cc   | 21 +-
 gcc/fortran/gfortran.h|  4 ++
 gcc/fortran/openmp.cc |  7 +++-
 gcc/fortran/trans-openmp.cc   | 38 ++-
 gcc/omp-general.cc|  5 ++-
 .../gfortran.dg/gomp/metadirective-8.f90  | 22 +++
 6 files changed, 81 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/gomp/metadirective-8.f90

diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index e024e360c88..a77ac768175 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -8325,15 +8325,32 @@ gfc_match_end (gfc_statement *st)
 
 case COMP_CONTAINS:
 case COMP_DERIVED_CONTAINS:
-case COMP_OMP_METADIRECTIVE:
 case COMP_OMP_BEGIN_METADIRECTIVE:
   state = gfc_state_stack->previous->state;
   block_name = gfc_state_stack->previous->sym == NULL
-? NULL : gfc_state_stack->previous->sym->name;
+  ? NULL : gfc_state_stack->previous->sym->name;
   abreviated_modproc_decl = gfc_state_stack->previous->sym
&& gfc_state_stack->previous->sym->abr_modproc_decl;
   break;
 
+case COMP_OMP_METADIRECTIVE:
+  {
+   /* Metadirectives can be nested, so we need to drill down to the
+  first state that is not COMP_OMP_METADIRECTIVE.  */
+   gfc_state_data *state_data = gfc_state_stack;
+
+   do
+   {
+ state_data = state_data->previous;
+ state = 

Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Vladimir Makarov via Gcc-patches



On 2022-02-14 04:44, Richard Sandiford via Gcc-patches wrote:

Iain Sandoe via Gcc-patches  writes:

Two issues resulted in this PR, which manifests when we force a constant into
memory in LRA (in PIC code on Darwin).  The presence of such forced constants
is quite dependent on other RTL optimisations, and it is easy for the issue to
become latent for a specific case.

First, in the Darwin-specific rs6000 backend code, we were not being careful
enough in rejecting invalid symbolic addresses.  Specifically, when generating
PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.

Second, LRA was attempting to load a register using an invalid lo_sum address.

The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
Darwin-specific (although, of course, any observations are welcome).

Tested on several lo_sum targets and x86_64 all languages except as noted:
powerpc64-linux (m32/m64) -D
powerpc64le-linux  -D
powerpc64-aix -Ada -Go -D
aarch64-linux -Ada -D
x86_64-linux all langs -D
powerpc-darwin9 (master and 11.2) -D -Go.

pushed to master, thanks,
Iain

Signed-off-by: Iain Sandoe 
Co-authored-by: Vladimir Makarov 

PR target/104117

gcc/ChangeLog:

* config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
emitting PIC code.
(legitimate_lo_sum_address_p): Likewise.
* lra-constraints.cc (process_address_1): Do not attempt to emit a reg
load from an invalid lo_sum address.
---
  gcc/config/rs6000/rs6000.cc | 38 +++--
  gcc/lra-constraints.cc  | 17 ++---
  2 files changed, 38 insertions(+), 17 deletions(-)

[…]
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index fdff9e0720a..c700c3f4578 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
  *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
  if (!valid_address_p (op, , cn))
{
- /* Try to put lo_sum into register.  */
- insn = emit_insn (gen_rtx_SET
-   (new_reg,
-gen_rtx_LO_SUM (Pmode, new_reg, 
addr)));
- code = recog_memoized (insn);
- if (code >= 0)
-   {
- *ad.inner = new_reg;
- if (!valid_address_p (op, , cn))
-   {
- *ad.inner = addr;
- code = -1;
-   }
-   }
-
+ *ad.inner = addr; /* Punt.  */
+ code = -1;
}
}
  if (code < 0)

Could you go into more details about this?  Why is it OK to continue
to try:

   (lo_sum new_reg addr)

directly as an address (the context at the top of the hunk), but not try
moving the lo_sum into a register?  They should be semantically equivalent,
so it seems that if one is wrong, the other would be too.


Hi, Richard.  Change LRA is mine and I approved it for Iain's patch.

I think there is no need for this code and it is misleading.  If 
'mem[low_sum]' does not work, I don't think that 'reg=low_sum;mem[reg]' 
will help for any existing target.  As machine-dependent code for any 
target most probably (for ppc64 darwin it is exactly the case) checks 
address only in memory, it can wrongly accept wrong address by reloading 
it into reg and use it in memory. So these are my arguments for the 
remove this code from process_address_1.





[committed] amdgcn: Allow vector reductions on constants

2022-02-14 Thread Andrew Stubbs
I've committed this fix for an ICE compiling sollve_vv testcase 
test_target_teams_distribute_defaultmap.c.


Somehow the optimizers result in a vector reduction on a vector of 
duplicated constants. This was a case the backend didn't handle, so we 
ended up with an unrecognised instruction ICE.


It might be better if constant reductions were evaluated at compile 
time, but this patch prevents the ICE right now.


Andrew

P.S. I'll backport this to OG11 shortly.amdgcn: Allow vector reductions on constants

Obviously it would be better if these reductions could be evaluated at compile
time, but this will avoid an ICE.

gcc/ChangeLog:

* config/gcn/gcn.cc (gcn_expand_reduc_scalar): Use force_reg.

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index 74819c6e4d7..402f0256411 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -4460,7 +4460,7 @@ gcn_expand_reduc_scalar (machine_mode mode, rtx src, int 
unspec)
  pair of lanes, then on every pair of results from the previous
  iteration (thereby effectively reducing every 4 lanes) and so on until
  all lanes are reduced.  */
-  rtx in, out = src;
+  rtx in, out = force_reg (mode, src);
   for (int i = 0, shift = 1; i < 6; i++, shift <<= 1)
 {
   rtx shift_val = gen_rtx_CONST_INT (VOIDmode, shift);


[PATCH] tree-optimization/104528 - free niter estimates after DSE

2022-02-14 Thread Richard Biener via Gcc-patches
When DSE removes a trivially dead def we have to reset niter information
on loops since that might refer to it.  The patch also adds verification
to make sure this does not happen.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-02-14  Richard Biener  

PR tree-optimization/104528
* tree-ssa.h (find_released_ssa_name): Declare.
* tree-ssa.cc (find_released_ssa_name): Export.
* cfgloop.cc (verify_loop_structure): Look for released
SSA names in loops nb_iterations.
* tree-ssa-dse.cc (pass_dse::execute): Release number of iteration
estimates.

* gfortran.dg/pr104528.f: New testcase.
---
 gcc/cfgloop.cc   | 13 
 gcc/testsuite/gfortran.dg/pr104528.f | 44 
 gcc/tree-ssa-dse.cc  |  8 +
 gcc/tree-ssa.cc  |  2 +-
 gcc/tree-ssa.h   |  1 +
 5 files changed, 67 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr104528.f

diff --git a/gcc/cfgloop.cc b/gcc/cfgloop.cc
index 78fd6d5db1b..5ffcc77d93f 100644
--- a/gcc/cfgloop.cc
+++ b/gcc/cfgloop.cc
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgloop.h"
 #include "gimple-iterator.h"
 #include "dumpfile.h"
+#include "tree-ssa.h"
+#include "tree-pretty-print.h"
 
 static void flow_loops_cfg_dump (FILE *);
 
@@ -1561,6 +1563,17 @@ verify_loop_structure (void)
err = 1;
  }
}
+
+  /* Check cached number of iterations for released SSA names.  */
+  tree ref;
+  if (loop->nb_iterations
+ && (ref = walk_tree (>nb_iterations,
+  find_released_ssa_name, NULL, NULL)))
+   {
+ error ("loop %d%'s number of iterations %qE references the"
+" released SSA name %qE", i, loop->nb_iterations, ref);
+ err = 1;
+   }
 }
 
   /* Check irreducible loops.  */
diff --git a/gcc/testsuite/gfortran.dg/pr104528.f 
b/gcc/testsuite/gfortran.dg/pr104528.f
new file mode 100644
index 000..5b43feba97b
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr104528.f
@@ -0,0 +1,44 @@
+! { dg-do compile }
+! { dg-options "-O2 -fpeel-loops -ftree-loop-vectorize -fno-tree-scev-cprop 
--param iv-max-considered-uses=2" }
+  REAL   FUNCTION FOO(M, N, A, W)
+
+  INTEGERM, N
+
+  REAL   W(*)
+  COMPLEXA(*)
+
+  INTEGERII, JI, JJ, KK, LL, MP
+
+  EXTERNAL   BAR
+
+  INTEGERQUX
+  EXTERNAL   QUX
+
+  CALL BAR(II)
+
+  IF (M .EQ. 0) THEN
+ IF (N .EQ. 0) THEN
+DO 140 KK = II, II + MP
+   W(KK) = 0
+ 140CONTINUE
+ ELSE
+KK = II + MP
+ END IF
+
+ DO 130 JI = KK, KK + MP
+DO 120 LL = JJ, JJ + MP
+   DO 110 KK = II, II + MP
+  W(KK) = (A(KK))
+ 110   CONTINUE
+ 120CONTINUE
+ 130 CONTINUE
+
+ IF (W(KK) .EQ. 0) THEN
+FOO = W(QUX(MP, W, 1))
+ END IF
+
+  END IF
+
+  RETURN
+
+  END
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 47997df2125..2b22a61ad97 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -44,6 +44,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-modref-tree.h"
 #include "ipa-modref.h"
 #include "target.h"
+#include "tree-ssa-loop-niter.h"
 
 /* This file implements dead store elimination.
 
@@ -1418,6 +1419,8 @@ unsigned int
 pass_dse::execute (function *fun)
 {
   unsigned todo = 0;
+  bool released_def = false;
+
   need_eh_cleanup = BITMAP_ALLOC (NULL);
   need_ab_cleanup = BITMAP_ALLOC (NULL);
   auto_sbitmap live_bytes (param_dse_max_object_size);
@@ -1460,6 +1463,7 @@ pass_dse::execute (function *fun)
  if (gsi_remove (, true) && need_eh_cleanup)
bitmap_set_bit (need_eh_cleanup, bb->index);
  release_defs (stmt);
+ released_def = true;
}
}
  if (gsi_end_p (gsi))
@@ -1481,6 +1485,7 @@ pass_dse::execute (function *fun)
}
  remove_phi_node (, true);
  removed_phi = true;
+ released_def = true;
}
  else
gsi_next ();
@@ -1506,6 +1511,9 @@ pass_dse::execute (function *fun)
   BITMAP_FREE (need_eh_cleanup);
   BITMAP_FREE (need_ab_cleanup);
 
+  if (released_def)
+free_numbers_of_iterations_estimates (fun);
+
   return todo;
 }
 
diff --git a/gcc/tree-ssa.cc b/gcc/tree-ssa.cc
index 8fe0682981d..430875ae37a 100644
--- a/gcc/tree-ssa.cc
+++ b/gcc/tree-ssa.cc
@@ -272,7 +272,7 @@ target_for_debug_bind (tree var)
 /* Called via walk_tree, look for SSA_NAMEs that have already been
released.  */
 
-static tree
+tree
 find_released_ssa_name (tree *tp, int *walk_subtrees, void *data_)
 {
   struct 

[committed] libstdc++: Use __cpp_concepts instead of custom macro [PR103891]

2022-02-14 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux (and clang-12.0.1), pushed to trunk.

-- >8 --

With the new value of __cpp_concepts required by P2493, we can test
whether the compiler supports conditionally trivial special members.
This allows us to remove the workaround that disables fully-constexpr
std::variant for Clang. Now it should work for non-GCC compilers (such
as future releases of Clang) that support conditionally trivial
destructors and define the new value of __cpp_concepts.

libstdc++-v3/ChangeLog:

PR libstdc++/103891
* include/bits/c++config (_GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS):
Remove.
* include/std/variant: Check feature test macros instead.
* include/std/version: Likewise.
---
 libstdc++-v3/include/bits/c++config |  5 -
 libstdc++-v3/include/std/variant| 18 --
 libstdc++-v3/include/std/version|  4 ++--
 3 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/libstdc++-v3/include/bits/c++config 
b/libstdc++-v3/include/bits/c++config
index b197349f976..c64b61b3c90 100644
--- a/libstdc++-v3/include/bits/c++config
+++ b/libstdc++-v3/include/bits/c++config
@@ -810,11 +810,6 @@ namespace std
 
 #undef _GLIBCXX_HAS_BUILTIN
 
-#if __cplusplus >= 202002L && __cpp_concepts && __GNUC__ >= 12
-// XXX workaround for missing feature test macro for P0848R3 (see P2493R0).
-# define _GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS 1
-#endif
-
 // PSTL configuration
 
 #if __cplusplus >= 201703L
diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index c41f9f27e00..beed396fccb 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -44,24 +44,22 @@
 #include 
 #include 
 #include  // in_place_index_t
-#ifndef _GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS
-# include 
-#endif
 #if __cplusplus >= 202002L
 # include 
 #endif
 
+#if __cpp_concepts >= 202002L && __cpp_constexpr >= 201811L
+// P2231R1 constexpr needs constexpr unions and constrained destructors.
+# define __cpp_lib_variant 202106L
+#else
+# include  // Use __aligned_membuf instead of union.
+# define __cpp_lib_variant 202102L
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
-#ifdef _GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS
-// P2231R1 constexpr needs constexpr unions and constrained destructors.
-# define __cpp_lib_variant 202106L
-#else
-# define __cpp_lib_variant 202102L
-#endif
-
   template class tuple;
   template class variant;
   template  struct hash;
diff --git a/libstdc++-v3/include/std/version b/libstdc++-v3/include/std/version
index 30e04b5..24311ee05c8 100644
--- a/libstdc++-v3/include/std/version
+++ b/libstdc++-v3/include/std/version
@@ -175,7 +175,7 @@
 # define __cpp_lib_to_chars 201611L
 #endif
 #define __cpp_lib_unordered_map_try_emplace 201411L
-#ifndef _GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS
+#if !(__cplusplus >= 202002L && __cpp_concepts >= 202002L)
 // N.B. updated value in C++20
 # define __cpp_lib_variant 202102L
 #endif
@@ -292,7 +292,7 @@
 # endif
 #define __cpp_lib_to_address 201711L
 #define __cpp_lib_to_array 201907L
-#ifdef _GLIBCXX_HAVE_COND_TRIVIAL_SPECIAL_MEMBERS
+#if __cplusplus >= 202002L && __cpp_concepts >= 202002L
 # define __cpp_lib_variant 202106L
 #endif
 #endif
-- 
2.34.1



[PATCH] PR c/104506: Tolerate error_mark_node in useless_type_conversion_p.

2022-02-14 Thread Roger Sayle
 

This simple fix to the middle-end, resolves PR c/104506, by adding an

explicit check for error_mark_node to useless_type_conversion_p.  I first

trying fixing this in the C front-end, but the type is valid at the point

that the NOP_EXPR is created, so the poisoned type leaks to the middle-end.

Returning either true or false from useless_type_conversion_p avoids the

ICE-after-error.  Apologies to Andrew Pinski, I hadn't noticed that he'd

assigned this PR to himself until after my regression testing had finished.

 

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and

make -k check with no new failures.  Ok for mainline?

 

 

2022-02-14  Roger Sayle  

 

gcc/ChangeLog

PR c/104506

* gimple-expr.cc (useless_type_conversion_p): Add a check for

error_mark_node.

 

gcc/testsuite/ChangeLog

PR c/104506

* gcc.dg/pr104506.c: New test case.

 

 

Roger

--

 

diff --git a/gcc/gimple-expr.cc b/gcc/gimple-expr.cc
index f9a650b..e63fa3c 100644
--- a/gcc/gimple-expr.cc
+++ b/gcc/gimple-expr.cc
@@ -83,6 +83,9 @@ useless_type_conversion_p (tree outer_type, tree inner_type)
return false;
 }
 
+  if (inner_type == error_mark_node)
+return true;
+
   /* From now on qualifiers on value types do not matter.  */
   inner_type = TYPE_MAIN_VARIANT (inner_type);
   outer_type = TYPE_MAIN_VARIANT (outer_type);
diff --git a/gcc/testsuite/gcc.dg/pr104506.c b/gcc/testsuite/gcc.dg/pr104506.c
new file mode 100644
index 000..79cec1a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104506.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+double x;
+/* { dg-message "note: previous declaration" "previous declaration" { target 
*-*
+ * -* } .-1 } */
+
+void foo (void)
+{
+  x;
+}
+
+int x;  /* { dg-error "conflicting types" } */
+


[committed] libstdc++: Fix typo in pragma

2022-02-14 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/20_util/unsynchronized_pool_resource/allocate.cc:
Fix typo.
---
 .../testsuite/20_util/unsynchronized_pool_resource/allocate.cc  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/libstdc++-v3/testsuite/20_util/unsynchronized_pool_resource/allocate.cc 
b/libstdc++-v3/testsuite/20_util/unsynchronized_pool_resource/allocate.cc
index 25e5ce63b58..1c228eaa46a 100644
--- a/libstdc++-v3/testsuite/20_util/unsynchronized_pool_resource/allocate.cc
+++ b/libstdc++-v3/testsuite/20_util/unsynchronized_pool_resource/allocate.cc
@@ -287,7 +287,7 @@ test07()
 // so that careful_resource::do_allocate can distinguish this allocation
 // from any required for the pool resource's internal data structures):
 void* p = upr.allocate(std::size_t(-2), 1024);
-#pragma GCC distinguish pop
+#pragma GCC diagnostic pop
 // Should not reach here!
 VERIFY( !"attempt to allocate SIZE_MAX-1 should not have succeeded" );
 throw p;
-- 
2.34.1



[committed] libstdc++: Fix std::to_chars for IEEE128 long double

2022-02-14 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux and powerpc64le-linux with both ieee/ibm ABIs.

Pushed to trunk. This should be backported to gcc-11 too.

-- >8 --

The preprocessor check for _GLIBCXX_USE_FLOAT128 is the wrong condition,
because when the compiler is built with --with-long-double-format=ieee
configure determines that __float128 is the same as long double, and so
should not be used. But we do want the std::to_chars overloads for
__float128 in that case, because the floating_to_chars.cc file is built
with -mabi=ibmlongdouble and so the __float128 overloads are actually
the 'long double' ones for -mabi=ieeelongdouble code.

This fixes missing definitions of the __float128 overloads of
std::to_chars for --with-long-double-format=ieee builds. Without this,
there are symbols present in the --with-long-double-abi=ibm build which
are missing from the --with-long-double-abi=ieee build.

libstdc++-v3/ChangeLog:

* src/c++17/floating_to_chars.cc (FLOAT128_TO_CHARS): Depend on
LONG_DOUBLE_ALT128_COMPAT instead of USE_FLOAT128.
---
 libstdc++-v3/src/c++17/floating_to_chars.cc | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/src/c++17/floating_to_chars.cc 
b/libstdc++-v3/src/c++17/floating_to_chars.cc
index 8da2f2385a0..5825e661bf4 100644
--- a/libstdc++-v3/src/c++17/floating_to_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_to_chars.cc
@@ -76,16 +76,12 @@ extern "C" int __sprintfieee128(char*, const char*, ...);
 # define LONG_DOUBLE_KIND LDK_UNSUPPORTED
 #endif
 
-#if defined _GLIBCXX_USE_FLOAT128 && __FLT128_MANT_DIG__ == 113
+// For now we only support __float128 when it's the powerpc64 __ieee128 type.
+#if defined _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT && __FLT128_MANT_DIG__ == 113
 // Define overloads of std::to_chars for __float128.
 # define FLOAT128_TO_CHARS 1
 #endif
 
-// For now we only support __float128 when it's the powerpc64 __ieee128 type.
-#ifndef _GLIBCXX_LONG_DOUBLE_ALT128_COMPAT
-# undef FLOAT128_TO_CHARS
-#endif
-
 #ifdef FLOAT128_TO_CHARS
 using F128_type = __float128;
 #else
-- 
2.34.1



Re: [PATCH] ipa: Careful processing ANCESTOR jump functions and NULL pointers (PR 103083)

2022-02-14 Thread Martin Jambor
Hello Honza,

On Mon, Dec 13 2021, Jan Hubicka wrote:
>> >>> +  || (only_for_nonzero && 
>> >>> !src_lats->bits_lattice.known_nonzero_p ()))
>> >>> +{
>> >>> +  if (jfunc->bits)
>> >>> +return dest_lattice->meet_with (jfunc->bits->value,
>> >>> +jfunc->bits->mask, 
>> >>> precision);
>> >>> +  else
>> >>> +return dest_lattice->set_to_bottom ();
>> >>> +}
>> >> I do not think you need to set to bottom here. For pointers, we
>> >> primarily track alignment and NULL is aligned - all you need to do is to
>> >> make sure that we do not believe that some bits are 1.
>> >
>> > I am not sure I understand, the testcase you wrote has all bits as zeros
>> > and still miscompiles?  It is primarily used for alignment but not only
>> > for that.
>
> Maybe I misunderstand the code.  But if you have only_for_nonzero and
> you do not know htat src lattice is non-zero you will get
>  - if src is 0, then dest is 0
>  - if src is non-zero then dest is src+offset
>
> So all trialing bits that are known to be 0 in src and offset are known
> to be 0 in the result.  But I do not see where th eoffset is mixed in?
>

I went back and tried to figure out again what you meant and I hope it
was the following patch, which does not drop the lattice to bottom for
ANCESTORs but instead masks any bits that would be considered known
ones.  It is indeed clever and I did not see the possibility when
writing the patch.

The following has passed bootstrap, LTO bootstrap and testing on
x86-64.  OK for trunk (and then to be backported to 11 and 10)?

Thanks,

Martin


IPA_JF_ANCESTOR jump functions are constructed also when the formal
parameter of the caller is first checked whether it is NULL and left
as it is if it is NULL, to accommodate C++ casts to an ancestor class.

The jump function type was invented for devirtualization and IPA-CP
propagation of tree constants is also careful to apply it only to
existing DECLs(*) but as PR 103083 shows, the part propagating "known
bits" was not careful about this, which can lead to miscompilations.

This patch introduces a flag to the ancestor jump functions which
tells whether a NULL-check was elided when creating it and makes the
bits propagation behave accordingly, masking any bits otherwise would
be known to be one.  This should safely preserve alignment info, which
is the primary ifnormation that we keep in bits for pointers.

(*) There still may remain problems when a DECL resides on address
zero (with -fno-delete-null-pointer-checks ...I hope it cannot happen
otherwise).  I am looking into that now but I think it will be easier
for everyone if I do so in a follow-up patch.

gcc/ChangeLog:

2022-02-11  Martin Jambor  

PR ipa/103083
* ipa-prop.h (ipa_ancestor_jf_data): New flag keep_null;
(ipa_get_jf_ancestor_keep_null): New function.
* ipa-prop.c (ipa_set_ancestor_jf): Initialize keep_null field of the
ancestor function.
(compute_complex_assign_jump_func): Pass false to keep_null
parameter of ipa_set_ancestor_jf.
(compute_complex_ancestor_jump_func): Pass true to keep_null
parameter of ipa_set_ancestor_jf.
(update_jump_functions_after_inlining): Carry over keep_null from the
original ancestor jump-function or merge them.
(ipa_write_jump_function): Stream keep_null flag.
(ipa_read_jump_function): Likewise.
(ipa_print_node_jump_functions_for_edge): Print the new flag.
* ipa-cp.c (class ipcp_bits_lattice): Make various getters const.  New
member function known_nonzero_p.
(ipcp_bits_lattice::known_nonzero_p): New.
(ipcp_bits_lattice::meet_with_1): New parameter drop_all_ones,
observe it.
(ipcp_bits_lattice::meet_with): Likewise.
(propagate_bits_across_jump_function): Simplify.  Pass true in
drop_all_ones when it is necessary.
(propagate_aggs_across_jump_function): Take care of keep_null
flag.
(ipa_get_jf_ancestor_result): Propagate NULL accross keep_null
jump functions.

gcc/testsuite/ChangeLog:

2021-11-25  Martin Jambor  

* gcc.dg/ipa/pr103083-1.c: New test.
* gcc.dg/ipa/pr103083-2.c: Likewise.
---
 gcc/ipa-cp.cc | 75 ++-
 gcc/ipa-prop.cc   | 20 +--
 gcc/ipa-prop.h| 13 +
 gcc/testsuite/gcc.dg/ipa/pr103083-1.c | 28 ++
 gcc/testsuite/gcc.dg/ipa/pr103083-2.c | 30 +++
 5 files changed, 137 insertions(+), 29 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103083-1.c
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr103083-2.c

diff --git a/gcc/ipa-cp.cc b/gcc/ipa-cp.cc
index 453e9c93cc3..93a54ae83fc 100644
--- a/gcc/ipa-cp.cc
+++ b/gcc/ipa-cp.cc
@@ -306,17 +306,18 @@ public:
 class ipcp_bits_lattice
 {
 public:
-  bool bottom_p () { return m_lattice_val 

[PATCH] tree-optimization/104511 - avoid FP to DFP conversion for VEC_PACK_TRUNC

2022-02-14 Thread Richard Biener via Gcc-patches
This avoids forwprop from matching DFP <-> FP vector conversions
using VEC_[UN]PACK{_TRUNC,_LO,_HI}.  Maybe DFP vectors shouldn't be
a thing, but they appearantly are.  Re-using CONVERT/NOP_EXPR for
DFP <-> FP conversions was probably a mistake.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-02-14  Richard Biener  

PR tree-optimization/104511
* tree-ssa-forwprop.c (simplify_vector_constructor): Avoid
touching DFP <-> FP conversions.

* gcc.dg/pr104511.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr104511.c | 16 
 gcc/tree-ssa-forwprop.cc|  9 +
 2 files changed, 25 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr104511.c

diff --git a/gcc/testsuite/gcc.dg/pr104511.c b/gcc/testsuite/gcc.dg/pr104511.c
new file mode 100644
index 000..ad5430c67c8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr104511.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target dfp } } */
+/* { dg-options "-O -Wno-psabi" } */
+
+typedef _Float64 __attribute__((__vector_size__ (32))) F;
+typedef _Decimal32 __attribute__((__vector_size__ (16))) D;
+
+extern void bar (void);
+
+D g;
+void
+foo (F f)
+{
+  D d = __builtin_convertvector (f, D);
+  bar ();
+  g = d;
+}
diff --git a/gcc/tree-ssa-forwprop.cc b/gcc/tree-ssa-forwprop.cc
index 709bde6defa..484491fa1c5 100644
--- a/gcc/tree-ssa-forwprop.cc
+++ b/gcc/tree-ssa-forwprop.cc
@@ -2824,6 +2824,15 @@ simplify_vector_constructor (gimple_stmt_iterator *gsi)
 ? VEC_UNPACK_FLOAT_HI_EXPR
 : VEC_UNPACK_HI_EXPR);
 
+ /* Conversions between DFP and FP have no special tree code
+but we cannot handle those since all relevant vector conversion
+optabs only have a single mode.  */
+ if (CONVERT_EXPR_CODE_P (conv_code)
+ && FLOAT_TYPE_P (TREE_TYPE (type))
+ && (DECIMAL_FLOAT_TYPE_P (TREE_TYPE (type))
+ != DECIMAL_FLOAT_TYPE_P (TREE_TYPE (conv_src_type
+   return false;
+
  if (CONVERT_EXPR_CODE_P (conv_code)
  && (2 * TYPE_PRECISION (TREE_TYPE (TREE_TYPE (orig[0])))
  == TYPE_PRECISION (TREE_TYPE (type)))
-- 
2.34.1


[PATCH] c++: Don't reject GOTO_EXPRs to cdtor_label in potential_constant_expression_1 [PR104513]

2022-02-14 Thread Jakub Jelinek via Gcc-patches
Hi!

return in ctors on targetm.cxx.cdtor_returns_this () target like arm
is emitted as GOTO_EXPR cdtor_label where at cdtor_label it emits
RETURN_EXPR with the this.
Similarly, in all dtors regardless of targetm.cxx.cdtor_returns_this ()
a return is emitted similarly.

potential_constant_expression_1 was rejecting these gotos and so we
incorrectly rejected these testcases, but actual cxx_eval* is apparently
handling these just fine.  I was a little bit worried that for the
destruction of bases we wouldn't evaluate something we should, but as the
testcase shows, that is evaluated through try ... finally and there is
nothing after the cdtor_label.  For arm there is RETURN_EXPR this; but we
don't really care about the return value from ctors and dtors during the
constexpr evaluation.

Bootstrapped/regtested on {x86_64,i686,aarch64,powerpc64le,s390x}-linux
and armv7hl-linux-gnueabi, ok for trunk?

I must say I don't see much the point of cdtor_labels at all, I'd think
that with try ... finally around it for non-arm we could just RETURN_EXPR
instead of the GOTO_EXPR and the try/finally gimplification would DTRT,
and we could just add the right return value for the arm case.

2022-02-14  Jakub Jelinek  

PR c++/104513
* constexpr.cc (potential_constant_expression_1) :
Don't punt if returns (target).

* g++.dg/cpp1y/constexpr-104513.C: New test.
* g++.dg/cpp2a/constexpr-dtor12.C: New test.

--- gcc/cp/constexpr.cc.jj  2022-02-11 13:52:32.697425776 +0100
+++ gcc/cp/constexpr.cc 2022-02-12 13:51:21.000274390 +0100
@@ -9364,7 +9364,7 @@ potential_constant_expression_1 (tree t,
   {
tree *target = _OPERAND (t, 0);
-   /* Gotos representing break and continue are OK.  */
-   if (breaks (target) || continues (target))
+   /* Gotos representing break, continue and cdtor return are OK.  */
+   if (breaks (target) || continues (target) || returns (target))
  {
*jump_target = *target;
return true;
--- gcc/testsuite/g++.dg/cpp1y/constexpr-104513.C.jj2022-02-12 
19:05:14.374101202 +0100
+++ gcc/testsuite/g++.dg/cpp1y/constexpr-104513.C   2022-02-12 
19:06:09.745332658 +0100
@@ -0,0 +1,10 @@
+// PR c++/104513
+// { dg-do compile { target c++14 } }
+
+struct A {
+  int a1;
+  short a2, a3;
+  long a4;
+  constexpr A() : a1(42), a2(42), a3(42), a4(42) { return; }
+};
+constexpr A a;
--- gcc/testsuite/g++.dg/cpp2a/constexpr-dtor12.C.jj2022-02-12 
19:04:41.610555957 +0100
+++ gcc/testsuite/g++.dg/cpp2a/constexpr-dtor12.C   2022-02-12 
19:04:18.473877087 +0100
@@ -0,0 +1,13 @@
+// PR c++/104513
+// { dg-do compile { target c++20 } }
+
+struct S {
+  constexpr S () : s (nullptr) {}
+  constexpr ~S () { delete s; }
+  int *s;
+};
+struct T : S {
+  constexpr T () : S () {}
+  constexpr ~T () { s = new int (42); return; }
+};
+constexpr T t;

Jakub



Re: [pushed] LRA, rs6000, Darwin: Amend lo_sum use for forced constants [PR104117].

2022-02-14 Thread Richard Sandiford via Gcc-patches
Iain Sandoe via Gcc-patches  writes:
> Two issues resulted in this PR, which manifests when we force a constant into
> memory in LRA (in PIC code on Darwin).  The presence of such forced constants
> is quite dependent on other RTL optimisations, and it is easy for the issue to
> become latent for a specific case.
>
> First, in the Darwin-specific rs6000 backend code, we were not being careful
> enough in rejecting invalid symbolic addresses.  Specifically, when generating
> PIC code, we require a SYMBOL_REF to be wrapped in an UNSPEC_MACHOPIC_OFFSET.
>
> Second, LRA was attempting to load a register using an invalid lo_sum address.
>
> The LRA changes are approved in the PR by Vladimir, and the RS6000 changes are
> Darwin-specific (although, of course, any observations are welcome).
>
> Tested on several lo_sum targets and x86_64 all languages except as noted:
> powerpc64-linux (m32/m64) -D
> powerpc64le-linux  -D
> powerpc64-aix -Ada -Go -D
> aarch64-linux -Ada -D
> x86_64-linux all langs -D
> powerpc-darwin9 (master and 11.2) -D -Go.
>
> pushed to master, thanks,
> Iain
>
> Signed-off-by: Iain Sandoe 
> Co-authored-by: Vladimir Makarov 
>
>   PR target/104117
>
> gcc/ChangeLog:
>
>   * config/rs6000/rs6000.cc (darwin_rs6000_legitimate_lo_sum_const_p):
>   Check for UNSPEC_MACHOPIC_OFFSET wrappers on symbolic addresses when
>   emitting PIC code.
>   (legitimate_lo_sum_address_p): Likewise.
>   * lra-constraints.cc (process_address_1): Do not attempt to emit a reg
>   load from an invalid lo_sum address.
> ---
>  gcc/config/rs6000/rs6000.cc | 38 +++--
>  gcc/lra-constraints.cc  | 17 ++---
>  2 files changed, 38 insertions(+), 17 deletions(-)
>
> […]
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index fdff9e0720a..c700c3f4578 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3625,21 +3625,8 @@ process_address_1 (int nop, bool check_only_p,
> *ad.inner = gen_rtx_LO_SUM (Pmode, new_reg, addr);
> if (!valid_address_p (op, , cn))
>   {
> -   /* Try to put lo_sum into register.  */
> -   insn = emit_insn (gen_rtx_SET
> - (new_reg,
> -  gen_rtx_LO_SUM (Pmode, new_reg, 
> addr)));
> -   code = recog_memoized (insn);
> -   if (code >= 0)
> - {
> -   *ad.inner = new_reg;
> -   if (!valid_address_p (op, , cn))
> - {
> -   *ad.inner = addr;
> -   code = -1;
> - }
> - }
> -
> +   *ad.inner = addr; /* Punt.  */
> +   code = -1;
>   }
>   }
> if (code < 0)

Could you go into more details about this?  Why is it OK to continue
to try:

  (lo_sum new_reg addr)

directly as an address (the context at the top of the hunk), but not try
moving the lo_sum into a register?  They should be semantically equivalent,
so it seems that if one is wrong, the other would be too.

Thanks,
Richard


Re: [PATCH] c/104505 - ICE with internal function call in diagnostic expression

2022-02-14 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 14, 2022 at 10:32:52AM +0100, Richard Biener wrote:
> The following handles internal function calls similar to how the
> C++ frontend does, avoiding ICEing on those.
> 
> Bootstrapped and tested on x86_64-unkown-linux-gnu, OK?
> 
> Thanks,
> Richard.
> 
> 2022-02-14  Richard Biener  
> 
>   PR c/104505
> gcc/c-family/
>   * c-pretty-print.cc (c_pretty_printer::postfix_expression): Handle
>   internal function calls.
> 
> gcc/testsuite/
>   * c-c++-common/pr104505.c: New testcase.

LGTM.

Jakub



[PATCH] c/104505 - ICE with internal function call in diagnostic expression

2022-02-14 Thread Richard Biener via Gcc-patches
The following handles internal function calls similar to how the
C++ frontend does, avoiding ICEing on those.

Bootstrapped and tested on x86_64-unkown-linux-gnu, OK?

Thanks,
Richard.

2022-02-14  Richard Biener  

PR c/104505
gcc/c-family/
* c-pretty-print.cc (c_pretty_printer::postfix_expression): Handle
internal function calls.

gcc/testsuite/
* c-c++-common/pr104505.c: New testcase.
---
 gcc/c-family/c-pretty-print.cc|  6 +-
 gcc/testsuite/c-c++-common/pr104505.c | 12 
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/c-c++-common/pr104505.c

diff --git a/gcc/c-family/c-pretty-print.cc b/gcc/c-family/c-pretty-print.cc
index ceedaea962a..dac17753acb 100644
--- a/gcc/c-family/c-pretty-print.cc
+++ b/gcc/c-family/c-pretty-print.cc
@@ -32,6 +32,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "selftest.h"
 #include "langhooks.h"
 #include "options.h"
+#include "internal-fn.h"
 
 /* The pretty-printer code is primarily designed to closely follow
(GNU) C and C++ grammars.  That is to be contrasted with spaghetti
@@ -1601,7 +1602,10 @@ c_pretty_printer::postfix_expression (tree e)
   {
call_expr_arg_iterator iter;
tree arg;
-   postfix_expression (CALL_EXPR_FN (e));
+   if (CALL_EXPR_FN (e) != NULL_TREE)
+ postfix_expression (CALL_EXPR_FN (e));
+   else
+ pp_string (this, internal_fn_name (CALL_EXPR_IFN (e)));
pp_c_left_paren (this);
FOR_EACH_CALL_EXPR_ARG (arg, iter, e)
  {
diff --git a/gcc/testsuite/c-c++-common/pr104505.c 
b/gcc/testsuite/c-c++-common/pr104505.c
new file mode 100644
index 000..7fa3d841197
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr104505.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+typedef char __attribute__((__vector_size__ (8))) U;
+typedef short __attribute__((__vector_size__ (16))) V;
+
+U u;
+
+void
+foo (V v)
+{
+  u = __builtin_shufflevector (u, u, __builtin_convertvector (v, U)); /* { 
dg-error "invalid element index" } */
+}
-- 
2.34.1


Re: [PATCH] middle-end/104497 - gimplification of vector indexing

2022-02-14 Thread Richard Biener via Gcc-patches
On Fri, 11 Feb 2022, Jason Merrill wrote:

> On 2/11/22 06:26, Richard Biener wrote:
> > The following attempts to address gimplification of
> > 
> > ... = VIEW_CONVERT_EXPR((i & 1) != 0 ? inv : src)[i];
> > 
> > which is problematic since gimplifying the base object
> > ? inv : src produces a register temporary but GIMPLE does not
> > really support a register as a base for an ARRAY_REF (even
> > though that's not strictly validated it seems as can be seen
> > at -O0).
> 
> I suppose that isn't easy to fix?

I think it's more that we don't like to have that.  There are some
optimization passes that do not expect SSA variables as bases
of (nested) tcc_reference ops.  We obviously have a few exceptions
for non-nested {REAL,IMAG}PART_EXPR, BIT_FIELD_REF and
VIEW_CONVERT_EXRP.  For the case in question it would be
1) ARRAY_REF, and 2) nested (because SSA names never have array type).

For IL "niceyness" iff we want a variable-index operation for
vector types I'd rather have non-nested tcc_reference here and
allow ARRAY_REF to operate on VECTOR_TYPE directly.

So yes, at this point that isn't easy to fix.

> And COMPONENT_REF has the same problem?

Yes.  SSA names never have record or union type.
 
> > Interestingly the C++ frontend avoids this issue
> > by emitting the following GENERIC instead:
> > 
> > ... = (i & 1) != 0 ? VIEW_CONVERT_EXPR(inv)[i] :
> > VIEW_CONVERT_EXPR(src)[i];
> 
> Yes, because in C++ ?: of two lvalues is an lvalue.

Ah, so maybe one could reproduce with a mixed lvalue / rvalue.
No, that ends up with

(i & 1) != 0 ? VIEW_CONVERT_EXPR(inv)[i] : NON_LVALUE_EXPR 
(NON_LVALUE_EXPR )[i]>

then.

> > The proposed patch below fixes things up when using an rvalue
> > as the base is OK by emitting a copy from a register base to a
> > non-register one.  The ?: as lvalue extension seems to be gone
> > for C, C++ again unwraps the COND_EXPR in that case.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > OK?
> 
> OK, assuming "yes" answers to my questions above.

So what eventually might work is have the C frontend produce IL
similat to the C++ FE.  But then I'm not really sure that
the COND_EXPR case is the only one that requires special treatment.
It is really the gimplify_compound_lval outer refs speciality
that we do not communicate to the base gimplification, so
conceptually the fix is correct.

I've pushed the change now.

Thanks,
Richard.

> > Thanks,
> > Richard.
> > 
> > 2022-02-11  Richard Biener  
> > 
> >  PR middle-end/104497
> >  * gimplify.cc (gimplify_compound_lval): Make sure the
> >  base is a non-register if needed and possible.
> > 
> > * c-c++-common/torture/pr104497.c: New testcase.
> > ---
> >   gcc/gimplify.cc   | 17 ++---
> >   gcc/testsuite/c-c++-common/torture/pr104497.c | 12 
> >   2 files changed, 26 insertions(+), 3 deletions(-)
> >   create mode 100644 gcc/testsuite/c-c++-common/torture/pr104497.c
> > 
> > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> > index 8d676fb96c8..cdf1ccbe48b 100644
> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -250,6 +250,7 @@ static enum gimplify_status gimplify_compound_expr (tree
> > *, gimple_seq *, bool);
> >   static hash_map *oacc_declare_returns;
> >   static enum gimplify_status gimplify_expr (tree *, gimple_seq *,
> >   gimple_seq *,
> >bool (*) (tree), fallback_t, bool);
> > +static void prepare_gimple_addressable (tree *, gimple_seq *);
> >   
> >   /* Shorter alias name for the above function for use in gimplify.cc
> >  only.  */
> > @@ -3126,10 +3127,12 @@ gimplify_compound_lval (tree *expr_p, gimple_seq
> > *pre_p, gimple_seq *post_p,
> >gimplified before gimplifying the size expressions.
> >   
> >So we do this in three steps.  First we deal with variable
> > - bounds, sizes, and positions, then we gimplify the base,
> > - then we deal with the annotations for any variables in the
> > - components and any indices, from left to right.  */
> > + bounds, sizes, and positions, then we gimplify the base and
> > + ensure it is memory if needed, then we deal with the annotations
> > + for any variables in the components and any indices, from left
> > + to right.  */
> >   +  bool need_non_reg = false;
> > for (i = expr_stack.length () - 1; i >= 0; i--)
> >   {
> > tree t = expr_stack[i];
> > @@ -3165,6 +3168,7 @@ gimplify_compound_lval (tree *expr_p, gimple_seq
> > *pre_p, gimple_seq *post_p,
> >   TREE_OPERAND (t, 3) = elmt_size;
> > }
> > }
> > + need_non_reg = true;
> >}
> >  else if (TREE_CODE (t) == COMPONENT_REF)
> > {
> > @@ -3186,6 +3190,7 @@ gimplify_compound_lval (tree *expr_p, gimple_seq
> > *pre_p, gimple_seq *post_p,
> >   TREE_OPERAND (t, 2) = offset;
> > }
> > }
> > + need_non_reg = true;
> >}
> >   }
> >   @@ -3196,6 +3201,12 @@