[PATCH v1] RISC-V: Refactor RVV frm_mode attr for rounding mode intrinsic

2023-08-05 Thread Pan Li via Gcc-patches
From: Pan Li 

The frm_mode attr has some assumptions for each define insn as below.

1. The define insn has at least 9 operands.
2. The operands[9] must be frm reg.
3. The operands[9] must be const int.

Actually, the frm operand can be operands[8], operands[9] or
operands[10], and not all the define insn has frm operands.

This patch would like to refactor frm and eliminate the above
assumptions, as well as unblock the underlying rounding mode intrinsic
API support.

After refactor, the default frm will be none, and the selected insn type
will be dyn. For the floating point which honors the frm, we will
set the frm_mode attr explicitly in define_insn.

Passed both the riscv.exp and rvv.exp for rv32/rv64 tests.

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (get_frm_mode): Remove operand
assumptions.
* config/riscv/riscv-v.cc (get_frm_mode): New function.
* config/riscv/riscv-vector-builtins.cc
(function_expander::use_ternop_insn):
* config/riscv/vector.md: Set frm_mode attr explicitly.
---
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   |  28 
 gcc/config/riscv/riscv-vector-builtins.cc |  22 ++-
 gcc/config/riscv/vector.md| 170 ++
 4 files changed, 159 insertions(+), 62 deletions(-)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 324991e2619..33f7cb1d670 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -236,6 +236,7 @@ bool check_builtin_call (location_t, vec, 
unsigned int,
   tree, unsigned int, tree *);
 bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
 bool legitimize_move (rtx, rtx);
+int get_frm_mode (rtx);
 void emit_vlmax_vsetvl (machine_mode, rtx);
 void emit_hard_vlmax_vsetvl (machine_mode, rtx);
 void emit_vlmax_insn (unsigned, int, rtx *, rtx = 0);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 278452b9e05..d5fb8611d6e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1513,6 +1513,34 @@ expand_const_vector (rtx target, rtx src)
 gcc_unreachable ();
 }
 
+/* Get the frm mode with given CONST_INT rtx, the default mode is
+   FRM_MODE_DYN.  */
+int
+get_frm_mode (rtx operand)
+{
+  gcc_assert (CONST_INT_P (operand));
+
+  switch (INTVAL (operand))
+{
+case FRM_RNE:
+  return FRM_MODE_RNE;
+case FRM_RTZ:
+  return FRM_MODE_RTZ;
+case FRM_RDN:
+  return FRM_MODE_RDN;
+case FRM_RUP:
+  return FRM_MODE_RUP;
+case FRM_RMM:
+  return FRM_MODE_RMM;
+case FRM_DYN:
+  return FRM_MODE_DYN;
+default:
+  return FRM_MODE_DYN;
+}
+
+  gcc_unreachable ();
+}
+
 /* Expand a pre-RA RVV data move from SRC to DEST.
It expands move for RVV fractional vector modes.  */
 bool
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 528dca7ae85..abab06c00ed 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -3730,17 +3730,29 @@ function_expander::use_ternop_insn (bool vd_accum_p, 
insn_code icode)
 }
 
   for (int argno = arg_offset; argno < call_expr_nargs (exp); argno++)
-add_input_operand (argno);
+{
+  if (base->has_rounding_mode_operand_p ()
+ && argno == call_expr_nargs (exp) - 2)
+   {
+ /* Since the rounding mode argument position is not consistent with
+the instruction pattern, we need to skip rounding mode argument
+here.  */
+ continue;
+   }
+  add_input_operand (argno);
+}
 
   add_input_operand (Pmode, get_tail_policy_for_pred (pred));
   add_input_operand (Pmode, get_mask_policy_for_pred (pred));
   add_input_operand (Pmode, get_avl_type_rtx (avl_type::NONVLMAX));
 
-  /* TODO: Currently, we don't support intrinsic that is modeling rounding 
mode.
- We add default rounding mode for the intrinsics that didn't model rounding
- mode yet.  */
+  if (base->has_rounding_mode_operand_p ())
+add_input_operand (call_expr_nargs (exp) - 2);
+
+  /* The RVV floating-point only support dynamic rounding mode in the
+ FRM register.  */
   if (opno != insn_data[icode].n_generator_args)
-add_input_operand (Pmode, const0_rtx);
+add_input_operand (Pmode, gen_int_mode (riscv_vector::FRM_DYN, Pmode));
 
   return generate_insn (icode);
 }
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 750b2de8df9..db3ee105ef4 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -867,26 +867,8 @@ (define_attr "vxrm_mode" "rnu,rne,rdn,rod,none"
 ;; Defines rounding mode of an floating-point operation.
 (define_attr "frm_mode" "rne,rtz,rdn,rup,rmm,dyn,dyn_exit,dyn_call,none"
   (cond [(eq_attr "type" "vfalu,vfwalu,vfmul,vfdiv,vfwmul")
- (cond
-  [(match_test "INTVAL 

Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls

2023-08-05 Thread David Malcolm via Gcc-patches
On Sun, 2023-08-06 at 02:28 +0530, Prathamesh Kulkarni via Gcc-patches
wrote:
> On Fri, 4 Aug 2023 at 23:28, Bradley Lucier via Gcc-patches
>  wrote:

Hi Bradley and Prathamesh...

> > 
> > The patch at the end adds a warning when a tail/sibling call cannot
> > be
> > optimized for various reasons.
> > 
> > I built and tested GCC with and without the patch with
> > configuration
> > 
> > Configured with: ../../gcc-mainline/configure --enable-languages=c
> > --disable-multilib --prefix=/pkgs/gcc-mainline --disable-werror
> > 
> > There were some changes in the test results, but I can't say that
> > they
> > look substantive:
> > 

[...]

> > 
> > to test the new warning.  The warnings are of the form, e.g.,
> > 
> > ../../../gcc-mainline/gcc/tree-vect-stmts.cc:11990:44: warning:
> > cannot
> > apply sibling-call optimization: callee required more stack slots
> > than
> > the caller [-Wdisabled-optimization]
> > 
> > These are the number of times this warning was triggered building
> > stage1:
> > 
> > grep warning: build.log | grep sibling | sed 's/^.*://' | sort |
> > uniq -c
> >  259  callee required more stack slots than the caller
> > [-Wdisabled-optimization]
> >   43  callee returns a structure [-Wdisabled-optimization]
> > 
> > If this patch is OK, someone else will need to commit it for me.
> > 
> > Brad
> > 
> > gcc/Changelog
> > 
> >     * calls.cc (maybe_complain_about_tail_call) Add warning
> > when
> >     tail or sibling call cannot be optimized.
> Hi Bradley,
> I don't have comments on the patch, but a new warning will also
> require a corresponding entry in doc/invoke.texi.

To nitpick, this isn't a new warning; the patch is extending an
existing warning.  Looking at the existing entry for that warning I
see:

@opindex Wdisabled-optimization
@opindex Wno-disabled-optimization
@item -Wdisabled-optimization
Warn if a requested optimization pass is disabled.  This warning does
not generally indicate that there is anything wrong with your code; it
merely indicates that GCC's optimizers are unable to handle the code
effectively.  Often, the problem is that your code is too big or too
complex; GCC refuses to optimize programs when the optimization
itself is likely to take inordinate amounts of time.

...which arguably fits the new functionality.  Though I don't know how
the optimizer maintainers feel about it.  Also, as we add more stuff to
this warning, would users need more fine-grained control over which
things for the optimizer to complain about?  I'm not sure.

> 
> Thanks,
> Prathamesh
> > 
> > diff --git a/gcc/calls.cc b/gcc/calls.cc
> > index 1f3a6d5c450..b95c876fda8 100644
> > --- a/gcc/calls.cc
> > +++ b/gcc/calls.cc
> > @@ -1242,10 +1242,12 @@ void
> >   maybe_complain_about_tail_call (tree call_expr, const char
> > *reason)
> >   {
> >     gcc_assert (TREE_CODE (call_expr) == CALL_EXPR);
> > -  if (!CALL_EXPR_MUST_TAIL_CALL (call_expr))
> > -    return;
> > -
> > -  error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s",
> > reason);
> > +  if (CALL_EXPR_MUST_TAIL_CALL (call_expr))
> > +    error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s",
> > reason);

The existing code use error_at, passing it the location of the
call_expr...

> > +  else if (flag_optimize_sibling_calls)
> > +    warning (OPT_Wdisabled_optimization,
> > + "cannot apply sibling-call optimization: %s",
> > reason);

...but the warning branch uses "warning", which implicitly uses the
input_location global variable.  Is the warning reported at the correct
place?  It's better to use warning_at and pass it the location at which
the warning should be emitted.

The patch doesn't add any test cases, but I imagine any such cases
would be very target-dependent (did I add any to my libgccjit version
of this way back when?)

Thanks for the patch; hope this is constructive
Dave



> > +  return;
> >   }
> > 
> >   /* Fill in ARGS_SIZE and ARGS array based on the parameters found
> > in
> > 
> > 
> 



Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls

2023-08-05 Thread Prathamesh Kulkarni via Gcc-patches
On Sun, 6 Aug 2023 at 03:07, Bradley Lucier  wrote:
>
> On 8/5/23 4:58 PM, Prathamesh Kulkarni wrote:
> > I don't have comments on the patch, but a new warning will also
> > require a corresponding entry in doc/invoke.texi.
>
> Thank you for your comment.
>
> -Wdisabled-optimization is an established warning, it's just that I'd
> like it to apply in another circumstance.  Maybe that doesn't need new
> documentation.
Oops I misread your patch as adding a new warning :/
Sorry for the noise.

Best Regards,
Prathamesh
>
> Brad Lucier


Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls

2023-08-05 Thread Bradley Lucier via Gcc-patches

On 8/5/23 4:58 PM, Prathamesh Kulkarni wrote:

I don't have comments on the patch, but a new warning will also
require a corresponding entry in doc/invoke.texi.


Thank you for your comment.

-Wdisabled-optimization is an established warning, it's just that I'd 
like it to apply in another circumstance.  Maybe that doesn't need new 
documentation.


Brad Lucier


Re: [PATCH] Add -Wdisabled-optimization warning for not optimizing sibling calls

2023-08-05 Thread Prathamesh Kulkarni via Gcc-patches
On Fri, 4 Aug 2023 at 23:28, Bradley Lucier via Gcc-patches
 wrote:
>
> The patch at the end adds a warning when a tail/sibling call cannot be
> optimized for various reasons.
>
> I built and tested GCC with and without the patch with configuration
>
> Configured with: ../../gcc-mainline/configure --enable-languages=c
> --disable-multilib --prefix=/pkgs/gcc-mainline --disable-werror
>
> There were some changes in the test results, but I can't say that they
> look substantive:
>
> diff -C 2 summary.log ../gcc-mainline
> *** summary.log Thu Aug  3 22:56:13 2023
> --- ../gcc-mainline/summary.log Thu Aug  3 19:42:33 2023
> ***
> *** 14,22 
> === g++ Summary ===
>
> ! # of expected passes  239234
># of unexpected failures 5
># of expected failures   2087
> ! # of unsupported tests10566
> ! /home/lucier/programs/gcc/objdirs/gcc-mainline-new/gcc/xg++  version
> 14.0.0 20230802 (experimental) (GCC)
>
> === gcc tests ===
> --- 14,22 
> === g++ Summary ===
>
> ! # of expected passes  239262
># of unexpected failures 5
># of expected failures   2087
> ! # of unsupported tests10562
> ! /home/lucier/programs/gcc/objdirs/gcc-mainline/gcc/xg++  version
> 14.0.0 20230802 (experimental) (GCC)
>
> === gcc tests ===
> ***
> *** 155,164 
> === gcc Summary ===
>
> ! # of expected passes  192553
># of unexpected failures 109
># of unexpected successes19
># of expected failures   1506
> ! # of unsupported tests2623
> ! /home/lucier/programs/gcc/objdirs/gcc-mainline-new/gcc/xgcc  version
> 14.0.0 20230802 (experimental) (GCC)
>
> === libatomic tests ===
> --- 155,164 
> === gcc Summary ===
>
> ! # of expected passes  192563
># of unexpected failures 109
># of unexpected successes19
># of expected failures   1506
> ! # of unsupported tests2619
> ! /home/lucier/programs/gcc/objdirs/gcc-mainline/gcc/xgcc  version
> 14.0.0 20230802 (experimental) (GCC)
>
> === libatomic tests ===
>
> I then configured and built GCC with
>
>   ../../gcc-mainline/configure CXX="/pkgs/gcc-mainline-new/bin/g++
> -Wdisabled-optimization" --enable-languages=c --disable-multilib
> --prefix=/pkgs/gcc-mainline-test --disable-werror --disable-bootstrap
>
> to test the new warning.  The warnings are of the form, e.g.,
>
> ../../../gcc-mainline/gcc/tree-vect-stmts.cc:11990:44: warning: cannot
> apply sibling-call optimization: callee required more stack slots than
> the caller [-Wdisabled-optimization]
>
> These are the number of times this warning was triggered building stage1:
>
> grep warning: build.log | grep sibling | sed 's/^.*://' | sort | uniq -c
>  259  callee required more stack slots than the caller
> [-Wdisabled-optimization]
>   43  callee returns a structure [-Wdisabled-optimization]
>
> If this patch is OK, someone else will need to commit it for me.
>
> Brad
>
> gcc/Changelog
>
> * calls.cc (maybe_complain_about_tail_call) Add warning when
> tail or sibling call cannot be optimized.
Hi Bradley,
I don't have comments on the patch, but a new warning will also
require a corresponding entry in doc/invoke.texi.

Thanks,
Prathamesh
>
> diff --git a/gcc/calls.cc b/gcc/calls.cc
> index 1f3a6d5c450..b95c876fda8 100644
> --- a/gcc/calls.cc
> +++ b/gcc/calls.cc
> @@ -1242,10 +1242,12 @@ void
>   maybe_complain_about_tail_call (tree call_expr, const char *reason)
>   {
> gcc_assert (TREE_CODE (call_expr) == CALL_EXPR);
> -  if (!CALL_EXPR_MUST_TAIL_CALL (call_expr))
> -return;
> -
> -  error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
> +  if (CALL_EXPR_MUST_TAIL_CALL (call_expr))
> +error_at (EXPR_LOCATION (call_expr), "cannot tail-call: %s", reason);
> +  else if (flag_optimize_sibling_calls)
> +warning (OPT_Wdisabled_optimization,
> + "cannot apply sibling-call optimization: %s", reason);
> +  return;
>   }
>
>   /* Fill in ARGS_SIZE and ARGS array based on the parameters found in
>
>


[PATCH] MATCH: Extend min_value/max_value to pointer types

2023-08-05 Thread Andrew Pinski via Gcc-patches
Since we already had the infrastructure to optimize
`(x == 0) && (x > y)` to false for integer types,
this extends the same to pointer types as indirectly
requested by PR 96695.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/96695
* match.pd (min_value, max_value): Extend to
pointer types too.

gcc/testsuite/ChangeLog:

PR tree-optimization/96695
* gcc.dg/pr96695-1.c: New test.
* gcc.dg/pr96695-10.c: New test.
* gcc.dg/pr96695-11.c: New test.
* gcc.dg/pr96695-12.c: New test.
* gcc.dg/pr96695-2.c: New test.
* gcc.dg/pr96695-3.c: New test.
* gcc.dg/pr96695-4.c: New test.
* gcc.dg/pr96695-5.c: New test.
* gcc.dg/pr96695-6.c: New test.
* gcc.dg/pr96695-7.c: New test.
* gcc.dg/pr96695-8.c: New test.
* gcc.dg/pr96695-9.c: New test.
---
 gcc/match.pd  |  6 --
 gcc/testsuite/gcc.dg/pr96695-1.c  | 18 ++
 gcc/testsuite/gcc.dg/pr96695-10.c | 20 
 gcc/testsuite/gcc.dg/pr96695-11.c | 18 ++
 gcc/testsuite/gcc.dg/pr96695-12.c | 18 ++
 gcc/testsuite/gcc.dg/pr96695-2.c  | 18 ++
 gcc/testsuite/gcc.dg/pr96695-3.c  | 20 
 gcc/testsuite/gcc.dg/pr96695-4.c  | 21 +
 gcc/testsuite/gcc.dg/pr96695-5.c  | 19 +++
 gcc/testsuite/gcc.dg/pr96695-6.c  | 20 
 gcc/testsuite/gcc.dg/pr96695-7.c  | 19 +++
 gcc/testsuite/gcc.dg/pr96695-8.c  | 19 +++
 gcc/testsuite/gcc.dg/pr96695-9.c  | 20 
 13 files changed, 234 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-10.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-11.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-12.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-2.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-3.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-4.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-5.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-6.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-7.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-8.c
 create mode 100644 gcc/testsuite/gcc.dg/pr96695-9.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 2278029d608..de54b17abba 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -2733,12 +2733,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 (match min_value
  INTEGER_CST
- (if (INTEGRAL_TYPE_P (type)
+ (if ((INTEGRAL_TYPE_P (type)
+   || POINTER_TYPE_P(type))
   && wi::eq_p (wi::to_wide (t), wi::min_value (type)
 
 (match max_value
  INTEGER_CST
- (if (INTEGRAL_TYPE_P (type)
+ (if ((INTEGRAL_TYPE_P (type)
+   || POINTER_TYPE_P(type))
   && wi::eq_p (wi::to_wide (t), wi::max_value (type)
 
 /* x >  y  &&  x != XXX_MIN  -->  x > y
diff --git a/gcc/testsuite/gcc.dg/pr96695-1.c b/gcc/testsuite/gcc.dg/pr96695-1.c
new file mode 100644
index 000..d4287ab4c8c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr96695-1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ifcombine" } */
+
+#include 
+
+_Bool and1(unsigned *x, unsigned *y)
+{
+  /* x > y && x != 0 --> x > y */
+  return x > y && x != 0;
+}
+
+_Bool and2(unsigned *x, unsigned *y)
+{
+  /* x < y && x != -1 --> x < y */
+  return x < y && x != (unsigned*)-1;
+}
+
+/* { dg-final { scan-tree-dump-not " != " "ifcombine" } } */
diff --git a/gcc/testsuite/gcc.dg/pr96695-10.c 
b/gcc/testsuite/gcc.dg/pr96695-10.c
new file mode 100644
index 000..dfe752526f0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr96695-10.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+#include 
+
+_Bool or1(unsigned *x, unsigned *y)
+{
+  /* x <= y || x != 0 --> true */
+  return x <= y || x != 0;
+}
+
+_Bool or2(unsigned *x, unsigned *y)
+{
+  /* x >= y || x != -1 --> true */
+  return x >= y || x != (unsigned*)-1;
+}
+
+/* { dg-final { scan-tree-dump-not " != " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " <= " "optimized" } } */
+/* { dg-final { scan-tree-dump-not " >= " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/pr96695-11.c 
b/gcc/testsuite/gcc.dg/pr96695-11.c
new file mode 100644
index 000..d3c36168b98
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr96695-11.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ifcombine" } */
+
+#include 
+
+_Bool or1(unsigned *x, unsigned *y)
+{
+  /* x <= y || x == 0 --> x <= y */
+  return x <= y || x == 0;
+}
+
+_Bool or2(unsigned *x, unsigned *y)
+{
+  /* x >= y || x == -1 --> x >= y */
+  return x >= y || x == (unsigned*)-1;
+}
+
+/* { dg-final { scan-tree-dump-not " == " "ifcombine" } } */
diff --git a/gcc/testsuite/gcc.dg/pr96695-12.c 
b/gcc/testsuite/gcc.dg/pr96695-12.c
new file mode 100644
index 

[C PATCH] Support typename as selector in _Generic

2023-08-05 Thread Martin Uecker via Gcc-patches


Clang now has an extension which accepts a typename for
_Generic.  This is simple to implement and is useful.

Do we want this?

Clang calls it a "Clang extension" in the pedantic
warning.  I changed it to "an extension"  I am not
sure what the policy is.

Do we need an extra warning option? Clang has one.

No documentation so far.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.

Martin


c: Support typename as selector in _Generic

Support typenames as first argument to _Generic which is an
extension supported by Clang. It makes it easier to test
for types with qualifiers in combination with typeof.

gcc/c/:
* c-parser.cc (c_parser_generic_selection): Support typename
in _Generic selector.

gcc/testsuite/:
* gnu2x-generic.c: New test.


diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index 57a01dc2fa3..9aea2425294 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -9312,30 +9312,51 @@ c_parser_generic_selection (c_parser *parser)
   if (!parens.require_open (parser))
 return error_expr;
 
-  c_inhibit_evaluation_warnings++;
   selector_loc = c_parser_peek_token (parser)->location;
-  selector = c_parser_expr_no_commas (parser, NULL);
-  selector = default_function_array_conversion (selector_loc, selector);
-  c_inhibit_evaluation_warnings--;
 
-  if (selector.value == error_mark_node)
+  if (c_token_starts_typename (c_parser_peek_token (parser)))
 {
-  c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
-  return selector;
-}
-  mark_exp_read (selector.value);
-  selector_type = TREE_TYPE (selector.value);
-  /* In ISO C terms, rvalues (including the controlling expression of
- _Generic) do not have qualified types.  */
-  if (TREE_CODE (selector_type) != ARRAY_TYPE)
-selector_type = TYPE_MAIN_VARIANT (selector_type);
-  /* In ISO C terms, _Noreturn is not part of the type of expressions
- such as , but in GCC it is represented internally as a type
- qualifier.  */
-  if (FUNCTION_POINTER_TYPE_P (selector_type)
-  && TYPE_QUALS (TREE_TYPE (selector_type)) != TYPE_UNQUALIFIED)
-selector_type
-  = build_pointer_type (TYPE_MAIN_VARIANT (TREE_TYPE (selector_type)));
+  /* Language extension introduced by Clang.  */
+  pedwarn (selector_loc, OPT_Wpedantic, "passing a type argument as "
+  "first argument to %<_Generic%> is an extension");
+  struct c_type_name *type_name;
+  c_inhibit_evaluation_warnings++;
+  type_name = c_parser_type_name (parser);
+  c_inhibit_evaluation_warnings--;
+  if (NULL == type_name)
+   {
+ c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
+ return error_expr;
+   }
+  /* Qualifiers are preserved.  */
+  selector_type = groktypename (type_name, NULL, NULL);
+}
+  else
+{
+  c_inhibit_evaluation_warnings++;
+  selector = c_parser_expr_no_commas (parser, NULL);
+  selector = default_function_array_conversion (selector_loc, selector);
+  c_inhibit_evaluation_warnings--;
+
+  if (selector.value == error_mark_node)
+   {
+ c_parser_skip_until_found (parser, CPP_CLOSE_PAREN, NULL);
+ return selector;
+   }
+  mark_exp_read (selector.value);
+  selector_type = TREE_TYPE (selector.value);
+  /* In ISO C terms, rvalues (including the controlling expression of
+_Generic) do not have qualified types.  */
+  if (TREE_CODE (selector_type) != ARRAY_TYPE)
+   selector_type = TYPE_MAIN_VARIANT (selector_type);
+  /* In ISO C terms, _Noreturn is not part of the type of expressions
+such as , but in GCC it is represented internally as a type
+qualifier.  */
+  if (FUNCTION_POINTER_TYPE_P (selector_type)
+ && TYPE_QUALS (TREE_TYPE (selector_type)) != TYPE_UNQUALIFIED)
+  selector_type
+   = build_pointer_type (TYPE_MAIN_VARIANT (TREE_TYPE (selector_type)));
+}
 
   if (!c_parser_require (parser, CPP_COMMA, "expected %<,%>"))
 {
@@ -9401,7 +9422,7 @@ c_parser_generic_selection (c_parser *parser)
   assoc.expression = c_parser_expr_no_commas (parser, NULL);
 
   if (!match)
- c_inhibit_evaluation_warnings--;
+   c_inhibit_evaluation_warnings--;
 
   if (assoc.expression.value == error_mark_node)
{
diff --git a/gcc/testsuite/gcc.dg/gnu2x-generic.c 
b/gcc/testsuite/gcc.dg/gnu2x-generic.c
new file mode 100644
index 000..82b09578072
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/gnu2x-generic.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+
+_Static_assert(_Generic(const int, const int: 1, int: 0), "");
+_Static_assert(_Generic(  int, const int: 0, int: 1), "");
+_Static_assert(_Generic(int[4], int[4]: 1), "");
+_Static_assert(_Generic(typeof(int[4]), int[4]: 1), "");
+
+void foo(int n)
+{
+   _Static_assert(_Generic(int[n++], int[4]: 1), "");
+}
+
+#pragma GCC diagnostic warning "-Wpedantic"

[committed] c: Less warnings for parameters declared as arrays [PR98536]

2023-08-05 Thread Martin Uecker via Gcc-patches


I splitted up the patch into two parts and committed only
the FE parts which were already approved and the tests.
This solves one of the two issues.

Bootstrapped and regression tested on x86_64-pc-linux-gnu.


Less warnings for parameters declared as arrays [PR98536]

To avoid false positivies, tune the warnings for parameters declared
as arrays with size expressions.  Do not warn when more bounds are
specified in the declaration than before.

PR c/98536

c-family/
* c-warn.cc (warn_parm_array_mismatch): Do not warn if more
bounds are specified.

gcc/testsuite:
* gcc.dg/Wvla-parameter-4.c: Adapt test.
* gcc.dg/attr-access-2.c: Adapt test.


diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index d4d62c48b20..b7c5d7c01a2 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -3599,23 +3599,13 @@ warn_parm_array_mismatch (location_t origloc, tree 
fndecl, tree newparms)
  continue;
}
 
- if (newunspec != curunspec)
+ if (newunspec > curunspec)
{
  location_t warnloc = newloc, noteloc = origloc;
  const char *warnparmstr = newparmstr.c_str ();
  const char *noteparmstr = curparmstr.c_str ();
  unsigned warnunspec = newunspec, noteunspec = curunspec;
 
- if (newunspec < curunspec)
-   {
- /* If the new declaration has fewer unspecified bounds
-point the warning to the previous declaration to make
-it clear that that's the one to change.  Otherwise,
-point it to the new decl.  */
- std::swap (warnloc, noteloc);
- std::swap (warnparmstr, noteparmstr);
- std::swap (warnunspec, noteunspec);
-   }
  if (warning_n (warnloc, OPT_Wvla_parameter, warnunspec,
 "argument %u of type %s declared with "
 "%u unspecified variable bound",
@@ -3643,14 +3633,10 @@ warn_parm_array_mismatch (location_t origloc, tree 
fndecl, tree newparms)
}
 
   /* Iterate over the lists of VLA variable bounds, comparing each
-pair for equality, and diagnosing mismatches.  The case of
-the lists having different lengths is handled above so at
-this point they do .  */
-  for (tree newvbl = newa->size, curvbl = cura->size; newvbl;
+pair for equality, and diagnosing mismatches.  */
+  for (tree newvbl = newa->size, curvbl = cura->size; newvbl && curvbl;
   newvbl = TREE_CHAIN (newvbl), curvbl = TREE_CHAIN (curvbl))
{
- gcc_assert (curvbl);
-
  tree newpos = TREE_PURPOSE (newvbl);
  tree curpos = TREE_PURPOSE (curvbl);
 
diff --git a/gcc/testsuite/gcc.dg/Wvla-parameter-4.c 
b/gcc/testsuite/gcc.dg/Wvla-parameter-4.c
index 599ad19a3e4..f35faea361a 100644
--- a/gcc/testsuite/gcc.dg/Wvla-parameter-4.c
+++ b/gcc/testsuite/gcc.dg/Wvla-parameter-4.c
@@ -12,11 +12,6 @@ typedef int IA3[3];
 /* Verify the warning points to the declaration with more unspecified
bounds, guiding the user to specify them rather than making them all
unspecified.  */
-void* f_pIA3ax (IA3 *x[*]); // { dg-warning "argument 1 of type 
'int \\\(\\\*\\\[\\\*]\\\)\\\[3]' .aka '\[^\n\r\}\]+'. declared with 1 
unspecified variable bound" }
-void* f_pIA3ax (IA3 *x[*]);
-void* f_pIA3ax (IA3 *x[n]); // { dg-message "subsequently declared 
as 'int \\\(\\\*\\\[n]\\\)\\\[3]' with 0 unspecified variable bounds" "note" }
-void* f_pIA3ax (IA3 *x[n]) { return x; }
-
 
 void* f_pIA3an (IA3 *x[n]);  // { dg-message "previously declared 
as 'int \\\(\\\*\\\[n]\\\)\\\[3]' with 0 unspecified variable bounds" "note" }
 void* f_pIA3an (IA3 *x[n]);
diff --git a/gcc/testsuite/gcc.dg/attr-access-2.c 
b/gcc/testsuite/gcc.dg/attr-access-2.c
index 76baddffc9f..616b7a9527c 100644
--- a/gcc/testsuite/gcc.dg/attr-access-2.c
+++ b/gcc/testsuite/gcc.dg/attr-access-2.c
@@ -60,16 +60,6 @@ RW (2, 1) void f10 (int n, char a[n])   // { dg-warning 
"attribute 'access *\\\(
 // { dg-warning "argument 2 of type 
'char\\\[n]' declared as a variable length array"  "" { target *-*-* } .-1 }
 { (void) (void) }
 
-
-/* The following is diagnosed to point out declarations with the T[*]
-   form in headers where specifying the bound is just as important as
-   in the definition (to detect bugs).  */
-  void f11 (int, char[*]);  // { dg-warning "argument 2 of type 
'char\\\[\\\*\\\]' declared with 1 unspecified variable bound" }
-  void f11 (int m, char a[m]);  // { dg-message "subsequently declared 
as 'char\\\[m]' with 0 unspecified variable bounds" "note" }
-RW (2, 1) void f11 (int n, char arr[n]) // { dg-message "subsequently declared 
as 'char\\\[n]' with 0 unspecified 

[no subject]

2023-08-05 Thread xeon khjj via Gcc-patches
>From 104178c3912314f41a61a316e7c3bc0487ea8f3c Mon Sep 17 00:00:00 2001
From: K1ZeKaTo 
Date: Sat, 5 Aug 2023 14:50:19 +
Subject: [PATCH] Make the rvalue work fine.

It seems not considering the rvalue in the last version.

---
 libstdc++-v3/include/bits/iterator_concepts.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/bits/iterator_concepts.h
b/libstdc++-v3/include/bits/iterator_concepts.h
index e32e94dc9fc..8bb7c0fb8c0 100644
--- a/libstdc++-v3/include/bits/iterator_concepts.h
+++ b/libstdc++-v3/include/bits/iterator_concepts.h
@@ -86,14 +86,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   concept __can_reference = requires { typename __with_ref<_Tp>; };

 template
-  concept __dereferenceable = requires(_Tp& __t)
+  concept __dereferenceable = requires(_Tp&& __t)
{
- { *__t } -> __can_reference;
+ { *static_cast<_Tp&&>(__t) } -> __can_reference;
};
   } // namespace __detail

   template<__detail::__dereferenceable _Tp>
-using iter_reference_t = decltype(*std::declval<_Tp&>());
+using iter_reference_t = decltype(*std::declval<_Tp&&>());

   namespace ranges
   {
@@ -116,7 +116,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
template
  requires __adl_imove<_Tp>
  struct __result<_Tp>
- { using type = decltype(iter_move(std::declval<_Tp>())); };
+ { using type = decltype(iter_move(std::declval<_Tp&&>())); };

template
  requires (!__adl_imove<_Tp>)
@@ -129,9 +129,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _S_noexcept()
  {
if constexpr (__adl_imove<_Tp>)
- return noexcept(iter_move(std::declval<_Tp>()));
+ return noexcept(iter_move(std::declval<_Tp&&>()));
else
- return noexcept(*std::declval<_Tp>());
+ return noexcept(*std::declval<_Tp&&>());
  }

   public:
@@ -148,9 +148,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
if constexpr (__adl_imove<_Tp>)
  return iter_move(static_cast<_Tp&&>(__e));
else if constexpr (is_lvalue_reference_v>)
- return static_cast<__type<_Tp>>(*__e);
+ return static_cast<__type<_Tp>>(*static_cast<_Tp&&>(__e));
else
- return *__e;
+ return *static_cast<_Tp&&>(__e);
  }
   };
 } // namespace __cust_imove
-- 
2.41.0


Re: [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond

2023-08-05 Thread Jeff Law




On 8/5/23 01:46, Xiao Zeng wrote:



The operands to the comparison need to be in DImode for rv64 and SImode
for rv32.  That's the X iterator.

After analyzing the rtl log, I can't agree more with this sentence.


Note the mode of the comparison
operands may be different than the mode of the destination.  ie, we
might have a 64bit comparison and produce a 32bit sign extended result
much like the setcc insns support.

This patch changes the 6 zicond patterns to use the X iterator on the
comparison inputs.  That at least makes the patterns correct and fixes
this particular testcase.   There's a few other lurking problems that
I'll address in additional patches.

Committed to the trunk,
Jeff


1 In any case, jeff's analysis is convincing, here I will add a little bit of 
my own analysis.

2 First, for the test cases:

foo(long long int x, long long int y) {
   if (((int)x | (int)y) != 0)
     return 6;
   return x + y;
}

look directly at the compared assembly code. This allows people to quickly
realize where the error occurred.

X_mode.s(right)                                                    
ANYI_mode.S(error)
10 foo:                                                                  10 foo:
       11         or      a5,a0,a1                                          11  
      or      a5,a0,a1
       12         sext.w  a5,a5                                            12   
     li      a4,6
       13         addw    a0,a0,a1                                      13      
  addw    a0,a0,a1
       14         li      a4,6
       15         czero.nez       a1,a0,a5                             14       
 czero.nez       a1,a0,a5
       16         czero.eqz       a0,a4,a5                             15       
 czero.eqz       a0,a4,a5
       17         or      a0,a0,a1                                          16  
      or      a0,a0,a1
       18         ret                                                           
 17        ret

3 You will find that the correct assembly is just one more assembly
instruction: sext.w  a5,a5, the rest of the two are exactly the same.

4 From the perspective of assembly instructions, the a5 value obtained
by sext.w a5, a5 may be different from the original a5 value, which leads
to errors in the test case.

Exactly.



5 However, it is difficult to directly see that an error has occurred
from the rtl log of gcc's passe.
Yes.  It's pretty subtle, but par for the course once subreg expressions 
are involved.  subregs are a major wart in the design/implementation of 
RTL due to them having fairly obscure semantics that are dependent on 
whether they are widening or narrowing subregs, if the inner object is a 
mem or something else, etc and how they interact with 
WORD_REGISTER_OPERATIONS.





6 I'm wondering about transforms like this:

In test.c.c.301r.ira
(insn 36 34 42 2 (set (reg:DI 153)
         (if_then_else:DI (eq:DI (subreg:SI (reg:DI 145) 0)
                 (const_int 0 [0]))
             (reg:DI 149)
             (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
      (expr_list:REG_DEAD (reg:DI 149)
         (nil)))
But it's already broken in this form.   There was a sign extension in 
the IL up through the combine pass.  In fact, I nearly made the argument 
that the bug was in combine's simplification of (sign_extend (subreg 
...)) to (subreg ...).


But ultimately the hardware semantics of risc-v are that comparisons 
look at the full register, plain and simple.  So a narrowing subreg to a 
sub-word mode is simply wrong i this context.  This is also consistent 
with the implementation of conditional branches and scc insns in the 
risc-v backend.




In test.c.c.302r.reload it becomes
(insn 36 34 42 2 (set (reg:DI 11 a1 [153])
         (if_then_else:DI (eq:DI (reg:SI 15 a5 [145])
                 (const_int 0 [0]))
             (reg:DI 10 a0 [149])
             (const_int 0 [0]))) "test.c":4:12 13599 {*czero.nez.disi}
      (nil))

Obviously, (subreg:SI (reg:DI 145) 0) is transformed into (reg:SI 15 a5 [145]) 
after
passing through reload pass. This conversion is wrong, why did gcc not warn?
The only reason subregs exist is because pseudo registers have a single 
mode -- which derives from the fact that there is precisely once 
instance of any given pseudo register.  See RTL sharing section of the 
internals manual.


A subreg expression allows us to look at a pseudo in a mode other than 
its natural mode.   Essentially its the same bits reinterpreted in 
another mode.  I'm over-simplifying a lot.  The full details of subreg 
semantics are documented in the RTL section of the internals manual.


WORD_REGISTER_OPERATIONS is a property of the target that (again 
over-simplifying) says that an operation such as an add may have a given 
size (say SImode) in the RTL, but the hardware performs the operation in 
its native word size (DImode for rv64).


While WORD_REGISTER_OPERATIONS has been around since the 90s, its 
semantics have always been a bit vague.  It's 

Re: [PATCH] core: Support heap-based trampolines

2023-08-05 Thread FX Coudert via Gcc-patches
Hi Richard,

Thanks for your feedback. Here is an amended version of the patch, taking into 
consideration your requests and the following discussion. There is no configure 
option for the libgcc part, and the documentation is amended. The patch is 
split into three commits for core, target and libgcc.

Currently regtesting on x86_64 linux and darwin (it was fine before I split up 
into three commits, so I’m re-testing to make sure I didn’t screw anything up).

OK to commit?
FX




0001-core-Support-heap-based-trampolines.patch
Description: Binary data


0002-target-Support-heap-based-trampolines.patch
Description: Binary data


0003-libgcc-support-heap-based-trampolines.patch
Description: Binary data


Re: [RFC] light expander sra for parameters and returns

2023-08-05 Thread Jiufu Guo via Gcc-patches


Hi,

Richard Biener  writes:

> On Thu, 3 Aug 2023, Jiufu Guo wrote:
>
>> 
>> Hi Richard,
>> 
>> Richard Biener  writes:
>> 
>> > On Tue, 1 Aug 2023, Jiufu Guo wrote:
>> >
>> >> 
>> >> Hi,
>> >> 
>> >> Richard Biener  writes:
>> >> 
>> >> > On Mon, 24 Jul 2023, Jiufu Guo wrote:
>> >> >
>> >> >> 
>> >> >> Hi Martin,
>> >> >> 
>> >> >> Not sure about your current option about re-using the ipa-sra code
>> >> >> in the light-expander-sra. And if anything I could input please
>> >> >> let me know.
>> >> >> 
>> >> >> And I'm thinking about the difference between the expander-sra, ipa-sra
>> >> >> and tree-sra. 1. For stmts walking, expander-sra has special behavior
>> >> >> for return-stmt, and also a little special on assign-stmt. And phi
>> >> >> stmts are not checked by ipa-sra/tree-sra. 2. For the access structure,
>> >> >> I'm also thinking if we need a tree structure; it would be useful when
>> >> >> checking overlaps, it was not used now in the expander-sra.
>> >> >> 
>> >> >> For ipa-sra and tree-sra, I notice that there is some similar code,
>> >> >> but of cause there are differences. While it seems the difference
>> >> >> is 'intended', for example: 1. when creating and accessing,
>> >> >> 'size != max_size' is acceptable in tree-sra but not for ipa-sra.
>> >> >> 2. 'AGGREGATE_TYPE_P' for ipa-sra is accepted for some cases, but
>> >> >> not ok for tree-ipa.  
>> >> >> I'm wondering if those slight difference blocks re-use the code
>> >> >> between ipa-sra and tree-sra.
>> >> >> 
>> >> >> The expander-sra may be more light, for example, maybe we can use
>> >> >> FOR_EACH_IMM_USE_STMT to check the usage of each parameter, and not
>> >> >> need to walk all the stmts.
>> >> >
>> >> > What I was hoping for is shared stmt-level analysis and a shared
>> >> > data structure for the "access"(es) a stmt performs.  Because that
>> >> > can come up handy in multiple places.  The existing SRA data
>> >> > structures could easily embed that subset for example if sharing
>> >> > the whole data structure of [IPA] SRA seems too unwieldly.
>> >> 
>> >> Understand.
>> >> The stmt-level analysis and "access" data structure are similar
>> >> between ipa-sra/tree-sra and the expander-sra.
>> >> 
>> >> I just update the patch, this version does not change the behaviors of
>> >> the previous version.  It is just cleaning/merging some functions only.
>> >> The patch is attached.
>> >> 
>> >> This version (and tree-sra/ipa-sra) is still using the similar
>> >> "stmt analyze" and "access struct"".  This could be extracted as
>> >> shared code.
>> >> I'm thinking to update the code to use the same "base_access" and
>> >> "walk function".
>> >> 
>> >> >
>> >> > With a stmt-leve API using FOR_EACH_IMM_USE_STMT would still be
>> >> > possible (though RTL expansion pre-walks all stmts anyway).
>> >> 
>> >> Yeap, I also notice that "FOR_EACH_IMM_USE_STMT" is not enough.
>> >> For struct parameters, walking stmt is needed.
>> >
>> > I think I mentioned this before, RTL expansion already
>> > pre-walks the whole function looking for variables it has to
>> > expand to the stack in discover_nonconstant_array_refs (which is
>> > now badly named), I'd appreciate if the "SRA" walk would piggy-back
>> > on that existing walk.
>> 
>> I may misunderstand your meaning about 'stmt-level analysis' and
>> 'pre-walk'.
>> I understand it as 'walk to analyze each stmt in each bb'.
>> In 'discover_nonconstant_array_refs', there is a 'walk_gimple_op':
>> 
>>  FOR_EACH_BB_FN (bb, cfun)
>>   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next ())
>>{
>> gimple *stmt = gsi_stmt (gsi);
>> if (!is_gimple_debug (stmt))
>>  {
>>   walk_gimple_op (stmt, discover_nonconstant_array_refs_r, );
>> 
>> Maybe, this 'walk_gimple_op' is what you mentioned as 'stmt-level
>> analysis' and the 'pre-walk'.  Is this right?
>> 
>> Here, 'discover_nonconstant_array_refs_r' is the callback of
>> 'walk_gimple_op'.
>> This callback analyses if an array is accessed through a variant index,
>> if so, the array must be put into the stack.
>> 
>> While it seems 'walk_gimple_op' is not ok for SRA analysis,
>> because, in the callback, it is hard to analyze an access is
>> 'read' or 'write' to an struct object. But the 'read/write' info
>> is needed for SRA.
>> 
>> 'walk_stmt_load_store_addr_ops' is another 'walk on stmt ops'.
>> This 'walk' is not used to analyze SRA access, the primary reason
>> would be: the load/store/addr parameter of the callback is a
>> DECL, the 'size' and 'offset' are stripped.
>> 
>> Currently, in tree-sra/ipa-sra/expand-sra, when analyzing stmt
>> for SRA access, stmt code is checked for 'return/assign/call/asm'.
>> And sub-expressions of these stmt(s) are analyzed.
>
> I realize the stmt walks cannot be shared but the per-stmt
> walk of the SRA analysis should still be done there simply for
> cache locality reasons (thus compile-time).

Thanks Richard!

For the meaning of per-stmt walk, I guess, it may look like below
fake 

Re: Re: [PATCH v3] [RISC-V] Generate Zicond instruction for select pattern with condition eq or neq to 0

2023-08-05 Thread Xiao Zeng
On Sat, Aug 05, 2023 at 05:31:00 AM  Jeff Law  wrote:
>
>On 8/1/23 19:38, Xiao Zeng wrote:
>> This patch recognizes Zicond patterns when the select pattern
>> with condition eq or neq to 0 (using eq as an example), namely:
>>
>> 1 rd = (rs2 == 0) ? non-imm : 0
>> 2 rd = (rs2 == 0) ? non-imm : non-imm
>> 3 rd = (rs2 == 0) ? reg : non-imm
>> 4 rd = (rs2 == 0) ? reg : reg
>>
>> gcc/ChangeLog:
>>
>>  * config/riscv/riscv.cc (riscv_expand_conditional_move): Recognize
>>  Zicond patterns
>>  * config/riscv/riscv.md: Recognize Zicond patterns through 
>>movcc
>So I've made minor adjustments to the remaining three cases.  First we
>need to check the code before optimizing the cases were one of the arms
>of the conditional move matches op0.
>
>I slightly adjusted the case for out of range constants.  Its better to
>check SMALL_OPERAND rather than testing for specific constants.  And
>when that triggers, we can just force the value into a register and
>continue as-is rather than recursing.
> 
These changes make the code more concise and readable. Thumbs up!

>The patch I'm committing fixes one comment typo (whitespace) and a bit
>of accidentally duplicated code I added in a prior commit.
>
>Next up Raphael's patches to handle nontrival conditionals by emiting an
>scc insn :-) 
It would be great to see other implementations for conditional execution. :-)

>
>Jeff
>
>ps.  I'm deferrring the testsuite bits until we sort out the costing
>problems.  THey're definitely not forgotten and I still use them in my
>local tree. 
Thank you Jeff, I look forward to a unified, complete and concise
implementation method for cost calculation as soon as possible.
 
Thanks
Xiao Zeng



Re: RISC-V: Folding memory for FP + constant case

2023-08-05 Thread Manolis Tsamis
Hi Jeff,

Thanks for all the info!
Then I'll prepare/test a patch that removes this regcprop limitation
and send it out.
I have already tested that this change is enough to make fmo optimize
leela as well.

Manolis

On Fri, Aug 4, 2023 at 7:23 PM Jeff Law  wrote:
>
>
>
> On 8/4/23 03:52, Manolis Tsamis wrote:
> > Hi all,
> >
> > It is true that regcprop currently does not propagate sp and hence
> > leela is not optimized, but from what I see this should be something
> > we can address.
> >
> > The reason that the propagation fails is this check that I have added
> > when I introduced maybe_copy_reg_attrs:
> >
> > else if (REG_POINTER (new_reg) != REG_POINTER (old_reg))
> >{
> >  /* Only a single instance of STACK_POINTER_RTX must exist and we cannot
> > modify it. Allow propagation if REG_POINTER for OLD_REG matches and
> > don't touch ORIGINAL_REGNO and REG_ATTRS. */
> >  return NULL_RTX;
> >}
> >
> > To be honest I did add this back then just to be on the safe side of
> > whether a mismatch in REG_POINTER after propagation would be an issue
> > (since the original regcprop had caused enough problems).
> No worries.  Obviously not propagating is the safe thing to do and if we
> find notable missed cases we can always refine the existing code like
> we're considering now.
>
> >
> > I see two ways to solve this and make fmo able to optimize leela as well:
> >   1) Remove the REG_POINTER check in regcprop if that is safe. My
> > understanding is that REG_POINTER is used as a hint and there would be
> > no correctness issues.
> REG_POINTER is meant to be conservatively correct.  If REG_POINTER is
> set, then the register must be a pointer to a valid memory location.  If
> REG_POINTER is not set, then it may or may not point to a valid memory
> location.  sp, fp, ap are by definition pointers and thus have
> REG_POINTER set.
>
> With that definition we can safely copy propagate away an sp->gpr copy
> from a REG_POINTER standpoint.
>
>
>
>
>
>
>
>
>
>
> >   2) Mark the corresponding registers with REG_POINTER. I'm not sure
> > where that is supposed to happen.
> >
> > Since the instructions look like this:
> >(insn 113 11 16 2 (set (reg:DI 15 a5 [226])
> >(reg/f:DI 2 sp)) 179 {*movdi_64bit}
> > (nil))
> >
> > I assume that we'd want to mark a5 as REG_POINTER anyway (which is
> > not), and in that case propagation would work.
> I'd strongly recommend against that.  The problem is the destination
> register might have been used in another context as an index register.
> We've fixed a few bugs in this space through the years I suspect there
> may be others lurking.  You can't directly set that up in C code, but
> once you get down to RTL it can happen.
>
>
> > On the other hand if there's no correctness issue w.r.t. REG_POINTER
> > and regcprop then removing the additional check would increase
> > propagation opportunities in general which is also good.
> I think you can safely remove this limitation.
>
> jeff


Re: [committed] [RISC-V] Avoid sub-word mode comparisons with Zicond

2023-08-05 Thread Xiao Zeng
On Wed, Aug 02, 2023 at 01:41:00 PM  Jeff Law  wrote:
>
>c-torture/execute/pr59014-2.c fails with the Zicond work on rv64.  We
>miscompile the "foo" routine because we have eliminated a required sign
>extension.
>
>The key routine looks like this:
>
>foo (long long int x, long long int y)
>{
>   if (((int) x | (int) y) != 0)
> return 6;
>   return x + y;
>}
>
>So we kindof do the expected thing at expansion time.  We IOR X and Y,
>sign extend the result from 32 to 64 bits (note how the values in the
>source are casted from long long to ints), then emit a suitable
>conditional branch.  ie:
>
>> (insn 10 4 12 2 (set (reg:DI 142)
>> (ior:DI (reg/v:DI 138 [ x ])
>> (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
>>  (nil))
>> (insn 12 10 13 2 (set (reg:DI 144)
>> (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 
>>{extendsidi2}
>>  (nil))
>> (jump_insn 13 12 14 2 (set (pc)
>> (if_then_else (ne (reg:DI 144)
>> (const_int 0 [0]))
>> (label_ref:DI 27)
>> (pc))) "j.c":6:6 243 {*branchdi}
>>  (expr_list:REG_DEAD (reg:DI 144)
>> (int_list:REG_BR_PROB 233216732 (nil)))
>
>When we if-convert that we generate this sequence:
>
>> (insn 10 4 12 2 (set (reg:DI 142)
>> (ior:DI (reg/v:DI 138 [ x ])
>> (reg/v:DI 139 [ y ]))) "j.c":6:16 99 {iordi3}
>>  (nil))
>> (insn 12 10 30 2 (set (reg:DI 144)
>> (sign_extend:DI (subreg:SI (reg:DI 142) 0))) "j.c":6:6 116 
>>{extendsidi2}
>>  (nil))
>> (insn 30 12 31 2 (set (reg:DI 147)
>> (const_int 6 [0x6])) "j.c":8:12 179 {*movdi_64bit}
>>  (nil))
>> (insn 31 30 33 2 (set (reg:DI 146)
>> (plus:DI (reg/v:DI 138 [ x ])
>> (reg/v:DI 139 [ y ]))) "j.c":8:12 5 {adddi3}
>>  (nil)) 
>> (insn 33 31 34 2 (set (reg:DI 149)
>> (if_then_else:DI (ne:DI (reg:DI 144)
>> (const_int 0 [0]))
>> (const_int 0 [0])
>> (reg:DI 146))) "j.c":8:12 11368 {*czero.nez.didi}
>>  (nil))
>> (insn 34 33 35 2 (set (reg:DI 148)
>> (if_then_else:DI (eq:DI (reg:DI 144)
>> (const_int 0 [0]))
>> (const_int 0 [0])
>> (reg:DI 147))) "j.c":8:12 11367 {*czero.eqz.didi}
>>  (nil))
>> (insn 35 34 21 2 (set (reg:DI 137 [  ])
>> (ior:DI (reg:DI 148)
>> (reg:DI 149))) "j.c":8:12 99 {iordi3}
>>  (nil))
>
>Which looks basically OK.  The sign extended subreg is a bit worrisome
>though.  And sure enough when we get into combine:
>
>> Failed to match this instruction:
>> (parallel [
>> (set (reg:DI 149)
>> (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>> (const_int 0 [0]))
>> (reg:DI 146)
>> (const_int 0 [0])))
>> (set (reg:DI 144)
>> (sign_extend:DI (subreg:SI (reg:DI 142) 0)))
>> ])
>> Successfully matched this instruction:
>> (set (reg:DI 144)
>> (sign_extend:DI (subreg:SI (reg:DI 142) 0)))
>> Successfully matched this instruction:
>> (set (reg:DI 149)
>> (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>> (const_int 0 [0]))
>> (reg:DI 146)
>> (const_int 0 [0])))
>> allowing combination of insns 12 and 33
>
>Since we need the side effect we first try the PARALLEL with two sets.
>That, as expected, fails.  Generic combine code then tries to pull apart
>the two sets as distinct insns resulting in this conditional move:
>
>> (insn 33 31 34 2 (set (reg:DI 149)
>> (if_then_else:DI (eq:DI (subreg:SI (reg:DI 142) 0)
>> (const_int 0 [0]))
>> (reg:DI 146)
>> (const_int 0 [0]))) "j.c":8:12 11347 {*czero.nez.disi}
>>  (expr_list:REG_DEAD (reg:DI 146)
>> (nil)))
>
>Bzzt.  We can't actually implement this RTL in the hardware.  Basically
>it's asking to do 32bit comparison on rv64, ignoring the upper 32 bits
>of the input register.  That's not actually how zicond works.
>
>The operands to the comparison need to be in DImode for rv64 and SImode
>for rv32.  That's the X iterator.  
After analyzing the rtl log, I can't agree more with this sentence.

>Note the mode of the comparison
>operands may be different than the mode of the destination.  ie, we
>might have a 64bit comparison and produce a 32bit sign extended result
>much like the setcc insns support.
>
>This patch changes the 6 zicond patterns to use the X iterator on the
>comparison inputs.  That at least makes the patterns correct and fixes
>this particular testcase.   There's a few other lurking problems that
>I'll address in additional patches.
>
>Committed to the trunk,
>Jeff

1 In any case, jeff's analysis is convincing, here I will add a little bit of 
my own analysis.

2 First, for the test cases:

foo(long long int x, long long int y) {
  if (((int)x | (int)y) != 0)
    return 6;
  return x + y;
}

look directly at the compared assembly code. This allows people to quickly
realize