Re: [PATCH V2 0/2] Add support for B extention

2024-07-12 Thread Vineet Gupta



On 7/9/24 10:44, Edwin Lu wrote:
> Support for recognizing B as the collection of zba, zbb, zbs extensions
>
> https://github.com/riscv/riscv-b/tags
>
> V2: add b to riscv_combine_info
>
> Edwin Lu (2):
>   RISC-V: Add support for B standard extension
>   RISC-V: Update testsuite to use b
>
>  gcc/common/config/riscv/riscv-common.cc | 7 +++
>  gcc/config/riscv/arch-canonicalize  | 1 +
>  gcc/testsuite/g++.target/riscv/redundant-bitmap-1.C | 2 +-
>  gcc/testsuite/g++.target/riscv/redundant-bitmap-2.C | 2 +-
>  gcc/testsuite/g++.target/riscv/redundant-bitmap-3.C | 2 +-
>  gcc/testsuite/g++.target/riscv/redundant-bitmap-4.C | 2 +-
>  gcc/testsuite/gcc.target/riscv/shift-add-1.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/shift-add-2.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/synthesis-1.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/synthesis-2.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/synthesis-3.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/synthesis-4.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/synthesis-5.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/synthesis-6.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/synthesis-7.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/synthesis-8.c| 2 +-
>  gcc/testsuite/gcc.target/riscv/zba_zbs_and-1.c  | 2 +-
>  gcc/testsuite/gcc.target/riscv/zbs-zext-3.c | 4 ++--
>  gcc/testsuite/lib/target-supports.exp   | 2 +-
>  19 files changed, 26 insertions(+), 18 deletions(-)

My toolchain fails to bootstrap now since this requires a capable binutils.
Can we do something along the lines of what was done for A extension split ?

Thx,
-Vineet



RE: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark

2024-07-12 Thread Li, Pan2
Thanks Jeff and Edwin for my silly mistake.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, July 13, 2024 5:40 AM
To: Edwin Lu ; gcc-patches@gcc.gnu.org
Cc: Li, Pan2 ; gnu-toolch...@rivosinc.com
Subject: Re: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark



On 7/12/24 12:37 PM, Edwin Lu wrote:
> The following testcase was not properly testing anything due to an
> uninitialized variable. As a result, the loop was not iterating through
> the testing data, but instead on undefined values which could cause an
> unexpected abort.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h:
>   initialize variable
OK.  Thanks for chasing this down.

jeff



[PATCH v4] RISC-V: use fclass insns to implement isfinite, isnormal and isinf builtins

2024-07-12 Thread Vineet Gupta
Changes since v3:
  - Remove '*' from define_insn for fclass
  - Remove the dummy expander for fclass.
  - De-duplicate the expanders code by using a helper which takes fclass
val.

Changes since v2:
  - fclass define_insn tightened to check op0 mode "X" with additional
expander w/o mode for callers.
  - builtins expander explicit mode check and FAIL if mode not appropriate.
  - subreg promoted handling to elide potential extension of ret val.
  - Added isinf builtin with bimodal retun value as others.

Changes since v1:
  - Removed UNSPEC_{INFINITE,ISNORMAL}
  - Don't hardcode SI in patterns, try to keep X to avoid potential
sign extension pitfalls. Implementation wise requires skipping
:MODE specifier in match_operand which is flagged as missing mode
warning.
---

Currently thsse builtins use float compare instructions which require
FP flags to be save/restore around them.
Our perf team complained this could be costly in uarch.
RV Base ISA already has FCLASS.{d,s,h} instruction to compare/identify FP
values w/o disturbing FP exception flags.

Coincidently, upstream very recently got support for the corresponding
optabs. So this just requires wiring up in the backend.

Tested for rv64, one additioal failure g++.dg/opt/pr107569.C needs
upstream ranger fix for the new optab.

gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_emit_fp_classify): New
function declaration.
* config/riscv/riscv.cc (riscv_emit_fp_classify): New helper for
the expanders.
* config/riscv/riscv.md: Add UNSPEC_FCLASS.
define_insn for fclass insn.
define_expand for isfinite, isnormal, isinf.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/fclass.c: New tests.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv.cc   | 51 
 gcc/config/riscv/riscv.md   | 63 +
 gcc/testsuite/gcc.target/riscv/fclass.c | 38 +++
 4 files changed, 153 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/fclass.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a8b76173fa0f..b49cd5cd5a91 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -170,6 +170,7 @@ extern enum memmodel riscv_union_memmodels (enum memmodel, 
enum memmodel);
 extern bool riscv_reg_frame_related (rtx);
 extern void riscv_split_sum_of_two_s12 (HOST_WIDE_INT, HOST_WIDE_INT *,
HOST_WIDE_INT *);
+extern bool riscv_emit_fp_classify (rtx, rtx, HOST_WIDE_INT);
 
 /* Routines implemented in riscv-c.cc.  */
 void riscv_cpu_cpp_builtins (cpp_reader *);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9bba5da016e9..3d6bd29f42e9 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -4865,6 +4865,57 @@ riscv_expand_float_scc (rtx target, enum rtx_code code, 
rtx op0, rtx op1,
 riscv_emit_binary (code, target, op0, op1);
 }
 
+/* Helper for FP classify builtin expanders.
+   For fp value OP1, generate fclass insn with int OP0 output bitval.
+   If output matches FCLASS_VAL (also bitval) generate 1 as
+   builtin output, 0 otherwise.
+   Return false on any failures, true otherwise.  */
+
+bool
+riscv_emit_fp_classify (rtx op0, rtx op1, HOST_WIDE_INT fclass_val)
+{
+  /* Allow SI for rv32/rv64 and DI for rv64.  */
+  if (GET_MODE (op0) != SImode
+  && GET_MODE (op0) != word_mode)
+return false;
+
+  rtx t = gen_reg_rtx (word_mode);
+  rtx t_op0 = gen_reg_rtx (word_mode);
+
+  if (GET_MODE (op1) == SFmode && TARGET_64BIT)
+emit_insn (gen_fclass_sfdi (t, op1));
+  else if (GET_MODE (op1) == SFmode)
+emit_insn (gen_fclass_sfsi (t, op1));
+  else if (GET_MODE (op1) == DFmode && TARGET_64BIT)
+emit_insn (gen_fclass_dfdi (t, op1));
+  else if (GET_MODE (op1) == DFmode)
+emit_insn (gen_fclass_dfsi (t, op1));
+  else if (GET_MODE (op1) == HFmode && TARGET_64BIT)
+emit_insn (gen_fclass_hfdi (t, op1));
+  else if (GET_MODE (op1) == HFmode)
+emit_insn (gen_fclass_hfsi (t, op1));
+  else
+gcc_unreachable ();
+
+  riscv_emit_binary (AND, t, t, GEN_INT (fclass_val));
+  rtx cmp = gen_rtx_NE (word_mode, t, const0_rtx);
+
+  if (TARGET_64BIT)
+emit_insn (gen_cstoredi4 (t_op0, cmp, t, const0_rtx));
+  else
+emit_insn (gen_cstoresi4 (t_op0, cmp, t, const0_rtx));
+
+  if (TARGET_64BIT)
+{
+  t_op0 = gen_lowpart (SImode, t_op0);
+  SUBREG_PROMOTED_VAR_P (t_op0) = 1;
+  SUBREG_PROMOTED_SET (t_op0, SRP_SIGNED);
+}
+
+  emit_move_insn (op0, t_op0);
+  return true;
+}
+
 /* Jump to LABEL if (CODE OP0 OP1) holds.  */
 
 void
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ff37125e3f28..d67e6908a47c 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -68,6 +68,7 @@
   UNSPEC_FMAX
   UNSPEC_FMINM
   UNSPEC_FMAXM
+  UNSPEC_FCLASS
 
   ;; Stack

Re: [PATCH v3] RISC-V: use fclass insns to implement isfinite,isnormal and isinf builtins

2024-07-12 Thread Vineet Gupta
On 7/12/24 14:52, Jeff Law wrote:
>> +(define_insn "*fclass"
>> +  [(set (match_operand:X 0 "register_operand" "=r")
>> +(unspec [(match_operand:ANYF 1 "register_operand" " f")]
>> +   UNSPEC_FCLASS))]
>> +  "TARGET_HARD_FLOAT"
>> +  "fclass.\t%0,%1";
>> +  [(set_attr "type" "fcmp")
>> +   (set_attr "mode" "")])
>> +
>> +;; Implementation note:
>> +;;  Indirection via the expander needed due to md syntax limitations.
>> +;;  define_insn needs tighter mode check (X for operand0) requiring 
>> +;;  in name.  This forces callers like isfinite to specify mode but can't 
>> due
>> +;;  to their own standard pattern name requirement.  The additional expander
>> +;;  with matching RTL template (but elided mode check) allows callers to use
>> +;;  expander's name with actual asm coming from stricter define_insn.
>> +
>> +(define_expand "fclass"
>> +  [(set (match_operand   0 "register_operand" "=r")
>> +(unspec [(match_operand:ANYF 1 "register_operand" " f")]
>> +   UNSPEC_FCLASS))]
>> +  "TARGET_HARD_FLOAT")
> Note that for fclass, which isn't a standard pattern name, you have 
> additional text in the same -- say for example a mode.  It's just those 
> standard names defined in md.texi where we have naming restrictions.
>
> The net is you can avoid the fclass expander entirely.   So you can drop 
> the '*' in the fclass pattern. 

Note to myself (after verifying):
 - dropping of '*' is also needed to be able to call them explicitly
from other expanders.
 - '*' prefix patterns are purely for internal recog()

> Then call it as
> gen_fclassdi or gen_fclasssi in the is* expanders.

Meaning this requires callers to be

  if (TARGET_64BIT)
    emit_insn (gen_fclassdi (t, operands[1]));
  else
    emit_insn (gen_fclasssi (t, operands[1]));

since they can't do the following due to their own Std pattern name
reqmt lest redefinition issue.

  emit_insn (gen_fclass (t, operands[1]));

>
>> +   /* Allow SI for rv32/rv64 and DI for rv64
>> +  Explicit mode check (vs. specfying modes in RTL template 
>> match_operand)
>> +  due to syntax limitations.
>> +   - Specifying X would cause multiple definitions
>> +   - And to disambuguate that requires adding  to name which
>> + no longer matches the "standard pattern name".  */
>> +  if (GET_MODE (operands[0]) != SImode
>> +  && GET_MODE (operands[0]) != word_mode)
>> +FAIL;
>> +
>> +  rtx t = gen_reg_rtx (word_mode);
>> +  rtx t_op0 = gen_reg_rtx (word_mode);
>> +
>> +  emit_insn (gen_fclass (t, operands[1]));
>> +  riscv_emit_binary (AND, t, t, GEN_INT (0x7e));
>> +  rtx cmp = gen_rtx_NE (word_mode, t, const0_rtx);
>> +  emit_insn (gen_cstore4 (t_op0, cmp, t, const0_rtx));
>> +
>> +  if (TARGET_64BIT)
>> +{
>> +  t_op0 = gen_lowpart (SImode, t_op0);
>> +  SUBREG_PROMOTED_VAR_P (t_op0) = 1;
>> +  SUBREG_PROMOTED_SET (t_op0, SRP_SIGNED);
>> +}
>> +
>> +  emit_move_insn (operands[0], t_op0);
>> +  DONE;
>> +})
> So this is repeated nearly verbatim in all 3 is* expanders.  Seems ripe 
> for refactoring where you pass in the magic constant (which is all that 
> seems to change across those code fragments).

Yeah I should have thought about it myself. The prior tri-modal return
was different but now clearly this can all be combined.
I was hoping to use int iterators, but it seems this will have be a
helper in riscv.cc which is called from 3 explicit expanders.

> Hate to ask for another spin, but the refactoring seems like its worth 
> doing to avoid the duplication.  

Not a problem, I'll eventually get the md syntax :-)

> And if we're going to spin again, might 
> as well drop the fclass expander that we really don't need.

Of course.

-Vineet


[pushed] doc: Update GNU Modula 2 mailing list links

2024-07-12 Thread Gerald Pfeifer
Pushed.

Gerald


gcc:
* doc/gm2.texi (Community): Update lists.nongnu.org and
lists.gnu.org links.
---
 gcc/doc/gm2.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/gm2.texi b/gcc/doc/gm2.texi
index c532339fbb8..5bff9eb3829 100644
--- a/gcc/doc/gm2.texi
+++ b/gcc/doc/gm2.texi
@@ -3014,9 +3014,9 @@ You can subscribe to the GNU Modula-2 mailing by sending 
an
 email to:
 @email{gm2-subscribe@@nongnu.org}
 or by
-@url{http://lists.nongnu.org/mailman/listinfo/gm2}.
+@url{https://lists.nongnu.org/mailman/listinfo/gm2}.
 The mailing list contents can be viewed
-@url{http://lists.gnu.org/archive/html/gm2}.
+@url{https://lists.gnu.org/archive/html/gm2}.
 
 @node Other languages, , Community, Using
 @section Other languages for GCC
-- 
2.45.2


Re: [PATCH v9 05/10] C: Implement musttail attribute for returns

2024-07-12 Thread Marek Polacek
On Mon, Jul 08, 2024 at 09:55:57AM -0700, Andi Kleen wrote:
> Implement a C23 clang compatible musttail attribute similar to the earlier
> C++ implementation in the C parser.

Sorry that we haven't reviewed the C parts before.

> 
>   PR83324

I don't think this is going to pass the pre-commit check; better write it as
PR c/83324.
 
> gcc/c/ChangeLog:
> 
>   * c-parser.cc (struct attr_state): Define with musttail_p.
>   (c_parser_statement_after_labels): Handle [[musttail]]

Missing a '.'.

>   (c_parser_std_attribute): Dito.
>   (c_parser_handle_musttail): Dito.
>   (c_parser_compound_statement_nostart): Dito.
>   (c_parser_all_labels): Dito.
>   (c_parser_statement): Dito.
>   * c-tree.h (c_finish_return): Add musttail_p flag.
>   * c-typeck.cc (c_finish_return): Handle musttail_p flag.
> ---
>  gcc/c/c-parser.cc | 59 +--
>  gcc/c/c-tree.h|  2 +-
>  gcc/c/c-typeck.cc | 15 ++--
>  3 files changed, 61 insertions(+), 15 deletions(-)
> 
> diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
> index 8c4e697a4e10..ce1c2c2be835 100644
> --- a/gcc/c/c-parser.cc
> +++ b/gcc/c/c-parser.cc
> @@ -1621,6 +1621,11 @@ struct omp_for_parse_data {
>bool fail : 1;
>  };
>  
> +struct attr_state
> +{
> +  bool musttail_p; // parsed a musttail for return

This should follow the usual comment conventions:
  /* True if we parsed a musttail attribute for return.  */

> +};
> +
>  static bool c_parser_nth_token_starts_std_attributes (c_parser *,
> unsigned int);
>  static tree c_parser_std_attribute_specifier_sequence (c_parser *);
> @@ -1665,7 +1670,7 @@ static location_t c_parser_compound_statement_nostart 
> (c_parser *);
>  static void c_parser_label (c_parser *, tree);
>  static void c_parser_statement (c_parser *, bool *, location_t * = NULL);
>  static void c_parser_statement_after_labels (c_parser *, bool *,
> -  vec * = NULL);
> +  vec * = NULL, attr_state = 
> {});
>  static tree c_parser_c99_block_statement (c_parser *, bool *,
> location_t * = NULL);
>  static void c_parser_if_statement (c_parser *, bool *, vec *);
> @@ -6982,6 +6987,28 @@ c_parser_handle_directive_omp_attributes (tree &attrs,
>  }
>  }
>  
> +/* Check if STD_ATTR contains a musttail attribute and handle it

Missing a '.'.  Also, "handle" is pretty generic, maybe say "and remove it
if it precedes a return"?

> +   PARSER is the parser and A is the output attr_state.  */
> +
> +static tree
> +c_parser_handle_musttail (c_parser *parser, tree std_attrs, attr_state &a)
> +{
> +  if (c_parser_next_token_is_keyword (parser, RID_RETURN))

The point of this check is to keep a musttail attribute around to give
a diagnostic for code like [[gnu::musttail]] 42;, correct?

> +{
> +  if (lookup_attribute ("gnu", "musttail", std_attrs))
> + {
> +   std_attrs = remove_attribute ("gnu", "musttail", std_attrs);
> +   a.musttail_p = true;
> + }
> +  if (lookup_attribute ("clang", "musttail", std_attrs))
> + {
> +   std_attrs = remove_attribute ("clang", "musttail", std_attrs);
> +   a.musttail_p = true;
> + }
> +}
> +  return std_attrs;
> +}
> +
>  /* Parse a compound statement except for the opening brace.  This is
> used for parsing both compound statements and statement expressions
> (which follow different paths to handling the opening).  */
> @@ -6998,6 +7025,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
>bool in_omp_loop_block
>  = omp_for_parse_state ? omp_for_parse_state->want_nested_loop : false;
>tree sl = NULL_TREE;
> +  attr_state a = {};
>  
>if (c_parser_next_token_is (parser, CPP_CLOSE_BRACE))
>  {
> @@ -7138,7 +7166,10 @@ c_parser_compound_statement_nostart (c_parser *parser)
>   = c_parser_nth_token_starts_std_attributes (parser, 1);
>tree std_attrs = NULL_TREE;
>if (have_std_attrs)
> - std_attrs = c_parser_std_attribute_specifier_sequence (parser);
> + {
> +   std_attrs = c_parser_std_attribute_specifier_sequence (parser);
> +   std_attrs = c_parser_handle_musttail (parser, std_attrs, a);
> + }
>if (c_parser_next_token_is_keyword (parser, RID_CASE)
> || c_parser_next_token_is_keyword (parser, RID_DEFAULT)
> || (c_parser_next_token_is (parser, CPP_NAME)
> @@ -7286,7 +7317,7 @@ c_parser_compound_statement_nostart (c_parser *parser)
> last_stmt = true;
> mark_valid_location_for_stdc_pragma (false);
> if (!omp_for_parse_state)
> - c_parser_statement_after_labels (parser, NULL);
> + c_parser_statement_after_labels (parser, NULL, NULL, a);
> else
>   {
> /* In canonical loop nest form, nested loops can only appear
> @@ -7328,15 +7359,18 @@ c_parser_compound_statement_no

Re: [PATCH v3] RISC-V: use fclass insns to implement isfinite,isnormal and isinf builtins

2024-07-12 Thread Jeff Law




On 7/12/24 3:12 PM, Vineet Gupta wrote:

Changes since v2:
   - fclass define_insn tightened to check op0 mode "X" with additional
 expander w/o mode for callers.
   - builtins expander explicit mode check and FAIL if mode not appropriate.
   - subreg promoted handling to elide potential extension of ret val.
   - Added isinf builtin with bimodal retun value as others.

Changes since v1:
   - Removed UNSPEC_{INFINITE,ISNORMAL}
   - Don't hardcode SI in patterns, try to keep X to avoid potential
 sign extension pitfalls. Implementation wise requires skipping
 :MODE specifier in match_operand which is flagged as missing mode
 warning.
---

Currently thsse builtins use float compare instructions which require
FP flags to be save/restore around them.
Our perf team complained this could be costly in uarch.
RV Base ISA already has FCLASS.{d,s,h} instruction to compare/identify FP
values w/o disturbing FP exception flags.

Coincidently, upstream very recently got support for the corresponding
optabs. So this just requires wiring up in the backend.

Tested for rv64, one additioal failure g++.dg/opt/pr107569.C needs
upstream ranger fix for the new optab.

gcc/ChangeLog:
* config/riscv/riscv.md: Add UNSPEC_FCLASS.
define_insn and define_expand for fclass.
define_expand for isfinite, isnormal, isinf.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/fclass.c: New tests.

Signed-off-by: Vineet Gupta 
---
  gcc/config/riscv/riscv.md   | 134 
  gcc/testsuite/gcc.target/riscv/fclass.c |  38 +++
  2 files changed, 172 insertions(+)
  create mode 100644 gcc/testsuite/gcc.target/riscv/fclass.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ff37125e3f28..c67e5129753a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -68,6 +68,7 @@
UNSPEC_FMAX
UNSPEC_FMINM
UNSPEC_FMAXM
+  UNSPEC_FCLASS
  
;; Stack tie

UNSPEC_TIE
@@ -3436,6 +3437,139 @@
 (set_attr "mode" "")
 (set (attr "length") (const_int 16))])
  
+;; fclass instruction output bitmap

+;;   0 negative infinity
+;;   1 negative normal number.
+;;   2 negative subnormal number.
+;;   3 -0
+;;   4 +0
+;;   5 positive subnormal number.
+;;   6 positive normal number.
+;;   7 positive infinity
+;;   8 signaling NaN.
+;;   9 quiet NaN
+
+(define_insn "*fclass"
+  [(set (match_operand:X0 "register_operand" "=r")
+   (unspec [(match_operand:ANYF 1 "register_operand" " f")]
+  UNSPEC_FCLASS))]
+  "TARGET_HARD_FLOAT"
+  "fclass.\t%0,%1";
+  [(set_attr "type" "fcmp")
+   (set_attr "mode" "")])
+
+;; Implementation note:
+;;  Indirection via the expander needed due to md syntax limitations.
+;;  define_insn needs tighter mode check (X for operand0) requiring 
+;;  in name.  This forces callers like isfinite to specify mode but can't due
+;;  to their own standard pattern name requirement.  The additional expander
+;;  with matching RTL template (but elided mode check) allows callers to use
+;;  expander's name with actual asm coming from stricter define_insn.
+
+(define_expand "fclass"
+  [(set (match_operand  0 "register_operand" "=r")
+   (unspec [(match_operand:ANYF 1 "register_operand" " f")]
+  UNSPEC_FCLASS))]
+  "TARGET_HARD_FLOAT")
Note that for fclass, which isn't a standard pattern name, you have 
additional text in the same -- say for example a mode.  It's just those 
standard names defined in md.texi where we have naming restrictions.


The net is you can avoid the fclass expander entirely.   So you can drop 
the '*' in the fclass pattern.  Then call it as

gen_fclassdi or gen_fclasssi in the is* expanders.


+   /* Allow SI for rv32/rv64 and DI for rv64
+  Explicit mode check (vs. specfying modes in RTL template match_operand)
+  due to syntax limitations.
+   - Specifying X would cause multiple definitions
+   - And to disambuguate that requires adding  to name which
+no longer matches the "standard pattern name".  */
+  if (GET_MODE (operands[0]) != SImode
+  && GET_MODE (operands[0]) != word_mode)
+FAIL;
+
+  rtx t = gen_reg_rtx (word_mode);
+  rtx t_op0 = gen_reg_rtx (word_mode);
+
+  emit_insn (gen_fclass (t, operands[1]));
+  riscv_emit_binary (AND, t, t, GEN_INT (0x7e));
+  rtx cmp = gen_rtx_NE (word_mode, t, const0_rtx);
+  emit_insn (gen_cstore4 (t_op0, cmp, t, const0_rtx));
+
+  if (TARGET_64BIT)
+{
+  t_op0 = gen_lowpart (SImode, t_op0);
+  SUBREG_PROMOTED_VAR_P (t_op0) = 1;
+  SUBREG_PROMOTED_SET (t_op0, SRP_SIGNED);
+}
+
+  emit_move_insn (operands[0], t_op0);
+  DONE;
+})
So this is repeated nearly verbatim in all 3 is* expanders.  Seems ripe 
for refactoring where you pass in the magic constant (which is all that 
seems to change across those code fragments).



Hate to ask for another spin, but the refactoring seems like its worth 
doing to avoid the duplication.  And if we're go

[PATCH v2] rs6000: Fix .machine cpu selection w/ altivec [PR97367]

2024-07-12 Thread Peter Bergner
René's patch seems to have stalled, so here is an updated version of the
patch with the requested changes to his patch.

I'll note I have added an additional code change, which is to also emit a
".machine altivec" if Altivec is enabled.  The problem this fixes is for
cpus like the G5, which is basically a power4 plus an Altivec unit, its
".machine power4" doesn't enable the assembler to recognize Altivec insns.
That isn't a problem if you use gcc -mcpu=G5 to assemble the assembler file,
since gcc passes -maltivec to the assembler.  However, if you try to assemble
the assembler file with as by hand, you'll get "unrecognized opcode" errors.
I did not do the same for VSX, since all ".machine " for cpus that
support VSX already enable VSX insn recognition, so it's not needed.


rs6000: Fix .machine cpu selection w/ altivec [PR97367]

There are various non-IBM CPUs with altivec, so we cannot use that
flag to determine which .machine cpu to use, so ignore it.
Emit an additional ".machine altivec" if Altivec is enabled so
that the assembler doesn't require an explicit -maltivec option
to assemble any Altivec instructions for those targets where
the ".machine cpu" is insufficient to enable Altivec.  For example,
-mcpu=G5 emits a ".machine power4".

This passed bootstrap and regtesting on powrpc64-linux (running the testsuite
in both 32-bit and 64-bit modes) with no regressions.

Ok for trunk and the release branches after some trunk burn-in time?

Peter


2024-07-12  René Rebe  
Peter Bergner  

gcc/
PR target/97367
* config/rs6000/rs6000.c (rs6000_machine_from_flags): Do not consider
OPTION_MASK_ALTIVEC.
(emit_asm_machine): For Altivec compiles, emit a ".machine altivec".

gcc/testsuite/
PR target/97367
* gcc.target/powerpc/pr97367.c: New test.

Signed-of-by: René Rebe 
---
 gcc/config/rs6000/rs6000.cc|  5 -
 gcc/testsuite/gcc.target/powerpc/pr97367.c | 13 +
 2 files changed, 17 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97367.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 2cbea6ea2d7..2cb8f35739b 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -5888,7 +5888,8 @@ rs6000_machine_from_flags (void)
   HOST_WIDE_INT flags = rs6000_isa_flags;
 
   /* Disable the flags that should never influence the .machine selection.  */
-  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | 
OPTION_MASK_ISEL);
+  flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | OPTION_MASK_ISEL
+| OPTION_MASK_ALTIVEC);
 
   if ((flags & (ISA_3_1_MASKS_SERVER & ~ISA_3_0_MASKS_SERVER)) != 0)
 return "power10";
@@ -5913,6 +5914,8 @@ void
 emit_asm_machine (void)
 {
   fprintf (asm_out_file, "\t.machine %s\n", rs6000_machine);
+  if (TARGET_ALTIVEC)
+fprintf (asm_out_file, "\t.machine altivec\n");
 }
 #endif
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97367.c 
b/gcc/testsuite/gcc.target/powerpc/pr97367.c
new file mode 100644
index 000..f9118dbcdec
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97367.c
@@ -0,0 +1,13 @@
+/* PR target/97367 */
+/* { dg-options "-mdejagnu-cpu=G5" } */
+
+/* Verify we emit a ".machine power4" and ".machine altivec" rather
+   than a ".machine power7".  */
+
+int dummy (void)
+{
+  return 0;
+}
+
+/* { dg-final { scan-assembler {\.\mmachine power4\M} } } */
+/* { dg-final { scan-assembler {\.\mmachine altivec\M} } } */
-- 
2.45.2



Re: [PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark

2024-07-12 Thread Jeff Law




On 7/12/24 12:37 PM, Edwin Lu wrote:

The following testcase was not properly testing anything due to an
uninitialized variable. As a result, the loop was not iterating through
the testing data, but instead on undefined values which could cause an
unexpected abort.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h:
initialize variable

OK.  Thanks for chasing this down.

jeff



[PATCH v3] RISC-V: use fclass insns to implement isfinite, isnormal and isinf builtins

2024-07-12 Thread Vineet Gupta
Changes since v2:
  - fclass define_insn tightened to check op0 mode "X" with additional
expander w/o mode for callers.
  - builtins expander explicit mode check and FAIL if mode not appropriate.
  - subreg promoted handling to elide potential extension of ret val.
  - Added isinf builtin with bimodal retun value as others.

Changes since v1:
  - Removed UNSPEC_{INFINITE,ISNORMAL}
  - Don't hardcode SI in patterns, try to keep X to avoid potential
sign extension pitfalls. Implementation wise requires skipping
:MODE specifier in match_operand which is flagged as missing mode
warning.
---

Currently thsse builtins use float compare instructions which require
FP flags to be save/restore around them.
Our perf team complained this could be costly in uarch.
RV Base ISA already has FCLASS.{d,s,h} instruction to compare/identify FP
values w/o disturbing FP exception flags.

Coincidently, upstream very recently got support for the corresponding
optabs. So this just requires wiring up in the backend.

Tested for rv64, one additioal failure g++.dg/opt/pr107569.C needs
upstream ranger fix for the new optab.

gcc/ChangeLog:
* config/riscv/riscv.md: Add UNSPEC_FCLASS.
define_insn and define_expand for fclass.
define_expand for isfinite, isnormal, isinf.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/fclass.c: New tests.

Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.md   | 134 
 gcc/testsuite/gcc.target/riscv/fclass.c |  38 +++
 2 files changed, 172 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/fclass.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index ff37125e3f28..c67e5129753a 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -68,6 +68,7 @@
   UNSPEC_FMAX
   UNSPEC_FMINM
   UNSPEC_FMAXM
+  UNSPEC_FCLASS
 
   ;; Stack tie
   UNSPEC_TIE
@@ -3436,6 +3437,139 @@
(set_attr "mode" "")
(set (attr "length") (const_int 16))])
 
+;; fclass instruction output bitmap
+;;   0 negative infinity
+;;   1 negative normal number.
+;;   2 negative subnormal number.
+;;   3 -0
+;;   4 +0
+;;   5 positive subnormal number.
+;;   6 positive normal number.
+;;   7 positive infinity
+;;   8 signaling NaN.
+;;   9 quiet NaN
+
+(define_insn "*fclass"
+  [(set (match_operand:X0 "register_operand" "=r")
+   (unspec [(match_operand:ANYF 1 "register_operand" " f")]
+  UNSPEC_FCLASS))]
+  "TARGET_HARD_FLOAT"
+  "fclass.\t%0,%1";
+  [(set_attr "type" "fcmp")
+   (set_attr "mode" "")])
+
+;; Implementation note:
+;;  Indirection via the expander needed due to md syntax limitations.
+;;  define_insn needs tighter mode check (X for operand0) requiring 
+;;  in name.  This forces callers like isfinite to specify mode but can't due
+;;  to their own standard pattern name requirement.  The additional expander
+;;  with matching RTL template (but elided mode check) allows callers to use
+;;  expander's name with actual asm coming from stricter define_insn.
+
+(define_expand "fclass"
+  [(set (match_operand  0 "register_operand" "=r")
+   (unspec [(match_operand:ANYF 1 "register_operand" " f")]
+  UNSPEC_FCLASS))]
+  "TARGET_HARD_FLOAT")
+
+;; Implements optab for isfinite
+
+(define_expand "isfinite2"
+  [(set (match_operand  0 "register_operand" "=r")
+   (match_operand:ANYF 1 "register_operand" " f"))]
+  "TARGET_HARD_FLOAT"
+{
+   /* Allow SI for rv32/rv64 and DI for rv64
+  Explicit mode check (vs. specfying modes in RTL template match_operand)
+  due to syntax limitations.
+   - Specifying X would cause multiple definitions
+   - And to disambuguate that requires adding  to name which
+no longer matches the "standard pattern name".  */
+  if (GET_MODE (operands[0]) != SImode
+  && GET_MODE (operands[0]) != word_mode)
+FAIL;
+
+  rtx t = gen_reg_rtx (word_mode);
+  rtx t_op0 = gen_reg_rtx (word_mode);
+
+  emit_insn (gen_fclass (t, operands[1]));
+  riscv_emit_binary (AND, t, t, GEN_INT (0x7e));
+  rtx cmp = gen_rtx_NE (word_mode, t, const0_rtx);
+  emit_insn (gen_cstore4 (t_op0, cmp, t, const0_rtx));
+
+  if (TARGET_64BIT)
+{
+  t_op0 = gen_lowpart (SImode, t_op0);
+  SUBREG_PROMOTED_VAR_P (t_op0) = 1;
+  SUBREG_PROMOTED_SET (t_op0, SRP_SIGNED);
+}
+
+  emit_move_insn (operands[0], t_op0);
+  DONE;
+})
+
+;; Implements optab for isnormal
+
+(define_expand "isnormal2"
+  [(set (match_operand  0 "register_operand" "=r")
+   (match_operand:ANYF 1 "register_operand" " f"))]
+  "TARGET_HARD_FLOAT"
+{
+  if (GET_MODE (operands[0]) != SImode
+  && GET_MODE (operands[0]) != word_mode)
+FAIL;
+
+  rtx t = gen_reg_rtx (word_mode);
+  rtx t_op0 = gen_reg_rtx (word_mode);
+
+  emit_insn (gen_fclass (t, operands[1]));
+  riscv_emit_binary (AND, t, t, GEN_INT (0x42));
+  rtx cmp = gen_rtx_NE (word_mode, t, const0_rtx);
+  emit_insn (gen_cstore4 (t_op0, cmp, t

Re: [PATCH v1] c++: Hash placeholder constraint in ctp_hasher

2024-07-12 Thread Seyed Sajad Kahani
I am sorry for the inconvenience, a fixed version was sent just now.


[PATCH v1] c++: Hash placeholder constraint in ctp_hasher

2024-07-12 Thread Seyed Sajad Kahani
This patch addresses a difference between the hash function and the equality
function for canonical types of template parameters (ctp_hasher). The equality
function uses comptypes (typeck.cc) (with COMPARE_STRUCTURAL) and checks
constraint equality for two auto nodes (typeck.cc:1586), while the hash
function ignores it (pt.cc:4528). This leads to hash collisions that can be
avoided by using `hash_placeholder_constraint` (constraint.cc:1150).

* constraint.cc (hash_placeholder_constraint): Rename to
iterative_hash_placeholder_constraint.
(iterative_hash_placeholder_constraint): Rename from
hash_placeholder_constraint and add the initial val argument.
* cp-tree.h (hash_placeholder_constraint): Rename to
iterative_hash_placeholder_constraint.
(iterative_hash_placeholder_constraint): Renamed from
hash_placeholder_constraint and add the initial val argument.
* pt.cc (struct ctp_hasher): Updated to use
iterative_hash_placeholder_constraint in the case of a valid placeholder
constraint.
(auto_hash::hash): Reflect the renaming of hash_placeholder_constraint 
to
iterative_hash_placeholder_constraint.
---
 gcc/cp/constraint.cc | 4 ++--
 gcc/cp/cp-tree.h | 2 +-
 gcc/cp/pt.cc | 9 +++--
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index ebf4255e5..78aacb77a 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1971,13 +1971,13 @@ equivalent_placeholder_constraints (tree c1, tree c2)
 /* Return a hash value for the placeholder ATOMIC_CONSTR C.  */
 
 hashval_t
-hash_placeholder_constraint (tree c)
+iterative_hash_placeholder_constraint (tree c, hashval_t val)
 {
   tree t, a;
   placeholder_extract_concept_and_args (c, t, a);
 
   /* Like hash_tmpl_and_args, but skip the first argument.  */
-  hashval_t val = iterative_hash_object (DECL_UID (t), 0);
+  val = iterative_hash_object (DECL_UID (t), val);
 
   for (int i = TREE_VEC_LENGTH (a)-1; i > 0; --i)
 val = iterative_hash_template_arg (TREE_VEC_ELT (a, i), val);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 4bb3e9c49..294e88f75 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8588,7 +8588,7 @@ extern tree_pair finish_type_constraints  (tree, tree, 
tsubst_flags_t);
 extern tree build_constrained_parameter (tree, tree, tree = NULL_TREE);
 extern void placeholder_extract_concept_and_args (tree, tree&, tree&);
 extern bool equivalent_placeholder_constraints  (tree, tree);
-extern hashval_t hash_placeholder_constraint   (tree);
+extern hashval_t iterative_hash_placeholder_constraint (tree, hashval_t);
 extern bool deduce_constrained_parameter(tree, tree&, tree&);
 extern tree resolve_constraint_check(tree);
 extern tree check_function_concept  (tree);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d1316483e..9a80c44a5 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4525,7 +4525,12 @@ struct ctp_hasher : ggc_ptr_hash
 val = iterative_hash_object (TEMPLATE_TYPE_LEVEL (t), val);
 val = iterative_hash_object (TEMPLATE_TYPE_IDX (t), val);
 if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
-  val = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
+  {
+   val 
+ = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
+   if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
+ val = iterative_hash_placeholder_constraint(c, val);
+  }
 if (TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM)
   val = iterative_hash_template_arg (TYPE_TI_ARGS (t), val);
 --comparing_specializations;
@@ -29605,7 +29610,7 @@ auto_hash::hash (tree t)
   if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
 /* Matching constrained-type-specifiers denote the same template
parameter, so hash the constraint.  */
-return hash_placeholder_constraint (c);
+return iterative_hash_placeholder_constraint (c, 0);
   else
 /* But unconstrained autos are all separate, so just hash the pointer.  */
 return iterative_hash_object (t, 0);
-- 
2.45.2



Re: [PATCH v1] c++: Hash placeholder constraint in ctp_hasher

2024-07-12 Thread Andrew Pinski
On Tue, Jul 9, 2024 at 6:46 AM Seyed Sajad Kahani  wrote:
>
> This patch addresses a difference between the hash function and the equality
> function for canonical types of template parameters (ctp_hasher). The equality
> function uses comptypes (typeck.cc) (with COMPARE_STRUCTURAL) and checks
> constraint equality for two auto nodes (typeck.cc:1586), while the hash
> function ignores it (pt.cc:4528). This leads to hash collisions that can be
> avoided by using `hash_placeholder_constraint` (constraint.cc:1150).

By the way your mailer looks to have mangled the patch by removing white spaces.

Thanks,
Andrew

>
> * constraint.cc (hash_placeholder_constraint): Rename to
> iterative_hash_placeholder_constraint.
> (iterative_hash_placeholder_constraint): Rename from
> hash_placeholder_constraint and add the initial val argument.
> * cp-tree.h (hash_placeholder_constraint): Rename to
> iterative_hash_placeholder_constraint.
> (iterative_hash_placeholder_constraint): Renamed from
> hash_placeholder_constraint and add the initial val argument.
> * pt.cc (struct ctp_hasher): Updated to use
> iterative_hash_placeholder_constraint in the case of a valid placeholder
> constraint.
> (auto_hash::hash): Reflect the renaming of hash_placeholder_constraint to
> iterative_hash_placeholder_constraint.
> ---
> gcc/cp/constraint.cc | 4 ++--
> gcc/cp/cp-tree.h | 2 +-
> gcc/cp/pt.cc | 9 +++--
> 3 files changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index ebf4255e5..78aacb77a 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -1971,13 +1971,13 @@ equivalent_placeholder_constraints (tree c1, tree c2)
> /* Return a hash value for the placeholder ATOMIC_CONSTR C. */
> hashval_t
> -hash_placeholder_constraint (tree c)
> +iterative_hash_placeholder_constraint (tree c, hashval_t val)
> {
> tree t, a;
> placeholder_extract_concept_and_args (c, t, a);
> /* Like hash_tmpl_and_args, but skip the first argument. */
> - hashval_t val = iterative_hash_object (DECL_UID (t), 0);
> + val = iterative_hash_object (DECL_UID (t), val);
> for (int i = TREE_VEC_LENGTH (a)-1; i > 0; --i)
> val = iterative_hash_template_arg (TREE_VEC_ELT (a, i), val);
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 4bb3e9c49..294e88f75 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -8588,7 +8588,7 @@ extern tree_pair finish_type_constraints (tree,
> tree, tsubst_flags_t);
> extern tree build_constrained_parameter (tree, tree, tree = NULL_TREE);
> extern void placeholder_extract_concept_and_args (tree, tree&, tree&);
> extern bool equivalent_placeholder_constraints (tree, tree);
> -extern hashval_t hash_placeholder_constraint (tree);
> +extern hashval_t iterative_hash_placeholder_constraint (tree, hashval_t);
> extern bool deduce_constrained_parameter (tree, tree&, tree&);
> extern tree resolve_constraint_check (tree);
> extern tree check_function_concept (tree);
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index d1316483e..9a80c44a5 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -4525,7 +4525,12 @@ struct ctp_hasher : ggc_ptr_hash
> val = iterative_hash_object (TEMPLATE_TYPE_LEVEL (t), val);
> val = iterative_hash_object (TEMPLATE_TYPE_IDX (t), val);
> if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
> - val = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
> + {
> + val
> + = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
> + if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
> + val = iterative_hash_placeholder_constraint(c, val);
> + }
> if (TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM)
> val = iterative_hash_template_arg (TYPE_TI_ARGS (t), val);
> --comparing_specializations;
> @@ -29605,7 +29610,7 @@ auto_hash::hash (tree t)
> if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
> /* Matching constrained-type-specifiers denote the same template
> parameter, so hash the constraint. */
> - return hash_placeholder_constraint (c);
> + return iterative_hash_placeholder_constraint (c, 0);
> else
> /* But unconstrained autos are all separate, so just hash the pointer. */
> return iterative_hash_object (t, 0);
> --
> 2.45.2
>
> On Tue, 9 Jul 2024 at 14:41, Seyed Sajad Kahani  wrote:
> >
> > Hi.
> >
> > While investigating a fix for C++/PR115030 (a bug in constrained auto
> > deduction), I was wondering why we are not substituting constraint args of 
> > an
> > auto node in tsubst (pt.cc:16533). Instead, this substitution is delayed 
> > until
> > do_auto_deduction (pt.cc), where we attempt to find the substituted args of 
> > the
> > enclosing scope. At that point, it becomes difficult to find partially
> > specialised args, as they are erased in the process. I propose modifying 
> > tsubst
> > to eagerly substitute the constraint args of an auto node.
> >
> > By making this change, we would not need to provide outer_targs for
> > do_auto_deduction in cases where tsubst has been called for the type, which
> > cove

Re: [PATCH v1] c++: Hash placeholder constraint in ctp_hasher

2024-07-12 Thread Patrick Palka
On Tue, 9 Jul 2024, Seyed Sajad Kahani wrote:

> This patch addresses a difference between the hash function and the equality
> function for canonical types of template parameters (ctp_hasher). The equality
> function uses comptypes (typeck.cc) (with COMPARE_STRUCTURAL) and checks
> constraint equality for two auto nodes (typeck.cc:1586), while the hash
> function ignores it (pt.cc:4528). This leads to hash collisions that can be
> avoided by using `hash_placeholder_constraint` (constraint.cc:1150).

This seems like a nice improvement, thanks!  It seems your mail client
removed leading whitespace from your inline patch, making it hard to
read or apply.  Could you resend as an attachment or e.g. via
'git send-email'?

> 
> * constraint.cc (hash_placeholder_constraint): Rename to
> iterative_hash_placeholder_constraint.
> (iterative_hash_placeholder_constraint): Rename from
> hash_placeholder_constraint and add the initial val argument.
> * cp-tree.h (hash_placeholder_constraint): Rename to
> iterative_hash_placeholder_constraint.
> (iterative_hash_placeholder_constraint): Renamed from
> hash_placeholder_constraint and add the initial val argument.
> * pt.cc (struct ctp_hasher): Updated to use
> iterative_hash_placeholder_constraint in the case of a valid placeholder
> constraint.
> (auto_hash::hash): Reflect the renaming of hash_placeholder_constraint to
> iterative_hash_placeholder_constraint.
> ---
> gcc/cp/constraint.cc | 4 ++--
> gcc/cp/cp-tree.h | 2 +-
> gcc/cp/pt.cc | 9 +++--
> 3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index ebf4255e5..78aacb77a 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -1971,13 +1971,13 @@ equivalent_placeholder_constraints (tree c1, tree c2)
> /* Return a hash value for the placeholder ATOMIC_CONSTR C. */
> hashval_t
> -hash_placeholder_constraint (tree c)
> +iterative_hash_placeholder_constraint (tree c, hashval_t val)
> {
> tree t, a;
> placeholder_extract_concept_and_args (c, t, a);
> /* Like hash_tmpl_and_args, but skip the first argument. */
> - hashval_t val = iterative_hash_object (DECL_UID (t), 0);
> + val = iterative_hash_object (DECL_UID (t), val);
> for (int i = TREE_VEC_LENGTH (a)-1; i > 0; --i)
> val = iterative_hash_template_arg (TREE_VEC_ELT (a, i), val);
> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
> index 4bb3e9c49..294e88f75 100644
> --- a/gcc/cp/cp-tree.h
> +++ b/gcc/cp/cp-tree.h
> @@ -8588,7 +8588,7 @@ extern tree_pair finish_type_constraints (tree,
> tree, tsubst_flags_t);
> extern tree build_constrained_parameter (tree, tree, tree = NULL_TREE);
> extern void placeholder_extract_concept_and_args (tree, tree&, tree&);
> extern bool equivalent_placeholder_constraints (tree, tree);
> -extern hashval_t hash_placeholder_constraint (tree);
> +extern hashval_t iterative_hash_placeholder_constraint (tree, hashval_t);
> extern bool deduce_constrained_parameter (tree, tree&, tree&);
> extern tree resolve_constraint_check (tree);
> extern tree check_function_concept (tree);
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index d1316483e..9a80c44a5 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -4525,7 +4525,12 @@ struct ctp_hasher : ggc_ptr_hash
> val = iterative_hash_object (TEMPLATE_TYPE_LEVEL (t), val);
> val = iterative_hash_object (TEMPLATE_TYPE_IDX (t), val);
> if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
> - val = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
> + {
> + val
> + = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
> + if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
> + val = iterative_hash_placeholder_constraint(c, val);
> + }
> if (TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM)
> val = iterative_hash_template_arg (TYPE_TI_ARGS (t), val);
> --comparing_specializations;
> @@ -29605,7 +29610,7 @@ auto_hash::hash (tree t)
> if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
> /* Matching constrained-type-specifiers denote the same template
> parameter, so hash the constraint. */
> - return hash_placeholder_constraint (c);
> + return iterative_hash_placeholder_constraint (c, 0);
> else
> /* But unconstrained autos are all separate, so just hash the pointer. */
> return iterative_hash_object (t, 0);
> -- 
> 2.45.2
> 
> On Tue, 9 Jul 2024 at 14:41, Seyed Sajad Kahani  wrote:
> >
> > Hi.
> >
> > While investigating a fix for C++/PR115030 (a bug in constrained auto
> > deduction), I was wondering why we are not substituting constraint args of 
> > an
> > auto node in tsubst (pt.cc:16533). Instead, this substitution is delayed 
> > until
> > do_auto_deduction (pt.cc), where we attempt to find the substituted args of 
> > the
> > enclosing scope. At that point, it becomes difficult to find partially
> > specialised args, as they are erased in the process. I propose modifying 
> > tsubst
> > to eagerly substitute the constraint args of an auto node.
> >
> > By making this change, we would not need t

Re: [RFC] c++: Eagerly substitute auto constraint args in tsubst [PR115030]

2024-07-12 Thread Patrick Palka
Interesting, thanks for the detailed write-up!

On Tue, 9 Jul 2024, Seyed Sajad Kahani wrote:

> Hi.
> 
> While investigating a fix for C++/PR115030 (a bug in constrained auto
> deduction), I was wondering why we are not substituting constraint args of an
> auto node in tsubst (pt.cc:16533). Instead, this substitution is delayed until
> do_auto_deduction (pt.cc), where we attempt to find the substituted args of 
> the
> enclosing scope. At that point, it becomes difficult to find partially
> specialised args, as they are erased in the process.

FWIW it should be much easier to obtain the partially specialised
tmpl/args of a class/variable specialization since GCC 14, by looking
at TI_PARTIAL_INFO of DECL_TEMPLATE_INFO (if it exists).

Maybe passing the TI_PARTIAL_INFO args to do_auto_deduction where
appropriate might fix this PR too?

> I propose modifying tsubst to eagerly substitute the constraint args of an 
> auto node.
> 
> By making this change, we would not need to provide outer_targs for
> do_auto_deduction in cases where tsubst has been called for the type, which
> covers most scenarios. However, we still need outer_targs for cases like:
> 
> template  auto V>
> struct S { };
> 
> Hence, outer_targs cannot be completely removed but will be set to TREE_NULL 
> in
> all calls except the one in convert_template_argument (pt.cc:8788).
> Additionally, the tmpl argument of do_auto_deduction, which helps to provide
> outer args in the scope, will no longer be necessary and can be safely
> removed.
> 
> Also, the trimming hack proposed in
> https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654724.html to address
> C++/PR114915 is no longer needed. We will add an assertion to ensure that the
> missing levels do not become negative.
> 
> Substituting constraint arguments earlier will slightly alter error messages
> (see testsuite/g++.dg/cpp2a/concepts-placeholder3.C as an example).
> 
> To summarise, I have made the following changes:
> - Modified tsubst to substitute the constraint args.
> - Fixed shallow checks for using the cache from TEMPLATE_TYPE_DESCENDANTS
> (pt.cc:16513).
> - Substituted the constraint args and the auto node itself, while retaining 
> all
> other parameters as is (pt.cc:16533).
> - Removed now unnecessary code that attempted to find outer scope template 
> info
> and args in various locations.
> - Updated the missing levels hack (pt.cc:31320) to work with the substituted
> constraints.
> - Used level instead of orig_level to find the missing levels.
> - Added an assertion for future safety.
> 
> Details of these changes can be found in the patch "[PATCH v1] c++: Eagerly
> substitute auto constraint args in tsubst [PR115030]" that will be replied to
> this thread.
> 
> This patch, in my opinion, improves code quality by removing an argument from
> do_auto_deduction and eliminating the out-of-scope task of finding outer_targs
> within do_auto_deduction. It simplifies the usage of do_auto_deduction and
> resolves C++/PR115030 without complex calculations for specialised args. It
> also enhances the consistency of tsubst behaviour by not leaving constraints
> un-substituted. However, these changes make args in constraints_satisfied_p
> (being called from do_auto_deduction) a bit misleading, as it will not carry
> the actual args of the constraint and can even be an empty vec.

Your approach indeed seems like a nice simplification, and while it
won't change the behavior of the vast majority of code, eager
substitution into constraints versus waiting until satisfaction
can cause us to reject otherwise valid code in some edge cases.
Consider

  template
  concept C = __is_same(T, int) || __is_same(T, U);

  template
  void f(T t) {
C auto x = t;
  }

  int main() {
f(0);
  }

With the eager approach we'd substitute T=int into 'typename T::type' during
instantiation of f(), which fails and causes the program to be ill-formed
since we're not in a SFINAE context.

With the non-eager approach we pass the original constraint C
and the full set of arguments to satisfaction, which instead substitutes and
evaluates each atom of the constraint in a SFINAE manner.  And since the first
term of the disjunction is satisfied, the entire constraint is satisfied, and
there's no error.

This is basically the motivation for deferring substitution into constraints,
most importantly the associated constraints of template declaration.
Admittedly it's probably less important to do this for constrained autos, but
IIUC we do so anyway for QoI/consistency.  Clang and MSVC reject the above
testcase FWIW, indicating they do substitute into constrained autos eagerly.
I'm not sure if the standard is completely clear about what to do here.

Jason would have more context, but IIUC we deliberately don't eagerly
substitute into constrained autos and accept the implementation
complexities that this approach entails.

> 
> I have added a testsuite for C++/PR115030, and as far as I have tested (only
> c++ dg.exp on

Re: Lower zeroing array assignment to memset for allocatable arrays

2024-07-12 Thread Harald Anlauf

Hi Prathamesh,

Am 12.07.24 um 15:31 schrieb Prathamesh Kulkarni:

It seems that component references are not currently handled even for static 
size arrays ?
For eg:
subroutine test_dt (dt, y)
implicit none
real :: y (10, 20, 30)
type t
   real :: x(10, 20, 30)
end type t
type(t) :: dt
y = 0
dt% x = 0
end subroutine

With trunk, it generates memset for 'y' but not for dt%x.
That happens because copyable_array_p returns false for dt%x,
because expr->ref->next is non NULL:

   /* First check it's an array.  */
   if (expr->rank < 1 || !expr->ref || expr->ref->next)
 return false;

and gfc_full_array_ref_p(expr) bails out if expr->ref->type != REF_ARRAY.


Indeed that check (as is) prevents the use of component refs.
(I just tried to modify the this part to cycle thru the refs,
but then I get regressions in the testsuite for some of the
coarray tests.  Furthermore, gfc_trans_zero_assign would
need further changes to handle even the constant shapes
from above.)


Looking thru git history, it seems both the checks were added in 18eaa2c0cd20 
to fix PR33370.
(Even after removing these checks, the previous patch bails out from 
gfc_trans_zero_assign because
GFC_DESCRIPTOR_TYPE_P (type) returns false for component ref and ends up 
returning NULL_TREE)
I am working on extending the patch to handle component refs for statically 
sized as well as allocatable arrays.

Since it looks like a bigger change and an extension to current functionality, 
will it be OK to commit the previous patch as-is (if it looks correct)
and address component refs in follow up one ?


I agree that it is reasonable to defer the handling of arrays as
components of derived types, and recommend to do the following:

- replace "&& gfc_is_simply_contiguous (expr, true, false))" in your
  last patch by "&& gfc_is_simply_contiguous (expr, false, false))",
  as that would also allow to treat

  z(:,::1,:) = 0

  as contiguous if z is allocatable or a contiguous pointer.

- open a PR in bugzilla to track the missed-optimization for
  the cases we discussed here, and link the discussion in the ML.

Your patch then will be OK for mainline.

Thanks,
Harald


Thanks,
Prathamesh


Thanks,
Harald


Bootstrapped+tested on aarch64-linux-gnu.
Does the attached patch look OK ?

Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh


Thanks,
Harald



Signed-off-by: Prathamesh Kulkarni 

Thanks,
Prathamesh








[committed][PR rtl-optimization/115876] Fix one of two ubsan reported issues in new ext-dce.cc code

2024-07-12 Thread Jeff Law


David Binderman did a bootstrap build with ubsan enabled which triggered 
a few errors in the new ext-dce.cc code.  This fixes the trivial case of 
shifting negative values.


Bootstrapped and regression tested on x86.

Pushing to the trunk.

Jeff
PR rtl-optimization/115876
* ext-dce.cc (carry_backpropagate): Make mask and mmask unsigned.

diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index adc9084df57..91789d283fc 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -374,13 +374,13 @@ binop_implies_op2_fully_live (rtx_code code)
exclusively pertain to the first operand.  */
 
 HOST_WIDE_INT
-carry_backpropagate (HOST_WIDE_INT mask, enum rtx_code code, rtx x)
+carry_backpropagate (unsigned HOST_WIDE_INT mask, enum rtx_code code, rtx x)
 {
   if (mask == 0)
 return 0;
 
   enum machine_mode mode = GET_MODE_INNER (GET_MODE (x));
-  HOST_WIDE_INT mmask = GET_MODE_MASK (mode);
+  unsigned HOST_WIDE_INT mmask = GET_MODE_MASK (mode);
   switch (code)
 {
 case PLUS:


[PATCH] c++: alias template with dependent attributes [PR115897]

2024-07-12 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/14?

-- >8 --

Here we're prematurely stripping the dependent alias template-id A to
its defining-type-id T when used as a template argument, which in turn
causes us to essentially ignore A's vector_size attribute.  It seems this
has always been an issue for class template-ids, but after r14-2170
variable template-ids are affected as well.

To fix this, it seems natural to mark alias templates that have a
dependent attribute as complex, alongside e.g. constrained alias
templates, which prevents us from looking through them prematurely.

PR c++/115897

gcc/cp/ChangeLog:

* pt.cc (complex_alias_template_p): Return true for an alias
template with attributes.
(get_underlying_template): Don't look through an alias template
with attributes.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alias-decl-77.C: New test.
---
 gcc/cp/pt.cc   | 10 +++
 gcc/testsuite/g++.dg/cpp0x/alias-decl-77.C | 32 ++
 2 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/alias-decl-77.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index e38e02488be..8239392923f 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -6628,6 +6628,11 @@ complex_alias_template_p (const_tree tmpl, tree 
*seen_out)
   if (get_constraints (tmpl))
 return true;
 
+  /* An alias with dependent type attributes is complex.  */
+  if (any_dependent_type_attributes_p (DECL_ATTRIBUTES
+  (DECL_TEMPLATE_RESULT (tmpl
+return true;
+
   if (!complex_alias_tmpl_info)
 complex_alias_tmpl_info = hash_map::create_ggc (13);
 
@@ -6780,6 +6785,11 @@ get_underlying_template (tree tmpl)
   if (!at_least_as_constrained (underlying, tmpl))
break;
 
+  /* If TMPL adds dependent attributes, it isn't equivalent.  */
+  if (any_dependent_type_attributes_p (DECL_ATTRIBUTES
+  (DECL_TEMPLATE_RESULT (tmpl
+   break;
+
   /* Alias is equivalent.  Strip it and repeat.  */
   tmpl = underlying;
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/alias-decl-77.C 
b/gcc/testsuite/g++.dg/cpp0x/alias-decl-77.C
new file mode 100644
index 000..d518c040a92
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/alias-decl-77.C
@@ -0,0 +1,32 @@
+// PR c++/115897
+// { dg-do compile { target c++11 } }
+
+template
+struct is_same { static constexpr bool value = __is_same(T, U); };
+
+#if __cpp_variable_templates
+template
+constexpr bool is_same_v = __is_same(T, U);
+#endif
+
+template
+using A [[gnu::vector_size(16)]] = T;
+
+template
+using B = T;
+
+template
+using C [[gnu::vector_size(16)]] = B;
+
+template
+void f() {
+  static_assert(!is_same>::value, "");
+  static_assert(!is_same>::value, "");
+
+#if __cpp_variable_templates
+  static_assert(!is_same_v>, "");
+  static_assert(!is_same_v>, "");
+#endif
+};
+
+template void f();
-- 
2.45.2.827.g557ae147e6



__builtin_inf (was Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins)

2024-07-12 Thread Vineet Gupta



On 7/9/24 21:36, Jeff Law wrote:
>> +;; TODO: isinf is a bit tricky as it require trimodal return
>> +;;  1 if 0x80, -1 if 0x1, 0 otherwise

[..]

>> +  rtx tmp2 = gen_reg_rtx (word_mode);
>> +  emit_insn (gen_ashldi3 (tmp2, rclass, GEN_INT (w)));
>> +  emit_insn (gen_lshrdi3 (tmp2, tmp2, GEN_INT (w)));
>>
>> Above doesn't work for the simple case where we have int return value of
>> the builtin.
> "doesn't work" does not give me enough information to guess what's 
> wrong.  

Sorry for being terse and tardy with that half arsed effort - I can
blame summer and rush to send the patch right before heading for
vacation :-)

> I do see that you're explicitly gen_ashldi3 and gen_lshrdi3, 
> that'll work for rv64, but won't work for rv32.
>
> If you're generating code without immediately  faulting, then I'd look 
> at the .expand dump to see if the resulting code makes sense.  If you're 
> faulting, it'd be helpful to know where/why.

Yeah because of the mixing SI and DI modes the following emit_move_insn
() was ICE'ing.

I did get the implementation working, posting below for posterity (and
some redemption that I did get it working as desired)

(define_expand "isinf2"
  [(set (match_operand  0 "register_operand" "=r")
    (match_operand:ANYF 1 "register_operand" " f"))]
  "TARGET_HARD_FLOAT"
{
  if (GET_MODE (operands[0]) != SImode
  && GET_MODE (operands[0]) != word_mode)
    FAIL;

  rtx rclass = gen_reg_rtx (word_mode);
  rtx tmp = gen_reg_rtx (word_mode);
  rtx t_op0 = operands[0];
  rtx label = gen_label_rtx ();
  rtx end_label = gen_label_rtx ();

  emit_insn (gen_fclass (rclass, operands[1]));

  /* positive infinity */
  riscv_emit_binary (AND, tmp, rclass, GEN_INT (0x80));
  riscv_expand_conditional_branch (label, NE, tmp, const0_rtx);

  /* negative infinity return -1 */
  HOST_WIDE_INT w = GET_MODE_BITSIZE(word_mode) - 1;

  riscv_emit_binary (AND, tmp, rclass, GEN_INT (0x1));
  riscv_emit_unary (NEG, tmp, tmp);
  riscv_emit_binary (ASHIFTRT, tmp, tmp, GEN_INT (w));

  if (TARGET_64BIT && GET_MODE (operands[0]) == SImode)
    {
  t_op0 = gen_reg_rtx (DImode);
  emit_insn (gen_extend_insn (t_op0, operands[0], DImode, SImode, 0));
    }

  emit_move_insn (t_op0, tmp);
  emit_jump_insn (gen_jump (end_label));
  emit_barrier ();

  /* positive infinity return 1 */
  emit_label (label);
  emit_move_insn (t_op0, GEN_INT (1));

  emit_label (end_label);
  if (TARGET_64BIT)
    {
  t_op0 = gen_lowpart (SImode, t_op0);
  SUBREG_PROMOTED_VAR_P (t_op0) = 1;
  SUBREG_PROMOTED_SET (t_op0, SRP_SIGNED);
  emit_move_insn (operands[0], t_op0);
    }
  DONE;
})

But this causes 1 additional failure in testsuite as tg-test.h ->
type-generic-1.[cC] expect a 1 for negative infinity.

Meaning we need to go back to bimodal semantics.

Thx,
-Vineet


[pushed] doc: remove @opindex for fconcepts-ts

2024-07-12 Thread Marek Polacek
Applying to trunk as obvious.

-- >8 --
We're getting complaints from the CI system about this removed option.
I suspect I should have removed the @opindex and @itemx for it.  This
patch does that.

gcc/ChangeLog:

* doc/invoke.texi: Remove @opindex and @itemx for -fconcepts-ts.
---
 gcc/doc/invoke.texi | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4850c7379bf..d10796cabd6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -3217,16 +3217,14 @@ exhaustion is signalled by throwing 
@code{std::bad_alloc}.  See also
 @samp{new (nothrow)}.
 
 @opindex fconcepts
-@opindex fconcepts-ts
 @item -fconcepts
-@itemx -fconcepts-ts
 Enable support for the C++ Concepts feature for constraining template
 arguments.  With @option{-std=c++20} and above, Concepts are part of
 the language standard, so @option{-fconcepts} defaults to on.
 
 Some constructs that were allowed by the earlier C++ Extensions for
 Concepts Technical Specification, ISO 19217 (2015), but didn't make it
-into the standard, can additionally be enabled by
+into the standard, could additionally be enabled by
 @option{-fconcepts-ts}.  The option @option{-fconcepts-ts} was deprecated
 in GCC 14 and removed in GCC 15; users are expected to convert their code
 to C++20 concepts.

base-commit: 08776bef53835ff6318ecfeade8f6c6896ffd81f
-- 
2.45.2



[PATCH] RISC-V: Fix testcase for vector .SAT_SUB in zip benchmark

2024-07-12 Thread Edwin Lu
The following testcase was not properly testing anything due to an
uninitialized variable. As a result, the loop was not iterating through
the testing data, but instead on undefined values which could cause an
unexpected abort.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h:
initialize variable

Signed-off-by: Edwin Lu 
---
 .../gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h   | 1 +
 1 file changed, 1 insertion(+)

diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
index d238c6392de..309d63377d5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
@@ -9,6 +9,7 @@ main ()
 
   for (i = 0; i < sizeof (DATA) / sizeof (DATA[0]); i++)
 {
+  d = DATA[i];
   RUN_BINARY_VX (&d.x[N], d.b, N);
 
   for (k = 0; k < N; k++)
-- 
2.34.1



Re: [PATCH v1] RISC-V: Add testcases for vector .SAT_SUB in zip benchmark

2024-07-12 Thread Edwin Lu

Hi Pan,

This patch appears to be tripping up our postcommit for building linux 
with vector https://github.com/patrick-rivos/gcc-postcommit-ci/issues/1325.


FAIL: gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_zip-run.c 
execution test


Looking at the logs, the test fails due to the __builtin_abort call when
d.x[k] != d.expect[k].


diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
new file mode 100644
index 000..d238c6392de
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_vx.h
@@ -0,0 +1,22 @@
+#ifndef HAVE_DEFINED_VEC_SAT_BINARY_VX_H
+#define HAVE_DEFINED_VEC_SAT_BINARY_VX_H
+
+int
+main ()
+{
+  unsigned i, k;
+  T d;
+
+  for (i = 0; i < sizeof (DATA) / sizeof (DATA[0]); i++)
+{
+  RUN_BINARY_VX (&d.x[N], d.b, N);
+
+  for (k = 0; k < N; k++)
+   if (d.x[k] != d.expect[k])
+ __builtin_abort ();
+}
+
+  return 0;
+}
+
+#endif


I think you forgot to initialize d = data[i] in the loop?

Edwin



Re: [patch,avr] PR115830: Improve code by using more condition code

2024-07-12 Thread Georg-Johann Lay

Am 12.07.24 um 18:40 schrieb Jeff Law:


On 7/10/24 3:05 AM, Georg-Johann Lay wrote:


The previous change to avr.md was several days ago, and should not
interfere with this one.  Anyway, I rebased the patch against
master and attached it below.  The patch is atop the ref in the patch
file name : https://gcc.gnu.org/r15-1935
I'll throw this version in.  While avr isn't deeply tested additional 
testing never hurts.


It looks like avr exposes the CC register early, creating references 
to it during expansion to RTL.  Presumably this means you've got a 
reasonable way to reload values, particularly address arithmetic 
without impacting the CC state?


No. CC only comes into existence after reload in .split2 and in some
cases in .avr-ifelse.  reloads may clobber CC, so there is no way
to split cbranch insn prior to relaod.

Clearly I missed all those reload_completed checks.  Thanks for clarifying.

It looks like you're relying heavily on peep2 patterns.  Did you 
explore using cmpelim?


Some question I have:

- compare-elim.cc states that the comparisons must look like
   [(set (reg:CC) (compare:CC (reg) (reg_or_immediate)))]
which is not always the case, in particular the patterns
may have clobbers. compare-elim uses single_set, which
is true for the pattern with a clobber / scratch.  However,
presence / absence of a clobber reg has to be taken into
account when deciding whether a transformation into cc
other than CCmode is possible.  compare-elim only supplies
the SET_SRC, but not the scratch_operand to SELECT_CC_MODE.
We could probably fix that.  I don't think we've had a target where the 
clobbered object affects CC like this.  Could we perhaps pass in the 
full insn?  That would require fixing the various targets that already 
use the compare-elim infrastructure, but I think it'd be a pretty 
mechanical change.


So basically affect all targets.  I definitely cannot do that,
in particular the required testing.


- The internals says that some optimizations / transforms
will happen prior to reload (e.g. .combine).  This is not
possible since CCmode exists only since .split2.

- Insn combine may have transformed some comparisons, e.g.
sign test are represented as a zero_extract + skip like
in the sbrx_branch insns.  This means that compares might
have a non-canonical form at .cmpelim.  But a CCNmode
compare is better still than an sbrx_branch.  Moreover,
CANONICALIZE_COMPARISON also runs already at insn combine,
so that compares have a form not recognized by cmpelim.
There's probably going to be some cases we can't necessarily handle due 
to intricate target dependencies.  That's fine.  I'm more concerned 
about whether or not we're utilizing the existing infrastructure.  If we 
can use the generic infrastructure we should.  If we can use the generic 
infrastructure after making some improvements we should.  If the 
problems aren't tractable with the generic infrastructure, then a 
bespoke target solution may be the only path forward.


Well, what matters in the end is the quality of the generated code.
Which doesn't mean I think how the code is written does not matter.
After all, peep2 is also an existing infrastructure.  Yes, the code
is a bit verbose, but it gives a very well control over what is
happening, as it runs just a couple of passes after .split2.

Plus, there are cases where peephole 2 may provide a scratch, that's
something cmpelim can't do (to my knowledge).


- Comparisons may look like if (0  ref) which is
a form not supported by compare-elim.
That's not canonical RTL.  If the backend is generating that it probably 
needs to be fixed.  Various passes assume the constant will be in the 
other position.


It is generated deliberately during CANONICALIZE_COMPARISON.

Reason is that it is possible to swap comparison operands with
zero costs, but that does not apply to branch instructions:
Not all branches are supported natively, for example there is GE,
but GT has to be emulated by 2 instructions.  Which means
swapping conditions may add or remove costs.

This works well since beginning of time.  Maybe it could be supported
with cmpelim by doubling the number of CCmodes, so things are getting
complicated, or at least annoying.

Moreover, as far as I can tell, cmpelim cannot convert code like

cc = compare (a, b)
if (cc == 0) goto x
cc = compare (a, b)
if (cc > 0) goto y

to

cc = compare (a, b)
if (cc == 0) goto x
if (cc >= 0) goto y

notice the 2nd operator changed, which is valid because >= is in the
wake of ==.  cmpelim only provides local information and would come up
with

cc = compare (a, b)
if (cc == 0) goto x
if (cc > 0) goto y // suboptimal, not enough context known.

so cmpelim would require a device that tells which assertions hold
when a comparison is handled.  For example, in the 1st compare there
is no assertion, but in the 2nf we know that always cc != 0 due to the
branch.


So I guess mainly due to the cmpelim (non-)alternative?
Largely yes.  I'll try to take a look at this

Re: [PATCH] c++/modules: Propagate BINDING_VECTOR_*_DUPS_P on realloc [PR99242]

2024-07-12 Thread Patrick Palka
On Mon, 8 Jul 2024, Nathaniel Shead wrote:

> Bootstrapped and regtested (so far just modules.exp) on
> x86_64-pc-linux-gnu, OK for trunk/14 if full regtest succeeds?
> 
> -- >8 --
> 
> When importing modules, when a binding vector for a name runs out of
> slots it gets reallocated with a larger size, and existing bindings are
> copied across.  However, the flags to indicate whether deduping needs to
> occur did not: this causes ICEs, as it allows a duplicate binding to be
> added which then violates assumptions later on.

LGTM

> 
>   PR c++/99242
> 
> gcc/cp/ChangeLog:
> 
>   * name-lookup.cc (append_imported_binding_slot): Propagate dups
>   flags.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/modules/pr99242_a.H: New test.
>   * g++.dg/modules/pr99242_b.H: New test.
>   * g++.dg/modules/pr99242_c.H: New test.
>   * g++.dg/modules/pr99242_d.C: New test.
> 
> Signed-off-by: Nathaniel Shead 
> ---
>  gcc/cp/name-lookup.cc| 4 
>  gcc/testsuite/g++.dg/modules/pr99242_a.H | 3 +++
>  gcc/testsuite/g++.dg/modules/pr99242_b.H | 3 +++
>  gcc/testsuite/g++.dg/modules/pr99242_c.H | 3 +++
>  gcc/testsuite/g++.dg/modules/pr99242_d.C | 7 +++
>  5 files changed, 20 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr99242_a.H
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr99242_b.H
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr99242_c.H
>  create mode 100644 gcc/testsuite/g++.dg/modules/pr99242_d.C
> 
> diff --git a/gcc/cp/name-lookup.cc b/gcc/cp/name-lookup.cc
> index b57893116eb..3f0b8a59ff4 100644
> --- a/gcc/cp/name-lookup.cc
> +++ b/gcc/cp/name-lookup.cc
> @@ -352,6 +352,10 @@ append_imported_binding_slot (tree *slot, tree name, 
> unsigned ix)
>  
>tree new_vec = make_binding_vec (name, want);
>BINDING_VECTOR_NUM_CLUSTERS (new_vec) = have + 1;
> +  BINDING_VECTOR_GLOBAL_DUPS_P (new_vec)
> + = BINDING_VECTOR_GLOBAL_DUPS_P (*slot);
> +  BINDING_VECTOR_PARTITION_DUPS_P (new_vec)
> + = BINDING_VECTOR_PARTITION_DUPS_P (*slot);
>memcpy (BINDING_VECTOR_CLUSTER_BASE (new_vec),
> BINDING_VECTOR_CLUSTER_BASE (*slot),
> have * sizeof (binding_cluster));
> diff --git a/gcc/testsuite/g++.dg/modules/pr99242_a.H 
> b/gcc/testsuite/g++.dg/modules/pr99242_a.H
> new file mode 100644
> index 000..2df0b4184ab
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/pr99242_a.H
> @@ -0,0 +1,3 @@
> +// { dg-additional-options "-fmodule-header" }
> +// { dg-module-cmi {} }
> +bool __is_constant_evaluated();
> diff --git a/gcc/testsuite/g++.dg/modules/pr99242_b.H 
> b/gcc/testsuite/g++.dg/modules/pr99242_b.H
> new file mode 100644
> index 000..2df0b4184ab
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/pr99242_b.H
> @@ -0,0 +1,3 @@
> +// { dg-additional-options "-fmodule-header" }
> +// { dg-module-cmi {} }
> +bool __is_constant_evaluated();
> diff --git a/gcc/testsuite/g++.dg/modules/pr99242_c.H 
> b/gcc/testsuite/g++.dg/modules/pr99242_c.H
> new file mode 100644
> index 000..2df0b4184ab
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/pr99242_c.H
> @@ -0,0 +1,3 @@
> +// { dg-additional-options "-fmodule-header" }
> +// { dg-module-cmi {} }
> +bool __is_constant_evaluated();
> diff --git a/gcc/testsuite/g++.dg/modules/pr99242_d.C 
> b/gcc/testsuite/g++.dg/modules/pr99242_d.C
> new file mode 100644
> index 000..1046d1af984
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/modules/pr99242_d.C
> @@ -0,0 +1,7 @@
> +// { dg-additional-options "-fmodules-ts" }
> +bool __is_constant_evaluated();
> +import "pr99242_a.H";
> +void f() { __is_constant_evaluated(); }
> +import "pr99242_b.H";
> +import "pr99242_c.H";
> +void g() { __is_constant_evaluated(); }
> -- 
> 2.43.2
> 
> 



Re: [PATCH] c++, coroutines, contracts: Handle coroutine and void functions [PR110871,PR110872,PR115434].

2024-07-12 Thread Iain Sandoe
HI Jason,

> On 9 Jul 2024, at 22:55, Jason Merrill  wrote:
> 
> On 7/9/24 11:52 AM, Iain Sandoe wrote:
>> Hi Folks
>>> On 8 Jul 2024, at 20:57, Jason Merrill  wrote:
>>> 
>>> On 7/8/24 3:37 PM, Iain Sandoe wrote:
> On 8 Jul 2024, at 20:19, Jason Merrill  wrote:
> 
> On 6/17/24 8:15 AM, Iain Sandoe wrote:
>>  potentially_transformed_function_body ();
>>}
>>   finally
>>{
>>  handle_post_condition_checking ();
>>}
>>   else [only if the function is not marked noexcept(true) ]
>>{
>>  __rethrow ();
> 
> This sounds undesirable, EH normally continues unwinding after running a 
> cleanup.  Why would the exception be considered caught in a way that 
> needs a rethrow?
 We need to implement that the no-EH cleanup [i.e. the postcondition] does 
 _not_ run when an exception is thrown.  Right now there’s no HANDLER type 
 that does that - we have EH_ONLY but no NON_EH one (otherwise I'd have 
 built a C++ try).
 however, I could not figure out how to make this arm of the EH_ELSE_EXPR 
 work as an empty construct (the gimplifier does not seem to expect this).
>>> 
>>> You can't just put void_node in the else arm?
>> I had tried a number of permutations (void_node, empty_stmt, empty statement 
>> list etc).  Unfortunately, at present that causes the EH_ELSE expression to 
>> be silently dropped in gimplication.  The fact that you think it should work 
>> (as did i) makes me think either we have a gimplifier bug or a 
>> misunderstanding:
>> @Alex;
>> we have (gcc/gimplify.cc:18423):
>>if (!gimple_seq_empty_p (n) && !gimple_seq_empty_p (e))
>>{
>> geh_else *stmt = gimple_build_eh_else (n, e);
>> gimple_seq_add_stmt (&cleanup, stmt);
>>   }
>> Which essentially says “if either of the sub-expressions to this are empty, 
>> then do not build it”
>> Was there a reason for this, or is it a typo?
>> If I replace ‘&&' by ‘||' (i.e. “if either of the sub-expressions is 
>> present, then build it” then things behave as I expect.
>> IMO it the current action _is_ intended
>>  (a) it should at least emit a checking diagnostic
>>  (b) we need to figure out how to extend it so that we can implement what’s 
>> needed (non-EH only cleanups)
>> So I patched the gimplifier as above and initial testing says that this 
>> change does not cause any C++ regressions - but I need to do Ada, 
>> Objective-C etc too.
>> thoughts?
> 
> That sounds good to me, it does seem like a typo.

Here is the revised patch - which implements the gimplifier change and thus 
simplifes the rest.

OK for trunk?
thanks
Iain

P.S. I did not figure out how to find the right message-id to reply to from git 
send-email, so attaching.



> 
> Jason
> 


0001-c-coroutines-contracts-Handle-coroutine-and-void-fun.patch
Description: Binary data


Re: [patch,avr] PR115830: Improve code by using more condition code

2024-07-12 Thread Jeff Law




On 7/10/24 3:05 AM, Georg-Johann Lay wrote:



The previous change to avr.md was several days ago, and should not
interfere with this one.  Anyway, I rebased the patch against
master and attached it below.  The patch is atop the ref in the patch
file name : https://gcc.gnu.org/r15-1935
I'll throw this version in.  While avr isn't deeply tested additional 
testing never hurts.




It looks like avr exposes the CC register early, creating references 
to it during expansion to RTL.  Presumably this means you've got a 
reasonable way to reload values, particularly address arithmetic 
without impacting the CC state?


No. CC only comes into existence after reload in .split2 and in some
cases in .avr-ifelse.  reloads may clobber CC, so there is no way
to split cbranch insn prior to relaod.

Clearly I missed all those reload_completed checks.  Thanks for clarifying.



It looks like you're relying heavily on peep2 patterns.  Did you 
explore using cmpelim?


Some question I have:

- compare-elim.cc states that the comparisons must look like
   [(set (reg:CC) (compare:CC (reg) (reg_or_immediate)))]
which is not always the case, in particular the patterns
may have clobbers. compare-elim uses single_set, which
is true for the pattern with a clobber / scratch.  However,
presence / absence of a clobber reg has to be taken into
account when deciding whether a transformation into cc
other than CCmode is possible.  compare-elim only supplies
the SET_SRC, but not the scratch_operand to SELECT_CC_MODE.
We could probably fix that.  I don't think we've had a target where the 
clobbered object affects CC like this.  Could we perhaps pass in the 
full insn?  That would require fixing the various targets that already 
use the compare-elim infrastructure, but I think it'd be a pretty 
mechanical change.






- The internals says that some optimizations / transforms
will happen prior to reload (e.g. .combine).  This is not
possible since CCmode exists only since .split2.

- Insn combine may have transformed some comparisons, e.g.
sign test are represented as a zero_extract + skip like
in the sbrx_branch insns.  This means that compares might
have a non-canonical form at .cmpelim.  But a CCNmode
compare is better still than an sbrx_branch.  Moreover,
CANONICALIZE_COMPARISON also runs already at insn combine,
so that compares have a form not recognized by cmpelim.
There's probably going to be some cases we can't necessarily handle due 
to intricate target dependencies.  That's fine.  I'm more concerned 
about whether or not we're utilizing the existing infrastructure.  If we 
can use the generic infrastructure we should.  If we can use the generic 
infrastructure after making some improvements we should.  If the 
problems aren't tractable with the generic infrastructure, then a 
bespoke target solution may be the only path forward.






- Comparisons may look like if (0  ref) which is
a form not supported by compare-elim.
That's not canonical RTL.  If the backend is generating that it probably 
needs to be fixed.  Various passes assume the constant will be in the 
other position.






Not really ready to ACK or NAK at this point.

jeff


So I guess mainly due to the cmpelim (non-)alternative?
Largely yes.  I'll try to take a look at this now that I've got more 
background info.  Thanks.


jeff



Re: [PATCH v2] Fix Xcode 16 build break with NULL != nullptr

2024-07-12 Thread Iain Sandoe



> On 11 Jul 2024, at 04:35, David Malcolm  wrote:
> 
> On Wed, 2024-07-10 at 09:43 +, Daniel Bertalan wrote:
>> As of Xcode 16 beta 2 with the macOS 15 SDK, each re-inclusion of the
>> stddef.h header causes the NULL macro in C++ to be re-defined to an
>> integral constant (__null). This makes the workaround in d59a576b8
>> ("Redefine NULL to nullptr") ineffective, as other headers that are
>> typically included after system.h (such as obstack.h) do include
>> stddef.h too.
> 
> Thanks for the patch; the analyzer parts look good to me.

I’ve now pushed this patch on Daniel’s behalf, I did test also on x86_64-linux.
thanks
Iain

> 
> Dave
> 
>> 
>> This can be seen by running the sample below through `clang++ -E`
>> 
>> #include 
>> #define NULL nullptr
>> #include 
>> NULL
>> 
>> The relevant libc++ change is here:
>> https://github.com/llvm/llvm-project/commit/2950283dddab03c183c1be2d7de9d4999cc86131
>> 
>> Filed as FB14261859 to Apple and added a comment about it on LLVM PR
>> 86843.
>> 
>> This fixes the cases in --enable-languages=c,c++,objc,obj-c++,rust
>> build
>> where NULL being an integral constant instead of a null pointer
>> literal
>> (therefore no longer implicitly converting to a pointer when used as
>> a
>> template function's argument) caused issues.
>> 
>> gcc/value-pointer-equiv.cc:65:43: error: no viable conversion
>> from `pair::type, typename
>> __unwrap_ref_decay::type>' to 'const pair'
>> 
>> 65 |   const std::pair  m_marker = std::make_pair
>> (NULL, NULL);
>>|  
>> ^~~
>> 
>> As noted in the previous commit though, the proper solution would be
>> to
>> phase out the usages of NULL in GCC's C++ source code.
>> 
>> gcc/analyzer/ChangeLog:
>> 
>> * diagnostic-manager.cc (saved_diagnostic::saved_diagnostic):
>> Change NULL to nullptr.
>> (struct null_assignment_sm_context): Likewise.
>> * infinite-loop.cc: Likewise.
>> * infinite-recursion.cc: Likewise.
>> * varargs.cc (va_list_state_machine::on_leak): Likewise.
>> 
>> gcc/rust/ChangeLog:
>> 
>> * metadata/rust-imports.cc
>> (Import::try_package_in_directory):
>> Change NULL to nullptr.
>> 
>> gcc/ChangeLog:
>> 
>> * value-pointer-equiv.cc: Change NULL to nullptr.
>> ---
>>  gcc/analyzer/diagnostic-manager.cc | 18 +-
>>  gcc/analyzer/infinite-loop.cc  |  2 +-
>>  gcc/analyzer/infinite-recursion.cc |  2 +-
>>  gcc/analyzer/varargs.cc|  2 +-
>>  gcc/rust/metadata/rust-imports.cc  |  2 +-
>>  gcc/value-pointer-equiv.cc |  2 +-
>>  6 files changed, 14 insertions(+), 14 deletions(-)
>> 
>> diff --git a/gcc/analyzer/diagnostic-manager.cc
>> b/gcc/analyzer/diagnostic-manager.cc
>> index fe943ac61c9e..51304b0795b6 100644
>> --- a/gcc/analyzer/diagnostic-manager.cc
>> +++ b/gcc/analyzer/diagnostic-manager.cc
>> @@ -679,12 +679,12 @@ saved_diagnostic::saved_diagnostic (const
>> state_machine *sm,
>>m_stmt (ploc.m_stmt),
>>/* stmt_finder could be on-stack; we want our own copy that can
>>   outlive that.  */
>> -  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : NULL),
>> +  m_stmt_finder (ploc.m_finder ? ploc.m_finder->clone () : nullptr),
>>m_loc (ploc.m_loc),
>>m_var (var), m_sval (sval), m_state (state),
>> -  m_d (std::move (d)), m_trailing_eedge (NULL),
>> +  m_d (std::move (d)), m_trailing_eedge (nullptr),
>>m_idx (idx),
>> -  m_best_epath (NULL), m_problem (NULL),
>> +  m_best_epath (nullptr), m_problem (nullptr),
>>m_notes ()
>>  {
>>/* We must have an enode in order to be able to look for paths
>> @@ -1800,10 +1800,10 @@ public:
>> stmt,
>> stack_depth,
>> sm,
>> -   NULL,
>> +   nullptr,
>> src_sm_val,
>> dst_sm_val,
>> -   NULL,
>> +   nullptr,
>> dst_state,
>> src_node));
>>  return false;
>> @@ -1993,9 +1993,9 @@ struct null_assignment_sm_context : public
>> sm_context
>> m_sm,
>> var_new_sval,
>> from, to,
>> -   NULL,
>> +   nullptr,
>> *m_new_state,
>> -   NULL));
>> +   nullptr));
>>}
>>  
>>void set_next_state (const gimple *stmt,
>> @@ -2019,9 +2019,9 @@ struct null_assignment_sm_context : public
>> sm_context
>>   

[PATCH] varasm: Add support for emitting binary data with the new gas .base64 directive

2024-07-12 Thread Jakub Jelinek
Hi!

Nick has implemented a new .base64 directive in gas (to be shipped in
the upcoming binutils 2.43; big thanks for that).
See https://sourceware.org/bugzilla/show_bug.cgi?id=31964

The following patch adjusts default_elf_asm_output_ascii (i.e.
ASM_OUTPUT_ASCII elfos.h implementation) to use it if it detects binary
data and gas supports it.

Without this patch, we emit stuff like:
.string "\177ELF\002\001\001\003"
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string "\002"
.string ">"
...
.string "\324\001\236 
0FS\202\002E\n0@\203\004\005&\202\021\337)\021\203C\020A\300\220I\004\t\b\206(\234\0132l\004b\300\bK\006\220$0\303\020P$\233\211\002D\f"
etc., with this patch more compact
.base64 
"f0VMRgIBAQMAAAIAPgABABf3AABAAACneB0AAEAAOAAOAEAALAArAAYEQABAAEAAAEAAQAAAEAMQAwgAAwQAAABQAwAAAFADQAAAUANcABwAAQABBABAAEAAADBwOQAAMHA5EAEFAIA5gHkA"
.base64 
"AACAeQAAxSSgAgDFJKACAAAQAQQAsNkCAACwGQMAALAZAwDMtc0AAMy1zQAAABABBgAAAGhmpwMAaHbnAwBoducDAOAMAQAA4MEeEAIGkH2nAwCQjecDAJCN5wMAQAIAAABAAggABAQAAABwAwAAAHADQAAAcANAAABAAEAACAAA"
.base64 
"AAAEBLADsANAAACwA0AAACAAIAAEAAcEaGanAwBoducDAGh25wMQAAgAU+V0ZAQAAABwAwAAAHADQAAAcANAAABAAEAACABQ5XRkBAw/WAMADD+YAwAMP5gDAPy7CgAA/LsKAAAEAFHldGQG"
.base64 
"ABAAUuV0ZAQAAABoZqcDAGh25wMAaHbnAwCYGQAAAJgZAQAvbGliNjQvbGQtbGludXgteDg2LTY0LnNvLjIAAAQwBQAAAEdOVQACgADABAEAAQABwAQJAAIAAcAEAwAEEAEAAABHTlUAAAMCAAOAAACsqAAAgS0AAOJWAAAjNwAAXjAAF1gAAHsxAABBBwAA"
.base64 
"G0kAALGmAACwoACQhOw1AACNYgAAAFQox3UAALZAiIUAALGeAABBlAAAWEsAAPmRAACmOgAAADh3lCBymgAAaosAAMIjAAAKMQAAMkIAADU05ZwAAFIdAAAIGQAAAMFbAAAoTQAAGDcAAIRgAAA6HgAAlxwAAADOlgAAAEhPAAARiwAAMGgAAOVtAADMFgCrjgAAYl4AACZVAAA/HgBqPwAA"
The patch attempts to juggle between readability and compactness, so
if it detects some hunk of the initializer that would be shorter to be
emitted as .string/.ascii directive, it does so, but if it previously
used .base64 directive it switches mode only if there is a 16+ char
ASCII-ish string.

On my #embed testcase from yesterday
unsigned char a[] = {
#embed "cc1plus"
};
without this patch it emits 2.4GB of assembly, while with this
patch 649M.
Compile times (trunk, so yes,rtl,extra checking) are:
time ./xgcc -B ./ -S -std=c23 -O2 embed-11.c

  


  
real0m13.647s   

  
user0m7.157s

  
sys 0m2.597s

  
time ./xgcc -B ./ -c -std=c23 -O2 embed-11.c

  


  
real0m28.649s   

  
user0m26.653s   

  
sys 0m1.958s

  
without the patch and
time ./xgcc

Re: [PATCH 2/2] rtl-ssa: Fix prev_any_insn [PR115785]

2024-07-12 Thread Jeff Law




On 7/9/24 9:28 AM, Richard Sandiford wrote:

Bit of a brown paper bag issue, but: due to the representation
of the insn chain, insn_info::prev_any_insn would sometimes skip
over instructions.  This led to an invalid update in the PR when
adding and removing instructions.

I think one of the reasons I failed to spot this when checking
the code is that m_prev_insn_or_last_debug_insn is misnamed:
it's the previous instruction *of the same type* or the last
debug instruction in a group.  The patch therefore renames it to
m_prev_sametype_or_last_debug_insn (with the term prev_sametype
already being used in some accessors).

The reason this didn't show up earlier is that (a) prev_any_insn
is rarely used directly, (b) no instructions were lost from the
def-use chains, and (c) only consecutive debug instructions were
skipped when walking the insn chain.

The chaining scheme makes prev_any_insn more complicated than
next_any_insn, prev_nondebug_insn and next_nondebug_insn, but the
object code produced is still relatively simple.

Bootstrapped & regression-tested on aarch64-linux-gnu and
x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR rtl-optimization/115785
* rtl-ssa/insns.h (insn_info::prev_insn_or_last_debug_insn)
(insn_info::next_nondebug_or_debug_insn): Remove typedefs.
(insn_info::m_prev_insn_or_last_debug_insn): Rename to...
(insn_info::m_prev_sametype_or_last_debug_insn): ...this.
* rtl-ssa/internals.inl (insn_info::insn_info): Update after
above renaming.
(insn_info::copy_prev_from): Likewise.
(insn_info::set_prev_sametype_insn): Likewise.
(insn_info::set_last_debug_insn): Likewise.
(insn_info::clear_insn_links): Likewise.
(insn_info::has_insn_links): Likewise.
* rtl-ssa/member-fns.inl (insn_info::prev_nondebug_insn): Likewise.
(insn_info::prev_any_insn): Fix moves from non-debug to debug insns.

gcc/testsuite/
PR rtl-optimization/115785
* g++.dg/torture/pr115785.C: New test.

OK.

Jeff



Re: [PATCH] RISC-V: Implement locality for __builtin_prefetch

2024-07-12 Thread Jeff Law




On 7/12/24 5:35 AM, Monk Chiang wrote:

The patch add the Zihintntl instructions in the prefetch pattern.
Zicbop has prefetch instructions. Zihintntl has NTL instructions.
Insert NTL instructions before prefetch instruction, if target
has Zihintntl extension.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add 'L' letter
  to print zihintntl instructions string.
* config/riscv/riscv.md (prefetch): Add zihintntl instructions.

 gcc/testsuite/ChangeLog:

* gcc.target/riscv/prefetch-zicbop.c: New test.
* gcc.target/riscv/prefetch-zihintntl.c: New test.
This is fine for the trunk.  It looks like you have commit access, so go 
ahead and commit when it's convenient for you.


Thanks for your patience,
jeff



[PATCH v2 8/8] OpenMP: update documentation for dispatch and adjust_args

2024-07-12 Thread Paul-Antoine Arras
libgomp/ChangeLog:

* libgomp.texi:
---
 libgomp/libgomp.texi | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 50da248b74d..a2f5897463a 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -294,8 +294,8 @@ The OpenMP 4.5 specification is fully supported.
 @item C/C++'s @code{declare variant} directive: elision support of
   preprocessed code @tab N @tab
 @item @code{declare variant}: new clauses @code{adjust_args} and
-  @code{append_args} @tab N @tab
-@item @code{dispatch} construct @tab N @tab
+  @code{append_args} @tab P @tab Only @code{adjust_args}
+@item @code{dispatch} construct @tab Y @tab
 @item device-specific ICV settings with environment variables @tab Y @tab
 @item @code{assume} and @code{assumes} directives @tab Y @tab
 @item @code{nothing} directive @tab Y @tab
-- 
2.45.2



[PATCH v2 7/8] OpenMP: Fortran front-end support for dispatch + adjust_args

2024-07-12 Thread Paul-Antoine Arras
This patch adds support for the `dispatch` construct and the `adjust_args`
clause to the Fortran front-end.

Handling of `adjust_args` across translation units is missing due to PR115271.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (show_omp_clauses): Handle novariants and nocontext
clauses.
(show_omp_node): Handle EXEC_OMP_DISPATCH.
(show_code_node): Likewise.
* frontend-passes.cc (gfc_code_walker): Handle novariants and nocontext.
* gfortran.h (enum gfc_statement): Add ST_OMP_DISPATCH.
(symbol_attribute): Add omp_declare_variant_need_device_ptr.
(gfc_omp_clauses): Add novariants and nocontext.
(gfc_omp_declare_variant): Add need_device_ptr_arg_list.
(enum gfc_exec_op): Add EXEC_OMP_DISPATCH.
* match.h (gfc_match_omp_dispatch): Declare.
* openmp.cc (gfc_free_omp_clauses): Free novariants and nocontext
clauses.
(gfc_free_omp_declare_variant_list): Free need_device_ptr_arg_list
namelist.
(enum omp_mask2): Add OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(OMP_DISPATCH_CLAUSES): Define.
(gfc_match_omp_dispatch): New function.
(gfc_match_omp_declare_variant): Parse adjust_args.
(resolve_omp_clauses): Handle adjust_args, novariants and nocontext.
Adjust handling of OMP_LIST_IS_DEVICE_PTR.
(icode_code_error_callback): Handle EXEC_OMP_DISPATCH.
(omp_code_to_statement): Likewise.
(resolve_omp_dispatch): New function.
(gfc_resolve_omp_directive): Handle EXEC_OMP_DISPATCH.
* parse.cc (decode_omp_directive): Match dispatch.
(next_statement): Handle ST_OMP_DISPATCH.
(gfc_ascii_statement): Likewise.
(parse_omp_dispatch): New function.
(parse_executable): Handle ST_OMP_DISPATCH.
* resolve.cc (gfc_resolve_blocks): Handle EXEC_OMP_DISPATCH.
* st.cc (gfc_free_statement): Likewise.
* trans-decl.cc (create_function_arglist): Declare.
(gfc_get_extern_function_decl): Call it.
* trans-openmp.cc (gfc_trans_omp_clauses): Handle novariants and
nocontext.
(gfc_trans_omp_dispatch): New function.
(gfc_trans_omp_directive): Handle EXEC_OMP_DISPATCH.
(gfc_trans_omp_declare_variant): Handle adjust_args.
* trans.cc (trans_code): Handle EXEC_OMP_DISPATCH:.
* types.def (BT_FN_PTR_CONST_PTR_INT): Declare.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/declare-variant-2.f90: Update dg-error.
* gfortran.dg/gomp/declare-variant-21.f90: New test (xfail).
* gfortran.dg/gomp/declare-variant-21-aux.f90: New test.
* gfortran.dg/gomp/adjust-args-1.f90: New test.
* gfortran.dg/gomp/adjust-args-2.f90: New test.
* gfortran.dg/gomp/adjust-args-3.f90: New test.
* gfortran.dg/gomp/adjust-args-4.f90: New test.
* gfortran.dg/gomp/adjust-args-5.f90: New test.
* gfortran.dg/gomp/dispatch-1.f90: New test.
* gfortran.dg/gomp/dispatch-2.f90: New test.
* gfortran.dg/gomp/dispatch-3.f90: New test.
* gfortran.dg/gomp/dispatch-4.f90: New test.
* gfortran.dg/gomp/dispatch-5.f90: New test.
* gfortran.dg/gomp/dispatch-6.f90: New test.
* gfortran.dg/gomp/dispatch-7.f90: New test.
* gfortran.dg/gomp/dispatch-8.f90: New test.
---
 gcc/fortran/dump-parse-tree.cc|  17 ++
 gcc/fortran/frontend-passes.cc|   2 +
 gcc/fortran/gfortran.h|  11 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 201 --
 gcc/fortran/parse.cc  |  39 +++-
 gcc/fortran/resolve.cc|   2 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-decl.cc |   9 +-
 gcc/fortran/trans-openmp.cc   | 161 ++
 gcc/fortran/trans.cc  |   1 +
 gcc/fortran/types.def |   1 +
 .../gfortran.dg/gomp/adjust-args-1.f90|  63 ++
 .../gfortran.dg/gomp/adjust-args-2.f90|  18 ++
 .../gfortran.dg/gomp/adjust-args-3.f90|  26 +++
 .../gfortran.dg/gomp/adjust-args-4.f90|  58 +
 .../gfortran.dg/gomp/adjust-args-5.f90|  58 +
 .../gfortran.dg/gomp/declare-variant-2.f90|   6 +-
 .../gomp/declare-variant-21-aux.f90   |  18 ++
 .../gfortran.dg/gomp/declare-variant-21.f90   |  28 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-1.f90 |  77 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-2.f90 |  79 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-3.f90 |  39 
 gcc/testsuite/gfortran.dg/gomp/dispatch-4.f90 |  19 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-5.f90 |  24 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-6.f90 |  38 
 gcc/testsuite/gfor

[PATCH v2 4/8] OpenMP: C front-end support for dispatch + adjust_args

2024-07-12 Thread Paul-Antoine Arras
This patch adds support to the C front-end to parse the `dispatch` construct and
the `adjust_args` clause. It also includes some common C/C++ bits for pragmas
and attributes.

Additional common C/C++ testcases are in a later patch in the series.

gcc/c-family/ChangeLog:

* c-attribs.cc (c_common_gnu_attributes): Add attribute for adjust_args
need_device_ptr.
* c-omp.cc (c_omp_directives): Uncomment dispatch.
* c-pragma.cc (omp_pragmas): Add dispatch.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_DISPATCH.
(enum pragma_omp_clause): Add PRAGMA_OMP_CLAUSE_NOCONTEXT and
PRAGMA_OMP_CLAUSE_NOVARIANTS.

gcc/c/ChangeLog:

* c-parser.cc (c_parser_omp_dispatch): New function.
(c_parser_omp_clause_name): Handle nocontext and novariants clauses.
(c_parser_omp_clause_novariants): New function.
(c_parser_omp_clause_nocontext): Likewise.
(c_parser_omp_all_clauses): Handle nocontext and novariants clauses.
(c_parser_omp_dispatch_body): New function adapted from
c_parser_expr_no_commas.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(c_parser_omp_dispatch): New function.
(c_finish_omp_declare_variant): Parse adjust_args.
(c_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
* c-typeck.cc (c_finish_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/adjust-args-1.c: New test.
* gcc.dg/gomp/dispatch-1.c: New test.
---
 gcc/c-family/c-attribs.cc |   2 +
 gcc/c-family/c-omp.cc |   4 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   3 +
 gcc/c/c-parser.cc | 496 +++---
 gcc/c/c-typeck.cc |   2 +
 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c |  32 ++
 gcc/testsuite/gcc.dg/gomp/dispatch-1.c|  53 +++
 libgomp/testsuite/libgomp.c/dispatch-1.c  |  76 
 9 files changed, 609 insertions(+), 60 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c
 create mode 100644 gcc/testsuite/gcc.dg/gomp/dispatch-1.c
 create mode 100644 libgomp/testsuite/libgomp.c/dispatch-1.c

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index f9b229aba7f..1cb49d7b911 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -560,6 +560,8 @@ const struct attribute_spec c_common_gnu_attributes[] =
  handle_omp_declare_variant_attribute, NULL },
   { "omp declare variant variant", 0, -1, true,  false, false, false,
  handle_omp_declare_variant_attribute, NULL },
+  { "omp declare variant adjust_args need_device_ptr", 0, -1, true,  false, 
false, false,
+ handle_omp_declare_variant_attribute, NULL },
   { "simd",  0, 1, true,  false, false, false,
  handle_simd_attribute, NULL },
   { "omp declare target", 0, -1, true, false, false, false,
diff --git a/gcc/c-family/c-omp.cc b/gcc/c-family/c-omp.cc
index b5ce1466e5d..c74a9fb2691 100644
--- a/gcc/c-family/c-omp.cc
+++ b/gcc/c-family/c-omp.cc
@@ -4299,8 +4299,8 @@ const struct c_omp_directive c_omp_directives[] = {
 C_OMP_DIR_DECLARATIVE, false },
   { "depobj", nullptr, nullptr, PRAGMA_OMP_DEPOBJ,
 C_OMP_DIR_STANDALONE, false },
-  /* { "dispatch", nullptr, nullptr, PRAGMA_OMP_DISPATCH,
-C_OMP_DIR_CONSTRUCT, false },  */
+  { "dispatch", nullptr, nullptr, PRAGMA_OMP_DISPATCH,
+C_OMP_DIR_DECLARATIVE, false },
   { "distribute", nullptr, nullptr, PRAGMA_OMP_DISTRIBUTE,
 C_OMP_DIR_CONSTRUCT, true },
   { "end", "assumes", nullptr, PRAGMA_OMP_END,
diff --git a/gcc/c-family/c-pragma.cc b/gcc/c-family/c-pragma.cc
index 25251c2b69f..b956819c0a5 100644
--- a/gcc/c-family/c-pragma.cc
+++ b/gcc/c-family/c-pragma.cc
@@ -1526,6 +1526,7 @@ static const struct omp_pragma_def omp_pragmas[] = {
   { "cancellation", PRAGMA_OMP_CANCELLATION_POINT },
   { "critical", PRAGMA_OMP_CRITICAL },
   { "depobj", PRAGMA_OMP_DEPOBJ },
+  { "dispatch", PRAGMA_OMP_DISPATCH },
   { "error", PRAGMA_OMP_ERROR },
   { "end", PRAGMA_OMP_END },
   { "flush", PRAGMA_OMP_FLUSH },
diff --git a/gcc/c-family/c-pragma.h b/gcc/c-family/c-pragma.h
index 2ebde06c471..6b6826b2426 100644
--- a/gcc/c-family/c-pragma.h
+++ b/gcc/c-family/c-pragma.h
@@ -55,6 +55,7 @@ enum pragma_kind {
   PRAGMA_OMP_CRITICAL,
   PRAGMA_OMP_DECLARE,
   PRAGMA_OMP_DEPOBJ,
+  PRAGMA_OMP_DISPATCH,
   PRAGMA_OMP_DISTRIBUTE,
   PRAGMA_OMP_ERROR,
   PRAGMA_OMP_END,
@@ -135,9 +136,11 @@ enum pragma_omp_clause {
   PRAGMA_OMP_CLAUSE_LINK,
   PRAGMA_OMP_CLAUSE_MAP,
   PRAGMA_OMP_CLAUSE_MERGEABLE,
+  PRAGMA_OMP_CLAUSE_NOCONTEXT,
   PRAGMA_OMP_CLAUSE_NOGROUP,
   PRAGMA_OMP_CLAUSE_NONTEMPORAL,
   PRAGMA_OMP_CLAUSE_NOTINBRANCH,
+  PRAGMA_OMP_CLAUSE_NOVARIANTS,
   PRAGMA_OMP_CLAUSE_NOWAIT,
   PRAGMA_OMP_CLAUSE_NUM_TASKS,

[PATCH v2 5/8] OpenMP: C++ front-end support for dispatch + adjust_args

2024-07-12 Thread Paul-Antoine Arras
This patch adds C++ support for the `dispatch` construct and the `adjust_args`
clause. It relies on the c-family bits comprised in the corresponding C front
end patch for pragmas and attributes.

Additional C/C++ common testcases are provided in a subsequent patch in the
series.

gcc/cp/ChangeLog:

* decl.cc (omp_declare_variant_finalize_one): Set adjust_args
need_device_ptr attribute.
* parser.cc (cp_parser_direct_declarator): Update call to
cp_parser_late_return_type_opt.
(cp_parser_late_return_type_opt): Add parameter. Update call to
cp_parser_late_parsing_omp_declare_simd.
(cp_parser_omp_clause_name): Handle nocontext and novariants clauses.
(cp_parser_omp_clause_novariants): New function.
(cp_parser_omp_clause_nocontext): Likewise.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_NOVARIANTS and
PRAGMA_OMP_CLAUSE_NOCONTEXT.
(cp_parser_omp_dispatch_body): New function, inspired from
cp_parser_assignment_expression and cp_parser_postfix_expression.
(OMP_DISPATCH_CLAUSE_MASK): Define.
(cp_parser_omp_dispatch): New function.
(cp_finish_omp_declare_variant): Add parameter. Handle adjust_args
clause.
(cp_parser_late_parsing_omp_declare_simd): Add parameter. Update calls
to cp_finish_omp_declare_variant and cp_finish_omp_declare_variant.
(cp_parser_omp_construct): Handle PRAGMA_OMP_DISPATCH.
(cp_parser_pragma): Likewise.
* pt.cc (tsubst_attribute): Skip pseudo-TSS need_device_ptr.
* semantics.cc (finish_omp_clauses): Handle OMP_CLAUSE_NOCONTEXT and
OMP_CLAUSE_NOVARIANTS.

gcc/testsuite/ChangeLog:

* g++.dg/gomp/adjust-args-1.C: New test.
* g++.dg/gomp/adjust-args-2.C: New test.
* g++.dg/gomp/dispatch-1.C: New test.
* g++.dg/gomp/dispatch-2.C: New test.
---
 gcc/cp/decl.cc|  33 ++
 gcc/cp/parser.cc  | 612 --
 gcc/cp/pt.cc  |   3 +
 gcc/cp/semantics.cc   |  20 +
 gcc/testsuite/g++.dg/gomp/adjust-args-1.C |  39 ++
 gcc/testsuite/g++.dg/gomp/adjust-args-2.C |  51 ++
 gcc/testsuite/g++.dg/gomp/dispatch-1.C|  53 ++
 gcc/testsuite/g++.dg/gomp/dispatch-2.C|  62 +++
 8 files changed, 828 insertions(+), 45 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/gomp/adjust-args-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/adjust-args-2.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/dispatch-1.C
 create mode 100644 gcc/testsuite/g++.dg/gomp/dispatch-2.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index edf4c155bf7..fae37d508b0 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -8375,6 +8375,39 @@ omp_declare_variant_finalize_one (tree decl, tree attr)
  if (!omp_context_selector_matches (ctx))
return true;
  TREE_PURPOSE (TREE_VALUE (attr)) = variant;
+
+ for (tree a = ctx; a != NULL_TREE; a = TREE_CHAIN (a))
+   {
+ if (OMP_TSS_CODE (a) == OMP_TRAIT_SET_NEED_DEVICE_PTR)
+   {
+ for (tree need_device_ptr_list = TREE_VALUE (a);
+  need_device_ptr_list != NULL_TREE;
+  need_device_ptr_list = TREE_CHAIN (need_device_ptr_list))
+   {
+ tree parm_decl = TREE_VALUE (need_device_ptr_list);
+ bool found_arg = false;
+ for (tree arg = DECL_ARGUMENTS (variant); arg != NULL;
+  arg = TREE_CHAIN (arg))
+   if (DECL_NAME (arg) == DECL_NAME (parm_decl))
+ {
+   DECL_ATTRIBUTES (arg)
+ = tree_cons (get_identifier (
+"omp declare variant adjust_args "
+"need_device_ptr"),
+  NULL_TREE, DECL_ATTRIBUTES (arg));
+   found_arg = true;
+   break;
+ }
+ if (!found_arg)
+   {
+ error_at (varid_loc,
+   "variant %qD does not have a parameter %qD",
+   variant, parm_decl);
+ return true;
+   }
+   }
+   }
+   }
}
 }
   else if (!processing_template_decl)
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 6bf3f52a059..b85c9c387fb 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -19,6 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 .  */
 
 #include "config.h"
+#include "omp-selectors.h"
 #define INCLUDE_MEMORY
 #include "system.h"
 #include "coretypes.h"
@@ -2587,7 +2588,7 @@ static cp_ref_qualifier cp_parser_

[PATCH v2 2/8] OpenMP: dispatch + adjust_args tree data structures and front-end interfaces

2024-07-12 Thread Paul-Antoine Arras
This patch introduces the OMP_DISPATCH tree node, as well as two new clauses
`nocontext` and `novariants`. It defines/exposes interfaces that will be
used in subsequent patches that add front-end and middle-end support, but
nothing generates these nodes yet.

It also adds support for new OpenMP context selectors: `dispatch` as trait
selector and `need_device_ptr` as pseudo-trait set selector. The purpose of the
latter is for the C++ front-end to store the list of arguments (that need to be
converted to device pointers) until the declaration of the variant function
becomes available.

gcc/ChangeLog:

* builtin-types.def (BT_FN_PTR_CONST_PTR_INT): New.
* omp-selectors.h (enum omp_tss_code): Add
OMP_TRAIT_SET_NEED_DEVICE_PTR.
(enum omp_ts_code): Add OMP_TRAIT_CONSTRUCT_DISPATCH.
* tree-core.h (enum omp_clause_code): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
* tree-pretty-print.cc (dump_omp_clause): Handle OMP_CLAUSE_NOVARIANTS
and OMP_CLAUSE_NOCONTEXT.
(dump_generic_node): Handle OMP_DISPATCH.
* tree.cc (omp_clause_num_ops): Add OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_clause_code_name): Add "novariants" and "nocontext".
* tree.def (OMP_DISPATCH): New.
* tree.h (OMP_DISPATCH_BODY): New macro.
(OMP_DISPATCH_CLAUSES): New macro.
(OMP_CLAUSE_NOVARIANTS_EXPR): New macro.
(OMP_CLAUSE_NOCONTEXT_EXPR): New macro.
---
 gcc/builtin-types.def|  1 +
 gcc/omp-selectors.h  |  3 +++
 gcc/tree-core.h  |  7 +++
 gcc/tree-pretty-print.cc | 21 +
 gcc/tree.cc  |  4 
 gcc/tree.def |  5 +
 gcc/tree.h   |  7 +++
 7 files changed, 48 insertions(+)

diff --git a/gcc/builtin-types.def b/gcc/builtin-types.def
index c97d6bad1de..ef7aaf67d13 100644
--- a/gcc/builtin-types.def
+++ b/gcc/builtin-types.def
@@ -677,6 +677,7 @@ DEF_FUNCTION_TYPE_2 (BT_FN_INT_FEXCEPT_T_PTR_INT, BT_INT, 
BT_FEXCEPT_T_PTR,
 DEF_FUNCTION_TYPE_2 (BT_FN_INT_CONST_FEXCEPT_T_PTR_INT, BT_INT,
 BT_CONST_FEXCEPT_T_PTR, BT_INT)
 DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_UINT8, BT_PTR, BT_CONST_PTR, BT_UINT8)
+DEF_FUNCTION_TYPE_2 (BT_FN_PTR_CONST_PTR_INT, BT_PTR, BT_CONST_PTR, BT_INT)
 
 DEF_POINTER_TYPE (BT_PTR_FN_VOID_PTR_PTR, BT_FN_VOID_PTR_PTR)
 
diff --git a/gcc/omp-selectors.h b/gcc/omp-selectors.h
index c61808ec0ad..12bc9e9afa0 100644
--- a/gcc/omp-selectors.h
+++ b/gcc/omp-selectors.h
@@ -31,6 +31,8 @@ enum omp_tss_code {
   OMP_TRAIT_SET_TARGET_DEVICE,
   OMP_TRAIT_SET_IMPLEMENTATION,
   OMP_TRAIT_SET_USER,
+  OMP_TRAIT_SET_NEED_DEVICE_PTR, // pseudo-set selector used to convey argument
+// list until variant has a decl
   OMP_TRAIT_SET_LAST,
   OMP_TRAIT_SET_INVALID = -1
 };
@@ -55,6 +57,7 @@ enum omp_ts_code {
   OMP_TRAIT_CONSTRUCT_PARALLEL,
   OMP_TRAIT_CONSTRUCT_FOR,
   OMP_TRAIT_CONSTRUCT_SIMD,
+  OMP_TRAIT_CONSTRUCT_DISPATCH,
   OMP_TRAIT_LAST,
   OMP_TRAIT_INVALID = -1
 };
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index 27c569c7702..508f5c580d4 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -542,6 +542,13 @@ enum omp_clause_code {
 
   /* OpenACC clause: nohost.  */
   OMP_CLAUSE_NOHOST,
+
+  /* OpenMP clause: novariants (scalar-expression).  */
+  OMP_CLAUSE_NOVARIANTS,
+
+  /* OpenMP clause: nocontext (scalar-expression).  */
+  OMP_CLAUSE_NOCONTEXT,
+
 };
 
 #undef DEFTREESTRUCT
diff --git a/gcc/tree-pretty-print.cc b/gcc/tree-pretty-print.cc
index 4bb946bb0e8..752a402e0d0 100644
--- a/gcc/tree-pretty-print.cc
+++ b/gcc/tree-pretty-print.cc
@@ -506,6 +506,22 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, 
dump_flags_t flags)
 case OMP_CLAUSE_EXCLUSIVE:
   name = "exclusive";
   goto print_remap;
+case OMP_CLAUSE_NOVARIANTS:
+  pp_string (pp, "novariants");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOVARIANTS_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOVARIANTS_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
+case OMP_CLAUSE_NOCONTEXT:
+  pp_string (pp, "nocontext");
+  pp_left_paren (pp);
+  gcc_assert (OMP_CLAUSE_NOCONTEXT_EXPR (clause));
+  dump_generic_node (pp, OMP_CLAUSE_NOCONTEXT_EXPR (clause), spc, flags,
+false);
+  pp_right_paren (pp);
+  break;
 case OMP_CLAUSE__LOOPTEMP_:
   name = "_looptemp_";
   goto print_remap;
@@ -3947,6 +3963,11 @@ dump_generic_node (pretty_printer *pp, tree node, int 
spc, dump_flags_t flags,
   dump_omp_clauses (pp, OMP_SECTIONS_CLAUSES (node), spc, flags);
   goto dump_omp_body;
 
+case OMP_DISPATCH:
+  pp_string (pp, "#pragma omp dispatch");
+  dump_omp_clauses (pp, OMP_DISPATCH_CLAUSES (node), spc, flags);
+  goto dump_omp_body;
+
 case OMP_SECTION:
   pp_string (pp, "#pragma omp section");
   

[PATCH v2 3/8] OpenMP: middle-end support for dispatch + adjust_args

2024-07-12 Thread Paul-Antoine Arras
This patch adds middle-end support for the `dispatch` construct and the
`adjust_args` clause. The heavy lifting is done in `gimplify_omp_dispatch` and
`gimplify_call_expr` respectively. For `adjust_args`, this mostly consists in
emitting a call to `gomp_get_mapped_ptr` for the adequate device.

For dispatch, the following steps are performed:

* Handle the device clause, if any. This may affect `need_device_ptr` arguments.

* Handle novariants and nocontext clauses, if any. Evaluate compile-time
constants and select a variant, if possible. Otherwise, emit code to handle all
possible cases at run time.

* Create an explicit task, as if the `task` construct was used, that wraps the
body of the `dispatch` statement. Move relevant clauses to the task.

gcc/ChangeLog:

* gimple-low.cc (lower_stmt): Handle GIMPLE_OMP_DISPATCH.
* gimple-pretty-print.cc (dump_gimple_omp_dispatch): New function.
(pp_gimple_stmt_1): Handle GIMPLE_OMP_DISPATCH.
* gimple-walk.cc (walk_gimple_stmt): Likewise.
* gimple.cc (gimple_build_omp_dispatch): New function.
(gimple_copy): Handle GIMPLE_OMP_DISPATCH.
* gimple.def (GIMPLE_OMP_DISPATCH): Define.
* gimple.h (gimple_build_omp_dispatch): Declare.
(gimple_has_substatements): Handle GIMPLE_OMP_DISPATCH.
(gimple_omp_dispatch_clauses): New function.
(gimple_omp_dispatch_clauses_ptr): Likewise.
(gimple_omp_dispatch_set_clauses): Likewise.
(gimple_return_set_retval): Handle GIMPLE_OMP_DISPATCH.
* gimplify.cc (enum omp_region_type): Add ORT_DISPATCH.
(gimplify_call_expr): Handle need_device_ptr arguments.
(is_gimple_stmt): Handle OMP_DISPATCH.
(gimplify_scan_omp_clauses): Handle OMP_CLAUSE_DEVICE in a dispatch
construct. Handle OMP_CLAUSE_NOVARIANTS and OMP_CLAUSE_NOCONTEXT.
(gimplify_adjust_omp_clauses): Handle OMP_CLAUSE_NOVARIANTS and
OMP_CLAUSE_NOCONTEXT.
(omp_construct_selector_matches): Handle OMP_DISPATCH with nocontext
clause.
(omp_has_novariants): New function.
(omp_has_nocontext): Likewise.
(gimplify_omp_dispatch): Likewise.
(gimplify_expr): Handle OMP_DISPATCH.
* gimplify.h (omp_has_novariants): Declare.
(omp_has_nocontext): Declare.
* omp-builtins.def (BUILT_IN_OMP_GET_MAPPED_PTR): Define.
(BUILT_IN_OMP_GET_DEFAULT_DEVICE): Define.
(BUILT_IN_OMP_SET_DEFAULT_DEVICE): Define.
* omp-expand.cc (expand_omp_dispatch): New function.
(expand_omp): Handle GIMPLE_OMP_DISPATCH.
(omp_make_gimple_edges): Likewise.
* omp-general.cc (omp_construct_traits_to_codes): Add OMP_DISPATCH.
(struct omp_ts_info): Add dispatch.
(omp_context_selector_matches): Handle OMP_TRAIT_SET_NEED_DEVICE_PTR.
(omp_resolve_declare_variant): Handle novariants. Adjust
DECL_ASSEMBLER_NAME.
---
 gcc/gimple-low.cc  |   1 +
 gcc/gimple-pretty-print.cc |  33 +++
 gcc/gimple-walk.cc |   1 +
 gcc/gimple.cc  |  20 ++
 gcc/gimple.def |   5 +
 gcc/gimple.h   |  33 ++-
 gcc/gimplify.cc| 412 -
 gcc/gimplify.h |   2 +
 gcc/omp-builtins.def   |   6 +
 gcc/omp-expand.cc  |  18 ++
 gcc/omp-general.cc |  16 +-
 gcc/omp-low.cc |  35 
 gcc/tree-inline.cc |   7 +
 13 files changed, 578 insertions(+), 11 deletions(-)

diff --git a/gcc/gimple-low.cc b/gcc/gimple-low.cc
index e0371988705..712a1ebf776 100644
--- a/gcc/gimple-low.cc
+++ b/gcc/gimple-low.cc
@@ -746,6 +746,7 @@ lower_stmt (gimple_stmt_iterator *gsi, struct lower_data 
*data)
 case GIMPLE_EH_MUST_NOT_THROW:
 case GIMPLE_OMP_FOR:
 case GIMPLE_OMP_SCOPE:
+case GIMPLE_OMP_DISPATCH:
 case GIMPLE_OMP_SECTIONS:
 case GIMPLE_OMP_SECTIONS_SWITCH:
 case GIMPLE_OMP_SECTION:
diff --git a/gcc/gimple-pretty-print.cc b/gcc/gimple-pretty-print.cc
index 08b823c84ef..e7b2df9a0ef 100644
--- a/gcc/gimple-pretty-print.cc
+++ b/gcc/gimple-pretty-print.cc
@@ -1726,6 +1726,35 @@ dump_gimple_omp_scope (pretty_printer *pp, const gimple 
*gs,
 }
 }
 
+/* Dump a GIMPLE_OMP_DISPATCH tuple on the pretty_printer BUFFER.  */
+
+static void
+dump_gimple_omp_dispatch (pretty_printer *buffer, const gimple *gs, int spc,
+ dump_flags_t flags)
+{
+  if (flags & TDF_RAW)
+{
+  dump_gimple_fmt (buffer, spc, flags, "%G <%+BODY <%S>%nCLAUSES <", gs,
+  gimple_omp_body (gs));
+  dump_omp_clauses (buffer, gimple_omp_dispatch_clauses (gs), spc, flags);
+  dump_gimple_fmt (buffer, spc, flags, " >");
+}
+  else
+{
+  pp_string (buffer, "#pragma omp dispatch");
+  dump_omp_clauses (buffer, gimple_omp_dispatch_clauses (gs), spc, flags);
+  if (!gimple_seq_empty_p (gimple_omp_body (gs)))
+   {
+ newline_and_indent (buffer, spc + 2);
+ pp_lef

[PATCH v2 0/8] OpenMP: dispatch + adjust_args support

2024-07-12 Thread Paul-Antoine Arras
This is a respin of my patchset implementing both the `dispatch` construct and 
the `adjust_args` clause to the `declare variant` directive. The original
submission can be found there: 
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/652819.html.

Beside being rebased, this new iteration has the following changes:
 * Add a commit to fix an incorrect warning with `gfc_error`;
 * Add a testcase (xfailed due to PR/115271) to ensure that `adjust_args` works
 across TUs in Fortran;
 * Rule out `c_funptr` for `need_device_ptr`;
 * Fix some warnings turned into errors during bootstrapping.
 

Paul-Antoine Arras (8):
  Fix warnings for tree formats in gfc_error
  OpenMP: dispatch + adjust_args tree data structures and front-end
interfaces
  OpenMP: middle-end support for dispatch + adjust_args
  OpenMP: C front-end support for dispatch + adjust_args
  OpenMP: C++ front-end support for dispatch + adjust_args
  OpenMP: common C/C++ testcases for dispatch + adjust_args
  OpenMP: Fortran front-end support for dispatch + adjust_args
  OpenMP: update documentation for dispatch and adjust_args

 gcc/builtin-types.def |   1 +
 gcc/c-family/c-attribs.cc |   2 +
 gcc/c-family/c-format.cc  |   4 +
 gcc/c-family/c-omp.cc |   4 +-
 gcc/c-family/c-pragma.cc  |   1 +
 gcc/c-family/c-pragma.h   |   3 +
 gcc/c/c-parser.cc | 496 --
 gcc/c/c-typeck.cc |   2 +
 gcc/cp/decl.cc|  33 +
 gcc/cp/parser.cc  | 612 --
 gcc/cp/pt.cc  |   3 +
 gcc/cp/semantics.cc   |  20 +
 gcc/fortran/dump-parse-tree.cc|  17 +
 gcc/fortran/frontend-passes.cc|   2 +
 gcc/fortran/gfortran.h|  11 +-
 gcc/fortran/match.h   |   1 +
 gcc/fortran/openmp.cc | 201 +-
 gcc/fortran/parse.cc  |  39 +-
 gcc/fortran/resolve.cc|   2 +
 gcc/fortran/st.cc |   1 +
 gcc/fortran/trans-decl.cc |   9 +-
 gcc/fortran/trans-openmp.cc   | 161 +
 gcc/fortran/trans.cc  |   1 +
 gcc/fortran/types.def |   1 +
 gcc/gimple-low.cc |   1 +
 gcc/gimple-pretty-print.cc|  33 +
 gcc/gimple-walk.cc|   1 +
 gcc/gimple.cc |  20 +
 gcc/gimple.def|   5 +
 gcc/gimple.h  |  33 +-
 gcc/gimplify.cc   | 412 +++-
 gcc/gimplify.h|   2 +
 gcc/omp-builtins.def  |   6 +
 gcc/omp-expand.cc |  18 +
 gcc/omp-general.cc|  16 +-
 gcc/omp-low.cc|  35 +
 gcc/omp-selectors.h   |   3 +
 .../c-c++-common/gomp/adjust-args-1.c |  30 +
 .../c-c++-common/gomp/adjust-args-2.c |  31 +
 .../c-c++-common/gomp/declare-variant-2.c |   4 +-
 gcc/testsuite/c-c++-common/gomp/dispatch-1.c  |  65 ++
 gcc/testsuite/c-c++-common/gomp/dispatch-2.c  |  28 +
 gcc/testsuite/c-c++-common/gomp/dispatch-3.c  |  15 +
 gcc/testsuite/c-c++-common/gomp/dispatch-4.c  |  18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-5.c  |  26 +
 gcc/testsuite/c-c++-common/gomp/dispatch-6.c  |  19 +
 gcc/testsuite/c-c++-common/gomp/dispatch-7.c  |  28 +
 gcc/testsuite/g++.dg/gomp/adjust-args-1.C |  39 ++
 gcc/testsuite/g++.dg/gomp/adjust-args-2.C |  51 ++
 gcc/testsuite/g++.dg/gomp/dispatch-1.C|  53 ++
 gcc/testsuite/g++.dg/gomp/dispatch-2.C|  62 ++
 gcc/testsuite/gcc.dg/gomp/adjust-args-1.c |  32 +
 gcc/testsuite/gcc.dg/gomp/dispatch-1.c|  53 ++
 .../gfortran.dg/gomp/adjust-args-1.f90|  63 ++
 .../gfortran.dg/gomp/adjust-args-2.f90|  18 +
 .../gfortran.dg/gomp/adjust-args-3.f90|  26 +
 .../gfortran.dg/gomp/adjust-args-4.f90|  58 ++
 .../gfortran.dg/gomp/adjust-args-5.f90|  58 ++
 .../gfortran.dg/gomp/declare-variant-2.f90|   6 +-
 .../gomp/declare-variant-21-aux.f90   |  18 +
 .../gfortran.dg/gomp/declare-variant-21.f90   |  28 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-1.f90 |  77 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-2.f90 |  79 +++
 gcc/testsuite/gfortran.dg/gomp/dispatch-3.f90 |  39 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-4.f90 |  19 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-5.f90 |  24 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-6.f90 |  38 ++
 gcc/testsuite/gfortran.dg/gomp/dispatch-7.f90 |  27 +
 gcc/testsuite/gfortran.dg/gomp/dispatch-8.f90 |  39 ++
 gcc/tree-core.h  

[PATCH v2 6/8] OpenMP: common C/C++ testcases for dispatch + adjust_args

2024-07-12 Thread Paul-Antoine Arras
gcc/testsuite/ChangeLog:

* c-c++-common/gomp/declare-variant-2.c: Adjust dg-error directives.
* c-c++-common/gomp/adjust-args-1.c: New test.
* c-c++-common/gomp/adjust-args-2.c: New test.
* c-c++-common/gomp/dispatch-1.c: New test.
* c-c++-common/gomp/dispatch-2.c: New test.
* c-c++-common/gomp/dispatch-3.c: New test.
* c-c++-common/gomp/dispatch-4.c: New test.
* c-c++-common/gomp/dispatch-5.c: New test.
* c-c++-common/gomp/dispatch-6.c: New test.
* c-c++-common/gomp/dispatch-7.c: New test.
---
 .../c-c++-common/gomp/adjust-args-1.c | 30 +
 .../c-c++-common/gomp/adjust-args-2.c | 31 +
 .../c-c++-common/gomp/declare-variant-2.c |  4 +-
 gcc/testsuite/c-c++-common/gomp/dispatch-1.c  | 65 +++
 gcc/testsuite/c-c++-common/gomp/dispatch-2.c  | 28 
 gcc/testsuite/c-c++-common/gomp/dispatch-3.c  | 15 +
 gcc/testsuite/c-c++-common/gomp/dispatch-4.c  | 18 +
 gcc/testsuite/c-c++-common/gomp/dispatch-5.c  | 26 
 gcc/testsuite/c-c++-common/gomp/dispatch-6.c  | 19 ++
 gcc/testsuite/c-c++-common/gomp/dispatch-7.c  | 28 
 10 files changed, 262 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-1.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-2.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-3.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-4.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-5.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-6.c
 create mode 100644 gcc/testsuite/c-c++-common/gomp/dispatch-7.c

diff --git a/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c 
b/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
new file mode 100644
index 000..728abe62092
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/adjust-args-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+int f (int a, void *b, float c[2]);
+
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args 
(nothing: a) adjust_args (need_device_ptr: b, c)
+int f0 (int a, void *b, float c[2]);
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args 
(nothing: a) adjust_args (need_device_ptr: b) adjust_args (need_device_ptr: c)
+int f1 (int a, void *b, float c[2]);
+
+int test () {
+  int a;
+  void *b;
+  float c[2];
+  struct {int a;} s;
+
+  s.a = f0 (a, b, c);
+  #pragma omp dispatch
+  s.a = f0 (a, b, c);
+
+  f1 (a, b, c);
+  #pragma omp dispatch
+  s.a = f1 (a, b, c);
+
+  return s.a;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_omp_get_default_device 
\\(\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(&c, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(b, D\.\[0-9]+\\);" 2 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c 
b/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
new file mode 100644
index 000..e36d93a01d9
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/adjust-args-2.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fdump-tree-gimple" } */
+
+int f (int a, void *b, float c[2]);
+
+#pragma omp declare variant (f) match (construct={dispatch}) adjust_args 
(nothing: a) adjust_args (need_device_ptr: b, c)
+int f0 (int a, void *b, float c[2]);
+#pragma omp declare variant (f) adjust_args (need_device_ptr: b, c) match 
(construct={dispatch}) adjust_args (nothing: a) 
+int f1 (int a, void *b, float c[2]);
+
+void test () {
+  int a;
+  void *b;
+  float c[2];
+
+  #pragma omp dispatch
+  f0 (a, b, c);
+
+  #pragma omp dispatch device (-4852)
+  f0 (a, b, c);
+
+  #pragma omp dispatch device (a + a)
+  f0 (a, b, c);
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_omp_get_default_device 
\\(\\);" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(&c, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(b, D\.\[0-9]+\\);" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(&c, -4852\\);" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "D\.\[0-9]+ = 
__builtin_omp_get_mapped_ptr \\(b, -4852\\);" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "#pragma omp dispatch device\\(-4852\\)" 
1 "gimple" } } */
diff --git a/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c 
b/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c
index 05e485ef6a8..50d9b2dcf4b 100644
--- a/gcc/testsuite/c-c++-common/gomp/declare-variant-2.c
+++ b/gcc/testsuite/c-c++-common/gomp/declare-variant

[PATCH v2 1/8] Fix warnings for tree formats in gfc_error

2024-07-12 Thread Paul-Antoine Arras
This enables proper warnings for formats like %qD.

gcc/c-family/ChangeLog:

* c-format.cc (gcc_gfc_char_table): Add formats for tree objects.
---
 gcc/c-family/c-format.cc | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
index 5bfd2fc4469..f4163c9cbc0 100644
--- a/gcc/c-family/c-format.cc
+++ b/gcc/c-family/c-format.cc
@@ -847,6 +847,10 @@ static const format_char_info gcc_gfc_char_table[] =
   /* This will require a "locus" at runtime.  */
   { "L",   0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN  }, "", "R", NULL },
 
+  /* These will require a "tree" at runtime.  */
+  { "DFTV", 1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN, 
 BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "'",   NULL },
+  { "E",   1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   NULL },
+
   /* These will require nothing.  */
   { "<>",0, STD_C89, NOARGUMENTS, "",  "",   NULL },
   { NULL,  0, STD_C89, NOLENGTHS, NULL, NULL, NULL }
-- 
2.45.2



[PATCH v2] Provide more contexts for -Warray-bounds warning messages

2024-07-12 Thread Qing Zhao
due to code duplication from jump threading [PR109071]
Control this with a new option -fdiagnostic-explain-harder.

Compared to V1, the major difference are: (address David's comments)

1. Change the name of the option from:

-fdiagnostic-try-to-explain-harder 
To
-fdiagnostic-explain-harder 

2. Sync the commit comments with the real output of the compilation message.

3. Add one more event in the end of the path to repeat the out-of-bound
   issue.

4. Fixed the unnecessary changes in Makefile.in.

5. Add new copy_history_diagnostic_path.[cc|h] to implement a new
class copy_history_diagnostic_path : public diagnostic_path

for copy_history_t. 

6. Only building the rich locaiton and populating the path when warning_at
is called.

There are two comments from David that I didn't addressed in this version:

1. Make regenerate-opt-urls.
will do this in a later version. 

2. Add a ⚠️  emoji for the last event. 
I didn't add this yet since I think the current message is clear enough.
might not worth the effort to add this emoji (it's not that straightforward
to add on). 

With this new version, the message emitted by GCC:

$gcc -O2 -Wall -fdiagnostics-explain-harder -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
[-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
  |   ^~~
  ‘sparx5_set’: events 1-2
4 |   if (*index >= 4)
  |  ^
  |  |
  |  (1) when the condition is evaluated to true
..
   12 |   int *val = &sg->vals[index];
  |   ~~~
  |   |
  |   (2) out of array bounds here
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
  |  ^~~~

Bootstrapped and regression tested on both aarch64 and x86. no issues.

Let me know any further comments and suggestions.

thanks.

Qing

==
$ cat t.c
extern void warn(void);
static inline void assign(int val, int *regs, int *index)
{
  if (*index >= 4)
warn();
  *regs = val;
}
struct nums {int vals[4];};

void sparx5_set (int *ptr, struct nums *sg, int index)
{
  int *val = &sg->vals[index];

  assign(0,ptr, &index);
  assign(*val, ptr, &index);
}

$ gcc -Wall -O2  -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
[-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
  |   ^~~
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
  |  ^~~~

In the above, Although the warning is correct in theory, the warning message
itself is confusing to the end-user since there is information that cannot
be connected to the source code directly.

It will be a nice improvement to add more information in the warning message
to report where such index value come from.

In order to achieve this, we add a new data structure copy_history to record
the condition and the transformation that triggered the code duplication.
Whenever there is a code duplication due to some specific transformations,
such as jump threading, loop switching, etc, a copy_history structure is
created and attached to the duplicated gimple statement.

During array out-of-bound checking or other warning checking, the copy_history
that was attached to the gimple statement is used to form a sequence of
diagnostic events that are added to the corresponding rich location to be used
to report the warning message.

This behavior is controled by the new option -fdiagnostic-explain-harder
which is off by default.

With this change, by adding -fdiagnostic-explain-harder,
the warning message for the above testing case is now:

$ gcc -Wall -O2 -fdiagnostics-explain-harder -c -o t.o t.c
t.c: In function ‘sparx5_set’:
t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ 
[-Warray-bounds=]
   12 |   int *val = &sg->vals[index];
  |   ^~~
  ‘sparx5_set’: events 1-2
4 |   if (*index >= 4)
  |  ^
  |  |
  |  (1) when the condition is evaluated to true
..
   12 |   int *val = &sg->vals[index];
  |   ~~~
  |   |
  |   (2) out of array bounds here
t.c:8:18: note: while referencing ‘vals’
8 | struct nums {int vals[4];};
  |  ^~~~

gcc/ChangeLog:

PR tree-optimization/109071

* Makefile.in (OBJS): Add diagnostic-copy-history.o
and copy-history-diagnostic-path.o.
* gcc/common.opt (fdiagnostics-explain-harder): New option.
* gcc/doc/invoke.texi (fdiagnostics-explain-harder): Add
documentation for the new option.
* gimple-array-bounds.cc (build_rich_location_with_diagnostic_path):
New function.
(check_out_of_bounds_and_warn): Add one new parameter. Use rich

RE: Lower zeroing array assignment to memset for allocatable arrays

2024-07-12 Thread Prathamesh Kulkarni


> -Original Message-
> From: Harald Anlauf 
> Sent: Friday, July 12, 2024 1:52 AM
> To: Prathamesh Kulkarni ; gcc-
> patc...@gcc.gnu.org; fort...@gcc.gnu.org
> Subject: Re: Lower zeroing array assignment to memset for allocatable
> arrays
> 
> External email: Use caution opening links or attachments
> 
> 
> Hi Prathamesh!
Hi Harald,
> 
> Am 11.07.24 um 12:16 schrieb Prathamesh Kulkarni:
> >
> >
> >> -Original Message-
> >> From: Harald Anlauf 
> >> Sent: Thursday, July 11, 2024 12:53 AM
> >> To: Prathamesh Kulkarni ; gcc-
> >> patc...@gcc.gnu.org; fort...@gcc.gnu.org
> >> Subject: Re: Lower zeroing array assignment to memset for
> allocatable
> >> arrays
> >>
> >> External email: Use caution opening links or attachments
> >>
> >>
> >> Hi Prathamesh,
> >>
> >> Am 10.07.24 um 13:22 schrieb Prathamesh Kulkarni:
> >>> Hi,
> >>> The attached patch lowers zeroing array assignment to memset for
> >> allocatable arrays.
> >>>
> >>> For example:
> >>> subroutine test(z, n)
> >>>   implicit none
> >>>   integer :: n
> >>>   real(4), allocatable :: z(:,:,:)
> >>>
> >>>   allocate(z(n, 8192, 2048))
> >>>   z = 0
> >>> end subroutine
> >>>
> >>> results in following call to memset instead of 3 nested loops for
> z
> >> = 0:
> >>>   (void) __builtin_memset ((void *) z->data, 0, (unsigned
> long)
> >>> MAX_EXPR dim[0].ubound - z->dim[0].lbound, -1> + 1) *
> >>> (MAX_EXPR dim[1].ubound - z->dim[1].lbound, -1> + 1)) *
> >> (MAX_EXPR
> >>> dim[2].ubound - z->dim[2].lbound, -1> + 1)) * 4));
> >>>
> >>> The patch significantly improves speedup for an internal Fortran
> >> application on AArch64 -mcpu=grace (and potentially on other
> AArch64
> >> cores too).
> >>> Bootstrapped+tested on aarch64-linux-gnu.
> >>> Does the patch look OK to commit ?
> >>
> >> no, it is NOT ok.
> >>
> >> Consider:
> >>
> >> subroutine test0 (n, z)
> >> implicit none
> >> integer :: n
> >> real, pointer :: z(:,:,:) ! need not be contiguous!
> >> z = 0
> >> end subroutine
> >>
> >> After your patch this also generates a memset, but this cannot be
> >> true in general.  One would need to have a test on contiguity of
> the
> >> array before memset can be used.
> >>
> >> In principle this is a nice idea, and IIRC there exists a very old
> PR
> >> on this (by Thomas König?).  So it might be worth pursuing.
> > Hi Harald,
> > Thanks for the suggestions!
> > The attached patch checks gfc_is_simply_contiguous(expr, true,
> false)
> > before lowering to memset, which avoids generating memset for your
> example above.
> 
> This is much better, as it avoids generating false memsets where it
> should not.  However, you now miss cases where the array is a
> component reference, as in:
> 
> subroutine test_dt (dt)
>implicit none
>type t
>   real, allocatable :: x(:,:,:) ! contiguous!
>   real, pointer, contiguous :: y(:,:,:) ! contiguous!
>   real, pointer :: z(:,:,:) ! need not be
> contiguous!
>end type t
>type(t) :: dt
>dt% x = 0  ! memset possible!
>dt% y = 0  ! memset possible!
>dt% z = 0  ! memset NOT possible!
> end subroutine
> 
> You'll need to cycle through the component references and apply the
> check for contiguity to the ultimate component, not the top level.
> 
> Can you have another look?
Thanks for the review!
It seems that component references are not currently handled even for static 
size arrays ?
For eg:
subroutine test_dt (dt, y)
   implicit none
   real :: y (10, 20, 30)
   type t
  real :: x(10, 20, 30)
   end type t
   type(t) :: dt
   y = 0
   dt% x = 0
end subroutine

With trunk, it generates memset for 'y' but not for dt%x.
That happens because copyable_array_p returns false for dt%x,
because expr->ref->next is non NULL:

  /* First check it's an array.  */
  if (expr->rank < 1 || !expr->ref || expr->ref->next)
return false;

and gfc_full_array_ref_p(expr) bails out if expr->ref->type != REF_ARRAY.
Looking thru git history, it seems both the checks were added in 18eaa2c0cd20 
to fix PR33370.
(Even after removing these checks, the previous patch bails out from 
gfc_trans_zero_assign because
GFC_DESCRIPTOR_TYPE_P (type) returns false for component ref and ends up 
returning NULL_TREE)
I am working on extending the patch to handle component refs for statically 
sized as well as allocatable arrays.

Since it looks like a bigger change and an extension to current functionality, 
will it be OK to commit the previous patch as-is (if it looks correct)
and address component refs in follow up one ?

Thanks,
Prathamesh  
 
> 
> Thanks,
> Harald
> 
> > Bootstrapped+tested on aarch64-linux-gnu.
> > Does the attached patch look OK ?
> >
> > Signed-off-by: Prathamesh Kulkarni 
> >
> > Thanks,
> > Prathamesh
> >>
> >> Thanks,
> >> Harald
> >>
> >>
> >>> Signed-off-by: Prathamesh Kulkarni 
> >>>
> >>> Thanks,
> >>> Prathamesh
> >



Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:
>
> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> > Padding is only an issue for very small vectors - the obvious choice is
> > to disallow vector types that would require any padding.  I can hardly
> > see where those are faster than using a vector of up to 4 char elements.
> > Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit 
> > elements
> > with 2 or one element vectors and 4-bit elements with 1 element vectors.
>
> I'd really like to avoid having to support something like
> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
> _BitInt(2) to say size of long long could be acceptable.

I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way to say
the element should have n (< 8) bits.

> > I have no idea what the stance of supporting _BitInt in C++ are,
> > but most certainly diverging support (or even semantics) of the
> > vector extension in C vs. C++ is undesirable.
>
> I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't
> look favorably to _BitInt support in C++, so at least until something like
> that is standardized in C++ the answer is probably no.

OK, I think that rules out _BitInt use here so while bool is then natural
for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision but like
vector_mask it's use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.

That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.

Richard.

> Jakub
>


[committed] c++/modules: Add testcase for fixed issue with usings [PR115798]

2024-07-12 Thread Nathaniel Shead
Tested on x86_64-pc-linux-gnu, pushing to trunk.

-- >8 --

This issue was fixed by r15-2003-gd6bf4b1c932211, but seems worth adding
to the testsuite.

PR c++/115798

gcc/testsuite/ChangeLog:

* g++.dg/modules/using-26_a.C: New test.
* g++.dg/modules/using-26_b.C: New test.
* g++.dg/modules/using-26_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/testsuite/g++.dg/modules/using-26_a.C | 15 +++
 gcc/testsuite/g++.dg/modules/using-26_b.C | 10 ++
 gcc/testsuite/g++.dg/modules/using-26_c.C |  9 +
 3 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/modules/using-26_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-26_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/using-26_c.C

diff --git a/gcc/testsuite/g++.dg/modules/using-26_a.C 
b/gcc/testsuite/g++.dg/modules/using-26_a.C
new file mode 100644
index 000..adab83d4243
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-26_a.C
@@ -0,0 +1,15 @@
+// PR c++/115798
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi base }
+
+module;
+#include 
+export module base;
+
+export {
+  using ::int8_t;
+}
+
+export namespace std {
+  using std::int8_t;
+}
diff --git a/gcc/testsuite/g++.dg/modules/using-26_b.C 
b/gcc/testsuite/g++.dg/modules/using-26_b.C
new file mode 100644
index 000..06ed599df41
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-26_b.C
@@ -0,0 +1,10 @@
+// PR c++/115798
+// { dg-additional-options "-fmodules-ts" }
+// { dg-module-cmi xstd }
+
+export module xstd;
+import base;
+
+export namespace std {
+  using std::int8_t;
+}
diff --git a/gcc/testsuite/g++.dg/modules/using-26_c.C 
b/gcc/testsuite/g++.dg/modules/using-26_c.C
new file mode 100644
index 000..68f45cacb82
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/using-26_c.C
@@ -0,0 +1,9 @@
+// PR c++/115798
+// { dg-additional-options "-fmodules-ts" }
+
+import xstd;
+import base;
+
+int main() {
+  static_assert(__is_same(int8_t, std::int8_t));
+}
-- 
2.43.2



Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Jakub Jelinek
On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> Padding is only an issue for very small vectors - the obvious choice is
> to disallow vector types that would require any padding.  I can hardly
> see where those are faster than using a vector of up to 4 char elements.
> Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit 
> elements
> with 2 or one element vectors and 4-bit elements with 1 element vectors.

I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
_BitInt(2) to say size of long long could be acceptable.

> I have no idea what the stance of supporting _BitInt in C++ are,
> but most certainly diverging support (or even semantics) of the
> vector extension in C vs. C++ is undesirable.

I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't
look favorably to _BitInt support in C++, so at least until something like
that is standardized in C++ the answer is probably no.

Jakub



Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 12:44 PM Tejas Belagod  wrote:
>
> On 7/12/24 11:46 AM, Richard Biener wrote:
> > On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod  wrote:
> >>
> >> On 7/10/24 4:37 PM, Richard Biener wrote:
> >>> On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
> >>>  wrote:
> 
>  Tejas Belagod  writes:
> > On 7/10/24 2:38 PM, Richard Biener wrote:
> >> On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  
> >> wrote:
> >>>
> >>> On 7/9/24 4:22 PM, Richard Biener wrote:
>  On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod 
>   wrote:
> >
> > On 7/8/24 4:45 PM, Richard Biener wrote:
> >> On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod 
> >>  wrote:
> >>>
> >>> Hi,
> >>>
> >>> Sorry to have dropped the ball on
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, 
> >>> but
> >>> here I've tried to pick it up again and write up a strawman 
> >>> proposal for
> >>> elevating __attribute__((vector_mask)) to the FE from GIMPLE.
> >>>
> >>>
> >>> Thanks,
> >>> Tejas.
> >>>
> >>> Motivation
> >>> --
> >>>
> >>> The idea of packed boolean vectors came about when we wanted to 
> >>> support
> >>> C/C++ operators on SVE ACLE types. The current vector boolean 
> >>> type that
> >>> ACLE specifies does not adequately disambiguate vector lane sizes 
> >>> which
> >>> they were derived off of. Consider this simple, albeit 
> >>> unrealistic, example:
> >>>
> >>> bool foo (svint32_t a, svint32_t b)
> >>> {
> >>>   svbool_t p = a > b;
> >>>
> >>>   // Here p[2] is not the same as a[2] > b[2].
> >>>   return p[2];
> >>> }
> >>>
> >>> In the above example, because svbool_t has a fixed 
> >>> 1-lane-per-byte, p[i]
> >>> does not return the bool value corresponding to a[i] > b[i]. This
> >>> necessitates a 'typed' vector boolean value that unambiguously
> >>> represents results of operations
> >>> of the same type.
> >>>
> >>> __attribute__((vector_mask))
> >>> -
> >>>
> >>> Note: If interested in historical discussions refer to:
> >>> https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html
> >>>
> >>> We define this new attribute which when applied to a base data 
> >>> vector
> >>> produces a new boolean vector type that represents a boolean type 
> >>> that
> >>> is produced as a result of operations on the corresponding base 
> >>> vector
> >>> type. The following is the syntax.
> >>>
> >>> typedef int v8si __attribute__((vector_size (8 * sizeof 
> >>> (int)));
> >>> typedef v8si v8sib __attribute__((vector_mask));
> >>>
> >>> Here the 'base' data vector type is v8si or a vector of 8 
> >>> integers.
> >>>
> >>> Rules
> >>>
> >>> • The layout/size of the boolean vector type is 
> >>> implementation-defined
> >>> for its base data vector type.
> >>>
> >>> • Two boolean vector types who's base data vector types have same 
> >>> number
> >>> of elements and lane-width have the same layout and size.
> >>>
> >>> • Consequently, two boolean vectors who's base data vector types 
> >>> have
> >>> different number of elements or different lane-size have 
> >>> different layouts.
> >>>
> >>> This aligns with gnu vector extensions that generate integer 
> >>> vectors as
> >>> a result of comparisons - "The result of the comparison is a 
> >>> vector of
> >>> the same width and number of elements as the comparison operands 
> >>> with a
> >>> signed integral element type." according to
> >>>  
> >>> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.
> >>
> >> Without having the time to re-review this all in detail I think 
> >> the GNU
> >> vector extension does not expose the result of the comparison as 
> >> the
> >> machine would produce it but instead a comparison "decays" to
> >> a conditional:
> >>
> >> typedef int v4si __attribute__((vector_size(16)));
> >>
> >> v4si a;
> >> v4si b;
> >>
> >> void foo()
> >> {
> >>auto r = a < b;
> >> }
> >>
> >> produces, with C23:
> >>
> >>vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 
> >> } , { 0,
> 

Re: [Patch, tree-optimization, predcom] Improve unroll factor for predictive commoning

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 12:09 PM Ajit Agarwal  wrote:
>
> Hello Richard:
>
> On 11/07/24 2:21 pm, Richard Biener wrote:
> > On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal  
> > wrote:
> >>
> >> Hello All:
> >>
> >> Unroll factor is determined with max distance across loop iterations.
> >> The logic for determining the loop unroll factor is based on
> >> data dependency across loop iterations.
> >>
> >> The max distance across loop iterations is the unrolling factor
> >> that helps in predictive commoning.
> >
> > The old comment in the code says
> >
> >> -  /* The best unroll factor for this chain is equal to the number of
> >> -temporary variables that we create for it.  */
> >
> > why is that wrong and why is the max dependence distance more correct?
> >
> > Do you have a testcase that shows how this makes a (positive) difference?
> >
>
> There is nothing wrong in the existing implementation of unroll
> factor for predictive commoning.
>
> But with max dependence distance we get performance improvement
> with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without
> changes. Improvement in benchmarks with max dependence distance
> changes.
>
> I have used the following flags:
> -O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning 
> -fno-tree-pre
>
> With above flags I ran with and without changes.

A 0.01% geomean improvement is noise.  Why did you disable PRE?

> There is no degradation with spec 2017 (FP benchmarks).
>
> Because in predictive commoning we reuse values computed in
> earlier iterations of a loop in the later ones, max distance is the
> better choice.

The re-use distance is the same though.  So your change merely increases
the unroll factor?  Or can you explain why there is more re-use with
your change.

Richard.

> > Richard.
> >
>
> Thanks & Regards
> Ajit
>
> >> Bootstrapped and regtested on powerpc64-linux-gnu.
> >>
> >> Thanks & Regards
> >> Ajit
> >>
> >> tree-optimization, predcom: Improve unroll factor for predictive commoning
> >>
> >> Unroll factor is determined with max distance across loop iterations.
> >> The logic for determining the loop unroll factor is based on
> >> data dependency across loop iterations.
> >>
> >> The max distance across loop iterations is the unrolling factor
> >> that helps in predictive commoning.
> >>
> >> 2024-07-11  Ajit Kumar Agarwal  
> >>
> >> gcc/ChangeLog:
> >>
> >> * tree-predcom.cc: Change in determining unroll factor with
> >> data dependence across loop iterations.
> >> ---
> >>  gcc/tree-predcom.cc | 51 ++---
> >>  1 file changed, 39 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
> >> index 9844fee1e97..029b02f5990 100644
> >> --- a/gcc/tree-predcom.cc
> >> +++ b/gcc/tree-predcom.cc
> >> @@ -409,6 +409,7 @@ public:
> >>/* Perform the predictive commoning optimization for chains, make this
> >>   public for being called in callback execute_pred_commoning_cbck.  */
> >>void execute_pred_commoning (bitmap tmp_vars);
> >> +  unsigned determine_unroll_factor (const vec &chains);
> >>
> >>  private:
> >>/* The pointer to the given loop.  */
> >> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p 
> >> chain,
> >> copies as possible.  CHAINS is the list of chains that will be
> >> optimized.  */
> >>
> >> -static unsigned
> >> -determine_unroll_factor (const vec &chains)
> >> +unsigned
> >> +pcom_worker::determine_unroll_factor (const vec &chains)
> >>  {
> >>chain_p chain;
> >> -  unsigned factor = 1, af, nfactor, i;
> >> +  unsigned factor = 1, i;
> >>unsigned max = param_max_unroll_times;
> >> +  struct data_dependence_relation *ddr;
> >> +  unsigned nfactor = 0;
> >> +  int nzfactor = 0;
> >> +
> >> +  /* Best unroll factor is the maximum distance across loop
> >> + iterations.  */
> >> +  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
> >> +{
> >> +  for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++)
> >> +   {
> >> + lambda_vector vec = DDR_DIST_VECT (ddr, j);
> >> + widest_int distance = vec[j];
> >> + unsigned offset = distance.to_uhwi ();
> >> + if (offset == 0)
> >> +   continue;
> >> +
> >> + int dist = offset - nzfactor;
> >> + if (dist  == 0)
> >> +   continue;
> >>
> >> + if (nfactor == 0)
> >> +   {
> >> + nfactor = offset;
> >> + nzfactor = offset;
> >> +   }
> >> + else if (dist <= nzfactor)
> >> +   nfactor = offset;
> >> +
> >> + if (nfactor > 0 && nfactor <= max)
> >> +   factor = nfactor;
> >> +   }
> >> +}
> >> +
> >> +  int max_use = 0;
> >>FOR_EACH_VEC_ELT (chains, i, chain)
> >>  {
> >>if (chain->type == CT_INVARIANT)
> >> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec 
> >> &chains)
> >>   continue;
> >> }
> >>
> >> -  

[PATCH] RISC-V: Implement locality for __builtin_prefetch

2024-07-12 Thread Monk Chiang
The patch add the Zihintntl instructions in the prefetch pattern.
Zicbop has prefetch instructions. Zihintntl has NTL instructions.
Insert NTL instructions before prefetch instruction, if target
has Zihintntl extension.

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_print_operand): Add 'L' letter
  to print zihintntl instructions string.
* config/riscv/riscv.md (prefetch): Add zihintntl instructions.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/prefetch-zicbop.c: New test.
* gcc.target/riscv/prefetch-zihintntl.c: New test.
---
 gcc/config/riscv/riscv.cc | 22 +++
 gcc/config/riscv/riscv.md | 10 ++---
 .../gcc.target/riscv/prefetch-zicbop.c| 20 +
 .../gcc.target/riscv/prefetch-zihintntl.c | 20 +
 4 files changed, 69 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d50ac611e1a..34c672cb8e2 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -6485,6 +6485,7 @@ riscv_asm_output_opcode (FILE *asm_out_file, const char 
*p)
'A' Print the atomic operation suffix for memory model OP.
'I' Print the LR suffix for memory model OP.
'J' Print the SC suffix for memory model OP.
+   'L' Print a non-temporal locality hints instruction.
'z' Print x0 if OP is zero, otherwise print OP normally.
'i' Print i if the operand is not a register.
'S' Print shift-index of single-bit mask OP.
@@ -6679,6 +6680,27 @@ riscv_print_operand (FILE *file, rtx op, int letter)
   break;
 }
 
+case 'L':
+  {
+   const char *ntl_hint = NULL;
+   switch (INTVAL (op))
+ {
+ case 0:
+   ntl_hint = "ntl.all";
+   break;
+ case 1:
+   ntl_hint = "ntl.pall";
+   break;
+ case 2:
+   ntl_hint = "ntl.p1";
+   break;
+ }
+
+  if (ntl_hint)
+   asm_fprintf (file, "%s\n\t", ntl_hint);
+  break;
+  }
+
 case 'i':
   if (code != REG)
 fputs ("i", file);
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 2e2379dfca4..4a3bac26ecd 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -4093,12 +4093,16 @@
 {
   switch (INTVAL (operands[1]))
   {
-case 0: return "prefetch.r\t%a0";
-case 1: return "prefetch.w\t%a0";
+case 0: return TARGET_ZIHINTNTL ? "%L2prefetch.r\t%a0" : "prefetch.r\t%a0";
+case 1: return TARGET_ZIHINTNTL ? "%L2prefetch.w\t%a0" : "prefetch.w\t%a0";
 default: gcc_unreachable ();
   }
 }
-  [(set_attr "type" "store")])
+  [(set_attr "type" "store")
+   (set (attr "length") (if_then_else (and (match_test "TARGET_ZIHINTNTL")
+  (match_test "IN_RANGE (INTVAL 
(operands[2]), 0, 2)"))
+ (const_string "8")
+ (const_string "4")))])
 
 (define_insn "riscv_prefetchi_"
   [(unspec_volatile:X [(match_operand:X 0 "address_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c 
b/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
new file mode 100644
index 000..0faa120f1f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/prefetch-zicbop.c
@@ -0,0 +1,20 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+/* { dg-final { scan-assembler-not "ntl.all\t" } } */
+/* { dg-final { scan-assembler-not "ntl.pall\t" } } */
+/* { dg-final { scan-assembler-not "ntl.p1\t" } } */
+/* { dg-final { scan-assembler-times "prefetch.r" 4 } } */
+/* { dg-final { scan-assembler-times "prefetch.w" 4 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c 
b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
new file mode 100644
index 000..78a3afe6833
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/prefetch-zihintntl.c
@@ -0,0 +1,20 @@
+/* { dg-do compile target { { rv64-*-*}}} */
+/* { dg-options "-march=rv64gc_zicbop_zihintntl -mabi=lp64" } */
+
+void foo (char *p)
+{
+  __builtin_prefetch (p, 0, 0);
+  __builtin_prefetch (p, 0, 1);
+  __builtin_prefetch (p, 0, 2);
+  __builtin_prefetch (p, 0, 3);
+  __builtin_prefetch (p, 1, 0);
+  __builtin_prefetch (p, 1, 1);
+  __builtin_prefetch (p, 1, 2);
+  __builtin_prefetch (p, 1, 3);
+}
+
+/* { dg-final { scan-assembler-times "ntl.all" 2 } } */
+/* { dg-final { scan-assembler-times "ntl.pall" 2 } } */
+/* { dg-final { scan-assembler-times "ntl.p1" 2

[patch,avr] Overhaul add and sub insns that extend one operand

2024-07-12 Thread Georg-Johann Lay

These are insns of the forms

  (set (regA:M)
   (plus:M (extend:M (regB:L))
   (regA:M)))
and

  (set (regA:M)
   (minus:M (regA:M)
(extend:M (regB:L

where "extend" may be a sign-extend or zero-extend,
and the integer modes satisfy SImode >= M > L >= QImode.

Currently, these are represented as single insns (and splits).
This patch rewrites them in terms of mode iterators M and L
with SImode >= M > L >= QImode.

This gives 3 new insns (and splits) that cover 6 + 6 + 5 = 17 cases,
where previously there was support for just 8 cases.

The patch handles only the cases that have a plus or a sign-extend.
The minus & zero-extend case was already handled in 077f16b2.

Ok for trunk?

Johann

--

AVR: Overhaul add and sub insns that extend one operand.

These are insns of the forms

  (set (regA:M)
   (plus:M (extend:M (regB:L))
   (regA:M)))
or

  (set (regA:M)
   (minus:M (regA:M)
(extend:M (regB:L

where "extend" may be a sign-extend or zero-extend, and the
integer modes are  SImode >= M > L >= QImode.

The existing patterns are now represented in terms of insns
with mode iterators, and these new insn support all valid
combinations of M and L (which previously was not the case).

gcc/
* config/avr/avr.cc (avr_out_minus): Assimilate into...
(avr_out_plus_ext): ...this new function.
(avr_adjust_insn_length) [ADJUST_LEN_PLUS_EXT]: Handle case.
(avr_rtx_costs_1) [PLUS, MINUS]: Adjust RTX costs.
* config/avr/avr.md (adjust_len) : Add new attribute value.
(PSISI): New mode iterator.
(*addpsi3_zero_extend.hi_split): Assimilate...
(*addpsi3_zero_extend.qi_split): Assimilate...
(*addsi3_zero_extend_split): Assimilate...
(*addsi3_zero_extend.hi_split): Assimilate...
(*add3.zero_extend._split): ...into this
new insn-and-split.
(*addpsi3_zero_extend.hi): Assimilate...
(*addpsi3_zero_extend.qi): Assimilate...
(*addsi3_zero_extend): Assimilate...
(*addsi3_zero_extend.hi): Assimilate...
(*add3.zero_extend.): ...into this new insn.
(*addpsi3_sign_extend.hi_split): Assimilate...
(*addhi3.sign_extend1_split): Assimilate...
(*add3.sign_extend._split): ...into this new
insn-and-split.
(*addpsi3_sign_extend.hi): Assimilate...
(*addhi3.sign_extend1): Assimilate...
(*add3.sign_extend.): ...into this new insn.
(*subpsi3_sign_extend.hi_split): Assimilate...
(*subhi3.sign_extend2_split): Assimilate...
(*sub3.sign_extend._split): ...into this new
insn-and-split.
(*subpsi3_sign_extend.hi): Assimilate...
(*subhi3.sign_extend2): Assimilate...
(*sub3.sign_extend.): ...into this new insn.
(*sub3.zero_extend.): Use avr_out_plus_ext
for asm out.
* config/avr/avr-protos.h (avr_out_minus): Remove.
(avr_out_plus_ext): New proto.
gcc/testsuite/
* gcc.target/avr/torture/add-extend.c: New test.
* gcc.target/avr/torture/sub-extend.c: New test.diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 6e02161759c..568c33a7bbd 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -95,7 +95,7 @@ extern void avr_output_addr_vec (rtx_insn*, rtx);
 extern const char *avr_out_sbxx_branch (rtx_insn *insn, rtx operands[]);
 extern const char* avr_out_bitop (rtx, rtx*, int*);
 extern const char* avr_out_plus (rtx, rtx*, int* =NULL, bool =true);
-extern const char* avr_out_minus (rtx*);
+extern const char* avr_out_plus_ext (rtx_insn*, rtx*, int*);
 extern const char* avr_out_round (rtx_insn *, rtx*, int* =NULL);
 extern const char* avr_out_addto_sp (rtx*, int*);
 extern const char* avr_out_xload (rtx_insn *, rtx*, int*);
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 4a7cbd0e7bc..3478c9d461a 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -8843,30 +8843,90 @@ lshrsi3_out (rtx_insn *insn, rtx operands[], int *len)
 }
 
 
-/* Output subtraction of integer registers XOP[0] and XOP[2] and return ""
+/* Output addition of registers YOP[0] and YOP[1]
 
-  XOP[0] = XOP[0] - XOP[2]
+  YOP[0] += extend (YOP[1])
 
-   where the mode of XOP[0] is in { HI, PSI, SI }, and the mode of
-   XOP[2] is in { QI, HI, PSI }.  When the mode of XOP[0] is larger
-   than the mode of XOP[2], then the latter is zero-extended on the fly.
-   The number of instructions will be the mode size of XOP[0].  */
+   or subtraction of registers YOP[0] and YOP[2]
+
+  YOP[0] -= extend (YOP[2])
+
+   where the integer modes satisfy  SI >= YOP[0].mode > YOP[1/2].mode >= QI,
+   and the extension may be sign- or zero-extend.  Returns "".
+
+   If PLEN == NULL output the instructions.
+   If PLEN != NULL set *PLEN to the length of the sequence in words.  */
 
 const char *
-avr_out_minus (rtx *xop)
+avr_out_plus_ext (rtx_insn *insn, rtx *yop, 

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-12 Thread Tejas Belagod

On 7/12/24 11:46 AM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 6:17 AM Tejas Belagod  wrote:


On 7/10/24 4:37 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 12:44 PM Richard Sandiford
 wrote:


Tejas Belagod  writes:

On 7/10/24 2:38 PM, Richard Biener wrote:

On Wed, Jul 10, 2024 at 10:49 AM Tejas Belagod  wrote:


On 7/9/24 4:22 PM, Richard Biener wrote:

On Tue, Jul 9, 2024 at 11:45 AM Tejas Belagod  wrote:


On 7/8/24 4:45 PM, Richard Biener wrote:

On Mon, Jul 8, 2024 at 11:27 AM Tejas Belagod  wrote:


Hi,

Sorry to have dropped the ball on
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html, but
here I've tried to pick it up again and write up a strawman proposal for
elevating __attribute__((vector_mask)) to the FE from GIMPLE.


Thanks,
Tejas.

Motivation
--

The idea of packed boolean vectors came about when we wanted to support
C/C++ operators on SVE ACLE types. The current vector boolean type that
ACLE specifies does not adequately disambiguate vector lane sizes which
they were derived off of. Consider this simple, albeit unrealistic, example:

bool foo (svint32_t a, svint32_t b)
{
  svbool_t p = a > b;

  // Here p[2] is not the same as a[2] > b[2].
  return p[2];
}

In the above example, because svbool_t has a fixed 1-lane-per-byte, p[i]
does not return the bool value corresponding to a[i] > b[i]. This
necessitates a 'typed' vector boolean value that unambiguously
represents results of operations
of the same type.

__attribute__((vector_mask))
-

Note: If interested in historical discussions refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625535.html

We define this new attribute which when applied to a base data vector
produces a new boolean vector type that represents a boolean type that
is produced as a result of operations on the corresponding base vector
type. The following is the syntax.

typedef int v8si __attribute__((vector_size (8 * sizeof (int)));
typedef v8si v8sib __attribute__((vector_mask));

Here the 'base' data vector type is v8si or a vector of 8 integers.

Rules

• The layout/size of the boolean vector type is implementation-defined
for its base data vector type.

• Two boolean vector types who's base data vector types have same number
of elements and lane-width have the same layout and size.

• Consequently, two boolean vectors who's base data vector types have
different number of elements or different lane-size have different layouts.

This aligns with gnu vector extensions that generate integer vectors as
a result of comparisons - "The result of the comparison is a vector of
the same width and number of elements as the comparison operands with a
signed integral element type." according to
 https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html.


Without having the time to re-review this all in detail I think the GNU
vector extension does not expose the result of the comparison as the
machine would produce it but instead a comparison "decays" to
a conditional:

typedef int v4si __attribute__((vector_size(16)));

v4si a;
v4si b;

void foo()
{
   auto r = a < b;
}

produces, with C23:

   vector(4) int r =  VEC_COND_EXPR < a < b , { -1, -1, -1, -1 } , { 0,
0, 0, 0 } > ;

In fact on x86_64 with AVX and AVX512 you have two different "machine
produced" mask types and the above could either produce a AVX mask with
32bit elements or a AVX512 mask with 1bit elements.

Not exposing "native" mask types requires the compiler optimizing subsequent
uses and makes generic vectors difficult to combine with for example AVX512
intrinsics (where masks are just 'int').  Across an ABI boundary it's also
even more difficult to optimize mask transitions.

But it at least allows portable code and it does not suffer from users trying to
expose machine representations of masks as input to generic vector code
with all the problems of constant folding not only requiring self-consistent
code within the compiler but compatibility with user produced constant masks.

That said, I somewhat question the need to expose the target mask layout
to users for GCCs generic vector extension.



Thanks for your feedback.

IIUC, I can imagine how having a GNU vector extension exposing the
target vector mask layout can pose a challenge - maybe making it a
generic GNU vector extension was too ambitious. I wonder if there's
value in pursuing these alternate paths?

1. Can implementing this extension in a 'generic' way i.e. possibly not
implement it with a target mask, but just a generic int vector, still
maintain the consistency of GNU predicate vectors within the compiler? I
know it may not seem very different from how boolean vectors are
currently implemented (as in your above example), but, having the
__attribute__((vector_mask)) as a 'property' of the object makes it
useful to optimize its uses to target predicates in subsequent stages of
the compiler.

2. Res

Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and sse targets

2024-07-12 Thread Victor Do Nascimento

On 7/12/24 03:23, Jiang, Haochen wrote:

-Original Message-
From: Hongtao Liu 
Sent: Thursday, July 11, 2024 9:45 AM
To: Victor Do Nascimento 
Cc: gcc-patches@gcc.gnu.org; richard.sandif...@arm.com;
richard.earns...@arm.com
Subject: Re: [PATCH 05/10] i386: Fix dot_prod backend patterns for mmx and
sse targets

On Wed, Jul 10, 2024 at 10:10 PM Victor Do Nascimento
 wrote:


Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.

The patch LGTM. BTW you can use existing  instead of
new  and  instead of 


gcc/ChangeLog:

 * config/i386/mmx.md (usdot_prodv8qi): Deleted.
 (usdot_prodv2siv8qi): New.


Hi Victor,

I suppose all the patterns are renamed not deleted and new right?
If that is the case, I suppose the log might be better and easier to understand
if changed to something like:

(old pattern): Renamed to ...
(new pattern): this.

Thx,
Haochen


You're right, it's a straight-forward renaming.  I will amend the 
changelogs as per your suggestion.


Thanks for the tip!,
Victor


 (sdot_prodv8qi): Deleted.
 (sdot_prodv2siv8qi): New.
 (udot_prodv8qi): Deleted.
 (udot_prodv2siv8qi): New.
 (usdot_prodv4hi): Deleted.
 (usdot_prodv2siv4hi): New.
 (udot_prodv4hi): Deleted.
 (udot_prodv2siv4hi): New.
 (sdot_prodv4hi): Deleted.
 (sdot_prodv2siv4hi): New.
 * config/i386/sse.md (fourwayacc): New.
 (twowayacc): New.
 (sdot_prod): Deleted.
 (sdot_prod): New.
 (sdot_prodv4si): Deleted.
 (sdot_prodv2div4si): New.
 (usdot_prod): Deleted.
 (usdot_prod): New.
 (sdot_prod): Deleted.
 (sdot_prod): New.
 (sdot_prodv64qi): Deleted.
 (sdot_prodv16siv64qi): New.
 (udot_prod): Deleted.
 (udot_prod): New.
 (udot_prodv64qi): Deleted.
 (udot_prodv16qiv64qi): New.
 (usdot_prod): Deleted.
 (usdot_prod): New.
 (udot_prod): Deleted.
 (udot_prod): New.
---
  gcc/config/i386/mmx.md | 30 +--
gcc/config/i386/sse.md | 47 +

-

  2 files changed, 43 insertions(+), 34 deletions(-)

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md index
94d3a6e5692..d78739b033d 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -6344,7 +6344,7 @@ (define_expand "usadv8qi"
DONE;
  })

-(define_expand "usdot_prodv8qi"
+(define_expand "usdot_prodv2siv8qi"
[(match_operand:V2SI 0 "register_operand")
 (match_operand:V8QI 1 "register_operand")
 (match_operand:V8QI 2 "register_operand") @@ -6363,7 +6363,7 @@
(define_expand "usdot_prodv8qi"
rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
rtx op0 = gen_reg_rtx (V4SImode);

-  emit_insn (gen_usdot_prodv16qi (op0, op1, op2, op3));
+  emit_insn (gen_usdot_prodv4siv16qi (op0, op1, op2, op3));
emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0,

V4SImode));

   }
 else
@@ -6377,7 +6377,7 @@ (define_expand "usdot_prodv8qi"
emit_move_insn (op3, CONST0_RTX (V4SImode));
emit_insn (gen_zero_extendv8qiv8hi2 (op1, operands[1]));
emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));

/* vec_perm (op0, 2, 3, 0, 1);  */
emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78))); @@
-6388,7 +6388,7 @@ (define_expand "usdot_prodv8qi"
  DONE;
  })

-(define_expand "sdot_prodv8qi"
+(define_expand "sdot_prodv2siv8qi"
[(match_operand:V2SI 0 "register_operand")
 (match_operand:V8QI 1 "register_operand")
 (match_operand:V8QI 2 "register_operand") @@ -6406,7 +6406,7 @@
(define_expand "sdot_prodv8qi"
rtx op3 = lowpart_subreg (V4SImode, operands[3], V2SImode);
rtx op0 = gen_reg_rtx (V4SImode);

-  emit_insn (gen_sdot_prodv16qi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv16qi (op0, op1, op2, op3));
emit_move_insn (operands[0], lowpart_subreg (V2SImode, op0,

V4SImode));

  }
else
@@ -6420,7 +6420,7 @@ (define_expand "sdot_prodv8qi"
emit_move_insn (op3, CONST0_RTX (V4SImode));
emit_insn (gen_extendv8qiv8hi2 (op1, operands[1]));
emit_insn (gen_extendv8qiv8hi2 (op2, operands[2]));
-  emit_insn (gen_sdot_prodv8hi (op0, op1, op2, op3));
+  emit_insn (gen_sdot_prodv4siv8hi (op0, op1, op2, op3));

/* vec_perm (op0, 2, 3, 0, 1);  */
emit_insn (gen_sse2_pshufd (op0_1, op0, GEN_INT (78))); @@
-6432,7 +6432,7 @@ (define_expand "sdot_prodv8qi"

  })

-(define_expand "udot_prodv8qi"
+(define_expand "udot_prodv2siv8qi"
[(match_operand:V2SI 0 "register_operand")
 (match_operand:V8QI 1 "register_operand")
 (match_operand:V8QI 2 "register_operand

Re: [Patch, tree-optimization, predcom] Improve unroll factor for predictive commoning

2024-07-12 Thread Ajit Agarwal
Hello Richard:

On 11/07/24 2:21 pm, Richard Biener wrote:
> On Thu, Jul 11, 2024 at 10:30 AM Ajit Agarwal  wrote:
>>
>> Hello All:
>>
>> Unroll factor is determined with max distance across loop iterations.
>> The logic for determining the loop unroll factor is based on
>> data dependency across loop iterations.
>>
>> The max distance across loop iterations is the unrolling factor
>> that helps in predictive commoning.
> 
> The old comment in the code says
> 
>> -  /* The best unroll factor for this chain is equal to the number of
>> -temporary variables that we create for it.  */
> 
> why is that wrong and why is the max dependence distance more correct?
> 
> Do you have a testcase that shows how this makes a (positive) difference?
>

There is nothing wrong in the existing implementation of unroll
factor for predictive commoning.

But with max dependence distance we get performance improvement
with spec 2017 benchmarks (INT) of 0.01% (Geomean) with and without
changes. Improvement in benchmarks with max dependence distance
changes.

I have used the following flags:
-O2 -funroll-loops --param max-unroll-times=8 -fpredictive-commoning 
-fno-tree-pre

With above flags I ran with and without changes.

There is no degradation with spec 2017 (FP benchmarks).

Because in predictive commoning we reuse values computed in
earlier iterations of a loop in the later ones, max distance is the
better choice.

> Richard.
>

Thanks & Regards
Ajit
 
>> Bootstrapped and regtested on powerpc64-linux-gnu.
>>
>> Thanks & Regards
>> Ajit
>>
>> tree-optimization, predcom: Improve unroll factor for predictive commoning
>>
>> Unroll factor is determined with max distance across loop iterations.
>> The logic for determining the loop unroll factor is based on
>> data dependency across loop iterations.
>>
>> The max distance across loop iterations is the unrolling factor
>> that helps in predictive commoning.
>>
>> 2024-07-11  Ajit Kumar Agarwal  
>>
>> gcc/ChangeLog:
>>
>> * tree-predcom.cc: Change in determining unroll factor with
>> data dependence across loop iterations.
>> ---
>>  gcc/tree-predcom.cc | 51 ++---
>>  1 file changed, 39 insertions(+), 12 deletions(-)
>>
>> diff --git a/gcc/tree-predcom.cc b/gcc/tree-predcom.cc
>> index 9844fee1e97..029b02f5990 100644
>> --- a/gcc/tree-predcom.cc
>> +++ b/gcc/tree-predcom.cc
>> @@ -409,6 +409,7 @@ public:
>>/* Perform the predictive commoning optimization for chains, make this
>>   public for being called in callback execute_pred_commoning_cbck.  */
>>void execute_pred_commoning (bitmap tmp_vars);
>> +  unsigned determine_unroll_factor (const vec &chains);
>>
>>  private:
>>/* The pointer to the given loop.  */
>> @@ -2400,13 +2401,46 @@ pcom_worker::execute_pred_commoning_chain (chain_p 
>> chain,
>> copies as possible.  CHAINS is the list of chains that will be
>> optimized.  */
>>
>> -static unsigned
>> -determine_unroll_factor (const vec &chains)
>> +unsigned
>> +pcom_worker::determine_unroll_factor (const vec &chains)
>>  {
>>chain_p chain;
>> -  unsigned factor = 1, af, nfactor, i;
>> +  unsigned factor = 1, i;
>>unsigned max = param_max_unroll_times;
>> +  struct data_dependence_relation *ddr;
>> +  unsigned nfactor = 0;
>> +  int nzfactor = 0;
>> +
>> +  /* Best unroll factor is the maximum distance across loop
>> + iterations.  */
>> +  FOR_EACH_VEC_ELT (m_dependences, i, ddr)
>> +{
>> +  for (unsigned j = 0; j < DDR_NUM_DIST_VECTS (ddr); j++)
>> +   {
>> + lambda_vector vec = DDR_DIST_VECT (ddr, j);
>> + widest_int distance = vec[j];
>> + unsigned offset = distance.to_uhwi ();
>> + if (offset == 0)
>> +   continue;
>> +
>> + int dist = offset - nzfactor;
>> + if (dist  == 0)
>> +   continue;
>>
>> + if (nfactor == 0)
>> +   {
>> + nfactor = offset;
>> + nzfactor = offset;
>> +   }
>> + else if (dist <= nzfactor)
>> +   nfactor = offset;
>> +
>> + if (nfactor > 0 && nfactor <= max)
>> +   factor = nfactor;
>> +   }
>> +}
>> +
>> +  int max_use = 0;
>>FOR_EACH_VEC_ELT (chains, i, chain)
>>  {
>>if (chain->type == CT_INVARIANT)
>> @@ -2427,17 +2461,10 @@ determine_unroll_factor (const vec &chains)
>>   continue;
>> }
>>
>> -  /* The best unroll factor for this chain is equal to the number of
>> -temporary variables that we create for it.  */
>> -  af = chain->length;
>>if (chain->has_max_use_after)
>> -   af++;
>> -
>> -  nfactor = factor * af / gcd (factor, af);
>> -  if (nfactor <= max)
>> -   factor = nfactor;
>> +   max_use++;
>>  }
>> -
>> +  factor += max_use;
>>return factor;
>>  }
>>
>> --
>> 2.43.5
>>


Re: [PATCH] [libstdc++] [testsuite] avoid arbitrary errno codes

2024-07-12 Thread Jonathan Wakely
On Fri, 12 Jul 2024 at 10:27, Jonathan Wakely  wrote:
>
> On Fri, 12 Jul 2024 at 09:27, Alexandre Oliva  wrote:
> >
> > On Jul 11, 2024, Jonathan Wakely  wrote:
> >
> > > There's no requirement that system_error uses strerror, that's just an
> > > implementation detail.
> >
> > *nod*.  I meant it was more of a libc test in the case that relied on
> > strerror.  But I fully agree that testing the C++ standard API makes
> > perfect sense.  It's just that maybe we want workarounds for cases in
> > which we implement in terms of libc, but libc doesn't live up to the
> > standard.  Tolerating NULL returns from strerror would be an easy one;
> > do we want that?  Checking strerror acceptable ranges (that don't
> > trigger runtime errors) before calling it, and taking an alternate path
> > when needed, that would be harder to do, and IMHO of dubious value.
>
> Yes, handling a null result from strerror seems sensible if that's the 
> reality.

Does this fix it for your target?

--- a/libstdc++-v3/src/c++11/system_error.cc
+++ b/libstdc++-v3/src/c++11/system_error.cc
@@ -110,7 +110,11 @@ namespace
#else
  string strerror_string(int err)
  {
-return strerror(err); // XXX Not thread-safe.
+auto str = strerror(err); // XXX Not thread-safe.
+if (str) [[__likely__]]
+  return str;
+// strerror should not return NULL, but some implementations do.
+return "Unknown error";
  }
#endif



RE: [PATCH][ivopts]: perform affine fold on unsigned addressing modes known not to overflow. [PR114932]

2024-07-12 Thread Richard Biener
On Thu, 11 Jul 2024, Tamar Christina wrote:

>  -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, July 11, 2024 12:39 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; bin.ch...@linux.alibaba.com
> > Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing 
> > modes
> > known not to overflow. [PR114932]
> > 
> > On Wed, 10 Jul 2024, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Thursday, June 20, 2024 8:55 AM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ;
> > bin.ch...@linux.alibaba.com
> > > > Subject: RE: [PATCH][ivopts]: perform affine fold on unsigned addressing
> > modes
> > > > known not to overflow. [PR114932]
> > > >
> > > > On Wed, 19 Jun 2024, Tamar Christina wrote:
> > > >
> > > > > > -Original Message-
> > > > > > From: Richard Biener 
> > > > > > Sent: Wednesday, June 19, 2024 1:14 PM
> > > > > > To: Tamar Christina 
> > > > > > Cc: gcc-patches@gcc.gnu.org; nd ;
> > > > bin.ch...@linux.alibaba.com
> > > > > > Subject: Re: [PATCH][ivopts]: perform affine fold on unsigned 
> > > > > > addressing
> > > > modes
> > > > > > known not to overflow. [PR114932]
> > > > > >
> > > > > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > When the patch for PR114074 was applied we saw a good boost in
> > > > exchange2.
> > > > > > >
> > > > > > > This boost was partially caused by a simplification of the 
> > > > > > > addressing
> > modes.
> > > > > > > With the patch applied IV opts saw the following form for the base
> > > > addressing;
> > > > > > >
> > > > > > >   Base: (integer(kind=4) *) &block + ((sizetype) ((unsigned long) 
> > > > > > > l0_19(D) *
> > > > > > > 324) + 36)
> > > > > > >
> > > > > > > vs what we normally get:
> > > > > > >
> > > > > > >   Base: (integer(kind=4) *) &block + ((sizetype) 
> > > > > > > ((integer(kind=8)) l0_19(D)
> > > > > > > * 81) + 9) * 4
> > > > > > >
> > > > > > > This is because the patch promoted multiplies where one operand 
> > > > > > > is a
> > > > constant
> > > > > > > from a signed multiply to an unsigned one, to attempt to fold 
> > > > > > > away the
> > > > constant.
> > > > > > >
> > > > > > > This patch attempts the same but due to the various problems with 
> > > > > > > SCEV
> > and
> > > > > > > niters not being able to analyze the resulting forms (i.e. 
> > > > > > > PR114322) we
> > can't
> > > > > > > do it during SCEV or in the general form like in fold-const like
> > extract_muldiv
> > > > > > > attempts.
> > > > > > >
> > > > > > > Instead this applies the simplification during IVopts 
> > > > > > > initialization when we
> > > > > > > create the IV.  Essentially when we know the IV won't overflow 
> > > > > > > with
> > regards
> > > > to
> > > > > > > niters then we perform an affine fold which gets it to simplify 
> > > > > > > the internal
> > > > > > > computation, even if this is signed because we know that for 
> > > > > > > IVOPTs uses
> > the
> > > > > > > IV won't ever overflow.  This allows IV opts to see the 
> > > > > > > simplified form
> > > > > > > without influencing the rest of the compiler.
> > > > > > >
> > > > > > > as mentioned in PR114074 it would be good to fix the missed
> > optimization in
> > > > the
> > > > > > > other passes so we can perform this in general.
> > > > > > >
> > > > > > > The reason this has a big impact on fortran code is that fortran 
> > > > > > > doesn't
> > seem
> > > > to
> > > > > > > have unsigned integer types.  As such all it's addressing are 
> > > > > > > created with
> > > > > > > signed types and folding does not happen on them due to the 
> > > > > > > possible
> > > > overflow.
> > > > > > >
> > > > > > > concretely on AArch64 this changes the results from generation:
> > > > > > >
> > > > > > > mov x27, -108
> > > > > > > mov x24, -72
> > > > > > > mov x23, -36
> > > > > > > add x21, x1, x0, lsl 2
> > > > > > > add x19, x20, x22
> > > > > > > .L5:
> > > > > > > add x0, x22, x19
> > > > > > > add x19, x19, 324
> > > > > > > ldr d1, [x0, x27]
> > > > > > > add v1.2s, v1.2s, v15.2s
> > > > > > > str d1, [x20, 216]
> > > > > > > ldr d0, [x0, x24]
> > > > > > > add v0.2s, v0.2s, v15.2s
> > > > > > > str d0, [x20, 252]
> > > > > > > ldr d31, [x0, x23]
> > > > > > > add v31.2s, v31.2s, v15.2s
> > > > > > > str d31, [x20, 288]
> > > > > > > bl  digits_20_
> > > > > > > cmp x21, x19
> > > > > > > bne .L5
> > > > > > >
> > > > > > > into:
> > > > > > >
> > > > > > > .L5:
> > > > > > > ldr d1, [x19, -108]
> > > > > > > add v1.2s, v1.2s, v15.2s
> > > > > > > str d1, [x20, 216]
> > > > > > > ldr d0, [x19, -72]
> > > > > > 

Re: [PATCH] s390: Align *cjump_64 and *icjump_64

2024-07-12 Thread Jakub Jelinek
On Fri, Jul 12, 2024 at 07:09:37AM +0200, Stefan Schulze Frielinghaus wrote:
> On Thu, Jul 11, 2024 at 07:32:17PM +0200, Stefan Schulze Frielinghaus wrote:
> > On Thu, Jul 11, 2024 at 05:14:58PM +0200, Jakub Jelinek wrote:
> > > On Thu, Jul 11, 2024 at 05:09:41PM +0200, Stefan Schulze Frielinghaus 
> > > wrote:
> > > > I didn't have the schedule for 11.5 RC in mind which is tomorrow and the
> > > > release a week afterwards.  I hope this is still appropriate for 11.5?
> > > 
> > > From my side, if Andreas or somebody else approves it, it is tested on 11
> > > branch and committed by tomorrow, it can be added.
> > > But I'd like to know what patches I should wait for tomorrow and 
> > > approximate
> > > ETA (and ideally before end of working day in Europe).  Once rc1 is done, 
> > > only
> > > severe blockers will be possible.
> > 
> > The tester is running over night and will finish around 7 AM CEST.  I
> > will let you know once it has finished.  If anything goes wrong we can
> > skip this patch of course.
> 
> The tester was extremely slow this time and still didn't finish.  I
> don't wanna rush it risking to introduce late time problems for the
> 11.5 release.  Since I'm testing for three different architectures and
> the first one hasn't finished, let's drop this patch for 11.5.

We haven't started the RC yet, so if it finished testing by now, feel free
to commit within the next 15 minutes or so.

Jakub



Re: [PATCH] [libstdc++] [testsuite] avoid arbitrary errno codes

2024-07-12 Thread Jonathan Wakely
On Fri, 12 Jul 2024 at 09:27, Alexandre Oliva  wrote:
>
> On Jul 11, 2024, Jonathan Wakely  wrote:
>
> > There's no requirement that system_error uses strerror, that's just an
> > implementation detail.
>
> *nod*.  I meant it was more of a libc test in the case that relied on
> strerror.  But I fully agree that testing the C++ standard API makes
> perfect sense.  It's just that maybe we want workarounds for cases in
> which we implement in terms of libc, but libc doesn't live up to the
> standard.  Tolerating NULL returns from strerror would be an easy one;
> do we want that?  Checking strerror acceptable ranges (that don't
> trigger runtime errors) before calling it, and taking an alternate path
> when needed, that would be harder to do, and IMHO of dubious value.

Yes, handling a null result from strerror seems sensible if that's the reality.



Re: [PATCH v2] [testsuite] add linkonly to dg-additional-sources [PR115295]

2024-07-12 Thread Alexandre Oliva
On Jun 12, 2024, Alexandre Oliva  wrote:

> for  gcc/ChangeLog

>   PR d/115295
>   * doc/sourcebuild.texi (dg-additional-sources): Add linkonly.

> for  gcc/testsuite/ChangeLog

>   PR d/115295
>   * g++.dg/vect/pr95401.cc: Add linkonly to dg-additional-sources.
>   * g++.dg/vect/pr68762-1.cc: Likewise.
>   * g++.dg/vect/simd-clone-3.cc: Likewise.
>   * g++.dg/vect/simd-clone-5.cc: Likewise.
>   * gcc.dg/vect/vect-simd-clone-10.c: Likewise.  Drop dg-do run.
>   * gcc.dg/vect/vect-simd-clone-12.c: Likewise.  Likewise.
>   * lib/gcc-defs.exp (additional_sources_omit_on_compile): New.
>   (dg-additional-sources): Add to it on linkonly.
>   (dg-additional-files-options): Omit select sources on compile.

Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/654286.html

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH] [libstdc++] [testsuite] require dfprt on some tests

2024-07-12 Thread Alexandre Oliva
On Jul 11, 2024, Jonathan Wakely  wrote:

>> * testsuite/decimal/compound-assignment-memfunc.cc: Likewise.

The following file was missing from the ChangeLog.  Added before pushing:

   * testsuite/decimal/compound-assignment.cc: Likewise.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH] [libstdc++] [testsuite] avoid arbitrary errno codes

2024-07-12 Thread Alexandre Oliva
On Jul 11, 2024, Jonathan Wakely  wrote:

> There's no requirement that system_error uses strerror, that's just an
> implementation detail.

*nod*.  I meant it was more of a libc test in the case that relied on
strerror.  But I fully agree that testing the C++ standard API makes
perfect sense.  It's just that maybe we want workarounds for cases in
which we implement in terms of libc, but libc doesn't live up to the
standard.  Tolerating NULL returns from strerror would be an easy one;
do we want that?  Checking strerror acceptable ranges (that don't
trigger runtime errors) before calling it, and taking an alternate path
when needed, that would be harder to do, and IMHO of dubious value.

-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp reduction with lane-reducing

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 7:20 AM Feng Xue OS  wrote:
>
> Hi, Richard,
>
> Let me explain some idea that has to be chosen for lane-reducing. The key
> complication is that these ops are associated with two kinds of vec_nums,
> one is number of effective vector stmts, which is used by partial vectorzation
> function such as vect_get_loop_mask.  The other is number of total created
> vector stmts. Now we should make it aligned with normal op, in order to
> interoperate with normal op. Suppose expressions mixed with lane-reducing
> and normal as:
>
> temp = lane_reducing<16*char> + expr<4*int>;
> temp = cst<4*int> * lane_reducing<16*char>;
>
> If only generating effective vector stmt for lane_reducing, vector def/use
> between ops will never be matched, so extra pass-through copies are
> necessary. This is why we say "refit a lane-reducing to be a fake normal op".

And this only happens in vect_transform_reduction, right?

The other pre-existing issue is that for single_defuse_cycle optimization
SLP_TREE_NUMBER_OF_VEC_STMTS is also off (too large).  But here
the transform also goes through vect_transform_reduction.

> The requirement of two vec_stmts are independent of how we will implement
> SLP_TREE_NUMBER_OF_VEC_STMTS. Moreover, if we want to refactor vect code
> to unify ncopies/vec_num computation and completely remove
> SLP_TREE_NUMBER_OF_VEC_STMTS, this tends to be a a large task, and might
> be overkill for these lane-reducing patches. So I will keep it as before, and 
> do
> not touch it as what I have done in this patch.
>
> Since one SLP_TREE_NUMBER_OF_VEC_STMTS could not be used for two purposes.
> The your previous suggestion might not be work:
>
> > As said, I don't like this much.  vect_slp_analyze_node_operations_1 sets 
> > this
> > and I think the existing "exception"
> >
> >  /* Calculate the number of vector statements to be created for the
> > scalar stmts in this node.  For SLP reductions it is equal to the
> > number of vector statements in the children (which has already been
> > calculated by the recursive call).  Otherwise it is the number of
> > scalar elements in one scalar iteration (DR_GROUP_SIZE) multiplied by
> > VF divided by the number of elements in a vector.  */
> >  if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> >  && !STMT_VINFO_DATA_REF (stmt_info)
> >  && REDUC_GROUP_FIRST_ELEMENT (stmt_info))
> >{
> >  for (unsigned i = 0; i < SLP_TREE_CHILDREN (node).length (); ++i)
> >if (SLP_TREE_DEF_TYPE (SLP_TREE_CHILDREN (node)[i]) ==
> >  vect_internal_def)
> >  {
> >SLP_TREE_NUMBER_OF_VEC_STMTS (node)
> >  = SLP_TREE_NUMBER_OF_VEC_STMTS (SLP_TREE_CHILDREN (node)[i]);
> >break;
> >  }
> >}
> >
> > could be changed (or amended if replacing doesn't work out) to
> >
> >   if (SLP_TREE_CODE (node) != VEC_PERM_EXPR
> >   && STMT_VINFO_REDUC_IDX (stmt_info)
> >   // do we have this always set?
> >   && STMT_VINFO_REDUC_VECTYPE_IN (stmt_info))
> >{
> >   do the same as in else {} but using VECTYPE_IN
> >}
> >
> > Or maybe scrap the special case and use STMT_VINFO_REDUC_VECTYPE_IN
> > when that's set instead of SLP_TREE_VECTYPE?  As said having wrong
> > SLP_TREE_NUMBER_OF_VEC_STMTS is going to backfire.
>
> Then the alternative is to limit special handling related to the vec_num only
> inside vect_transform_reduction. Is that ok? Or any other suggestion?

I think that's kind-of in line with the suggestion of a reduction
specific VF, so yes,
not using SLP_TREE_NUMBER_OF_VEC_STMTS in vect_transform_reduction
sounds fine to me and would be a step towards not having
SLP_TREE_NUMBER_OF_VEC_STMTS
where the function would be responsible for appropriate allocation as well.

Richard.

> Thanks,
> Feng
> 
> From: Richard Biener 
> Sent: Thursday, July 11, 2024 5:43 PM
> To: Feng Xue OS; Richard Sandiford
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 2/4] vect: Fix inaccurate vector stmts number for slp 
> reduction with lane-reducing
>
> On Thu, Jul 11, 2024 at 10:53 AM Feng Xue OS
>  wrote:
> >
> > Vector stmts number of an operation is calculated based on output vectype.
> > This is over-estimated for lane-reducing operation. Sometimes, to workaround
> > the issue, we have to rely on additional logic to deduce an exactly accurate
> > number by other means. Aiming at the inconvenience, in this patch, we would
> > "turn" lane-reducing operation into a normal one by inserting new trivial
> > statements like zero-valued PHIs and pass-through copies, which could be
> > optimized away by later passes. At the same time, a new field is added for
> > slp node to hold number of vector stmts that are really effective after
> > vectorization. For example:
>
> Adding Richard into the loop.
>
> I'm sorry, but this feels a bit backwards - in the end I was hoping that we
> can get rid of SLP_TREE_NUMBER_OF_VEC_STMTS completely.
> We do cur

Re: Re: [PATCH v2] RISC-V: Disable misaligned vector access in hook riscv_slow_unaligned_access[PR115862]

2024-07-12 Thread Li Xu
Committed, thanks Kito.



xu...@eswincomputing.com
 
From: Kito Cheng
Date: 2024-07-12 15:57
To: Li Xu
CC: gcc-patches; juzhe.zhong; Robin Dapp
Subject: Re: Re: [PATCH v2] RISC-V: Disable misaligned vector access in hook 
riscv_slow_unaligned_access[PR115862]
Oh, okay, my fault, I didn't read the bugzilla, so you can go ahead :P
 
 
On Fri, Jul 12, 2024 at 3:51 PM Li Xu  wrote:
>
> Sorry, I didn't understand.
>
> >>but...this seems to have discovered another bug in the current  trunk?
>
> Isn't PR115862 the same bug as this one?
>
> 
> xu...@eswincomputing.com
>
>
> From: Kito Cheng
> Date: 2024-07-12 14:33
> To: Li Xu
> CC: gcc-patches; juzhe.zhong; rdapp.gcc
> Subject: Re: [PATCH v2] RISC-V: Disable misaligned vector access in hook 
> riscv_slow_unaligned_access[PR115862]
> LGTM, but...this seems to have discovered another bug in the current
> trunk? could you take a look?
>
> Will trigger by -O2 -march=rv64gcv_zvl512b -mabi=lp64d or -O2
> -march=rv64gcv_zvl256b -mabi=lp64d
>
> during RTL pass: combine
> x.c: In function '__libc_mallinfo':
> x.c:47:1: internal compiler error: in smallest_mode_for_size, at
> stor-layout.cc:356
>   47 | }
>  | ^
> 0x2dc82e2 internal_error(char const*, ...)
>
> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/diagnostic-global-context.cc:491
> 0xc8fdfe fancy_abort(char const*, int, char const*)
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/diagnostic.cc:1725
> 0x13bf6b7 smallest_mode_for_size(poly_int<2u, unsigned long>, mode_class)
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/stor-layout.cc:356
> 0x126efea smallest_int_mode_for_size(poly_int<2u, unsigned long>)
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/machmode.h:916
> 0x126efea get_best_extraction_insn
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs-query.cc:208
> 0x269ac14 make_extraction
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:7779
> 0x269beff make_compound_operation_int
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:8186
> 0x269cf3f make_compound_operation(rtx_def*, rtx_code)
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:8471
> 0x26a0c8e simplify_set
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:6975
> 0x26a0c8e combine_simplify_rtx
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:6374
> 0x26a2f2f subst
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:5630
> 0x26a6fc1 try_combine
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:3312
> 0x26ac211 combine_instructions
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:1264
> 0x26ac211 rest_of_handle_combine
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:15127
> 0x26ac211 execute
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:15171
> Please submit a full bug report, with preprocessed source (by using
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
>
> On Fri, Jul 12, 2024 at 8:48 AM Li Xu  wrote:
> >
> > From: xuli 
> >
> > The reason is that in the following code, icode = movmisalignv8si has
> > already been rejected by TARGET_VECTOR_MISALIGN_SUPPORTED, but it is
> > allowed by targetm.slow_unaligned_access,which is contradictory.
> >
> > (((icode = optab_handler (movmisalign_optab, mode))
> >!= CODE_FOR_nothing)
> >   || targetm.slow_unaligned_access (mode, align))
> >
> > misaligned vector access should be enabled by -mno-vector-strict-align 
> > option.
> >
> > PR Target/115862
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.cc (riscv_slow_unaligned_access): Disable 
> > vector misalign.
> >
> > Signed-off-by: Li Xu 
> > ---
> >  gcc/config/riscv/riscv.cc |  5 +-
> >  .../gcc.target/riscv/rvv/base/pr115862.c  | 52 +++
> >  2 files changed, 55 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 61fa74e9322..16b210f323e 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -10269,9 +10269,10 @@ riscv_cannot_copy_insn_p (rtx_insn *insn)
> >  /* Implement TARGET_SLOW_UNALIGNED_ACCESS.  */
> >
> >  static bool
> > -riscv_slow_unaligned_access (machine_mode, unsigned int)
> > +riscv_slow_unaligned_access (machine_mode mode, unsigned int)
> >  {
> > -  return riscv_slow_unaligned_access_p;
> > +  return VECTOR_MODE_P (mode) ? TARGET_VECTOR_MISALIGN_SUPPORTED
> > + : riscv_slow_unaligned_access_p;
> >  }
> >
> >  static bool
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
> > new file mode 100644
> > index 000..3cbc3c3a0ea
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/

Re: [PATCH] [analyzer] [testsuite] avoid unexpected null dereference warning

2024-07-12 Thread Alexandre Oliva
On Jul 11, 2024, David Malcolm  wrote:

> I intended that particular test as an integration test, to try to make
> sure that -fanalyzer is silent on the result of a minimal "flex"
> script.  So if we're actually getting warnings, that's undesirable, and
> I'm not sure the approach in your patch is the right one.

Understood, thanks, patch withdrawn.

> Given that the testcase includes various C headers, there's going to be
> some variability with different configurations.  Are you able to share
> the preprocessed sources with me?

I'm not sure, but I don't think that will be needed.  Just replacing the
includes with standard declarations and definitions used in the program
was enough to trigger the warning.

> Maybe we should eliminate those #includes from that test and convert
> it to something that's more config-independent?

Here's what I got.  I tried some __builtin_-prefixing, particularly on
exit, hoping that it would bring some noreturn into effect, but that
didn't get rid of the warning, so I left it at that.

diff --git a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c 
b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
index c6ecb25d25d59..41ad3359370ee 100644
--- a/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
+++ b/gcc/testsuite/c-c++-common/analyzer/flex-without-call-summaries.c
@@ -14,10 +14,31 @@
 /* First, we deal with  platform-specific or compiler-specific issues. */
 
 /* begin standard C headers. */
-#include 
-#include 
-#include 
-#include 
+/* #include  */
+#define EOF -1
+typedef __SIZE_TYPE__ size_t;
+typedef struct FILE FILE;
+extern FILE *stdin, *stdout, *stderr;
+int getc(FILE *);
+int ferror(FILE *);
+size_t fread (void *, size_t, size_t, FILE *);
+size_t fwrite (void *, size_t, size_t, FILE *);
+void clearerr(FILE *);
+int fileno(FILE *);
+int fprintf(FILE *, const char *, ...);
+/* #include  */
+void *memset(void *, int, size_t);
+size_t strlen(const char *);
+/* #include  */
+extern int *__get_errno (void);
+#define errno (*__get_errno ())
+#define EINTR 42
+/* #include  */
+#define NULL 0
+void exit(int);
+void *malloc(size_t);
+void *realloc(void *, size_t);
+void free(void *);
 
 /* end standard C headers. */
 


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for not behaving ""normal"" is *not* inclusive


[PATCH] testsuite: Avoid running incompatible Arm tests

2024-07-12 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

--

Overriding the -mpfu and -mfloat-abi might be incompatible with selected
multilib. As a result, verify that the current multilib is compatible
with the effective target without changing the -mfpu or -mfloat-abi
options.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_hard_vfp_ok): Check -mfpu value.
(check_effective_target_arm_fp16_alternative_ok_nocache):
Reuse check_effective_target_arm_fp16_ok.
(check_effective_target_arm_fp16_none_ok_nocache): Likewise.
(check_effective_target_arm_v8_neon_ok_nocache): Align checks
with skeleton from check_effective_target_arm_fp16_ok_nocache.
(check_effective_target_arm_neonv2_ok_nocache): Likewise.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/testsuite/lib/target-supports.exp | 119 ++
 1 file changed, 83 insertions(+), 36 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index f001c28072f..f90b80bb495 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4829,6 +4829,7 @@ proc check_effective_target_arm_v8_vfp_ok {} {
 
 proc check_effective_target_arm_hard_vfp_ok { } {
 if { [check_effective_target_arm32]
+&& ! [check-flags [list "" { *-*-* } { "-mfpu=*" } { "-mfpu=vfp" }]]
 && ! [check-flags [list "" { *-*-* } { "-mfloat-abi=*" } { 
"-mfloat-abi=hard" }]] } {
return [check_no_compiler_messages arm_hard_vfp_ok executable {
int main() { return 0;}
@@ -5405,11 +5406,12 @@ proc 
check_effective_target_arm_fp16_alternative_ok_nocache { } {
# Not supported by the target system.
return 0
 }
+global et_arm_fp16_flags
 global et_arm_fp16_alternative_flags
 set et_arm_fp16_alternative_flags ""
-if { [check_effective_target_arm32] } {
-   foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
-  "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
+
+if { [check_effective_target_arm32] && 
[check_effective_target_arm_fp16_ok] } {
+   foreach flags [list "" $et_arm_fp16_flags] {
if { [check_no_compiler_messages_nocache \
  arm_fp16_alternative_ok object {
#if !defined (__ARM_FP16_FORMAT_ALTERNATIVE) || ! (__ARM_FP & 2)
@@ -5434,9 +5436,9 @@ proc check_effective_target_arm_fp16_alternative_ok { } {
 # format.  Some multilibs may be incompatible with the options needed.
 
 proc check_effective_target_arm_fp16_none_ok_nocache { } {
-if { [check_effective_target_arm32] } {
-   foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
-  "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
+global et_arm_fp16_flags
+if { [check_effective_target_arm32] && 
[check_effective_target_arm_fp16_ok] } {
+   foreach flags [list "" $et_arm_fp16_flags] {
if { [check_no_compiler_messages_nocache \
  arm_fp16_none_ok object {
#if defined (__ARM_FP16_FORMAT_ALTERNATIVE)
@@ -5467,23 +5469,46 @@ proc check_effective_target_arm_fp16_none_ok { } {
 proc check_effective_target_arm_v8_neon_ok_nocache { } {
 global et_arm_v8_neon_flags
 set et_arm_v8_neon_flags ""
-if { [check_effective_target_arm32] } {
-   foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp-armv8" 
"-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
-   if { [check_no_compiler_messages_nocache arm_v8_neon_ok object {
-   #if __ARM_ARCH < 8
-   #error not armv8 or later
-   #endif
-   #include "arm_neon.h"
-   void
-   foo ()
-   {
- __asm__ volatile ("vrintn.f32 q0, q0");
-   }
-   } "$flags -march=armv8-a"] } {
-   set et_arm_v8_neon_flags $flags
-   return 1
-   }
+if { ! [check_effective_target_arm32] } {
+   return 0;
+}
+if [check-flags \
+   [list "" { *-*-* } { "-mfpu=*" } \
+{ "-mfpu=*fp-armv8*" } ]] {
+   # Multilib flags would override -mfpu.
+   return 0
+}
+if [check-flags [list "" { *-*-* } { "-mfloat-abi=soft" } { "" } ]] {
+   # Must generate floating-point instructions.
+   return 0
+}
+if [check_effective_target_arm_hf_eabi] {
+   # Use existing float-abi and force an fpu which supports fp16
+   set et_arm_v8_neon_flags "-mfpu=neon-fp-armv8"
+   return 1;
+}
+if [check-flags [list "" { *-*-* } { "-mfpu=*" } { "" } ]] {
+   # The existing -mfpu value is OK; use it, but add softfp.
+   set et_arm_v8_neon_flags "-mfloat-abi=softfp"
+   return 1;
+}
+
+# Add -mfpu for a VFP fp16 variant since there is no preprocessor
+# macro to check for this support.
+set flags "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"
+if { [check_no_compiler_messages_

[committed] RISC-V: Add SiFive extensions, xsfvcp and xsfcease

2024-07-12 Thread Kito Cheng
We have already upstreamed these extensions into binutils, and now we need GCC
to recognize these extensions and pass them to binutils as well. We also plan
to upstream intrinsics in the near future. :)

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_info): Add xsfvcp.
(riscv_ext_version_table): Add xsfvcp, xsfcease.
(riscv_ext_flag_table): Ditto.
* config/riscv/riscv.opt (riscv_sifive_subext): New.
(XSFVCP): New.
(XSFCEASE): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/predef-sf-1.c: New.
* gcc.target/riscv/predef-sf-2.c: New.
---
 gcc/common/config/riscv/riscv-common.cc  |  8 
 gcc/config/riscv/riscv.opt   |  7 +++
 gcc/testsuite/gcc.target/riscv/predef-sf-1.c | 19 +++
 gcc/testsuite/gcc.target/riscv/predef-sf-2.c | 14 ++
 4 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-sf-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/predef-sf-2.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 3c4178c19c9..d883efa7a3a 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -216,6 +216,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"ssstateen", "zicsr"},
   {"sstc", "zicsr"},
 
+  {"xsfvcp", "zve32x"},
+
   {NULL, NULL}
 };
 
@@ -415,6 +417,9 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"xventanacondops", ISA_SPEC_CLASS_NONE, 1, 0},
 
+  {"xsfvcp",   ISA_SPEC_CLASS_NONE, 1, 0},
+  {"xsfcease", ISA_SPEC_CLASS_NONE, 1, 0},
+
   /* Terminate the list.  */
   {NULL, ISA_SPEC_CLASS_NONE, 0, 0}
 };
@@ -1822,6 +1827,9 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"xventanacondops", &gcc_options::x_riscv_xventana_subext, 
MASK_XVENTANACONDOPS},
 
+  {"xsfvcp",   &gcc_options::x_riscv_sifive_subext, MASK_XSFVCP},
+  {"xsfcease", &gcc_options::x_riscv_sifive_subext, MASK_XSFCEASE},
+
   {NULL, NULL, 0}
 };
 
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 32a0dda5843..a1d70b63638 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -507,6 +507,13 @@ int riscv_xventana_subext
 
 Mask(XVENTANACONDOPS) Var(riscv_xventana_subext)
 
+TargetVariable
+int riscv_sifive_subext
+
+Mask(XSFVCP) Var(riscv_sifive_subext)
+
+Mask(XSFCEASE) Var(riscv_sifive_subext)
+
 Enum
 Name(isa_spec_class) Type(enum riscv_isa_spec_class)
 Supported ISA specs (for use with the -misa-spec= option):
diff --git a/gcc/testsuite/gcc.target/riscv/predef-sf-1.c 
b/gcc/testsuite/gcc.target/riscv/predef-sf-1.c
new file mode 100644
index 000..d6c07e7d920
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-sf-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64g_xsfvcp -mabi=lp64" } */
+
+int main () {
+#if !defined(__riscv)
+#error "__riscv"
+#endif
+
+#if !defined(__riscv_zve32x)
+#error "__riscv_zve32x"
+#endif
+
+
+#if !defined(__riscv_xsfvcp)
+#error "__riscv_xsfvcp"
+#endif
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/predef-sf-2.c 
b/gcc/testsuite/gcc.target/riscv/predef-sf-2.c
new file mode 100644
index 000..dcb746bcd26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/predef-sf-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64g_xsfcease -mabi=lp64" } */
+
+int main () {
+#if !defined(__riscv)
+#error "__riscv"
+#endif
+
+#if !defined(__riscv_xsfcease)
+#error "__riscv_xsfvcp"
+#endif
+
+  return 0;
+}
-- 
2.34.1



Re: [REPOST 0/3] Add support for -mcpu=power11

2024-07-12 Thread Kewen.Lin
Hi Mike,

on 2024/7/11 01:32, Michael Meissner wrote:
> Note, this is a repost of the 3 patches I posted on June 4th.  The first two
> patches are the same.  The third patch modifies the power11 tests to do a
> compile instead of assemble, and I removed the power11 specific target support
> that was posted as suggested by Kewen.Lin.
> 
> The following 3 patches add support for -mcpu=power11 to GCC 15.  Assuming
> these patches are approved and go into GCC 15, I will need to back port them 
> to
> GCC 14.
> 
> The first patch adds the basic support for -mcpu=power11, except for the
> scheduling infomration.
> 
> The second patch goes through power10.md and adds scheduling support for
> power11, treating -mtune=power11 to be the same as -mtune=power10 at the
> current time.
> 
> The third patch adds some new tests for -mcpu=power11 support.
> 
> In order to use -mcpu=power11, you will need to use a new enough binutils that
> supports the .machine power11 option.
> 
> I have bootstrapped the compiler on both little endian and big endian systems.
> There were no regressions in either case.  Can I check these patches into the
> GCC trunk?  After a waiting period assuming there are no issues, can I check
> these patches into the GCC 14 branch?
> 

Thanks for doing this!  This patch series looks good to me, as it's still on
Segher's review list, I'd leave this to him for final approval, thanks!

BR,
Kewen



Re: Re: [PATCH v2] RISC-V: Disable misaligned vector access in hook riscv_slow_unaligned_access[PR115862]

2024-07-12 Thread Kito Cheng
Oh, okay, my fault, I didn't read the bugzilla, so you can go ahead :P


On Fri, Jul 12, 2024 at 3:51 PM Li Xu  wrote:
>
> Sorry, I didn't understand.
>
> >>but...this seems to have discovered another bug in the current  trunk?
>
> Isn't PR115862 the same bug as this one?
>
> 
> xu...@eswincomputing.com
>
>
> From: Kito Cheng
> Date: 2024-07-12 14:33
> To: Li Xu
> CC: gcc-patches; juzhe.zhong; rdapp.gcc
> Subject: Re: [PATCH v2] RISC-V: Disable misaligned vector access in hook 
> riscv_slow_unaligned_access[PR115862]
> LGTM, but...this seems to have discovered another bug in the current
> trunk? could you take a look?
>
> Will trigger by -O2 -march=rv64gcv_zvl512b -mabi=lp64d or -O2
> -march=rv64gcv_zvl256b -mabi=lp64d
>
> during RTL pass: combine
> x.c: In function '__libc_mallinfo':
> x.c:47:1: internal compiler error: in smallest_mode_for_size, at
> stor-layout.cc:356
>   47 | }
>  | ^
> 0x2dc82e2 internal_error(char const*, ...)
>
> ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/diagnostic-global-context.cc:491
> 0xc8fdfe fancy_abort(char const*, int, char const*)
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/diagnostic.cc:1725
> 0x13bf6b7 smallest_mode_for_size(poly_int<2u, unsigned long>, mode_class)
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/stor-layout.cc:356
> 0x126efea smallest_int_mode_for_size(poly_int<2u, unsigned long>)
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/machmode.h:916
> 0x126efea get_best_extraction_insn
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs-query.cc:208
> 0x269ac14 make_extraction
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:7779
> 0x269beff make_compound_operation_int
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:8186
> 0x269cf3f make_compound_operation(rtx_def*, rtx_code)
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:8471
> 0x26a0c8e simplify_set
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:6975
> 0x26a0c8e combine_simplify_rtx
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:6374
> 0x26a2f2f subst
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:5630
> 0x26a6fc1 try_combine
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:3312
> 0x26ac211 combine_instructions
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:1264
> 0x26ac211 rest_of_handle_combine
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:15127
> 0x26ac211 execute
>../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:15171
> Please submit a full bug report, with preprocessed source (by using
> -freport-bug).
> Please include the complete backtrace with any bug report.
> See  for instructions.
>
> On Fri, Jul 12, 2024 at 8:48 AM Li Xu  wrote:
> >
> > From: xuli 
> >
> > The reason is that in the following code, icode = movmisalignv8si has
> > already been rejected by TARGET_VECTOR_MISALIGN_SUPPORTED, but it is
> > allowed by targetm.slow_unaligned_access,which is contradictory.
> >
> > (((icode = optab_handler (movmisalign_optab, mode))
> >!= CODE_FOR_nothing)
> >   || targetm.slow_unaligned_access (mode, align))
> >
> > misaligned vector access should be enabled by -mno-vector-strict-align 
> > option.
> >
> > PR Target/115862
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/riscv.cc (riscv_slow_unaligned_access): Disable 
> > vector misalign.
> >
> > Signed-off-by: Li Xu 
> > ---
> >  gcc/config/riscv/riscv.cc |  5 +-
> >  .../gcc.target/riscv/rvv/base/pr115862.c  | 52 +++
> >  2 files changed, 55 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 61fa74e9322..16b210f323e 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -10269,9 +10269,10 @@ riscv_cannot_copy_insn_p (rtx_insn *insn)
> >  /* Implement TARGET_SLOW_UNALIGNED_ACCESS.  */
> >
> >  static bool
> > -riscv_slow_unaligned_access (machine_mode, unsigned int)
> > +riscv_slow_unaligned_access (machine_mode mode, unsigned int)
> >  {
> > -  return riscv_slow_unaligned_access_p;
> > +  return VECTOR_MODE_P (mode) ? TARGET_VECTOR_MISALIGN_SUPPORTED
> > + : riscv_slow_unaligned_access_p;
> >  }
> >
> >  static bool
> > diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
> > new file mode 100644
> > index 000..3cbc3c3a0ea
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
> > @@ -0,0 +1,52 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -march=rv64gcv_zvl512b -mabi=lp64d" } */
> > +
> > +struct mallinfo2
> > +{
> > +  int arena;
> > +  int ordblks;
> > +  int smblks;
> > +  int hblks;
> > +  int

Re: [PATCH] s390: Fully exploit vgm, vgbm, vrepi

2024-07-12 Thread Andreas Krebbel

On 7/2/24 15:48, Stefan Schulze Frielinghaus wrote:

Currently instructions vgm and vrepi are utilized only for constant
vectors where the element mode equals the element mode of the
corresponding instruction.  This patch lifts this restriction by making
use of those instructions for constant vectors even if element modes
do not coincide.  For example, the constant vector

   (v2di){0x7ffe7ffe, 0x7ffe7ffe}

can be loaded via vgmf %v0,1,30.  Similar, the constant vector

   (v4si){0x, 0x, 0x, 0x}

can be loaded via vrepiq %v0,-86.

Analog, if the element mode of a constant vector is smaller than the
element mode of a corresponding instruction, we still may make use of
those instructions.  For example, the constant vector

   (v4si){0x7fff, 0xfffe, 0x7fff, 0xfffe}

can be loaded via vgmg %v0,17,46.  Similar, the constant vector

   (v4si){-1, -16643, -1, -16643}

can be loaded via vrepig %v0,-16643.

Additionally this patch enables vgm, vgbm, vrepi for partial vectors,
i.e., vectors of size less than 16 bytes.  Basically this is done by
treating a vector as a full vector resulting in replicating constants
into the ignored bits whereas vgbm sets those to zero.

Furthermore, there is no restriction to integer vectors anymore, i.e.,
supporting scalars of mode up to and including TI and TF and also
floating-point vectors.

Here are some numbers how often instructions are emitted for SPEC 2017:

 w/o patch w/ patch
vgbm  140  365
vgm 1750824452
vrepi1360 2775

I expect most (maybe even all) to save us a load from the literal pool.

gcc/ChangeLog:

* config/s390/2964.md: Remove extended mnemonics for vgm.
* config/s390/3906.md: Remove extended mnemonics for vgm.
* config/s390/3931.md: Remove extended mnemonics for vgm.
* config/s390/8561.md: Remove extended mnemonics for vgm.
* config/s390/constraints.md (jKK): Remove constraint.
(jzz): Add constraint.
* config/s390/s390-protos.h (s390_contiguous_bitmask_vector_p):
Add prototype.
(s390_constant_via_vgm_p): Add prototype.
(s390_constant_via_vrepi_p): Add prototype.
* config/s390/s390.cc (s390_contiguous_bitmask_vector_p): New
function.
(s390_constant_via_vgm_vrepi_helper): New function.
(s390_constant_via_vgm_p): New function.
(s390_constant_via_vgbm_p): For the sake of symmetry rename
s390_bytemask_vector_p into s390_constant_via_vgbm_p.
(s390_bytemask_vector_p): Deal with non-integer and partial
vectors.
(s390_constant_via_vrepi_p): New function.
(s390_legitimate_constant_p): Allow partial vectors.
(legitimate_reload_constant_p): Fix indentation.
(legitimate_reload_vector_constant_p): Restrict to constraints
j00, jm1, jxx, jyy, jzz only, i.e., allow partial vectors.
(s390_expand_vec_init): Also make use of vrepi if possible.
(print_operand): Add q,p,r for vgm,vrepi,vgbm, respectively.
Remove e,s,t for constant vectors.
* config/s390/s390.md (movti): Add variants utilizing
vgbm,vgm,vrepi.
* config/s390/vector.md (mov): Adapt variants
for vgbm,vgm,vrepi for the new scheme.
(mov): Adapt variants for vgbm,vgm for the new
scheme and add vrepi variant for modes V_8,V_16,V_32,V_64.

gcc/testsuite/ChangeLog:

* gcc.target/s390/vector/vec-copysign.c: Change to non-extended
mnemonic.
* gcc.target/s390/vector/vec-genmask-1.c: Change to non-extended
mnemonic.
* gcc.target/s390/vector/vec-init-1.c: Change to non-extended
mnemonic.
* gcc.target/s390/vector/vec-vrepi-1.c: Change to non-extended
mnemonic.
* gcc.target/s390/zvector/autovec-double-quiet-uneq.c: Change to
non-extended mnemonic.
* gcc.target/s390/zvector/autovec-float-quiet-uneq.c: Change to
non-extended mnemonic.
* gcc.target/s390/zvector/vec-genmask-1.c: Change to
non-extended mnemonic.
* gcc.target/s390/zvector/vec-splat-1.c: Change to non-extended
mnemonic.
* gcc.target/s390/zvector/vec-splat-2.c: Change to non-extended
mnemonic.
* gcc.target/s390/vector/vgbm-double-1.c: New test.
* gcc.target/s390/vector/vgbm-float-1.c: New test.
* gcc.target/s390/vector/vgbm-int128-1.c: New test.
* gcc.target/s390/vector/vgbm-integer-1.c: New test.
* gcc.target/s390/vector/vgbm-longdouble-1.c: New test.
* gcc.target/s390/vector/vgm-df-1.c: New test.
* gcc.target/s390/vector/vgm-di-1.c: New test.
* gcc.target/s390/vector/vgm-hi-1.c: New test.
* gcc.target/s390/vector/vgm-int128-1.c: New test.
* gcc.target/s390/vector/vgm-longdouble-1.c: New test.
* gcc.target/s390/vector/vgm-qi-1.c: New test.
* gcc.target/s390/vector/vgm-sf-1.

Re: [PATCH] Fix SSA_NAME leak due to def_stmt is removed before use_stmt.

2024-07-12 Thread Richard Biener
On Fri, Jul 12, 2024 at 7:24 AM liuhongt  wrote:
>
> >-  _5 = __atomic_fetch_or_8 (&set_work_pending_p, 1, 0);
> >-  # DEBUG old => (long int) _5
> >+  _6 = .ATOMIC_BIT_TEST_AND_SET (&set_work_pending_p, 0, 1, 0, 
> >__atomic_fetch_or_8);
> >+  # DEBUG old => NULL
> >   # DEBUG BEGIN_STMT
> >-  # DEBUG D#2 => _5 & 1
> >+  # DEBUG D#2 => NULL
> >...
> >-  _10 = ~_5;
> >-  _8 = (_Bool) _10;
> >-  # DEBUG ret => _8
> >+  _8 = _6 == 0;
> >+  # DEBUG ret => (_Bool) _10
> >
> >confirmed.  convert_atomic_bit_not does this, it checks for single_use
> >and removes the def, failing to release the name (which would fix this up
> >IIRC).
> >
> >Note the function removes stmts in "wrong" order (before uses of LHS
> >are removed), so it requires larger surgery.  And it leaks SSA names.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?

OK.

Richard.

> gcc/ChangeLog:
>
> PR target/115872
> * tree-ssa-ccp.cc (convert_atomic_bit_not): Remove use_stmt after 
> use_nop_stmt is removed.
> (optimize_atomic_bit_test_and): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr115872.c: New test.
> ---
>  gcc/testsuite/gcc.target/i386/pr115872.c | 16 
>  gcc/tree-ssa-ccp.cc  | 12 
>  2 files changed, 24 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr115872.c
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr115872.c 
> b/gcc/testsuite/gcc.target/i386/pr115872.c
> new file mode 100644
> index 000..937004456d3
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr115872.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -g" } */
> +
> +long set_work_pending_p;
> +_Bool set_work_pending() {
> +  _Bool __trans_tmp_1;
> +  long mask = 1, old = __atomic_fetch_or(&set_work_pending_p, mask, 0);
> +  __trans_tmp_1 = old & mask;
> +  return !__trans_tmp_1;
> +}
> +void __queue_work() {
> +  _Bool ret = set_work_pending();
> +  if (ret)
> +__queue_work();
> +}
> +
> diff --git a/gcc/tree-ssa-ccp.cc b/gcc/tree-ssa-ccp.cc
> index 3749126b5f7..de83d26d311 100644
> --- a/gcc/tree-ssa-ccp.cc
> +++ b/gcc/tree-ssa-ccp.cc
> @@ -3332,9 +3332,10 @@ convert_atomic_bit_not (enum internal_fn fn, gimple 
> *use_stmt,
>  return nullptr;
>
>gimple_stmt_iterator gsi;
> -  gsi = gsi_for_stmt (use_stmt);
> -  gsi_remove (&gsi, true);
>tree var = make_ssa_name (TREE_TYPE (lhs));
> +  /* use_stmt need to be removed after use_nop_stmt,
> + so use_lhs can be released.  */
> +  gimple *use_stmt_removal = use_stmt;
>use_stmt = gimple_build_assign (var, BIT_AND_EXPR, lhs, and_mask);
>gsi = gsi_for_stmt (use_not_stmt);
>gsi_insert_before (&gsi, use_stmt, GSI_NEW_STMT);
> @@ -3344,6 +3345,8 @@ convert_atomic_bit_not (enum internal_fn fn, gimple 
> *use_stmt,
>gsi_insert_after (&gsi, g, GSI_NEW_STMT);
>gsi = gsi_for_stmt (use_not_stmt);
>gsi_remove (&gsi, true);
> +  gsi = gsi_for_stmt (use_stmt_removal);
> +  gsi_remove (&gsi, true);
>return use_stmt;
>  }
>
> @@ -3646,8 +3649,7 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator 
> *gsip,
>*/
> }
>   var = make_ssa_name (TREE_TYPE (use_rhs));
> - gsi = gsi_for_stmt (use_stmt);
> - gsi_remove (&gsi, true);
> + gimple* use_stmt_removal = use_stmt;
>   g = gimple_build_assign (var, BIT_AND_EXPR, use_rhs,
>and_mask);
>   gsi = gsi_for_stmt (use_nop_stmt);
> @@ -3664,6 +3666,8 @@ optimize_atomic_bit_test_and (gimple_stmt_iterator 
> *gsip,
>   gsi_insert_after (&gsi, g, GSI_NEW_STMT);
>   gsi = gsi_for_stmt (use_nop_stmt);
>   gsi_remove (&gsi, true);
> + gsi = gsi_for_stmt (use_stmt_removal);
> + gsi_remove (&gsi, true);
> }
> }
>   else
> --
> 2.31.1
>


Re: Re: [PATCH v2] RISC-V: Disable misaligned vector access in hook riscv_slow_unaligned_access[PR115862]

2024-07-12 Thread Li Xu
Sorry, I didn't understand.

>>but...this seems to have discovered another bug in the current  trunk? 

Isn't PR115862 the same bug as this one?



xu...@eswincomputing.com
 
From: Kito Cheng
Date: 2024-07-12 14:33
To: Li Xu
CC: gcc-patches; juzhe.zhong; rdapp.gcc
Subject: Re: [PATCH v2] RISC-V: Disable misaligned vector access in hook 
riscv_slow_unaligned_access[PR115862]
LGTM, but...this seems to have discovered another bug in the current
trunk? could you take a look?
 
Will trigger by -O2 -march=rv64gcv_zvl512b -mabi=lp64d or -O2
-march=rv64gcv_zvl256b -mabi=lp64d
 
during RTL pass: combine
x.c: In function '__libc_mallinfo':
x.c:47:1: internal compiler error: in smallest_mode_for_size, at
stor-layout.cc:356
  47 | }
 | ^
0x2dc82e2 internal_error(char const*, ...)
   
../../../../riscv-gnu-toolchain-trunk/gcc/gcc/diagnostic-global-context.cc:491
0xc8fdfe fancy_abort(char const*, int, char const*)
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/diagnostic.cc:1725
0x13bf6b7 smallest_mode_for_size(poly_int<2u, unsigned long>, mode_class)
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/stor-layout.cc:356
0x126efea smallest_int_mode_for_size(poly_int<2u, unsigned long>)
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/machmode.h:916
0x126efea get_best_extraction_insn
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/optabs-query.cc:208
0x269ac14 make_extraction
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:7779
0x269beff make_compound_operation_int
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:8186
0x269cf3f make_compound_operation(rtx_def*, rtx_code)
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:8471
0x26a0c8e simplify_set
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:6975
0x26a0c8e combine_simplify_rtx
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:6374
0x26a2f2f subst
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:5630
0x26a6fc1 try_combine
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:3312
0x26ac211 combine_instructions
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:1264
0x26ac211 rest_of_handle_combine
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:15127
0x26ac211 execute
   ../../../../riscv-gnu-toolchain-trunk/gcc/gcc/combine.cc:15171
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.
 
On Fri, Jul 12, 2024 at 8:48 AM Li Xu  wrote:
>
> From: xuli 
>
> The reason is that in the following code, icode = movmisalignv8si has
> already been rejected by TARGET_VECTOR_MISALIGN_SUPPORTED, but it is
> allowed by targetm.slow_unaligned_access,which is contradictory.
>
> (((icode = optab_handler (movmisalign_optab, mode))
>!= CODE_FOR_nothing)
>   || targetm.slow_unaligned_access (mode, align))
>
> misaligned vector access should be enabled by -mno-vector-strict-align option.
>
> PR Target/115862
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_slow_unaligned_access): Disable vector 
> misalign.
>
> Signed-off-by: Li Xu 
> ---
>  gcc/config/riscv/riscv.cc |  5 +-
>  .../gcc.target/riscv/rvv/base/pr115862.c  | 52 +++
>  2 files changed, 55 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 61fa74e9322..16b210f323e 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -10269,9 +10269,10 @@ riscv_cannot_copy_insn_p (rtx_insn *insn)
>  /* Implement TARGET_SLOW_UNALIGNED_ACCESS.  */
>
>  static bool
> -riscv_slow_unaligned_access (machine_mode, unsigned int)
> +riscv_slow_unaligned_access (machine_mode mode, unsigned int)
>  {
> -  return riscv_slow_unaligned_access_p;
> +  return VECTOR_MODE_P (mode) ? TARGET_VECTOR_MISALIGN_SUPPORTED
> + : riscv_slow_unaligned_access_p;
>  }
>
>  static bool
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
> new file mode 100644
> index 000..3cbc3c3a0ea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/pr115862.c
> @@ -0,0 +1,52 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=rv64gcv_zvl512b -mabi=lp64d" } */
> +
> +struct mallinfo2
> +{
> +  int arena;
> +  int ordblks;
> +  int smblks;
> +  int hblks;
> +  int hblkhd;
> +  int usmblks;
> +  int fsmblks;
> +  int uordblks;
> +  int fordblks;
> +  int keepcost;
> +};
> +
> +struct mallinfo
> +{
> +  int arena;
> +  int ordblks;
> +  int smblks;
> +  int hblks;
> +  int hblkhd;
> +  int usmblks;
> +  int fsmblks;
> +  int uordblks;
> +  int fordblks;
> +  int keepcost;
> +};
> +
> +struct mallinfo
> +__libc_mallinfo (void)
> +{
> +  struct mallinfo m;

Re: [PATCH] fortran: Factor the evaluation of MINLOCK/MAXLOC's BACK argument

2024-07-12 Thread Mikael Morin

Le 11/07/2024 à 22:49, Harald Anlauf a écrit :

Hi Mikael,

Am 11.07.24 um 21:55 schrieb Mikael Morin:

From: Mikael Morin 

Hello,

I discovered this while testing the inline MINLOC/MAXLOC (aka PR90608) 
patches.

Regression tested on x86_64-linux.
OK for master?


this is a nice finding!  (NAG seems to fail on the cases with
array size 0, while Intel gets it right.)

The commit message promises to cover all variations ("with/out NANs"?)
but I fail to see these.  Were these removed in the submission?

No, it's actually type with NAN vs type without NAN; the code generated 
is different between integral types (which don't have NANs) and floating 
point types (which may have NANs).


I'll rephrase to integral vs floating-point types.


Otherwise the patch looks pretty simple and is OK for mainline.
But do not forget to s/MINLOCK/MINLOC/ in the summary.


Good catch, thanks.


Thanks for the patch!


Thanks for the review.


Harald


-- 8< --

Move the evaluation of the BACK argument out of the loop in the inline 
code

generated for MINLOC or MAXLOC.  For that, add a new (scalar) element
associated with BACK to the scalarization loop chain, evaluate the 
argument

with the context of that element, and let the scalarizer do its job.

The problem was not only a missed optimisation, but also a wrong code
one in the cases where the expression associated with BACK is not free of
side-effects, making multiple evaluations observable.

The new tests check the evaluation count of the BACK argument, and try to
cover all the variations (with/out NANs, constant or unknown shape, 
absent

or scalar or array MASK) supported by the inline implementation of the
functions.  Care has been taken to not check the case of a constant 
.FALSE.

MASK, for which the evaluation of BACK can be elided.

gcc/fortran/ChangeLog:

* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxloc): Create a new
scalar scalarization chain element if BACK is present.  Add it to
the loop.  Set the scalarization chain before evaluating the
argument.

gcc/testsuite/ChangeLog:

* gfortran.dg/maxloc_5.f90: New test.
* gfortran.dg/minloc_5.f90: New test.
---
  gcc/fortran/trans-intrinsic.cc |  10 +
  gcc/testsuite/gfortran.dg/maxloc_5.f90 | 257 +
  gcc/testsuite/gfortran.dg/minloc_5.f90 | 257 +
  3 files changed, 524 insertions(+)
  create mode 100644 gcc/testsuite/gfortran.dg/maxloc_5.f90
  create mode 100644 gcc/testsuite/gfortran.dg/minloc_5.f90

diff --git a/gcc/fortran/trans-intrinsic.cc 
b/gcc/fortran/trans-intrinsic.cc

index 5ea10e84060..cadbd177452 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -5325,6 +5325,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, 
gfc_expr * expr, enum tree_code op)

    gfc_actual_arglist *actual;
    gfc_ss *arrayss;
    gfc_ss *maskss;
+  gfc_ss *backss;
    gfc_se arrayse;
    gfc_se maskse;
    gfc_expr *arrayexpr;
@@ -5390,6 +5391,11 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, 
gfc_expr * expr, enum tree_code op)

  && maskexpr->symtree->n.sym->attr.dummy
  && maskexpr->symtree->n.sym->attr.optional;
    backexpr = actual->next->next->expr;
+  if (backexpr)
+    backss = gfc_get_scalar_ss (gfc_ss_terminator, backexpr);
+  else
+    backss = nullptr;
+
    nonempty = NULL;
    if (maskexpr && maskexpr->rank != 0)
  {
@@ -5449,6 +5455,9 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, 
gfc_expr * expr, enum tree_code op)

    if (maskss)
  gfc_add_ss_to_loop (&loop, maskss);

+  if (backss)
+    gfc_add_ss_to_loop (&loop, backss);
+
    gfc_add_ss_to_loop (&loop, arrayss);

    /* Initialize the loop.  */
@@ -5535,6 +5544,7 @@ gfc_conv_intrinsic_minmaxloc (gfc_se * se, 
gfc_expr * expr, enum tree_code op)

    gfc_add_block_to_block (&block, &arrayse.pre);

    gfc_init_se (&backse, NULL);
+  backse.ss = backss;
    gfc_conv_expr_val (&backse, backexpr);
    gfc_add_block_to_block (&block, &backse.pre);

diff --git a/gcc/testsuite/gfortran.dg/maxloc_5.f90 
b/gcc/testsuite/gfortran.dg/maxloc_5.f90

new file mode 100644
index 000..5d722450c8f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/maxloc_5.f90
@@ -0,0 +1,257 @@
+! { dg-do run }
+!
+! Check that the evaluation of MAXLOC's BACK argument is made only once
+! before the scalarisation loops.
+
+program p
+  implicit none
+  integer, parameter :: data10(*) = (/ 7, 4, 7, 6, 6, 4, 6, 3, 9, 8 /)
+  logical, parameter :: mask10(*) = (/ .false., .true., .false., &
+   .false., .true., .true.,  &
+   .true. , .true., .false., &
+   .false. /)
+  integer :: calls_count = 0
+  call check_int_const_shape
+  call check_int_const_shape_scalar_mask
+  call check_int_const_shape_array_mask
+  call check_int_const_shape_optional_mask_present
+  call check_int_const_shape_optional_mask_absent
+  call check_int_const_shape_empty
+  call check_int

Re: [i386] adjust flag_omit_frame_pointer in a single function [PR113719]

2024-07-12 Thread Rainer Orth
Hi Alexandre,

> Regstrapped on x86_64-linux-gnu.  Also verified that the regression is
> cured with a i686-solaris cross compiler.  Ok to install?

FWIW, I've also included the patch in last night's i386-pc-solaris2.11
bootstrap: the failures are gone, no regressions.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 0/3] Prepare and drop vcond expanders

2024-07-12 Thread Andreas Krebbel

On 7/1/24 10:32, Stefan Schulze Frielinghaus wrote:


This drops vcond expanders.  The first patch
"s390: Emulate vec_cmp{eq,gt,gtu} for 128-bit integers" is somewhat
independent of the other two, since we run already in ICEs.  However,
since after removing vcond expanders testsuite shows one additional
fallout without this patch, which is why I would like to make sure that
this patch lands first and included it in this series.

Stefan Schulze Frielinghaus (3):
   s390: Emulate vec_cmp{eq,gt,gtu} for 128-bit integers
   s390: Enable vcond_mask for 128-bit ops
   s390: Drop vcond{,u} expanders



Ok. Thanks!


Andreas




  gcc/config/s390/vector.md | 156 --
  .../gcc.target/s390/vector/vec-cmp-emu-1.c|  35 
  .../gcc.target/s390/vector/vec-cmp-emu-2.c|  18 ++
  .../gcc.target/s390/vector/vec-cmp-emu-3.c|  17 ++
  4 files changed, 175 insertions(+), 51 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-cmp-emu-1.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-cmp-emu-2.c
  create mode 100644 gcc/testsuite/gcc.target/s390/vector/vec-cmp-emu-3.c



Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-12 Thread Richard Biener
On Thu, 11 Jul 2024, Filip Kastl wrote:

> > > > > +/* Check that the "exponential index transform" can be applied to 
> > > > > this switch.
> > > > > +
> > > > > +   See comment of the exp_index_transform function for details about 
> > > > > this
> > > > > +   transformation.
> > > > > +
> > > > > +   We want:
> > > > > +   - This form of the switch is more efficient
> > > > > +   - Cases are powers of 2
> > > > > +
> > > > > +   Expects that SWTCH has at least one case.  */
> > > > > +
> > > > > +bool
> > > > > +switch_conversion::is_exp_index_transform_viable (gswitch *swtch)
> > > > > +{
> > > > > +  tree index = gimple_switch_index (swtch);
> > > > > +  tree index_type = TREE_TYPE (index);
> > > > > +  basic_block swtch_bb = gimple_bb (swtch);
> > > > > +  unsigned num_labels = gimple_switch_num_labels (swtch);
> > > > > +
> > > > > +  /* Check that we can efficiently compute logarithm of 2^k (using 
> > > > > FFS) and
> > > > > + test that a given number is 2^k for some k (using POPCOUNT).  */
> > > > > +  optimization_type opt_type = bb_optimization_type (swtch_bb);
> > > > > +  if (!direct_internal_fn_supported_p (IFN_FFS, index_type, opt_type)
> > > > > +|| !direct_internal_fn_supported_p (IFN_POPCOUNT, index_type, 
> > > > > opt_type))
> > > > > +return false;
> > > > > +
> > > > 
> > > > See above, I think this can be improved.  Would be nice to split out
> > > > a can_pow2p (type) and can_log2 (type) and a corresponding
> > > > gen_pow2p (op) and gen_log2 (op) function so this could be re-used
> > > > and alternate variants added when required.
> > > > 
> > > 
> > > Just to check that I understand this correctly:  You'd like me to create
> > > functions can_pow2p, can_log2.  Those functions will check that there are
> > > optabs for the target machine which allow us to efficiently test
> > > power-of-2-ness of a number and which allow us to efficiently compute the
> > > base-2 log of a power-of-2 number.  You'd also like me to create functions
> > > gen_pow2p and gen_log2 which generate this code.  For now these functions 
> > > will
> > > just use POPCOUNT and FFS but they can be later extended to also consider
> > > different instructions.  Is that right?
> > 
> > Right.
> > 
> > > Into which file should I put these functions?
> > 
> > Just in this file for now.
> >  
> > > Is can_pow2p and gen_pow2p necessary?  As you noted one can always use
> > > (x & -x) == x so testing pow2p can always be done efficiently.
> > 
> > If you add this fallback then can_pow2p / gen_pow2p wouldn't be
> > necessary indeed.
> 
> Hi Richard,
> 
> I put some work into splitting out the can_ and gen_ functions as you
> suggested.  I'm still a bit unsure what your vision of these is so before I
> submit all the changes I made to the patch as version 2 I would like to share
> how I implemented the functions (see bellow).  Is this how you imagined the
> functions?  Would you change something or do the they look ok?
> 
> I wasn't sure how generic to make the functions.  The more generic the more
> convoluted the implementation becomes.  For example: I could make them more
> generic by also including a gsi_iterator_update parameter or I could make them
> less generic but more straightforward by removing the BEFORE parameter.

I think they are fine as-is.  Other APIs like this chose to emit stmts
to a gimple_seq which the caller can then insert in the way of his
choosing.

Ideally we'd abstract this better in the infrastructure, like having
a insert-iterator that embeds both 'before' and update with the
insertion point.  I guess the standard library would have a
insert_before_forward_iterator and insert_after_forward_iterator
and insert_before_backward_iterator, etc. ...  Note that as you
emit a sequence of stmts their order is of course fixed, it's only
the sequence insertion that the caller should influence.

I think that GSI_CONTINUE_LINKING is what works for both before
and after insertion, I'm not sure your GSI_SAME_STMT use for before
is correct?

Richard.

> Cheers,
> Filip Kastl
> 
> 
> /* Does the target have optabs needed to efficiently compute exact base two
>logarithm of a value with type TYPE?
> 
>See gen_log2.  */
> 
> static bool
> can_log2 (tree type, optimization_type opt_type)
> {
>   /* Check if target supports FFS.  */
>   return direct_internal_fn_supported_p (IFN_FFS, type, opt_type);
> }
> 
> /* Assume that OP is a power of two.  Build a sequence of gimple statements
>efficiently computing the base two logarithm of OP using special optabs.
>Insert statements before GSI if BEFORE is true or after GSI otherwise.
>Return the result value as an ssa name tree.
> 
>Leave GSI at the same statement (GSI_SAME_STMT behavior).
> 
>Should only be used if target supports the needed optabs.  See can_log2.  
> */
> 
> static tree
> gen_log2 (gimple_stmt_iterator *gsi, bool before, location_t loc, tree op)
> {
>   /* Use .FFS (op) - 1.  */
>   gimple *s = gsi->ptr;
>   tre

[PATCH] rs6000: Change optab for ibm128 and ieee128 conversion

2024-07-12 Thread Kewen.Lin
Hi,

Currently for 128 bit floating-point ibm128 and ieee128
formats conversion, the corresponding libcalls are:
  ibm128 -> ieee128 "__trunctfkf2"
  ieee128 -> ibm128 "__extendkftf2"
, and generic code handling (like convert_mode_scalar) also
adopts sext_optab for ieee128 -> ibm128 while trunc_optab
for ibm128 -> ieee128.  But in rs6000 port as function
rs6000_expand_float128_convert and init_float128_ieee show,
we adopt sext_optab for ibm128 -> ieee128 with "__trunctfkf2"
while trunc_optab for ieee128 -> ibm128 with "__extendkftf2".

To make them consistent and avoid some surprises, this patch
is to adjust rs6000 internal handlings by adopting trunc_optab
for ibm128 -> ieee128 with "__trunctfkf2" while sext_optab for
ieee128 -> ibm128 with "__extendkftf2".

Bootstrapped and regtested on powerpc64{,le}-linux-gnu
(ibm128 long double default) and powerpc64le-linux-gnu
(ieee128 long double default).

I'm going to install this next week if no objections.

BR,
Kewen
-

gcc/ChangeLog:

* config/rs6000/rs6000.cc (init_float128_ieee): Use trunc_optab rather
than sext_optab for converting FLOAT128_IBM_P mode to FLOAT128_IEEE_P
mode, and use sext_optab rather than trunc_optab for converting
FLOAT128_IEEE_P mode to FLOAT128_IBM_P mode.
(rs6000_expand_float128_convert): Likewise.
---
 gcc/config/rs6000/rs6000.cc | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 4af1eeb3722..7e30ab5b207 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -11460,13 +11460,13 @@ init_float128_ieee (machine_mode mode)
   set_conv_libfunc (trunc_optab, SFmode, mode, "__trunckfsf2");
   set_conv_libfunc (trunc_optab, DFmode, mode, "__trunckfdf2");

-  set_conv_libfunc (sext_optab, mode, IFmode, "__trunctfkf2");
+  set_conv_libfunc (trunc_optab, mode, IFmode, "__trunctfkf2");
   if (mode != TFmode && FLOAT128_IBM_P (TFmode))
-   set_conv_libfunc (sext_optab, mode, TFmode, "__trunctfkf2");
+   set_conv_libfunc (trunc_optab, mode, TFmode, "__trunctfkf2");

-  set_conv_libfunc (trunc_optab, IFmode, mode, "__extendkftf2");
+  set_conv_libfunc (sext_optab, IFmode, mode, "__extendkftf2");
   if (mode != TFmode && FLOAT128_IBM_P (TFmode))
-   set_conv_libfunc (trunc_optab, TFmode, mode, "__extendkftf2");
+   set_conv_libfunc (sext_optab, TFmode, mode, "__extendkftf2");

   set_conv_libfunc (sext_optab, mode, SDmode, "__dpd_extendsdkf");
   set_conv_libfunc (sext_optab, mode, DDmode, "__dpd_extendddkf");
@@ -15624,7 +15624,7 @@ rs6000_expand_float128_convert (rtx dest, rtx src, bool 
unsigned_p)
case E_IFmode:
case E_TFmode:
  if (FLOAT128_IBM_P (src_mode))
-   cvt = sext_optab;
+   cvt = trunc_optab;
  else
do_move = true;
  break;
@@ -15686,7 +15686,7 @@ rs6000_expand_float128_convert (rtx dest, rtx src, bool 
unsigned_p)
case E_IFmode:
case E_TFmode:
  if (FLOAT128_IBM_P (dest_mode))
-   cvt = trunc_optab;
+   cvt = sext_optab;
  else
do_move = true;
  break;
--
2.43.5



[PATCH 2/3 v3] rs6000: Make all 128 bit scalar FP modes have 128 bit precision [PR112993]

2024-07-12 Thread Kewen.Lin
Hi,

On rs6000, there are three 128 bit scalar floating point
modes TFmode, IFmode and KFmode.  With some historical
reasons, we defines them with different mode precisions,
that is KFmode 126, TFmode 127 and IFmode 128.  But in
fact all of them should have the same mode precision 128,
this special setting has caused some issues like some
unexpected failures mentioned in [1] and also made us have
to introduce some workarounds, such as: the workaround in
build_common_tree_nodes for KFmode 126, the workaround in
range_compatible_p for same mode but different precision
issue.

This patch is to make these three 128 bit scalar floating
point modes TFmode, IFmode and KFmode have 128 bit mode
precision, and keep the order same as previous in order
to make machine independent parts of the compiler not try
to widen IFmode to TFmode.  Besides, build_common_tree_nodes
adopts the newly added hook mode_for_floating_type so we
don't need to worry about unexpected mode for long double
type node.

In function convert_mode_scalar, with the proposed change,
it adopts sext_optab for converting ieee128 format mode to
ibm128 format mode while trunc_optab for converting ibm128
format mode to ieee128 format mode.  Thus this patch removes
useless extend and trunc optab supports, supplements new
define_expands expandkftf2 and trunctfkf2 to align with
convert_mode_scalar implementation.  It also unnames two
define_insn_and_split to avoid conflicts and make them more
clear.  Considering the current implementation that there is
no chance to have KF <-> IF conversion (since either of them
would be TF already), it adds two dummy define_expands to
assert this.

Comparing to v2[2], it adjusts the optabs for ibm128 to
ieee128 conversion with trunc as 1/3 v2 new changes[3].

Bootstrapped and regtested on powerpc64{,le}-linux-gnu
(ibm128 long double default) and powerpc64le-linux-gnu
(ieee128 long double default).

[1] https://inbox.sourceware.org/gcc-patches/
718677e7-614d-7977-312d-05a75e1fd...@linux.ibm.com/
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656371.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656667.html

I'm going to install this next week if no objections.

BR,
Kewen
-
PR target/112993

gcc/ChangeLog:

* config/rs6000/rs6000-modes.def (IFmode, KFmode, TFmode): Define
with FLOAT_MODE instead of FRACTIONAL_FLOAT_MODE, don't use special
precisions any more.
(rs6000-modes.h): Remove include.
* config/rs6000/rs6000-modes.h: Remove.
* config/rs6000/rs6000.h (rs6000-modes.h): Remove include.
* config/rs6000/t-rs6000: Remove rs6000-modes.h include.
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Replace
all uses of FLOAT_PRECISION_TFmode with 128.
(rs6000_c_mode_for_floating_type): Likewise.
* config/rs6000/rs6000.md (define_expand extendiftf2): Remove.
(define_expand extendifkf2): Remove.
(define_expand extendtfkf2): Remove.
(define_expand trunckftf2): Remove.
(define_expand trunctfif2): Remove.
(define_expand extendtfif2): Add new assertion.
(define_expand expandkftf2): New.
(define_expand trunciftf2): Add new assertion.
(define_expand trunctfkf2): New.
(define_expand truncifkf2): Change with gcc_unreachable.
(define_expand expandkfif2): New.
(define_insn_and_split extendkftf2): Rename to  ...
(define_insn_and_split *extendkftf2): ... this.
(define_insn_and_split trunctfkf2): Rename to ...
(define_insn_and_split *extendtfkf2): ... this.
---
 gcc/config/rs6000/rs6000-modes.def | 31 ++
 gcc/config/rs6000/rs6000-modes.h   | 36 
 gcc/config/rs6000/rs6000.cc|  9 +---
 gcc/config/rs6000/rs6000.h |  5 ---
 gcc/config/rs6000/rs6000.md| 67 --
 gcc/config/rs6000/t-rs6000 |  1 -
 6 files changed, 41 insertions(+), 108 deletions(-)
 delete mode 100644 gcc/config/rs6000/rs6000-modes.h

diff --git a/gcc/config/rs6000/rs6000-modes.def 
b/gcc/config/rs6000/rs6000-modes.def
index 094b246c834..b69593c40a6 100644
--- a/gcc/config/rs6000/rs6000-modes.def
+++ b/gcc/config/rs6000/rs6000-modes.def
@@ -18,12 +18,11 @@
along with GCC; see the file COPYING3.  If not see
.  */

-/* We order the 3 128-bit floating point types so that IFmode (IBM 128-bit
-   floating point) is the 128-bit floating point type with the highest
-   precision (128 bits).  This so that machine independent parts of the
-   compiler do not try to widen IFmode to TFmode on ISA 3.0 (power9) that has
-   hardware support for IEEE 128-bit.  We set TFmode (long double mode) in
-   between, and KFmode (explicit __float128) below it.
+/* We order the 3 128-bit floating point type modes here as KFmode, TFmode and
+   IFmode, it is the same as the previous order, to make machine independent
+   parts of the compiler