Re: [PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-11 Thread Richard Biener via Gcc-patches
On Fri, 12 May 2023, pan2...@intel.com wrote:

> From: Pan Li 
> 
> We are running out of machine_mode values (8 bits) in the RISC-V backend,
> so we would like to extend the machine mode field from 8 to 16 bits.
> However, growing common structures such as tree or rtx is sensitive for
> memory use. This patch therefore extends the machine mode to 16 bits by
> shrinking or rearranging neighbouring fields:
> 
> * Swap the bit sizes of code and machine mode in rtx_def.
> * Reconcile the machine_mode location and spare bits in tree.
> 
> The memory impact of this patch on the affected structures is shown below:
> 
> +-------------------+----------+---------+------+
> | struct/bytes      | upstream | patched | diff |
> +-------------------+----------+---------+------+
> | rtx_obj_reference |        8 |      12 |   +4 |
> | ext_modified      |        2 |       3 |   +1 |

this struct is packed and we have an array of it - it _might_ be
bad to have elements of size 3 here.  Size 4 shouldn't be too
bad so I suggest to remove the packed attribute there.
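
To make the size-3 versus size-4 point concrete, here is a minimal sketch.
The two structs are hypothetical stand-ins for the one in ree.cc (field names
and widths are assumptions for illustration, not copied from the patch), and
the packed attribute syntax assumes GCC or Clang.

```cpp
#include <cstddef>

// Hypothetical stand-in: a 16-bit mode plus four flag bits, 20 bits total.
// With the packed attribute the compiler may round the size up to whole
// bytes only, which yields an odd element size for arrays of this type.
struct __attribute__((packed)) ext_modified_packed {
  unsigned int mode : 16;
  unsigned int kind : 2;
  unsigned int do_not_reextend : 1;
  unsigned int deleted : 1;
};

// Same fields without the attribute: the struct takes the size and
// alignment of its underlying unsigned int storage unit.
struct ext_modified_plain {
  unsigned int mode : 16;
  unsigned int kind : 2;
  unsigned int do_not_reextend : 1;
  unsigned int deleted : 1;
};

// Sizes as seen by the compiler; on common GCC/Clang targets these are
// expected to be 3 and 4 bytes respectively.
constexpr std::size_t packed_size = sizeof(ext_modified_packed);
constexpr std::size_t plain_size = sizeof(ext_modified_plain);
```

Dropping the attribute costs one byte per element here but keeps every array
element naturally aligned.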

> | ira_allocno       |      192 |     200 |   +8 |

that looks unfortunate - did you check if there's now
padding that could be used by re-ordering fields?

> | qty_table_elem    |       40 |      40 |    0 |
> | reg_stat_type     |       64 |      64 |    0 |
> | rtx_def           |       40 |      40 |    0 |
> | table_elt         |       80 |      80 |    0 |
> | tree_decl_common  |      112 |     112 |    0 |
> | tree_type_common  |      128 |     128 |    0 |
> +-------------------+----------+---------+------+
> 
> The tree and rtx related structs see no change in memory footprint with
> this patch, while machine_mode is now 16 bits wide.
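
As an illustration of why swapping the two widths leaves the structure size
unchanged, here is a minimal sketch. The header layouts are hypothetical
(a code field, a mode field and eight flag bits); the real rtx_def has more
fields, so treat the names and the flag count as assumptions.

```cpp
#include <cstddef>

// Hypothetical layout before the change: 16-bit code, 8-bit mode.
struct header_before {
  unsigned int code : 16;
  unsigned int mode : 8;
  unsigned int flags : 8;
};

// Hypothetical layout after the change: the two widths are swapped.
struct header_after {
  unsigned int code : 8;
  unsigned int mode : 16;
  unsigned int flags : 8;
};

// Both layouts use 32 bits of bit-fields, so the size does not change.
static_assert(sizeof(header_before) == sizeof(header_after),
              "swapping the widths must not grow the header");

// Store a mode number and read it back from the old layout; values
// above 255 are reduced modulo 256 by the 8-bit field.
inline unsigned int roundtrip_mode_before(unsigned int m) {
  header_before h{};
  h.mode = m;
  return h.mode;
}

// The same round trip through the new layout keeps values up to 65535.
inline unsigned int roundtrip_mode_after(unsigned int m) {
  header_after h{};
  h.mode = m;
  return h.mode;
}
```

The price of the swap is that the code field can now hold only 256 distinct
values.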
> 
> Signed-off-by: Pan Li 
> Co-authored-by: Ju-Zhe Zhong 
> Co-authored-by: Kito Cheng 
> 
> gcc/ChangeLog:
> 
>   * combine.cc (struct reg_stat_type): Extended machine mode to 16 bits.
>   * cse.cc (struct qty_table_elem): Ditto.
>   (struct table_elt): Ditto.
>   (struct set): Ditto.
>   * genopinit.cc (main): Reconciled the machine mode limit.
>   * ira-int.h (struct ira_allocno): Extended machine mode to 16 bits.
>   * ree.cc (struct ATTRIBUTE_PACKED): Ditto.

please go over the ChangeLog and properly specify the structure types
altered.  The script generating the changelog isn't perfect.

Richard.

>   * rtl-ssa/accesses.h: Ditto.
>   * rtl.h (RTX_CODE_BITSIZE): New macro.
>   (RTX_MACHINE_MODE_BITSIZE): Ditto.
>   (struct GTY): Swap bit size between code and machine mode.
>   (subreg_shape::unique_id): Reconciled the machine mode limit.
>   * rtlanal.h: Extended machine mode to 16 bits.
>   * tree-core.h (struct tree_type_common): Ditto.
>   (struct tree_decl_common): Reconciled the locate and extended
>   bit size of machine mode.
> ---
>  gcc/combine.cc |  4 ++--
>  gcc/cse.cc |  8 
>  gcc/genopinit.cc   |  3 ++-
>  gcc/ira-int.h  | 12 
>  gcc/ree.cc |  2 +-
>  gcc/rtl-ssa/accesses.h |  6 --
>  gcc/rtl.h  |  9 ++---
>  gcc/rtlanal.h  |  5 +++--
>  gcc/tree-core.h| 11 ---
>  9 files changed, 38 insertions(+), 22 deletions(-)
> 
> diff --git a/gcc/combine.cc b/gcc/combine.cc
> index 5aa0ec5c45a..bdf6f635c80 100644
> --- a/gcc/combine.cc
> +++ b/gcc/combine.cc
> @@ -200,7 +200,7 @@ struct reg_stat_type {
>  
>unsigned HOST_WIDE_INT last_set_nonzero_bits;
>char   last_set_sign_bit_copies;
> -  ENUM_BITFIELD(machine_mode)last_set_mode : 8;
> +  ENUM_BITFIELD(machine_mode)last_set_mode : RTX_MACHINE_MODE_BITSIZE;
>  
>/* Set nonzero if references to register n in expressions should not be
>   used.  last_set_invalid is set nonzero when this register is being
> @@ -235,7 +235,7 @@ struct reg_stat_type {
>   truncation if we know that value already contains a truncated
>   value.  */
>  
> -  ENUM_BITFIELD(machine_mode)truncated_to_mode : 8;
> +  ENUM_BITFIELD(machine_mode)truncated_to_mode : RTX_MACHINE_MODE_BITSIZE;
>  };
>  
>  
> diff --git a/gcc/cse.cc b/gcc/cse.cc
> index b10c9b0c94d..fe594c1bc3d 100644
> --- a/gcc/cse.cc
> +++ b/gcc/cse.cc
> @@ -250,8 +250,8 @@ struct qty_table_elem
>unsigned int first_reg, last_reg;
>/* The sizes of these fields should match the sizes of the
>   code and mode fields of struct rtx_def (see rtl.h).  */
> -  ENUM_BITFIELD(rtx_code) comparison_code : 16;
> -  ENUM_BITFIELD(machine_mode) mode : 8;
> +  ENUM_BITFIELD(rtx_code) comparison_code : RTX_CODE_BITSIZE;
> +  ENUM_BITFIELD(machine_mode) mode : RTX_MACHINE_MODE_BITSIZE;
>  };
>  
>  /* The table of all qtys, indexed by qty number.  */
> @@ -406,7 +406,7 @@ struct table_elt
>int regcost;
>/* The size of this field should match the size
>   of the mode field of struct rtx_def (see rtl.h).  */
> -  ENUM_BITFIELD(machine_mode) mode : 8;
> +  ENUM_BITFIELD(machine_mode) mode : 

RE: [PATCH] RISC-V: Fix fail of vmv-imm-rv64.c in rv32

2023-05-11 Thread Li, Pan2 via Gcc-patches
Committed to trunk.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Kito Cheng via Gcc-patches
Sent: Friday, May 12, 2023 2:32 PM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; pal...@dabbelt.com; jeffreya...@gmail.com; 
rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Fix fail of vmv-imm-rv64.c in rv32

ok

On Fri, May 12, 2023 at 11:11 AM  wrote:
>
> From: Juzhe-Zhong 
>
> After updating my local codebase to trunk, I realized there is one more
> failure on RV32.
> After this patch, all RVV failures are cleaned up.
> Thanks.
>
> FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c -O3 -ftree-vectorize (test for excess errors)
> Excess errors:
> cc1: error: ABI requires '-march=rv32'
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Add ABI
>
> ---
>  gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
> index 520321e1c73..e386166f95e 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-additional-options "-std=c99 -march=rv64gcv -fno-vect-cost-model --param=riscv-autovec-preference=scalable -fno-builtin" } */
> +/* { dg-additional-options "-std=c99 -march=rv64gcv -mabi=lp64d -fno-vect-cost-model --param=riscv-autovec-preference=scalable -fno-builtin" } */
>
>  #include "vmv-imm-template.h"
>
> --
> 2.36.1
>


Re: [PATCH] RISC-V: Fix RVV binary auto-vectorization test fails

2023-05-11 Thread Robin Dapp via Gcc-patches
> ok, thanks :)
This has likely been discussed at length before, but why do we need to
specify the additional -mabi with -march (instead of -march implying
a matching ABI)?


Re: [PATCH] RISC-V: Fix fail of vmv-imm-rv64.c in rv32

2023-05-11 Thread Kito Cheng via Gcc-patches
ok

On Fri, May 12, 2023 at 11:11 AM  wrote:
>
> From: Juzhe-Zhong 
>
> After updating my local codebase to trunk, I realized there is one more
> failure on RV32.
> After this patch, all RVV failures are cleaned up.
> Thanks.
>
> FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c -O3 -ftree-vectorize (test 
> for excess errors)
> Excess errors:
> cc1: error: ABI requires '-march=rv32'
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Add ABI
>
> ---
>  gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
> index 520321e1c73..e386166f95e 100644
> --- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-additional-options "-std=c99 -march=rv64gcv -fno-vect-cost-model --param=riscv-autovec-preference=scalable -fno-builtin" } */
> +/* { dg-additional-options "-std=c99 -march=rv64gcv -mabi=lp64d -fno-vect-cost-model --param=riscv-autovec-preference=scalable -fno-builtin" } */
>
>  #include "vmv-imm-template.h"
>
> --
> 2.36.1
>


Re: [x86_64 PATCH] PR middle-end/109766: Prevent cprop_hardreg bloating code with -Os.

2023-05-11 Thread Uros Bizjak via Gcc-patches
On Thu, May 11, 2023 at 4:21 PM Roger Sayle  wrote:
>
>
> PR 109766 is an interesting case of large code being generated on x86_64,
> caused by an interaction/conflict between register allocation and hardreg
> cprop, that's tricky to fix/resolve within the middle-end.
>
> The task/challenge is to push a DImode value in an SSE register on to
> the stack, when optimizing for size.  GCC's register allocator makes
> the optimal choice to move the SSE register to a GPR, and then use push.
> So after reload we have:
>
> (insn 46 3 4 2 (set (reg:DF 1 dx [101])
> (reg:DF 21 xmm1 [ D1 ])) "pr109766.c":15:74 151 {*movdf_internal}
>  (nil))
> (insn 28 27 29 2 (set (mem:DF (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A64])
> (reg:DF 1 dx [101])) "pr109766.c":16:5 142 {*pushdf}
>  (expr_list:REG_ARGS_SIZE (const_int 56 [0x38])
> (nil)))
>
> which corresponds to the short 6 byte sequence:
> 66 48 0f 7e ca  movq   %xmm1,%rdx  [5 bytes]
> 52  push   %rdx[1 byte]
>
>
> The problem is that several passes later, after pro_and_epilogue has
> determined that the function doesn't need a stack frame, the
> hard register cprop pass sees the above two instructions, including
> the initial register to register move, and decides to "simplify" them
> as:
>
> (insn 68 67 69 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A64])
> (reg:DI 21 xmm1 [101])) "pr109766.c":16:5 62 {*pushdi2_rex64}
>  (expr_list:REG_ARGS_SIZE (const_int 56 [0x38])
> (nil)))
>
> but as x86_64 doesn't directly support push from SSE registers, the
> above is split during split3 into:
>
> (insn 92 91 93 2 (set (reg/f:DI 7 sp)
> (plus:DI (reg/f:DI 7 sp)
> (const_int -8 [0xfff8]))) "pr109766.c":16:5 247
> {*leadi}
>  (expr_list:REG_ARGS_SIZE (const_int 56 [0x38])
> (nil)))
> (insn 93 92 94 2 (set (mem:DI (reg/f:DI 7 sp) [0  S8 A64])
> (reg:DI 21 xmm1 [101])) "pr109766.c":16:5 88 {*movdi_internal}
>  (nil))
>
> which corresponds to the bigger 10 byte sequence:
>
> 48 8d 64 24 f8  lea-0x8(%rsp),%rsp  [5 bytes]
> 66 0f d6 0c 24  movq   %xmm1,(%rsp) [5 bytes]
>
>
> Clearly the cprop_hardreg substitution is questionable with -Os, but how
> to prevent it is a challenge.  One (labor intensive) approach might be
> to have regcprop.cc query the target's rtx_costs before performing
> this type of substitution, which only works if the backend is
> sufficiently parameterized.  Unfortunately, i386 like many targets
> defines the rtx_cost of (set (dst) (src)) to be rtx_cost(dst) +
> rtx_cost(src), which misses the subtlety of pushing an SSE register
> to the stack.
>
> An alternate solution, which can be implemented entirely in the
> backend, is to prevent *pushdi2_rex64 being recognized (by
> cprop_hardreg) with an SSE hard register operand after reload
> when optimizing for size.

Removing a pattern (or alternative) after reload and making the
pattern (or alternative) depend on optimize_insn_for_{speed/size}_p is
fundamentally wrong. Perhaps you want to look at the
preferred_for_size/preferred_for_speed attributes, which were invented
just for this purpose. These two attributes weigh alternatives depending
on optimization choices. They don't disable alternatives in a "hard" way,
but affect their preference depending on which optimization is
active.

Uros.

>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures.  Ok for mainline?
>
>
> 2023-05-11  Roger Sayle  
>
> gcc/ChangeLog
> PR middle-end/109766
> * config/i386/i386.md (*pushdi_rex64): Disallow SSE registers
> after reload when optimizing for size.
> (*pushsi2_rex64): Likewise.
> (*pushsi2): Likewise.
>
> gcc/testsuite/ChangeLog
> PR middle-end/109766
> * gcc.target/i386/pr109766.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH] Provide -fcf-protection=branch,return.

2023-05-11 Thread Hongtao Liu via Gcc-patches
On Fri, May 12, 2023 at 1:50 PM Andrew Pinski  wrote:
>
> On Thu, May 11, 2023 at 10:45 PM liuhongt via Gcc-patches
>  wrote:
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > PR target/89701
> > * common.opt: Refactor -fcf-protection= to support combination
> > of param.
> > * lto-wrapper.c (merge_and_complain): Adjusted.
> > * opts.c (parse_cf_protection_options): New.
> > (common_handle_option): Decode argument for -fcf-protection=.
> > * opts.h (parse_cf_protection_options): Declare.
>
> I think this could be simplified if you use either EnumSet or
> EnumBitSet instead in common.opt for `-fcf-protection=`.
Thanks, I didn't know that. I'll try to refactor the patch to use EnumSet
or EnumBitSet.
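
For readers following along, the effect being asked for (accepting a
comma-separated list such as "branch,return" and combining the pieces into
one bit mask) can be sketched as below. This is neither the
parse_cf_protection_options function from the patch nor how the options
machinery handles EnumSet/EnumBitSet; the helper name, the bit values and
the treatment of "none" as a reset are all assumptions for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Illustrative bit values only; the real constants live in GCC's headers.
enum cf_bits : unsigned int {
  CF_NONE = 0,
  CF_BRANCH = 1u << 0,
  CF_RETURN = 1u << 1,
  CF_FULL = CF_BRANCH | CF_RETURN
};

// Hypothetical helper: parse "a,b,c" into a mask, or return nullopt
// when any item is not recognised.
inline std::optional<unsigned int> parse_cf_list(std::string_view arg) {
  unsigned int mask = CF_NONE;
  while (true) {
    // Take everything up to the next comma (or the end of the string).
    std::size_t comma = arg.find(',');
    std::string_view item = arg.substr(0, comma);

    if (item == "branch")
      mask |= CF_BRANCH;
    else if (item == "return")
      mask |= CF_RETURN;
    else if (item == "full")
      mask |= CF_FULL;
    else if (item == "none")
      mask = CF_NONE;  // assumption: "none" clears what came before
    else
      return std::nullopt;

    if (comma == std::string_view::npos)
      break;
    arg.remove_prefix(comma + 1);
  }
  return mask;
}
```

A declarative enum-set description in common.opt would replace this kind of
hand-written loop, which is the simplification suggested above.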
>
> Thanks,
> Andrew
>
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/89701
> > * c-c++-common/fcf-protection-8.c: New test.
> > * c-c++-common/fcf-protection-9.c: New test.
> > * c-c++-common/fcf-protection-10.c: New test.
> > * gcc.target/i386/pr89701-1.c: New test.
> > * gcc.target/i386/pr89701-2.c: New test.
> > * gcc.target/i386/pr89701-3.c: New test.
> > * gcc.target/i386/pr89701-4.c: New test.
> > ---
> >  gcc/common.opt| 24 ++
> >  gcc/lto-wrapper.cc| 21 +++--
> >  gcc/opts.cc   | 79 +++
> >  gcc/opts.h|  1 +
> >  .../c-c++-common/fcf-protection-10.c  |  3 +
> >  .../c-c++-common/fcf-protection-11.c  |  2 +
> >  .../c-c++-common/fcf-protection-12.c  |  2 +
> >  gcc/testsuite/c-c++-common/fcf-protection-8.c |  3 +
> >  gcc/testsuite/c-c++-common/fcf-protection-9.c |  3 +
> >  gcc/testsuite/gcc.target/i386/pr89701-1.c |  4 +
> >  gcc/testsuite/gcc.target/i386/pr89701-2.c |  4 +
> >  gcc/testsuite/gcc.target/i386/pr89701-3.c |  5 ++
> >  gcc/testsuite/gcc.target/i386/pr89701-4.c |  5 ++
> >  13 files changed, 130 insertions(+), 26 deletions(-)
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-10.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-11.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-12.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-8.c
> >  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-9.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-2.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-3.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-4.c
> >
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index a28ca13385a..ac12da52733 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -229,6 +229,10 @@ bool dump_base_name_prefixed = false
> >  Variable
> >  unsigned int flag_zero_call_used_regs
> >
> > +;; What the CF check should instrument
> > +Variable
> > +unsigned int flag_cf_protection = 0
> > +
> >  ###
> >  Driver
> >
> > @@ -1886,28 +1890,10 @@ fcf-protection
> >  Common RejectNegative Alias(fcf-protection=,full)
> >
> >  fcf-protection=
> > -Common Joined RejectNegative Enum(cf_protection_level) 
> > Var(flag_cf_protection) Init(CF_NONE)
> > +Common Joined
> >  -fcf-protection=[full|branch|return|none|check]Instrument 
> > functions with checks to verify jump/call/return control-flow transfer
> >  instructions have valid targets.
> >
> > -Enum
> > -Name(cf_protection_level) Type(enum cf_protection_level) 
> > UnknownError(unknown Control-Flow Protection Level %qs)
> > -
> > -EnumValue
> > -Enum(cf_protection_level) String(full) Value(CF_FULL)
> > -
> > -EnumValue
> > -Enum(cf_protection_level) String(branch) Value(CF_BRANCH)
> > -
> > -EnumValue
> > -Enum(cf_protection_level) String(return) Value(CF_RETURN)
> > -
> > -EnumValue
> > -Enum(cf_protection_level) String(check) Value(CF_CHECK)
> > -
> > -EnumValue
> > -Enum(cf_protection_level) String(none) Value(CF_NONE)
> > -
> >  finstrument-functions
> >  Common Var(flag_instrument_function_entry_exit,1)
> >  Instrument function entry and exit with profiling calls.
> > diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> > index 5186d040ce0..568c8af659d 100644
> > --- a/gcc/lto-wrapper.cc
> > +++ b/gcc/lto-wrapper.cc
> > @@ -359,26 +359,33 @@ merge_and_complain (vec 
> > &decoded_options,
> > case OPT_fcf_protection_:
> >   /* Default to link-time option, else append or check identical.  
> > */
> >   if (!cf_protection_option
> > - || cf_protection_option->value == CF_CHECK)
> > + || !memcmp (cf_protection_option->arg, "check", 5))
> > {
> > + const char* parg = decoded_options[existing_opt].arg;
> >   if (existing_opt == -1)
> > decoded_options.safe_push (*foptio

[PATCH v2 9/9] MIPS: Make mips16e2 generating ZEB/ZEH instead of ANDI under certain conditions

2023-05-11 Thread Jie Mei
This patch makes mips16e2 generate ZEB/ZEH instead of ANDI
under the -O0 option as well, matching the behavior at
-O1 through -O3, which shrinks the code size.
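
As a reminder of the operation involved, zero-extending a byte or a
halfword is the same as masking with 0xff or 0xffff, which is why an AND
with one of those constants can be emitted as ZEB or ZEH. A minimal sketch
of that equivalence follows; the function names are made up for
illustration.

```cpp
#include <cstdint>

// AND with 0xff: the form that would otherwise need an ANDI.
inline std::uint32_t and_byte_mask(std::uint32_t x) { return x & 0xffu; }

// Zero-extend the low byte: what ZEB computes on a register.
inline std::uint32_t zero_extend_byte(std::uint32_t x) {
  return static_cast<std::uint8_t>(x);
}

// AND with 0xffff.
inline std::uint32_t and_half_mask(std::uint32_t x) { return x & 0xffffu; }

// Zero-extend the low halfword: what ZEH computes on a register.
inline std::uint32_t zero_extend_half(std::uint32_t x) {
  return static_cast<std::uint16_t>(x);
}
```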

gcc/ChangeLog:
* config/mips/mips.md(*and3_mips16): Generates
ZEB/ZEH instructions.
---
 gcc/config/mips/mips.md | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 7eb65891820..85ed1735d83 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -3357,9 +3357,9 @@
(set_attr "mode" "")])
 
 (define_insn "*and3_mips16"
-  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d,d,d,d")
-   (and:GPR (match_operand:GPR 1 "nonimmediate_operand" 
"%W,W,W,d,0,d,0,0?")
-(match_operand:GPR 2 "and_operand" "Yb,Yh,Yw,Yw,d,Yx,Yz,K")))]
+  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d,d,d,d,d,d,d")
+   (and:GPR (match_operand:GPR 1 "nonimmediate_operand" 
"%0,0,W,W,W,d,0,d,0,0?")
+(match_operand:GPR 2 "and_operand" 
"Yb,Yh,Yb,Yh,Yw,Yw,d,Yx,Yz,K")))]
   "TARGET_MIPS16 && and_operands_ok (mode, operands[1], operands[2])"
 {
   int len;
@@ -3368,38 +3368,42 @@
   switch (which_alternative)
 {
 case 0:
+  return "zeb\t%0";
+case 1:
+  return "zeh\t%0";
+case 2:
   operands[1] = gen_lowpart (QImode, operands[1]);
   return "lbu\t%0,%1";
-case 1:
+case 3:
   operands[1] = gen_lowpart (HImode, operands[1]);
   return "lhu\t%0,%1";
-case 2:
+case 4:
   operands[1] = gen_lowpart (SImode, operands[1]);
   return "lwu\t%0,%1";
-case 3:
+case 5:
   return "#";
-case 4:
+case 6:
   return "and\t%0,%2";
-case 5:
+case 7:
   len = low_bitmask_len (mode, INTVAL (operands[2]));
   operands[2] = GEN_INT (len);
   return "ext\t%0,%1,0,%2";
-case 6:
+case 8:
   mips_bit_clear_info (mode, INTVAL (operands[2]), &pos, &len);
   operands[1] = GEN_INT (pos);
   operands[2] = GEN_INT (len);
   return "ins\t%0,$0,%1,%2";
-case 7:
+case 9:
   return "andi\t%0,%x2";
 default:
   gcc_unreachable ();
 }
 }
-  [(set_attr "move_type" 
"load,load,load,shift_shift,logical,ext_ins,ext_ins,andi")
+  [(set_attr "move_type" 
"andi,andi,load,load,load,shift_shift,logical,ext_ins,ext_ins,andi")
(set_attr "mode" "")
-   (set_attr "extended_mips16" "no,no,no,no,no,yes,yes,yes")
+   (set_attr "extended_mips16" "no,no,no,no,no,no,no,yes,yes,yes")
(set (attr "enabled")
-   (cond [(and (eq_attr "alternative" "7")
+   (cond [(and (eq_attr "alternative" "9")
(not (match_test "ISA_HAS_MIPS16E2")))
   (const_string "no")
   (and (eq_attr "alternative" "0,1")
-- 
2.40.1


[PATCH v2 4/9] MIPS: Add bitwise instructions for mips16e2

2023-05-11 Thread Jie Mei
The mips16e2 ASE provides shortened bitwise instructions,
for instance ANDI, ORI/XORI, EXT and INS.

This patch adds these instructions with corresponding tests.
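
To illustrate the kind of mask an INS-based AND can handle, here is a
minimal sketch of the idea behind a bit-clear test: the mask must be all
ones except for one contiguous run of zero bits, and the run's position and
length become the INS operands. This is a simplified 32-bit illustration
with made-up names, not the mips_bit_clear_p/mips_bit_clear_info code from
the diff below.

```cpp
#include <cstdint>

struct bit_run {
  int pos;  // index of the lowest cleared bit
  int len;  // number of consecutive cleared bits
};

// Returns true and fills *out when mask contains exactly one contiguous
// run of zero bits; returns false for all-ones or for several runs.
inline bool single_zero_run(std::uint32_t mask, bit_run* out) {
  std::uint32_t inv = ~mask;  // the zero run becomes a run of ones
  if (inv == 0)
    return false;

  int pos = 0;
  while ((inv & 1u) == 0) {  // skip the low bits that stay set in mask
    inv >>= 1;
    ++pos;
  }
  int len = 0;
  while ((inv & 1u) != 0) {  // measure the run
    inv >>= 1;
    ++len;
  }
  if (inv != 0)  // anything left means a second run
    return false;

  out->pos = pos;
  out->len = len;
  return true;
}

// Clearing len bits starting at pos, i.e. inserting zeros into that field.
inline std::uint32_t clear_field(std::uint32_t x, int pos, int len) {
  std::uint64_t ones = (std::uint64_t{1} << len) - 1;
  std::uint32_t field = static_cast<std::uint32_t>(ones << pos);
  return x & ~field;
}
```

For a mask that passes the test, x & mask and clear_field (x, pos, len) give
the same result, which is what allows the AND to be replaced.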

gcc/ChangeLog:

* config/mips/constraints.md(Yz): New constraints for mips16e2.
* config/mips/mips-protos.h(mips_bit_clear_p): Declared new function.
(mips_bit_clear_info): Same as above.
* config/mips/mips.cc(mips_bit_clear_info): New function for
generating instructions.
(mips_bit_clear_p): Same as above.
* config/mips/mips.h(ISA_HAS_EXT_INS): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md(extended_mips16): Generates EXT and INS 
instructions.
(*and3): Generates INS instruction.
(*and3_mips16): Generates EXT, INS and ANDI instructions.
(ior3): Add logic for ORI instruction.
(*ior3_mips16_asmacro): Generates ORI instruction.
(*ior3_mips16): Add logic for XORI instruction.
(*xor3_mips16): Generates XORI instruction.
(*extzv): Add logic for EXT instruction.
(*insv): Add logic for INS instruction.
* config/mips/predicates.md(bit_clear_operand): New predicate for
generating bitwise instructions.
(and_reg_operand): Add logic for generating bitwise instructions.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.
---
 gcc/config/mips/constraints.md   |   4 +
 gcc/config/mips/mips-protos.h|   4 +
 gcc/config/mips/mips.cc  |  67 ++-
 gcc/config/mips/mips.h   |   3 +-
 gcc/config/mips/mips.md  |  91 
 gcc/config/mips/predicates.md|  13 ++-
 gcc/testsuite/gcc.target/mips/mips16e2.c | 102 +++
 7 files changed, 263 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2.c

diff --git a/gcc/config/mips/constraints.md b/gcc/config/mips/constraints.md
index 49d1a43c613..22d4d84f074 100644
--- a/gcc/config/mips/constraints.md
+++ b/gcc/config/mips/constraints.md
@@ -264,6 +264,10 @@
   (and (match_code "const_vector")
(match_test "op == CONST0_RTX (mode)")))
 
+(define_constraint "Yz"
+  "@internal"
+  (match_operand 0 "bit_clear_operand"))
+
 (define_constraint "YA"
   "@internal
An unsigned 6-bit constant."
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 20483469105..2791b9f220a 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -388,4 +388,8 @@ extern void mips_register_frame_header_opt (void);
 extern void mips_expand_vec_cond_expr (machine_mode, machine_mode, rtx *);
 extern void mips_expand_vec_cmp_expr (rtx *);
 
+extern bool mips_bit_clear_p (enum machine_mode, unsigned HOST_WIDE_INT);
+extern void mips_bit_clear_info (enum machine_mode, unsigned HOST_WIDE_INT,
+ int *, int *);
+
 #endif /* ! GCC_MIPS_PROTOS_H */
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index be470bbb50d..d86911d10c2 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -3895,6 +3895,10 @@ mips16_constant_cost (int code, HOST_WIDE_INT x)
return 0;
   return -1;
 
+case ZERO_EXTRACT:
+  /* The bit position and size are immediate operands.  */
+  return ISA_HAS_EXT_INS ? COSTS_N_INSNS (1) : -1;
+
 default:
   return -1;
 }
@@ -22753,7 +22757,68 @@ mips_asm_file_end (void)
   if (NEED_INDICATE_EXEC_STACK)
 file_end_indicate_exec_stack ();
 }
-
+
+void
+mips_bit_clear_info (enum machine_mode mode, unsigned HOST_WIDE_INT m,
+ int *start_pos, int *size)
+{
+  unsigned int shift = 0;
+  unsigned int change_count = 0;
+  unsigned int prev_val = 1;
+  unsigned int curr_val = 0;
+  unsigned int end_pos = GET_MODE_SIZE (mode) * BITS_PER_UNIT;
+
+  for (shift = 0 ; shift < (GET_MODE_SIZE (mode) * BITS_PER_UNIT) ; shift++)
+{
+  curr_val = (unsigned int)((m & (unsigned int)(1 << shift)) >> shift);
+  if (curr_val != prev_val)
+{
+  change_count++;
+  switch (change_count)
+{
+  case 1:
+*start_pos = shift;
+break;
+  case 2:
+end_pos = shift;
+break;
+  default:
+gcc_unreachable ();
+}
+}
+  prev_val = curr_val;
+   }
+  *size = (end_pos - *start_pos);
+}
+
+bool
+mips_bit_clear_p (enum machine_mode mode, unsigned HOST_WIDE_INT m)
+{
+  unsigned int shift = 0;
+  unsigned int change_count = 0;
+  unsigned int prev_val = 1;
+  unsigned int curr_val = 0;
+
+  if (mode != SImode && mode != VOIDmode)
+return false;
+
+  if (!ISA_HAS_EXT_INS)
+return false;
+
+  for (shift = 0 ; shift < (UNITS_PER_WORD * BITS_PER_UNIT) ; shift++)
+{
+  curr_val = (unsigned int)((m & (unsigned int)(1 << shift)) >> shift);
+  if (curr_val != prev_val

[PATCH v2 6/9] MIPS: Add load/store word left/right instructions for mips16e2

2023-05-11 Thread Jie Mei
This patch adds LWL/LWR, SWL/SWR instructions with their
corresponding tests.

gcc/ChangeLog:

* config/mips/mips.cc(mips_expand_ins_as_unaligned_store):
Add logics for generating instruction.
* config/mips/mips.h(ISA_HAS_LWL_LWR): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md(mov_l): Generates instructions.
(mov_r): Same as above.
(mov_l): Adjusted for the conditions above.
(mov_r): Same as above.
(mov_l_mips16e2): Add machine description for `define_insn 
mov_l_mips16e2`.
(mov_r_mips16e2): Add machine description for `define_insn 
mov_r_mips16e2`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc  |  15 ++-
 gcc/config/mips/mips.h   |   2 +-
 gcc/config/mips/mips.md  |  43 +++--
 gcc/testsuite/gcc.target/mips/mips16e2.c | 116 +++
 4 files changed, 168 insertions(+), 8 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 0792f89cab4..275efc5a390 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -8603,12 +8603,25 @@ mips_expand_ins_as_unaligned_store (rtx dest, rtx src, 
HOST_WIDE_INT width,
 return false;
 
   mode = int_mode_for_size (width, 0).require ();
-  src = gen_lowpart (mode, src);
+  if (TARGET_MIPS16
+  && src == const0_rtx)
+src = force_reg (mode, src);
+  else
+src = gen_lowpart (mode, src);
+
   if (mode == DImode)
 {
+  if (TARGET_MIPS16)
+   gcc_unreachable ();
   emit_insn (gen_mov_sdl (dest, src, left));
   emit_insn (gen_mov_sdr (copy_rtx (dest), copy_rtx (src), right));
 }
+  else if (TARGET_MIPS16)
+{
+  emit_insn (gen_mov_swl_mips16e2 (dest, src, left));
+  emit_insn (gen_mov_swr_mips16e2 (copy_rtx (dest), copy_rtx (src),
+  right));
+}
   else
 {
   emit_insn (gen_mov_swl (dest, src, left));
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index cab5ff422a8..a5c121088b7 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1180,7 +1180,7 @@ struct mips_cpu_info {
  && (MODE) == V2SFmode))   \
 && !TARGET_MIPS16)
 
-#define ISA_HAS_LWL_LWR(mips_isa_rev <= 5 && !TARGET_MIPS16)
+#define ISA_HAS_LWL_LWR(mips_isa_rev <= 5 && (!TARGET_MIPS16 
|| ISA_HAS_MIPS16E2))
 
 #define ISA_HAS_IEEE_754_LEGACY(mips_isa_rev <= 5)
 
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 73c9acd484f..5ef8d99d99c 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -4488,10 +4488,12 @@
(unspec:GPR [(match_operand:BLK 1 "memory_operand" "m")
 (match_operand:QI 2 "memory_operand" "ZC")]
UNSPEC_LOAD_LEFT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[1])"
+  "(!TARGET_MIPS16 || ISA_HAS_MIPS16E2)
+&& mips_mem_fits_mode_p (mode, operands[1])"
   "l\t%0,%2"
   [(set_attr "move_type" "load")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
 
 (define_insn "mov_r"
   [(set (match_operand:GPR 0 "register_operand" "=d")
@@ -4499,17 +4501,20 @@
 (match_operand:QI 2 "memory_operand" "ZC")
 (match_operand:GPR 3 "register_operand" "0")]
UNSPEC_LOAD_RIGHT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[1])"
+  "(!TARGET_MIPS16 || ISA_HAS_MIPS16E2)
+&& mips_mem_fits_mode_p (mode, operands[1])"
   "r\t%0,%2"
   [(set_attr "move_type" "load")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
 
 (define_insn "mov_l"
   [(set (match_operand:BLK 0 "memory_operand" "=m")
(unspec:BLK [(match_operand:GPR 1 "reg_or_0_operand" "dJ")
 (match_operand:QI 2 "memory_operand" "ZC")]
UNSPEC_STORE_LEFT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[0])"
+  "!TARGET_MIPS16
+   && mips_mem_fits_mode_p (mode, operands[0])"
   "l\t%z1,%2"
   [(set_attr "move_type" "store")
(set_attr "mode" "")])
@@ -4520,11 +4525,37 @@
 (match_operand:QI 2 "memory_operand" "ZC")
 (match_dup 0)]
UNSPEC_STORE_RIGHT))]
-  "!TARGET_MIPS16 && mips_mem_fits_mode_p (mode, operands[0])"
+  "!TARGET_MIPS16
+   && mips_mem_fits_mode_p (mode, operands[0])"
   "r\t%z1,%2"
   [(set_attr "move_type" "store")
(set_attr "mode" "")])
 
+(define_insn "mov_l_mips16e2"
+  [(set (match_operand:BLK 0 "memory_operand" "=m")
+(unspec:BLK [(match_operand:GPR 1 "register_operand" "d")
+ (match_operand:QI 2 "memory_operand" "ZC")]
+UNSPEC_STORE_LEFT))]
+  "TARGET_MIPS16 && ISA_HAS_MIPS16E2
+   && mips_mem_fits_mode_p (mode, operands[0])"
+  "l\t%1,%2"
+  [(set_attr "

[PATCH v2 5/9] MIPS: Add LUI instruction for mips16e2

2023-05-11 Thread Jie Mei
This patch adds the LUI instruction from mips16e2
with a corresponding test.
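
For context, the arithmetic behind the sequences touched here can be
sketched as follows: LUI places a 16-bit immediate in the upper half of a
register, which plain MIPS16 has to emulate with LI followed by SLL, and
the %hi/%lo pair splits an address so that the sign-extended low part adds
back correctly. The helper names are invented for this sketch and a 32-bit
address is assumed.

```cpp
#include <cstdint>

// Upper part of a %hi/%lo split, rounded so that adding the
// sign-extended low part reproduces the address.
inline std::uint32_t hi16(std::uint32_t addr) {
  return ((addr + 0x8000u) >> 16) & 0xffffu;
}

// Lower part, interpreted as a signed 16-bit immediate.
inline std::int32_t lo16(std::uint32_t addr) {
  std::int32_t v = static_cast<std::int32_t>(addr & 0xffffu);
  return v >= 0x8000 ? v - 0x10000 : v;
}

// One instruction: load the immediate into the upper half.
inline std::uint32_t lui(std::uint32_t imm16) { return imm16 << 16; }

// Two instructions: load the immediate, then shift it left by 16.
inline std::uint32_t li_then_sll(std::uint32_t imm16) {
  std::uint32_t reg = imm16;  // li
  reg <<= 16;                 // sll
  return reg;
}

// Rebuild the address from its two halves.
inline std::uint32_t rebuild(std::uint32_t addr) {
  return lui(hi16(addr)) + static_cast<std::uint32_t>(lo16(addr));
}
```

Both forms produce the same register value; the saving is one instruction
per use.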

gcc/ChangeLog:

* config/mips/mips.cc(mips_symbol_insns_1): Generates LUI instruction.
(mips_const_insns): Same as above.
(mips_output_move): Same as above.
(mips_output_function_prologue): Same as above.
* config/mips/mips.md: Same as above

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2.c: Add new tests for mips16e2.
---
 gcc/config/mips/mips.cc  | 44 ++--
 gcc/config/mips/mips.md  |  2 +-
 gcc/testsuite/gcc.target/mips/mips16e2.c | 22 
 3 files changed, 56 insertions(+), 12 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index d86911d10c2..0792f89cab4 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2295,7 +2295,9 @@ mips_symbol_insns_1 (enum mips_symbol_type type, 
machine_mode mode)
 The final address is then $at + %lo(symbol).  With 32-bit
 symbols we just need a preparatory LUI for normal mode and
 a preparatory LI and SLL for MIPS16.  */
-  return ABI_HAS_64BIT_SYMBOLS ? 6 : TARGET_MIPS16 ? 3 : 2;
+  return ABI_HAS_64BIT_SYMBOLS 
+ ? 6 
+ : (TARGET_MIPS16 && !ISA_HAS_MIPS16E2) ? 3 : 2;
 
 case SYMBOL_GP_RELATIVE:
   /* Treat GP-relative accesses as taking a single instruction on
@@ -2867,7 +2869,7 @@ mips_const_insns (rtx x)
 
   /* This is simply an LUI for normal mode.  It is an extended
 LI followed by an extended SLL for MIPS16.  */
-  return TARGET_MIPS16 ? 4 : 1;
+  return TARGET_MIPS16 ? (ISA_HAS_MIPS16E2 ? 2 : 4) : 1;
 
 case CONST_INT:
   if (TARGET_MIPS16)
@@ -2879,7 +2881,10 @@ mips_const_insns (rtx x)
: SMALL_OPERAND_UNSIGNED (INTVAL (x)) ? 2
: IN_RANGE (-INTVAL (x), 0, 255) ? 2
: SMALL_OPERAND_UNSIGNED (-INTVAL (x)) ? 3
-   : 0);
+   : ISA_HAS_MIPS16E2
+ ? (trunc_int_for_mode (INTVAL (x), SImode) == INTVAL (x)
+? 4 : 8)
+ : 0);
 
   return mips_build_integer (codes, INTVAL (x));
 
@@ -5252,6 +5257,11 @@ mips_output_move (rtx dest, rtx src)
  if (!TARGET_MIPS16)
return "li\t%0,%1\t\t\t# %X1";
 
+ if (ISA_HAS_MIPS16E2
+ && LUI_INT (src)
+ && !SMALL_OPERAND_UNSIGNED (INTVAL (src)))
+   return "lui\t%0,%%hi(%1)\t\t\t# %X1";
+
  if (SMALL_OPERAND_UNSIGNED (INTVAL (src)))
return "li\t%0,%1";
 
@@ -5260,7 +5270,7 @@ mips_output_move (rtx dest, rtx src)
}
 
   if (src_code == HIGH)
-   return TARGET_MIPS16 ? "#" : "lui\t%0,%h1";
+   return (TARGET_MIPS16 && !ISA_HAS_MIPS16E2) ? "#" : "lui\t%0,%h1";
 
   if (CONST_GP_P (src))
return "move\t%0,%1";
@@ -11983,13 +11993,25 @@ mips_output_function_prologue (FILE *file)
 {
   if (TARGET_MIPS16)
{
- /* This is a fixed-form sequence.  The position of the
-first two instructions is important because of the
-way _gp_disp is defined.  */
- output_asm_insn ("li\t$2,%%hi(_gp_disp)", 0);
- output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
- output_asm_insn ("sll\t$2,16", 0);
- output_asm_insn ("addu\t$2,$3", 0);
+ if (ISA_HAS_MIPS16E2)
+   {
+ /* This is a fixed-form sequence.  The position of the
+first two instructions is important because of the
+way _gp_disp is defined.  */
+ output_asm_insn ("lui\t$2,%%hi(_gp_disp)", 0);
+ output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
+ output_asm_insn ("addu\t$2,$3", 0);
+   }
+ else
+   {
+ /* This is a fixed-form sequence.  The position of the
+first two instructions is important because of the
+way _gp_disp is defined.  */
+ output_asm_insn ("li\t$2,%%hi(_gp_disp)", 0);
+ output_asm_insn ("addiu\t$3,$pc,%%lo(_gp_disp)", 0);
+ output_asm_insn ("sll\t$2,16", 0);
+ output_asm_insn ("addu\t$2,$3", 0);
+   }
}
   else
{
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 9f652310aa2..73c9acd484f 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -4634,7 +4634,7 @@
 (define_split
   [(set (match_operand:P 0 "d_operand")
(high:P (match_operand:P 1 "symbolic_operand_with_high")))]
-  "TARGET_MIPS16 && reload_completed"
+  "TARGET_MIPS16 && reload_completed && !ISA_HAS_MIPS16E2"
   [(set (match_dup 0) (unspec:P [(match_dup 1)] UNSPEC_UNSHIFTED_HIGH))
(set (match_dup 0) (ashift:P (match_dup 0) (const_int 16)))])
 
diff --git a/gcc/testsuite/gcc.target/mips/mips16e2.c b/gcc/testsuite/gcc.target/mips/mips16e2.c
index ce8b4f1819b..780891b4056 100644
--- a/gcc/testsuite/gcc.target/mips/mips16e2.

[PATCH v2 3/9] MIPS: Add instruction about global pointer register for mips16e2

2023-05-11 Thread Jie Mei
The mips16e2 ASE uses eight general-purpose registers
from mips32 plus some special-purpose registers.
The GPRs are s0-1, v0-1 and a0-3; the special
registers are t8, gp, sp and ra.

As mentioned above, mips16e2 uses the special register gp,
the global pointer register. Some instructions in the ASE
use it, for instance ADDIU, LB/LBU, etc.

This patch adds these instructions with corresponding tests.
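
The base-register rule this patch adds can be modelled outside GCC.
The following is only a sketch: `struct target` and the helper names are
hypothetical stand-ins for the GCC target macros, and the register
numbers follow the list above (v0-1 = $2-$3, a0-3 = $4-$7,
s0-1 = $16-$17, gp = $28, sp = $29).

```c
#include <stdbool.h>

enum { GP_REGNUM = 28, SP_REGNUM = 29 };

/* Hypothetical stand-in for the target flags consulted by the patch.  */
struct target
{
  bool mips16;        /* models TARGET_MIPS16 */
  bool mips16e2;      /* models TARGET_MIPS16E2 */
  bool is_64bit;      /* models TARGET_64BIT */
  int units_per_word; /* models UNITS_PER_WORD */
};

/* Models MIPS16_GP_LOADS: gp-relative accesses need 32-bit MIPS16e2.  */
static bool
gp_loads_p (const struct target *t)
{
  return t->mips16 && t->mips16e2 && !t->is_64bit;
}

/* The eight MIPS16 general-purpose registers: $2-$7, $16 and $17.  */
static bool
m16_reg_p (int regno)
{
  return (regno >= 2 && regno <= 7) || regno == 16 || regno == 17;
}

/* Mirrors the decision order of mips_regno_mode_ok_for_base_p as
   changed by this patch: $sp first, then the new $gp case, then the
   ordinary register classes.  */
static bool
base_reg_ok_p (const struct target *t, int regno, int mode_size)
{
  if (t->mips16 && regno == SP_REGNUM)
    return mode_size == 4 || mode_size == 8;

  if (gp_loads_p (t) && regno == GP_REGNUM)
    return t->units_per_word > 4 ? mode_size <= 4 : true;

  return t->mips16 ? m16_reg_p (regno) : (regno >= 0 && regno < 32);
}
```

With these assumptions, $28 is accepted as a base only when both
-mips16 and -mmips16e2 are modelled as enabled on a 32-bit target;
plain MIPS16 still rejects it.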

gcc/ChangeLog:

* config/mips/mips.cc (mips_regno_mode_ok_for_base_p): Generate
instructions that use the global pointer register.
(mips16_unextended_reference_p): Same as above.
(mips_pic_base_register): Same as above.
(mips_init_relocs): Same as above.
* config/mips/mips.h (MIPS16_GP_LOADS): Define a new macro.
(GLOBAL_POINTER_REGNUM): Move to machine description `mips.md`.
* config/mips/mips.md (GLOBAL_POINTER_REGNUM): Move here from mips.h.
(*lowsi_mips16_gp): New `define_insn *low_mips16`.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-gp.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc |  10 +-
 gcc/config/mips/mips.h  |   6 +-
 gcc/config/mips/mips.md |  11 +++
 gcc/testsuite/gcc.target/mips/mips16e2-gp.c | 101 
 4 files changed, 121 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-gp.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 585a3682c7b..be470bbb50d 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2474,6 +2474,9 @@ mips_regno_mode_ok_for_base_p (int regno, machine_mode mode,
   if (TARGET_MIPS16 && regno == STACK_POINTER_REGNUM)
 return GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8;
 
+  if (MIPS16_GP_LOADS && regno == GLOBAL_POINTER_REGNUM)
+return (UNITS_PER_WORD > 4 ? GET_MODE_SIZE (mode) <= 4 : true);
+
   return TARGET_MIPS16 ? M16_REG_P (regno) : GP_REG_P (regno);
 }
 
@@ -2689,7 +2692,8 @@ static bool
 mips16_unextended_reference_p (machine_mode mode, rtx base,
   unsigned HOST_WIDE_INT offset)
 {
-  if (mode != BLKmode && offset % GET_MODE_SIZE (mode) == 0)
+  if (mode != BLKmode && offset % GET_MODE_SIZE (mode) == 0
+  && REGNO (base) != GLOBAL_POINTER_REGNUM)
 {
   if (GET_MODE_SIZE (mode) == 4 && base == stack_pointer_rtx)
return offset < 256U * GET_MODE_SIZE (mode);
@@ -3249,7 +3253,7 @@ mips16_gp_pseudo_reg (void)
 rtx
 mips_pic_base_register (rtx temp)
 {
-  if (!TARGET_MIPS16)
+  if (MIPS16_GP_LOADS || !TARGET_MIPS16)
 return pic_offset_table_rtx;
 
   if (currently_expanding_to_rtl)
@@ -8756,7 +8760,7 @@ mips_init_relocs (void)
}
 }
 
-  if (TARGET_MIPS16)
+  if (!MIPS16_GP_LOADS && TARGET_MIPS16)
 {
   /* The high part is provided by a pseudo copy of $gp.  */
   mips_split_p[SYMBOL_GP_RELATIVE] = true;
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index c396e5ea2f3..8a6e43407c5 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1375,6 +1375,8 @@ struct mips_cpu_info {
 /* ISA includes the pop instruction.  */
 #define ISA_HAS_POP		(TARGET_OCTEON && !TARGET_MIPS16)
 
+#define MIPS16_GP_LOADS		(ISA_HAS_MIPS16E2 && !TARGET_64BIT)
+
 /* The CACHE instruction is available in non-MIPS16 code.  */
 #define TARGET_CACHE_BUILTIN (mips_isa >= MIPS_ISA_MIPS3)
 
@@ -2067,10 +2069,6 @@ FP_ASM_SPEC "\
function address than to call an address kept in a register.  */
 #define NO_FUNCTION_CSE 1
 
-/* The ABI-defined global pointer.  Sometimes we use a different
-   register in leaf functions: see PIC_OFFSET_TABLE_REGNUM.  */
-#define GLOBAL_POINTER_REGNUM (GP_REG_FIRST + 28)
-
 /* We normally use $28 as the global pointer.  However, when generating
n32/64 PIC, it is better for leaf functions to use a call-clobbered
register instead.  They can then avoid saving and restoring $28
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 48d5f419ce0..9de5013aad1 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -167,6 +167,7 @@
    (GET_FCSR_REGNUM		2)
    (SET_FCSR_REGNUM		4)
(PIC_FUNCTION_ADDR_REGNUM   25)
+   (GLOBAL_POINTER_REGNUM  28)
(RETURN_ADDR_REGNUM 31)
(CPRESTORE_SLOT_REGNUM  76)
(GOT_VERSION_REGNUM 79)
@@ -4678,6 +4679,16 @@
   [(set_attr "alu_type" "add")
(set_attr "mode" "")])
 
+(define_insn "*lowsi_mips16_gp"
+  [(set (match_operand:SI 0 "register_operand" "=d")
+(lo_sum:SI (reg:SI GLOBAL_POINTER_REGNUM)
+  (match_operand 1 "immediate_operand" "")))]
+  "MIPS16_GP_LOADS"
+  "addiu\t%0,$28,%R1"
+  [(set_attr "alu_type" "add")
+   (set_attr "mode" "SI")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*low_mips16"
   [(set (match_operand:P 0 "register_operand" "=d")
(lo_sum:P (match_operand:P 1 "register_operand" "0")
diff

[PATCH v2 2/9] MIPS: Add MOVx instructions support for mips16e2

2023-05-11 Thread Jie Mei
This patch adds the MOVx instructions from mips16e2
(movn, movz, movtn, movtz) with corresponding tests.
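
For readers unfamiliar with these instructions, the following is a
minimal C model of how they are understood here. It is an illustration
only: the function names are hypothetical, and the reading that
movtn/movtz test the implicit T register ($24, t8) rests on the "t"
constraint and the two-operand output templates in the patch below.

```c
#include <stdint.h>

/* Each helper returns the new value of rd; rd keeps its old value
   when the condition does not hold.  */

/* movn rd, rs, rt: move rs to rd if rt is non-zero.  */
static uint32_t
model_movn (uint32_t rd, uint32_t rs, uint32_t rt)
{
  return rt != 0 ? rs : rd;
}

/* movz rd, rs, rt: move rs to rd if rt is zero.  */
static uint32_t
model_movz (uint32_t rd, uint32_t rs, uint32_t rt)
{
  return rt == 0 ? rs : rd;
}

/* movtn rd, rs: move rs to rd if the T register is non-zero.
   T is passed explicitly here; in the instruction it is implicit.  */
static uint32_t
model_movtn (uint32_t rd, uint32_t rs, uint32_t t_reg)
{
  return t_reg != 0 ? rs : rd;
}

/* movtz rd, rs: move rs to rd if the T register is zero.  */
static uint32_t
model_movtz (uint32_t rd, uint32_t rs, uint32_t t_reg)
{
  return t_reg == 0 ? rs : rd;
}
```

This is why the new patterns have four alternatives: two take the
condition from an ordinary register and two take it from T.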

gcc/ChangeLog:

* config/mips/mips.h (ISA_HAS_CONDMOVE): Add condition for
ISA_HAS_MIPS16E2.
* config/mips/mips.md (*mov_on_): Add logic for MOVx insts.
(*mov_on__mips16e2): Generate MOVx instruction.
(*mov_on__ne): Add logic for MOVx insts.
(*mov_on__ne_mips16e2): Generate MOVx instruction.
* config/mips/predicates.md (reg_or_0_operand_mips16e2): New predicate
for MOVx insts.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cmov.c: Added tests for MOVx instructions.
---
 gcc/config/mips/mips.h|  1 +
 gcc/config/mips/mips.md   | 38 ++-
 gcc/config/mips/predicates.md |  6 ++
 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c | 68 +++
 4 files changed, 111 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 8db92c6468f..c396e5ea2f3 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1081,6 +1081,7 @@ struct mips_cpu_info {
ST Loongson 2E/2F.  */
 #define ISA_HAS_CONDMOVE	(ISA_HAS_FP_CONDMOVE	\
 || TARGET_MIPS5900 \
+|| ISA_HAS_MIPS16E2\
 || TARGET_LOONGSON_2EF)
 
 /* ISA has LDC1 and SDC1.  */
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index ac1d77afc7d..48d5f419ce0 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -7341,26 +7341,60 @@
 (const_int 0)])
 (match_operand:GPR 2 "reg_or_0_operand" "dJ,0")
 (match_operand:GPR 3 "reg_or_0_operand" "0,dJ")))]
-  "ISA_HAS_CONDMOVE"
+  "!TARGET_MIPS16 && ISA_HAS_CONDMOVE"
   "@
 mov%T4\t%0,%z2,%1
 mov%t4\t%0,%z3,%1"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
 
+(define_insn "*mov_on__mips16e2"
+  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
+(if_then_else:GPR
+ (match_operator 4 "equality_operator"
+[(match_operand:MOVECC 1 "register_operand" 
",,t,t")
+ (const_int 0)])
+ (match_operand:GPR 2 "reg_or_0_operand_mips16e2" "dJ,0,dJ,0")
+ (match_operand:GPR 3 "reg_or_0_operand_mips16e2" "0,dJ,0,dJ")))]
+  "ISA_HAS_MIPS16E2 && ISA_HAS_CONDMOVE"
+  "@
+mov%T4\t%0,%z2,%1
+mov%t4\t%0,%z3,%1
+movt%T4\t%0,%z2
+movt%t4\t%0,%z3"
+  [(set_attr "type" "condmove")
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*mov_on__ne"
   [(set (match_operand:GPR 0 "register_operand" "=d,d")
(if_then_else:GPR
 (match_operand:GPR2 1 "register_operand" ",")
 (match_operand:GPR 2 "reg_or_0_operand" "dJ,0")
 (match_operand:GPR 3 "reg_or_0_operand" "0,dJ")))]
-  "ISA_HAS_CONDMOVE"
+  "!TARGET_MIPS16 && ISA_HAS_CONDMOVE"
   "@
 movn\t%0,%z2,%1
 movz\t%0,%z3,%1"
   [(set_attr "type" "condmove")
(set_attr "mode" "")])
 
+(define_insn "*mov_on__ne_mips16e2"
+  [(set (match_operand:GPR 0 "register_operand" "=d,d,d,d")
+   (if_then_else:GPR
+(match_operand:GPR2 1 "register_operand" ",,t,t")
+(match_operand:GPR 2 "reg_or_0_operand_mips16e2" "dJ,0,dJ,0")
+(match_operand:GPR 3 "reg_or_0_operand_mips16e2" "0,dJ,0,dJ")))]
+ "ISA_HAS_MIPS16E2 && ISA_HAS_CONDMOVE"
+  "@
+movn\t%0,%z2,%1
+movz\t%0,%z3,%1
+movtn\t%0,%z2
+movtz\t%0,%z3"
+  [(set_attr "type" "condmove")
+   (set_attr "mode" "")
+   (set_attr "extended_mips16" "yes")])
+
 (define_insn "*mov_on_"
   [(set (match_operand:SCALARF 0 "register_operand" "=f,f")
(if_then_else:SCALARF
diff --git a/gcc/config/mips/predicates.md b/gcc/config/mips/predicates.md
index 87460a64652..e2cd5a8c65f 100644
--- a/gcc/config/mips/predicates.md
+++ b/gcc/config/mips/predicates.md
@@ -114,6 +114,12 @@
(not (match_test "TARGET_MIPS16")))
(match_operand 0 "register_operand")))
 
+(define_predicate "reg_or_0_operand_mips16e2"
+  (ior (and (match_operand 0 "const_0_operand")
+(ior (not (match_test "TARGET_MIPS16"))
+ (match_test "ISA_HAS_MIPS16E2")))
+   (match_operand 0 "register_operand")))
+
 (define_predicate "const_1_operand"
   (and (match_code "const_int,const_double,const_vector")
(match_test "op == CONST1_RTX (GET_MODE (op))")))
diff --git a/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c b/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
new file mode 100644
index 000..6e9dd82ebf3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
@@ -0,0 +1,68 @@
+/* { dg-options "-mno-abicalls -mgpopt -G8 -mabi=32 -mips16 -mmips16e2" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+/* Test MOVN.  */
+
+/* { dg-final { scan-assembler-times "test01:.*\tmovn\t.*test01\n"

[PATCH v2 0/9] MIPS: Add MIPS16e2 ASE instructions.

2023-05-11 Thread Jie Mei
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.

This series of patches adds all instructions of the MIPS16e2 ASE.

Jie Mei (9):
  MIPS: Add basic support for mips16e2
  MIPS: Add MOVx instructions support for mips16e2
  MIPS: Add instruction about global pointer register for mips16e2
  MIPS: Add bitwise instructions for mips16e2
  MIPS: Add LUI instruction for mips16e2
  MIPS: Add load/store word left/right instructions for mips16e2
  MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2
  MIPS: Add CACHE instruction for mips16e2
  MIPS: Make mips16e2 generating ZEB/ZEH instead of ANDI under certain
conditions

 gcc/config/mips/constraints.md|   4 +
 gcc/config/mips/mips-protos.h |   4 +
 gcc/config/mips/mips.cc   | 164 ++--
 gcc/config/mips/mips.h|  32 ++-
 gcc/config/mips/mips.md   | 200 ---
 gcc/config/mips/mips.opt  |   4 +
 gcc/config/mips/predicates.md |  21 +-
 gcc/doc/invoke.texi   |   7 +
 gcc/testsuite/gcc.target/mips/mips.exp|  10 +
 .../gcc.target/mips/mips16e2-cache.c  |  34 +++
 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c |  68 +
 gcc/testsuite/gcc.target/mips/mips16e2-gp.c   | 101 
 gcc/testsuite/gcc.target/mips/mips16e2.c  | 240 ++
 13 files changed, 825 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cache.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cmov.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-gp.c
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2.c

-- 
2.40.1


[PATCH v2 8/9] MIPS: Add CACHE instruction for mips16e2

2023-05-11 Thread Jie Mei
This patch adds the CACHE instruction from mips16e2
with corresponding tests.
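
The availability check that lets the cache built-in through in
MIPS16e2 mode can be sketched in isolation. The flag values below are
the ones from the patch; the function name `builtin_rejected_p` and its
boolean parameters are hypothetical stand-ins for the test in
mips_expand_builtin and for TARGET_MIPS16 / TARGET_MIPS16E2.

```c
#include <stdbool.h>

/* Availability bits, matching the values used in the patch.  */
enum
{
  BUILTIN_AVAIL_NON_MIPS16 = 1,
  BUILTIN_AVAIL_MIPS16 = 2,
  BUILTIN_AVAIL_MIPS16E2 = 4
};

/* Returns true when expanding the built-in should report
   "not supported for MIPS16".  AVAIL is the bitmask returned by the
   built-in's availability predicate.  */
static bool
builtin_rejected_p (bool mips16, bool mips16e2, unsigned int avail)
{
  return mips16
	 && !(avail & BUILTIN_AVAIL_MIPS16)
	 && (!mips16e2 || !(avail & BUILTIN_AVAIL_MIPS16E2));
}
```

A built-in declared with AVAIL_MIPS16E2_OR_NON_MIPS16 carries the mask
1 | 4, so it is rejected in plain MIPS16 code but accepted once
MIPS16e2 is enabled; built-ins that only carry
BUILTIN_AVAIL_NON_MIPS16 stay rejected in both cases.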

gcc/ChangeLog:

* config/mips/mips.cc (mips_9bit_offset_address_p): Restrict the
address register to M16_REGS for MIPS16.
(BUILTIN_AVAIL_MIPS16E2): Defined a new macro.
(AVAIL_MIPS16E2_OR_NON_MIPS16): Same as above.
(AVAIL_NON_MIPS16 (cache..)): Update to
AVAIL_MIPS16E2_OR_NON_MIPS16.
* config/mips/mips.h (ISA_HAS_CACHE): Add clause for ISA_HAS_MIPS16E2.
* config/mips/mips.md (mips_cache): Mark as extended MIPS16.

gcc/testsuite/ChangeLog:

* gcc.target/mips/mips16e2-cache.c: New tests for mips16e2.
---
 gcc/config/mips/mips.cc   | 25 --
 gcc/config/mips/mips.h|  3 +-
 gcc/config/mips/mips.md   |  3 +-
 .../gcc.target/mips/mips16e2-cache.c  | 34 +++
 4 files changed, 60 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/mips16e2-cache.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 275efc5a390..e6f4701ad3a 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -2845,6 +2845,9 @@ mips_9bit_offset_address_p (rtx x, machine_mode mode)
   return (mips_classify_address (&addr, x, mode, false)
  && addr.type == ADDRESS_REG
  && CONST_INT_P (addr.offset)
+ && (!TARGET_MIPS16E2
+ || M16_REG_P (REGNO (addr.reg))
+ || REGNO (addr.reg) >= FIRST_PSEUDO_REGISTER)
  && MIPS_9BIT_OFFSET_P (INTVAL (addr.offset)));
 }
 
@@ -15412,9 +15415,13 @@ mips_loongson_ext2_prefetch_cookie (rtx write, rtx)
The function is available on the current target if !TARGET_MIPS16.
 
BUILTIN_AVAIL_MIPS16
-   The function is available on the current target if TARGET_MIPS16.  */
+   The function is available on the current target if TARGET_MIPS16.
+
+   BUILTIN_AVAIL_MIPS16E2
+   The function is available on the current target if TARGET_MIPS16E2.  */
 #define BUILTIN_AVAIL_NON_MIPS16 1
 #define BUILTIN_AVAIL_MIPS16 2
+#define BUILTIN_AVAIL_MIPS16E2 4
 
 /* Declare an availability predicate for built-in functions that
require non-MIPS16 mode and also require COND to be true.
@@ -15426,6 +15433,17 @@ mips_loongson_ext2_prefetch_cookie (rtx write, rtx)
return (COND) ? BUILTIN_AVAIL_NON_MIPS16 : 0;   \
  }
 
+/* Declare an availability predicate for built-in functions that
+   require non-MIPS16 mode or MIPS16E2 and also require COND to be true.
+   NAME is the main part of the predicate's name.  */
+#define AVAIL_MIPS16E2_OR_NON_MIPS16(NAME, COND)   \
+ static unsigned int   \
+ mips_builtin_avail_##NAME (void)  \
+ { \
+   return ((COND) ? BUILTIN_AVAIL_NON_MIPS16 | BUILTIN_AVAIL_MIPS16E2  \
+  : 0);\
+ }
+
 /* Declare an availability predicate for built-in functions that
support both MIPS16 and non-MIPS16 code and also require COND
to be true.  NAME is the main part of the predicate's name.  */
@@ -15471,7 +15489,7 @@ AVAIL_NON_MIPS16 (dsp_32, !TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dsp_64, TARGET_64BIT && TARGET_DSP)
 AVAIL_NON_MIPS16 (dspr2_32, !TARGET_64BIT && TARGET_DSPR2)
 AVAIL_NON_MIPS16 (loongson, TARGET_LOONGSON_MMI)
-AVAIL_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
+AVAIL_MIPS16E2_OR_NON_MIPS16 (cache, TARGET_CACHE_BUILTIN)
 AVAIL_NON_MIPS16 (msa, TARGET_MSA)
 
 /* Construct a mips_builtin_description from the given arguments.
@@ -17471,7 +17489,8 @@ mips_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
   d = &mips_builtins[fcode];
   avail = d->avail ();
   gcc_assert (avail != 0);
-  if (TARGET_MIPS16 && !(avail & BUILTIN_AVAIL_MIPS16))
+  if (TARGET_MIPS16 && !(avail & BUILTIN_AVAIL_MIPS16)
+  && (!TARGET_MIPS16E2 || !(avail & BUILTIN_AVAIL_MIPS16E2)))
 {
   error ("built-in function %qE not supported for MIPS16",
 DECL_NAME (fndecl));
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 1947be25aca..207b8871b12 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1385,7 +1385,8 @@ struct mips_cpu_info {
 #define TARGET_CACHE_BUILTIN (mips_isa >= MIPS_ISA_MIPS3)
 
 /* The CACHE instruction is available.  */
-#define ISA_HAS_CACHE (TARGET_CACHE_BUILTIN && !TARGET_MIPS16)
+#define ISA_HAS_CACHE (TARGET_CACHE_BUILTIN && (!TARGET_MIPS16 \
+   || TARGET_MIPS16E2))
 
 /* Tell collect what flags to pass to nm.  */
 #ifndef NM_FLAGS
diff --git a/gcc/config/mips/mips.md b/gcc/config/mips/mips.md
index 5ef8d99d99c..7eb65891820 100644
--- a/gcc/config/mips/mips.md
+++ b/gcc/config/mips/mips.md
@@ -5751,7 +5751,8 @@
 (match_operand:QI 1 "address_op

[PATCH v2 1/9] MIPS: Add basic support for mips16e2

2023-05-11 Thread Jie Mei
The MIPS16e2 ASE is an enhancement to the MIPS16e ASE,
which includes all MIPS16e instructions, with some additions.
It defines new special instructions for increasing
code density (e.g. Extend, PC-relative instructions, etc.).

This patch adds basic support for mips16e2, which the
following patches in the series build on.
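
As a usage illustration, code built with a compiler that carries this
patch could test the new predefined macro. This is a sketch only: the
macro name `__mips_mips16e2` comes from the ChangeLog below, while the
helper `have_mips16e2` is hypothetical.

```c
/* Returns 1 when the translation unit is compiled with the MIPS16e2
   ASE enabled (the compiler predefines __mips_mips16e2), else 0.  */
static int
have_mips16e2 (void)
{
#ifdef __mips_mips16e2
  return 1;
#else
  return 0;
#endif
}
```

On any compiler or target that does not define the macro, the helper
simply returns 0.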

gcc/ChangeLog:

* config/mips/mips.cc (mips_file_start): Add mips16e2 info
for output file.
* config/mips/mips.h (__mips_mips16e2): Define a new
predefined macro.
(ISA_HAS_MIPS16E2): Define a new macro.
(ASM_SPEC): Pass mmips16e2 to the assembler.
* config/mips/mips.opt: Add -m(no-)mips16e2 option.
* config/mips/predicates.md: Add clause for TARGET_MIPS16E2.
* doc/invoke.texi: Add -m(no-)mips16e2 option.

gcc/testsuite/ChangeLog:
* gcc.target/mips/mips.exp (mips_option_groups): Add -mmips16e2
option.
(mips-dg-init): Handle the recognition of mips16e2 targets.
(mips-dg-options): Add dependencies for mips16e2.
---
 gcc/config/mips/mips.cc|  3 ++-
 gcc/config/mips/mips.h |  8 
 gcc/config/mips/mips.opt   |  4 
 gcc/config/mips/predicates.md  |  2 +-
 gcc/doc/invoke.texi|  7 +++
 gcc/testsuite/gcc.target/mips/mips.exp | 10 ++
 6 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index ca822758b41..585a3682c7b 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -10047,7 +10047,8 @@ mips_file_start (void)
 fputs ("\t.module\tmsa\n", asm_out_file);
   if (TARGET_XPA)
 fputs ("\t.module\txpa\n", asm_out_file);
-  /* FIXME: MIPS16E2 is not supported by GCC? gas does support it */
+  if (TARGET_MIPS16E2)
+fputs ("\t.module\tmips16e2\n", asm_out_file);
   if (TARGET_CRC)
 fputs ("\t.module\tcrc\n", asm_out_file);
   if (TARGET_GINV)
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index 6daf6d37165..8db92c6468f 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -475,6 +475,9 @@ struct mips_cpu_info {
   if (mips_base_compression_flags & MASK_MIPS16)   \
builtin_define ("__mips16");\
\
+  if (TARGET_MIPS16E2) \
+   builtin_define ("__mips_mips16e2"); \
+   \
   if (TARGET_MIPS3D)   \
builtin_define ("__mips3d");\
\
@@ -1291,6 +1294,10 @@ struct mips_cpu_info {
 /* The MSA ASE is available.  */
 #define ISA_HAS_MSA		(TARGET_MSA && !TARGET_MIPS16)
 
+/* The MIPS16e V2 instructions are available.  */
+#define ISA_HAS_MIPS16E2   (TARGET_MIPS16 && TARGET_MIPS16E2 \
+   && !TARGET_64BIT)
+
 /* True if the result of a load is not available to the next instruction.
A nop will then be needed between instructions like "lw $4,..."
and "addiu $4,$4,1".  */
@@ -1401,6 +1408,7 @@ struct mips_cpu_info {
 
 #ifdef HAVE_AS_DOT_MODULE
 #define FP_ASM_SPEC "\
+%{mmips16e2} \
 %{mhard-float} %{msoft-float} \
 %{msingle-float} %{mdouble-float}"
 #else
diff --git a/gcc/config/mips/mips.opt b/gcc/config/mips/mips.opt
index 195f5be01cc..4968ed0d544 100644
--- a/gcc/config/mips/mips.opt
+++ b/gcc/config/mips/mips.opt
@@ -380,6 +380,10 @@ msplit-addresses
 Target Mask(SPLIT_ADDRESSES)
 Optimize lui/addiu address loads.
 
+mmips16e2
+Target Var(TARGET_MIPS16E2) Init(0)
+Enable the MIPS16e V2 instructions.
+
 msym32
 Target Var(TARGET_SYM32)
 Assume all symbols have 32-bit values.
diff --git a/gcc/config/mips/predicates.md b/gcc/config/mips/predicates.md
index e34de2937cc..87460a64652 100644
--- a/gcc/config/mips/predicates.md
+++ b/gcc/config/mips/predicates.md
@@ -369,7 +369,7 @@
 {
   /* When generating mips16 code, TARGET_LEGITIMATE_CONSTANT_P rejects
  CONST_INTs that can't be loaded using simple insns.  */
-  if (TARGET_MIPS16)
+  if (TARGET_MIPS16 && !TARGET_MIPS16E2)
 return false;
 
   /* Don't handle multi-word moves this way; we don't want to introduce
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a38547f53e5..0b1cef7c330 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -26709,6 +26709,13 @@ MIPS16 code generation can also be controlled on a per-function basis
 by means of @code{mips16} and @code{nomips16} attributes.
 @xref{Function Attributes}, for more information.
 
+@opindex mmips16e2
+@opindex mno-mips16e2
+@item -mmips16e2
+@itemx -mno-mips16e2
+Use (do not use) the MIPS16e2 ASE.  This option modifies the behavior
+of the @option{-mips16} option such that it targets the MIPS16e

[PATCH v2 7/9] MIPS: Use ISA_HAS_9BIT_DISPLACEMENT for mips16e2

2023-05-11 Thread Jie Mei
The MIPS16e2 ASE has PREF, LL and SC instructions,
which use a 9-bit immediate, as in mips32r6.
MIPS32 before R6 uses a 16-bit immediate.
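
The difference in reach can be made concrete with a small range check.
This sketch assumes both immediates are signed two's-complement fields;
the helper names are hypothetical and are not the GCC macros.

```c
#include <stdbool.h>

/* True if VALUE fits in a signed two's-complement field of BITS bits.
   BITS is assumed to be between 2 and 31.  */
static bool
fits_signed_p (long value, unsigned int bits)
{
  long min = -(1L << (bits - 1));
  long max = (1L << (bits - 1)) - 1;
  return value >= min && value <= max;
}

/* True if OFFSET is usable as a PREF/LL/SC displacement.
   HAS_9BIT models ISA_HAS_9BIT_DISPLACEMENT.  */
static bool
llsc_offset_ok_p (bool has_9bit, long offset)
{
  return fits_signed_p (offset, has_9bit ? 9 : 16);
}
```

Under this assumption the 9-bit form covers -256..255, against
-32768..32767 for the 16-bit form, so larger offsets have to be
computed into a register first.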

gcc/ChangeLog:

* config/mips/mips.h (ISA_HAS_9BIT_DISPLACEMENT): Add clause
for ISA_HAS_MIPS16E2.
(ISA_HAS_SYNC): Same as above.
(ISA_HAS_LL_SC): Same as above.
---
 gcc/config/mips/mips.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index a5c121088b7..1947be25aca 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -1247,7 +1247,8 @@ struct mips_cpu_info {
 && !TARGET_MIPS16)
 
 /* ISA has data prefetch, LL and SC with limited 9-bit displacement.  */
-#define ISA_HAS_9BIT_DISPLACEMENT  (mips_isa_rev >= 6)
+#define ISA_HAS_9BIT_DISPLACEMENT  (mips_isa_rev >= 6  \
+|| ISA_HAS_MIPS16E2)
 
 /* ISA has data indexed prefetch instructions.  This controls use of
'prefx', along with TARGET_HARD_FLOAT and TARGET_DOUBLE_FLOAT.
@@ -1340,7 +1341,8 @@ struct mips_cpu_info {
 #define ISA_HAS_SYNCI (mips_isa_rev >= 2 && !TARGET_MIPS16)
 
 /* ISA includes sync.  */
-#define ISA_HAS_SYNC ((mips_isa >= MIPS_ISA_MIPS2 || TARGET_MIPS3900) && !TARGET_MIPS16)
+#define ISA_HAS_SYNC ((mips_isa >= MIPS_ISA_MIPS2 || TARGET_MIPS3900)  \
+ && (!TARGET_MIPS16 || ISA_HAS_MIPS16E2))
 #define GENERATE_SYNC  \
   (target_flags_explicit & MASK_LLSC   \
? TARGET_LLSC && !TARGET_MIPS16 \
@@ -1349,7 +1351,8 @@ struct mips_cpu_info {
 /* ISA includes ll and sc.  Note that this implies ISA_HAS_SYNC
because the expanders use both ISA_HAS_SYNC and ISA_HAS_LL_SC
instructions.  */
-#define ISA_HAS_LL_SC (mips_isa >= MIPS_ISA_MIPS2 && !TARGET_MIPS5900 && !TARGET_MIPS16)
+#define ISA_HAS_LL_SC (mips_isa >= MIPS_ISA_MIPS2 && !TARGET_MIPS5900  \
+  && (!TARGET_MIPS16 || ISA_HAS_MIPS16E2))
 #define GENERATE_LL_SC \
   (target_flags_explicit & MASK_LLSC   \
? TARGET_LLSC && !TARGET_MIPS16 \
-- 
2.40.1


Re: [PATCH v2] RISC-V: Add vector_scalar_shift_operand

2023-05-11 Thread Robin Dapp via Gcc-patches
> The vector shift immediates happen to have the same constraints as some
> of the CSR-related operands, but it's a different usage.  This adds a
> name for them, so I don't get confused again next time.
> 
> gcc/ChangeLog:
> 
>   * config/riscv/autovec.md (shifts): Use
> vector_scalar_shift_operand.
>   * config/riscv/predicates.md (vector_scalar_shift_operand): New
> predicate.

Hehe, I had something similarly named in the first patch iteration
but removed it later.  Helps clarity though so fair enough.


Re: [PATCH 1/2] PR gcc/98350:Add a param to control the length of the chain with FMA in reassoc pass

2023-05-11 Thread Richard Biener via Gcc-patches
On Thu, May 11, 2023 at 5:20 PM Cui, Lili  wrote:
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: Thursday, May 11, 2023 6:53 PM
> > To: Cui, Lili 
> > Cc: gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH 1/2] PR gcc/98350:Add a param to control the length of
> > the chain with FMA in reassoc pass
>
> Hi Richard,
> Thanks for helping to review the patch.
>
> >
> > As you are not changing the number of ops you should be able to use
> > quick_push here and below.  You should be able to do
> >
> >  ops->splice (ops_mult);
> >  ops->splice (ops_others);
> >
> > as well.
> >
> Done.
>
> > > + /* When enabling param_reassoc_max_chain_length_with_fma
> > to
> > > +keep the chain with fma, rank_ops_for_fma will 
> > > detect if
> > > +the chain has fmas and if so it will rearrange the 
> > > ops.  */
> > > + if (param_reassoc_max_chain_length_with_fma > 1
> > > + && direct_internal_fn_supported_p (IFN_FMA,
> > > +TREE_TYPE (lhs),
> > > +opt_type)
> > > + && (rhs_code == PLUS_EXPR || rhs_code == 
> > > MINUS_EXPR))
> > > +   {
> > > + keep_fma_chain = rank_ops_for_fma(&ops);
> > > +   }
> > > +
> > > + int len = ops.length ();
> > >   /* Only rewrite the expression tree to parallel in the
> > >  last reassoc pass to avoid useless work 
> > > back-and-forth
> > >  with initial linearization.  */
> >
> > we are doing the parallel rewrite only in the last reassoc pass, I think it
> > makes sense to do the same for reassoc-for-fma.
>
> I rearranged the order of ops in reassoc1 without breaking the chain; this
> produced more vectorization in the vectorizer pass (seen in benchmark 503).
> So I rewrite the SSA tree and keep the chain with function
> "rewrite_expr_tree" in reassoc1, and break the chain with
> "rewrite_expr_tree_parallel_for_fma" in reassoc2.
>
> >
> > Why do the existing expr rewrites not work after re-sorting the ops?
>
> For the case https://godbolt.org/z/3x9PWE9Kb, we put "j" first:
>
> j + l * m + a * b + c * d + e * f + g * h;
>
> GCC trunk: width = 2, ops_num = 6, old function "rewrite_expr_tree_parallel"
> generates 3 FMAs.
> ---
>   _1 = l_10(D) * m_11(D);
>   _3 = a_13(D) * b_14(D);
>   _4 = j_12(D) + _3;   --> Here is one FMA.
>   _5 = c_15(D) * d_16(D);
>   _8 = _1 + _5;   --> Here is one FMA and lost one.
>   _7 = e_17(D) * f_18(D);
>   _9 = g_19(D) * h_20(D);
>   _2 = _7 + _9;   --> Here is one FMA and lost one.
>   _6 = _2 + _4;
>   _21 = _6 + _8;
>   # VUSE <.MEM_22(D)>
>   return _21;
> --
> width = 2, ops_num = 6, new function "rewrite_expr_tree_parallel_for_fma"
> generates 4 FMAs.
> --
> _1 = a_10(D) * b_11(D);
>   _3 = c_13(D) * d_14(D);
>   _5 = e_15(D) * f_16(D);
>   _7 = g_17(D) * h_18(D);
>   _4 = _5 + _7;   --> Here is one FMA and lost one.
>   _8 = _4 + _1;   --> Here is one FMA.
>   _9 = l_19(D) * m_20(D);
>   _2 = _9 + j_12(D);   --> Here is one FMA.
>   _6 = _2 + _3;   --> Here is one FMA.
>   _21 = _8 + _6;
>   return _21;
> 

ISTR the code lacked sufficient comments explaining why
rewrite_expr_tree_parallel_for_fma is better by design.  In fact ...

>
> >
> > >   if (!reassoc_insert_powi_p
> > > - && ops.length () > 3
> > > + && len > 3
> > > + && (!keep_fma_chain
> > > + || (keep_fma_chain
> > > + && len >
> > > + param_reassoc_max_chain_length_with_fma))
> >
> > in the case len < param_reassoc_max_chain_length_with_fma we have the
> > chain re-sorted but fall through to non-parallel rewrite.  I wonder if we do
> > not want to instead adjust the reassociation width?  I'd say it depends on
> > the number of mult cases in the chain (sth the re-sorting could have computed).
> > Why do we have two completely independent --params here?  Can you give
> > an example --param value combination that makes "sense" and show how it
> > is beneficial?
>
> For this small case https://godbolt.org/z/Pxczrre8P
> a * b + c * d + e * f  + j
>
> GCC trunk: ops_num = 4, targetm.sched.reassociation_width is 4 (scalar fp
> cost is 4). Calculated: width = 2. We can get 2 FMAs.
> --
>   _1 = a_6(D) * b_7(D);
>   _2 = c_8(D) * d_9(D);
>   _5 = _1 + _2;
>   _4 =

Re: [PATCH] Provide -fcf-protection=branch,return.

2023-05-11 Thread Andrew Pinski via Gcc-patches
On Thu, May 11, 2023 at 10:45 PM liuhongt via Gcc-patches
 wrote:
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/89701
> * common.opt: Refactor -fcf-protection= to support combination
> of param.
> * lto-wrapper.cc (merge_and_complain): Adjusted.
> * opts.cc (parse_cf_protection_options): New.
> (common_handle_option): Decode argument for -fcf-protection=.
> * opts.h (parse_cf_protection_options): Declare.

I think this could be simplified if you use either EnumSet or
EnumBitSet instead in common.opt for `-fcf-protection=`.

Thanks,
Andrew

>
> gcc/testsuite/ChangeLog:
>
> PR target/89701
> * c-c++-common/fcf-protection-8.c: New test.
> * c-c++-common/fcf-protection-9.c: New test.
> * c-c++-common/fcf-protection-10.c: New test.
> * gcc.target/i386/pr89701-1.c: New test.
> * gcc.target/i386/pr89701-2.c: New test.
> * gcc.target/i386/pr89701-3.c: New test.
> * gcc.target/i386/pr89701-4.c: New test.
> ---
>  gcc/common.opt| 24 ++
>  gcc/lto-wrapper.cc| 21 +++--
>  gcc/opts.cc   | 79 +++
>  gcc/opts.h|  1 +
>  .../c-c++-common/fcf-protection-10.c  |  3 +
>  .../c-c++-common/fcf-protection-11.c  |  2 +
>  .../c-c++-common/fcf-protection-12.c  |  2 +
>  gcc/testsuite/c-c++-common/fcf-protection-8.c |  3 +
>  gcc/testsuite/c-c++-common/fcf-protection-9.c |  3 +
>  gcc/testsuite/gcc.target/i386/pr89701-1.c |  4 +
>  gcc/testsuite/gcc.target/i386/pr89701-2.c |  4 +
>  gcc/testsuite/gcc.target/i386/pr89701-3.c |  5 ++
>  gcc/testsuite/gcc.target/i386/pr89701-4.c |  5 ++
>  13 files changed, 130 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-10.c
>  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-11.c
>  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-12.c
>  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-8.c
>  create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-9.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-1.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-2.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-3.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-4.c
>
> diff --git a/gcc/common.opt b/gcc/common.opt
> index a28ca13385a..ac12da52733 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -229,6 +229,10 @@ bool dump_base_name_prefixed = false
>  Variable
>  unsigned int flag_zero_call_used_regs
>
> +;; What the CF check should instrument
> +Variable
> +unsigned int flag_cf_protection = 0
> +
>  ###
>  Driver
>
> @@ -1886,28 +1890,10 @@ fcf-protection
>  Common RejectNegative Alias(fcf-protection=,full)
>
>  fcf-protection=
> -Common Joined RejectNegative Enum(cf_protection_level) 
> Var(flag_cf_protection) Init(CF_NONE)
> +Common Joined
>  -fcf-protection=[full|branch|return|none|check]Instrument functions 
> with checks to verify jump/call/return control-flow transfer
>  instructions have valid targets.
>
> -Enum
> -Name(cf_protection_level) Type(enum cf_protection_level) 
> UnknownError(unknown Control-Flow Protection Level %qs)
> -
> -EnumValue
> -Enum(cf_protection_level) String(full) Value(CF_FULL)
> -
> -EnumValue
> -Enum(cf_protection_level) String(branch) Value(CF_BRANCH)
> -
> -EnumValue
> -Enum(cf_protection_level) String(return) Value(CF_RETURN)
> -
> -EnumValue
> -Enum(cf_protection_level) String(check) Value(CF_CHECK)
> -
> -EnumValue
> -Enum(cf_protection_level) String(none) Value(CF_NONE)
> -
>  finstrument-functions
>  Common Var(flag_instrument_function_entry_exit,1)
>  Instrument function entry and exit with profiling calls.
> diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
> index 5186d040ce0..568c8af659d 100644
> --- a/gcc/lto-wrapper.cc
> +++ b/gcc/lto-wrapper.cc
> @@ -359,26 +359,33 @@ merge_and_complain (vec 
> &decoded_options,
> case OPT_fcf_protection_:
>   /* Default to link-time option, else append or check identical.  */
>   if (!cf_protection_option
> - || cf_protection_option->value == CF_CHECK)
> + || !memcmp (cf_protection_option->arg, "check", 5))
> {
> + const char* parg = decoded_options[existing_opt].arg;
>   if (existing_opt == -1)
> decoded_options.safe_push (*foption);
> - else if (decoded_options[existing_opt].value != foption->value)
> + else if ((strlen (parg) != strlen (foption->arg))
> +  || memcmp (parg, foption->arg, strlen (foption->arg)))
> {
>   if (cf_protection_option
> - && cf_protection_option->value == CF_CHECK)
> +  

[PATCH] Provide -fcf-protection=branch,return.

2023-05-11 Thread liuhongt via Gcc-patches
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?
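
To illustrate what decoding a combined argument such as
"branch,return" involves, here is a minimal stand-alone sketch. It is
not the parse_cf_protection_options added by this patch: the bit
values, the name `parse_cf_list` and the handling of "none" are
assumptions made for the example, and "check" is deliberately left out.

```c
#include <string.h>

/* Hypothetical bit values; the real ones live in GCC's headers.  */
enum
{
  CF_NONE = 0,
  CF_BRANCH = 1 << 0,
  CF_RETURN = 1 << 1,
  CF_FULL = CF_BRANCH | CF_RETURN
};

/* Parse a comma-separated keyword list into a bitmask.  Returns 0 and
   stores the mask in *MASK on success; returns -1 on an empty
   argument, an empty element or an unknown keyword.  */
static int
parse_cf_list (const char *arg, unsigned int *mask)
{
  unsigned int result = CF_NONE;
  const char *p = arg;

  if (*p == '\0')
    return -1;

  while (*p != '\0')
    {
      size_t len = strcspn (p, ",");

      if (len == 6 && memcmp (p, "branch", 6) == 0)
	result |= CF_BRANCH;
      else if (len == 6 && memcmp (p, "return", 6) == 0)
	result |= CF_RETURN;
      else if (len == 4 && memcmp (p, "full", 4) == 0)
	result |= CF_FULL;
      else if (len == 4 && memcmp (p, "none", 4) == 0)
	result = CF_NONE;	/* Assumption: "none" resets the mask.  */
      else
	return -1;

      p += len;
      if (*p == ',')
	p++;
    }

  *mask = result;
  return 0;
}
```

With this sketch, "branch,return" and "full" decode to the same mask,
which is the equivalence the option name suggests.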

gcc/ChangeLog:

PR target/89701
* common.opt: Refactor -fcf-protection= to support combination
of param.
* lto-wrapper.cc (merge_and_complain): Adjusted.
* opts.cc (parse_cf_protection_options): New.
(common_handle_option): Decode argument for -fcf-protection=.
* opts.h (parse_cf_protection_options): Declare.

gcc/testsuite/ChangeLog:

PR target/89701
* c-c++-common/fcf-protection-8.c: New test.
* c-c++-common/fcf-protection-9.c: New test.
* c-c++-common/fcf-protection-10.c: New test.
* c-c++-common/fcf-protection-11.c: New test.
* c-c++-common/fcf-protection-12.c: New test.
* gcc.target/i386/pr89701-1.c: New test.
* gcc.target/i386/pr89701-2.c: New test.
* gcc.target/i386/pr89701-3.c: New test.
* gcc.target/i386/pr89701-4.c: New test.
---
 gcc/common.opt| 24 ++
 gcc/lto-wrapper.cc| 21 +++--
 gcc/opts.cc   | 79 +++
 gcc/opts.h|  1 +
 .../c-c++-common/fcf-protection-10.c  |  3 +
 .../c-c++-common/fcf-protection-11.c  |  2 +
 .../c-c++-common/fcf-protection-12.c  |  2 +
 gcc/testsuite/c-c++-common/fcf-protection-8.c |  3 +
 gcc/testsuite/c-c++-common/fcf-protection-9.c |  3 +
 gcc/testsuite/gcc.target/i386/pr89701-1.c |  4 +
 gcc/testsuite/gcc.target/i386/pr89701-2.c |  4 +
 gcc/testsuite/gcc.target/i386/pr89701-3.c |  5 ++
 gcc/testsuite/gcc.target/i386/pr89701-4.c |  5 ++
 13 files changed, 130 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-10.c
 create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-11.c
 create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-12.c
 create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-8.c
 create mode 100644 gcc/testsuite/c-c++-common/fcf-protection-9.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr89701-4.c

diff --git a/gcc/common.opt b/gcc/common.opt
index a28ca13385a..ac12da52733 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -229,6 +229,10 @@ bool dump_base_name_prefixed = false
 Variable
 unsigned int flag_zero_call_used_regs
 
+;; What the CF check should instrument
+Variable
+unsigned int flag_cf_protection = 0
+
 ###
 Driver
 
@@ -1886,28 +1890,10 @@ fcf-protection
 Common RejectNegative Alias(fcf-protection=,full)
 
 fcf-protection=
-Common Joined RejectNegative Enum(cf_protection_level) Var(flag_cf_protection) Init(CF_NONE)
+Common Joined
 -fcf-protection=[full|branch|return|none|check]	Instrument functions with checks to verify jump/call/return control-flow transfer
 instructions have valid targets.
 
-Enum
-Name(cf_protection_level) Type(enum cf_protection_level) UnknownError(unknown Control-Flow Protection Level %qs)
-
-EnumValue
-Enum(cf_protection_level) String(full) Value(CF_FULL)
-
-EnumValue
-Enum(cf_protection_level) String(branch) Value(CF_BRANCH)
-
-EnumValue
-Enum(cf_protection_level) String(return) Value(CF_RETURN)
-
-EnumValue
-Enum(cf_protection_level) String(check) Value(CF_CHECK)
-
-EnumValue
-Enum(cf_protection_level) String(none) Value(CF_NONE)
-
 finstrument-functions
 Common Var(flag_instrument_function_entry_exit,1)
 Instrument function entry and exit with profiling calls.
diff --git a/gcc/lto-wrapper.cc b/gcc/lto-wrapper.cc
index 5186d040ce0..568c8af659d 100644
--- a/gcc/lto-wrapper.cc
+++ b/gcc/lto-wrapper.cc
@@ -359,26 +359,33 @@ merge_and_complain (vec 
&decoded_options,
case OPT_fcf_protection_:
  /* Default to link-time option, else append or check identical.  */
  if (!cf_protection_option
- || cf_protection_option->value == CF_CHECK)
+ || !memcmp (cf_protection_option->arg, "check", 5))
{
+ const char* parg = decoded_options[existing_opt].arg;
  if (existing_opt == -1)
decoded_options.safe_push (*foption);
- else if (decoded_options[existing_opt].value != foption->value)
+ else if ((strlen (parg) != strlen (foption->arg))
+  || memcmp (parg, foption->arg, strlen (foption->arg)))
{
  if (cf_protection_option
- && cf_protection_option->value == CF_CHECK)
+ && !memcmp (cf_protection_option->arg, "check", 5))
fatal_error (input_location,
 "option %qs with mismatching values"
 " (%s, %s)",
 "-fcf-protection",
-decoded_options[existing_opt].arg,
+parg,
 foption->arg);
 

Re: [PATCH] x86: Add a new option -mdaz-ftz to enable FTZ and DAZ flags in MXCSR.

2023-05-11 Thread Hongtao Liu via Gcc-patches
On Wed, May 10, 2023 at 5:10 PM liuhongt  wrote:
>
> > The quoted patch shows -shared in context and  you didn't post a
> > backport version
> > to look at.  But yes, we shouldn't change -shared behavior on a
> > branch, even less so make it
> > inconsistent between targets.
> Here's the patch.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for GCC 11/12 backport?
I'm going to push the patch next week if there's no objection.
>
> if (mdaz-ftz)
>   link crtfastmath.o
> else if ((Ofast || ffast-math || funsafe-math-optimizations)
>  && !mno-daz-ftz)
>   link crtfastmath.o
> else
>   Don't link crtfastmath.o
>
> gcc/ChangeLog:
>
> * config/i386/cygwin.h (ENDFILE_SPEC): Link crtfastmath.o
> whenever -mdaz-ftz is specified. Don't link crtfastmath.o
> when -mno-daz-ftz is specified.
> * config/i386/darwin.h (ENDFILE_SPEC): Ditto.
> * config/i386/gnu-user-common.h
> (GNU_USER_TARGET_MATHFILE_SPEC): Ditto.
> * config/i386/mingw32.h (ENDFILE_SPEC): Ditto.
> * config/i386/i386.opt (mdaz-ftz): New option.
> * doc/invoke.texi (x86 options): Document -mdaz-ftz.
> ---
>  gcc/config/i386/cygwin.h  |  2 +-
>  gcc/config/i386/darwin.h  |  4 ++--
>  gcc/config/i386/gnu-user-common.h |  2 +-
>  gcc/config/i386/i386.opt  |  4 
>  gcc/config/i386/mingw32.h |  2 +-
>  gcc/doc/invoke.texi   | 11 ++-
>  6 files changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/i386/cygwin.h b/gcc/config/i386/cygwin.h
> index d06eda369cf..5412c5d4479 100644
> --- a/gcc/config/i386/cygwin.h
> +++ b/gcc/config/i386/cygwin.h
> @@ -57,7 +57,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  #undef ENDFILE_SPEC
>  #define ENDFILE_SPEC \
> -  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s}\
> +  "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \
> %{!shared:%:if-exists(default-manifest.o%s)}\
> %{fvtable-verify=none:%s; \
>  fvtable-verify=preinit:vtv_end.o%s; \
> diff --git a/gcc/config/i386/darwin.h b/gcc/config/i386/darwin.h
> index a55f6b2b874..2f773924d6e 100644
> --- a/gcc/config/i386/darwin.h
> +++ b/gcc/config/i386/darwin.h
> @@ -109,8 +109,8 @@ along with GCC; see the file COPYING3.  If not see
>  "%{!force_cpusubtype_ALL:-force_cpusubtype_ALL} "
>
>  #undef ENDFILE_SPEC
> -#define ENDFILE_SPEC \
> -  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
> +#define ENDFILE_SPEC \
> +  "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \
> %{mpc32:crtprec32.o%s} \
> %{mpc64:crtprec64.o%s} \
> %{mpc80:crtprec80.o%s}" TM_DESTRUCTOR
> diff --git a/gcc/config/i386/gnu-user-common.h b/gcc/config/i386/gnu-user-common.h
> index 23b54c5be52..3d2a33f1714 100644
> --- a/gcc/config/i386/gnu-user-common.h
> +++ b/gcc/config/i386/gnu-user-common.h
> @@ -47,7 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  /* Similar to standard GNU userspace, but adding -ffast-math support.  */
>  #define GNU_USER_TARGET_MATHFILE_SPEC \
> -  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
> +  "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \
> %{mpc32:crtprec32.o%s} \
> %{mpc64:crtprec64.o%s} \
> %{mpc80:crtprec80.o%s}"
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index a3675e515bc..5cfb7cdcbc2 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -420,6 +420,10 @@ mpc80
>  Target RejectNegative
>  Set 80387 floating-point precision to 80-bit.
>
> +mdaz-ftz
> +Target
> +Set the FTZ and DAZ Flags.
> +
>  mpreferred-stack-boundary=
>  Target RejectNegative Joined UInteger Var(ix86_preferred_stack_boundary_arg)
>  Attempt to keep stack aligned to this power of 2.
> diff --git a/gcc/config/i386/mingw32.h b/gcc/config/i386/mingw32.h
> index d3ca0cd0279..ddbe6a4054b 100644
> --- a/gcc/config/i386/mingw32.h
> +++ b/gcc/config/i386/mingw32.h
> @@ -197,7 +197,7 @@ along with GCC; see the file COPYING3.  If not see
>
>  #undef ENDFILE_SPEC
>  #define ENDFILE_SPEC \
> -  "%{Ofast|ffast-math|funsafe-math-optimizations:crtfastmath.o%s} \
> +  "%{mdaz-ftz:crtfastmath.o%s;Ofast|ffast-math|funsafe-math-optimizations:%{!mno-daz-ftz:crtfastmath.o%s}} \
> %{!shared:%:if-exists(default-manifest.o%s)}\
> %{fvtable-verify=none:%s; \
>  fvtable-verify=preinit:vtv_end.o%s; \
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index cb83dd8a1cc..87eedfffa6c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -1434,7 +1434,7 @@ See RS/6000 and PowerPC Options.
>  -m96bit-long-double  -mlong-double-64  -mlong-double-80  -mlong-double-128 
> @gol
>  -mregparm=@var{num}  -msseregparm @gol
>  -mveclibabi=@var{type}  -mvect8-re

[PATCH] Machine_Mode: Extend machine_mode from 8 to 16 bits

2023-05-11 Thread Pan Li via Gcc-patches
From: Pan Li 

We are running out of machine_mode values (8 bits) in the RISC-V backend, so
we would like to extend the machine mode bit size from 8 to 16 bits.
However, growing the memory size of common structures such as tree or rtx
is sensitive. This patch therefore extends the machine mode to 16 bits by
shrinking other fields:

* Swap the bit sizes of code and machine mode in rtx_def.
* Reconcile the machine_mode location and spare bits in tree.

The memory impact of this patch on the related structures is shown below:

+-------------------+----------+---------+------+
| struct/bytes      | upstream | patched | diff |
+-------------------+----------+---------+------+
| rtx_obj_reference |        8 |      12 |   +4 |
| ext_modified      |        2 |       3 |   +1 |
| ira_allocno       |      192 |     200 |   +8 |
| qty_table_elem    |       40 |      40 |    0 |
| reg_stat_type     |       64 |      64 |    0 |
| rtx_def           |       40 |      40 |    0 |
| table_elt         |       80 |      80 |    0 |
| tree_decl_common  |      112 |     112 |    0 |
| tree_type_common  |      128 |     128 |    0 |
+-------------------+----------+---------+------+

The tree- and rtx-related structs do not change in size with this patch,
while machine_mode is now 16 bits wide.

Signed-off-by: Pan Li 
Co-authored-by: Ju-Zhe Zhong 
Co-authored-by: Kito Cheng 

gcc/ChangeLog:

* combine.cc (struct reg_stat_type): Extended machine mode to 16 bits.
* cse.cc (struct qty_table_elem): Ditto.
(struct table_elt): Ditto.
(struct set): Ditto.
* genopinit.cc (main): Reconciled the machine mode limit.
* ira-int.h (struct ira_allocno): Extended machine mode to 16 bits.
* ree.cc (struct ATTRIBUTE_PACKED): Ditto.
* rtl-ssa/accesses.h: Ditto.
* rtl.h (RTX_CODE_BITSIZE): New macro.
(RTX_MACHINE_MODE_BITSIZE): Ditto.
(struct GTY): Swap bit size between code and machine mode.
(subreg_shape::unique_id): Reconciled the machine mode limit.
* rtlanal.h: Extended machine mode to 16 bits.
* tree-core.h (struct tree_type_common): Ditto.
(struct tree_decl_common): Reconciled the location and extended the
bit size of machine mode.
---
 gcc/combine.cc |  4 ++--
 gcc/cse.cc |  8 
 gcc/genopinit.cc   |  3 ++-
 gcc/ira-int.h  | 12 
 gcc/ree.cc |  2 +-
 gcc/rtl-ssa/accesses.h |  6 --
 gcc/rtl.h  |  9 ++---
 gcc/rtlanal.h  |  5 +++--
 gcc/tree-core.h| 11 ---
 9 files changed, 38 insertions(+), 22 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 5aa0ec5c45a..bdf6f635c80 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -200,7 +200,7 @@ struct reg_stat_type {
 
   unsigned HOST_WIDE_INT   last_set_nonzero_bits;
   char last_set_sign_bit_copies;
-  ENUM_BITFIELD(machine_mode)  last_set_mode : 8;
+  ENUM_BITFIELD(machine_mode)  last_set_mode : RTX_MACHINE_MODE_BITSIZE;
 
   /* Set nonzero if references to register n in expressions should not be
  used.  last_set_invalid is set nonzero when this register is being
@@ -235,7 +235,7 @@ struct reg_stat_type {
  truncation if we know that value already contains a truncated
  value.  */
 
-  ENUM_BITFIELD(machine_mode)  truncated_to_mode : 8;
+  ENUM_BITFIELD(machine_mode)  truncated_to_mode : RTX_MACHINE_MODE_BITSIZE;
 };
 
 
diff --git a/gcc/cse.cc b/gcc/cse.cc
index b10c9b0c94d..fe594c1bc3d 100644
--- a/gcc/cse.cc
+++ b/gcc/cse.cc
@@ -250,8 +250,8 @@ struct qty_table_elem
   unsigned int first_reg, last_reg;
   /* The sizes of these fields should match the sizes of the
  code and mode fields of struct rtx_def (see rtl.h).  */
-  ENUM_BITFIELD(rtx_code) comparison_code : 16;
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(rtx_code) comparison_code : RTX_CODE_BITSIZE;
+  ENUM_BITFIELD(machine_mode) mode : RTX_MACHINE_MODE_BITSIZE;
 };
 
 /* The table of all qtys, indexed by qty number.  */
@@ -406,7 +406,7 @@ struct table_elt
   int regcost;
   /* The size of this field should match the size
  of the mode field of struct rtx_def (see rtl.h).  */
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : RTX_MACHINE_MODE_BITSIZE;
   char in_memory;
   char is_const;
   char flag;
@@ -4155,7 +4155,7 @@ struct set
   /* Original machine mode, in case it becomes a CONST_INT.
  The size of this field should match the size of the mode
  field of struct rtx_def (see rtl.h).  */
-  ENUM_BITFIELD(machine_mode) mode : 8;
+  ENUM_BITFIELD(machine_mode) mode : RTX_MACHINE_MODE_BITSIZE;
   /* Hash value of constant equivalent for SET_SRC.  */
   unsigned src_const_hash;
   /* A constant equivalent for SET_SRC, if any.  */
diff --git a/gcc/genopinit.cc b/gcc/genopinit.cc
index 83cb7504fa1..2add8b925da 100644
--- a/gcc/genopinit.cc
+++ b/gcc/genopinit.cc
@@ -182,7 +182,8 @@ main (int argc, con

RE: [PATCH V3] RISC-V: Add basic vec_init for VLS RVV auto-vectorization

2023-05-11 Thread Li, Pan2 via Gcc-patches
Committed to trunk.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Friday, May 12, 2023 11:00 AM
To: 钟居哲 
Cc: GCC Patches ; Palmer Dabbelt ; 
Jeff Law ; Robin Dapp 
Subject: Re: [PATCH V3] RISC-V: Add basic vec_init for VLS RVV 
auto-vectorization

Ok

 wrote on Fri, May 12, 2023 at 10:57:

> From: Juzhe-Zhong 
>
> #include 
>
> typedef int8_t vnx16qi __attribute__((vector_size (16)));
>
> #include 
>
> typedef int8_t vnx16qi __attribute__ ((vector_size (16)));
> typedef int8_t vnx32qi __attribute__ ((vector_size (32)));
> typedef int8_t vnx64qi __attribute__ ((vector_size (64)));
> typedef int8_t vnx128qi __attribute__ ((vector_size (128)));
>
> __attribute__ ((noipa)) void
> f_vnx128qi (int8_t a, int8_t b, int8_t c, int8_t d, int8_t e, int8_t f,
> int8_t g, int8_t h, int8_t *out)
> {
>   vnx128qi v
> = {a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h};
>   *(vnx128qi *) out = v;
> }
>
> This patch codegen:
> f_vnx128qi:
>         andi    a1,a1,0xff
>         andi    a0,a0,0xff
>         slli    a1,a1,8
>         andi    a2,a2,0xff
>         or      a1,a1,a0
>         slli    a2,a2,16
>         andi    a3,a3,0xff
>         or      a2,a2,a1
>         slli    a3,a3,24
>         andi    a4,a4,0xff
>         or      a3,a3,a2
>         slli    a4,a4,32
>         andi    a5,a5,0xff
>         or      a4,a4,a3
>         slli    a5,a5,40
>         andi    a6,a6,0xff
>         or      a5,a5,a4
>         slli    a6,a6,48
>         or      a6,a6,a5
>         vsetvli a5,zero,e64,m8,ta,ma
>         ld      a5,0(sp)
>         slli    a7,a7,56
>         or      a7,a7,a6
>         vmv.v.x v8,a7
>         vs8r.v  v8,0(a5)
>         ret
>
> We will support more optimization cases in the future, but they are not
> included in this patch.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (vec_init): New pattern.
> * config/riscv/riscv-protos.h (expand_vec_init): New function.
> * config/riscv/riscv-v.cc (class rvv_builder): New class.
> (rvv_builder::can_duplicate_repeating_sequence_p): New function.
> (rvv_builder::get_merged_repeating_sequence): Ditto.
> (expand_vector_init_insert_elems): Ditto.
> (expand_vec_init): Ditto.
> * config/riscv/vector-iterators.md: New attribute.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp:
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-6.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-6.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   |  16 ++
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv-v.cc   | 127 +++
>  gcc/config/riscv/vector-iterators.md  |   9 +
>  .../riscv/rvv/autovec/vls-vlmax/insert-1.c|  41 
>  .../riscv/rvv/autovec/vls-vlmax/insert-2.c|  41 
>  .../riscv/rvv/autovec/vls-vlmax/insert-3.c|  41 
>  .../rvv/autovec/vls-vlmax/insert_run-1.c  |  46 
>  .../rvv/autovec/vls-vlmax/insert_run-2.c  |  46 
>  .../riscv/rvv/autovec/vls-vlmax/repeat-1.c|  75 +++
>  .../riscv/rvv/autovec/vls-vlmax/repeat-2.c|  61 ++
>  .../riscv/rvv/autovec/vls-vlmax/repeat-3.c|  53 +
>  .../riscv/rvv/autovec/vls-vlmax/repeat-4.c|  39 
>  .../riscv/rvv/autovec/vls-vlmax/repeat-5.c|  74 +++
>  .../riscv/rvv/autovec/vls-vlmax/repeat-6.c|  78 +++
>  .../rvv/autovec/vls-vlmax/repeat_

[PATCH] RISC-V: Fix fail of vmv-imm-rv64.c in rv32

2023-05-11 Thread juzhe . zhong
From: Juzhe-Zhong 

After updating my local codebase to trunk, I noticed one more failure on
RV32.
With this patch, all RVV failures are cleaned up.
Thanks.

FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c -O3 -ftree-vectorize (test for excess errors)
Excess errors:
cc1: error: ABI requires '-march=rv32'

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Add -mabi=lp64d.

---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
index 520321e1c73..e386166f95e 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-std=c99 -march=rv64gcv -fno-vect-cost-model --param=riscv-autovec-preference=scalable -fno-builtin" } */
+/* { dg-additional-options "-std=c99 -march=rv64gcv -mabi=lp64d -fno-vect-cost-model --param=riscv-autovec-preference=scalable -fno-builtin" } */
 
 #include "vmv-imm-template.h"
 
-- 
2.36.1



Re: Re: [PATCH V2] RISC-V: Add basic vec_init for VLS RVV auto-vectorization

2023-05-11 Thread juzhe.zh...@rivai.ai
I have removed the comments related to LLVM and reorganized the testcases:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618256.html 
Could you take another look at V3?

Sorry for sending the wrong comments in the changelog. My goal is not to
disparage LLVM.
I am just used to reading the LLVM implementation while preparing patches for
GCC, to make sure
the implementation is correct.

The slidedown method is exactly the same as LLVM's.

Sorry about that, I won't send any information related to LLVM again. Thanks :)



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-12 10:23
To: juzhe.zhong
CC: gcc-patches; palmer; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH V2] RISC-V: Add basic vec_init for VLS RVV 
auto-vectorization
> This patch makes vec_init support common vector initialization handling
> (using vslide1down to insert elements), which can handle any vector
> initialization but is not optimal in every case.
>
> It also supports the Case 1 optimization:
> https://godbolt.org/z/Yb9PK9jsz
 
Don't use godbolt link in comment, because they are not permanently
preserved on the server, also the reference is not fixed since LLVM
trunk could improve.
 
> LLVM codegen:
> https://godbolt.org/z/xsnavvWqx
>
> ...
> vslide1down.vx (x128 times)
> ...
 
Drop LLVM codegen here, again, it might improve, healthy competition
is good, but I would like to avoid disparaging other compilers in
comments. :)
 
 
> ---
>  gcc/config/riscv/autovec.md   |  16 ++
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv-v.cc   | 127 +++
>  gcc/config/riscv/vector-iterators.md  |   9 +
>  .../gcc.target/riscv/rvv/autovec/insert-1.c   |  41 
>  .../gcc.target/riscv/rvv/autovec/insert-2.c   |  41 
>  .../gcc.target/riscv/rvv/autovec/insert-3.c   |  41 
>  .../riscv/rvv/autovec/insert_run-1.c  |  46 
>  .../riscv/rvv/autovec/insert_run-2.c  |  46 
>  .../gcc.target/riscv/rvv/autovec/repeat-1.c   |  75 +++
>  .../gcc.target/riscv/rvv/autovec/repeat-2.c   |  61 ++
>  .../gcc.target/riscv/rvv/autovec/repeat-3.c   |  53 +
>  .../gcc.target/riscv/rvv/autovec/repeat-4.c   |  39 
>  .../gcc.target/riscv/rvv/autovec/repeat-5.c   |  74 +++
>  .../gcc.target/riscv/rvv/autovec/repeat-6.c   |  78 +++
>  .../riscv/rvv/autovec/repeat_run-1.c  | 125 +++
>  .../riscv/rvv/autovec/repeat_run-2.c  | 145 +
>  .../riscv/rvv/autovec/repeat_run-3.c  | 203 ++
>  .../riscv/rvv/autovec/repeat_run-4.c  |  77 +++
>  .../riscv/rvv/autovec/repeat_run-5.c  | 124 +++
>  .../riscv/rvv/autovec/repeat_run-6.c  | 122 +++
>  21 files changed, 1544 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert-1.c
 
Could you reorg the autovec folder to separate vls-vlmax and vla stuffs?
 
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert-2.c
...
> +/* Initialize register TARGET from the elements in PARALLEL rtx VALS.  */
> +
> +void
> +expand_vec_init (rtx target, rtx vals)
> +{
> +  machine_mode mode = GET_MODE (target);
 
I would like to add some assertion here to ensure only VLS mode here.
 


Re: [PATCH V3] RISC-V: Add basic vec_init for VLS RVV auto-vectorization

2023-05-11 Thread Kito Cheng via Gcc-patches
Ok

 wrote on Fri, May 12, 2023 at 10:57:

> From: Juzhe-Zhong 
>
> #include 
>
> typedef int8_t vnx16qi __attribute__((vector_size (16)));
>
> #include 
>
> typedef int8_t vnx16qi __attribute__ ((vector_size (16)));
> typedef int8_t vnx32qi __attribute__ ((vector_size (32)));
> typedef int8_t vnx64qi __attribute__ ((vector_size (64)));
> typedef int8_t vnx128qi __attribute__ ((vector_size (128)));
>
> __attribute__ ((noipa)) void
> f_vnx128qi (int8_t a, int8_t b, int8_t c, int8_t d, int8_t e, int8_t f,
> int8_t g, int8_t h, int8_t *out)
> {
>   vnx128qi v
> = {a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
>a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h};
>   *(vnx128qi *) out = v;
> }
>
> This patch codegen:
> f_vnx128qi:
>         andi    a1,a1,0xff
>         andi    a0,a0,0xff
>         slli    a1,a1,8
>         andi    a2,a2,0xff
>         or      a1,a1,a0
>         slli    a2,a2,16
>         andi    a3,a3,0xff
>         or      a2,a2,a1
>         slli    a3,a3,24
>         andi    a4,a4,0xff
>         or      a3,a3,a2
>         slli    a4,a4,32
>         andi    a5,a5,0xff
>         or      a4,a4,a3
>         slli    a5,a5,40
>         andi    a6,a6,0xff
>         or      a5,a5,a4
>         slli    a6,a6,48
>         or      a6,a6,a5
>         vsetvli a5,zero,e64,m8,ta,ma
>         ld      a5,0(sp)
>         slli    a7,a7,56
>         or      a7,a7,a6
>         vmv.v.x v8,a7
>         vs8r.v  v8,0(a5)
>         ret
>
> We will support more optimization cases in the future, but they are not
> included in this patch.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (vec_init): New pattern.
> * config/riscv/riscv-protos.h (expand_vec_init): New function.
> * config/riscv/riscv-v.cc (class rvv_builder): New class.
> (rvv_builder::can_duplicate_repeating_sequence_p): New function.
> (rvv_builder::get_merged_repeating_sequence): Ditto.
> (expand_vector_init_insert_elems): Ditto.
> (expand_vec_init): Ditto.
> * config/riscv/vector-iterators.md: New attribute.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp:
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-6.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-6.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   |  16 ++
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv-v.cc   | 127 +++
>  gcc/config/riscv/vector-iterators.md  |   9 +
>  .../riscv/rvv/autovec/vls-vlmax/insert-1.c|  41 
>  .../riscv/rvv/autovec/vls-vlmax/insert-2.c|  41 
>  .../riscv/rvv/autovec/vls-vlmax/insert-3.c|  41 
>  .../rvv/autovec/vls-vlmax/insert_run-1.c  |  46 
>  .../rvv/autovec/vls-vlmax/insert_run-2.c  |  46 
>  .../riscv/rvv/autovec/vls-vlmax/repeat-1.c|  75 +++
>  .../riscv/rvv/autovec/vls-vlmax/repeat-2.c|  61 ++
>  .../riscv/rvv/autovec/vls-vlmax/repeat-3.c|  53 +
>  .../riscv/rvv/autovec/vls-vlmax/repeat-4.c|  39 
>  .../riscv/rvv/autovec/vls-vlmax/repeat-5.c|  74 +++
>  .../riscv/rvv/autovec/vls-vlmax/repeat-6.c|  78 +++
>  .../rvv/autovec/vls-vlmax/repeat_run-1.c  | 125 +++
>  .../rvv/autovec/vls-vlmax/repeat_run-2.c  | 145 +
>  .../rvv/autovec/vls-vlmax/repeat_run-3.c  | 202 ++
>  .../rvv/autovec/vls-vlmax/repeat_run-4.c  |  77 +++
>  .../rvv/autovec/vls-vlmax/repeat_run-5.c  | 124 +

[PATCH V3] RISC-V: Add basic vec_init for VLS RVV auto-vectorization

2023-05-11 Thread juzhe . zhong
From: Juzhe-Zhong 

#include 

typedef int8_t vnx16qi __attribute__((vector_size (16)));

#include 

typedef int8_t vnx16qi __attribute__ ((vector_size (16)));
typedef int8_t vnx32qi __attribute__ ((vector_size (32)));
typedef int8_t vnx64qi __attribute__ ((vector_size (64)));
typedef int8_t vnx128qi __attribute__ ((vector_size (128)));

__attribute__ ((noipa)) void
f_vnx128qi (int8_t a, int8_t b, int8_t c, int8_t d, int8_t e, int8_t f,
            int8_t g, int8_t h, int8_t *out)
{
  vnx128qi v
= {a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
   a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
   a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
   a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
   a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
   a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
   a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
   a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h};
  *(vnx128qi *) out = v;
}

This patch codegen:
f_vnx128qi:
        andi    a1,a1,0xff
        andi    a0,a0,0xff
        slli    a1,a1,8
        andi    a2,a2,0xff
        or      a1,a1,a0
        slli    a2,a2,16
        andi    a3,a3,0xff
        or      a2,a2,a1
        slli    a3,a3,24
        andi    a4,a4,0xff
        or      a3,a3,a2
        slli    a4,a4,32
        andi    a5,a5,0xff
        or      a4,a4,a3
        slli    a5,a5,40
        andi    a6,a6,0xff
        or      a5,a5,a4
        slli    a6,a6,48
        or      a6,a6,a5
        vsetvli a5,zero,e64,m8,ta,ma
        ld      a5,0(sp)
        slli    a7,a7,56
        or      a7,a7,a6
        vmv.v.x v8,a7
        vs8r.v  v8,0(a5)
        ret

We will support more optimization cases in the future, but they are not
included in this patch.

gcc/ChangeLog:

* config/riscv/autovec.md (vec_init): New pattern.
* config/riscv/riscv-protos.h (expand_vec_init): New function.
* config/riscv/riscv-v.cc (class rvv_builder): New class.
(rvv_builder::can_duplicate_repeating_sequence_p): New function.
(rvv_builder::get_merged_repeating_sequence): Ditto.
(expand_vector_init_insert_elems): Ditto.
(expand_vec_init): Ditto.
* config/riscv/vector-iterators.md: New attribute.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/insert_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/repeat_run-6.c: New test.

---
 gcc/config/riscv/autovec.md   |  16 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   | 127 +++
 gcc/config/riscv/vector-iterators.md  |   9 +
 .../riscv/rvv/autovec/vls-vlmax/insert-1.c|  41 
 .../riscv/rvv/autovec/vls-vlmax/insert-2.c|  41 
 .../riscv/rvv/autovec/vls-vlmax/insert-3.c|  41 
 .../rvv/autovec/vls-vlmax/insert_run-1.c  |  46 
 .../rvv/autovec/vls-vlmax/insert_run-2.c  |  46 
 .../riscv/rvv/autovec/vls-vlmax/repeat-1.c|  75 +++
 .../riscv/rvv/autovec/vls-vlmax/repeat-2.c|  61 ++
 .../riscv/rvv/autovec/vls-vlmax/repeat-3.c|  53 +
 .../riscv/rvv/autovec/vls-vlmax/repeat-4.c|  39 
 .../riscv/rvv/autovec/vls-vlmax/repeat-5.c|  74 +++
 .../riscv/rvv/autovec/vls-vlmax/repeat-6.c|  78 +++
 .../rvv/autovec/vls-vlmax/repeat_run-1.c  | 125 +++
 .../rvv/autovec/vls-vlmax/repeat_run-2.c  | 145 +
 .../rvv/autovec/vls-vlmax/repeat_run-3.c  | 202 ++
 .../rvv/autovec/vls-vlmax/repeat_run-4.c  |  77 +++
 .../rvv/autovec/vls-vlmax/repeat_run-5.c  | 124 +++
 .../rvv/autovec/vls-vlmax/repeat_run-6.c  | 122 +++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   4 +
 22 files changed, 1547 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/insert-1.c

[committed] RISC-V: Reorganize binary autovec testcases

2023-05-11 Thread Pan Li via Gcc-patches
From: Pan Li 

1. This patch moves the binary autovec testcases into the binop directory
   to make them easier to maintain.

2. Binary autovec is currently only tested with LMUL = 1; enable testing
   with LMUL = 2/4/8.

Tested on both rv32/rv64, with no fails in RVV.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/shift-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/shift-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-run.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-scalar-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-run.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-scalar-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h: ...here.
* gcc.target/riscv/rvv/autovec/shift-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vadd-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vadd-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vadd-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vadd-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vand-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vand-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vand-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vand-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vmax-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vmax-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vmax-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vmax-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vmin-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vmin-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vmin-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vmin-template.h: Moved to...
* gcc.target/riscv/rvv/autovec

[committed] RISC-V: Fix RVV binary auto-vectorization test fails

2023-05-11 Thread Pan Li via Gcc-patches
From: Pan Li 

In rv32:
FAIL: gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmin-run.c -O3 -ftree-vectorize (test
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vand-run.c -O3 -ftree-vectorize (test
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vrem-run.c -O3 -ftree-vectorize (test
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmul-run.c -O3 -ftree-vectorize (test
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-run.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vand-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vdiv-run.c -O3 -ftree-vectorize (test
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vor-run.c -O3 -ftree-vectorize (test
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-scalar-run.c -O3
-ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmax-run.c -O3 -ftree-vectorize (test
for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vor-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)

In rv64:
FAIL: gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c -O3 -ftree-vectorize
(test for excess errors)

Signed-off-by: Juzhe Zhong 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/shift-run.c: Fix fail.
* gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/shift-scalar-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vand-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vdiv-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmax-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmin-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmul-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vor-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vor-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vrem-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vxor-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vxor-rv64gcv.c: Ditto.
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-run.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-rv64gcv.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-scalar-run.c | 2 +-
 .../gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vand-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vand-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vdiv-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmax-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmin-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmul-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vor-run.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vor-rv64gcv.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vrem-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vxor-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vxor-rv64gcv.c | 2 +-
 22 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-run.c
index 67e9f8ca242..159478c6947 100644
--- a/

Re: [PATCH V2] RISC-V: Add basic vec_init for VLS RVV auto-vectorization

2023-05-11 Thread Kito Cheng via Gcc-patches
> This patch makes vec_init support common vector initialization (using
> vslide1down to insert each element), which can handle any initialization
> vector but is not optimal for some cases.
>
> It also supports the Case 1 optimization:
> https://godbolt.org/z/Yb9PK9jsz

Don't use godbolt links in the comment: they are not permanently
preserved on the server, and the reference is not stable because LLVM
trunk could improve.

> LLVM codegen:
> https://godbolt.org/z/xsnavvWqx
>
> ...
> vslide1down.vx (x128 times)
> ...

Drop the LLVM codegen here; again, it might improve. Healthy competition
is good, but I would like to avoid disparaging other compilers in
comments. :)


> ---
>  gcc/config/riscv/autovec.md   |  16 ++
>  gcc/config/riscv/riscv-protos.h   |   1 +
>  gcc/config/riscv/riscv-v.cc   | 127 +++
>  gcc/config/riscv/vector-iterators.md  |   9 +
>  .../gcc.target/riscv/rvv/autovec/insert-1.c   |  41 
>  .../gcc.target/riscv/rvv/autovec/insert-2.c   |  41 
>  .../gcc.target/riscv/rvv/autovec/insert-3.c   |  41 
>  .../riscv/rvv/autovec/insert_run-1.c  |  46 
>  .../riscv/rvv/autovec/insert_run-2.c  |  46 
>  .../gcc.target/riscv/rvv/autovec/repeat-1.c   |  75 +++
>  .../gcc.target/riscv/rvv/autovec/repeat-2.c   |  61 ++
>  .../gcc.target/riscv/rvv/autovec/repeat-3.c   |  53 +
>  .../gcc.target/riscv/rvv/autovec/repeat-4.c   |  39 
>  .../gcc.target/riscv/rvv/autovec/repeat-5.c   |  74 +++
>  .../gcc.target/riscv/rvv/autovec/repeat-6.c   |  78 +++
>  .../riscv/rvv/autovec/repeat_run-1.c  | 125 +++
>  .../riscv/rvv/autovec/repeat_run-2.c  | 145 +
>  .../riscv/rvv/autovec/repeat_run-3.c  | 203 ++
>  .../riscv/rvv/autovec/repeat_run-4.c  |  77 +++
>  .../riscv/rvv/autovec/repeat_run-5.c  | 124 +++
>  .../riscv/rvv/autovec/repeat_run-6.c  | 122 +++
>  21 files changed, 1544 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert-1.c

Could you reorganize the autovec folder to separate the vls-vlmax and vla tests?

>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert-2.c
...
> +/* Initialize register TARGET from the elements in PARALLEL rtx VALS.  */
> +
> +void
> +expand_vec_init (rtx target, rtx vals)
> +{
> +  machine_mode mode = GET_MODE (target);

I would like an assertion here to ensure that only VLS modes reach this point.


Re: [PATCH] RISC-V: Fix RVV binary auto-vectorization test fails

2023-05-11 Thread Kito Cheng via Gcc-patches
ok, thanks :)

On Fri, May 12, 2023 at 9:04 AM juzhe.zh...@rivai.ai
 wrote:
>
> This patch has been tested on both RV32 and RV64, and all RVV failures are cleaned up.
> Ok for trunk?
>
>
>
> juzhe.zh...@rivai.ai
>
> From: juzhe.zhong
> Date: 2023-05-12 07:29
> To: gcc-patches
> CC: kito.cheng; palmer; jeffreyalaw; Juzhe-Zhong
> Subject: [PATCH] RISC-V: Fix RVV binary auto-vectorization test fails
> From: Juzhe-Zhong 
>
> In rv32:
> FAIL: gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vmin-run.c -O3 -ftree-vectorize (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vand-run.c -O3 -ftree-vectorize (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vrem-run.c -O3 -ftree-vectorize (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vmul-run.c -O3 -ftree-vectorize (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/shift-run.c -O3 -ftree-vectorize (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vand-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vdiv-run.c -O3 -ftree-vectorize (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vor-run.c -O3 -ftree-vectorize (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/shift-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/shift-scalar-run.c -O3 -ftree-vectorize 
> (test for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vmax-run.c -O3 -ftree-vectorize (test for 
> excess errors)
> FAIL: gcc.target/riscv/rvv/autovec/vor-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
>
> In rv64:
> FAIL: gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c -O3 -ftree-vectorize (test 
> for excess errors)
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/shift-run.c: Fix fail.
> * gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/shift-scalar-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vand-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vdiv-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vmax-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vmin-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vmul-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vor-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vor-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vrem-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vxor-run.c: Ditto.
> * gcc.target/riscv/rvv/autovec/vxor-rv64gcv.c: Ditto.
>
> ---
> gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-run.c| 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-rv64gcv.c| 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-scalar-run.c | 2 +-
> .../gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c   | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vand-run.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vand-rv64gcv.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vdiv-run.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmax-run.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmin-run.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmul-run.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vor-run.c  | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vor-rv64gcv.c  | 2 +-
> gcc/testsuite/gcc.target/riscv/rvv/

Re: [PATCH] RISC-V: Reorganize binary autovec testcases

2023-05-11 Thread Kito Cheng via Gcc-patches
OK

On Fri, May 12, 2023 at 9:03 AM  wrote:
>
> From: Juzhe-Zhong 
>
> 1. This patch moves the binary autovec testcases into the binop directory to
>    make them easier to maintain.
>
> 2. The binary autovec tests currently run only with LMUL = 1; enable testing
>    with LMUL = 2/4/8 as well.
>
> Tested on both rv32 and rv64, with no RVV failures.
> Ok for trunk?
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/shift-run-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-run-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/shift-run.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-run.c: ...here.
> * gcc.target/riscv/rvv/autovec/shift-rv32gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/shift-scalar-run.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-scalar-run.c: ...here.
> * gcc.target/riscv/rvv/autovec/shift-scalar-rv32gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-scalar-rv32gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-scalar-rv64gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/shift-scalar-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/shift-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/shift-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vadd-run-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vadd-run.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vadd-run.c: ...here.
> * gcc.target/riscv/rvv/autovec/vadd-rv32gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/vadd-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vadd-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vand-run-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vand-run-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vand-run.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vand-run.c: ...here.
> * gcc.target/riscv/rvv/autovec/vand-rv32gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vand-rv32gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vand-rv64gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/vand-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vand-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vdiv-run-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vdiv-run.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: ...here.
> * gcc.target/riscv/rvv/autovec/vdiv-rv32gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/vdiv-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vmax-run-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vmax-run.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vmax-run.c: ...here.
> * gcc.target/riscv/rvv/autovec/vmax-rv32gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: ...here.
> * gcc.target/riscv/rvv/autovec/vmax-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vmax-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vmin-run-template.h: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h: ...here.
> * gcc.target/riscv/rvv/autovec/vmin-run.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vmin-run.c: ...here.
> * gcc.target/riscv/rvv/autovec/vmin-rv32gcv.c: Moved to...
> * gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: ...here.
> * gcc.target/riscv/r

[committed] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-11 Thread Pan Li via Gcc-patches
From: Pan Li 

Before this patch, decl_or_value is defined as void *. It holds either a
tree_node or an rtx_def. Unfortunately, a void pointer alone cannot tell
whether it points to a tree_node or an rtx_def.

As a result, there is an implicit structure layout requirement like the
one below; otherwise we would read the wrong bits when casting the
void * to tree_node or rtx_def.

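+--------+-----------+----------+
| offset | tree_node | rtx_def  |
+--------+-----------+----------+
|      0 | code: 16  | code: 16 | <- require the same location and bit size
+--------+-----------+----------+
|     16 | ...       | mode: 8  |
+--------+-----------+----------+
| ...                           |
+--------+-----------+----------+
|     24 | ...       | ...      |
+--------+-----------+----------+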

This behavior blocks the patch that extends the rtx_def mode from 8 to
16 bits, which is needed because we are running out of machine modes.
This patch introduces pointer_mux to tell whether the input is a
tree_node or an rtx_def, and removes the implicit dependency above.
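
To illustrate the idea, here is a minimal, self-contained sketch of a tagged
two-type pointer. It is not the code from gcc/mux-utils.h: tagged_mux,
decl_node, value_node and decl_or_value_sketch are hypothetical stand-ins for
pointer_mux, tree_node, rtx_def and decl_or_value. The sketch assumes that both
pointee types are at least 2-byte aligned and that the low bit of the converted
address is therefore zero, which holds on mainstream targets but is
implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for tree_node and rtx_def.  alignas(2) keeps the
// low address bit clear so it can be reused as a type tag.
struct alignas(2) decl_node { int uid; };
struct alignas(2) value_node { int regno; };

// A pointer to either T1 or T2.  The low bit of m_ptr is 0 for T1 (or null)
// and 1 for T2, so no field inside the pointee has to be inspected.
template <typename T1, typename T2>
class tagged_mux
{
public:
  tagged_mux () : m_ptr (0) {}
  tagged_mux (T1 *p) : m_ptr (reinterpret_cast<std::uintptr_t> (p)) {}
  tagged_mux (T2 *p) : m_ptr (reinterpret_cast<std::uintptr_t> (p) | 1) {}

  bool is_first () const { return (m_ptr & 1) == 0; }
  bool is_second () const { return (m_ptr & 1) != 0; }

  T1 *known_first () const
  {
    assert (is_first ());
    return reinterpret_cast<T1 *> (m_ptr);
  }

  T2 *known_second () const
  {
    assert (is_second ());
    // Clear the tag bit before turning the value back into a pointer.
    return reinterpret_cast<T2 *> (m_ptr & ~static_cast<std::uintptr_t> (1));
  }

  // Equality compares the address and the tag together.
  bool operator== (const tagged_mux &o) const { return m_ptr == o.m_ptr; }
  bool operator!= (const tagged_mux &o) const { return m_ptr != o.m_ptr; }

private:
  std::uintptr_t m_ptr;
};

typedef tagged_mux<decl_node, value_node> decl_or_value_sketch;
```

Because the tag lives in the pointer itself, the two pointee structures no
longer need to agree on where their code field sits, which is the coupling the
layout table above describes.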

Signed-off-by: Pan Li 
Co-Authored-By: Richard Sandiford 
Co-Authored-By: Richard Biener 
Co-Authored-By: Jakub Jelinek 

gcc/ChangeLog:

* mux-utils.h: Add overload operator == and != for pointer_mux.
* var-tracking.cc: Included mux-utils.h for pointer_mux.
(decl_or_value): Changed from void * to pointer_mux.
(dv_is_decl_p): Reconciled to the new type, aka pointer_mux.
(dv_as_decl): Ditto.
(dv_as_opaque): Removed as unnecessary.
(struct variable_hasher): Take decl_or_value as compare_type.
(variable_hasher::equal): Ditto.
(dv_from_decl): Reconciled to the new type, aka pointer_mux.
(dv_from_value): Ditto.
(attrs_list_member): Ditto.
(vars_copy): Ditto.
(var_reg_decl_set): Ditto.
(var_reg_delete_and_set): Ditto.
(find_loc_in_1pdv): Ditto.
(canonicalize_values_star): Ditto.
(variable_post_merge_new_vals): Ditto.
(dump_onepart_variable_differences): Ditto.
(variable_different_p): Ditto.
(set_slot_part): Ditto.
(clobber_slot_part): Ditto.
(clobber_variable_part): Ditto.
---
 gcc/mux-utils.h |  4 +++
 gcc/var-tracking.cc | 85 ++---
 2 files changed, 37 insertions(+), 52 deletions(-)

diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
index a2b6a316899..486d80915b1 100644
--- a/gcc/mux-utils.h
+++ b/gcc/mux-utils.h
@@ -117,6 +117,10 @@ public:
   //  ...use ptr.known_second ()...
   T2 *second_or_null () const;
 
+  bool operator == (const pointer_mux &pm) const { return m_ptr == pm.m_ptr; }
+
+  bool operator != (const pointer_mux &pm) const { return m_ptr != pm.m_ptr; }
+
   // Return true if the pointer is a T.
   //
   // This is only valid if T1 and T2 are distinct and if T can be
diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
index fae0c73e02f..68d440d222e 100644
--- a/gcc/var-tracking.cc
+++ b/gcc/var-tracking.cc
@@ -116,6 +116,7 @@
 #include "fibonacci_heap.h"
 #include "print-rtl.h"
 #include "function-abi.h"
+#include "mux-utils.h"
 
 typedef fibonacci_heap  bb_heap_t;
 
@@ -197,14 +198,14 @@ struct micro_operation
 
 
 /* A declaration of a variable, or an RTL value being handled like a
-   declaration.  */
-typedef void *decl_or_value;
+   declaration by pointer_mux.  */
+typedef pointer_mux<tree_node, rtx_def> decl_or_value;
 
 /* Return true if a decl_or_value DV is a DECL or NULL.  */
 static inline bool
 dv_is_decl_p (decl_or_value dv)
 {
-  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
+  return dv.is_first ();
 }
 
 /* Return true if a decl_or_value is a VALUE rtl.  */
@@ -219,7 +220,7 @@ static inline tree
 dv_as_decl (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_decl_p (dv));
-  return (tree) dv;
+  return dv.known_first ();
 }
 
 /* Return the value in the decl_or_value.  */
@@ -227,14 +228,7 @@ static inline rtx
 dv_as_value (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_value_p (dv));
-  return (rtx)dv;
-}
-
-/* Return the opaque pointer in the decl_or_value.  */
-static inline void *
-dv_as_opaque (decl_or_value dv)
-{
-  return dv;
+  return dv.known_second ();
 }
 
 
@@ -483,9 +477,9 @@ static void variable_htab_free (void *);
 
 struct variable_hasher : pointer_hash <variable>
 {
-  typedef void *compare_type;
+  typedef decl_or_value compare_type;
   static inline hashval_t hash (const variable *);
-  static inline bool equal (const variable *, const void *);
+  static inline bool equal (const variable *, const decl_or_value);
   static inline void remove (variable *);
 };
 
@@ -501,11 +495,9 @@ variable_hasher::hash (const variable *v)
 /* Compare the declaration of variable X with declaration Y.  */
 
 inline bool
-variable_hasher::equal (const variable *v, const void *y)
+variable_hasher::equal (const variable *v, const decl_or_value y)
 {
-  decl_or_value dv = CONST_CAST2 (decl_or_value, const void *, y);
-
-  return (dv_as_opaque (v->dv) == dv_as_opaque (dv));

Re: [PATCH v2] RISC-V: Add vector_scalar_shift_operand

2023-05-11 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Fri, May 12, 2023 at 06:32, Palmer Dabbelt wrote:

> The vector shift immediates happen to have the same constraints as some
> of the CSR-related operands, but it's a different usage.  This adds a
> name for them, so I don't get confused again next time.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (shifts): Use
>   vector_scalar_shift_operand.
> * config/riscv/predicates.md (vector_scalar_shift_operand): New
>   predicate.
> ---
> Still haven't built-tested it, my box is busy.
>
> Changes since v1 <20230511182555.26183-1-pal...@rivosinc.com>:
> * Change the name to "vector_scalar_shift_operand", as per Juzhe's
>   suggestion.
> * Add a missing second ";" in the comment.
> ---
>  gcc/config/riscv/autovec.md| 2 +-
>  gcc/config/riscv/predicates.md | 5 +
>  2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index ac0c939d277..4561fcbe957 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -132,7 +132,7 @@ (define_expand "3"
>[(set (match_operand:VI 0 "register_operand")
>  (any_shift:VI
>   (match_operand:VI 1 "register_operand")
> - (match_operand: 2 "csr_operand")))]
> + (match_operand: 2 "vector_scalar_shift_operand")))]
>"TARGET_VECTOR"
>  {
>if (!CONST_SCALAR_INT_P (operands[2]))
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index e5adf06fa25..90e6f942c97 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -43,6 +43,11 @@ (define_predicate "csr_operand"
>(ior (match_operand 0 "const_csr_operand")
> (match_operand 0 "register_operand")))
>
> +;; V has 32-bit unsigned immediates.  This happens to be the same constraint as
> +;; the csr_operand, but it's not CSR related.
> +(define_predicate "vector_scalar_shift_operand"
> +  (match_operand 0 "csr_operand"))
> +
>  (define_predicate "sle_operand"
>(and (match_code "const_int")
> (match_test "SMALL_OPERAND (INTVAL (op) + 1)")))
> --
> 2.40.0
>
>
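
As a rough illustration of why a separate name helps, the sketch below models
the two predicates from the quoted patch in plain C++. It is not GCC code: the
operand struct and all three function names are hypothetical, and the 0..31
range is an assumption based on the 5-bit unsigned immediate field that RISC-V
CSR-immediate and vector shift-immediate instructions encode. Both predicates
accept exactly the same operands; only the name documents the intent.

```cpp
#include <cstdint>

// Hypothetical, simplified operand: either a register or a constant integer.
struct operand
{
  bool is_register;
  std::int64_t value; // Only meaningful when is_register is false.
};

// Assumed range of a 5-bit unsigned immediate.
static bool
fits_uimm5 (std::int64_t v)
{
  return v >= 0 && v <= 31;
}

// Models csr_operand: a register or a small unsigned immediate.
static bool
csr_operand_p (const operand &op)
{
  return op.is_register || fits_uimm5 (op.value);
}

// Models vector_scalar_shift_operand: the same check, under a name that
// says what the operand is used for.
static bool
vector_scalar_shift_operand_p (const operand &op)
{
  return csr_operand_p (op);
}
```

If the vector shift constraint ever diverges from the CSR one, only the second
function would need to change, and the shift patterns would not have to be
touched again.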


[PATCH V2] RISC-V: Add basic vec_init for VLS RVV auto-vectorization

2023-05-11 Thread juzhe . zhong
From: Juzhe-Zhong 

Rebase to trunk.

This patch adds basic vec_init support for RVV auto-vectorization.
It has been tested with full coverage.

This patch makes vec_init support common vector initialization (using
vslide1down to insert each element), which can handle any initialization
vector but is not optimal for some cases.

It also supports the Case 1 optimization:
https://godbolt.org/z/Yb9PK9jsz

#include <stdint.h>

typedef int8_t vnx16qi __attribute__ ((vector_size (16)));
typedef int8_t vnx32qi __attribute__ ((vector_size (32)));
typedef int8_t vnx64qi __attribute__ ((vector_size (64)));
typedef int8_t vnx128qi __attribute__ ((vector_size (128)));

__attribute__ ((noipa)) void
f_vnx128qi (int8_t a, int8_t b, int8_t c, int8_t d, int8_t e, int8_t f,
            int8_t g, int8_t h, int8_t *out)
{
  vnx128qi v
    = {a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
       a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
       a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
       a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
       a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
       a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
       a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h,
       a, b, c, d, e, f, g, h, a, b, c, d, e, f, g, h};
  *(vnx128qi *) out = v;
}

LLVM codegen:
https://godbolt.org/z/xsnavvWqx

...
vslide1down.vx (x128 times)
...


This patch codegen:
f_vnx128qi:
        andi    a1,a1,0xff
        andi    a0,a0,0xff
        slli    a1,a1,8
        andi    a2,a2,0xff
        or      a1,a1,a0
        slli    a2,a2,16
        andi    a3,a3,0xff
        or      a2,a2,a1
        slli    a3,a3,24
        andi    a4,a4,0xff
        or      a3,a3,a2
        slli    a4,a4,32
        andi    a5,a5,0xff
        or      a4,a4,a3
        slli    a5,a5,40
        andi    a6,a6,0xff
        or      a5,a5,a4
        slli    a6,a6,48
        or      a6,a6,a5
        vsetvli a5,zero,e64,m8,ta,ma
        ld      a5,0(sp)
        slli    a7,a7,56
        or      a7,a7,a6
        vmv.v.x v8,a7
        vs8r.v  v8,0(a5)
        ret
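
The sequence above can be read as follows: the eight byte arguments are masked,
shifted and OR-ed into one 64-bit value, and that value is then broadcast as a
64-bit element (vmv.v.x under e64) across the 128-byte vector before the store.
A minimal scalar C++ model of that idea is sketched here. It assumes a
little-endian target, and pack8 and fill_repeating are hypothetical names used
for illustration only; they are not functions from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Pack eight bytes into one 64-bit word with b[0] in the lowest byte.
// This mirrors the andi/slli/or chain in the assembly.
static std::uint64_t
pack8 (const std::int8_t b[8])
{
  std::uint64_t word = 0;
  for (int i = 0; i < 8; ++i)
    word |= static_cast<std::uint64_t> (static_cast<std::uint8_t> (b[i]))
            << (8 * i);
  return word;
}

// Store the packed word n_words times, like a 64-bit element broadcast
// followed by a whole-vector store.
static void
fill_repeating (std::int8_t *out, const std::int8_t b[8], std::size_t n_words)
{
  const std::uint64_t word = pack8 (b);
  for (std::size_t i = 0; i < n_words; ++i)
    std::memcpy (out + 8 * i, &word, sizeof word);
}
```

Calling fill_repeating with n_words = 16 corresponds to the 128-byte vnx128qi
case: one scalar packing step replaces 128 single-element insertions.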


We will support more optimization cases in the future, but they are not
included in this patch.

---
 gcc/config/riscv/autovec.md   |  16 ++
 gcc/config/riscv/riscv-protos.h   |   1 +
 gcc/config/riscv/riscv-v.cc   | 127 +++
 gcc/config/riscv/vector-iterators.md  |   9 +
 .../gcc.target/riscv/rvv/autovec/insert-1.c   |  41 
 .../gcc.target/riscv/rvv/autovec/insert-2.c   |  41 
 .../gcc.target/riscv/rvv/autovec/insert-3.c   |  41 
 .../riscv/rvv/autovec/insert_run-1.c  |  46 
 .../riscv/rvv/autovec/insert_run-2.c  |  46 
 .../gcc.target/riscv/rvv/autovec/repeat-1.c   |  75 +++
 .../gcc.target/riscv/rvv/autovec/repeat-2.c   |  61 ++
 .../gcc.target/riscv/rvv/autovec/repeat-3.c   |  53 +
 .../gcc.target/riscv/rvv/autovec/repeat-4.c   |  39 
 .../gcc.target/riscv/rvv/autovec/repeat-5.c   |  74 +++
 .../gcc.target/riscv/rvv/autovec/repeat-6.c   |  78 +++
 .../riscv/rvv/autovec/repeat_run-1.c  | 125 +++
 .../riscv/rvv/autovec/repeat_run-2.c  | 145 +
 .../riscv/rvv/autovec/repeat_run-3.c  | 203 ++
 .../riscv/rvv/autovec/repeat_run-4.c  |  77 +++
 .../riscv/rvv/autovec/repeat_run-5.c  | 124 +++
 .../riscv/rvv/autovec/repeat_run-6.c  | 122 +++
 21 files changed, 1544 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/insert_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat_run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat_run-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat_run-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/repeat_run-6.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ac0c939d277..ce0b46537ad 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autov

Re: [PATCH] RISC-V: Fix RVV binary auto-vectorization test fails

2023-05-11 Thread juzhe.zh...@rivai.ai
This patch has been tested on both RV32 and RV64, and all RVV failures are cleaned up.
Ok for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-12 07:29
To: gcc-patches
CC: kito.cheng; palmer; jeffreyalaw; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix RVV binary auto-vectorization test fails
From: Juzhe-Zhong 
 
In rv32:
FAIL: gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmin-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vand-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vrem-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmul-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vand-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vdiv-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vor-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-scalar-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmax-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vor-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
 
In rv64:
FAIL: gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/shift-run.c: Fix fail.
* gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/shift-scalar-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vand-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vdiv-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmax-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmin-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmul-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vor-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vor-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vrem-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vxor-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vxor-rv64gcv.c: Ditto.
 
---
gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-run.c| 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-rv64gcv.c| 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-scalar-run.c | 2 +-
.../gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c   | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vand-run.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vand-rv64gcv.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vdiv-run.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmax-run.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmin-run.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmul-run.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vor-run.c  | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vor-rv64gcv.c  | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vrem-run.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c | 2 +-
gcc/testsuite/gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c | 4 ++--
gcc/testsuite/gcc.target/riscv/rvv/autovec/vxor-run.c | 2 +-
gcc/testsuite/gcc.ta

[PATCH] RISC-V: Reorganize binary autovec testcases

2023-05-11 Thread juzhe . zhong
From: Juzhe-Zhong 

1. This patch moves the binary autovec testcases into a binop directory to
make them easier to maintain.

2. Binary autovec is currently only tested with LMUL = 1; this patch enables
testing with LMUL = 2/4/8.

Tested on both rv32 and rv64, with no failures in RVV.
Ok for trunk?

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/shift-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/shift-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-run.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-scalar-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-run.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/shift-scalar-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-scalar-template.h: ...here.
* gcc.target/riscv/rvv/autovec/shift-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/shift-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vadd-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vadd-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vadd-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vadd-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vand-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vand-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vand-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vand-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vand-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vdiv-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vmax-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vmax-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vmax-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vmax-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vmin-run-template.h: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmin-run-template.h: ...here.
* gcc.target/riscv/rvv/autovec/vmin-run.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: ...here.
* gcc.target/riscv/rvv/autovec/vmin-rv32gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c: Moved to...
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: ...here.
* gcc.target/riscv/rvv/autovec/vmin-template.h: Moved to...
* gcc.target/ri

Re: [committed] libstdc++: Enforce value_type consistency in strings and streams

2023-05-11 Thread Jonathan Wakely via Gcc-patches
On Thu, 11 May 2023 at 21:20, Jonathan Wakely via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Tested powerpc64le-linux. Pushed to trunk.
>
> I don't plan to backport the assertions, because they're an API change
> that isn't suitable for the branches. But removing _Alloc_traits_impl
> and replacing it with _S_allocate should be done for gcc-13 to keep the
> contents of the two libstdc++.so.6.0.32 libraries in sync.
>
>
Here's the gcc-13 backport. No new assertions, just the new exported symbol.
commit 0d5a359140503d26adf11325e1f9a09ba7067dfc
Author: Jonathan Wakely 
Date:   Wed May 10 21:30:10 2023

libstdc++: Backport std::basic_string::_S_allocate from trunk

This is a backport of r14-739-gc62e945492afbb to keep the exported
symbol list consistent between trunk and gcc-13. The new assertions from
that commit are not part of this backport.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver: Export basic_string::_S_allocate.
* include/bits/basic_string.h (basic_string::_Alloc_traits_impl):
Remove class template.
(basic_string::_S_allocate): New static member function.
(basic_string::assign): Use _S_allocate.
* include/bits/basic_string.tcc (basic_string::_M_create)
(basic_string::reserve, basic_string::_M_replace): Likewise.

(cherry picked from commit c62e945492afbbd2a09896fc7b0b07f7e719a606)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 36bb87880d7..768cd4a4a6c 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1759,7 +1759,9 @@ GLIBCXX_3.4.21 {
 #endif
 
 # ABI-tagged std::basic_string
-_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE1[01]**;
+_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE10_M_[dr]*;
+_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE10_S_compareE[jmy][jmy];
+_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE11_M_capacityE[jmy];
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE12_Alloc_hiderC[12]EP[cw]RKS3_;
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE12_M*;
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE13*;
@@ -2516,6 +2518,7 @@ GLIBCXX_3.4.31 {
 
 GLIBCXX_3.4.32 {
 _ZSt21ios_base_library_initv;
+_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE11_S_allocateERS3_[jmy];
 } GLIBCXX_3.4.31;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index b16b2898b62..870b4728928 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -89,36 +89,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   typedef typename __gnu_cxx::__alloc_traits<_Alloc>::template
rebind<_CharT>::other _Char_alloc_type;
 
-#if __cpp_lib_constexpr_string < 201907L
   typedef __gnu_cxx::__alloc_traits<_Char_alloc_type> _Alloc_traits;
-#else
-  template<typename _Traits2, typename _Dummy_for_PR85282>
-   struct _Alloc_traits_impl : __gnu_cxx::__alloc_traits<_Char_alloc_type>
-   {
- typedef __gnu_cxx::__alloc_traits<_Char_alloc_type> _Base;
-
- [[__gnu__::__always_inline__]]
- static constexpr typename _Base::pointer
- allocate(_Char_alloc_type& __a, typename _Base::size_type __n)
- {
-   pointer __p = _Base::allocate(__a, __n);
-   if (std::is_constant_evaluated())
- // Begin the lifetime of characters in allocated storage.
- for (size_type __i = 0; __i < __n; ++__i)
-   std::construct_at(__builtin_addressof(__p[__i]));
-   return __p;
- }
-   };
-
-  template<typename _Dummy_for_PR85282>
-   struct _Alloc_traits_impl<char_traits<_CharT>, _Dummy_for_PR85282>
-   : __gnu_cxx::__alloc_traits<_Char_alloc_type>
-   {
- // std::char_traits begins the lifetime of characters.
-   };
-
-  using _Alloc_traits = _Alloc_traits_impl<_Traits, void>;
-#endif
 
   // Types:
 public:
@@ -149,6 +120,22 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 #endif
 
 private:
+  static _GLIBCXX20_CONSTEXPR pointer
+  _S_allocate(_Char_alloc_type& __a, size_type __n)
+  {
+   pointer __p = _Alloc_traits::allocate(__a, __n);
+#if __cpp_lib_constexpr_string >= 201907L
+   // std::char_traits begins the lifetime of characters,
+   // but custom traits might not, so do it here.
+   if constexpr (!is_same_v<_Traits, char_traits<_CharT>>)
+ if (std::__is_constant_evaluated())
+   // Begin the lifetime of characters in allocated storage.
+   for (size_type __i = 0; __i < __n; ++__i)
+ std::construct_at(__builtin_addressof(__p[__i]));
+#endif
+   return __p;
+  }
+
 #if __cplusplus >= 201703L
   // A helper type for avoiding boiler-plate.
   typedef basic_string_view<_CharT, _Traits> __

[PATCH] RISC-V: Fix RVV binary auto-vectorization test fails

2023-05-11 Thread juzhe . zhong
From: Juzhe-Zhong 

In rv32:
FAIL: gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmin-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vand-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vrem-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmul-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vand-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vdiv-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vor-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/shift-scalar-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vmax-run.c -O3 -ftree-vectorize (test for excess errors)
FAIL: gcc.target/riscv/rvv/autovec/vor-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)

In rv64:
FAIL: gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c -O3 -ftree-vectorize (test for excess errors)

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/shift-run.c: Fix fail.
* gcc.target/riscv/rvv/autovec/shift-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/shift-scalar-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vand-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vand-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vdiv-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmax-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmin-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmul-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vor-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vor-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vrem-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/vxor-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/vxor-rv64gcv.c: Ditto.

---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-run.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-rv64gcv.c| 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-scalar-run.c | 2 +-
 .../gcc.target/riscv/rvv/autovec/shift-scalar-rv64gcv.c   | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vadd-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vand-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vand-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vdiv-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vdiv-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmax-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmax-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmin-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmin-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmul-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmul-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vor-run.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vor-rv64gcv.c  | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vrem-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vrem-rv64gcv.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vsub-rv64gcv.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vxor-run.c | 2 +-
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vxor-rv64gcv.c | 2 +-
 22 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-run.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/shift-run.c
index 67e9f8ca242..159478c6947 100644
--- a/gcc/

Re: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-11 Thread 钟居哲
Hi, Richards.

Could you take a look at this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618241.html 

Thanks


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-11 20:42
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support
"juzhe.zh...@rivai.ai"  writes:
> Thanks. I have read the rgroup descriptions again.
> I still don't fully understand them, so bear with me :)
>
> I don't know how to differentiate Case 2 and Case 3.
>
> Case 2 is multiple rgroups for SLP.
> Case 3 is multiple rgroups for non-SLP (VEC_PACK_TRUNC).
>
> Is this correct?
> Case 2: rgc->max_nscalars_per_iter != 1
 
Yes.
 
> Case 3: rgc->max_nscalars_per_iter == 1 but rgc->factor != 1?
 
For case 3 it's:
 
rgc->max_nscalars_per_iter == 1 && rgc != &LOOP_VINFO_LENS (loop_vinfo)[0]
 
rgc->factor is controlled by the target and just says what units
IFN_LEN_LOAD works in.  E.g. if we're loading 16-bit elements,
but the underlying instruction measures bytes, the factor would be 2.
 
Thanks,
Richard
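
For readers following the thread, the two tests above can be written out as a
small sketch.  This is only an illustration: rgroup_sketch, rgroup_kind,
classify and length_in_target_units are hypothetical names, the vector stands
in for LOOP_VINFO_LENS (loop_vinfo), and labelling the remaining case as
"first rgroup, one scalar per iteration" is an assumption rather than
something stated in the thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-in for the two rgroup fields discussed
// above; this is not GCC's real rgroup_controls.
struct rgroup_sketch
{
  unsigned max_nscalars_per_iter; // scalars handled per scalar iteration
  unsigned factor;                // target units per element
};

enum class rgroup_kind
{
  first_single_scalar, // assumed label for the remaining case
  slp_multiple,        // case 2: max_nscalars_per_iter != 1
  non_slp_multiple     // case 3: one scalar per iteration, not lens[0]
};

// Apply the tests in the order given in the reply above.
rgroup_kind
classify (const std::vector<rgroup_sketch> &lens, const rgroup_sketch *rgc)
{
  if (rgc->max_nscalars_per_iter != 1)
    return rgroup_kind::slp_multiple;
  if (rgc != &lens[0])
    return rgroup_kind::non_slp_multiple;
  return rgroup_kind::first_single_scalar;
}

// The factor only converts an element count into the unit the target's
// length-controlled load/store measures (for example bytes).
unsigned
length_in_target_units (const rgroup_sketch &rgc, unsigned nelements)
{
  return nelements * rgc.factor;
}
```

With 2-byte elements and an instruction that counts bytes, factor is 2, so 8
elements correspond to a length of 16.  The factor plays no part in telling
case 2 from case 3.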
 


[PATCH V6] VECT: Add decrement IV support in Loop Vectorizer

2023-05-11 Thread juzhe . zhong
From: Ju-Zhe Zhong 

1. Fix the documentation as suggested by Jeff and Richard.
2. Add LOOP_VINFO_USING_SELECT_VL_P for single rgroup.
3. Add LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P for SLP multiple rgroup.

Fix bugs found in V5 after testing:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618209.html

gcc/ChangeLog:

* doc/md.texi: Add select_vl pattern.
* internal-fn.def (SELECT_VL): New ifn.
* optabs.def (OPTAB_D): New optab.
* tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
(vect_set_loop_controls_by_select_vl): Ditto.
(vect_set_loop_condition_partial_vectors): Add loop control for 
decrement IV.
* tree-vect-loop.cc (vect_get_loop_len): Adjust loop len for SLP.
* tree-vect-stmts.cc (get_select_vl_data_ref_ptr): New function.
(vectorizable_store): Support data reference IV added by outcome of 
SELECT_VL.
(vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): New macro.
(LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P): Ditto.
(vect_get_loop_len): Adjust loop len for SLP.

---
 gcc/doc/md.texi |  36 
 gcc/internal-fn.def |   1 +
 gcc/optabs.def  |   1 +
 gcc/tree-vect-loop-manip.cc | 380 +++-
 gcc/tree-vect-loop.cc   |  31 ++-
 gcc/tree-vect-stmts.cc  |  79 +++-
 gcc/tree-vectorizer.h   |  12 +-
 7 files changed, 526 insertions(+), 14 deletions(-)
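
The two select_vl behaviours described in the md.texi hunk of this patch can
be modelled in plain scalar code.  The sketch below is only an illustration
under assumptions: select_vl_case1, select_vl_case2 and strip_mine are
hypothetical helper names, operand 1 is treated as the number of elements
still to process and operand 2 as a fixed vectorization factor, and nothing
here is taken from the vectorizer implementation itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Case 1: operand0 = MIN (operand1, operand2).
std::uint64_t
select_vl_case1 (std::uint64_t remaining, std::uint64_t vf)
{
  return remaining < vf ? remaining : vf;
}

// Case 2: spread the tail evenly over the last two iterations.
std::uint64_t
select_vl_case2 (std::uint64_t remaining, std::uint64_t vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2; // ceil (remaining / 2)
  return vf;
}

using select_vl_fn = std::uint64_t (*) (std::uint64_t, std::uint64_t);

// Model of a strip-mined loop with a decrementing IV: each iteration
// handles select_vl (remaining, vf) elements and subtracts that amount.
// Returns the number of elements handled in each iteration.
std::vector<std::uint64_t>
strip_mine (std::uint64_t n, std::uint64_t vf, select_vl_fn select_vl)
{
  assert (vf > 0); // a zero factor would never make progress
  std::vector<std::uint64_t> lens;
  while (n > 0)
    {
      std::uint64_t vl = select_vl (n, vf);
      lens.push_back (vl);
      n -= vl;
    }
  return lens;
}
```

For n = 10 and a factor of 4, the case 1 model processes 4, 4, 2 elements and
the case 2 model processes 4, 3, 3.  Either way the loop counter and any
address IV advance by the selected length, not by a fixed step.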

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8ebce31ba78..a94ffc4456d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,42 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of active elements in a vector to be updated 
+in a loop iteration based on the total number of elements to be updated, 
+the vectorization factor and vector properties of the target.
+operand 1 is the total elements in the vector to be updated.
+operand 2 is the vectorization factor.
+The value of operand 0 is target dependent and flexible in each iteration.
+The operation of this pattern can be:
+
+@smallexample
+Case 1:
+operand0 = MIN (operand1, operand2);
+operand2 can be const_poly_int or poly_int related to vector mode size.
+Some target like RISC-V has a standalone instruction to get MIN (n, MODE SIZE) 
so
+that we can reduce a use of general purpose register.
+
+In this case, only the last iteration of the loop is partial iteration.
+@end smallexample
+
+@smallexample
+Case 2:
+if (operand1 <= operand2)
+  operand0 = operand1;
+else if (operand1 < 2 * operand2)
+  operand0 = ceil (operand1 / 2);
+else
+  operand0 = operand2;
+
+This case will evenly distribute work over the last 2 iterations of a stripmine loop.
+@end smallexample
+
+The output of this pattern is not only used as IV of loop control counter, but also
+is used as the IV of address calculation with multiply/shift operation. This allows
+dynamic adjustment of the number of elements processed each loop iteration.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae7..6f6fa7d37f9 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b30..b637471b76e 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ff6159e08d5..81334f4f171 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -385,6 +385,353 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, rgroup_controls *dest_rgm,
   return false;
 }
 
+/* Try to use adjust loop lens for non-SLP multiple-rgroups.
+
+ _36 = MIN_EXPR ;
+
+ First length (MIN (X, VF/N)):
+   loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
+
+ Second length (X - MIN (X, 1 * VF/N)):
+   loop_len_16 = _36 - loop_len_15;
+
+ Third length (X - MIN (X, 2 * VF/N)):
+   _38 = MIN_EXPR <_36, POLY_INT_CST [4,

Re: Re: [PATCH] RISC-V: Add v_uimm_operand

2023-05-11 Thread 钟居哲
LGTM



juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-05-12 06:31
To: juzhe.zhong
CC: gcc-patches; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add v_uimm_operand
On Thu, 11 May 2023 15:00:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:
>>>  ;; V has 32-bit unsigned immediates.  This happens to be the same constraint as
> It should be 5-bit unsigned immediates
>>> ;  the csr_operand, but it's not CSR related.
>>> (define_predicate "v_uimm_operand"
>>>   (match_operand 0 "csr_operand"))
> To make name consistent, it should be "vector_", so I suggest it to be
> "vector_scalar_shift_operand".
 
Makes sense, I sent a v2.
 


Re: [PATCH] RISC-V: Add v_uimm_operand

2023-05-11 Thread Palmer Dabbelt

On Thu, 11 May 2023 15:00:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:

>>  ;; V has 32-bit unsigned immediates.  This happens to be the same constraint as
> It should be 5-bit unsigned immediates
>> ;  the csr_operand, but it's not CSR related.
>> (define_predicate "v_uimm_operand"
>>   (match_operand 0 "csr_operand"))
>
> To make name consistent, it should be "vector_", so I suggest it to be
> "vector_scalar_shift_operand".


Makes sense, I sent a v2.


[PATCH v2] RISC-V: Add vector_scalar_shift_operand

2023-05-11 Thread Palmer Dabbelt
The vector shift immediates happen to have the same constraints as some
of the CSR-related operands, but it's a different usage.  This adds a
name for them, so I don't get confused again next time.

gcc/ChangeLog:

* config/riscv/autovec.md (shifts): Use
  vector_scalar_shift_operand.
* config/riscv/predicates.md (vector_scalar_shift_operand): New
  predicate.
---
Still haven't build-tested it, my box is busy.

Changes since v1 <20230511182555.26183-1-pal...@rivosinc.com>:
* Change the name to "vector_scalar_shift_operand", as per Juzhe's
  suggestion.
* Add a missing second ";" in the comment.
---
 gcc/config/riscv/autovec.md| 2 +-
 gcc/config/riscv/predicates.md | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ac0c939d277..4561fcbe957 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -132,7 +132,7 @@ (define_expand "3"
   [(set (match_operand:VI 0 "register_operand")
 (any_shift:VI
  (match_operand:VI 1 "register_operand")
- (match_operand: 2 "csr_operand")))]
+ (match_operand: 2 "vector_scalar_shift_operand")))]
   "TARGET_VECTOR"
 {
   if (!CONST_SCALAR_INT_P (operands[2]))
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index e5adf06fa25..90e6f942c97 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -43,6 +43,11 @@ (define_predicate "csr_operand"
   (ior (match_operand 0 "const_csr_operand")
(match_operand 0 "register_operand")))
 
+;; V has 32-bit unsigned immediates.  This happens to be the same constraint as
+;; the csr_operand, but it's not CSR related.
+(define_predicate "vector_scalar_shift_operand"
+  (match_operand 0 "csr_operand"))
+
 (define_predicate "sle_operand"
   (and (match_code "const_int")
(match_test "SMALL_OPERAND (INTVAL (op) + 1)")))
-- 
2.40.0



[PATCH] RISC-V: Add v_uimm_operand

2023-05-11 Thread 钟居哲
>>  ;; V has 32-bit unsigned immediates.  This happens to be the same constraint as
It should be 5-bit unsigned immediates.
>> ;  the csr_operand, but it's not CSR related.
>> (define_predicate "v_uimm_operand"
>>   (match_operand 0 "csr_operand"))
To keep the naming consistent, it should start with "vector_", so I suggest
"vector_scalar_shift_operand".

Thanks.
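
For readers following along, the "5-bit unsigned immediate" range mentioned above can be sketched in plain C++. This is only an illustration: fits_uimm5 is a hypothetical helper, not GCC's predicate code, and it assumes the immediate is an unsigned 5-bit field, i.e. 0 through 31.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of GCC): true when `value` can be encoded
// as a 5-bit unsigned immediate, i.e. lies in the range [0, 31].
constexpr bool fits_uimm5(std::int64_t value)
{
  return value >= 0 && value <= 31;
}

static_assert(fits_uimm5(31), "largest 5-bit unsigned value");
static_assert(!fits_uimm5(32), "needs a sixth bit");
```

A shift amount outside this range would have to live in a register rather than be encoded as an immediate.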


juzhe.zh...@rivai.ai


Re: [PATCH] libstdc++/complex: Remove implicit type casts in complex

2023-05-11 Thread Jonathan Wakely via Gcc-patches
On Mon, 27 Mar 2023 at 22:25, Weslley da Silva Pereira via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Dear all,
>
> Here follows a patch that removes implicit type casts in std::complex.
>
> *Description:* The current implementation of `complex<_Tp>` assumes that
> `int, double, long double` are explicitly convertible to `_Tp`. Moreover,
> it also assumes that:
>
> 1. `int` is implicitly convertible to `_Tp`, e.g., when using
> `complex<_Tp>(1)`.
> 2. `long double` can be attributed to a `_Tp` variable, e.g., when using
> `const _Tp __pi_2 = 1.5707963267948966192313216916397514L`.
>
> This patch transforms the implicit casts (1) and (2) into explicit type
> casts. As a result, `std::complex` is now able to support more types. One
> example is the type `Eigen::Half` from
> https://eigen.tuxfamily.org/dox-devel/Half_8h_source.html which does not
> implement implicit type conversions.
>
> *ChangeLog:*
> libstdc++-v3/ChangeLog:
>
> * include/std/complex:
>

Thank you for the patch. Now that we're in development stage 1 for GCC 14,
it's time to consider it.

You're missing a proper changelog entry, I suggest:

   * include/std/complex (polar, __complex_sqrt)
   (__complex_pow_unsigned, pow, __complex_acos): Replace implicit
   conversions from int and long double to value_type.

You're also missing either a copyright assignment on file with the FSF
(unless you've completed that paperwork?), or a DCO sign-off. Please see
https://gcc.gnu.org/contribute.html#legal and https://gcc.gnu.org/dco.html
for more details.


>
> *Patch:* fix_complex.diff. (Also at
> https://github.com/gcc-mirror/gcc/pull/84)
>
> *OBS:* I didn't find a good reason for adding new tests or test results
> here since this is really a small upgrade (in my view) to std::complex.
>

I don't agree. The purpose of this is to support std::complex for a
type Foo without implicit conversions (which isn't required by the standard
btw, only the floating-point types are required to work, but we can support
others as an extension). Without tests, we don't know if that goal has been
met, and we don't know if the goal continues to be met in future versions.
A test would ensure that we don't accidentally re-introduce code requiring
implicit conversions.
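
As a rough illustration of what such a test type could look like (a sketch only; ExplicitOnly, half_pi and one are hypothetical names, and this does not instantiate std::complex itself), a scalar type with only explicit constructors makes any remaining implicit conversion a compile error:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical scalar type: constructible from arithmetic types only
// through explicit constructors, like the Foo discussed above.
struct ExplicitOnly
{
  double v;
  explicit ExplicitOnly(int i) : v(i) { }
  explicit ExplicitOnly(long double x) : v(static_cast<double>(x)) { }
};

static_assert(!std::is_convertible<int, ExplicitOnly>::value,
              "no implicit conversion from int");
static_assert(std::is_constructible<ExplicitOnly, long double>::value,
              "explicit conversion from long double is available");

// Generic code written in the style the patch aims for: every conversion
// to T is spelled out.  `const T x = 1.57...L;` would not compile for
// ExplicitOnly, but the explicit form below does.
template<typename T>
T half_pi()
{ return T(1.5707963267948966192313216916397514L); }

template<typename T>
T one()
{ return T(1); }
```

A real testsuite entry would presumably instantiate the std::complex functions named in the changelog with a type like this.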

With a suitable test, I think this patch will be OK for GCC 14.

Thanks again for contributing.


Re: [PATCH][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-05-11 Thread Jonathan Wakely via Gcc-patches
On Thu, 11 May 2023 at 21:25, Jason Merrill  wrote:

> On 5/9/23 08:07, Alex Coplan wrote:
> > This patch implements clang's __has_feature and __has_extension in GCC.
>
> Thanks!
>
> > Currently the patch aims to implement all documented features (and some
> > undocumented ones) following the documentation at
> > https://clang.llvm.org/docs/LanguageExtensions.html with the following
> > omissions:
> >   - C++ type traits.
> >   - Objective-C-specific features.
> >
> > C++ type traits aren't currently implemented since, as the clang
> > documentation notes, __has_builtin is the correct "modern" way to query
> > for these (which GCC already implements). Of course there's an argument
> > that we should recognize the legacy set of C++ type traits that can be
> > queried through __has_feature for backwards compatibility with older
> > code. I'm happy to do this if reviewers think that's a good idea.
>
> That seems unnecessary unless there's a specific motivation.
>
> > There are some comments in the patch marked with XXX, I'm looking for
> > review comments from C/C++ maintainers on those areas in particular.
> >
> > Bootstrapped/regtested on aarch64-linux-gnu. Any comments?
>
> All the has_*_feature_p functions need to check flag_pedantic_errors,
> for compatibility with the Clang documented behavior "If the
> -pedantic-errors option is given, __has_extension is equivalent to
> __has_feature."
>
> > +static const cp_feature_info cp_feature_table[] =
> > +{
> > +  { "cxx_exceptions", &flag_exceptions },
> > +  { "cxx_rtti", &flag_rtti },
> > +  { "cxx_access_control_sfinae", { cxx11, cxx98 } },
> > +  { "cxx_alias_templates", cxx11 },
> > +  { "cxx_alignas", cxx11 },
> > +  { "cxx_alignof", cxx11 },
> > +  { "cxx_attributes", cxx11 },
> > +  { "cxx_constexpr", cxx11 },
> > +  { "cxx_constexpr_string_builtins", cxx11 },
> > +  { "cxx_decltype", cxx11 },
> > +  { "cxx_decltype_incomplete_return_types", cxx11 },
> > +  { "cxx_default_function_template_args", cxx11 },
> > +  { "cxx_defaulted_functions", cxx11 }, /* XXX: extension in c++98?  */
>
> I'm not sure I see the benefit of advertising a lot of these as C++98
> extensions, even if we do accept them with a pedwarn by default.  The
> ones that indicate DRs like cxx_access_control_sfinae, yes, but I'm
> inclined to be conservative if it isn't an extension that libstdc++
> relies on, like variadic templates or inline namespaces.


FWIW, I think the only other C++11 feature that libstdc++ assumes is
unconditionally available in C++98 mode is 'long long' (which is
technically not defined until C99 and C++11).



> My concern is
> that important implementation is limited to C++11 mode even if we don't
> immediately give an error.  For instance,
>
> struct A
> {
>int i = 42;
>A() = default;
> };
>
> breaks in C++98 mode; even though we only warn for the two C++11
> features, trying to actually combine them fails.
>
> So if there's a question, let's say no.
>
> > +  { "cxx_delegating_constructors", { cxx11, cxx98 } },
> > +  { "cxx_deleted_functions", cxx11 },
> > +  { "cxx_explicit_conversions", { cxx11, cxx98 } },
> > +  { "cxx_generalized_initializers", cxx11 },
> > +  { "cxx_implicit_moves", cxx11 },
> > +  { "cxx_inheriting_constructors", cxx11 }, /* XXX: extension in c++98?  */
> > +  { "cxx_inline_namespaces", { cxx11, cxx98 } },
> > +  { "cxx_lambdas", cxx11 }, /* XXX: extension in c++98?  */
> > +  { "cxx_local_type_template_args", cxx11 },
> > +  { "cxx_noexcept", cxx11 },
> > +  { "cxx_nonstatic_member_init", { cxx11, cxx98 } },
> > +  { "cxx_nullptr", cxx11 },
> > +  { "cxx_override_control", { cxx11, cxx98 } },
> > +  { "cxx_reference_qualified_functions", cxx11 },
> > +  { "cxx_range_for", cxx11 },
> > +  { "cxx_raw_string_literals", cxx11 },
> > +  { "cxx_rvalue_references", cxx11 },
> > +  { "cxx_static_assert", cxx11 },
> > +  { "cxx_thread_local", cxx11 },
> > +  { "cxx_auto_type", cxx11 },
> > +  { "cxx_strong_enums", cxx11 },
> > +  { "cxx_trailing_return", cxx11 },
> > +  { "cxx_unicode_literals", cxx11 },
> > +  { "cxx_unrestricted_unions", cxx11 },
> > +  { "cxx_user_literals", cxx11 },
> > +  { "cxx_variadic_templates", { cxx11, cxx98 } },
> > +  { "cxx_binary_literals", { cxx14, cxx98 } },
> > +  { "cxx_contextual_conversions", { cxx14, cxx98 } },
> > +  { "cxx_decltype_auto", cxx14 },
> > +  { "cxx_aggregate_nsdmi", cxx14 },
> > +  { "cxx_init_captures", { cxx14, cxx11 } },
> > +  { "cxx_generic_lambdas", cxx14 },
> > +  { "cxx_relaxed_constexpr", cxx14 },
> > +  { "cxx_return_type_deduction", cxx14 },
> > +  { "cxx_variable_templates", { cxx14, cxx98 } },
> > +  { "modules", &flag_modules },
>
>
>
>


[PATCH v2] libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

2023-05-11 Thread Jonathan Wakely via Gcc-patches
On Thu, 11 May 2023 at 13:42, Jonathan Wakely  wrote:

>
>
> On Thu, 11 May 2023 at 13:19, Mike Crowe  wrote:
>
>> However, ...
>>
>> > > diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
>> > > index 89e7f5f5f45..e2700b05ec3 100644
>> > > --- a/libstdc++-v3/acinclude.m4
>> > > +++ b/libstdc++-v3/acinclude.m4
>> > > @@ -4284,7 +4284,7 @@
>> AC_DEFUN([GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT], [
>> > >[glibcxx_cv_PTHREAD_COND_CLOCKWAIT=no])
>> > >])
>> > >if test $glibcxx_cv_PTHREAD_COND_CLOCKWAIT = yes; then
>> > > -AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, 1, [Define if
>> > > pthread_cond_clockwait is available in <pthread.h>.])
>> > > +AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, (_GLIBCXX_TSAN==0),
>> > > [Define if pthread_cond_clockwait is available in <pthread.h>.])
>> > >fi
>>
>> TSan does appear to have an interceptor for pthread_cond_clockwait, even
>> if it lacks the others. Does this mean that this part is unnecessary?
>>
>
> Ah good point, thanks. I grepped for clocklock but not clockwait.
>

In fact it seems like we don't need to change
_GLIBCXX_USE_PTHREAD_RWLOCK_CLOCKLOCK either, because I don't get any tsan
warnings for that. It doesn't have interceptors for
pthread_rwlock_clock{rd,wr}lock, but it doesn't complain anyway (maybe it's
simply not instrumenting the rwlock functions at all?!)

So I'm now retesting with this version of the patch, which only touches the
_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK macro.

Please take another look, thanks.
commit 4fc14825c125eece32980df21d09da35e3d5bac6
Author: Jonathan Wakely 
Date:   Tue May 9 09:30:48 2023

libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

As noted in https://github.com/llvm/llvm-project/issues/62623 there are
no tsan interceptors for some of the new POSIX-1:202x APIs added by
https://austingroupbugs.net/view.php?id=1216 so tsan gives false
positive warnings for try_lock_for on timed mutexes.

Disable the uses of the new pthread_mutex_clocklock API when tsan is
active. This changes the semantics of the try_lock_for functions,
because it can change which clock is used for the wait. This means those
functions might be affected by system clock adjustments when tsan is
used, when they would not be affected otherwise.

libstdc++-v3/ChangeLog:

* acinclude.m4 (GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK): Define
_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK in terms of _GLIBCXX_TSAN.
* configure: Regenerate.

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index 89e7f5f5f45..dce3d16aa5c 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4314,7 +4314,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_MUTEX_CLOCKLOCK], [
   [glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK=no])
   ])
   if test $glibcxx_cv_PTHREAD_MUTEX_CLOCKLOCK = yes; then
-AC_DEFINE(_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK, 1, [Define if pthread_mutex_clocklock is available in <pthread.h>.])
+AC_DEFINE(_GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK, (_GLIBCXX_TSAN==0), [Define if pthread_mutex_clocklock is available in <pthread.h>.])
   fi
 
   CXXFLAGS="$ac_save_CXXFLAGS"
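
For anyone wondering how defining the macro as an expression rather than 1 behaves, here is a minimal sketch. DEMO_TSAN and DEMO_USE_CLOCKLOCK are hypothetical stand-ins for _GLIBCXX_TSAN and _GLIBCXX_USE_PTHREAD_MUTEX_CLOCKLOCK; the real headers are not reproduced here.

```cpp
#include <cassert>

// Hypothetical stand-in for _GLIBCXX_TSAN: 1 when building with
// ThreadSanitizer, 0 otherwise.
#ifndef DEMO_TSAN
# define DEMO_TSAN 0
#endif

// The feature macro is defined once, as an expression.  The expression is
// only evaluated where the macro is used in an #if, so its value follows
// DEMO_TSAN in each translation unit.
#define DEMO_USE_CLOCKLOCK (DEMO_TSAN==0)

#if DEMO_USE_CLOCKLOCK
constexpr bool uses_clocklock = true;   // the clocklock path would be taken
#else
constexpr bool uses_clocklock = false;  // fall back to the older API
#endif
```

Compiling the same file with -DDEMO_TSAN=1 flips the branch without regenerating any configuration header, which appears to be the point of the approach.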


Re: [PATCH] wwwdocs: Clarify experimental status of C++17 prior to GCC 9

2023-05-11 Thread Gerald Pfeifer
On Wed, 22 Mar 2023, Jonathan Wakely via Gcc-patches wrote:
> We don't currently have a single page where you can find out when
> support for a given standard became non-experimental (you have to look
> through all the gcc-X/changes.html pages to find it). I think we should
> have that info on the cxx-status.html page. This adds it for C++17, and
> we can do the same for C++20 when we declare that stable.

I'm not sure why I only noticed this today. Just a little technicality
to fix this page. 

Pushed.

Gerald


Commit a09e584729 introduced an  without corresponding .
---
 htdocs/projects/cxx-status.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index 7f59e5a2..675fbcd0 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -402,7 +402,7 @@
 -->
   
 
-  C++20 Support in GCC
+  C++20 Support in GCC
 
   GCC has experimental support for the latest revision of the C++
   standard, which was published in 2020.
-- 
2.40.1


Re: [PATCH][RFC] c-family: Implement __has_feature and __has_extension [PR60512]

2023-05-11 Thread Jason Merrill via Gcc-patches

On 5/9/23 08:07, Alex Coplan wrote:

> This patch implements clang's __has_feature and __has_extension in GCC.


Thanks!


> Currently the patch aims to implement all documented features (and some
> undocumented ones) following the documentation at
> https://clang.llvm.org/docs/LanguageExtensions.html with the following
> omissions:
>   - C++ type traits.
>   - Objective-C-specific features.
>
> C++ type traits aren't currently implemented since, as the clang
> documentation notes, __has_builtin is the correct "modern" way to query
> for these (which GCC already implements). Of course there's an argument
> that we should recognize the legacy set of C++ type traits that can be
> queried through __has_feature for backwards compatibility with older
> code. I'm happy to do this if reviewers think that's a good idea.


That seems unnecessary unless there's a specific motivation.


> There are some comments in the patch marked with XXX, I'm looking for
> review comments from C/C++ maintainers on those areas in particular.
>
> Bootstrapped/regtested on aarch64-linux-gnu. Any comments?


All the has_*_feature_p functions need to check flag_pedantic_errors, 
for compatibility with the Clang documented behavior "If the 
-pedantic-errors option is given, __has_extension is equivalent to 
__has_feature."
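
To make the distinction concrete, this is roughly how user code tends to consume the two queries. It is a sketch under assumptions: the fallback definitions follow the pattern suggested in the Clang documentation, and variadic_templates_status is a hypothetical name used only for illustration.

```cpp
#include <cassert>

// Fallbacks for compilers that provide neither query.
#ifndef __has_feature
# define __has_feature(x) 0
#endif
#ifndef __has_extension
# define __has_extension(x) __has_feature(x)
#endif

// 2: available as a standard feature in the current language mode
// 1: available only as an extension in the current mode
// 0: not available, or the compiler cannot answer the query
#if __has_feature(cxx_variadic_templates)
constexpr int variadic_templates_status = 2;
#elif __has_extension(cxx_variadic_templates)
constexpr int variadic_templates_status = 1;
#else
constexpr int variadic_templates_status = 0;
#endif
```

With -pedantic-errors, the documented Clang behavior means the middle branch can never be selected, since __has_extension then gives the same answer as __has_feature.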



+static const cp_feature_info cp_feature_table[] =
+{
+  { "cxx_exceptions", &flag_exceptions },
+  { "cxx_rtti", &flag_rtti },
+  { "cxx_access_control_sfinae", { cxx11, cxx98 } },
+  { "cxx_alias_templates", cxx11 },
+  { "cxx_alignas", cxx11 },
+  { "cxx_alignof", cxx11 },
+  { "cxx_attributes", cxx11 },
+  { "cxx_constexpr", cxx11 },
+  { "cxx_constexpr_string_builtins", cxx11 },
+  { "cxx_decltype", cxx11 },
+  { "cxx_decltype_incomplete_return_types", cxx11 },
+  { "cxx_default_function_template_args", cxx11 },
+  { "cxx_defaulted_functions", cxx11 }, /* XXX: extension in c++98?  */


I'm not sure I see the benefit of advertising a lot of these as C++98 
extensions, even if we do accept them with a pedwarn by default.  The 
ones that indicate DRs like cxx_access_control_sfinae, yes, but I'm 
inclined to be conservative if it isn't an extension that libstdc++ 
relies on, like variadic templates or inline namespaces.  My concern is 
that important implementation is limited to C++11 mode even if we don't 
immediately give an error.  For instance,


struct A
{
  int i = 42;
  A() = default;
};

breaks in C++98 mode; even though we only warn for the two C++11 
features, trying to actually combine them fails.


So if there's a question, let's say no.


+  { "cxx_delegating_constructors", { cxx11, cxx98 } },
+  { "cxx_deleted_functions", cxx11 },
+  { "cxx_explicit_conversions", { cxx11, cxx98 } },
+  { "cxx_generalized_initializers", cxx11 },
+  { "cxx_implicit_moves", cxx11 },
+  { "cxx_inheriting_constructors", cxx11 }, /* XXX: extension in c++98?  */
+  { "cxx_inline_namespaces", { cxx11, cxx98 } },
+  { "cxx_lambdas", cxx11 }, /* XXX: extension in c++98?  */
+  { "cxx_local_type_template_args", cxx11 },
+  { "cxx_noexcept", cxx11 },
+  { "cxx_nonstatic_member_init", { cxx11, cxx98 } },
+  { "cxx_nullptr", cxx11 },
+  { "cxx_override_control", { cxx11, cxx98 } },
+  { "cxx_reference_qualified_functions", cxx11 },
+  { "cxx_range_for", cxx11 },
+  { "cxx_raw_string_literals", cxx11 },
+  { "cxx_rvalue_references", cxx11 },
+  { "cxx_static_assert", cxx11 },
+  { "cxx_thread_local", cxx11 },
+  { "cxx_auto_type", cxx11 },
+  { "cxx_strong_enums", cxx11 },
+  { "cxx_trailing_return", cxx11 },
+  { "cxx_unicode_literals", cxx11 },
+  { "cxx_unrestricted_unions", cxx11 },
+  { "cxx_user_literals", cxx11 },
+  { "cxx_variadic_templates", { cxx11, cxx98 } },
+  { "cxx_binary_literals", { cxx14, cxx98 } },
+  { "cxx_contextual_conversions", { cxx14, cxx98 } },
+  { "cxx_decltype_auto", cxx14 },
+  { "cxx_aggregate_nsdmi", cxx14 },
+  { "cxx_init_captures", { cxx14, cxx11 } },
+  { "cxx_generic_lambdas", cxx14 },
+  { "cxx_relaxed_constexpr", cxx14 },
+  { "cxx_return_type_deduction", cxx14 },
+  { "cxx_variable_templates", { cxx14, cxx98 } },
+  { "modules", &flag_modules },






[committed] libstdc++: Use RAII types in strtod-based std::from_chars implementation

2023-05-11 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

Patrick noted that auto_ferounding could be used in floating_to_chars.cc
too, which I'll do later.

-- >8 --

This adds auto_locale and auto_ferounding types to use RAII for changing
and restoring the locale and floating-point environment when using strtod
to implement std::from_chars.

The destructors for the RAII objects run slightly later than the
previous statements that restored the locale/fenv, but the differences
are just some trivial assignments and an isinf call.

Reviewed-by: Patrick Palka 

libstdc++-v3/ChangeLog:

* src/c++17/floating_from_chars.cc [USE_STRTOD_FOR_FROM_CHARS]
(auto_locale, auto_ferounding): New class types.
(from_chars_impl): Use auto_locale and auto_ferounding.
---
 libstdc++-v3/src/c++17/floating_from_chars.cc | 88 +++
 1 file changed, 69 insertions(+), 19 deletions(-)

diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
b/libstdc++-v3/src/c++17/floating_from_chars.cc
index 78b9d92cdc0..ebd428d5be3 100644
--- a/libstdc++-v3/src/c++17/floating_from_chars.cc
+++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
@@ -597,6 +597,69 @@ namespace
 return buf.c_str();
   }
 
+  // RAII type to change and restore the locale.
+  struct auto_locale
+  {
+#if _GLIBCXX_HAVE_USELOCALE
+// When we have uselocale we can change the current thread's locale.
+const locale_t loc;
+locale_t orig;
+
+auto_locale()
+: loc(::newlocale(LC_ALL_MASK, "C", (locale_t)0))
+{
+  if (loc)
+   orig = ::uselocale(loc);
+  else
+   ec = errc{errno};
+}
+
+~auto_locale()
+{
+  if (loc)
+   {
+ ::uselocale(orig);
+ ::freelocale(loc);
+   }
+}
+#else
+// Otherwise, we can't change the locale and so strtod can't be used.
+auto_locale() = delete;
+#endif
+
+explicit operator bool() const noexcept { return ec == errc{}; }
+
+errc ec{};
+
+auto_locale(const auto_locale&) = delete;
+auto_locale& operator=(const auto_locale&) = delete;
+  };
+
+  // RAII type to change and restore the floating-point environment.
+  struct auto_ferounding
+  {
+#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
+const int rounding = std::fegetround();
+
+auto_ferounding()
+{
+  if (rounding != FE_TONEAREST)
+   std::fesetround(FE_TONEAREST);
+}
+
+~auto_ferounding()
+{
+  if (rounding != FE_TONEAREST)
+   std::fesetround(rounding);
+}
+#else
+auto_ferounding() = default;
+#endif
+
+auto_ferounding(const auto_ferounding&) = delete;
+auto_ferounding& operator=(const auto_ferounding&) = delete;
+  };
+
   // Convert the NTBS `str` to a floating-point value of type `T`.
   // If `str` cannot be converted, `value` is unchanged and `0` is returned.
   // Otherwise, let N be the number of characters consumed from `str`.
@@ -607,16 +670,11 @@ namespace
   ptrdiff_t
   from_chars_impl(const char* str, T& value, errc& ec) noexcept
   {
-if (locale_t loc = ::newlocale(LC_ALL_MASK, "C", (locale_t)0)) [[likely]]
+auto_locale loc;
+
+if (loc)
   {
-   locale_t orig = ::uselocale(loc);
-
-#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
-   const int rounding = std::fegetround();
-   if (rounding != FE_TONEAREST)
- std::fesetround(FE_TONEAREST);
-#endif
-
+   auto_ferounding rounding;
const int save_errno = errno;
errno = 0;
char* endptr;
@@ -647,14 +705,6 @@ namespace
 #endif
const int conv_errno = std::__exchange(errno, save_errno);
 
-#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
-   if (rounding != FE_TONEAREST)
- std::fesetround(rounding);
-#endif
-
-   ::uselocale(orig);
-   ::freelocale(loc);
-
const ptrdiff_t n = endptr - str;
if (conv_errno == ERANGE) [[unlikely]]
  {
@@ -675,8 +725,8 @@ namespace
  }
return n;
   }
-else if (errno == ENOMEM)
-  ec = errc::not_enough_memory;
+else
+  ec = loc.ec;
 
 return 0;
   }
-- 
2.40.1
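
As a standalone illustration of the RAII idea in this patch, the following sketch shows a rounding-mode guard in isolation. rounding_guard and mode_inside_guard are hypothetical names; the struct mirrors auto_ferounding from the diff but omits the configuration macros, and it assumes <cfenv> provides FE_TONEAREST.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical stand-in for the patch's auto_ferounding: switch to
// round-to-nearest on construction, restore the saved mode on destruction.
struct rounding_guard
{
  const int saved = std::fegetround();

  rounding_guard()
  {
    if (saved != FE_TONEAREST)
      std::fesetround(FE_TONEAREST);
  }

  ~rounding_guard()
  {
    if (saved != FE_TONEAREST)
      std::fesetround(saved);
  }

  rounding_guard(const rounding_guard&) = delete;
  rounding_guard& operator=(const rounding_guard&) = delete;
};

// Returns the rounding mode observed while a guard is alive.  The return
// value is computed before the guard's destructor runs.
inline int mode_inside_guard()
{
  rounding_guard g;
  return std::fegetround();
}
```

Because the destructor runs on every exit path, early returns inside the guarded scope no longer need their own restore statements.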



[committed] libstdc++: Fix chrono::hh_mm_ss::subseconds() [PR109772]

2023-05-11 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

This is a regression on gcc-13 too, but I'm undecided about the ABI
change for the branch. Generally that would be a no-go, but the affected
specializations are probably so rare that it would be OK. And we
definitely want to fix the ambiguity on the branch anyway.

-- >8 --

I borked the logic in r13-4526-g5329e1a8e1480d so that the selected
partial specialization of hh_mm_ss::__subseconds might not be able to
represent the correct number of subseconds. This can result in a
truncated value being stored for the subseconds, e.g., 4755859375 gets
truncated to 460892079 because the correct value doesn't fit in
uint_least32_t.
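
The truncation above is ordinary modular arithmetic. A minimal sketch, assuming std::uint32_t as a stand-in for uint_least32_t (which may be wider on some targets) and a hypothetical helper name:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: store a 64-bit subsecond count in a 32-bit field.
// Conversion to an unsigned type reduces the value modulo 2^32.
constexpr std::uint32_t truncate_to_32(std::uint64_t subseconds)
{
  return static_cast<std::uint32_t>(subseconds);
}

// 4755859375 - 4294967296 (2^32) == 460892079
static_assert(truncate_to_32(4755859375ULL) == 460892079U,
              "value wraps modulo 2^32");
```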

Instead of checking whether the maximum value of the incoming duration
type can be represented, we would need to check whether that maximum value
can be represented after being converted to the correct precision type:

   template<typename _Tp>
     static constexpr bool __fits
       = duration_cast<precision>(_Duration::max()).count()
         <= duration_values<_Tp>::max();

However, this can fail to compile, due to integer overflow in the
constexpr multiplications. Instead, we could limit the check to the case
where the incoming duration has the same period as the precision, where
no conversion is needed and so no overflow can happen. But that seems of
very limited value, as it would only benefit specializations like
hh_mm_ss>, which can only represent a
time-of-day between -00:00:00.0215 and +00:00:00.0215 measured in
picoseconds!

Additionally, the hh_mm_ss::__subseconds partial specializations do not
have disjoint constraints, so that some hh_mm_ss specializations result
in ambiguities trying to match a __subseconds partial specialization.

The most practical fix is to just stop using the __fits variable
template in the constraints of the partial specializations. This fixes
the truncated values by not selecting an inappropriate partial
specialization, and fixes the ambiguous match by ensuring the
constraints are disjoint.

Fixing this changes the layout of some specializations, so is an ABI
change. It only affects specializations that have a small (less than
64-bit) representation type and either a very small period (e.g. like
the picosecond specialization above) or a non-power-of-ten period like
ratio<1, 1024>.  For example both hh_mm_ss> and
hh_mm_ss> are affected (increasing from 16
bytes to 24 on x86_64), but hh_mm_ss> and
hh_mm_ss> are not affected.

libstdc++-v3/ChangeLog:

PR libstdc++/109772
* include/std/chrono (hh_mm_ss::__fits): Remove variable
template.
(hh_mm_ss::__subseconds): Remove __fits from constraints.
* testsuite/std/time/hh_mm_ss/109772.cc: New test.
* testsuite/std/time/hh_mm_ss/1.cc: Adjust expected size for
hh_mm_ss>.
---
 libstdc++-v3/include/std/chrono   | 12 ++-
 libstdc++-v3/testsuite/std/time/hh_mm_ss/1.cc |  2 +-
 .../testsuite/std/time/hh_mm_ss/109772.cc | 31 +++
 3 files changed, 34 insertions(+), 11 deletions(-)
 create mode 100644 libstdc++-v3/testsuite/std/time/hh_mm_ss/109772.cc

diff --git a/libstdc++-v3/include/std/chrono b/libstdc++-v3/include/std/chrono
index 7bfc9b79acf..660e8d2b746 100644
--- a/libstdc++-v3/include/std/chrono
+++ b/libstdc++-v3/include/std/chrono
@@ -2398,17 +2398,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{ return {}; }
  };
 
-   // True if the maximum constructor argument can be represented in _Tp.
-   template
- static constexpr bool __fits
-   = duration_values::max()
-   <= duration_values<_Tp>::max();
-
template
  requires (!treat_as_floating_point_v<_Rep>)
&& ratio_less_v<_Period, ratio<1, 1>>
-   && (ratio_greater_equal_v<_Period, ratio<1, 250>>
- || __fits)
+   && ratio_greater_equal_v<_Period, ratio<1, 250>>
  struct __subseconds>
  {
unsigned char _M_r{};
@@ -2421,8 +2414,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
template
  requires (!treat_as_floating_point_v<_Rep>)
&& ratio_less_v<_Period, ratio<1, 250>>
-   && (ratio_greater_equal_v<_Period, ratio<1, 40>>
- || __fits)
+   && ratio_greater_equal_v<_Period, ratio<1, 40>>
  struct __subseconds>
  {
uint_least32_t _M_r{};
diff --git a/libstdc++-v3/testsuite/std/time/hh_mm_ss/1.cc 
b/libstdc++-v3/testsuite/std/time/hh_mm_ss/1.cc
index f8a3e115af3..85f991c5e03 100644
--- a/libstdc++-v3/testsuite/std/time/hh_mm_ss/1.cc
+++ b/libstdc++-v3/testsuite/std/time/hh_mm_ss/1.cc
@@ -109,8 +109,8 @@ size()
   static_assert(sizeof(hh_mm_ss>) == sizeof(S1));
   struct S2 { long long h; char m, s; bool neg; int ss; };
   static_assert(sizeof(hh_mm_ss>) == sizeof(S2));
-  static_assert(sizeof(hh_mm_ss>) == sizeof(S2));
   struct S3 { long long h; char m, s; bool neg; long long ss; };
+  static_assert(sizeof(hh_mm_ss>

[committed] libstdc++: Enforce value_type consistency in strings and streams

2023-05-11 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux. Pushed to trunk.

I don't plan to backport the assertions, because they're an API change
that isn't suitable for the branches. But removing _Alloc_traits_impl
and replacing it with _S_allocate should be done for gcc-13 to keep the
contents of the two libstdc++.so.6.0.32 libraries in sync.

-- >8 --

P1463R1 made it ill-formed for allocator-aware containers (including
std::basic_string) to use an allocator that has a different value_type
from the container itself. We already enforce that for other containers
(since r8-4828-g866e4d3853ccc0), but not for std::basic_string. We
traditionally accepted it as an extension and rebound the allocator, so
this change only adds the enforcement for C++20 and later.

Similarly, P1148R0 made it ill-formed for strings and streams to use a
traits type that has an incorrect char_type. We already enforce that for
std::basic_string_view, so we just need to add it to std::basic_ios and
std::basic_string.
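
The kind of consistency check being enforced can be sketched with ordinary type traits. This is an illustration only: consistent_string_params is a hypothetical name, and the actual assertions in the library headers may be worded and placed differently.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical trait: true when both the traits type and the allocator
// agree with the string's character type.
template<typename CharT, typename Traits, typename Alloc>
constexpr bool consistent_string_params
  = std::is_same_v<typename Traits::char_type, CharT>
    && std::is_same_v<typename Alloc::value_type, CharT>;

static_assert(consistent_string_params<char, std::char_traits<char>,
                                       std::allocator<char>>);
// An allocator for a different value_type is the case that used to be
// accepted by rebinding and is rejected under the new rule.
static_assert(!consistent_string_params<char, std::char_traits<char>,
                                        std::allocator<int>>);
```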

The assertion for the allocator's value_type caused some testsuite
regressions:
FAIL: 21_strings/basic_string/cons/char/deduction.cc (test for excess errors)
FAIL: 21_strings/basic_string/cons/wchar_t/deduction.cc (test for excess errors)
FAIL: 21_strings/basic_string/requirements/explicit_instantiation/debug.cc (test for excess errors)
FAIL: 21_strings/basic_string/requirements/explicit_instantiation/int.cc (test for excess errors)

The last two are testing the traditional extension that rebinds the
allocator, so need to be disabled for C++20.

The first two are similar to LWG 3076 where an incorrect constructor is
considered for CTAD. In this case, determining that it's not viable
requires instantiating std::basic_string<Iter, char_traits<Iter>, Alloc>
which then fails the new assertion, because Alloc::value_type is not the
same as Iter. This is only a problem because the size_type parameter of
the non-viable constructor is an alias for
_Alloc_traits_impl::size_type which is a nested type, and so the
enclosing basic_string specialization needs to be instantiated. If we
remove the _Alloc_traits_impl wrapper that was added in
r12-5413-g2d76292bd6719d, then the definition of size_type no longer
depends on basic_string, and we don't instantiate an invalid
specialization and don't fail the assertion. The work done by
_Alloc_traits_impl::allocate can be done in a _S_allocate function
instead, which is probably more efficient to compile anyway.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver: Export basic_string::_S_allocate.
* include/bits/basic_ios.h: Add static assertion checking
traits_type::value_type.
* include/bits/basic_string.h: Likewise. Do not rebind
allocator, and add static assertion checking its value_type.
(basic_string::_Alloc_traits_impl): Remove class template.
(basic_string::_S_allocate): New static member function.
(basic_string::assign): Use _S_allocate.
* include/bits/basic_string.tcc (basic_string::_M_create)
(basic_string::reserve, basic_string::_M_replace): Likewise.
* testsuite/21_strings/basic_string/requirements/explicit_instantiation/debug.cc:
Disable for C++20 and later.
* testsuite/21_strings/basic_string/requirements/explicit_instantiation/int.cc:
Likewise.
---
 libstdc++-v3/config/abi/pre/gnu.ver   |  5 +-
 libstdc++-v3/include/bits/basic_ios.h |  4 ++
 libstdc++-v3/include/bits/basic_string.h  | 55 ---
 libstdc++-v3/include/bits/basic_string.tcc|  8 +--
 .../explicit_instantiation/debug.cc   |  2 +-
 .../explicit_instantiation/int.cc |  2 +-
 6 files changed, 37 insertions(+), 39 deletions(-)

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver 
b/libstdc++-v3/config/abi/pre/gnu.ver
index 36bb87880d7..768cd4a4a6c 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1759,7 +1759,9 @@ GLIBCXX_3.4.21 {
 #endif
 
 # ABI-tagged std::basic_string
-_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE1[01]**;
+_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE10_M_[dr]*;
+
_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE10_S_compareE[jmy][jmy];
+
_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE11_M_capacityE[jmy];
 
_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE12_Alloc_hiderC[12]EP[cw]RKS3_;
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE12_M*;
 _ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE13*;
@@ -2516,6 +2518,7 @@ GLIBCXX_3.4.31 {
 
 GLIBCXX_3.4.32 {
 _ZSt21ios_base_library_initv;
+
_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE11_S_allocateERS3_[jmy];
 } GLIBCXX_3.4.31;
 
 # Symbols in the support library (libsupc++) have their own tag.
diff --git a/libstdc++-v3/include/bits/basic_ios.h 
b/libstdc++-v3/include/bits/basic_ios.h
index de5719c1d68..c7c391c0f49 100644
--- a/libstdc++-v3/i

Re: [PATCH v3] Add pattern to convert vector shift + bitwise and + multiply to vector compare in some cases.

2023-05-11 Thread Philipp Tomsich
Bootstrapped and reg-tested overnight for x86 and aarch64.
Applied to master, thanks!

Philipp.
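
For readers unfamiliar with the SWAR idiom that the quoted patch description refers to, here is a scalar sketch of the shift + bitwise-and + multiply form. It assumes two 16-bit lanes packed into a 32-bit word; packed_is_negative_16x2 is a hypothetical name and is not taken from the patch's test file.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical SWAR helper: treat `x` as two signed 16-bit lanes and
// return 0xFFFF in every lane whose value is negative, 0 otherwise.
//   x >> 15        moves each lane's sign bit to that lane's lowest bit
//   & 0x00010001   keeps only those lowest bits
//   * 0xFFFF       expands each remaining 1 into a full 16-bit mask
constexpr std::uint32_t packed_is_negative_16x2(std::uint32_t x)
{
  return ((x >> 15) & 0x00010001u) * 0xFFFFu;
}

static_assert(packed_is_negative_16x2(0x80000001u) == 0xFFFF0000u,
              "only the upper lane is negative");
static_assert(packed_is_negative_16x2(0x7FFF8000u) == 0x0000FFFFu,
              "only the lower lane is negative");
```

When a loop over such words is vectorized, the three operations amount to a per-lane "less than zero" test on 16-bit elements, which is what the single vector compare in the optimized assembly expresses.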

On Tue, 9 May 2023 at 09:13, Richard Biener  wrote:
>
> On Tue, Dec 20, 2022 at 1:23 PM Manolis Tsamis  
> wrote:
> >
> > When using SWAR (SIMD in a register) techniques a comparison operation within
> > such a register can be made by using a combination of shifts, bitwise and and
> > multiplication. If code using this scheme is vectorized then there is potential
> > to replace all these operations with a single vector comparison, by reinterpreting
> > the vector types to match the width of the SWAR register.
> >
> > For example, for the test function packed_cmp_16_32, the original generated code is:
> >
> > ldr q0, [x0]
> > add w1, w1, 1
> > ushr v0.4s, v0.4s, 15
> > and v0.16b, v0.16b, v2.16b
> > shl v1.4s, v0.4s, 16
> > sub v0.4s, v1.4s, v0.4s
> > str q0, [x0], 16
> > cmp w2, w1
> > bhi .L20
> >
> > with this pattern the above can be optimized to:
> >
> > ldr q0, [x0]
> > add w1, w1, 1
> > cmlt v0.8h, v0.8h, #0
> > str q0, [x0], 16
> > cmp w2, w1
> > bhi .L20
> >
> > The effect is similar for x86-64.
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Simplify vector shift + bit_and + multiply in some cases.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/swar_to_vec_cmp.c: New test.
>
> OK if it still bootstraps/tests OK.
>
> Thanks,
> Richard.
>
> > Signed-off-by: Manolis Tsamis 
> >
> > ---
> >
> > Changes in v3:
> > - Changed pattern to use vec_cond_expr.
> > - Changed pattern to work with VLA vector.
> > - Added both expand_vec_cmp_expr_p and
> >   expand_vec_cond_expr_p check.
> > - Fixed type compatibility issues.
> >
> >  gcc/match.pd  | 61 
> >  .../gcc.target/aarch64/swar_to_vec_cmp.c  | 72 +++
> >  2 files changed, 133 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/swar_to_vec_cmp.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 67a0a682f31..320437f8aa3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -301,6 +301,67 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (view_convert (bit_and:itype (view_convert @0)
> >  (ne @1 { build_zero_cst (type); })))
> >
> > +/* In SWAR (SIMD within a register) code a signed comparison of packed data
> > +   can be constructed with a particular combination of shift, bitwise and,
> > +   and multiplication by constants.  If that code is vectorized we can
> > +   convert this pattern into a more efficient vector comparison.  */
> > +(simplify
> > + (mult (bit_and (rshift @0 uniform_integer_cst_p@1)
> > +   uniform_integer_cst_p@2)
> > +uniform_integer_cst_p@3)
> > + (with {
> > +   tree rshift_cst = uniform_integer_cst_p (@1);
> > +   tree bit_and_cst = uniform_integer_cst_p (@2);
> > +   tree mult_cst = uniform_integer_cst_p (@3);
> > +  }
> > +  /* Make sure we're working with vectors and uniform vector constants.  */
> > +  (if (VECTOR_TYPE_P (type)
> > +   && tree_fits_uhwi_p (rshift_cst)
> > +   && tree_fits_uhwi_p (mult_cst)
> > +   && tree_fits_uhwi_p (bit_and_cst))
> > +   /* Compute what constants would be needed for this to represent a packed
> > +  comparison based on the shift amount denoted by RSHIFT_CST.  */
> > +   (with {
> > + HOST_WIDE_INT vec_elem_bits = vector_element_bits (type);
> > + poly_int64 vec_nelts = TYPE_VECTOR_SUBPARTS (type);
> > + poly_int64 vec_bits = vec_elem_bits * vec_nelts;
> > + unsigned HOST_WIDE_INT cmp_bits_i, bit_and_i, mult_i;
> > + unsigned HOST_WIDE_INT target_mult_i, target_bit_and_i;
> > + cmp_bits_i = tree_to_uhwi (rshift_cst) + 1;
> > + mult_i = tree_to_uhwi (mult_cst);
> > + target_mult_i = (HOST_WIDE_INT_1U << cmp_bits_i) - 1;
> > + bit_and_i = tree_to_uhwi (bit_and_cst);
> > + target_bit_and_i = 0;
> > +
> > + /* The bit pattern in BIT_AND_I should be a mask for the least
> > +   significant bit of each packed element that is CMP_BITS wide.  */
> > + for (unsigned i = 0; i < vec_elem_bits / cmp_bits_i; i++)
> > +   target_bit_and_i = (target_bit_and_i << cmp_bits_i) | 1U;
> > +}
> > +(if ((exact_log2 (cmp_bits_i)) >= 0
> > +&& cmp_bits_i < HOST_BITS_PER_WIDE_INT
> > +&& multiple_p (vec_bits, cmp_bits_i)
> > +&& vec_elem_bits <= HOST_BITS_PER_WIDE_INT
> > +&& target_mult_i == mult_i
> > +&& target_bit_and_i == bit_and_i)
> > + /* Compute the vector shape for the comparison and check if the 
> > target is
> > +   able to expand the comparison with that type.  */
> > + (with {
> > +   /* We're doing a signed comparison.  */
> > +   tree cmp_type = build_n

Re: [PATCH] c++: 'mutable' subobject of constexpr variable [PR109745]

2023-05-11 Thread Jason Merrill via Gcc-patches

On 5/11/23 14:30, Patrick Palka wrote:

r13-2701-g7107ea6fb933f1 made us correctly accept 'mutable' member
accesses during constexpr evaluation of objects constructed during that
evaluation, while continuing to reject such accesses for constexpr
objects constructed outside of that evaluation, by considering the
CONSTRUCTOR_MUTABLE_POISON flag during cxx_eval_component_reference.

However, this flag is set only for the outermost CONSTRUCTOR of a
constexpr variable initializer, so if we're accessing a 'mutable'
subobject within a nested CONSTRUCTOR, the flag won't be set and
we'll incorrectly accept the access.  This can lead to us rejecting
valid code, as in the first testcase, or even wrong code due to
speculative constexpr evaluation as in the second and third testcase.

This patch fixes this by setting CONSTRUCTOR_MUTABLE_POISON recursively
rather than only on the outermost CONSTRUCTOR.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?


OK.


PR c++/109745

gcc/cp/ChangeLog:

* typeck2.cc (poison_mutable_constructors): Define.
(store_init_value): Use it instead of setting
CONSTRUCTOR_MUTABLE_POISON directly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-mutable4.C: New test.
* g++.dg/cpp0x/constexpr-mutable5.C: New test.
* g++.dg/cpp1y/constexpr-mutable2.C: New test.
---
  gcc/cp/typeck2.cc | 26 +++--
  .../g++.dg/cpp0x/constexpr-mutable4.C | 16 
  .../g++.dg/cpp0x/constexpr-mutable5.C | 39 +++
  .../g++.dg/cpp1y/constexpr-mutable2.C | 20 ++
  4 files changed, 97 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-mutable2.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index f5cc7c8371c..8a187708482 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -776,6 +776,27 @@ split_nonconstant_init (tree dest, tree init)
return code;
  }
  
+/* T is the initializer of a constexpr variable.  Set CONSTRUCTOR_MUTABLE_POISON
+   for any CONSTRUCTOR within T that contains (directly or indirectly) a mutable
+   member, thereby poisoning it so it can't be copied to another constexpr
+   variable, or read during constexpr evaluation.  */
+
+static void
+poison_mutable_constructors (tree t)
+{
+  if (TREE_CODE (t) != CONSTRUCTOR)
+return;
+
+  if (cp_has_mutable_p (TREE_TYPE (t)))
+{
+  CONSTRUCTOR_MUTABLE_POISON (t) = true;
+
+  if (vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (t))
+   for (const constructor_elt &ce : *elts)
+ poison_mutable_constructors (ce.value);
+}
+}
+
  /* Perform appropriate conversions on the initial value of a variable,
 store it in the declaration DECL,
 and print any error messages that are appropriate.
@@ -886,10 +907,7 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** cleanups, int flags)
else
value = fold_non_dependent_init (value, tf_warning_or_error,
 /*manifestly_const_eval=*/true, decl);
-  if (TREE_CODE (value) == CONSTRUCTOR && cp_has_mutable_p (type))
-   /* Poison this CONSTRUCTOR so it can't be copied to another
-  constexpr variable.  */
-   CONSTRUCTOR_MUTABLE_POISON (value) = true;
+  poison_mutable_constructors (value);
const_init = (reduced_constant_expression_p (value)
|| error_operand_p (value));
DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = const_init;
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
new file mode 100644
index 000..01f32dea1bd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
@@ -0,0 +1,16 @@
+// PR c++/109745
+// { dg-do compile { target c++11 } }
+
+struct A { mutable int m = 0; };
+
+struct B { A a; };
+
+struct C { B b; };
+
+int main() {
+  constexpr B b;
+  constexpr int bam = b.a.m;// { dg-error "mutable" }
+
+  constexpr C c;
+  constexpr int cbam = c.b.a.m; // { dg-error "mutable" }
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
new file mode 100644
index 000..6a530e2abe6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
@@ -0,0 +1,39 @@
+// PR c++/109745
+// { dg-do run { target c++11 } }
+// { dg-additional-options "-O" }
+
+struct A {
+  mutable int m = 0;
+  void f() const { ++m; };
+  constexpr int get_m() const { return m; }
+};
+
+struct B { A a; };
+
+struct C { B b; };
+
+int main() {
+  constexpr A a;
+  a.m++;
+  if (a.get_m() != 1 || a.m != 1)
+__builtin_abort();
+  a.m++;
+  if (a.get_m() != 2 || a.m != 2)
+__builtin_abort();
+
+  constexpr B b;
+  b.a.m++;
+  if (b.a.get_m() != 1 || b.a.m != 1)
+__builtin_abort();
+  b

Re: [PATCH] [PR96339] AArch64: Optimise svlast[ab]

2023-05-11 Thread Richard Sandiford via Gcc-patches
Tejas Belagod  writes:
> From: Tejas Belagod 
>
>   This PR optimizes an SVE intrinsics sequence where
> svlasta (svptrue_pat_b8 (SV_VL1), x)
>   a scalar is selected based on a constant predicate and a variable vector.
>   This sequence is optimized to return the corresponding element of a NEON
>   vector. For example,
> svlasta (svptrue_pat_b8 (SV_VL1), x)
>   returns
> umov w0, v0.b[1]
>   Likewise,
> svlastb (svptrue_pat_b8 (SV_VL1), x)
>   returns
> umov w0, v0.b[0]
>   This optimization only works provided the constant predicate maps to a range
>   that is within the bounds of a 128-bit NEON register.
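
To make the element numbers above concrete, here is a small scalar model of the two operations as I understand them: LASTA selects the element after the last active one (wrapping to element 0), and LASTB selects the last active element (or the final element if none is active). This is only a sketch; the names model_lasta and model_lastb are invented here, and the predicate and vector are assumed to be non-empty and of equal length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the last active predicate element, or -1 if none is active.
static int last_active(const std::vector<bool>& pred) {
  int last = -1;
  for (std::size_t i = 0; i < pred.size(); ++i)
    if (pred[i])
      last = static_cast<int>(i);
  return last;
}

// Model of LASTA: the element after the last active one, wrapping to 0.
int model_lasta(const std::vector<bool>& pred, const std::vector<int>& v) {
  int last = last_active(pred);
  std::size_t idx = static_cast<std::size_t>(last + 1) % v.size();
  return v[idx];
}

// Model of LASTB: the last active element, or the final one if none is active.
int model_lastb(const std::vector<bool>& pred, const std::vector<int>& v) {
  int last = last_active(pred);
  return last < 0 ? v.back() : v[static_cast<std::size_t>(last)];
}
```

With a predicate that has only its first element active (the SV_VL1 case), the model picks element 1 for LASTA and element 0 for LASTB, matching the v0.b[1] and v0.b[0] extracts quoted above.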
>
> gcc/ChangeLog:
>
>   PR target/96339
>   * config/aarch64/aarch64-sve-builtins-base.cc (svlast_impl::fold): Fold
>   sve calls that have a constant input predicate vector.
>   (svlast_impl::is_lasta): Query to check if intrinsic is svlasta.
>   (svlast_impl::is_lastb): Query to check if intrinsic is svlastb.
>   (svlast_impl::vect_all_same): Check if all vector elements are equal.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/96339
>   * gcc.target/aarch64/sve/acle/general-c/svlast.c: New.
>   * gcc.target/aarch64/sve/acle/general-c/svlast128_run.c: New.
>   * gcc.target/aarch64/sve/acle/general-c/svlast256_run.c: New.
>   * gcc.target/aarch64/sve/pcs/return_4.c (caller_bf16): Fix asm
>   to expect optimized code for function body.
>   * gcc.target/aarch64/sve/pcs/return_4_128.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_4_256.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_4_512.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_4_1024.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_4_2048.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_128.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_256.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_512.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_1024.c (caller_bf16): Likewise.
>   * gcc.target/aarch64/sve/pcs/return_5_2048.c (caller_bf16): Likewise.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  | 124 +++
>  .../aarch64/sve/acle/general-c/svlast.c   |  63 
>  .../sve/acle/general-c/svlast128_run.c| 313 +
>  .../sve/acle/general-c/svlast256_run.c| 314 ++
>  .../gcc.target/aarch64/sve/pcs/return_4.c |   2 -
>  .../aarch64/sve/pcs/return_4_1024.c   |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_128.c |   2 -
>  .../aarch64/sve/pcs/return_4_2048.c   |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_256.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_4_512.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5.c |   2 -
>  .../aarch64/sve/pcs/return_5_1024.c   |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_128.c |   2 -
>  .../aarch64/sve/pcs/return_5_2048.c   |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_256.c |   2 -
>  .../gcc.target/aarch64/sve/pcs/return_5_512.c |   2 -
>  16 files changed, 814 insertions(+), 24 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast128_run.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general-c/svlast256_run.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index cd9cace3c9b..db2b4dcaac9 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -1056,6 +1056,130 @@ class svlast_impl : public quiet
>  public:
>CONSTEXPR svlast_impl (int unspec) : m_unspec (unspec) {}
>  
> +  bool is_lasta () const { return m_unspec == UNSPEC_LASTA; }
> +  bool is_lastb () const { return m_unspec == UNSPEC_LASTB; }
> +
> +  bool vect_all_same (tree v , int step) const

Nit: stray space after "v".

> +  {
> +int i;
> +int nelts = vector_cst_encoded_nelts (v);
> +int first_el = 0;
> +
> +for (i = first_el; i < nelts; i += step)
> +  if (VECTOR_CST_ENCODED_ELT (v, i) != VECTOR_CST_ENCODED_ELT (v, 
> first_el))

I think this should use !operand_equal_p (..., ..., 0).
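
A toy illustration of the concern, not GCC code: comparing node pointers only detects that two elements are the very same object, so two separately built nodes holding equal values would compare unequal. The Node type and helper names below are hypothetical stand-ins; in the patch itself the structural check would be operand_equal_p as suggested.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a constant node; this is not GCC's tree type.
struct Node {
  long value;
};

// Identity comparison: true only for the very same node.
bool same_node(const Node* a, const Node* b) { return a == b; }

// Structural comparison: true when the nodes hold equal values.
bool same_value(const Node* a, const Node* b) { return a->value == b->value; }

// Same shape as vect_all_same, with the comparison passed in.
// Assumes elts is non-empty and step is at least 1.
template <typename Eq>
bool all_same(const std::vector<const Node*>& elts, std::size_t step, Eq eq) {
  for (std::size_t i = 0; i < elts.size(); i += step)
    if (!eq(elts[i], elts[0]))
      return false;
  return true;
}
```

With two distinct nodes that both hold the value 1, all_same reports false under same_node and true under same_value.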

> + return false;
> +
> +return true;
> +  }
> +
> +  /* Fold a svlast{a/b} call with constant predicate to a BIT_FIELD_REF.
> + BIT_FIELD_REF lowers to a NEON element extract, so we have to make sure
> + the index of the element being accessed is in the range of a NEON vector
> + width.  */

s/NEON/Advanced SIMD/.  Same in later comments

> +  gimple *fold (gimple_folder & f) const override
> +  {
> +tree pred = gimple_call_arg (f.call, 0);
> +tree val = gimple_call_arg (f.call, 1);
> 

Re: [aarch64] Code-gen for vector initialization involving constants

2023-05-11 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:

> On Tue, 2 May 2023 at 18:22, Richard Sandiford
>  wrote:
>>
>> Prathamesh Kulkarni  writes:
>> > On Tue, 2 May 2023 at 17:32, Richard Sandiford
>> >  wrote:
>> >>
>> >> Prathamesh Kulkarni  writes:
>> >> > On Tue, 2 May 2023 at 14:56, Richard Sandiford
>> >> >  wrote:
>> >> >> > [aarch64] Improve code-gen for vector initialization with single 
>> >> >> > constant element.
>> >> >> >
>> >> >> > gcc/ChangeLog:
>> >> >> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak
>> >> >> >   condition if (n_var == n_elts && n_elts <= 16) to allow a single
>> >> >> >   constant, and if maxv == 1, use constant element for duplicating
>> >> >> >   into register.
>> >> >> >
>> >> >> > gcc/testsuite/ChangeLog:
>> >> >> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >> >> >
>> >> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
>> >> >> > b/gcc/config/aarch64/aarch64.cc
>> >> >> > index 2b0de7ca038..f46750133a6 100644
>> >> >> > --- a/gcc/config/aarch64/aarch64.cc
>> >> >> > +++ b/gcc/config/aarch64/aarch64.cc
>> >> >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx 
>> >> >> > vals)
>> >> >> >   and matches[X][1] with the count of duplicate elements (if X 
>> >> >> > is the
>> >> >> >   earliest element which has duplicates).  */
>> >> >> >
>> >> >> > -  if (n_var == n_elts && n_elts <= 16)
>> >> >> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
>> >> >> >  {
>> >> >> >int matches[16][2] = {0};
>> >> >> >for (int i = 0; i < n_elts; i++)
>> >> >> > @@ -7,6 +7,18 @@ aarch64_expand_vector_init (rtx target, rtx 
>> >> >> > vals)
>> >> >> >vector register.  For big-endian we want that position to 
>> >> >> > hold
>> >> >> >the last element of VALS.  */
>> >> >> > maxelement = BYTES_BIG_ENDIAN ? n_elts - 1 : 0;
>> >> >> > +
>> >> >> > +   /* If we have a single constant element, use that for 
>> >> >> > duplicating
>> >> >> > +  instead.  */
>> >> >> > +   if (n_var == n_elts - 1)
>> >> >> > + for (int i = 0; i < n_elts; i++)
>> >> >> > +   if (CONST_INT_P (XVECEXP (vals, 0, i))
>> >> >> > +   || CONST_DOUBLE_P (XVECEXP (vals, 0, i)))
>> >> >> > + {
>> >> >> > +   maxelement = i;
>> >> >> > +   break;
>> >> >> > + }
>> >> >> > +
>> >> >> > rtx x = force_reg (inner_mode, XVECEXP (vals, 0, 
>> >> >> > maxelement));
>> >> >> > aarch64_emit_move (target, lowpart_subreg (mode, x, 
>> >> >> > inner_mode));
>> >> >>
>> >> >> We don't want to force the constant into a register though.
>> >> > OK right, sorry.
>> >> > With the attached patch, for the following test-case:
>> >> > int64x2_t f_s64(int64_t x)
>> >> > {
>> >> >   return (int64x2_t) { x, 1 };
>> >> > }
>> >> >
>> >> > it loads constant from memory (same code-gen as without patch).
>> >> > f_s64:
>> >> > adrp x1, .LC0
>> >> > ldr q0, [x1, #:lo12:.LC0]
>> >> > ins v0.d[0], x0
>> >> > ret
>> >> >
>> >> > Does the patch look OK ?
>> >> >
>> >> > Thanks,
>> >> > Prathamesh
>> >> > [...]
>> >> > [aarch64] Improve code-gen for vector initialization with single 
>> >> > constant element.
>> >> >
>> >> > gcc/ChangeLog:
>> >> >   * config/aarch64/aarch64.cc (aarch64_expand_vector_init): Tweak
>> >> >   condition if (n_var == n_elts && n_elts <= 16) to allow a single
>> >> >   constant, and if maxv == 1, use constant element for duplicating
>> >> >   into register.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >   * gcc.target/aarch64/vec-init-single-const.c: New test.
>> >> >
>> >> > diff --git a/gcc/config/aarch64/aarch64.cc 
>> >> > b/gcc/config/aarch64/aarch64.cc
>> >> > index 2b0de7ca038..97309ddec4f 100644
>> >> > --- a/gcc/config/aarch64/aarch64.cc
>> >> > +++ b/gcc/config/aarch64/aarch64.cc
>> >> > @@ -22167,7 +22167,7 @@ aarch64_expand_vector_init (rtx target, rtx 
>> >> > vals)
>> >> >   and matches[X][1] with the count of duplicate elements (if X is 
>> >> > the
>> >> >   earliest element which has duplicates).  */
>> >> >
>> >> > -  if (n_var == n_elts && n_elts <= 16)
>> >> > +  if ((n_var >= n_elts - 1) && n_elts <= 16)
>> >>
>> >> No need for the extra brackets.
>> > Adjusted, thanks. Sorry if this sounds like a silly question, but why
>> > do we need the n_elts <= 16 check ?
>> > Won't n_elts be always <= 16 since max number of elements in a vector
>> > would be 16 for V16QI ?
>>
>> Was wondering the same thing :)
>>
>> Let's leave it though.
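
For what it's worth, the arithmetic behind the question can be written down as follows. This is just a sketch assuming a 128-bit vector register, as for the V16QI mode mentioned above; kVectorBits and max_elts are names invented for the example.

```cpp
#include <cassert>

// Assumption: a 128-bit vector register, as for V16QI.
constexpr int kVectorBits = 128;

// Largest element count for a given element width in bits.
constexpr int max_elts(int element_bits) {
  return kVectorBits / element_bits;
}

// Byte elements give the largest count, which fits matches[16][2].
static_assert(max_elts(8) == 16, "V16QI has 16 elements");
```

Wider elements only reduce the count (8 for 16-bit, 2 for 64-bit), so under that assumption the n_elts <= 16 check would never fail.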
>>
>> >> >  {
>> >> >int matches[16][2] = {0};
>> >> >for (int i = 0; i < n_elts; i++)
>> >> > @@ -7,8 +7,26 @@ aarch64_expand_vector_init (rtx target, rtx 
>> >> > vals)
>> >> >vector register.  For big-endian we want that position to 
>> >> > hold
>> >> >the last element of VALS.  */
>> >> > maxelement 

Re: [aarch64] Use dup and zip1 for interleaving elements in initializing vector

2023-05-11 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-18.c 
> b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
> new file mode 100644
> index 000..598a51f17c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vec-init-18.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3" } */
> +
> +#include <arm_neon.h>
> +
> +int16x8_t foo(int16_t x, int y)
> +{
> +  int16x8_t v = (int16x8_t) {x, y, x, y, x, y, x, y}; 
> +  return v;
> +}
> +
> +int16x8_t foo2(int16_t x) 
> +{
> +  int16x8_t v = (int16x8_t) {x, 1, x, 1, x, 1, x, 1}; 
> +  return v;
> +}
> +
> +/* { dg-final { scan-assembler-times {\tdup\tv[0-9]+\.4h, w[0-9]+} 3 } } */
> +/* { dg-final { scan-assembler {\tmovi\tv[0-9]+\.4h, 0x1} } } */
> +/* { dg-final { scan-assembler {\tzip1\tv[0-9]+\.8h, v[0-9]+\.8h, v[0-9]+\.8h} } } */

Would be good to make this a scan-assembler-times ... 2.

OK with that change.  Thanks for doing this.
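
For readers following along, the dup + zip1 sequence the test scans for can be modelled in plain C++ like this. It is a sketch of the lane behaviour only, not the backend code; model_dup and model_zip1 are invented names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V4 = std::array<std::int16_t, 4>;
using V8 = std::array<std::int16_t, 8>;

// Model of "dup vN.4h, wM": broadcast one scalar into four lanes.
V4 model_dup(std::int16_t x) {
  return {x, x, x, x};
}

// Model of zip1 on .8h operands: interleave the low four lanes of a and b.
V8 model_zip1(const V4& a_low, const V4& b_low) {
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[2 * i] = a_low[i];
    r[2 * i + 1] = b_low[i];
  }
  return r;
}
```

Zipping dup(x) with dup(y) gives {x, y, x, y, x, y, x, y}, the vector built in foo. Presumably foo and foo2 each emit one zip1, which would explain the requested count of 2.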

Richard


[PATCH] i386: Handle V4HI and V2SImode in ix86_widen_mult_cost [PR109807]

2023-05-11 Thread Uros Bizjak via Gcc-patches
Do not crash when asking ix86_widen_mult_cost for the cost of
a widening mul operation to V4HI or V2SImode.
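
The shape of the fix can be sketched with a toy switch. The assumptions here are mine: that the unhandled-mode path traps (modelled with std::abort) and that the function returns the sum of the two cost parts. The Mode and Costs types are stand-ins, not the real i386 backend types.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical subset of modes and cost fields, for illustration only.
enum Mode { V4HI, V8HI, V16HI, V32QI };
struct Costs {
  int sse_op;
  int mulss;
};

int widen_mult_cost_model(const Costs& c, Mode mode, bool uns_p) {
  int extra = 0;
  int basic = 0;
  switch (mode) {
    case V4HI:  // newly handled: shares the V8HI/V16HI formula
    case V8HI:
    case V16HI:
      if (!uns_p || mode == V16HI)
        extra = c.sse_op * 2;
      basic = c.mulss * 2 + c.sse_op * 4;
      break;
    default:
      std::abort();  // stands in for the crash on an unhandled mode
  }
  return basic + extra;
}
```

Before the added case label, a V4HI query would have fallen through to the default branch. With it, V4HI gets the same cost as V8HI.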

gcc/ChangeLog:

PR target/109807
* config/i386/i386.cc (ix86_widen_mult_cost):
Handle V4HImode and V2SImode.

gcc/testsuite/ChangeLog:

PR target/109807
* gcc.target/i386/pr109807.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index b1d08ecdb3d..62fe06fdbaa 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20417,12 +20417,14 @@ ix86_widen_mult_cost (const struct processor_costs 
*cost,
   int basic_cost = 0;
   switch (mode)
 {
+case V4HImode:
 case V8HImode:
 case V16HImode:
   if (!uns_p || mode == V16HImode)
extra_cost = cost->sse_op * 2;
   basic_cost = cost->mulss * 2 + cost->sse_op * 4;
   break;
+case V2SImode:
 case V4SImode:
 case V8SImode:
   /* pmulhw/pmullw can be used.  */
diff --git a/gcc/testsuite/gcc.target/i386/pr109807.c 
b/gcc/testsuite/gcc.target/i386/pr109807.c
new file mode 100644
index 000..6380eb35312
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109807.c
@@ -0,0 +1,4 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -msse4" } */
+
+#include "sse2-mmx-pmaddwd.c"


Re: [PATCH] c++: 'mutable' subobject of constexpr variable [PR109745]

2023-05-11 Thread Patrick Palka via Gcc-patches
On Thu, 11 May 2023, Patrick Palka wrote:

> r13-2701-g7107ea6fb933f1 made us correctly accept 'mutable' member
> accesses during constexpr evaluation of objects constructed during that
> evaluation, while continuing to reject such accesses for constexpr
> objects constructed outside of that evaluation, by considering the
> CONSTRUCTOR_MUTABLE_POISON flag during cxx_eval_component_reference.
> 
> However, this flag is set only for the outermost CONSTRUCTOR of a
> constexpr variable initializer, so if we're accessing a 'mutable'
> subobject within a nested CONSTRUCTOR, the flag won't be set and
> we'll incorrectly accept the access.  This can lead to us rejecting
> valid code, as in the first testcase, or even wrong code due to

d'oh, this should say "this can lead to us accepting invalid code"

> speculative constexpr evaluation as in the second and third testcase.
> 
> This patch fixes this by setting CONSTRUCTOR_MUTABLE_POISON recursively
> rather than only on the outermost CONSTRUCTOR.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
> trunk/13?
> 
>   PR c++/109745
> 
> gcc/cp/ChangeLog:
> 
>   * typeck2.cc (poison_mutable_constructors): Define.
>   (store_init_value): Use it instead of setting
>   CONSTRUCTOR_MUTABLE_POISON directly.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/cpp0x/constexpr-mutable4.C: New test.
>   * g++.dg/cpp0x/constexpr-mutable5.C: New test.
>   * g++.dg/cpp1y/constexpr-mutable2.C: New test.
> ---
>  gcc/cp/typeck2.cc | 26 +++--
>  .../g++.dg/cpp0x/constexpr-mutable4.C | 16 
>  .../g++.dg/cpp0x/constexpr-mutable5.C | 39 +++
>  .../g++.dg/cpp1y/constexpr-mutable2.C | 20 ++
>  4 files changed, 97 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
>  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-mutable2.C
> 
> diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
> index f5cc7c8371c..8a187708482 100644
> --- a/gcc/cp/typeck2.cc
> +++ b/gcc/cp/typeck2.cc
> @@ -776,6 +776,27 @@ split_nonconstant_init (tree dest, tree init)
>return code;
>  }
>  
> +/* T is the initializer of a constexpr variable.  Set CONSTRUCTOR_MUTABLE_POISON
> +   for any CONSTRUCTOR within T that contains (directly or indirectly) a mutable
> +   member, thereby poisoning it so it can't be copied to another constexpr
> +   variable, or read during constexpr evaluation.  */
> +
> +static void
> +poison_mutable_constructors (tree t)
> +{
> +  if (TREE_CODE (t) != CONSTRUCTOR)
> +return;
> +
> +  if (cp_has_mutable_p (TREE_TYPE (t)))
> +{
> +  CONSTRUCTOR_MUTABLE_POISON (t) = true;
> +
> +  if (vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (t))
> + for (const constructor_elt &ce : *elts)
> +   poison_mutable_constructors (ce.value);
> +}
> +}
> +
>  /* Perform appropriate conversions on the initial value of a variable,
> store it in the declaration DECL,
> and print any error messages that are appropriate.
> @@ -886,10 +907,7 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** cleanups, int flags)
>else
>   value = fold_non_dependent_init (value, tf_warning_or_error,
>/*manifestly_const_eval=*/true, decl);
> -  if (TREE_CODE (value) == CONSTRUCTOR && cp_has_mutable_p (type))
> - /* Poison this CONSTRUCTOR so it can't be copied to another
> -constexpr variable.  */
> - CONSTRUCTOR_MUTABLE_POISON (value) = true;
> +  poison_mutable_constructors (value);
>const_init = (reduced_constant_expression_p (value)
>   || error_operand_p (value));
>DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = const_init;
> diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C 
> b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
> new file mode 100644
> index 000..01f32dea1bd
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
> @@ -0,0 +1,16 @@
> +// PR c++/109745
> +// { dg-do compile { target c++11 } }
> +
> +struct A { mutable int m = 0; };
> +
> +struct B { A a; };
> +
> +struct C { B b; };
> +
> +int main() {
> +  constexpr B b;
> +  constexpr int bam = b.a.m;// { dg-error "mutable" }
> +
> +  constexpr C c;
> +  constexpr int cbam = c.b.a.m; // { dg-error "mutable" }
> +}
> diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C 
> b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
> new file mode 100644
> index 000..6a530e2abe6
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
> @@ -0,0 +1,39 @@
> +// PR c++/109745
> +// { dg-do run { target c++11 } }
> +// { dg-additional-options "-O" }
> +
> +struct A {
> +  mutable int m = 0;
> +  void f() const { ++m; };
> +  constexpr int get_m() const { return m; }
> +};
> +
> +struct B { A a; };
> +
> +struc

Re: [committed] Convert xstormy16 to LRA

2023-05-11 Thread Hans-Peter Nilsson via Gcc-patches
> Date: Thu, 11 May 2023 12:15:20 -0600
> From: Jeff Law 

> On 5/11/23 10:55, Paul Koning wrote:
> > 
> > 
> >> On May 11, 2023, at 11:05 AM, Hans-Peter Nilsson via Gcc-patches 
> >>  wrote:
> >>
> >> ...
> >> Yes, very interesting.  Thank you for sharing this.  I've
> >> seen regressions with LRA for CRIS too, for
> >> "double-register-sized" types, which for CRIS, a 32-bit
> >> target, translates to 64-bit types (DFmode and DImode), and
> >> where LRA does a much worse job than reload; spills a lot
> >> more often to stack, even after trying every
> >> register-allocation-related hook I found (and also an LRA
> >> patch which helped only by a fraction, but regressed results
> >> on x86_64-linux, so let's quickly forget it again).
> > 
> > That observation makes me a bit worried.  While CRIS may not be a priority 
> > platform, that description makes it sound like a case that would be 
> > significant in any 32 bit platform, which would include priority ones like 
> > i386 and ARM.
> If I understood things correctly, it seems to impact more when the 
> target exposes double-word patterns but doesn't actually have 
> instructions for those operations.  That's an implementation pattern 
> we've largely been moving away from over the last decade or so.

That description doesn't really match CRIS though.  The "ax"
prefix used in DImode patterns links the next instruction to
include the carry.  Thus better than an "open-coded"
version.  In comparison, CRIS doesn't define separable
patterns (anddi3, iordi3 etc.)
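
For comparison, an "open-coded" double-word add on a 32-bit target looks roughly like the sketch below, where the carry out of the low word has to be computed explicitly and fed into the high word. This is illustrative C++ only, not CRIS code; DWord and add_dword are names invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// A 64-bit value held as two 32-bit halves.
struct DWord {
  std::uint32_t lo;
  std::uint32_t hi;
};

// Open-coded 64-bit addition: add the low words, derive the carry from
// unsigned wrap-around, then add the high words plus that carry.
DWord add_dword(DWord a, DWord b) {
  DWord r;
  r.lo = a.lo + b.lo;
  std::uint32_t carry = r.lo < a.lo ? 1u : 0u;
  r.hi = a.hi + b.hi + carry;
  return r;
}
```

A carry-linking instruction pair avoids the explicit compare that computes the carry here.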

But, there's a movdi expander and splitter - with a long
reload-related comment.  Perhaps I can do away with that
even though having some arithmetic and compares in DImode.
Thanks for the hint.

brgds, H-P


[PATCH] c++: 'mutable' subobject of constexpr variable [PR109745]

2023-05-11 Thread Patrick Palka via Gcc-patches
r13-2701-g7107ea6fb933f1 made us correctly accept 'mutable' member
accesses during constexpr evaluation of objects constructed during that
evaluation, while continuing to reject such accesses for constexpr
objects constructed outside of that evaluation, by considering the
CONSTRUCTOR_MUTABLE_POISON flag during cxx_eval_component_reference.

However, this flag is set only for the outermost CONSTRUCTOR of a
constexpr variable initializer, so if we're accessing a 'mutable'
subobject within a nested CONSTRUCTOR, the flag won't be set and
we'll incorrectly accept the access.  This can lead to us rejecting
valid code, as in the first testcase, or even wrong code due to
speculative constexpr evaluation as in the second and third testcase.

This patch fixes this by setting CONSTRUCTOR_MUTABLE_POISON recursively
rather than only on the outermost CONSTRUCTOR.
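
The difference between the old and new behaviour can be shown with a toy model. This is only a sketch: Ctor is a made-up struct, its type_has_mutable field stands in for cp_has_mutable_p (TREE_TYPE (t)), and poisoned stands in for CONSTRUCTOR_MUTABLE_POISON. None of it is GCC's tree API.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a CONSTRUCTOR node.
struct Ctor {
  bool type_has_mutable = false;  // models cp_has_mutable_p (TREE_TYPE (t))
  bool poisoned = false;          // models CONSTRUCTOR_MUTABLE_POISON
  std::vector<Ctor> elts;         // nested initializers
};

// Old behaviour: only the outermost initializer gets the flag.
void poison_outermost_only(Ctor& c) {
  if (c.type_has_mutable)
    c.poisoned = true;
}

// New behaviour: walk into nested initializers as well.
void poison_recursively(Ctor& c) {
  if (!c.type_has_mutable)
    return;
  c.poisoned = true;
  for (Ctor& e : c.elts)
    poison_recursively(e);
}
```

For a nesting like C { B { A } }, where A has the mutable member, the first function flags only the outer node while the second flags all three.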

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/13?

PR c++/109745

gcc/cp/ChangeLog:

* typeck2.cc (poison_mutable_constructors): Define.
(store_init_value): Use it instead of setting
CONSTRUCTOR_MUTABLE_POISON directly.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-mutable4.C: New test.
* g++.dg/cpp0x/constexpr-mutable5.C: New test.
* g++.dg/cpp1y/constexpr-mutable2.C: New test.
---
 gcc/cp/typeck2.cc | 26 +++--
 .../g++.dg/cpp0x/constexpr-mutable4.C | 16 
 .../g++.dg/cpp0x/constexpr-mutable5.C | 39 +++
 .../g++.dg/cpp1y/constexpr-mutable2.C | 20 ++
 4 files changed, 97 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-mutable2.C

diff --git a/gcc/cp/typeck2.cc b/gcc/cp/typeck2.cc
index f5cc7c8371c..8a187708482 100644
--- a/gcc/cp/typeck2.cc
+++ b/gcc/cp/typeck2.cc
@@ -776,6 +776,27 @@ split_nonconstant_init (tree dest, tree init)
   return code;
 }
 
+/* T is the initializer of a constexpr variable.  Set CONSTRUCTOR_MUTABLE_POISON
+   for any CONSTRUCTOR within T that contains (directly or indirectly) a mutable
+   member, thereby poisoning it so it can't be copied to another constexpr
+   variable, or read during constexpr evaluation.  */
+
+static void
+poison_mutable_constructors (tree t)
+{
+  if (TREE_CODE (t) != CONSTRUCTOR)
+return;
+
+  if (cp_has_mutable_p (TREE_TYPE (t)))
+{
+  CONSTRUCTOR_MUTABLE_POISON (t) = true;
+
+  if (vec<constructor_elt, va_gc> *elts = CONSTRUCTOR_ELTS (t))
+   for (const constructor_elt &ce : *elts)
+ poison_mutable_constructors (ce.value);
+}
+}
+
 /* Perform appropriate conversions on the initial value of a variable,
store it in the declaration DECL,
and print any error messages that are appropriate.
@@ -886,10 +907,7 @@ store_init_value (tree decl, tree init, vec<tree, va_gc>** cleanups, int flags)
   else
value = fold_non_dependent_init (value, tf_warning_or_error,
 /*manifestly_const_eval=*/true, decl);
-  if (TREE_CODE (value) == CONSTRUCTOR && cp_has_mutable_p (type))
-   /* Poison this CONSTRUCTOR so it can't be copied to another
-  constexpr variable.  */
-   CONSTRUCTOR_MUTABLE_POISON (value) = true;
+  poison_mutable_constructors (value);
   const_init = (reduced_constant_expression_p (value)
|| error_operand_p (value));
   DECL_INITIALIZED_BY_CONSTANT_EXPRESSION_P (decl) = const_init;
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
new file mode 100644
index 000..01f32dea1bd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable4.C
@@ -0,0 +1,16 @@
+// PR c++/109745
+// { dg-do compile { target c++11 } }
+
+struct A { mutable int m = 0; };
+
+struct B { A a; };
+
+struct C { B b; };
+
+int main() {
+  constexpr B b;
+  constexpr int bam = b.a.m;// { dg-error "mutable" }
+
+  constexpr C c;
+  constexpr int cbam = c.b.a.m; // { dg-error "mutable" }
+}
diff --git a/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C 
b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
new file mode 100644
index 000..6a530e2abe6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/constexpr-mutable5.C
@@ -0,0 +1,39 @@
+// PR c++/109745
+// { dg-do run { target c++11 } }
+// { dg-additional-options "-O" }
+
+struct A {
+  mutable int m = 0;
+  void f() const { ++m; };
+  constexpr int get_m() const { return m; }
+};
+
+struct B { A a; };
+
+struct C { B b; };
+
+int main() {
+  constexpr A a;
+  a.m++;
+  if (a.get_m() != 1 || a.m != 1)
+__builtin_abort();
+  a.m++;
+  if (a.get_m() != 2 || a.m != 2)
+__builtin_abort();
+
+  constexpr B b;
+  b.a.m++;
+  if (b.a.get_m() != 1 || b.a.m != 1)
+__builtin_abort();
+  b.a.m++;
+  if (b.a.get_m() != 2 || b.a.m != 2)
+__builtin_abo

[PATCH] RISC-V: Add v_uimm_operand

2023-05-11 Thread Palmer Dabbelt
The vector shift immediates happen to have the same constraints as some
of the CSR-related operands, but it's a different usage.  This adds a
name for them, so I don't get confused again next time.
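
In other words, the new predicate is just a second name for the same check. A rough model follows, with the caveat that the accepted immediate range (0..31 here) is my assumption about what the CSR predicate allows. The patch does not show that range, and its own comment describes the immediates differently. All names ending in _model are invented for the sketch.

```cpp
#include <cassert>

// Toy operand: either a register or a constant integer.
struct Operand {
  bool is_register;
  long value;  // only meaningful when is_register is false
};

// Assumed immediate range for the CSR predicate: 0..31.
bool const_csr_operand_model(const Operand& op) {
  return !op.is_register && op.value >= 0 && op.value <= 31;
}

// Existing predicate: an in-range constant or a register.
bool csr_operand_model(const Operand& op) {
  return op.is_register || const_csr_operand_model(op);
}

// New predicate: same constraint, named for its vector-shift usage.
bool v_uimm_operand_model(const Operand& op) {
  return csr_operand_model(op);
}
```

Since the alias forwards directly, it accepts and rejects exactly the same operands as the predicate it wraps. Only the name at the use site changes.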

gcc/ChangeLog:

* config/riscv/autovec.md (shifts): Use v_uimm_operand.
* config/riscv/predicates.md (v_uimm_operand): New predicate.
---
I haven't even build tested this one, I just saw it when reviewing some
patch and figured I'd send it along.
---
 gcc/config/riscv/autovec.md| 2 +-
 gcc/config/riscv/predicates.md | 5 +
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index ac0c939d277..daad51abbc2 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -132,7 +132,7 @@ (define_expand "3"
   [(set (match_operand:VI 0 "register_operand")
 (any_shift:VI
  (match_operand:VI 1 "register_operand")
- (match_operand: 2 "csr_operand")))]
+ (match_operand: 2 "v_uimm_operand")))]
   "TARGET_VECTOR"
 {
   if (!CONST_SCALAR_INT_P (operands[2]))
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index e5adf06fa25..62007d6c6e3 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -43,6 +43,11 @@ (define_predicate "csr_operand"
   (ior (match_operand 0 "const_csr_operand")
(match_operand 0 "register_operand")))
 
+;; V has 5-bit unsigned immediates.  This happens to be the same constraint as
+;; the csr_operand, but it's not CSR related.
+(define_predicate "v_uimm_operand"
+  (match_operand 0 "csr_operand"))
+
 (define_predicate "sle_operand"
   (and (match_code "const_int")
(match_test "SMALL_OPERAND (INTVAL (op) + 1)")))
-- 
2.40.0



Re: [committed] Convert xstormy16 to LRA

2023-05-11 Thread Jeff Law via Gcc-patches




On 5/11/23 10:55, Paul Koning wrote:
>
>> On May 11, 2023, at 11:05 AM, Hans-Peter Nilsson via Gcc-patches wrote:
>>
>> ...
>> Yes, very interesting.  Thank you for sharing this.  I've
>> seen regressions with LRA for CRIS too, for
>> "double-register-sized" types, which for CRIS, a 32-bit
>> target, translates to 64-bit types (DFmode and DImode), and
>> where LRA does a much worse job than reload; spills a lot
>> more often to stack, even after trying every
>> register-allocation-related hook I found (and also an LRA
>> patch which helped only by a fraction, but regressed results
>> on x86_64-linux, so let's quickly forget it again).
>
> That observation makes me a bit worried.  While CRIS may not be a priority
> platform, that description makes it sound like a case that would be significant
> in any 32 bit platform, which would include priority ones like i386 and ARM.

If I understood things correctly, it seems to impact more when the
target exposes double-word patterns but doesn't actually have
instructions for those operations.  That's an implementation pattern
we've largely been moving away from over the last decade or so.


Jeff


Re: [libgcc PATCH] Add bit reversal functions __bitrev[qhsd]i2.

2023-05-11 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> This patch proposes adding run-time library support for bit reversal,
> by adding a __bitrevsi2 function to libgcc.  Thoughts/opinions?
>
> I'm also tempted to add __popcount[qh]i2 and __parity[qh]i2 to libgcc,
> to allow the RTL optimizers to perform narrowing operations, but I'm
> curious to hear whether QImode and HImode support, though more efficient,
> is frowned upon by the libgcc maintainers/philosophy.

I don't think RTL optimisers should be in the business of generating new
libcalls.  Wouldn't it have to be done in gimple and/or during expand?

> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} and
> on nvptx-none, with no new regressions.  Ok for mainline?
>
>
> 2023-05-06  Roger Sayle  
>
> gcc/ChangeLog
> * doc/libgcc.texi (__bitrevqi2): Document bit reversal run-time
> functions; __bitrevqi2, __bitrevhi2, __bitrevsi2 and __bitrevdi2.
>
> libgcc/ChangeLog
> * Makefile.in (lib2funcs): Add __bitrev[qhsd]i2.
> * libgcc-std.ver.in (GCC_14.0.0): Add __bitrev[qhsd]i2.
> * libgcc2.c (__bitrevqi2): New function.
> (__bitrevhi2): Likewise.
> (__bitrevsi2): Likewise.
> (__bitrevdi2): Likewise.
> * libgcc2.h (__bitrevqi2): Prototype here.
> (__bitrevhi2): Likewise.
> (__bitrevsi2): Likewise.
> (__bitrevdi2): Likewise.
>
> Thanks in advance,
> Roger
> --
>
> diff --git a/gcc/doc/libgcc.texi b/gcc/doc/libgcc.texi
> index 73aa803..7611347 100644
> --- a/gcc/doc/libgcc.texi
> +++ b/gcc/doc/libgcc.texi
> @@ -218,6 +218,13 @@ These functions return the number of bits set in @var{a}.
>  These functions return the @var{a} byteswapped.
>  @end deftypefn
>  
> +@deftypefn {Runtime Function} int8_t __bitrevqi2 (int8_t @var{a})
> +@deftypefnx {Runtime Function} int16_t __bitrevhi2 (int16_t @var{a})
> +@deftypefnx {Runtime Function} int32_t __bitrevsi2 (int32_t @var{a})
> +@deftypefnx {Runtime Function} int64_t __bitrevdi2 (int64_t @var{a})
> +These functions return the bit reversed @var{a}.
> +@end deftypefn
> +
>  @node Soft float library routines
>  @section Routines for floating point emulation
>  @cindex soft float library
> diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
> index 6c4dc79..67c54df 100644
> --- a/libgcc/Makefile.in
> +++ b/libgcc/Makefile.in
> @@ -446,7 +446,7 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 
> _cmpdi2 _ucmpdi2  \
>   _paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2  \
>   _mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3\
>   _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2  \
> - _clrsbdi2
> + _clrsbdi2 _bitrevqi2 _bitrevhi2 _bitrevsi2 _bitrevdi2
>  
>  # The floating-point conversion routines that involve a single-word integer.
>  # XX stands for the integer mode.
> diff --git a/libgcc/libgcc-std.ver.in b/libgcc/libgcc-std.ver.in
> index c4f87a5..2198b0e 100644
> --- a/libgcc/libgcc-std.ver.in
> +++ b/libgcc/libgcc-std.ver.in
> @@ -1944,3 +1944,12 @@ GCC_7.0.0 {
>__PFX__divmoddi4
>__PFX__divmodti4
>  }
> +
> +%inherit GCC_14.0.0 GCC_7.0.0
> +GCC_14.0.0 {
> +  # bit reversal functions
> +  __PFX__bitrevqi2
> +  __PFX__bitrevhi2
> +  __PFX__bitrevsi2
> +  __PFX__bitrevdi2
> +}
> diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
> index e0017d1..2bef2a1 100644
> --- a/libgcc/libgcc2.c
> +++ b/libgcc/libgcc2.c
> @@ -488,6 +488,54 @@ __bswapdi2 (DItype u)
> | (((u) & 0x00000000000000ffull) << 56));
>  }
>  #endif
> +
> +#ifdef L_bitrevqi2
> +QItype
> +__bitrevqi2 (QItype x)
> +{
> +  UQItype u = x;
> +  u = (((u) >> 1) & 0x55) | (((u) & 0x55) << 1);
> +  u = (((u) >> 2) & 0x33) | (((u) & 0x33) << 2);
> +  return ((u) >> 4) | ((u) << 4);
> +}
> +#endif
> +#ifdef L_bitrevhi2
> +HItype
> +__bitrevhi2 (HItype x)
> +{
> +  UHItype u = x;
> +  u = (((u) >> 1) & 0x5555) | (((u) & 0x5555) << 1);
> +  u = (((u) >> 2) & 0x3333) | (((u) & 0x3333) << 2);
> +  u = (((u) >> 4) & 0x0f0f) | (((u) & 0x0f0f) << 4);
> +  return ((u) >> 8) | ((u) << 8);
> +}
> +#endif
> +#ifdef L_bitrevsi2
> +SItype
> +__bitrevsi2 (SItype x)
> +{
> +  USItype u = x;
> +  u = (((u) >> 1) & 0x55555555) | (((u) & 0x55555555) << 1);
> +  u = (((u) >> 2) & 0x33333333) | (((u) & 0x33333333) << 2);
> +  u = (((u) >> 4) & 0x0f0f0f0f) | (((u) & 0x0f0f0f0f) << 4);
> +  return __bswapsi2 (u);

Would it be better to use __builtin_bswap32 here, so that targets
with bswap but not bitreverse still optimise the bswap part?
Same for the DI version.

Not sure how portable all this is, but the underlying assumptions
seem to be the same as for bswap.
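
To make the suggestion concrete, here is a rough sketch of the SI variant
written with the builtin.  It is only an illustration, not the proposed
libgcc code: the names bitrev32 and bitrev32_ref are made up for this
example, and it assumes a compiler that provides __builtin_bswap32 (GCC
and Clang do).

```cpp
#include <cstdint>

// Reverse the bits of a 32-bit value.  The three mask-and-shift steps
// reverse the bits inside each byte; __builtin_bswap32 then reverses
// the order of the four bytes, which completes the full reversal.
std::uint32_t bitrev32(std::uint32_t u)
{
  u = ((u >> 1) & 0x55555555u) | ((u & 0x55555555u) << 1);
  u = ((u >> 2) & 0x33333333u) | ((u & 0x33333333u) << 2);
  u = ((u >> 4) & 0x0f0f0f0fu) | ((u & 0x0f0f0f0fu) << 4);
  return __builtin_bswap32(u);
}

// Bit-at-a-time reference, used only to cross-check bitrev32.
std::uint32_t bitrev32_ref(std::uint32_t u)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      r = (r << 1) | (u & 1u);
      u >>= 1;
    }
  return r;
}
```

On a target with a byte-swap instruction the last step should then fold
to that instruction instead of a call.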

Looks OK to me otherwise, but it should wait until something needs it
(and can test it).

Thanks,
Richard

> +}
> +#endif
> +#ifdef L_bitrevdi2
> +DItype
> +__bitrevdi2 (DItype x)
> +{
> +  UDItype u = x;
> +  u = (((u) >> 1) & 0x5555555555555555ll)
> +  | (((u) & 0x5

Re: [PATCH v2] RISC-V: Split off shift patterns for autovectorization.

2023-05-11 Thread Palmer Dabbelt

On Thu, 11 May 2023 07:21:30 PDT (-0700), jeffreya...@gmail.com wrote:
>
> On 5/11/23 04:33, Robin Dapp wrote:
>>> "csr_operand" does seem wrong, though, as that just accepts constants.
>>> Maybe "arith_operand" is the way to go?  I haven't looked at the
>>> V immediates though.
>>
>> I was pondering changing the shift-count operand to QImode everywhere
>> but that indeed does not help code generation across the board.  It can
>> still work but might require extra patterns here and there.
>
> Yea.  It's a GCC wart and there hasn't ever been a clear best direction
> on the mode for the shift count.  If you use QImode, as you note you
> often end up having to add various patterns to avoid useless conversions
> and such.


Yes, and I think given that we have so much weirdness for the sub-XLEN 
types in the RISC-V port we'd need to have a lot of fairly large 
patterns and some truncation-based fallbacks.  We've got some of those 
for the integer shifts already, though, so maybe it's the way to go?  

FWIW, I was trying to suggest X or REG as the shift amount and thought
we'd done it that way for the integer shifts too.  I think we can
reason about that with just some tiny code snippets, even if it's not
the right way to go long term (as per below).  Probably a minor win,
though, and I don't think it needs to block the patches.


Also: looks like I was wrong and "csr_operand" does the correct thing 
here because there's only a 5-bit immediate for the shift amounts.  We 
should probably name it something else, though, as this has nothing to 
do with CSRs...
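
As a quick illustration of the constraint I mean, here is a toy model
of the check.  This is not the predicate code: fits_uimm5, ShiftAmount
and shift_amount_ok are names invented for this sketch, and it assumes
the operand is either a register or a constant in the range 0..31.

```cpp
#include <cstdint>

// Hypothetical helper: true when VALUE fits a 5-bit unsigned
// immediate field, i.e. lies in the range 0..31.
constexpr bool fits_uimm5(std::int64_t value)
{
  return value >= 0 && value < 32;
}

// Hypothetical model of a shift-amount operand: either a register
// (VALUE is then ignored) or a compile-time constant.
struct ShiftAmount
{
  bool is_register;
  std::int64_t value;
};

// Mirrors the "constant in range, or a register" shape of the
// predicate: registers are always fine, constants must fit 5 bits.
constexpr bool shift_amount_ok(const ShiftAmount &op)
{
  return op.is_register || fits_uimm5(op.value);
}
```

A constant like 40 would fail this check and have to be moved into a
register first.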



> I suspect QImode isn't ideal on a target like RV where we don't really
> have QImode operations.  So all we do is force the introduction of
> subregs all over the place to force the operand in to QImode.  It's
> something I'd like to explore, but would obviously require a fair amount
> of benchmarking to be able to confidently say which is better.


Folks have tried a few times and it's never ended up better.  I do think 
we're at a local minimum here, though -- ie, explicitly handling the 
shorter types would result in better generated code if we got everything 
right.  Gut feeling is that'd require a meaningful amount of middle-end 
work, though, as we're sufficiently different than MIPS here (and 
arm64/x86 have many of the ops).


Nobody in Rivos land is looking at this right now, though it's a pretty 
common red flag for new people and frequently trips up code gen so that 
might change with little notice...



> Jeff


Re: [PATCH] Add RTX codes for BITREVERSE and COPYSIGN.

2023-05-11 Thread Richard Sandiford via Gcc-patches
"Roger Sayle"  writes:
> An analysis of backend UNSPECs reveals that two of the most common UNSPECs
> across target backends are for copysign and bit reversal.  This patch
> adds RTX codes for these expressions to allow their representation to
> be standardized, and to allow them to be optimized by the middle-end RTL
> optimizers.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32} with
> no new failures.  Ok for mainline?
>
>
> 2023-05-06  Roger Sayle  
>
> gcc/ChangeLog
> * doc/rtl.texi (bitreverse, copysign): Document new RTX codes.
> * rtl.def (BITREVERSE, COPYSIGN): Define new RTX codes.
> * simplify-rtx.cc (simplify_unary_operation_1): Optimize
> NOT (BITREVERSE x) as BITREVERSE (NOT x).
> Optimize POPCOUNT (BITREVERSE x) as POPCOUNT x.
> Optimize PARITY (BITREVERSE x) as PARITY x.
> Optimize BITREVERSE (BITREVERSE x) as x.
> (simplify_const_unary_operation) : Evaluate
> BITREVERSE of a constant integer at compile-time.
> (simplify_binary_operation_1) :  Optimize
> COPYSIGN (x, x) as x.  Optimize COPYSIGN (x, C) as ABS x
> or NEG (ABS x) for constant C.  Optimize COPYSIGN (ABS x, y)
> and COPYSIGN (NEG x, y) as COPYSIGN (x, y).  Optimize
> COPYSIGN (x, ABS y) as ABS x.
> Optimize COPYSIGN (COPYSIGN (x, y), z) as COPYSIGN (x, z).
> Optimize COPYSIGN (x, COPYSIGN (y, z)) as COPYSIGN (x, z).
> (simplify_const_binary_operation): Evaluate COPYSIGN of constant
> arguments at compile-time.
> * wide-int.cc (wide_int_storage::bitreverse): Provide a
> wide_int implementation, based upon bswap implementation.
> * wide-int.h (wide_int_storage::bitreverse): Prototype here.
>
>
> Thanks in advance,
> Roger
> --
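
For readers who want to convince themselves of the identities listed in
the ChangeLog above, a small stand-alone check could look like the
sketch below.  It is not part of the patch: the helper names are made
up, the bit reversal is a plain reference loop, __builtin_popcount and
__builtin_parity are assumed to be available (GCC and Clang), and the
copysign checks assume finite, non-NaN inputs.

```cpp
#include <cmath>
#include <cstdint>

// Plain reference bit reversal of a 32-bit value.
std::uint32_t bitreverse32(std::uint32_t x)
{
  std::uint32_t r = 0;
  for (int i = 0; i < 32; ++i)
    {
      r = (r << 1) | (x & 1u);
      x >>= 1;
    }
  return r;
}

// Checks the unary identities for one input:
//   NOT (BITREVERSE x)        == BITREVERSE (NOT x)
//   POPCOUNT (BITREVERSE x)   == POPCOUNT x
//   PARITY (BITREVERSE x)     == PARITY x
//   BITREVERSE (BITREVERSE x) == x
bool bitreverse_identities_hold(std::uint32_t x)
{
  const std::uint32_t r = bitreverse32(x);
  return static_cast<std::uint32_t>(~r) == bitreverse32(~x)
         && __builtin_popcount(r) == __builtin_popcount(x)
         && __builtin_parity(r) == __builtin_parity(x)
         && bitreverse32(r) == x;
}

// Checks the binary identities for finite, non-NaN x, y and z.
bool copysign_identities_hold(double x, double y, double z)
{
  using std::copysign;
  using std::fabs;
  return copysign(x, x) == x
         && copysign(x, 2.0) == fabs(x)
         && copysign(x, -2.0) == -fabs(x)
         && copysign(fabs(x), y) == copysign(x, y)
         && copysign(-x, y) == copysign(x, y)
         && copysign(x, fabs(y)) == fabs(x)
         && copysign(copysign(x, y), z) == copysign(x, z)
         && copysign(x, copysign(y, z)) == copysign(x, z);
}
```

This only spot-checks values, of course; it says nothing about how the
RTL simplifier should treat NaNs or modes without signed zeros.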
>
> diff --git a/gcc/doc/rtl.texi b/gcc/doc/rtl.texi
> index 1de2494..76aeafb 100644
> --- a/gcc/doc/rtl.texi
> +++ b/gcc/doc/rtl.texi
> @@ -2742,6 +2742,17 @@ integer of mode @var{m}.  The mode of @var{x} must be 
> @var{m} or
>  Represents the value @var{x} with the order of bytes reversed, carried out
>  in mode @var{m}, which must be a fixed-point machine mode.
>  The mode of @var{x} must be @var{m} or @code{VOIDmode}.
> +
> +@findex bitreverse
> +@item (bitreverse:@var{m} @var{x})
> +Represents the value @var{x} with the order of bits reversed, carried out
> +in mode @var{m}, which must be a fixed-point machine mode.
> +The mode of @var{x} must be @var{m} or @code{VOIDmode}.
> +
> +@findex copysign
> +@item (copysign:@var{m} @var{x} @var{y})
> +Represents the value @var{x} with the sign of @var{y}.
> +Both @var{x} and @var{y} must have floating point machine mode @var{m}.
>  @end table
>  
>  @node Comparisons
> diff --git a/gcc/rtl.def b/gcc/rtl.def
> index 6ddbce3..88e2b19 100644
> --- a/gcc/rtl.def
> +++ b/gcc/rtl.def
> @@ -664,6 +664,9 @@ DEF_RTL_EXPR(POPCOUNT, "popcount", "e", RTX_UNARY)
>  /* Population parity (number of 1 bits modulo 2).  */
>  DEF_RTL_EXPR(PARITY, "parity", "e", RTX_UNARY)
>  
> +/* Reverse bits.  */
> +DEF_RTL_EXPR(BITREVERSE, "bitreverse", "e", RTX_UNARY)
> +
>  /* Reference to a signed bit-field of specified size and position.
> Operand 0 is the memory unit (usually SImode or QImode) which
> contains the field's first bit.  Operand 1 is the width, in bits.
> @@ -753,6 +756,9 @@ DEF_RTL_EXPR(US_TRUNCATE, "us_truncate", "e", RTX_UNARY)
>  /* Floating point multiply/add combined instruction.  */
>  DEF_RTL_EXPR(FMA, "fma", "eee", RTX_TERNARY)
>  
> +/* Floating point copysign.  Operand 0 with the sign of operand 1.  */
> +DEF_RTL_EXPR(COPYSIGN, "copysign", "ee", RTX_BIN_ARITH)
> +
>  /* Information about the variable and its location.  */
>  DEF_RTL_EXPR(VAR_LOCATION, "var_location", "te", RTX_EXTRA)
>  
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index d4aeebc..26fa2b9 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -1040,10 +1040,10 @@ simplify_context::simplify_unary_operation_1 
> (rtx_code code, machine_mode mode,
>   }
>  
>/* (not (bswap x)) -> (bswap (not x)).  */
> -  if (GET_CODE (op) == BSWAP)
> +  if (GET_CODE (op) == BSWAP || GET_CODE (op) == BITREVERSE)
>   {
> rtx x = simplify_gen_unary (NOT, mode, XEXP (op, 0), mode);
> -   return simplify_gen_unary (BSWAP, mode, x, mode);
> +   return simplify_gen_unary (GET_CODE (op), mode, x, mode);
>   }
>break;
>  
> @@ -1419,6 +1419,7 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
> code, machine_mode mode,
>switch (GET_CODE (op))
>   {
>   case BSWAP:
> + case BITREVERSE:
> /* (popcount (bswap )) = (popcount ).  */
> return simplify_gen_unary (POPCOUNT, mode, XEXP (op, 0),
>GET_MODE (XEXP (op, 0)));
> @@ -1448,6 +1449,7 @@ simplify_context::simplify_unary_operation_1 (rtx_code 
> code, machine_mode mode,
>

Re: [PATCH] Improve simple_dce for phis that only used in itself

2023-05-11 Thread Richard Biener via Gcc-patches



> On 11.05.2023 at 17:18, Andrew Pinski via Gcc-patches wrote:
> 
> While I was looking at differences before and after
> r14-569-g21e2ef2dc25de3, I noticed that one phi node was
> not being removed.
> For example, while compiling combine.cc, in expand_field_assignment,
> we used to remove `# pos_51 = PHI `
> but we don't any more, since pos_51 has more than zero uses;
> in this case, though, its only use is in its own defining phi.
> This patch improves simple_dce_from_worklist to detect that
> case, and now we are able to remove this phi statement again.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu.
> 
> gcc/ChangeLog:
> 
>    * tree-ssa-dce.cc (simple_dce_from_worklist): For ssa names
>    defined by a phi node with one or more uses, allow removal when
>    the only uses are in that same defining statement.
> ---
> gcc/tree-ssa-dce.cc | 31 +--
> 1 file changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
> index 6554b5db03e..045c64a9c02 100644
> --- a/gcc/tree-ssa-dce.cc
> +++ b/gcc/tree-ssa-dce.cc
> @@ -2107,9 +2107,36 @@ simple_dce_from_worklist (bitmap worklist, bitmap 
> need_eh_cleanup)
>   unsigned i = bitmap_clear_first_set_bit (worklist);
> 
>   tree def = ssa_name (i);
> -  /* Removed by somebody else or still in use.  */
> +  /* Removed by somebody else or still in use.
> + Note use in itself for a phi node is not counted as still in use.  */
>   if (! def || ! has_zero_uses (def))
> -continue;
> +{
> +
> +  if (!def)
> +continue;

Please split the guarding if and handle this separately.  Ok with that change.
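
Roughly the shape I have in mind, written against a toy model rather
than the real GIMPLE API.  Stmt, SsaName, only_used_by and can_remove
are stand-ins invented for this sketch; only the control flow (null
check first, then the zero-use test, then the phi self-use test) is the
point.

```cpp
#include <vector>

// Toy stand-ins for a statement and an SSA name.
struct Stmt
{
  bool is_phi;
  bool is_debug;
};

struct SsaName
{
  const Stmt *def_stmt;
  std::vector<const Stmt *> use_stmts;
};

// True when every non-debug use of DEF is the statement ALLOWED.
// Passing nullptr asks "are there no non-debug uses at all?".
bool only_used_by(const SsaName &def, const Stmt *allowed)
{
  for (const Stmt *use : def.use_stmts)
    {
      if (use->is_debug)
        continue;
      if (use != allowed)
        return false;
    }
  return true;
}

// The guard split into separate early exits.
bool can_remove(const SsaName *def)
{
  /* Removed by somebody else.  */
  if (!def)
    return false;

  /* No real uses left.  */
  if (only_used_by(*def, nullptr))
    return true;

  /* Still in use, unless a phi is used only by itself.  */
  if (!def->def_stmt->is_phi)
    return false;
  return only_used_by(*def, def->def_stmt);
}
```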

Richard 

> +
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (def);
> +  if (gimple_code (def_stmt) != GIMPLE_PHI)
> +continue;
> +
> +  gimple *use_stmt;
> +  imm_use_iterator use_iter;
> +  bool canremove = true;
> +
> +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, def)
> +{
> +  /* Ignore debug statements. */
> +  if (is_gimple_debug (use_stmt))
> +continue;
> +  if (use_stmt != def_stmt)
> +{
> +  canremove = false;
> +  break;
> +}
> +}
> +  if (!canremove)
> +continue;
> +}
> 
>   gimple *t = SSA_NAME_DEF_STMT (def);
>   if (gimple_has_side_effects (t))
> -- 
> 2.31.1
> 


Re: [committed] Convert xstormy16 to LRA

2023-05-11 Thread Paul Koning via Gcc-patches



> On May 11, 2023, at 11:05 AM, Hans-Peter Nilsson via Gcc-patches 
>  wrote:
> 
> ...
> Yes, very interesting.  Thank you for sharing this.  I've
> seen regressions with LRA for CRIS too, for
> "double-register-sized" types, which for CRIS, a 32-bit
> target, translates to 64-bit types (DFmode and DImode), and
> where LRA does a much worse job than reload; spills a lot
> more often to stack, even after trying every
> register-allocation-related hook I found (and also an LRA
> patch which helped only by a fraction, but regressed results
> on x86_64-linux, so let's quickly forget it again).

That observation makes me a bit worried.  While CRIS may not be a priority 
platform, that description makes it sound like a case that would be significant 
in any 32 bit platform, which would include priority ones like i386 and ARM.

If that's true, I wonder about dropping Reload.  While I understand it's been 
years since LRA was first introduced, wouldn't we even so want to go by the 
rule that a newer replacement mechanism doesn't replace an older one  until the 
replacement demonstrates comparable or better output compared with the older 
one?

paul




libgo patch committed: Add syscall.prlimit

2023-05-11 Thread Ian Lance Taylor via Gcc-patches
As of https://go.dev/cl/476695 the package golang.org/x/sys/unix
expects a syscall.prlimit function to exist.  This libgo patch adds
that function.  This is for https://go.dev/issue/46279 and
https://go.dev/issue/59712.  Since this is a small patch and is needed
to compile the widely used x/sys/unix package, committed to tip and to
GCC 11, 12, and 13 branches.

Ian
ba8160449c646138a3a9e1723ac1db0716a8b103
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index e133650ad91..702257009d2 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-0411a2733fd468e69f1998edd91e8fe3ba40ff9e
+737de90a63002d4872b19772a7116404ee5815b4
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/syscall/libcall_linux.go b/libgo/go/syscall/libcall_linux.go
index 19ae4393cf1..03ca7261b59 100644
--- a/libgo/go/syscall/libcall_linux.go
+++ b/libgo/go/syscall/libcall_linux.go
@@ -189,6 +189,14 @@ func Gettid() (tid int) {
 //sys  PivotRoot(newroot string, putold string) (err error)
 //pivot_root(newroot *byte, putold *byte) _C_int
 
+// Used by golang.org/x/sys/unix.
+//sys  prlimit(pid int, resource int, newlimit *Rlimit, oldlimit *Rlimit) (err error)
+//prlimit(pid Pid_t, resource _C_int, newlimit *Rlimit, oldlimit *Rlimit) _C_int
+
+func Prlimit(pid int, resource int, newlimit *Rlimit, oldlimit *Rlimit) error {
+   return prlimit(pid, resource, newlimit, oldlimit)
+}
+
 //sys  Removexattr(path string, attr string) (err error)
 //removexattr(path *byte, name *byte) _C_int
 


Re: [libstdc++] use strtold for from_chars even without locale

2023-05-11 Thread Jonathan Wakely via Gcc-patches
On Thu, 11 May 2023 at 17:04, Patrick Palka  wrote:

> On Fri, 5 May 2023, Jonathan Wakely wrote:
>
> >
> >
> > On Fri, 5 May 2023 at 10:43, Florian Weimer wrote:
> >   * Jonathan Wakely via Libstdc:
> >
> >   > We could use strtod for a single-threaded target (i.e.
> >   > !defined(_GLIBCXX_HAS_GTHREADS)) by changing the global locale using
> >   > setlocale, instead of changing the per-thread locale using uselocale.
> >
> >   This is not generally safe because the call to setlocale is still
> >   observable to applications in principle because a previous pointer
> >   returned from setlocale they have stored could be invalidated.
> >
> >
> > Ah yes, good point, thanks. I think that's a non-starter then. I still
> > think using RAII makes the from_chars_impl function easier to read, so
> > here's a version of that patch without the single-threaded conditions.
> >
> > commit 4dc5b8864ec527e699d35880fbc706157113f92b
> > Author: Jonathan Wakely 
> > Date:   Thu May 4 15:22:07 2023
> >
> > libstdc++: Use RAII types in strtod-based std::from_chars implementation
> >
> > This adds auto_locale and auto_ferounding types to use RAII for changing
> > and restoring the locale and floating-point environment when using strtod
> > to implement std::from_chars.
> >
> > The destructors for the RAII objects run slightly later than the
> > previous statements that restored the locale/fenv, but the differences
> > are just some trivial assignments and an isinf call.
> >
> > libstdc++-v3/ChangeLog:
> >
> > * src/c++17/floating_from_chars.cc
> [USE_STRTOD_FOR_FROM_CHARS]
> > (auto_locale, auto_ferounding): New class types.
> > (from_chars_impl): Use auto_locale and auto_ferounding.
> >
> > diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc
> b/libstdc++-v3/src/c++17/floating_from_chars.cc
> > index 78b9d92cdc0..7b3bdf445e3 100644
> > --- a/libstdc++-v3/src/c++17/floating_from_chars.cc
> > +++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
> > @@ -597,6 +597,69 @@ namespace
> >  return buf.c_str();
> >}
> >
> > +  // RAII type to change and restore the locale.
> > +  struct auto_locale
> > +  {
> > +#if _GLIBCXX_HAVE_USELOCALE
> > +// When we have uselocale we can change the current thread's locale.
> > +locale_t loc;
> > +locale_t orig;
>
> It's not a big deal, but we could consider making these members const
> too, like in auto_ferounding.
>

Done for loc, but not for orig (which is currently init'd in the ctor body).


>
> LGTM.  I noticed sprintf_ld from floating_to_chars.cc could benefit from
> auto_ferounding as well.
>

Ah yes. Maybe we should share the class, so we don't have two different
types with internal linkage, and two RTTI definitions etc.

For now I'll just push this patch, and make a note to reuse auto_ferounding
in the other file later.
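
To show what reusing the type elsewhere would amount to, here is a
stand-alone sketch of the same scope-guard idea.  It is not the
libstdc++ code: rounding_guard and rounding_inside_guard are names made
up for this example, and it assumes <cfenv> with FE_TONEAREST is
available.

```cpp
#include <cfenv>

// Scope guard: force round-to-nearest for the lifetime of the object
// and put the caller's rounding mode back on destruction.
struct rounding_guard
{
  const int saved = std::fegetround();

  rounding_guard()
  {
    if (saved != FE_TONEAREST)
      std::fesetround(FE_TONEAREST);
  }

  ~rounding_guard()
  {
    if (saved != FE_TONEAREST)
      std::fesetround(saved);
  }

  rounding_guard(const rounding_guard &) = delete;
  rounding_guard &operator=(const rounding_guard &) = delete;
};

// Returns the rounding mode that code running under the guard sees.
int rounding_inside_guard()
{
  rounding_guard guard;
  return std::fegetround();
}
```

Any function that needs round-to-nearest just declares one of these at
the top, and every return path restores the caller's mode.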

Thanks for the review.



>
> > +
> > +auto_locale()
> > +: loc(::newlocale(LC_ALL_MASK, "C", (locale_t)0))
> > +{
> > +  if (loc)
> > + orig = ::uselocale(loc);
> > +  else
> > + ec = errc{errno};
> > +}
> > +
> > +~auto_locale()
> > +{
> > +  if (loc)
> > + {
> > +   ::uselocale(orig);
> > +   ::freelocale(loc);
> > + }
> > +}
> > +#else
> > +// Otherwise, we can't change the locale and so strtod can't be
> used.
> > +auto_locale() = delete;
> > +#endif
> > +
> > +explicit operator bool() const noexcept { return ec == errc{}; }
> > +
> > +errc ec{};
> > +
> > +auto_locale(const auto_locale&) = delete;
> > +auto_locale& operator=(const auto_locale&) = delete;
> > +  };
> > +
> > +  // RAII type to change and restore the floating-point environment.
> > +  struct auto_ferounding
> > +  {
> > +#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
> > +const int rounding = std::fegetround();
> > +
> > +auto_ferounding()
> > +{
> > +  if (rounding != FE_TONEAREST)
> > + std::fesetround(FE_TONEAREST);
> > +}
> > +
> > +~auto_ferounding()
> > +{
> > +  if (rounding != FE_TONEAREST)
> > + std::fesetround(rounding);
> > +}
> > +#else
> > +auto_ferounding() = default;
> > +#endif
> > +
> > +auto_ferounding(const auto_ferounding&) = delete;
> > +auto_ferounding& operator=(const auto_ferounding&) = delete;
> > +  };
> > +
> >// Convert the NTBS `str` to a floating-point value of type `T`.
> >// If `str` cannot be converted, `value` is unchanged and `0` is
> returned.
> >// Otherwise, let N be the number of characters consumed from `str`.
> > @@ -607,16 +670,11 @@ namespace
> >ptrdiff_t
> >from_chars_impl(const char* str, T& value, errc& ec) noexcept
> >{
> > -if (locale_t loc = ::newlocale(LC_ALL_MASK, "C", (locale_t)0))
> [[likely]]
> > +auto_locale loc;
> > +
> > +if (loc)
> >{
> > - locale_t orig = ::uselocale(loc);
> > -
> > -#if _GLIBCXX_USE_C99_FENV_TR1 && de

RE: [PATCH 01/24] arm: [MVE intrinsics] factorize vaddlvaq

2023-05-11 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Christophe Lyon 
> Sent: Thursday, May 11, 2023 1:19 PM
> To: gcc-patches@gcc.gnu.org; Kyrylo Tkachov ;
> Richard Earnshaw ; Richard Sandiford
> 
> Cc: Christophe Lyon 
> Subject: [PATCH 01/24] arm: [MVE intrinsics] factorize vaddlvaq
> 
> Factorize vaddlvaq builtins so that they use parameterized names.

This series is ok (the changes look quite regular throughout).
Thanks,
Kyrill

> 
> 2022-10-25  Christophe Lyon  
> 
>   gcc/
>   * config/arm/iterators.md (mve_insn): Add vaddlva.
>   * config/arm/mve.md (mve_vaddlvaq_v4si): Rename into ...
>   (@mve_q_v4si): ... this.
>   (mve_vaddlvaq_p_v4si): Rename into ...
>   (@mve_q_p_v4si): ... this.
> ---
>  gcc/config/arm/iterators.md | 2 ++
>  gcc/config/arm/mve.md   | 8 
>  2 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
> index 2f6de937ef7..ff146afd913 100644
> --- a/gcc/config/arm/iterators.md
> +++ b/gcc/config/arm/iterators.md
> @@ -759,6 +759,8 @@ (define_int_attr mve_insn [
>(VABDQ_S "vabd") (VABDQ_U "vabd") (VABDQ_F "vabd")
>(VABSQ_M_F "vabs")
>(VABSQ_M_S "vabs")
> +  (VADDLVAQ_P_S "vaddlva") (VADDLVAQ_P_U "vaddlva")
> +  (VADDLVAQ_S "vaddlva") (VADDLVAQ_U "vaddlva")
>(VADDLVQ_P_S "vaddlv") (VADDLVQ_P_U "vaddlv")
>(VADDLVQ_S "vaddlv") (VADDLVQ_U "vaddlv")
>(VADDQ_M_N_S "vadd") (VADDQ_M_N_U "vadd")
> (VADDQ_M_N_F "vadd")
> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
> index f5cb8ef48ef..b548eced4f5 100644
> --- a/gcc/config/arm/mve.md
> +++ b/gcc/config/arm/mve.md
> @@ -1222,7 +1222,7 @@ (define_insn "@mve_q_f"
>  ;;
>  ;; [vaddlvaq_s vaddlvaq_u])
>  ;;
> -(define_insn "mve_vaddlvaq_v4si"
> +(define_insn "@mve_q_v4si"
>[
> (set (match_operand:DI 0 "s_register_operand" "=r")
>   (unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
> @@ -1230,7 +1230,7 @@ (define_insn "mve_vaddlvaq_v4si"
>VADDLVAQ))
>]
>"TARGET_HAVE_MVE"
> -  "vaddlva.32\t%Q0, %R0, %q2"
> +  ".32\t%Q0, %R0, %q2"
>[(set_attr "type" "mve_move")
>  ])
> 
> @@ -2534,7 +2534,7 @@ (define_insn "@mve_q_m_f"
>  ;;
>  ;; [vaddlvaq_p_s vaddlvaq_p_u])
>  ;;
> -(define_insn "mve_vaddlvaq_p_v4si"
> +(define_insn "@mve_q_p_v4si"
>[
> (set (match_operand:DI 0 "s_register_operand" "=r")
>   (unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
> @@ -2543,7 +2543,7 @@ (define_insn "mve_vaddlvaq_p_v4si"
>VADDLVAQ_P))
>]
>"TARGET_HAVE_MVE"
> -  "vpst\;vaddlvat.32\t%Q0, %R0, %q2"
> +  "vpst\;t.32\t%Q0, %R0, %q2"
>[(set_attr "type" "mve_move")
> (set_attr "length""8")])
>  ;;
> --
> 2.34.1



[PATCH V5] VECT: Add decrement IV support in Loop Vectorizer

2023-05-11 Thread juzhe . zhong
From: Ju-Zhe Zhong 

1. Fix document description according to Jeff and Richard.
2. Add LOOP_VINFO_USING_SELECT_VL_P for single rgroup.
3. Add LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P for SLP multiple rgroup.

gcc/ChangeLog:

* doc/md.texi: Add select_vl pattern.
* internal-fn.def (SELECT_VL): New ifn.
* optabs.def (OPTAB_D): New optab.
* tree-vect-loop-manip.cc (vect_adjust_loop_lens): New function.
(vect_set_loop_controls_by_select_vl): Ditto.
(vect_set_loop_condition_partial_vectors): Add loop control for 
decrement IV.
* tree-vect-loop.cc (vect_get_loop_len): Adjust loop len for SLP.
* tree-vect-stmts.cc (get_select_vl_data_ref_ptr): New function.
(vectorizable_store): Support data reference IV added by outcome of 
SELECT_VL.
(vectorizable_load): Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_SELECT_VL_P): New macro.
(LOOP_VINFO_USING_SLP_ADJUSTED_LEN_P): Ditto.
(vect_get_loop_len): Adjust loop len for SLP.

---
 gcc/doc/md.texi |  36 
 gcc/internal-fn.def |   1 +
 gcc/optabs.def  |   1 +
 gcc/tree-vect-loop-manip.cc | 380 +++-
 gcc/tree-vect-loop.cc   |  29 ++-
 gcc/tree-vect-stmts.cc  |  79 +++-
 gcc/tree-vectorizer.h   |  12 +-
 7 files changed, 524 insertions(+), 14 deletions(-)
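
As a reading aid for the md.texi hunk below, the two documented cases
can be modelled in a few lines.  This is only a sketch of the
documented behaviour, not vectorizer code: select_vl_case1,
select_vl_case2 and stripmine are names invented here, the
vectorization factor is treated as a plain nonzero integer rather than
a poly_int, and ceil (operand1 / 2) is written as (n + 1) / 2.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Case 1: operand0 = MIN (operand1, operand2).
std::size_t select_vl_case1(std::size_t remaining, std::size_t vf)
{
  return std::min(remaining, vf);
}

// Case 2: spread the tail evenly over the last two iterations.
std::size_t select_vl_case2(std::size_t remaining, std::size_t vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2;  // ceil (remaining / 2)
  return vf;
}

// Strip-mine N elements and record the length chosen per iteration.
// The chosen length is also what would advance the data pointers.
// VF must be nonzero, otherwise the loop would not make progress.
template <typename SelectVl>
std::vector<std::size_t> stripmine(std::size_t n, std::size_t vf,
                                   SelectVl select_vl)
{
  std::vector<std::size_t> lens;
  while (n > 0)
    {
      const std::size_t vl = select_vl(n, vf);
      lens.push_back(vl);
      n -= vl;
    }
  return lens;
}
```

With 10 elements and a factor of 4, the first case processes 4, 4, 2
elements and the second 4, 3, 3.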

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8ebce31ba78..a94ffc4456d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,42 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
 @end smallexample
 
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of active elements in a vector to be updated 
+in a loop iteration based on the total number of elements to be updated, 
+the vectorization factor and vector properties of the target.
+operand 1 is the total elements in the vector to be updated.
+operand 2 is the vectorization factor.
+The value of operand 0 is target dependent and flexible in each iteration.
+The operation of this pattern can be:
+
+@smallexample
+Case 1:
+operand0 = MIN (operand1, operand2);
+operand2 can be const_poly_int or poly_int related to vector mode size.
+Some targets like RISC-V have a standalone instruction to get MIN (n, MODE SIZE) so
+that we can reduce the use of general purpose registers.
+
+In this case, only the last iteration of the loop is partial iteration.
+@end smallexample
+
+@smallexample
+Case 2:
+if (operand1 <= operand2)
+  operand0 = operand1;
+else if (operand1 < 2 * operand2)
+  operand0 = ceil (operand1 / 2);
+else
+  operand0 = operand2;
+
+This case will evenly distribute work over the last 2 iterations of a stripmine loop.
+@end smallexample
+
+The output of this pattern is not only used as IV of loop control counter, but also
+is used as the IV of address calculation with multiply/shift operation. This allows
+dynamic adjustment of the number of elements processed each loop iteration.
+
 @cindex @code{check_raw_ptrs@var{m}} instruction pattern
 @item @samp{check_raw_ptrs@var{m}}
 Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae7..6f6fa7d37f9 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
 DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
 
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
 DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b30..b637471b76e 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
 OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
 OPTAB_D (len_load_optab, "len_load_$a")
 OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ff6159e08d5..81334f4f171 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -385,6 +385,353 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, 
rgroup_controls *dest_rgm,
   return false;
 }
 
+/* Try to adjust loop lens for non-SLP multiple-rgroups.
+
+ _36 = MIN_EXPR ;
+
+ First length (MIN (X, VF/N)):
+   loop_len_15 = MIN_EXPR <_36, POLY_INT_CST [2, 2]>;
+
+ Second length (X - MIN (X, 1 * VF/N)):
+   loop_len_16 = _36 - loop_len_15;
+
+ Third length (X - MIN (X, 2 * VF/N)):
+   _38 = MIN_EXPR <_36, POLY_INT_CST [4, 4]>;
+   loop_len_17 = _36 - _38;
+
+ Fourth length (X - MIN (X, 3 * VF/N)):
+   _3

Re: Question on patch -fprofile-partial-training

2023-05-11 Thread Qing Zhao via Gcc-patches


> On May 10, 2023, at 9:15 AM, Jan Hubicka  wrote:
> 
>> Honza,
>>> Main motivation for this was profiling programs that contain specific
>>> code paths for different CPUs (such as graphics library in Firefox or Linux
>>> kernel). In the situation training machine differs from the machine
>>> program is run later, we end up optimizing for size all code paths
>>> except ones taken by the specific CPU.  This patch essentially tells gcc
>>> to consider every non-trained function as built without profile
>>> feedback.
>> Make sense.
>>> 
>>> For Firefox it had important impact on graphics rendering tests back
>>> then since the build machine had AVX while the benchmarking one did not.
>>> Some benchmarks improved several times which is not a surprise if you
>>> consider tight graphics rendering loop optimized for size versus
>>> vectorized one.  
>> 
>> That’s a lot of improvement. So, without -fprofile-partial-training, the PGO 
>> hurt the performance for those cases? 
> 
> Yes, to get code size improvements we assume that the non-trained part
> of code is cold and with -Os we are very aggressive to optimize for
> size.  We now have two-level optimize_for size, so I think we could
> make this more fine grained this stage1.

Okay. I see. 

Thanks a lot for the info.

Another question (which is confusing us very much right now):

When we lower the following parameter from 999 to 950 (in GCC 8):

DEFPARAM(HOT_BB_COUNT_WS_PERMILLE,
 "hot-bb-count-ws-permille",
 "A basic block profile count is considered hot if it contributes to "
 "the given permillage of the entire profiled execution.",
 999, 0, 1000)

The size of the "text.hot" section is 4x SMALLER than with the default value. Is
this expected behavior?
(From my reading of the GCC 8 source code, when this parameter gets smaller,
more basic blocks and functions will be considered HOT by GCC, so the text.hot
section should be larger, not smaller. Am I missing anything here?)
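
One way to sanity-check the expectation above is a toy model of a working-set threshold. This is only a sketch under an assumption: that the parameter selects the hottest blocks which together cover the given permillage of all profiled execution, and that a block is hot when its count is at least the smallest count in that set. The function names and the sample counts are hypothetical and this is not GCC's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy model (hypothetical, not GCC code): return the smallest count that
// still belongs to the set of hottest blocks covering `permille`/1000 of
// the total profiled execution.
std::uint64_t hot_count_threshold(std::vector<std::uint64_t> counts,
                                  unsigned permille)
{
  std::sort(counts.begin(), counts.end(), std::greater<std::uint64_t>());

  std::uint64_t total = 0;
  for (std::uint64_t c : counts)
    total += c;

  std::uint64_t covered = 0;
  for (std::uint64_t c : counts)
    {
      covered += c;
      // covered / total >= permille / 1000, kept in integer arithmetic.
      if (covered * 1000 >= total * permille)
        return c;
    }
  return 0;
}

// Number of blocks whose count reaches the threshold.
std::size_t count_hot_blocks(const std::vector<std::uint64_t>& counts,
                             unsigned permille)
{
  const std::uint64_t threshold = hot_count_threshold(counts, permille);
  return static_cast<std::size_t>(
    std::count_if(counts.begin(), counts.end(),
                  [threshold](std::uint64_t c) { return c >= threshold; }));
}
```

In this model, with counts {900, 60, 30, 9, 1}, a permillage of 999 marks four blocks as hot while 950 marks only two. If GCC 8 behaves like the model, a smaller permillage would give a smaller hot section, not a larger one.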

Thanks a lot for your help.

Qing

> 
> Honza
>> 
>>> The patch has bad effect on code size which in turn
>>> impacts performance too, so I think it makes sense to use
>>> -fprofile-partial-training with bit of care (i.e. only one code where
>>> such scenarios are likely).
>> 
>> Right. 
>>> 
>>> As for backporting, I do not have checkout of GCC 8 right now. It
>>> depends on profile infrastructure that was added in 2017 (so stage1 of
>>> GCC 8), so the patch may backport quite easily.  I am not 100% sure
>>> what shape the infrastructure was in the first version, but I am quite
>>> convinced it had the necessary bits - it was able to make the difference
>>> between 0 profile count and missing profile feedback.
>> 
>> This is good to know, I will try to back port to GCC8 and let them test to 
>> see any good impact.
>> 
>> Qing
>>> 
>>> Honza



Re: [RFC] libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

2023-05-11 Thread Jonathan Wakely via Gcc-patches
On Thu, 11 May 2023 at 16:54, Thomas Rodgers  wrote:

>
>
> On Thu, May 11, 2023 at 5:21 AM Mike Crowe via Libstdc++ <
> libstd...@gcc.gnu.org> wrote:
>
>> On Wednesday 10 May 2023 at 12:31:12 +0100, Jonathan Wakely wrote:
>> > On Wed, 10 May 2023 at 12:20, Jonathan Wakely via Libstdc++ <
>> > libstd...@gcc.gnu.org> wrote:
>> >
>> > > This patch would avoid TSan false positives when using timed waiting
>> > > functions on mutexes and condvars, but as noted below, it changes the
>> > > semantics.
>> > >
>> > > I'm not sure whether we want this workaround in place until tsan gets
>> > > fixed.
>> > >
>> > > On one hand, there's no guarantee that those functions use the right
>> > > clock anyway (and they won't do unless a recent-ish glibc is used). But
>> > > on the other hand, if they normally would use the right clock because
>> > > you have glibc support, it's not ideal for tsan to cause a different
>> > > clock to be used.
>> > >
>> >
>> > But of course, it's not ideal to get false positives from tsan either
>> > (especially when it looks like a libstdc++ bug, as initially reported to
>> > me).
>>
>> I think that this is probably the least-worst option in the short term. As
>> TSan is distributed with GCC this workaround can be removed as soon as its
>> TSan implementation gains the necessary interceptors. I shall look into
>> trying to do that.
>>
>>
> I don't have a strong opinion either way on this, but I think documenting
> the TSAN suppressions is the option most in keeping with the principle of
> Least Astonishment.
>

That assumes anybody reads the docs :-)
Getting TSan errors from the std::lib is somewhat astonishing. The errors
could be avoided, at the risk of subtle timing differences between
tsanitized and un-tsanitized builds ... but won't there be subtle diffs
anyway based on the TSan overhead? Admittedly those will just be fairly
constant overhead, and so immune to system clock adjustments.


Re: [libstdc++] use strtold for from_chars even without locale

2023-05-11 Thread Patrick Palka via Gcc-patches
On Fri, 5 May 2023, Jonathan Wakely wrote:

> 
> 
> On Fri, 5 May 2023 at 10:43, Florian Weimer wrote:
>   * Jonathan Wakely via Libstdc:
> 
>   > We could use strtod for a single-threaded target (i.e.
>   > !defined(_GLIBCXX_HAS_GTHREADS) by changing the global locale using
>   > setlocale, instead of changing the per-thread locale using uselocale.
> 
>   This is not generally safe because the call to setlocale is still
>   observable to applications in principle because a previous pointer
>   returned from setlocale they have stored could be invalidated.
> 
> 
> Ah yes, good point, thanks. I think that's a non-starter then. I still think 
> using RAII makes the from_chars_impl function easier to read, so here's a 
> version of that patch without the single-threaded
> conditions.
> 
> commit 4dc5b8864ec527e699d35880fbc706157113f92b
> Author: Jonathan Wakely 
> Date:   Thu May 4 15:22:07 2023
> 
> libstdc++: Use RAII types in strtod-based std::from_chars implementation
> 
> This adds auto_locale and auto_ferounding types to use RAII for changing
> and restoring the locale and floating-point environment when using strtod
> to implement std::from_chars.
> 
> The destructors for the RAII objects run slightly later than the
> previous statements that restored the locale/fenv, but the differences
> are just some trivial assignments and an isinf call.
> 
> libstdc++-v3/ChangeLog:
> 
> * src/c++17/floating_from_chars.cc [USE_STRTOD_FOR_FROM_CHARS]
> (auto_locale, auto_ferounding): New class types.
> (from_chars_impl): Use auto_locale and auto_ferounding.
> 
> diff --git a/libstdc++-v3/src/c++17/floating_from_chars.cc 
> b/libstdc++-v3/src/c++17/floating_from_chars.cc
> index 78b9d92cdc0..7b3bdf445e3 100644
> --- a/libstdc++-v3/src/c++17/floating_from_chars.cc
> +++ b/libstdc++-v3/src/c++17/floating_from_chars.cc
> @@ -597,6 +597,69 @@ namespace
>  return buf.c_str();
>}
>  
> +  // RAII type to change and restore the locale.
> +  struct auto_locale
> +  {
> +#if _GLIBCXX_HAVE_USELOCALE
> +// When we have uselocale we can change the current thread's locale.
> +locale_t loc;
> +locale_t orig;

It's not a big deal, but we could consider making these members const
too, like in auto_ferounding.

LGTM.  I noticed sprintf_ld from floating_to_chars.cc could benefit from
auto_ferounding as well.
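
For readers who want to try the idea outside libstdc++, here is a minimal standalone sketch of such a rounding-mode guard with a const member, as suggested above. The names rounding_guard and rounding_inside_guard are hypothetical; the sketch assumes the platform defines FE_TONEAREST and is not the code from the patch.

```cpp
#include <cassert>
#include <cfenv>

// Standalone sketch of an RAII rounding-mode guard, modelled on the
// auto_ferounding type in the quoted patch (hypothetical name).
struct rounding_guard
{
  // Rounding mode in effect when the guard was created.
  const int saved = std::fegetround();

  rounding_guard()
  {
    if (saved != FE_TONEAREST)
      std::fesetround(FE_TONEAREST);
  }

  ~rounding_guard()
  {
    // Restore the caller's rounding mode only if it was changed.
    if (saved != FE_TONEAREST)
      std::fesetround(saved);
  }

  rounding_guard(const rounding_guard&) = delete;
  rounding_guard& operator=(const rounding_guard&) = delete;
};

// Returns the rounding mode observed while a guard is alive.
int rounding_inside_guard()
{
  rounding_guard guard;
  return std::fegetround();
}
```

A function such as sprintf_ld could declare one guard at the top of the scope that needs round-to-nearest and rely on the destructor to restore the previous mode on every return path.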

> +
> +auto_locale()
> +: loc(::newlocale(LC_ALL_MASK, "C", (locale_t)0))
> +{
> +  if (loc)
> + orig = ::uselocale(loc);
> +  else
> + ec = errc{errno};
> +}
> +
> +~auto_locale()
> +{
> +  if (loc)
> + {
> +   ::uselocale(orig);
> +   ::freelocale(loc);
> + }
> +}
> +#else
> +// Otherwise, we can't change the locale and so strtod can't be used.
> +auto_locale() = delete;
> +#endif
> +
> +explicit operator bool() const noexcept { return ec == errc{}; }
> +
> +errc ec{};
> +
> +auto_locale(const auto_locale&) = delete;
> +auto_locale& operator=(const auto_locale&) = delete;
> +  };
> +
> +  // RAII type to change and restore the floating-point environment.
> +  struct auto_ferounding
> +  {
> +#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
> +const int rounding = std::fegetround();
> +
> +auto_ferounding()
> +{
> +  if (rounding != FE_TONEAREST)
> + std::fesetround(FE_TONEAREST);
> +}
> +
> +~auto_ferounding()
> +{
> +  if (rounding != FE_TONEAREST)
> + std::fesetround(rounding);
> +}
> +#else
> +auto_ferounding() = default;
> +#endif
> +
> +auto_ferounding(const auto_ferounding&) = delete;
> +auto_ferounding& operator=(const auto_ferounding&) = delete;
> +  };
> +
>// Convert the NTBS `str` to a floating-point value of type `T`.
>// If `str` cannot be converted, `value` is unchanged and `0` is returned.
>// Otherwise, let N be the number of characters consumed from `str`.
> @@ -607,16 +670,11 @@ namespace
>ptrdiff_t
>from_chars_impl(const char* str, T& value, errc& ec) noexcept
>{
> -if (locale_t loc = ::newlocale(LC_ALL_MASK, "C", (locale_t)0)) [[likely]]
> +auto_locale loc;
> +
> +if (loc)
>{
> - locale_t orig = ::uselocale(loc);
> -
> -#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
> - const int rounding = std::fegetround();
> - if (rounding != FE_TONEAREST)
> -   std::fesetround(FE_TONEAREST);
> -#endif
> -
> + auto_ferounding rounding;
>   const int save_errno = errno;
>   errno = 0;
>   char* endptr;
> @@ -647,14 +705,6 @@ namespace
>  #endif
>   const int conv_errno = std::__exchange(errno, save_errno);
>  
> -#if _GLIBCXX_USE_C99_FENV_TR1 && defined(FE_TONEAREST)
> - if (rounding != FE_TONEAREST)
> -   std::fesetround(rounding);
> -#endif
> -
> - ::uselocale(orig);
> - ::freelocale(loc);
> -
>   const ptrdiff_t n = endptr - str;
>   i

Re: [RFC] libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

2023-05-11 Thread Thomas Rodgers via Gcc-patches
On Thu, May 11, 2023 at 5:21 AM Mike Crowe via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> On Wednesday 10 May 2023 at 12:31:12 +0100, Jonathan Wakely wrote:
> > On Wed, 10 May 2023 at 12:20, Jonathan Wakely via Libstdc++ <
> > libstd...@gcc.gnu.org> wrote:
> >
> > > This patch would avoid TSan false positives when using timed waiting
> > > functions on mutexes and condvars, but as noted below, it changes the
> > > semantics.
> > >
> > > I'm not sure whether we want this workaround in place until tsan gets
> > > fixed.
> > >
> > > On one hand, there's no guarantee that those functions use the right
> > > clock anyway (and they won't do unless a recent-ish glibc is used). But
> > > on the other hand, if they normally would use the right clock because
> > > you have glibc support, it's not ideal for tsan to cause a different
> > > clock to be used.
> > >
> >
> > But of course, it's not ideal to get false positives from tsan either
> > (especially when it looks like a libstdc++ bug, as initially reported to
> > me).
>
> I think that this is probably the least-worst option in the short term. As
> TSan is distributed with GCC this workaround can be removed as soon as its
> TSan implementation gains the necessary interceptors. I shall look into
> trying to do that.
>
>
I don't have a strong opinion either way on this, but I think documenting
the TSAN suppressions is the option most in keeping with the principle of
Least Astonishment.


> However, ...
>
> > > diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> > > index 89e7f5f5f45..e2700b05ec3 100644
> > > --- a/libstdc++-v3/acinclude.m4
> > > +++ b/libstdc++-v3/acinclude.m4
> > > @@ -4284,7 +4284,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT],
> [
> > >[glibcxx_cv_PTHREAD_COND_CLOCKWAIT=no])
> > >])
> > >if test $glibcxx_cv_PTHREAD_COND_CLOCKWAIT = yes; then
> > > -AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, 1, [Define if
> > > pthread_cond_clockwait is available in .])
> > > +AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, (_GLIBCXX_TSAN==0),
> > > [Define if pthread_cond_clockwait is available in .])
> > >fi
>
> TSan does appear to have an interceptor for pthread_cond_clockwait, even if
> it lacks the others. Does this mean that this part is unnecessary?
>
> See: https://github.com/google/sanitizers/issues/1259
>
> Thanks.
>
> Mike.
>
>


[wwwdocs] Document libstdc++ freestanding changes in gcc-13

2023-05-11 Thread Jonathan Wakely via Gcc-patches
Pushed to wwwdocs (better late than never).

-- >8 --

---
 htdocs/gcc-13/changes.html | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/htdocs/gcc-13/changes.html b/htdocs/gcc-13/changes.html
index bd022ed2..39414e18 100644
--- a/htdocs/gcc-13/changes.html
+++ b/htdocs/gcc-13/changes.html
@@ -412,6 +412,20 @@ You may also want to check out our
   Support for the 
   header from v2 of the Concurrency Technical Specification.
   
+  Support for many previously unavailable features in freestanding mode,
+  thanks to Arsen Arsenović. For example, std::tuple is
+  now available for freestanding compilation. The freestanding subset
+  contains all the components made freestanding by
+  P1642 (https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1642r11.html),
+  but libstdc++ adds more components to the freestanding subset,
+  such as std::array and std::string_view.
+  Additionally, libstdc++ now respects the -ffreestanding
+  compiler option and so it is not necessary to build a separate
+  freestanding installation of libstdc++.  Compiling with
+  -ffreestanding will restrict the available features to
+  the freestanding subset, even if libstdc++ was built as a full, hosted
+  implementation.
+  
 
 
 
-- 
2.40.1



[PATCH1/2] PR gcc/98350:Add a param to control the length of the chain with FMA in reassoc pass

2023-05-11 Thread Cui, Lili via Gcc-patches
From: Lili Cui 

Add a param for chains with FMA in the reassoc pass to make it friendlier to
the later fma pass. First detect whether the chain is able to generate more
than 2 FMAs; if it is, and param_reassoc_max_chain_length_with_fma is enabled,
rearrange the ops so that they can be combined into more FMAs. When the chain
length exceeds param_reassoc_max_chain_length_with_fma, build parallel chains
according to the given association width and try to keep as many FMA
opportunities as possible.

TEST1:

float
foo (float a, float b, float c, float d, float *e)
{
   return  *e  + a * b + c * d ;
}

For -Ofast -march=icelake-server  GCC generates:
vmulss  %xmm3, %xmm2, %xmm2
vfmadd132ss %xmm1, %xmm2, %xmm0
vaddss  (%rdi), %xmm0, %xmm0
ret

with "--param=reassoc-max-chain-length-with-fma=3" GCC generates:
vfmadd213ss   (%rdi), %xmm1, %xmm0
vfmadd231ss   %xmm2, %xmm3, %xmm0
ret
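
The two instruction sequences above correspond to two ways of associating the same sum. A minimal sketch using std::fma shows the shapes; the function names are hypothetical, and this only illustrates the dataflow, not what the reassoc pass emits internally.

```cpp
#include <cassert>
#include <cmath>

// Shape of the first sequence: one multiply, one FMA, one add.
float chain_one_fma(float a, float b, float c, float d, const float* e)
{
  float t = c * d;              // vmulss
  float r = std::fma(a, b, t);  // vfmadd132ss: a * b + t
  return r + *e;                // vaddss
}

// Shape of the second sequence: two dependent FMAs, no separate
// multiply or add.
float chain_two_fma(float a, float b, float c, float d, const float* e)
{
  float r = std::fma(a, b, *e); // vfmadd213ss: a * b + *e
  return std::fma(c, d, r);     // vfmadd231ss: c * d + r
}
```

For inputs where every intermediate value is exactly representable the two functions agree. In general they may round differently, which is why this kind of reassociation is tied to -Ofast style options.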

gcc/ChangeLog:

PR gcc/98350
* params.opt (reassoc-max-chain-length-with-fma): New param.
* tree-ssa-reassoc.cc
(rewrite_expr_tree_parallel_for_fma): New.
(rank_ops_for_fma): Ditto.
(reassociate_bb): Handle new function.

gcc/testsuite/ChangeLog:

PR gcc/98350
* gcc.dg/pr98350-1.c: New test.
* gcc.dg/pr98350-2.c: Ditto.
---
 gcc/params.opt   |   4 +
 gcc/testsuite/gcc.dg/pr98350-1.c |  31 +
 gcc/testsuite/gcc.dg/pr98350-2.c |  17 +++
 gcc/tree-ssa-reassoc.cc  | 226 ---
 4 files changed, 262 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr98350-1.c
 create mode 100644 gcc/testsuite/gcc.dg/pr98350-2.c

diff --git a/gcc/params.opt b/gcc/params.opt
index 823cdb2ff85..f7c719afe64 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -1182,4 +1182,8 @@ The maximum factor which the loop vectorizer applies to 
the cost of statements i
 Common Joined UInteger Var(param_vect_induction_float) Init(1) IntegerRange(0, 
1) Param Optimization
 Enable loop vectorization of floating point inductions.
 
+-param=reassoc-max-chain-length-with-fma=
+Common Joined UInteger Var(param_reassoc_max_chain_length_with_fma) Init(1) IntegerRange(1, 65536) Param Optimization
+The maximum chain length with fma considered in reassociation pass.
+
 ; This comment is to ensure we retain the blank line above.
diff --git a/gcc/testsuite/gcc.dg/pr98350-1.c b/gcc/testsuite/gcc.dg/pr98350-1.c
new file mode 100644
index 000..265e0e57a49
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98350-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mfpmath=sse -mfma --param=reassoc-max-chain-length-with-fma=8 -Wno-attributes " } */
+
+/* Test that the compiler properly optimizes multiply and add 
+   to generate more FMA instructions.  */
+#define N 1024
+double a[N];
+double b[N];
+double c[N];
+double d[N];
+double e[N];
+double f[N];
+double g[N];
+double h[N];
+double j[N];
+double k[N];
+double l[N];
+double m[N];
+double o[N];
+double p[N];
+
+
+void
+foo (void)
+{
+  for (int i = 0; i < N; i++)
+  {
+    a[i] += b[i] * c[i] + d[i] * e[i] + f[i] * g[i] + h[i] * j[i] + k[i] * l[i] + m[i]* o[i] + p[i];
+  }
+}
+/* { dg-final { scan-assembler-times "vfm" 6  } } */
diff --git a/gcc/testsuite/gcc.dg/pr98350-2.c b/gcc/testsuite/gcc.dg/pr98350-2.c
new file mode 100644
index 000..246025d43b8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr98350-2.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -mfpmath=sse -mfma --param=reassoc-max-chain-length-with-fma=6 -Wno-attributes " } */
+
+/* Test that the compiler properly build parallel chains according to given
+   association width and try to keep FMA opportunity as much as possible.  */
+#define N 33
+double a[N];
+
+void
+foo (void)
+{
+  a[32] = a[0] *a[1] + a[2] * a[3] + a[4] * a[5] + a[6] * a[7] + a[8] * a[9]
+    + a[10] * a[11] + a[12] * a[13] + a[14] * a[15] + a[16] * a[17]
+    + a[18] * a[19] + a[20] * a[21] + a[22] * a[23] + a[24] + a[25]
+    + a[26] + a[27] + a[28] + a[29] + a[30] + a[31];
+}
+/* { dg-final { scan-assembler-times "vfm" 12  } } */
diff --git a/gcc/tree-ssa-reassoc.cc b/gcc/tree-ssa-reassoc.cc
index 067a3f07f7e..f8c70ccadab 100644
--- a/gcc/tree-ssa-reassoc.cc
+++ b/gcc/tree-ssa-reassoc.cc
@@ -54,6 +54,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-ssa-reassoc.h"
 #include "tree-ssa-math-opts.h"
 #include "gimple-range.h"
+#include "internal-fn.h"
 
 /*  This is a simple global reassociation pass.  It is, in part, based
 on the LLVM pass of the same name (They do some things more/less
@@ -5468,6 +5469,114 @@ get_reassociation_width (int ops_num, enum tree_code 
opc,
   return width;
 }
 
+/* Rewrite statements with dependency chain with regard to the chance to
+   generate FMA. When the dependency chain length exceeds
+   param_reassoc_max_chain_length_with_fma, build parallel chains according to
+   given association width and 

RE: [PATCH 1/2] PR gcc/98350:Add a param to control the length of the chain with FMA in reassoc pass

2023-05-11 Thread Cui, Lili via Gcc-patches
> -Original Message-
> From: Richard Biener 
> Sent: Thursday, May 11, 2023 6:53 PM
> To: Cui, Lili 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH 1/2] PR gcc/98350:Add a param to control the length of
> the chain with FMA in reassoc pass

Hi Richard,
Thanks for helping to review the patch.

> 
> As you are not changing the number of ops you should be able to use
> quick_push here and below.  You should be able to do
> 
>  ops->splice (ops_mult);
>  ops->splice (ops_others);
> 
> as well.
> 
Done.

> > + /* When enabling param_reassoc_max_chain_length_with_fma to
> > +    keep the chain with fma, rank_ops_for_fma will detect if
> > +    the chain has fmas and if so it will rearrange the ops.  */
> > + if (param_reassoc_max_chain_length_with_fma > 1
> > + && direct_internal_fn_supported_p (IFN_FMA,
> > +TREE_TYPE (lhs),
> > +opt_type)
> > + && (rhs_code == PLUS_EXPR || rhs_code == MINUS_EXPR))
> > +   {
> > + keep_fma_chain = rank_ops_for_fma(&ops);
> > +   }
> > +
> > + int len = ops.length ();
> >   /* Only rewrite the expression tree to parallel in the
> >  last reassoc pass to avoid useless work back-and-forth
> >  with initial linearization.  */
> 
> We are doing the parallel rewrite only in the last reassoc pass; I think it
> makes sense to do the same for reassoc-for-fma.

I rearranged the order of the ops in reassoc1 without breaking the chain, and
it produced more vectorization in the vectorizer pass (seen in benchmark 503).
So I rewrite the SSA tree and keep the chain with the function
"rewrite_expr_tree" in reassoc1, and break the chain with
"rewrite_expr_tree_parallel_for_fma" in reassoc2.

> 
> Why do the existing expr rewrites not work after re-sorting the ops?

For the case https://godbolt.org/z/3x9PWE9Kb, we put "j" first:

j + l * m + a * b + c * d + e * f + g * h;

GCC trunk: width = 2, ops_num = 6, old function "rewrite_expr_tree_parallel"
generates 3 FMAs.
---
  _1 = l_10(D) * m_11(D);
  _3 = a_13(D) * b_14(D);
  _4 = j_12(D) + _3;    --> Here is one FMA.
  _5 = c_15(D) * d_16(D);
  _8 = _1 + _5;         --> Here is one FMA and lost one.
  _7 = e_17(D) * f_18(D);
  _9 = g_19(D) * h_20(D);
  _2 = _7 + _9;         --> Here is one FMA and lost one.
  _6 = _2 + _4;
  _21 = _6 + _8;
  # VUSE <.MEM_22(D)>
  return _21;
--
width = 2, ops_num = 6, new function "rewrite_expr_tree_parallel_for_fma"
generates 4 FMAs.
--
  _1 = a_10(D) * b_11(D);
  _3 = c_13(D) * d_14(D);
  _5 = e_15(D) * f_16(D);
  _7 = g_17(D) * h_18(D);
  _4 = _5 + _7;         --> Here is one FMA and lost one.
  _8 = _4 + _1;         --> Here is one FMA.
  _9 = l_19(D) * m_20(D);
  _2 = _9 + j_12(D);    --> Here is one FMA.
  _6 = _2 + _3;         --> Here is one FMA.
  _21 = _8 + _6;
  return _21;



> 
> >   if (!reassoc_insert_powi_p
> > - && ops.length () > 3
> > + && len > 3
> > + && (!keep_fma_chain
> > + || (keep_fma_chain
> > + && len >
> > + param_reassoc_max_chain_length_with_fma))
> 
> in the case len < param_reassoc_max_chain_length_with_fma we have the
> chain re-sorted but fall through to non-parallel rewrite.  I wonder if we do
> not want to instead adjust the reassociation width?  I'd say it depends on the
> number of mult cases in the chain (sth the re-sorting could have computed).
> Why do we have two completely independent --params here?  Can you give
> an example --param value combination that makes "sense" and show how it
> is beneficial?

For this small case https://godbolt.org/z/Pxczrre8P
a * b + c * d + e * f  + j

GCC trunk: ops_num = 4, targetm.sched.reassociation_width is 4 (scalar fp cost
is 4). Calculated: width = 2. We can get 2 FMAs.
--
  _1 = a_6(D) * b_7(D);
  _2 = c_8(D) * d_9(D);
  _5 = _1 + _2;
  _4 = e_10(D) * f_11(D);
  _3 = _4 + j_12(D);
  _13 = _3 + _5;

  _2 = c_8(D) * d_9(D);
  _5 = .FMA (a_6(D), b_7(D), _2);
  _3 = .FMA (e_10(D), f_11(D), j_12(D));
  _13 = _3 + _5;

New patch: If just rearrange ops and fall through to parallel rewrite to break 
the chain with width = 2.

-

[PATCH] Improve simple_dce for phis that only used in itself

2023-05-11 Thread Andrew Pinski via Gcc-patches
While I was looking at differences before and after
r14-569-g21e2ef2dc25de3, I noticed that one phi node was
not being removed.
For an example, while compiling combine.cc, in expand_field_assignment,
we would remove `# pos_51 = PHI `
but we no longer do, since pos_51 has more than zero uses;
in this case, though, its only use is in its own defining phi.
This patch improves simple_dce_from_worklist to detect that
case, so we are able to remove this phi statement again.
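
The check described above can be sketched on a toy representation. This is only an illustration: the stmt struct and the function name are hypothetical stand-ins, not GCC's gimple types or the code in the patch.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for a statement; not GCC's gimple type.
struct stmt
{
  bool is_phi = false;
  bool is_debug = false;
};

// Returns true if a name defined by `def_stmt` can be treated as dead:
// either it has no non-debug uses at all, or it is defined by a phi and
// every non-debug use is that same phi.
bool removable_ignoring_self_uses(const stmt* def_stmt,
                                  const std::vector<const stmt*>& uses)
{
  bool any_real_use = false;
  bool only_self = true;

  for (const stmt* use : uses)
    {
      // Debug statements never keep a definition alive.
      if (use->is_debug)
        continue;
      any_real_use = true;
      if (use != def_stmt)
        only_self = false;
    }

  if (!any_real_use)
    return true;
  return def_stmt->is_phi && only_self;
}
```

A phi whose result feeds only back into itself is removable under this rule, while a single use in any other statement keeps it alive.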

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-dce.cc (simple_dce_from_worklist): For SSA names
defined by a phi node with one or more uses, allow removal when
the only uses are in that same defining statement.
---
 gcc/tree-ssa-dce.cc | 31 +--
 1 file changed, 29 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-ssa-dce.cc b/gcc/tree-ssa-dce.cc
index 6554b5db03e..045c64a9c02 100644
--- a/gcc/tree-ssa-dce.cc
+++ b/gcc/tree-ssa-dce.cc
@@ -2107,9 +2107,36 @@ simple_dce_from_worklist (bitmap worklist, bitmap 
need_eh_cleanup)
   unsigned i = bitmap_clear_first_set_bit (worklist);
 
   tree def = ssa_name (i);
-  /* Removed by somebody else or still in use.  */
+  /* Removed by somebody else or still in use.
+Note use in itself for a phi node is not counted as still in use.  */
   if (! def || ! has_zero_uses (def))
-   continue;
+   {
+
+ if (!def)
+   continue;
+
+ gimple *def_stmt = SSA_NAME_DEF_STMT (def);
+ if (gimple_code (def_stmt) != GIMPLE_PHI)
+   continue;
+
+ gimple *use_stmt;
+ imm_use_iterator use_iter;
+ bool canremove = true;
+
+ FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, def)
+   {
+ /* Ignore debug statements. */
+ if (is_gimple_debug (use_stmt))
+   continue;
+ if (use_stmt != def_stmt)
+   {
+ canremove = false;
+ break;
+   }
+   }
+ if (!canremove)
+   continue;
+   }
 
   gimple *t = SSA_NAME_DEF_STMT (def);
   if (gimple_has_side_effects (t))
-- 
2.31.1



Re: [committed] Convert xstormy16 to LRA

2023-05-11 Thread Hans-Peter Nilsson via Gcc-patches
> From: "Roger Sayle" 
> Date: Tue, 2 May 2023 00:37:14 +0100

> Jeff Law wrote:
> > This patch converts the xstormy16 patch to LRA.  It introduces a code 
> > quality regression in the shiftsi testcase, but it also fixes numerous 
> > aborts/errors.  IMHO it's a good tradeoff.
> 
> I've investigated the shiftsi regression on xstormy16 and the underlying
> cause appears to be an interaction between lower-subreg's "subreg3" pass
> and the new LRA.  Previously, reload was not fazed by the "clobbers" that
> are introduced by the decompose_multiword_subregs function, but they appear
> to interfere with LRA's register assignments.
> 
> combine's make_extra_copies introduces a new pseudo-to-pseudo move,
> but when subreg3 inserts a naked clobber between the original and the
> new move, LRA is unable to recombine these pseudos back to the same allocno.
> 
> The shiftsi.cc regression on xstormy16 is fixed by adding
> -fno-split-wide-types.  In fact, if all the regression tests pass, I'd
> suggest that flag_split_wide_types = false should be the default on
> xstormy16 now that we've moved to LRA.  And if this works for xstormy16,
> it might be useful to other targets for the LRA transition; it's a
> difference in behaviour between reload and LRA that could potentially
> affect multiple targets.
> 
> For reference, xstormy16 has a post-reload define_insn_and_split for movsi
> (i.e. a multi-word move).  If this insn was split during split1 (i.e. before
> subreg3) there wouldn't be a problem (no clobber), but alas the target's
> xstormy16_split_move function has several asserts insisting this only get
> called when reload_completed.
> 
> I hope this is useful.
> Cheers,
> Roger

Yes, very interesting.  Thank you for sharing this.  I've
seen regressions with LRA for CRIS too, for
"double-register-sized" types, which for CRIS, a 32-bit
target, translates to 64-bit types (DFmode and DImode), and
where LRA does a much worse job than reload; spills a lot
more often to stack, even after trying every
register-allocation-related hook I found (and also an LRA
patch which helped only by a fraction, but regressed results
on x86_64-linux, so let's quickly forget it again).

No fix or nicely stated bug entry yet, but at least a
different observation:

Coremark for cris-elf built with -O2 -march=v10, when going
from reload to LRA is slightly faster but a bit bigger (for
example before/after Jeff's r14-383-gfaf8bea79b6256, 5090593
to 5090567 cycles and 48887 to 48901 bytes), a relative
observation which has not changed much since February when I
started working on an LRA transition for CRIS.

But, the case for code with heavy use of "double-register-
sized" types is much worse; up to several percent slower.
My favorite sharable example is
gcc/testsuite/gcc.c-torture/execute/arith-rand-ll.c
(with a few unimportant local tweaks not suitable for
upstreaming but which I'm happy to share with anyone asking)
which around that commit goes from 1295021 to 1317531 cycles
(101.74%) and one percent larger; 4008 to 4048 bytes.

Your suggestion to default to -fno-split-wide-types seemed
too good to be true, and though worth a try, unfortunately
it was.  I'm seeing *horrible* regressions for
double-register codes with the patch below on top of LRA.
Coremark numbers suffer too (different baseline here than
above; closer to today's sources) from 5078989 to 5081968
cycles and from 48537 to 50145 bytes.

But, arith-rand-ll suffers much more: from 1317530 to
2182080 cycles (yes, 165.62%) and from 4044 to 4174 bytes.
(With reload, it's bad too, but "only" regressing 143.67% by
speed.)

Next, I'll turn around completely, and try defaulting to
-fsplit-wide-types-early, which sounds more promising. :)
I don't like throwing defaults around randomly, but trying
out a promising idea this way is easy.

So because of the numbers above, this will never be
committed, just passed for reference.  I believe this is the
correct way to default to -fno-split-wide-types:

-- >8 --
[PATCH] CRIS: Default to -fno-split-wide-types

* common/config/cris/cris-common.cc (cris_option_optimization_table):
New.  Default to -fno-split-wide-types.
(TARGET_OPTION_OPTIMIZATION_TABLE): Define.
---
 gcc/common/config/cris/cris-common.cc | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/gcc/common/config/cris/cris-common.cc 
b/gcc/common/config/cris/cris-common.cc
index b08d6014102d..cf00c1414651 100644
--- a/gcc/common/config/cris/cris-common.cc
+++ b/gcc/common/config/cris/cris-common.cc
@@ -26,6 +26,14 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "flags.h"
 
+/* Implement TARGET_OPTION_OPTIMIZATION_TABLE.  */
+
+static const struct default_options cris_option_optimization_table[] =
+  {
+{ OPT_LEVELS_1_PLUS, OPT_fsplit_wide_types, NULL, 0 },
+{ OPT_LEVELS_NONE, 0, NULL, 0 }
+  };
+
 /* TARGET_HANDLE_OPTION worker.  We just store the values into local
variables here.  Checks for correct semantics

Re: [PATCH] mklog.py: Add --commit option.

2023-05-11 Thread Jeff Law via Gcc-patches




On 5/11/23 02:29, Robin Dapp via Gcc-patches wrote:

Hi,

this patch allows mklog.py to be called with a commit hash directly.
So, instead of

  git show  | git gcc-mklog

  git gcc-mklog --commit 

can be used.

When no  is given but --commit is specified, HEAD is used
instead.  The behavior without --commit is the same as before.

Is that useful/OK?  I find that option a bit easier to work with.

Regards
  Robin

contrib/ChangeLog:

* mklog.py:  Add optional --commit  argument.

Seems reasonable to me and probably works better with the flows some
people are using :-)


Jeff


[pushed] c++: Add testcase for already fixed PR [PR103807]

2023-05-11 Thread Patrick Palka via Gcc-patches
We accept this testcase since r13-806-g221acd67ca50f8.

PR c++/103807

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/lambda-targ1.C: New test.
---
 gcc/testsuite/g++.dg/cpp2a/lambda-targ1.C | 11 +++
 1 file changed, 11 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/lambda-targ1.C

diff --git a/gcc/testsuite/g++.dg/cpp2a/lambda-targ1.C 
b/gcc/testsuite/g++.dg/cpp2a/lambda-targ1.C
new file mode 100644
index 000..07fa6f9bc19
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/lambda-targ1.C
@@ -0,0 +1,11 @@
+// PR c++/103807
+// { dg-do compile { target c++20 } }
+
+template
+struct A { };
+
+A x;
+
+int main() {
+  A y;
+}
-- 
2.40.1.552.g91428f078b



[x86_64 PATCH] PR middle-end/109766: Prevent cprop_hardreg bloating code with -Os.

2023-05-11 Thread Roger Sayle

PR 109766 is an interesting case of large code being generated on x86_64,
caused by an interaction/conflict between register allocation and hardreg
cprop, that's tricky to fix/resolve within the middle-end.

The task/challenge is to push a DImode value in an SSE register on to
the stack, when optimizing for size.  GCC's register allocator makes
the optimal choice to move the SSE register to a GPR, and then use push.
So after reload we have:

(insn 46 3 4 2 (set (reg:DF 1 dx [101])
(reg:DF 21 xmm1 [ D1 ])) "pr109766.c":15:74 151 {*movdf_internal}
 (nil))
(insn 28 27 29 2 (set (mem:DF (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A64])
(reg:DF 1 dx [101])) "pr109766.c":16:5 142 {*pushdf}
 (expr_list:REG_ARGS_SIZE (const_int 56 [0x38])
(nil)))

which corresponds to the short 6 byte sequence:
66 48 0f 7e ca  movq   %xmm1,%rdx  [5 bytes]
52              push   %rdx        [1 byte]


The problem is that several passes later, after pro_and_epilogue has
determined that the function doesn't need a stack frame, the
hard register cprop pass sees the above two instructions, including
the initial register to register move, and decides to "simplify" them
as:

(insn 68 67 69 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A64])
(reg:DI 21 xmm1 [101])) "pr109766.c":16:5 62 {*pushdi2_rex64}
 (expr_list:REG_ARGS_SIZE (const_int 56 [0x38])
(nil)))

but as x86_64 doesn't directly support push from SSE registers, the
above is split during split3 into:

(insn 92 91 93 2 (set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int -8 [0xfff8]))) "pr109766.c":16:5 247
{*leadi}
 (expr_list:REG_ARGS_SIZE (const_int 56 [0x38])
(nil)))
(insn 93 92 94 2 (set (mem:DI (reg/f:DI 7 sp) [0  S8 A64])
(reg:DI 21 xmm1 [101])) "pr109766.c":16:5 88 {*movdi_internal}
 (nil))

which corresponds to the bigger 10 byte sequence:

48 8d 64 24 f8  lea-0x8(%rsp),%rsp  [5 bytes]
66 0f d6 0c 24  movq   %xmm1,(%rsp) [5 bytes]


Clearly the cprop_hardreg substitution is questionable with -Os, but how
to prevent it is a challenge.  One (labor intensive) approach might be
to have regcprop.cc query the target's rtx_costs before performing
this type of substitution, which only works if the backend is
sufficiently parameterized.  Unfortunately, i386 like many targets
defines the rtx_cost of (set (dst) (src)) to be rtx_cost(dst) +
rtx_cost(src), which misses the subtlety of pushing an SSE register
to the stack.

An alternate solution, which can be implemented entirely in the
backend, is to prevent *pushdi2_rex64 being recognized (by
cprop_hardreg) with an SSE hard register operand after reload
when optimizing for size.
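
The effect of such an insn condition can be sketched as a plain predicate.
This is only an illustration, not GCC code: push_pattern_allowed and its
boolean parameters are hypothetical stand-ins for TARGET_64BIT,
reload_completed, SSE_REG_P (operands[1]) and optimize_insn_for_size_p ().
The byte counts are the ones quoted earlier in this message.

```cpp
#include <cassert>

// Hypothetical model of the condition proposed for *pushdi2_rex64.
// Each parameter stands in for the GCC macro or function named beside it.
constexpr bool
push_pattern_allowed (bool target_64bit,       // TARGET_64BIT
                      bool reload_completed,   // reload_completed
                      bool src_is_sse_reg,     // SSE_REG_P (operands[1])
                      bool optimize_for_size)  // optimize_insn_for_size_p ()
{
  // The pattern is rejected only when all three hold: we are after reload,
  // the source is an SSE register, and the insn is optimized for size.
  return target_64bit
         && (!reload_completed || !src_is_sse_reg || !optimize_for_size);
}

// Sizes of the two sequences discussed above, in bytes.
constexpr int movq_then_push_bytes = 5 + 1;  // movq %xmm1,%rdx; push %rdx
constexpr int lea_then_movq_bytes = 5 + 5;   // lea -0x8(%rsp),%rsp; movq %xmm1,(%rsp)

static_assert (lea_then_movq_bytes - movq_then_push_bytes == 4,
               "the split sequence is 4 bytes larger");
```

Before reload the pattern still matches, so register allocation is free to
choose the SSE alternative; only the late substitution by cprop_hardreg is
refused when optimizing for size.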

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?


2023-05-11  Roger Sayle  

gcc/ChangeLog
PR middle-end/109766
* config/i386/i386.md (*pushdi2_rex64): Disallow SSE registers
after reload when optimizing for size.
(*pushsi2_rex64): Likewise.
(*pushsi2): Likewise.

gcc/testsuite/ChangeLog
PR middle-end/109766
* gcc.target/i386/pr109766.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5a064f3..bfa5378 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2036,7 +2036,10 @@
 (define_insn "*pushdi2_rex64"
   [(set (match_operand:DI 0 "push_operand" "=<,<,!<")
(match_operand:DI 1 "general_no_elim_operand" "re*m,*v,n"))]
-  "TARGET_64BIT"
+  "TARGET_64BIT
+   && (!reload_completed
+   || !SSE_REG_P (operands[1])
+   || !optimize_insn_for_size_p ())"
   "@
push{q}\t%1
#
@@ -2079,7 +2082,10 @@
 (define_insn "*pushsi2_rex64"
   [(set (match_operand:SI 0 "push_operand" "=X,X")
(match_operand:SI 1 "nonmemory_no_elim_operand" "re,*v"))]
-  "TARGET_64BIT"
+  "TARGET_64BIT
+   && (!reload_completed
+   || !SSE_REG_P (operands[1])
+   || !optimize_insn_for_size_p ())"
   "@
push{q}\t%q1
#"
@@ -2089,7 +2095,10 @@
 (define_insn "*pushsi2"
   [(set (match_operand:SI 0 "push_operand" "=<,<")
(match_operand:SI 1 "general_no_elim_operand" "ri*m,*v"))]
-  "!TARGET_64BIT"
+  "!TARGET_64BIT
+   && (!reload_completed
+   || !SSE_REG_P (operands[1])
+   || !optimize_insn_for_size_p ())"
   "@
push{l}\t%1
#"
diff --git a/gcc/testsuite/gcc.target/i386/pr109766.c 
b/gcc/testsuite/gcc.target/i386/pr109766.c
new file mode 100644
index 000..e29f615
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109766.c
@@ -0,0 +1,21 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-Os" } */
+#define $expr(...) (__extension__({__VA_ARGS__;}))
+#define $regF0 $expr(register double x __asm("xmm0"); x)
+#define $regF1 $expr(register double x __asm("xmm1"); x)
+#define $regF2 $expr(register doubl

Re: [PATCH v2] RISC-V: Split off shift patterns for autovectorization.

2023-05-11 Thread Jeff Law via Gcc-patches




On 5/11/23 04:33, Robin Dapp wrote:

"csr_operand" does seem wrong, though, as that just accepts constants.
Maybe "arith_operand" is the way to go?  I haven't looked at the
V immediates though.


I was pondering changing the shift-count operand to QImode everywhere
but that indeed does not help code generation across the board.  It can
still work but might require extra patterns here and there.

Yea.  It's a GCC wart and there hasn't ever been a clear best direction 
on the mode for the shift count.  If you use QImode, as you note you 
often end up having to add various patterns to avoid useless conversions 
and such.


I suspect QImode isn't ideal on a target like RV where we don't really 
have QImode operations.  So all we do is force the introduction of 
subregs all over the place to force the operand in to QImode.  It's 
something I'd like to explore, but would obviously require a fair amount 
of benchmarking to be able to confidently say which is better.


Jeff


Re: [PATCH v2] RISC-V: Allow vector constants in riscv_const_insns.

2023-05-11 Thread Kito Cheng via Gcc-patches
LGTM, thanks :)

On Thu, May 11, 2023 at 8:47 PM Robin Dapp  wrote:
>
> > OK, you can go ahead commit patch. I am gonna send another patch to
> > fix this.
> I agree that we should handle more constants but I'd still rather go
> ahead now and fix things later.  The patch is more about the test
> rather than the actual change anyway.
>
> Jeff already ack'ed v1, maybe waiting for Kito's OK to push still.
>
> (Minor) changes from v1:
>  - Rebase vs Juzhe's patch
>  - Change test format to match binops.
>
>
> This patch adds various vector constants to riscv_const_insns so that
> they are properly recognized as immediate operands.  This in turn allows
> vmv.v.i instructions to be emitted via autovectorization.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.cc (riscv_const_insns): Add permissible
> vector constants.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c: New test.
> * gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: New test.
> * gcc.target/riscv/rvv/autovec/vmv-imm-template.h: New test.
> * gcc.target/riscv/rvv/autovec/vmv-imm-run.c: New test.
> ---
>  gcc/config/riscv/riscv.cc |  7 +++
>  .../riscv/rvv/autovec/vmv-imm-run.c   | 57 +++
>  .../riscv/rvv/autovec/vmv-imm-rv32.c  |  6 ++
>  .../riscv/rvv/autovec/vmv-imm-rv64.c  |  6 ++
>  .../riscv/rvv/autovec/vmv-imm-template.h  | 54 ++
>  5 files changed, 130 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 8f032250b0f..de578b5b899 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -1291,6 +1291,13 @@ riscv_const_insns (rtx x)
> return 1;
>   }
>   }
> +   /* Constants from -16 to 15 can be loaded with vmv.v.i.
> +  The Wc0, Wc1 constraints are already covered by the
> +  vi constraint so we do not need to check them here
> +  separately.  */
> +   else if (TARGET_VECTOR && satisfies_constraint_vi (x))
> + return 1;
> +
> /* TODO: We may support more const vector in the future.  */
> return x == CONST0_RTX (GET_MODE (x)) ? 1 : 0;
>}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
> new file mode 100644
> index 000..309a296b686
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
> @@ -0,0 +1,57 @@
> +/* { dg-do run { target { riscv_vector } } } */
> +/* { dg-additional-options "-std=c99 -fno-vect-cost-model 
> --param=riscv-autovec-preference=scalable -fno-builtin" } */
> +
> +#include "vmv-imm-template.h"
> +
> +#include 
> +#include 
> +
> +#define SZ 512
> +
> +#define TEST_POS(TYPE,VAL) \
> +  TYPE a##TYPE##VAL[SZ];   \
> +  vmv_##VAL (a##TYPE##VAL, SZ);\
> +  for (int i = 0; i < SZ; i++) \
> +assert (a##TYPE##VAL[i] == VAL);
> +
> +#define TEST_NEG(TYPE,VAL) \
> +  TYPE am##TYPE##VAL[SZ];  \
> +  vmv_m##VAL (am##TYPE##VAL, SZ);  \
> +  for (int i = 0; i < SZ; i++) \
> +assert (am##TYPE##VAL[i] == -VAL);
> +
> +int main ()
> +{
> +  TEST_NEG(int8_t, 16)
> +  TEST_NEG(int8_t, 15)
> +  TEST_NEG(int8_t, 14)
> +  TEST_NEG(int8_t, 13)
> +  TEST_NEG(int16_t, 12)
> +  TEST_NEG(int16_t, 11)
> +  TEST_NEG(int16_t, 10)
> +  TEST_NEG(int16_t, 9)
> +  TEST_NEG(int32_t, 8)
> +  TEST_NEG(int32_t, 7)
> +  TEST_NEG(int32_t, 6)
> +  TEST_NEG(int32_t, 5)
> +  TEST_NEG(int64_t, 4)
> +  TEST_NEG(int64_t, 3)
> +  TEST_NEG(int64_t, 2)
> +  TEST_NEG(int64_t, 1)
> +  TEST_POS(uint8_t, 0)
> +  TEST_POS(uint8_t, 1)
> +  TEST_POS(uint8_t, 2)
> +  TEST_POS(uint8_t, 3)
> +  TEST_POS(uint16_t, 4)
> +  TEST_POS(uint16_t, 5)
> +  TEST_POS(uint16_t, 6)
> +  TEST_POS(uint16_t, 7)
> +  TEST_POS(uint32_t, 8)
> +  TEST_POS(uint32_t, 9)
> +  TEST_POS(uint32_t, 10)
> +  TEST_POS(uint32_t, 11)
> +  TEST_POS(uint64_t, 12)
> +  TEST_POS(uint64_t, 13)
> +  TEST_POS(uint64_t, 14)
> +  TEST_POS(uint64_t, 15)
> +}
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c
> new file mode 100644
> index 000..c419256cd45
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c
> @@ -0,0 +1,6 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-std=c99 -march=rv32gcv -mabi=ilp32d 
> -fno-vect-cost-model --param=riscv-autovec-preference=scalable -fno-builtin" 
> } */
> +
> +#include "vmv-imm-template.h"
> +
> +/* { dg-final { scan-ass

RE: [PATCH v5] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-11 Thread Li, Pan2 via Gcc-patches
Sorry for the noise. I fixed my silly mistake in PATCH v6, which passes the
x86 regression test. If there are no further concerns, I will commit it once
the x86 regression test passes.

Pan

-Original Message-
From: Li, Pan2 
Sent: Thursday, May 11, 2023 6:56 PM
To: Richard Sandiford 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang ; jeffreya...@gmail.com; ja...@redhat.com; 
rguent...@suse.de
Subject: RE: [PATCH v5] Var-Tracking: Typedef pointer_mux 
as decl_or_value

Thanks Richard Sandiford.

One interesting thing: the change from v4 to v5 (i.e., remove the case and
put dv as the first arg) causes some ICEs. I will try to fix them.

Pan

-Original Message-
From: Richard Sandiford 
Sent: Thursday, May 11, 2023 3:17 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang ; jeffreya...@gmail.com; ja...@redhat.com; 
rguent...@suse.de
Subject: Re: [PATCH v5] Var-Tracking: Typedef pointer_mux 
as decl_or_value

pan2...@intel.com writes:
> From: Pan Li 
>
> The decl_or_value is defined as void * before this PATCH. It has to
> represent both tree_node and rtx_def. Unfortunately, a void pointer
> alone cannot tell whether it points to a tree_node or an rtx_def.
>
> As a result, there is an implicit structure layout requirement like the
> one below. Otherwise we would read unrelated bits when casting the
> void * to tree_node or rtx_def.
>
> +--------+-----------+----------+
> | offset | tree_node | rtx_def  |
> +--------+-----------+----------+
> |      0 | code: 16  | code: 16 | <- require the same location and bit size
> +--------+-----------+----------+
> |     16 | ...       | mode: 8  |
> +--------+-----------+----------+
> | ...                           |
> +--------+-----------+----------+
> |     24 | ...       | ...      |
> +--------+-----------+----------+
>
> This requirement blocks the PATCH that extends the rtx_def mode from 8 to
> 16 bits, which is needed because we are running out of machine modes.
> This PATCH introduces pointer_mux to tell whether the input is a
> tree_node or an rtx_def, and decouples the above implicit dependency.
>
> Signed-off-by: Pan Li 
> Co-Authored-By: Richard Sandiford 
> Co-Authored-By: Richard Biener 
> Co-Authored-By: Jakub Jelinek 
>
> gcc/ChangeLog:
>
>   * mux-utils.h: Add overload operator == and != for pointer_mux.
>   * var-tracking.cc: Included mux-utils.h for pointer_mux.
>   (decl_or_value): Changed from void * to pointer_mux.
>   (dv_is_decl_p): Reconciled to the new type, aka pointer_mux.
>   (dv_as_decl): Ditto.
>   (dv_as_opaque): Removed as unnecessary.
>   (struct variable_hasher): Take decl_or_value as compare_type.
>   (variable_hasher::equal): Ditto.
>   (dv_from_decl): Reconciled to the new type, aka pointer_mux.
>   (dv_from_value): Ditto.
>   (attrs_list_member):  Ditto.
>   (vars_copy): Ditto.
>   (var_reg_decl_set): Ditto.
>   (var_reg_delete_and_set): Ditto.
>   (find_loc_in_1pdv): Ditto.
>   (canonicalize_values_star): Ditto.
>   (variable_post_merge_new_vals): Ditto.
>   (dump_onepart_variable_differences): Ditto.
>   (variable_different_p): Ditto.
>   (set_slot_part): Ditto.
>   (clobber_slot_part): Ditto.
>   (clobber_variable_part): Ditto.

OK, thanks!

Richard

> ---
>  gcc/mux-utils.h |  4 +++
>  gcc/var-tracking.cc | 85 ++---
>  2 files changed, 37 insertions(+), 52 deletions(-)
>
> diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
> index a2b6a316899..486d80915b1 100644
> --- a/gcc/mux-utils.h
> +++ b/gcc/mux-utils.h
> @@ -117,6 +117,10 @@ public:
>//  ...use ptr.known_second ()...
>T2 *second_or_null () const;
>  
> +  bool operator == (const pointer_mux &pm) const { return m_ptr == pm.m_ptr; }
> +
> +  bool operator != (const pointer_mux &pm) const { return m_ptr != pm.m_ptr; }
> +
>// Return true if the pointer is a T.
>//
>// This is only valid if T1 and T2 are distinct and if T can be 
> diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
> index fae0c73e02f..384084c8b3e 100644
> --- a/gcc/var-tracking.cc
> +++ b/gcc/var-tracking.cc
> @@ -116,6 +116,7 @@
>  #include "fibonacci_heap.h"
>  #include "print-rtl.h"
>  #include "function-abi.h"
> +#include "mux-utils.h"
>  
>  typedef fibonacci_heap  bb_heap_t;
>  
> @@ -197,14 +198,14 @@ struct micro_operation
>  
>  
>  /* A declaration of a variable, or an RTL value being handled like a
> -   declaration.  */
> -typedef void *decl_or_value;
> +   declaration by pointer_mux.  */
> +typedef pointer_mux decl_or_value;
>  
>  /* Return true if a decl_or_value DV is a DECL or NULL.  */
>  static inline bool
>  dv_is_decl_p (decl_or_value dv)
>  {
> -  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
> +  return dv.is_first ();
>  }
>  
>  /* Return true if a decl_or_value is a VALUE rtl.  */
> @@ -219,7 +220,7 @@ static inline tree
>  dv_as_decl (decl_or_value dv)
>  {
>gcc_ch

[PATCH v6] Var-Tracking: Typedef pointer_mux as decl_or_value

2023-05-11 Thread Pan Li via Gcc-patches
From: Pan Li 

The decl_or_value is defined as void * before this PATCH. It has to
represent both tree_node and rtx_def. Unfortunately, a void pointer
alone cannot tell whether it points to a tree_node or an rtx_def.

As a result, there is an implicit structure layout requirement like the
one below. Otherwise we would read unrelated bits when casting the
void * to tree_node or rtx_def.

+--------+-----------+----------+
| offset | tree_node | rtx_def  |
+--------+-----------+----------+
|      0 | code: 16  | code: 16 | <- require the same location and bit size
+--------+-----------+----------+
|     16 | ...       | mode: 8  |
+--------+-----------+----------+
| ...                           |
+--------+-----------+----------+
|     24 | ...       | ...      |
+--------+-----------+----------+

This requirement blocks the PATCH that extends the rtx_def mode from 8 to
16 bits, which is needed because we are running out of machine modes.
This PATCH introduces pointer_mux to tell whether the input is a
tree_node or an rtx_def, and decouples the above implicit dependency.
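
The general technique behind a two-way pointer wrapper can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
contents of mux-utils.h: tagged_ptr, decl_node, value_node and
decl_or_value_sketch are hypothetical names, and the sketch assumes that
both pointee types are at least 2-byte aligned so that bit 0 of an address
is free to record which type is stored.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for tree_node and rtx_def.
struct decl_node { int uid; };
struct value_node { int id; };

// Hypothetical two-way tagged pointer: bit 0 records which type is stored.
template <typename T1, typename T2>
class tagged_ptr
{
  static_assert (alignof (T1) >= 2 && alignof (T2) >= 2,
                 "bit 0 of the address must be free for the tag");

public:
  tagged_ptr () : m_bits (0) {}

  static tagged_ptr first (T1 *p)
  { return tagged_ptr (reinterpret_cast<std::uintptr_t> (p)); }

  static tagged_ptr second (T2 *p)
  { return tagged_ptr (reinterpret_cast<std::uintptr_t> (p) | 1); }

  // A null pointer counts as "first" in this sketch.
  bool is_first () const { return (m_bits & 1) == 0; }
  bool is_second () const { return (m_bits & 1) != 0; }

  // Only meaningful when the matching is_* query returned true.
  T1 *known_first () const { return reinterpret_cast<T1 *> (m_bits); }
  T2 *known_second () const
  { return reinterpret_cast<T2 *> (m_bits & ~std::uintptr_t (1)); }

  bool operator== (const tagged_ptr &o) const { return m_bits == o.m_bits; }
  bool operator!= (const tagged_ptr &o) const { return m_bits != o.m_bits; }

private:
  explicit tagged_ptr (std::uintptr_t bits) : m_bits (bits) {}
  std::uintptr_t m_bits;
};

using decl_or_value_sketch = tagged_ptr<decl_node, value_node>;
```

Because the discriminator lives in the pointer value itself, nothing has to
be read from the pointee to learn its type, so the two structures no longer
need to agree on where their code field sits.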

Signed-off-by: Pan Li 
Co-Authored-By: Richard Sandiford 
Co-Authored-By: Richard Biener 
Co-Authored-By: Jakub Jelinek 

gcc/ChangeLog:

* mux-utils.h: Add overload operator == and != for pointer_mux.
* var-tracking.cc: Included mux-utils.h for pointer_mux.
(decl_or_value): Changed from void * to pointer_mux.
(dv_is_decl_p): Reconciled to the new type, aka pointer_mux.
(dv_as_decl): Ditto.
(dv_as_opaque): Removed as unnecessary.
(struct variable_hasher): Take decl_or_value as compare_type.
(variable_hasher::equal): Ditto.
(dv_from_decl): Reconciled to the new type, aka pointer_mux.
(dv_from_value): Ditto.
(attrs_list_member):  Ditto.
(vars_copy): Ditto.
(var_reg_decl_set): Ditto.
(var_reg_delete_and_set): Ditto.
(find_loc_in_1pdv): Ditto.
(canonicalize_values_star): Ditto.
(variable_post_merge_new_vals): Ditto.
(dump_onepart_variable_differences): Ditto.
(variable_different_p): Ditto.
(set_slot_part): Ditto.
(clobber_slot_part): Ditto.
(clobber_variable_part): Ditto.
---
 gcc/mux-utils.h |  4 +++
 gcc/var-tracking.cc | 85 ++---
 2 files changed, 37 insertions(+), 52 deletions(-)

diff --git a/gcc/mux-utils.h b/gcc/mux-utils.h
index a2b6a316899..486d80915b1 100644
--- a/gcc/mux-utils.h
+++ b/gcc/mux-utils.h
@@ -117,6 +117,10 @@ public:
   //  ...use ptr.known_second ()...
   T2 *second_or_null () const;
 
+  bool operator == (const pointer_mux &pm) const { return m_ptr == pm.m_ptr; }
+
+  bool operator != (const pointer_mux &pm) const { return m_ptr != pm.m_ptr; }
+
   // Return true if the pointer is a T.
   //
   // This is only valid if T1 and T2 are distinct and if T can be
diff --git a/gcc/var-tracking.cc b/gcc/var-tracking.cc
index fae0c73e02f..384084c8b3e 100644
--- a/gcc/var-tracking.cc
+++ b/gcc/var-tracking.cc
@@ -116,6 +116,7 @@
 #include "fibonacci_heap.h"
 #include "print-rtl.h"
 #include "function-abi.h"
+#include "mux-utils.h"
 
 typedef fibonacci_heap  bb_heap_t;
 
@@ -197,14 +198,14 @@ struct micro_operation
 
 
 /* A declaration of a variable, or an RTL value being handled like a
-   declaration.  */
-typedef void *decl_or_value;
+   declaration by pointer_mux.  */
+typedef pointer_mux decl_or_value;
 
 /* Return true if a decl_or_value DV is a DECL or NULL.  */
 static inline bool
 dv_is_decl_p (decl_or_value dv)
 {
-  return !dv || (int) TREE_CODE ((tree) dv) != (int) VALUE;
+  return dv.is_first ();
 }
 
 /* Return true if a decl_or_value is a VALUE rtl.  */
@@ -219,7 +220,7 @@ static inline tree
 dv_as_decl (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_decl_p (dv));
-  return (tree) dv;
+  return dv.known_first ();
 }
 
 /* Return the value in the decl_or_value.  */
@@ -227,14 +228,7 @@ static inline rtx
 dv_as_value (decl_or_value dv)
 {
   gcc_checking_assert (dv_is_value_p (dv));
-  return (rtx)dv;
-}
-
-/* Return the opaque pointer in the decl_or_value.  */
-static inline void *
-dv_as_opaque (decl_or_value dv)
-{
-  return dv;
+  return dv.known_second ();
 }
 
 
@@ -483,9 +477,9 @@ static void variable_htab_free (void *);
 
 struct variable_hasher : pointer_hash 
 {
-  typedef void *compare_type;
+  typedef decl_or_value compare_type;
   static inline hashval_t hash (const variable *);
-  static inline bool equal (const variable *, const void *);
+  static inline bool equal (const variable *, const decl_or_value);
   static inline void remove (variable *);
 };
 
@@ -501,11 +495,9 @@ variable_hasher::hash (const variable *v)
 /* Compare the declaration of variable X with declaration Y.  */
 
 inline bool
-variable_hasher::equal (const variable *v, const void *y)
+variable_hasher::equal (const variable *v, const decl_or_value y)
 {
-  decl_or_value dv = CONST_CAST2 (decl_or_value, const void *, y);
-
-  return (dv_as_opaque (v->dv) == dv_as_opaque (dv));

[PATCH v2] RISC-V: Allow vector constants in riscv_const_insns.

2023-05-11 Thread Robin Dapp via Gcc-patches
> OK, you can go ahead commit patch. I am gonna send another patch to
> fix this.
I agree that we should handle more constants but I'd still rather go
ahead now and fix things later.  The patch is more about the test
rather than the actual change anyway.

Jeff already ack'ed v1, maybe waiting for Kito's OK to push still.

(Minor) changes from v1:
 - Rebase vs Juzhe's patch
 - Change test format to match binops.


This patch adds various vector constants to riscv_const_insns so that
they are properly recognized as immediate operands.  This in turn allows
vmv.v.i instructions to be emitted via autovectorization.
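
As a rough illustration of the immediate range involved, the check below
models only the scalar range [-16, 15] that the patch comment gives for
vmv.v.i. It is a sketch: fits_simm5 is a hypothetical helper, and the real
test in the patch is satisfies_constraint_vi applied to an rtx, which this
does not reproduce.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if VALUE lies in the signed 5-bit immediate
// range [-16, 15] mentioned in the patch comment for vmv.v.i.
constexpr bool
fits_simm5 (std::int64_t value)
{
  return value >= -16 && value <= 15;
}

static_assert (fits_simm5 (-16) && fits_simm5 (15), "range endpoints fit");
static_assert (!fits_simm5 (-17) && !fits_simm5 (16), "neighbours do not");
```

The test files in this patch exercise exactly these 32 values, -16 through
15, which matches the scan-assembler-times count of 32.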

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_const_insns): Add permissible
vector constants.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c: New test.
* gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: New test.
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: New test.
* gcc.target/riscv/rvv/autovec/vmv-imm-run.c: New test.
---
 gcc/config/riscv/riscv.cc |  7 +++
 .../riscv/rvv/autovec/vmv-imm-run.c   | 57 +++
 .../riscv/rvv/autovec/vmv-imm-rv32.c  |  6 ++
 .../riscv/rvv/autovec/vmv-imm-rv64.c  |  6 ++
 .../riscv/rvv/autovec/vmv-imm-template.h  | 54 ++
 5 files changed, 130 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 8f032250b0f..de578b5b899 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -1291,6 +1291,13 @@ riscv_const_insns (rtx x)
return 1;
  }
  }
+   /* Constants from -16 to 15 can be loaded with vmv.v.i.
+  The Wc0, Wc1 constraints are already covered by the
+  vi constraint so we do not need to check them here
+  separately.  */
+   else if (TARGET_VECTOR && satisfies_constraint_vi (x))
+ return 1;
+
/* TODO: We may support more const vector in the future.  */
return x == CONST0_RTX (GET_MODE (x)) ? 1 : 0;
   }
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
new file mode 100644
index 000..309a296b686
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-run.c
@@ -0,0 +1,57 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "-std=c99 -fno-vect-cost-model 
--param=riscv-autovec-preference=scalable -fno-builtin" } */
+
+#include "vmv-imm-template.h"
+
+#include 
+#include 
+
+#define SZ 512
+
+#define TEST_POS(TYPE,VAL) \
+  TYPE a##TYPE##VAL[SZ];   \
+  vmv_##VAL (a##TYPE##VAL, SZ);\
+  for (int i = 0; i < SZ; i++) \
+assert (a##TYPE##VAL[i] == VAL);
+
+#define TEST_NEG(TYPE,VAL) \
+  TYPE am##TYPE##VAL[SZ];  \
+  vmv_m##VAL (am##TYPE##VAL, SZ);  \
+  for (int i = 0; i < SZ; i++) \
+assert (am##TYPE##VAL[i] == -VAL);
+
+int main ()
+{
+  TEST_NEG(int8_t, 16)
+  TEST_NEG(int8_t, 15)
+  TEST_NEG(int8_t, 14)
+  TEST_NEG(int8_t, 13)
+  TEST_NEG(int16_t, 12)
+  TEST_NEG(int16_t, 11)
+  TEST_NEG(int16_t, 10)
+  TEST_NEG(int16_t, 9)
+  TEST_NEG(int32_t, 8)
+  TEST_NEG(int32_t, 7)
+  TEST_NEG(int32_t, 6)
+  TEST_NEG(int32_t, 5)
+  TEST_NEG(int64_t, 4)
+  TEST_NEG(int64_t, 3)
+  TEST_NEG(int64_t, 2)
+  TEST_NEG(int64_t, 1)
+  TEST_POS(uint8_t, 0)
+  TEST_POS(uint8_t, 1)
+  TEST_POS(uint8_t, 2)
+  TEST_POS(uint8_t, 3)
+  TEST_POS(uint16_t, 4)
+  TEST_POS(uint16_t, 5)
+  TEST_POS(uint16_t, 6)
+  TEST_POS(uint16_t, 7)
+  TEST_POS(uint32_t, 8)
+  TEST_POS(uint32_t, 9)
+  TEST_POS(uint32_t, 10)
+  TEST_POS(uint32_t, 11)
+  TEST_POS(uint64_t, 12)
+  TEST_POS(uint64_t, 13)
+  TEST_POS(uint64_t, 14)
+  TEST_POS(uint64_t, 15)
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c
new file mode 100644
index 000..c419256cd45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-std=c99 -march=rv32gcv -mabi=ilp32d 
-fno-vect-cost-model --param=riscv-autovec-preference=scalable -fno-builtin" } 
*/
+
+#include "vmv-imm-template.h"
+
+/* { dg-final { scan-assembler-times "vmv.v.i" 32 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
new file mode 100644
index 000..520321e1c73
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/*

Re: [PATCH V4] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-11 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Thanks. I have read the rgroup descriptions again.
> I still do not fully understand them, so bear with me :)
>
> I don't know how to differentiate Case 2 and Case 3.
>
> Case 2 is multiple rgroups for SLP.
> Case 3 is multiple rgroups for non-SLP (VEC_PACK_TRUNC).
>
> Is this correct:
> Case 2: rgc->max_nscalars_per_iter != 1

Yes.

> Case 3: rgc->max_nscalars_per_iter == 1 but rgc->factor != 1?

For case 3 it's:

rgc->max_nscalars_per_iter == 1 && rgc != &LOOP_VINFO_LENS (loop_vinfo)[0]

rgc->factor is controlled by the target and just says what units
IFN_LEN_LOAD works in.  E.g. if we're loading 16-bit elements,
but the underlying instruction measures bytes, the factor would be 2.
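
The two conditions and the role of the factor can be sketched like this.
It is an illustration only: rgroup_sketch, rgroup_kind, classify_rgroup and
length_in_units are hypothetical names, and the vector index stands in for
the comparison against &LOOP_VINFO_LENS (loop_vinfo)[0].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two rgroup fields discussed above.
struct rgroup_sketch
{
  unsigned max_nscalars_per_iter;
  unsigned factor;
};

enum class rgroup_kind
{
  slp_multiple,      // "case 2": max_nscalars_per_iter != 1
  non_slp_multiple,  // "case 3": max_nscalars_per_iter == 1, not the first rgroup
  other              // neither condition holds
};

// Classify the rgroup at INDEX; the factor plays no part in the decision.
inline rgroup_kind
classify_rgroup (const std::vector<rgroup_sketch> &lens, std::size_t index)
{
  const rgroup_sketch &rgc = lens[index];
  if (rgc.max_nscalars_per_iter != 1)
    return rgroup_kind::slp_multiple;
  if (index != 0)  // models rgc != &LOOP_VINFO_LENS (loop_vinfo)[0]
    return rgroup_kind::non_slp_multiple;
  return rgroup_kind::other;
}

// Convert an element count into the units the length instruction measures.
inline unsigned
length_in_units (unsigned nelements, unsigned factor)
{
  return nelements * factor;
}
```

For example, 8 two-byte elements with a factor of 2 give a length of 16
when the instruction counts bytes.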

Thanks,
Richard


Re: [RFC] libstdc++: Do not use pthread_mutex_clocklock with ThreadSanitizer

2023-05-11 Thread Jonathan Wakely via Gcc-patches
On Thu, 11 May 2023 at 13:19, Mike Crowe  wrote:

> On Wednesday 10 May 2023 at 12:31:12 +0100, Jonathan Wakely wrote:
> > On Wed, 10 May 2023 at 12:20, Jonathan Wakely via Libstdc++ <
> > libstd...@gcc.gnu.org> wrote:
> >
> > > This patch would avoid TSan false positives when using timed waiting
> > > functions on mutexes and condvars, but as noted below, it changes the
> > > semantics.
> > >
> > > I'm not sure whether we want this workaround in place until tsan gets
> > > fixed.
> > >
> > > On one hand, there's no guarantee that those functions use the right
> > > clock anyway (and they won't do unless a recent-ish glibc is used). But
> > > on the other hand, if they normally would use the right clock because
> > > you have glibc support, it's not ideal for tsan to cause a different
> > > clock to be used.
> > >
> >
> > But of course, it's not ideal to get false positives from tsan either
> > (especially when it looks like a libstdc++ bug, as initially reported to
> > me).
>
> I think that this is probably the least-worst option in the short term. As
> TSan is distributed with GCC this workaround can be removed as soon as its
> TSan implementation gains the necessary interceptors. I shall look into
> trying to do that.
>

Right, and before it gets into GCC it will already be upstream in LLVM, so
a recent Clang would support it too by the time we changed anything in
libstdc++.

Another option would be to just document how to use
https://github.com/google/sanitizers/wiki/ThreadSanitizerSuppressions for
runtime suppressions, but that would be far from ideal.




> However, ...
>
> > > diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> > > index 89e7f5f5f45..e2700b05ec3 100644
> > > --- a/libstdc++-v3/acinclude.m4
> > > +++ b/libstdc++-v3/acinclude.m4
> > > @@ -4284,7 +4284,7 @@ AC_DEFUN([GLIBCXX_CHECK_PTHREAD_COND_CLOCKWAIT],
> [
> > >[glibcxx_cv_PTHREAD_COND_CLOCKWAIT=no])
> > >])
> > >if test $glibcxx_cv_PTHREAD_COND_CLOCKWAIT = yes; then
> > > -AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, 1, [Define if
> > > pthread_cond_clockwait is available in .])
> > > +AC_DEFINE(_GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT, (_GLIBCXX_TSAN==0),
> > > [Define if pthread_cond_clockwait is available in .])
> > >fi
>
> TSan does appear to have an interceptor for pthread_cond_clockwait, even if
> it lacks the others. Does this mean that this part is unnecessary?
>

Ah good point, thanks. I grepped for clocklock but not clockwait.


>
> See: https://github.com/google/sanitizers/issues/1259
>
>
Thanks, I've added a link to my new tsan issue there.


[Committed] MAINTAINERS: Fix alphabetic sorting.

2023-05-11 Thread Robin Dapp via Gcc-patches
ChangeLog:

* MAINTAINERS: Sort.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 1c380bef5c5..e4dee76e2df 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -521,7 +521,6 @@ James Lemke 
 Ilya Leoshkevich   
 Kriang Lerdsuwanakij   
 Pan Li 
-Juzhe Zhong
 Renlin Li  
 Xinliang David Li  
 Chen Liqin 
@@ -716,6 +715,7 @@ Dennis Zhang

 Yufeng Zhang   
 Qing Zhao  
 Shujing Zhao   
+Juzhe Zhong
 Jon Ziegler
 Roman Zippel   
 Josef Zlomek   
-- 
2.40.0



[committed] RISC-V: Update RVV integer compare simplification comments

2023-05-11 Thread Pan Li via Gcc-patches
From: Pan Li 

The VMSET simplification for RVV integer comparisons has already been
merged.  This patch updates the comments to cover the cases that the
define_split will act on.
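
The reason OP(V, V) can be folded is that comparing a value with itself
gives a result that does not depend on the value. A scalar sketch of that
observation follows; cmp_op, compare and self_compare_result are
hypothetical names, and the code models single signed elements only, not
the vector patterns in vector.md.

```cpp
#include <cassert>
#include <cstdint>

enum class cmp_op { eq, ne, lt, le, gt, ge };

// Ordinary element-wise comparison of two scalars.
inline bool
compare (cmp_op op, std::int32_t a, std::int32_t b)
{
  switch (op)
    {
    case cmp_op::eq: return a == b;
    case cmp_op::ne: return a != b;
    case cmp_op::lt: return a < b;
    case cmp_op::le: return a <= b;
    case cmp_op::gt: return a > b;
    case cmp_op::ge: return a >= b;
    }
  return false;
}

// Result of OP(v, v) for any v: eq, le and ge are always true (all mask
// bits set), while ne, lt and gt are always false (all mask bits clear).
inline bool
self_compare_result (cmp_op op)
{
  return op == cmp_op::eq || op == cmp_op::le || op == cmp_op::ge;
}
```

The unsigned variants behave the same way, since v <= v and v >= v hold and
v < v and v > v fail regardless of signedness.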

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/vector.md: Add comments for simplifying to vmset.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/vector.md | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 75479f27bcd..328fce8d632 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -8161,13 +8161,20 @@ (define_insn 
"@pred_indexed_store"
 ;; 
-
 ;;  Integer Compare Instructions Simplification
 ;; 
-
-;; Simplify to VMCLR.m Includes:
+;; Simplify OP(V, V) Instructions to VMCLR.m Includes:
 ;; - 1.  VMSNE
 ;; - 2.  VMSLT
 ;; - 3.  VMSLTU
 ;; - 4.  VMSGT
 ;; - 5.  VMSGTU
 ;; 
-
+;; Simplify OP(V, V) Instructions to VMSET.m Includes:
+;; - 1.  VMSEQ
+;; - 2.  VMSLE
+;; - 3.  VMSLEU
+;; - 4.  VMSGE
+;; - 5.  VMSGEU
+;; 
-
 (define_split
   [(set (match_operand:VB  0 "register_operand")
(if_then_else:VB
-- 
2.34.1



[PATCH 24/24] arm: [MVE intrinsics] rework vmlaq vmlasq vqdmlahq vqdmlashq vqrdmlahq vqrdmlashq

2023-05-11 Thread Christophe Lyon via Gcc-patches
Implement vmlaq, vmlasq, vqdmlahq, vqdmlashq, vqrdmlahq, vqrdmlashq
using the new MVE builtins framework.

2022-12-12  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vmlaq, vmlasq, vqdmlahq)
(vqdmlashq, vqrdmlahq, vqrdmlashq): New.
* config/arm/arm-mve-builtins-base.def (vmlaq, vmlasq, vqdmlahq)
(vqdmlashq, vqrdmlahq, vqrdmlashq): New.
* config/arm/arm-mve-builtins-base.h (vmlaq, vmlasq, vqdmlahq)
(vqdmlashq, vqrdmlahq, vqrdmlashq): New.
* config/arm/arm-mve-builtins.cc
(function_instance::has_inactive_argument): Handle vmlaq, vmlasq,
vqdmlahq, vqdmlashq, vqrdmlahq, vqrdmlashq.
* config/arm/arm_mve.h (vqrdmlashq): Remove.
(vqrdmlahq): Remove.
(vqdmlashq): Remove.
(vqdmlahq): Remove.
(vmlasq): Remove.
(vmlaq): Remove.
(vmlaq_m): Remove.
(vmlasq_m): Remove.
(vqdmlashq_m): Remove.
(vqdmlahq_m): Remove.
(vqrdmlahq_m): Remove.
(vqrdmlashq_m): Remove.
(vmlasq_n_u8): Remove.
(vmlaq_n_u8): Remove.
(vqrdmlashq_n_s8): Remove.
(vqrdmlahq_n_s8): Remove.
(vqdmlahq_n_s8): Remove.
(vqdmlashq_n_s8): Remove.
(vmlasq_n_s8): Remove.
(vmlaq_n_s8): Remove.
(vmlasq_n_u16): Remove.
(vmlaq_n_u16): Remove.
(vqrdmlashq_n_s16): Remove.
(vqrdmlahq_n_s16): Remove.
(vqdmlashq_n_s16): Remove.
(vqdmlahq_n_s16): Remove.
(vmlasq_n_s16): Remove.
(vmlaq_n_s16): Remove.
(vmlasq_n_u32): Remove.
(vmlaq_n_u32): Remove.
(vqrdmlashq_n_s32): Remove.
(vqrdmlahq_n_s32): Remove.
(vqdmlashq_n_s32): Remove.
(vqdmlahq_n_s32): Remove.
(vmlasq_n_s32): Remove.
(vmlaq_n_s32): Remove.
(vmlaq_m_n_s8): Remove.
(vmlaq_m_n_s32): Remove.
(vmlaq_m_n_s16): Remove.
(vmlaq_m_n_u8): Remove.
(vmlaq_m_n_u32): Remove.
(vmlaq_m_n_u16): Remove.
(vmlasq_m_n_s8): Remove.
(vmlasq_m_n_s32): Remove.
(vmlasq_m_n_s16): Remove.
(vmlasq_m_n_u8): Remove.
(vmlasq_m_n_u32): Remove.
(vmlasq_m_n_u16): Remove.
(vqdmlashq_m_n_s8): Remove.
(vqdmlashq_m_n_s32): Remove.
(vqdmlashq_m_n_s16): Remove.
(vqdmlahq_m_n_s8): Remove.
(vqdmlahq_m_n_s32): Remove.
(vqdmlahq_m_n_s16): Remove.
(vqrdmlahq_m_n_s8): Remove.
(vqrdmlahq_m_n_s32): Remove.
(vqrdmlahq_m_n_s16): Remove.
(vqrdmlashq_m_n_s8): Remove.
(vqrdmlashq_m_n_s32): Remove.
(vqrdmlashq_m_n_s16): Remove.
(__arm_vmlasq_n_u8): Remove.
(__arm_vmlaq_n_u8): Remove.
(__arm_vqrdmlashq_n_s8): Remove.
(__arm_vqdmlashq_n_s8): Remove.
(__arm_vqrdmlahq_n_s8): Remove.
(__arm_vqdmlahq_n_s8): Remove.
(__arm_vmlasq_n_s8): Remove.
(__arm_vmlaq_n_s8): Remove.
(__arm_vmlasq_n_u16): Remove.
(__arm_vmlaq_n_u16): Remove.
(__arm_vqrdmlashq_n_s16): Remove.
(__arm_vqdmlashq_n_s16): Remove.
(__arm_vqrdmlahq_n_s16): Remove.
(__arm_vqdmlahq_n_s16): Remove.
(__arm_vmlasq_n_s16): Remove.
(__arm_vmlaq_n_s16): Remove.
(__arm_vmlasq_n_u32): Remove.
(__arm_vmlaq_n_u32): Remove.
(__arm_vqrdmlashq_n_s32): Remove.
(__arm_vqdmlashq_n_s32): Remove.
(__arm_vqrdmlahq_n_s32): Remove.
(__arm_vqdmlahq_n_s32): Remove.
(__arm_vmlasq_n_s32): Remove.
(__arm_vmlaq_n_s32): Remove.
(__arm_vmlaq_m_n_s8): Remove.
(__arm_vmlaq_m_n_s32): Remove.
(__arm_vmlaq_m_n_s16): Remove.
(__arm_vmlaq_m_n_u8): Remove.
(__arm_vmlaq_m_n_u32): Remove.
(__arm_vmlaq_m_n_u16): Remove.
(__arm_vmlasq_m_n_s8): Remove.
(__arm_vmlasq_m_n_s32): Remove.
(__arm_vmlasq_m_n_s16): Remove.
(__arm_vmlasq_m_n_u8): Remove.
(__arm_vmlasq_m_n_u32): Remove.
(__arm_vmlasq_m_n_u16): Remove.
(__arm_vqdmlahq_m_n_s8): Remove.
(__arm_vqdmlahq_m_n_s32): Remove.
(__arm_vqdmlahq_m_n_s16): Remove.
(__arm_vqrdmlahq_m_n_s8): Remove.
(__arm_vqrdmlahq_m_n_s32): Remove.
(__arm_vqrdmlahq_m_n_s16): Remove.
(__arm_vqrdmlashq_m_n_s8): Remove.
(__arm_vqrdmlashq_m_n_s32): Remove.
(__arm_vqrdmlashq_m_n_s16): Remove.
(__arm_vqdmlashq_m_n_s8): Remove.
(__arm_vqdmlashq_m_n_s16): Remove.
(__arm_vqdmlashq_m_n_s32): Remove.
(__arm_vmlasq): Remove.
(__arm_vmlaq): Remove.
(__arm_vqrdmlashq): Remove.
(__arm_vqdmlashq): Remove.
(__arm_vqrdmlahq): Remove.
(__arm_vqdmlahq): Remove.
(__arm_vmlaq_m): Remove.
(__arm_vmlasq_m): Remove.
(__arm_vqdmlahq_m): Remove.
(__arm_vqrdmlahq_m): Remove.
(__arm_vqrdmlashq_m): Remove.
(__arm_vqdmlashq

[PATCH 08/24] arm: [MVE intrinsics] rework vmladavaq vmladavaxq vmlsdavaq vmlsdavaxq

2023-05-11 Thread Christophe Lyon via Gcc-patches
Implement vmladavaq, vmladavaxq, vmlsdavaq, vmlsdavaxq using the new
MVE builtins framework.
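
For readers unfamiliar with the intrinsic, here is a minimal scalar sketch of
what vmladavaq_s8 computes, assuming the usual ACLE description (the incoming
accumulator plus the sum of the lane-wise products).  The name
ref_vmladavaq_s8 and the explicit lanes parameter are hypothetical and are not
part of GCC or arm_mve.h; the exchange (x) and subtract (vmlsdav*) variants
are deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference model, not GCC code: add the sum of the
   lane-wise products of M1 and M2 to the incoming accumulator ACC.  */
int32_t
ref_vmladavaq_s8 (int32_t acc, const int8_t *m1, const int8_t *m2,
                  size_t lanes)
{
  assert (m1 != NULL && m2 != NULL);

  /* Accumulate in unsigned arithmetic so that wraparound is well defined.  */
  uint32_t sum = (uint32_t) acc;
  for (size_t i = 0; i < lanes; i++)
    sum += (uint32_t) ((int32_t) m1[i] * (int32_t) m2[i]);

  return (int32_t) sum;
}
```

A 128-bit MVE vector of s8 elements has 16 lanes, so a caller comparing against
the real intrinsic would pass lanes = 16.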

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vmladavaxq, vmladavaq)
(vmlsdavaq, vmlsdavaxq): New.
* config/arm/arm-mve-builtins-base.def (vmladavaxq, vmladavaq)
(vmlsdavaq, vmlsdavaxq): New.
* config/arm/arm-mve-builtins-base.h (vmladavaxq, vmladavaq)
(vmlsdavaq, vmlsdavaxq): New.
* config/arm/arm_mve.h (vmladavaq): Remove.
(vmlsdavaxq): Remove.
(vmlsdavaq): Remove.
(vmladavaxq): Remove.
(vmladavaq_p): Remove.
(vmladavaxq_p): Remove.
(vmlsdavaq_p): Remove.
(vmlsdavaxq_p): Remove.
(vmladavaq_u8): Remove.
(vmlsdavaxq_s8): Remove.
(vmlsdavaq_s8): Remove.
(vmladavaxq_s8): Remove.
(vmladavaq_s8): Remove.
(vmladavaq_u16): Remove.
(vmlsdavaxq_s16): Remove.
(vmlsdavaq_s16): Remove.
(vmladavaxq_s16): Remove.
(vmladavaq_s16): Remove.
(vmladavaq_u32): Remove.
(vmlsdavaxq_s32): Remove.
(vmlsdavaq_s32): Remove.
(vmladavaxq_s32): Remove.
(vmladavaq_s32): Remove.
(vmladavaq_p_s8): Remove.
(vmladavaq_p_s32): Remove.
(vmladavaq_p_s16): Remove.
(vmladavaq_p_u8): Remove.
(vmladavaq_p_u32): Remove.
(vmladavaq_p_u16): Remove.
(vmladavaxq_p_s8): Remove.
(vmladavaxq_p_s32): Remove.
(vmladavaxq_p_s16): Remove.
(vmlsdavaq_p_s8): Remove.
(vmlsdavaq_p_s32): Remove.
(vmlsdavaq_p_s16): Remove.
(vmlsdavaxq_p_s8): Remove.
(vmlsdavaxq_p_s32): Remove.
(vmlsdavaxq_p_s16): Remove.
(__arm_vmladavaq_u8): Remove.
(__arm_vmlsdavaxq_s8): Remove.
(__arm_vmlsdavaq_s8): Remove.
(__arm_vmladavaxq_s8): Remove.
(__arm_vmladavaq_s8): Remove.
(__arm_vmladavaq_u16): Remove.
(__arm_vmlsdavaxq_s16): Remove.
(__arm_vmlsdavaq_s16): Remove.
(__arm_vmladavaxq_s16): Remove.
(__arm_vmladavaq_s16): Remove.
(__arm_vmladavaq_u32): Remove.
(__arm_vmlsdavaxq_s32): Remove.
(__arm_vmlsdavaq_s32): Remove.
(__arm_vmladavaxq_s32): Remove.
(__arm_vmladavaq_s32): Remove.
(__arm_vmladavaq_p_s8): Remove.
(__arm_vmladavaq_p_s32): Remove.
(__arm_vmladavaq_p_s16): Remove.
(__arm_vmladavaq_p_u8): Remove.
(__arm_vmladavaq_p_u32): Remove.
(__arm_vmladavaq_p_u16): Remove.
(__arm_vmladavaxq_p_s8): Remove.
(__arm_vmladavaxq_p_s32): Remove.
(__arm_vmladavaxq_p_s16): Remove.
(__arm_vmlsdavaq_p_s8): Remove.
(__arm_vmlsdavaq_p_s32): Remove.
(__arm_vmlsdavaq_p_s16): Remove.
(__arm_vmlsdavaxq_p_s8): Remove.
(__arm_vmlsdavaxq_p_s32): Remove.
(__arm_vmlsdavaxq_p_s16): Remove.
(__arm_vmladavaq): Remove.
(__arm_vmlsdavaxq): Remove.
(__arm_vmlsdavaq): Remove.
(__arm_vmladavaxq): Remove.
(__arm_vmladavaq_p): Remove.
(__arm_vmladavaxq_p): Remove.
(__arm_vmlsdavaq_p): Remove.
(__arm_vmlsdavaxq_p): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   4 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +
 gcc/config/arm/arm_mve.h | 538 ---
 4 files changed, 12 insertions(+), 538 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 69af6f9139e..8a5ab990337 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -280,8 +280,12 @@ FUNCTION (vminnmq, unspec_based_mve_function_exact_insn, (UNKNOWN, UNKNOWN, SMIN
 FUNCTION_PRED_P_F (vminnmvq, VMINNMVQ)
 FUNCTION_WITH_RTX_M_NO_F (vminq, SMIN, UMIN, VMINQ)
 FUNCTION_PRED_P_S_U (vminvq, VMINVQ)
+FUNCTION_PRED_P_S (vmladavaxq, VMLADAVAXQ)
+FUNCTION_PRED_P_S_U (vmladavaq, VMLADAVAQ)
 FUNCTION_PRED_P_S_U (vmladavq, VMLADAVQ)
 FUNCTION_PRED_P_S (vmladavxq, VMLADAVXQ)
+FUNCTION_PRED_P_S (vmlsdavaq, VMLSDAVAQ)
+FUNCTION_PRED_P_S (vmlsdavaxq, VMLSDAVAXQ)
 FUNCTION_PRED_P_S (vmlsdavq, VMLSDAVQ)
 FUNCTION_PRED_P_S (vmlsdavxq, VMLSDAVXQ)
 FUNCTION_WITHOUT_N_NO_F (vmovlbq, VMOVLBQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 40d462fc7d2..cf0ed4b58df 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -49,8 +49,12 @@ DEF_MVE_FUNCTION (vminaq, binary_maxamina, all_signed, m_or_none)
 DEF_MVE_FUNCTION (vminavq, binary_maxavminav, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vminq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vminvq, binary_maxvminv, all_integer, p_or_none)
+DEF_MVE_FUNCTION (vmladavaq, binary_acca_int32, all_integer, p_or_none)
+DEF_MVE_FUNCTION (vmladavaxq, binary_acca_i

[PATCH 15/24] arm: [MVE intrinsics] rework vrmlaldavhq vrmlaldavhxq vrmlsldavhq vrmlsldavhxq

2023-05-11 Thread Christophe Lyon via Gcc-patches
Implement vrmlaldavhq, vrmlaldavhxq, vrmlsldavhq, vrmlsldavhxq using
the new MVE builtins framework.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vrmlaldavhq, vrmlaldavhxq)
(vrmlsldavhq, vrmlsldavhxq): New.
* config/arm/arm-mve-builtins-base.def (vrmlaldavhq, vrmlaldavhxq)
(vrmlsldavhq, vrmlsldavhxq): New.
* config/arm/arm-mve-builtins-base.h (vrmlaldavhq, vrmlaldavhxq)
(vrmlsldavhq, vrmlsldavhxq): New.
* config/arm/arm-mve-builtins-functions.h
(unspec_mve_function_exact_insn_pred_p): Handle vrmlaldavhq,
vrmlaldavhxq, vrmlsldavhq, vrmlsldavhxq.
* config/arm/arm_mve.h (vrmlaldavhq): Remove.
(vrmlsldavhxq): Remove.
(vrmlsldavhq): Remove.
(vrmlaldavhxq): Remove.
(vrmlaldavhq_p): Remove.
(vrmlaldavhxq_p): Remove.
(vrmlsldavhq_p): Remove.
(vrmlsldavhxq_p): Remove.
(vrmlaldavhq_u32): Remove.
(vrmlsldavhxq_s32): Remove.
(vrmlsldavhq_s32): Remove.
(vrmlaldavhxq_s32): Remove.
(vrmlaldavhq_s32): Remove.
(vrmlaldavhq_p_s32): Remove.
(vrmlaldavhxq_p_s32): Remove.
(vrmlsldavhq_p_s32): Remove.
(vrmlsldavhxq_p_s32): Remove.
(vrmlaldavhq_p_u32): Remove.
(__arm_vrmlaldavhq_u32): Remove.
(__arm_vrmlsldavhxq_s32): Remove.
(__arm_vrmlsldavhq_s32): Remove.
(__arm_vrmlaldavhxq_s32): Remove.
(__arm_vrmlaldavhq_s32): Remove.
(__arm_vrmlaldavhq_p_s32): Remove.
(__arm_vrmlaldavhxq_p_s32): Remove.
(__arm_vrmlsldavhq_p_s32): Remove.
(__arm_vrmlsldavhxq_p_s32): Remove.
(__arm_vrmlaldavhq_p_u32): Remove.
(__arm_vrmlaldavhq): Remove.
(__arm_vrmlsldavhxq): Remove.
(__arm_vrmlsldavhq): Remove.
(__arm_vrmlaldavhxq): Remove.
(__arm_vrmlaldavhq_p): Remove.
(__arm_vrmlaldavhxq_p): Remove.
(__arm_vrmlsldavhq_p): Remove.
(__arm_vrmlsldavhxq_p): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc |   4 +
 gcc/config/arm/arm-mve-builtins-base.def|   4 +
 gcc/config/arm/arm-mve-builtins-base.h  |   4 +
 gcc/config/arm/arm-mve-builtins-functions.h |   8 +-
 gcc/config/arm/arm_mve.h| 182 
 5 files changed, 18 insertions(+), 184 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index af1a2c9942a..142ba9357a1 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -326,6 +326,10 @@ FUNCTION_WITHOUT_N_NO_F (vrev16q, VREV16Q)
 FUNCTION_WITHOUT_N (vrev32q, VREV32Q)
 FUNCTION_WITHOUT_N (vrev64q, VREV64Q)
 FUNCTION_WITHOUT_N_NO_F (vrhaddq, VRHADDQ)
+FUNCTION_PRED_P_S_U (vrmlaldavhq, VRMLALDAVHQ)
+FUNCTION_PRED_P_S (vrmlaldavhxq, VRMLALDAVHXQ)
+FUNCTION_PRED_P_S (vrmlsldavhq, VRMLSLDAVHQ)
+FUNCTION_PRED_P_S (vrmlsldavhxq, VRMLSLDAVHXQ)
 FUNCTION_WITHOUT_N_NO_F (vrmulhq, VRMULHQ)
 FUNCTION_ONLY_F (vrndq, VRNDQ)
 FUNCTION_ONLY_F (vrndaq, VRNDAQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index f7f353b34a7..1dd3ad3489b 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -96,6 +96,10 @@ DEF_MVE_FUNCTION (vrev16q, unary, integer_8, mx_or_none)
 DEF_MVE_FUNCTION (vrev32q, unary, integer_8_16, mx_or_none)
 DEF_MVE_FUNCTION (vrev64q, unary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vrhaddq, binary, all_integer, mx_or_none)
+DEF_MVE_FUNCTION (vrmlaldavhq, binary_acc_int64, integer_32, p_or_none)
+DEF_MVE_FUNCTION (vrmlaldavhxq, binary_acc_int64, signed_32, p_or_none)
+DEF_MVE_FUNCTION (vrmlsldavhq, binary_acc_int64, signed_32, p_or_none)
+DEF_MVE_FUNCTION (vrmlsldavhxq, binary_acc_int64, signed_32, p_or_none)
 DEF_MVE_FUNCTION (vrmulhq, binary, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vrshlq, binary_round_lshift, all_integer, mx_or_none)
 DEF_MVE_FUNCTION (vrshrnbq, binary_rshift_narrow, integer_16_32, m_or_none)
diff --git a/gcc/config/arm/arm-mve-builtins-base.h b/gcc/config/arm/arm-mve-builtins-base.h
index 08d07a7c6d5..9604991b168 100644
--- a/gcc/config/arm/arm-mve-builtins-base.h
+++ b/gcc/config/arm/arm-mve-builtins-base.h
@@ -108,6 +108,10 @@ extern const function_base *const vrev16q;
 extern const function_base *const vrev32q;
 extern const function_base *const vrev64q;
 extern const function_base *const vrhaddq;
+extern const function_base *const vrmlaldavhq;
+extern const function_base *const vrmlaldavhxq;
+extern const function_base *const vrmlsldavhq;
+extern const function_base *const vrmlsldavhxq;
 extern const function_base *const vrmulhq;
 extern const function_base *const vrndaq;
 extern const function_base *const vrndmq;
diff --git a/gcc/config/arm/arm-mve-builtins-functions.h b/gcc/config/arm/arm-mve-builtins-functions.h
index ea926e42b81..77a6269f0da 100644
--- a/gcc/config/arm/arm-mve-

[PATCH 17/24] arm: [MVE intrinsics] factorize vmlaldavaq vmlaldavaxq vmlsldavaq vmlsldavaxq

2023-05-11 Thread Christophe Lyon via Gcc-patches
Factorize vmlaldavaq, vmlaldavaxq, vmlsldavaq, vmlsldavaxq builtins so
that they use the same parameterized names.
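
As an illustration of the idea only (this is C, not machine-description
syntax), the sketch below mimics how a single pattern name built from the
mve_insn and supf attributes can stand in for the separate per-instruction
names.  The table rows for the mnemonics are taken from the mve_insn entries
in this patch; the "u" suffix for VMLALDAVAQ_U, the struct, and the helper
expand_pattern_name are assumptions made for the example, and the mode suffix
is left out.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative subset of the unspec -> (mve_insn, supf) mappings.  */
struct unspec_attr
{
  const char *unspec;
  const char *mve_insn;
  const char *supf;
};

static const struct unspec_attr attrs[] = {
  { "VMLALDAVAQ_S",  "vmlaldava",  "s" },
  { "VMLALDAVAQ_U",  "vmlaldava",  "u" },  /* "u" is assumed here.  */
  { "VMLALDAVAXQ_S", "vmlaldavax", "s" },
  { "VMLSLDAVAQ_S",  "vmlsldava",  "s" },
  { "VMLSLDAVAXQ_S", "vmlsldavax", "s" },
};

/* Hypothetical helper: write "mve_<mve_insn>q_<supf>" for UNSPEC into BUF.
   Return 0 on success, -1 if UNSPEC is unknown or BUF is too small.  */
int
expand_pattern_name (const char *unspec, char *buf, size_t len)
{
  for (size_t i = 0; i < sizeof attrs / sizeof attrs[0]; i++)
    if (strcmp (attrs[i].unspec, unspec) == 0)
      {
        int n = snprintf (buf, len, "mve_%sq_%s",
                          attrs[i].mve_insn, attrs[i].supf);
        return (n >= 0 && (size_t) n < len) ? 0 : -1;
      }
  return -1;
}
```

With this table, VMLSLDAVAXQ_S expands to mve_vmlsldavaxq_s, which is the
stem of one of the hand-written pattern names that the patch merges.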

2022-10-25  Christophe Lyon  

gcc/
* config/arm/iterators.md (MVE_VMLxLDAVAxQ, MVE_VMLxLDAVAxQ_P):
New.
(mve_insn): Add vmlaldava, vmlaldavax, vmlsldava, vmlsldavax.
(supf): Add VMLALDAVAXQ_P_S, VMLALDAVAXQ_S, VMLSLDAVAQ_P_S,
VMLSLDAVAQ_S, VMLSLDAVAXQ_P_S, VMLSLDAVAXQ_S.
* config/arm/mve.md (mve_vmlaldavaq_<supf><mode>)
(mve_vmlsldavaq_s<mode>, mve_vmlsldavaxq_s<mode>)
(mve_vmlaldavaxq_s<mode>): Merge into ...
(@mve_<mve_insn>q_<supf><mode>): ... this.
(mve_vmlaldavaq_p_<supf><mode>, mve_vmlaldavaxq_p_<supf><mode>)
(mve_vmlsldavaq_p_s<mode>, mve_vmlsldavaxq_p_s<mode>): Merge into
...
(@mve_<mve_insn>q_p_<supf><mode>): ... this.
---
 gcc/config/arm/iterators.md |  28 +
 gcc/config/arm/mve.md   | 121 +---
 2 files changed, 42 insertions(+), 107 deletions(-)

diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 729127d8586..7a88bc91182 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -741,6 +741,20 @@ (define_int_iterator MVE_VMLxLDAVxQ_P [
 VMLSLDAVXQ_P_S
 ])
 
+(define_int_iterator MVE_VMLxLDAVAxQ [
+VMLALDAVAQ_S VMLALDAVAQ_U
+VMLALDAVAXQ_S
+VMLSLDAVAQ_S
+VMLSLDAVAXQ_S
+])
+
+(define_int_iterator MVE_VMLxLDAVAxQ_P [
+VMLALDAVAQ_P_S VMLALDAVAQ_P_U
+VMLALDAVAXQ_P_S
+VMLSLDAVAQ_P_S
+VMLSLDAVAXQ_P_S
+])
+
 (define_int_iterator MVE_VRMLxLDAVxQ [
 VRMLALDAVHQ_S VRMLALDAVHQ_U
 VRMLALDAVHXQ_S
@@ -883,6 +897,10 @@ (define_int_attr mve_insn [
 (VMLADAVQ_S "vmladav") (VMLADAVQ_U "vmladav")
 (VMLADAVXQ_P_S "vmladavx")
 (VMLADAVXQ_S "vmladavx")
+(VMLALDAVAQ_P_S "vmlaldava") (VMLALDAVAQ_P_U "vmlaldava")
+(VMLALDAVAQ_S "vmlaldava") (VMLALDAVAQ_U "vmlaldava")
+(VMLALDAVAXQ_P_S "vmlaldavax")
+(VMLALDAVAXQ_S "vmlaldavax")
 (VMLALDAVQ_P_S "vmlaldav") (VMLALDAVQ_P_U "vmlaldav")
 (VMLALDAVQ_S "vmlaldav") (VMLALDAVQ_U "vmlaldav")
 (VMLALDAVXQ_P_S "vmlaldavx")
@@ -897,6 +915,10 @@ (define_int_attr mve_insn [
 (VMLSDAVQ_S "vmlsdav")
 (VMLSDAVXQ_P_S "vmlsdavx")
 (VMLSDAVXQ_S "vmlsdavx")
+(VMLSLDAVAQ_P_S "vmlsldava")
+(VMLSLDAVAQ_S "vmlsldava")
+(VMLSLDAVAXQ_P_S "vmlsldavax")
+(VMLSLDAVAXQ_S "vmlsldavax")
 (VMLSLDAVQ_P_S "vmlsldav")
 (VMLSLDAVQ_S "vmlsldav")
 (VMLSLDAVXQ_P_S "vmlsldavx")
@@ -2351,6 +2373,12 @@ (define_int_attr supf [(VCVTQ_TO_F_S "s") (VCVTQ_TO_F_U "u") (VREV16Q_S "s")
   (VRMLSLDAVHQ_S "s")
   (VRMLSLDAVHXQ_P_S "s")
   (VRMLSLDAVHXQ_S "s")
+  (VMLALDAVAXQ_P_S "s")
+  (VMLALDAVAXQ_S "s")
+  (VMLSLDAVAQ_P_S "s")
+  (VMLSLDAVAQ_S "s")
+  (VMLSLDAVAXQ_P_S "s")
+  (VMLSLDAVAXQ_S "s")
   ])
 
 ;; Both kinds of return insn.
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index e2259aa48e9..c6fd634b5c0 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -2550,34 +2550,21 @@ (define_insn "@mve_<mve_insn>q_p_f<mode>"
(set_attr "length""8")])
 
 ;;
-;; [vmlaldavaq_s, vmlaldavaq_u])
+;; [vmlaldavaq_s, vmlaldavaq_u]
+;; [vmlaldavaxq_s]
+;; [vmlsldavaq_s]
+;; [vmlsldavaxq_s]
 ;;
-(define_insn "mve_vmlaldavaq_<supf><mode>"
-  [
-   (set (match_operand:DI 0 "s_register_operand" "=r")
-   (unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
-  (match_operand:MVE_5 2 "s_register_operand" "w")
-  (match_operand:MVE_5 3 "s_register_operand" "w")]
-VMLALDAVAQ))
-  ]
-  "TARGET_HAVE_MVE"
-  "vmlaldava.<supf>%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
-  [(set_attr "type" "mve_move")
-])
-
-;;
-;; [vmlaldavaxq_s])
-;;
-(define_insn "mve_vmlaldavaxq_s<mode>"
+(define_insn "@mve_<mve_insn>q_<supf><mode>"
   [
(set (match_operand:DI 0 "s_register_operand" "=r")
(unspec:DI [(match_operand:DI 1 "s_register_operand" "0")
   (match_operand:MVE_5 2 "s_register_operand" "w")
   (match_operand:MVE_5 3 "s_register_operand" "w")]
-VMLALDAVAXQ_S))
+MVE_VMLxLDAVAxQ))
   ]
   "TARGET_HAVE_MVE"
-  "vmlaldavax.s%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
+  "<mve_insn>.<supf>%#<V_sz_elem>\t%Q0, %R0, %q2, %q3"
   [(set_attr "type" "mve_move")
 ])
 
@@ -2600,38 +2587,6 @@ (define_insn "@mve_<mve_insn>q_p_<supf><mode>"
   [(set_attr "type" "mve_move")
(set_attr "length""8")])
 
-;;
-;; [vmlsldavaq_s])
-;;
-(define_insn "mve_vmlsldavaq_s<mode>"
-  [
-   (set (

[PATCH 18/24] arm: [MVE intrinsics] rework vmlaldavaq vmlaldavaxq vmlsldavaq vmlsldavaxq

2023-05-11 Thread Christophe Lyon via Gcc-patches
Implement vmlaldavaq, vmlaldavaxq, vmlsldavaq, vmlsldavaxq using the
new MVE builtins framework.

2022-10-25  Christophe Lyon  

gcc/
* config/arm/arm-mve-builtins-base.cc (vmlaldavaq, vmlaldavaxq)
(vmlsldavaq, vmlsldavaxq): New.
* config/arm/arm-mve-builtins-base.def (vmlaldavaq, vmlaldavaxq)
(vmlsldavaq, vmlsldavaxq): New.
* config/arm/arm-mve-builtins-base.h (vmlaldavaq, vmlaldavaxq)
(vmlsldavaq, vmlsldavaxq): New.
* config/arm/arm_mve.h (vmlaldavaq): Remove.
(vmlaldavaxq): Remove.
(vmlsldavaq): Remove.
(vmlsldavaxq): Remove.
(vmlaldavaq_p): Remove.
(vmlaldavaxq_p): Remove.
(vmlsldavaq_p): Remove.
(vmlsldavaxq_p): Remove.
(vmlaldavaq_s16): Remove.
(vmlaldavaxq_s16): Remove.
(vmlsldavaq_s16): Remove.
(vmlsldavaxq_s16): Remove.
(vmlaldavaq_u16): Remove.
(vmlaldavaq_s32): Remove.
(vmlaldavaxq_s32): Remove.
(vmlsldavaq_s32): Remove.
(vmlsldavaxq_s32): Remove.
(vmlaldavaq_u32): Remove.
(vmlaldavaq_p_s32): Remove.
(vmlaldavaq_p_s16): Remove.
(vmlaldavaq_p_u32): Remove.
(vmlaldavaq_p_u16): Remove.
(vmlaldavaxq_p_s32): Remove.
(vmlaldavaxq_p_s16): Remove.
(vmlsldavaq_p_s32): Remove.
(vmlsldavaq_p_s16): Remove.
(vmlsldavaxq_p_s32): Remove.
(vmlsldavaxq_p_s16): Remove.
(__arm_vmlaldavaq_s16): Remove.
(__arm_vmlaldavaxq_s16): Remove.
(__arm_vmlsldavaq_s16): Remove.
(__arm_vmlsldavaxq_s16): Remove.
(__arm_vmlaldavaq_u16): Remove.
(__arm_vmlaldavaq_s32): Remove.
(__arm_vmlaldavaxq_s32): Remove.
(__arm_vmlsldavaq_s32): Remove.
(__arm_vmlsldavaxq_s32): Remove.
(__arm_vmlaldavaq_u32): Remove.
(__arm_vmlaldavaq_p_s32): Remove.
(__arm_vmlaldavaq_p_s16): Remove.
(__arm_vmlaldavaq_p_u32): Remove.
(__arm_vmlaldavaq_p_u16): Remove.
(__arm_vmlaldavaxq_p_s32): Remove.
(__arm_vmlaldavaxq_p_s16): Remove.
(__arm_vmlsldavaq_p_s32): Remove.
(__arm_vmlsldavaq_p_s16): Remove.
(__arm_vmlsldavaxq_p_s32): Remove.
(__arm_vmlsldavaxq_p_s16): Remove.
(__arm_vmlaldavaq): Remove.
(__arm_vmlaldavaxq): Remove.
(__arm_vmlsldavaq): Remove.
(__arm_vmlsldavaxq): Remove.
(__arm_vmlaldavaq_p): Remove.
(__arm_vmlaldavaxq_p): Remove.
(__arm_vmlsldavaq_p): Remove.
(__arm_vmlsldavaxq_p): Remove.
---
 gcc/config/arm/arm-mve-builtins-base.cc  |   4 +
 gcc/config/arm/arm-mve-builtins-base.def |   4 +
 gcc/config/arm/arm-mve-builtins-base.h   |   4 +
 gcc/config/arm/arm_mve.h | 368 ---
 4 files changed, 12 insertions(+), 368 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc b/gcc/config/arm/arm-mve-builtins-base.cc
index 142ba9357a1..2b0c800013c 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -285,12 +285,16 @@ FUNCTION_PRED_P_S (vmladavaxq, VMLADAVAXQ)
 FUNCTION_PRED_P_S_U (vmladavaq, VMLADAVAQ)
 FUNCTION_PRED_P_S_U (vmladavq, VMLADAVQ)
 FUNCTION_PRED_P_S (vmladavxq, VMLADAVXQ)
+FUNCTION_PRED_P_S_U (vmlaldavaq, VMLALDAVAQ)
+FUNCTION_PRED_P_S (vmlaldavaxq, VMLALDAVAXQ)
 FUNCTION_PRED_P_S_U (vmlaldavq, VMLALDAVQ)
 FUNCTION_PRED_P_S (vmlaldavxq, VMLALDAVXQ)
 FUNCTION_PRED_P_S (vmlsdavaq, VMLSDAVAQ)
 FUNCTION_PRED_P_S (vmlsdavaxq, VMLSDAVAXQ)
 FUNCTION_PRED_P_S (vmlsdavq, VMLSDAVQ)
 FUNCTION_PRED_P_S (vmlsdavxq, VMLSDAVXQ)
+FUNCTION_PRED_P_S (vmlsldavaq, VMLSLDAVAQ)
+FUNCTION_PRED_P_S (vmlsldavaxq, VMLSLDAVAXQ)
 FUNCTION_PRED_P_S (vmlsldavq, VMLSLDAVQ)
 FUNCTION_PRED_P_S (vmlsldavxq, VMLSLDAVXQ)
 FUNCTION_WITHOUT_N_NO_F (vmovlbq, VMOVLBQ)
diff --git a/gcc/config/arm/arm-mve-builtins-base.def b/gcc/config/arm/arm-mve-builtins-base.def
index 1dd3ad3489b..d61badb99d9 100644
--- a/gcc/config/arm/arm-mve-builtins-base.def
+++ b/gcc/config/arm/arm-mve-builtins-base.def
@@ -54,12 +54,16 @@ DEF_MVE_FUNCTION (vmladavaq, binary_acca_int32, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vmladavaxq, binary_acca_int32, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vmladavq, binary_acc_int32, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vmladavxq, binary_acc_int32, all_signed, p_or_none)
+DEF_MVE_FUNCTION (vmlaldavaq, binary_acca_int64, integer_16_32, p_or_none)
+DEF_MVE_FUNCTION (vmlaldavaxq, binary_acca_int64, signed_16_32, p_or_none)
 DEF_MVE_FUNCTION (vmlaldavq, binary_acc_int64, integer_16_32, p_or_none)
 DEF_MVE_FUNCTION (vmlaldavxq, binary_acc_int64, signed_16_32, p_or_none)
 DEF_MVE_FUNCTION (vmlsdavaq, binary_acca_int32, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vmlsdavaxq, binary_acca_int32, all_signed, p_or_none)
 DEF_MVE_FUNCTION (vmlsdavq, binary_acc_int32, all_integer, p_or_none)
 DEF_MVE_FUNCTION (vmlsdavxq, binary_acc_int32, all_signed, p_or_none)
+DEF_MVE_FUNCTION 
