Re: [PATCH] Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

2023-12-03 Thread Uros Bizjak
On Mon, Dec 4, 2023 at 8:11 AM Hongtao Liu  wrote:
>
> On Fri, Dec 1, 2023 at 10:26 PM Richard Biener
>  wrote:
> >
> > On Fri, Dec 1, 2023 at 3:39 AM liuhongt  wrote:
> > >
> > > > Hmm, I would suggest you put reg_needed into the class and accumulate
> > > > over all vec_construct, with your patch you pessimize a single v32qi
> > > > over two separate v16qi for example.  Also currently the whole block is
> > > > gated with INTEGRAL_TYPE_P but register pressure would be also
> > > > a concern for floating point vectors.  finish_cost would then apply an
> > > > adjustment.
> > >
> > > Changed.
> > >
> > > > 'target_avail_regs' is for GENERAL_REGS, does that include APX regs?
> > > > I don't see anything similar for FP regs, but I guess the target should
> > > > know or maybe there's a #regs in regclass query already.
> > > Haven't seen any, so I use the setting below.
> > >
> > > unsigned target_avail_sse = TARGET_64BIT ? (TARGET_AVX512F ? 32 : 16) : 8;
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > No big impact on SPEC2017.
> > > Observed one big improvement on another benchmark by avoiding
> > > vectorization with vec_construct v32qi, which caused lots of spills.
> > >
> > > Ok for trunk?
> >
> > LGTM, let's see what x86 maintainers think.
> +Honza and Uros.
> Any comments?

I have no comment on vector stuff, I think you are the most
experienced developer in this area.

Uros.

> >
> > Richard.
> >
> > > For vec_construct, the components must be live at the same time if
> > > they're not loaded from memory; when the number of those components
> > > exceeds the available registers, spills happen. Try to account for
> > > that with a rough estimation.
> > > ??? Ideally, we should have an overall estimation of register pressure
> > > if we know the live range of all variables.
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> > > Count sse_reg/gpr_regs for components not loaded from memory.
> > > (ix86_vector_costs::ix86_vector_costs): New constructor.
> > > (ix86_vector_costs::m_num_gpr_needed[3]): New private member.
> > > (ix86_vector_costs::m_num_sse_needed[3]): Ditto.
> > > (ix86_vector_costs::finish_cost): Estimate overall register
> > > pressure cost.
> > > (ix86_vector_costs::ix86_vect_estimate_reg_pressure): New
> > > function.
> > > ---
> > >  gcc/config/i386/i386.cc | 54 ++---
> > >  1 file changed, 50 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 9390f525b99..dcaea6c2096 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -24562,15 +24562,34 @@ ix86_noce_conversion_profitable_p (rtx_insn 
> > > *seq, struct noce_if_info *if_info)
> > >  /* x86-specific vector costs.  */
> > >  class ix86_vector_costs : public vector_costs
> > >  {
> > > -  using vector_costs::vector_costs;
> > > +public:
> > > +  ix86_vector_costs (vec_info *, bool);
> > >
> > >unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
> > >   stmt_vec_info stmt_info, slp_tree node,
> > >   tree vectype, int misalign,
> > >   vect_cost_model_location where) override;
> > >void finish_cost (const vector_costs *) override;
> > > +
> > > +private:
> > > +
> > > +  /* Estimate register pressure of the vectorized code.  */
> > > +  void ix86_vect_estimate_reg_pressure ();
> > > +  /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used 
> > > for
> > > + estimation of register pressure.
> > > + ??? Currently it's only used by vec_construct/scalar_to_vec
> > > + where we know it's not loaded from memory.  */
> > > +  unsigned m_num_gpr_needed[3];
> > > +  unsigned m_num_sse_needed[3];
> > >  };
> > >
> > > +ix86_vector_costs::ix86_vector_costs (vec_info* vinfo, bool 
> > > costing_for_scalar)
> > > +  : vector_costs (vinfo, costing_for_scalar),
> > > +m_num_gpr_needed (),
> > > +m_num_sse_needed ()
> > > +{
> > > +}
> > > +
> > >  /* Implement targetm.vectorize.create_costs.  */
> > >
> > >  static vector_costs *
> > > @@ -24748,8 +24767,7 @@ ix86_vector_costs::add_stmt_cost (int count, 
> > > vect_cost_for_stmt kind,
> > >  }
> > >else if ((kind == vec_construct || kind == scalar_to_vec)
> > >&& node
> > > -  && SLP_TREE_DEF_TYPE (node) == vect_external_def
> > > -  && INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
> > > +  && SLP_TREE_DEF_TYPE (node) == vect_external_def)
> > >  {
> > >stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, 
> > > misalign);
> > >unsigned i;
> > > @@ -24785,7 +24803,15 @@ ix86_vector_costs::add_stmt_cost (int count, 
> > > vect_cost_for_stmt kind,
> > >   && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
> > >

Re: [PATCH] i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837]

2023-12-03 Thread Uros Bizjak
On Mon, Dec 4, 2023 at 8:41 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs with RTL checking, because the code tests whether
> XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking that SET_SRC (set)
> is actually an UNSPEC, so any time we see any other insn with PARALLEL
> and a SET in it which is not an UNSPEC, we ICE during RTL checking or
> read some other union member as if it were an rt_int.
> The rest is just small cleanup.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2023-12-04  Jakub Jelinek  
>
> PR target/112837
> * config/i386/i386.cc (ix86_elim_entry_set_got): Before checking
> for UNSPEC_SET_GOT check that SET_SRC is UNSPEC.  Use SET_SRC and
> SET_DEST macros instead of XEXP, rename vec variable to set.
>
> * gcc.dg/pr112837.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386.cc.jj  2023-12-03 17:44:51.837530235 +0100
> +++ gcc/config/i386/i386.cc 2023-12-03 23:20:31.117005983 +0100
> @@ -8607,10 +8607,11 @@ ix86_elim_entry_set_got (rtx reg)
>rtx pat = PATTERN (c_insn);
>if (GET_CODE (pat) == PARALLEL)
> {
> - rtx vec = XVECEXP (pat, 0, 0);
> - if (GET_CODE (vec) == SET
> - && XINT (XEXP (vec, 1), 1) == UNSPEC_SET_GOT
> - && REGNO (XEXP (vec, 0)) == REGNO (reg))
> + rtx set = XVECEXP (pat, 0, 0);
> + if (GET_CODE (set) == SET
> + && GET_CODE (SET_SRC (set)) == UNSPEC
> + && XINT (SET_SRC (set), 1) == UNSPEC_SET_GOT
> + && REGNO (SET_DEST (set)) == REGNO (reg))
> delete_insn (c_insn);
> }
>  }
> --- gcc/testsuite/gcc.dg/pr112837.c.jj  2023-12-03 23:25:04.803208457 +0100
> +++ gcc/testsuite/gcc.dg/pr112837.c 2023-12-03 23:25:41.194703497 +0100
> @@ -0,0 +1,11 @@
> +/* PR target/112837 */
> +/* { dg-do compile } */
> +/* { dg-options "-fcompare-elim -fprofile" } */
> +/* { dg-additional-options "-fpie" { target pie } } */
> +/* { dg-require-profiling "-fprofile" } */
> +
> +void
> +foo (int i)
> +{
> +  foo (i);
> +}
>
> Jakub
>


Re: [PATCH] i386: Fix up signbit2 expander [PR112816]

2023-12-03 Thread Uros Bizjak
On Mon, Dec 4, 2023 at 8:35 AM Jakub Jelinek  wrote:
>
> Hi!
>
> The following testcase ICEs, because the signbit2 expander uses an
> explicit SUBREG in the pattern around match_operand with register_operand
> predicate.  If we are unlucky enough that expansion tries to expand it
> with some SUBREG as operands[1], we have two nested SUBREGs in the IL,
> which is not valid and causes ICE later.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
> for trunk?
>
> 2023-12-04  Jakub Jelinek  
>
> PR target/112816
> * config/i386/sse.md (signbit2): Force operands[1] into a REG.
>
> * gcc.target/i386/sse2-pr112816.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/sse.md.jj   2023-12-01 08:10:42.311330174 +0100
> +++ gcc/config/i386/sse.md  2023-12-02 09:53:45.489970487 +0100
> @@ -5116,7 +5116,10 @@ (define_expand "signbit2"
> (match_operand:VF1_AVX2 1 "register_operand") 0)
>   (match_dup 2)))]
>"TARGET_SSE2"
> -  "operands[2] = GEN_INT (GET_MODE_UNIT_BITSIZE (mode)-1);")
> +{
> +  operands[1] = force_reg (mode, operands[1]);
> +  operands[2] = GEN_INT (GET_MODE_UNIT_BITSIZE (mode)-1);
> +})
>
>  ;; Also define scalar versions.  These are used for abs, neg, and
>  ;; conditional move.  Using subregs into vector modes causes register
> --- gcc/testsuite/gcc.target/i386/sse2-pr112816.c.jj2023-12-02 
> 10:00:23.623394880 +0100
> +++ gcc/testsuite/gcc.target/i386/sse2-pr112816.c   2023-12-02 
> 09:59:47.024909235 +0100
> @@ -0,0 +1,16 @@
> +/* PR target/112816 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2" } */
> +
> +#define N 4
> +struct S { float x[N]; };
> +struct T { int x[N]; };
> +
> +struct T
> +foo (struct S x)
> +{
> +  struct T res;
> +  for (int i = 0; i < N; ++i)
> +res.x[i] = __builtin_signbit (x.x[i]) ? -1 : 0;
> +  return res;
> +}
>
> Jakub
>


[PATCH] RISC-V: Check if zcd conflicts with zcmt and zcmp

2023-12-03 Thread Kito Cheng
gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::check_conflict_ext): Check zcd conflicts
with zcmt and zcmp.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-29.c: New test.
* gcc.target/riscv/arch-30.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc  | 8 
 gcc/testsuite/gcc.target/riscv/arch-29.c | 7 +++
 gcc/testsuite/gcc.target/riscv/arch-30.c | 7 +++
 3 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-29.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/arch-30.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index aecb342b164..bfb41827f7a 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1230,6 +1230,14 @@ riscv_subset_list::check_conflict_ext ()
   /* 'H' hypervisor extension requires base ISA with 32 registers.  */
   if (lookup ("e") && lookup ("h"))
 error_at (m_loc, "%<-march=%s%>: h extension requires i extension", 
m_arch);
+
+  if (lookup ("zcd"))
+{
+  if (lookup ("zcmt"))
+   error_at (m_loc, "%<-march=%s%>: zcd conflicts with zcmt", m_arch);
+  if (lookup ("zcmp"))
+   error_at (m_loc, "%<-march=%s%>: zcd conflicts with zcmp", m_arch);
+}
 }
 
 /* Parsing function for multi-letter extensions.
diff --git a/gcc/testsuite/gcc.target/riscv/arch-29.c 
b/gcc/testsuite/gcc.target/riscv/arch-29.c
new file mode 100644
index 000..f8281275878
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-29.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64id_zcd_zcmt -mabi=lp64d" } */
+int foo()
+{
+}
+
+/* { dg-error "zcd conflicts with zcmt" "" { target *-*-* } 0 } */
diff --git a/gcc/testsuite/gcc.target/riscv/arch-30.c 
b/gcc/testsuite/gcc.target/riscv/arch-30.c
new file mode 100644
index 000..3e67ea0bb06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/arch-30.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64id_zcd_zcmp -mabi=lp64d" } */
+int foo()
+{
+}
+
+/* { dg-error "zcd conflicts with zcmp" "" { target *-*-* } 0 } */
-- 
2.40.1



Re: [PATCH 2/6] c: Turn int-conversion warnings into permerrors

2023-12-03 Thread Kito Cheng
I have sent the RISC-V newlib patches: one for libgloss and another for libm.
The libm issue arises because we don't have proper long double support;
newlib gained that support a few months ago and the porting effort is
minor, so I just ported it to fix the issue :)

https://sourceware.org/pipermail/newlib/2023/020725.html
https://sourceware.org/pipermail/newlib/2023/020726.html


[PATCH] i386: Fix rtl checking ICE in ix86_elim_entry_set_got [PR112837]

2023-12-03 Thread Jakub Jelinek
Hi!

The following testcase ICEs with RTL checking, because the code tests whether
XINT (SET_SRC (set), 1) is UNSPEC_SET_GOT without checking that SET_SRC (set)
is actually an UNSPEC, so any time we see any other insn with PARALLEL
and a SET in it which is not an UNSPEC, we ICE during RTL checking or
read some other union member as if it were an rt_int.
The rest is just small cleanup.
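
For readers unfamiliar with the failure mode, here is a minimal
stand-alone sketch of the guard pattern, not GCC code. The struct node
type, the NODE_* codes and the UNSPEC_SET_GOT_MODEL value are
hypothetical stand-ins for rtx, GET_CODE and UNSPEC_SET_GOT; the only
point is that the discriminating code is checked before the union
member is read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a tagged node; not the real rtx layout.  */
enum node_code { NODE_REG, NODE_UNSPEC };
enum { UNSPEC_SET_GOT_MODEL = 7 };  /* arbitrary illustrative value */

struct node
{
  enum node_code code;
  union
  {
    unsigned regno;   /* valid only when code == NODE_REG */
    int unspec_kind;  /* valid only when code == NODE_UNSPEC */
  } u;
};

/* Check the code first; only then read the UNSPEC-specific member.
   Without the first test, a NODE_REG whose regno happens to be 7
   would be misread as a SET_GOT unspec.  */
static bool
is_set_got (const struct node *src)
{
  return src->code == NODE_UNSPEC
	 && src->u.unspec_kind == UNSPEC_SET_GOT_MODEL;
}
```

In this model a register node with regno 7 is rejected by the code
test, which is the analogue of the added GET_CODE (SET_SRC (set)) ==
UNSPEC condition.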

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-04  Jakub Jelinek  

PR target/112837
* config/i386/i386.cc (ix86_elim_entry_set_got): Before checking
for UNSPEC_SET_GOT check that SET_SRC is UNSPEC.  Use SET_SRC and
SET_DEST macros instead of XEXP, rename vec variable to set.

* gcc.dg/pr112837.c: New test.

--- gcc/config/i386/i386.cc.jj  2023-12-03 17:44:51.837530235 +0100
+++ gcc/config/i386/i386.cc 2023-12-03 23:20:31.117005983 +0100
@@ -8607,10 +8607,11 @@ ix86_elim_entry_set_got (rtx reg)
   rtx pat = PATTERN (c_insn);
   if (GET_CODE (pat) == PARALLEL)
{
- rtx vec = XVECEXP (pat, 0, 0);
- if (GET_CODE (vec) == SET
- && XINT (XEXP (vec, 1), 1) == UNSPEC_SET_GOT
- && REGNO (XEXP (vec, 0)) == REGNO (reg))
+ rtx set = XVECEXP (pat, 0, 0);
+ if (GET_CODE (set) == SET
+ && GET_CODE (SET_SRC (set)) == UNSPEC
+ && XINT (SET_SRC (set), 1) == UNSPEC_SET_GOT
+ && REGNO (SET_DEST (set)) == REGNO (reg))
delete_insn (c_insn);
}
 }
--- gcc/testsuite/gcc.dg/pr112837.c.jj  2023-12-03 23:25:04.803208457 +0100
+++ gcc/testsuite/gcc.dg/pr112837.c 2023-12-03 23:25:41.194703497 +0100
@@ -0,0 +1,11 @@
+/* PR target/112837 */
+/* { dg-do compile } */
+/* { dg-options "-fcompare-elim -fprofile" } */
+/* { dg-additional-options "-fpie" { target pie } } */
+/* { dg-require-profiling "-fprofile" } */
+
+void
+foo (int i)
+{
+  foo (i);
+}

Jakub



[PATCH] extend.texi: Mark builtin arguments with @var{...}

2023-12-03 Thread Jakub Jelinek
On Fri, Dec 01, 2023 at 10:43:57AM -0700, Sandra Loosemore wrote:
> On 12/1/23 10:33, Jakub Jelinek wrote:
> > Shall we tweak that somehow?  If the argument names are unimportant, perhaps
> > it is fine to leave that out, but shouldn't we always use @var{...} around
> > the parameter names when specified?
> 
> Yup.  The Texinfo manual says:  "When using @deftypefn command and
> variations, you should mark parameter names with @var to distinguish these
> from data type names, keywords, and other parts of the literal syntax of the
> programming language."

Here is a patch which does that (but not adding types to where they were
missing, that will be harder to search for).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-12-04  Jakub Jelinek  

* doc/extend.texi (__sync_fetch_and_add, __sync_fetch_and_sub,
__sync_fetch_and_or, __sync_fetch_and_and, __sync_fetch_and_xor,
__sync_fetch_and_nand, __sync_add_and_fetch, __sync_sub_and_fetch,
__sync_or_and_fetch, __sync_and_and_fetch, __sync_xor_and_fetch,
__sync_nand_and_fetch, __sync_bool_compare_and_swap,
__sync_val_compare_and_swap, __sync_lock_test_and_set,
__sync_lock_release, __atomic_load_n, __atomic_load, __atomic_store_n,
__atomic_store, __atomic_exchange_n, __atomic_exchange,
__atomic_compare_exchange_n, __atomic_compare_exchange,
__atomic_add_fetch, __atomic_sub_fetch, __atomic_and_fetch,
__atomic_xor_fetch, __atomic_or_fetch, __atomic_nand_fetch,
__atomic_fetch_add, __atomic_fetch_sub, __atomic_fetch_and,
__atomic_fetch_xor, __atomic_fetch_or, __atomic_fetch_nand,
__atomic_test_and_set, __atomic_clear, __atomic_thread_fence,
__atomic_signal_fence, __atomic_always_lock_free,
__atomic_is_lock_free, __builtin_add_overflow,
__builtin_sadd_overflow, __builtin_saddl_overflow,
__builtin_saddll_overflow, __builtin_uadd_overflow,
__builtin_uaddl_overflow, __builtin_uaddll_overflow,
__builtin_sub_overflow, __builtin_ssub_overflow,
__builtin_ssubl_overflow, __builtin_ssubll_overflow,
__builtin_usub_overflow, __builtin_usubl_overflow,
__builtin_usubll_overflow, __builtin_mul_overflow,
__builtin_smul_overflow, __builtin_smull_overflow,
__builtin_smulll_overflow, __builtin_umul_overflow,
__builtin_umull_overflow, __builtin_umulll_overflow,
__builtin_add_overflow_p, __builtin_sub_overflow_p,
__builtin_mul_overflow_p, __builtin_addc, __builtin_addcl,
__builtin_addcll, __builtin_subc, __builtin_subcl, __builtin_subcll,
__builtin_alloca, __builtin_alloca_with_align,
__builtin_alloca_with_align_and_max, __builtin_speculation_safe_value,
__builtin_nan, __builtin_nand32, __builtin_nand64, __builtin_nand128,
__builtin_nanf, __builtin_nanl, __builtin_nanf@var{n},
__builtin_nanf@var{n}x, __builtin_nans, __builtin_nansd32,
__builtin_nansd64, __builtin_nansd128, __builtin_nansf,
__builtin_nansl, __builtin_nansf@var{n}, __builtin_nansf@var{n}x,
__builtin_ffs, __builtin_clz, __builtin_ctz, __builtin_clrsb,
__builtin_popcount, __builtin_parity, __builtin_bswap16,
__builtin_bswap32, __builtin_bswap64, __builtin_bswap128,
__builtin_extend_pointer, __builtin_goacc_parlevel_id,
__builtin_goacc_parlevel_size, vec_clrl, vec_clrr, vec_mulh, vec_mul,
vec_div, vec_dive, vec_mod, __builtin_rx_mvtc): Use @var{...} around
parameter names.
(vec_rl, vec_sl, vec_sr, vec_sra): Likewise.  Use @var{...} also
around A, B and R in description.

--- gcc/doc/extend.texi.jj  2023-12-01 16:57:27.577890670 +0100
+++ gcc/doc/extend.texi 2023-12-02 10:35:16.509472645 +0100
@@ -12733,12 +12733,12 @@ variables to be protected.  The list is
 empty.  GCC interprets an empty list as meaning that all globally
 accessible variables should be protected.
 
-@defbuiltin{@var{type} __sync_fetch_and_add (@var{type} *ptr, @var{type} 
value, ...)}
-@defbuiltinx{@var{type} __sync_fetch_and_sub (@var{type} *ptr, @var{type} 
value, ...)}
-@defbuiltinx{@var{type} __sync_fetch_and_or (@var{type} *ptr, @var{type} 
value, ...)}
-@defbuiltinx{@var{type} __sync_fetch_and_and (@var{type} *ptr, @var{type} 
value, ...)}
-@defbuiltinx{@var{type} __sync_fetch_and_xor (@var{type} *ptr, @var{type} 
value, ...)}
-@defbuiltinx{@var{type} __sync_fetch_and_nand (@var{type} *ptr, @var{type} 
value, ...)}
+@defbuiltin{@var{type} __sync_fetch_and_add (@var{type} *@var{ptr}, @var{type} 
@var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_sub (@var{type} *@var{ptr}, 
@var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_or (@var{type} *@var{ptr}, @var{type} 
@var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_and (@var{type} *@var{ptr}, 
@var{type} @var{value}, ...)}
+@defbuiltinx{@var{type} __sync_fetch_and_xor (@var{type} 

[PATCH] i386: Fix up signbit2 expander [PR112816]

2023-12-03 Thread Jakub Jelinek
Hi!

The following testcase ICEs, because the signbit2 expander uses an
explicit SUBREG in the pattern around match_operand with register_operand
predicate.  If we are unlucky enough that expansion tries to expand it
with some SUBREG as operands[1], we have two nested SUBREGs in the IL,
which is not valid and causes ICE later.
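
As a scalar illustration of what the expander computes, here is a
minimal sketch, assuming a 32-bit IEEE float and a logical right shift
(the shift code is not visible in the hunk below, so that part is an
assumption). The helper name signbit_by_shift is hypothetical; the
memcpy plays the role of the SUBREG reinterpretation and 31 is
GET_MODE_UNIT_BITSIZE - 1 for a float element.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

_Static_assert (sizeof (float) == sizeof (uint32_t),
		"sketch assumes a 32-bit float");

/* Reinterpret the float's bits as an integer and shift the sign bit
   down to bit 0.  Returns 1 for negative values (including -0.0f)
   and 0 otherwise.  */
static uint32_t
signbit_by_shift (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits >> 31;
}
```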

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok
for trunk?

2023-12-04  Jakub Jelinek  

PR target/112816
* config/i386/sse.md (signbit2): Force operands[1] into a REG.

* gcc.target/i386/sse2-pr112816.c: New test.

--- gcc/config/i386/sse.md.jj   2023-12-01 08:10:42.311330174 +0100
+++ gcc/config/i386/sse.md  2023-12-02 09:53:45.489970487 +0100
@@ -5116,7 +5116,10 @@ (define_expand "signbit2"
(match_operand:VF1_AVX2 1 "register_operand") 0)
  (match_dup 2)))]
   "TARGET_SSE2"
-  "operands[2] = GEN_INT (GET_MODE_UNIT_BITSIZE (mode)-1);")
+{
+  operands[1] = force_reg (mode, operands[1]);
+  operands[2] = GEN_INT (GET_MODE_UNIT_BITSIZE (mode)-1);
+})
 
 ;; Also define scalar versions.  These are used for abs, neg, and
 ;; conditional move.  Using subregs into vector modes causes register
--- gcc/testsuite/gcc.target/i386/sse2-pr112816.c.jj2023-12-02 
10:00:23.623394880 +0100
+++ gcc/testsuite/gcc.target/i386/sse2-pr112816.c   2023-12-02 
09:59:47.024909235 +0100
@@ -0,0 +1,16 @@
+/* PR target/112816 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+#define N 4
+struct S { float x[N]; };
+struct T { int x[N]; };
+
+struct T
+foo (struct S x)
+{
+  struct T res;
+  for (int i = 0; i < N; ++i)
+res.x[i] = __builtin_signbit (x.x[i]) ? -1 : 0;
+  return res;
+}

Jakub



Re: Re: [PATCH v2] RISC-V: Update crypto vector ISA info with latest spec

2023-12-03 Thread Fei Gao
Committed! Thanks Kito.

BR, 
Fei

On 2023-12-04 15:01  Kito Cheng  wrote:
>
>LGTM again :)
>
>On Mon, Dec 4, 2023 at 2:44 PM Feng Wang  wrote:
>>
>> Rebased and resent this patch because it was not added to patchwork
>> before. Kito had already reviewed it. Please refer to
>> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327499.html
>>
>> This patch adds the Zvkb subset of the crypto vector extension. The
>> corresponding test cases have also been modified.
>>
>> gcc/ChangeLog:
>>
>> * common/config/riscv/riscv-common.cc: Add zvkb ISA info.
>> * config/riscv/riscv.opt: Add Mask(ZVKB)
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/zvkn-1.c: Replace zvbb with zvkb.
>> * gcc.target/riscv/zvkn.c:   Ditto.
>> * gcc.target/riscv/zvknc-1.c:Ditto.
>> * gcc.target/riscv/zvknc-2.c:Ditto.
>> * gcc.target/riscv/zvknc.c:  Ditto.
>> * gcc.target/riscv/zvkng-1.c:Ditto.
>> * gcc.target/riscv/zvkng-2.c:Ditto.
>> * gcc.target/riscv/zvkng.c:  Ditto.
>> * gcc.target/riscv/zvks-1.c: Ditto.
>> * gcc.target/riscv/zvks.c:   Ditto.
>> * gcc.target/riscv/zvksc-1.c:Ditto.
>> * gcc.target/riscv/zvksc-2.c:Ditto.
>> * gcc.target/riscv/zvksc.c:  Ditto.
>> * gcc.target/riscv/zvksg-1.c:Ditto.
>> * gcc.target/riscv/zvksg-2.c:Ditto.
>> * gcc.target/riscv/zvksg.c:  Ditto.
>> ---
>>  gcc/common/config/riscv/riscv-common.cc  | 6 --
>>  gcc/config/riscv/riscv.opt   | 2 ++
>>  gcc/testsuite/gcc.target/riscv/zvkn-1.c  | 8 
>>  gcc/testsuite/gcc.target/riscv/zvkn.c    | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvknc-1.c | 8 
>>  gcc/testsuite/gcc.target/riscv/zvknc-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvknc.c   | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvkng-1.c | 8 
>>  gcc/testsuite/gcc.target/riscv/zvkng-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvkng.c   | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvks-1.c  | 8 
>>  gcc/testsuite/gcc.target/riscv/zvks.c    | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvksc-1.c | 8 
>>  gcc/testsuite/gcc.target/riscv/zvksc-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvksc.c   | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvksg-1.c | 8 
>>  gcc/testsuite/gcc.target/riscv/zvksg-2.c | 4 ++--
>>  gcc/testsuite/gcc.target/riscv/zvksg.c   | 4 ++--
>>  18 files changed, 50 insertions(+), 46 deletions(-)
>>
>> diff --git a/gcc/common/config/riscv/riscv-common.cc 
>> b/gcc/common/config/riscv/riscv-common.cc
>> index ded85b4c578..6c210412515 100644
>> --- a/gcc/common/config/riscv/riscv-common.cc
>> +++ b/gcc/common/config/riscv/riscv-common.cc
>> @@ -106,7 +106,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>>
>>    {"zvkn", "zvkned"},
>>    {"zvkn", "zvknhb"},
>> -  {"zvkn", "zvbb"},
>> +  {"zvkn", "zvkb"},
>>    {"zvkn", "zvkt"},
>>    {"zvknc", "zvkn"},
>>    {"zvknc", "zvbc"},
>> @@ -114,7 +114,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>>    {"zvkng", "zvkg"},
>>    {"zvks", "zvksed"},
>>    {"zvks", "zvksh"},
>> -  {"zvks", "zvbb"},
>> +  {"zvks", "zvkb"},
>>    {"zvks", "zvkt"},
>>    {"zvksc", "zvks"},
>>    {"zvksc", "zvbc"},
>> @@ -253,6 +253,7 @@ static const struct riscv_ext_version 
>> riscv_ext_version_table[] =
>>
>>    {"zvbb", ISA_SPEC_CLASS_NONE, 1, 0},
>>    {"zvbc", ISA_SPEC_CLASS_NONE, 1, 0},
>> +  {"zvkb", ISA_SPEC_CLASS_NONE, 1, 0},
>>    {"zvkg", ISA_SPEC_CLASS_NONE, 1, 0},
>>    {"zvkned", ISA_SPEC_CLASS_NONE, 1, 0},
>>    {"zvknha", ISA_SPEC_CLASS_NONE, 1, 0},
>> @@ -1624,6 +1625,7 @@ static const riscv_ext_flag_table_t 
>> riscv_ext_flag_table[] =
>>
>>    {"zvbb", _options::x_riscv_zvb_subext, MASK_ZVBB},
>>    {"zvbc", _options::x_riscv_zvb_subext, MASK_ZVBC},
>> +  {"zvkb", _options::x_riscv_zvb_subext, MASK_ZVKB},
>>    {"zvkg", _options::x_riscv_zvk_subext, MASK_ZVKG},
>>    {"zvkned",   _options::x_riscv_zvk_subext, MASK_ZVKNED},
>>    {"zvknha",   _options::x_riscv_zvk_subext, MASK_ZVKNHA},
>> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
>> index 0c6517bdc8b..78186fff6c5 100644
>> --- a/gcc/config/riscv/riscv.opt
>> +++ b/gcc/config/riscv/riscv.opt
>> @@ -319,6 +319,8 @@ Mask(ZVBB) Var(riscv_zvb_subext)
>>
>>  Mask(ZVBC) Var(riscv_zvb_subext)
>>
>> +Mask(ZVKB) Var(riscv_zvb_subext)
>> +
>>  TargetVariable
>>  int riscv_zvk_subext
>>
>> diff --git a/gcc/testsuite/gcc.target/riscv/zvkn-1.c 
>> b/gcc/testsuite/gcc.target/riscv/zvkn-1.c
>> index 23b255b4779..069a8f66c92 100644
>> --- a/gcc/testsuite/gcc.target/riscv/zvkn-1.c
>> +++ b/gcc/testsuite/gcc.target/riscv/zvkn-1.c
>> @@ -1,6 +1,6 @@
>>  /* { dg-do compile } */
>> -/* { dg-options "-march=rv64gc_zvkned_zvknhb_zvbb_zvkt" { target { rv64 } } 
>> } */
>> -/* { dg-options "-march=rv32gc_zvkned_zvknhb_zvbb_zvkt" { target { rv32 } } 
>> } */
>> +/* { dg-options "-march=rv64gc_zvkned_zvknhb_zvkb_zvkt" { target { rv64 } } 
>> 

Re: [PATCH] Take register pressure into account for vec_construct/scalar_to_vec when the components are not loaded from memory.

2023-12-03 Thread Hongtao Liu
On Fri, Dec 1, 2023 at 10:26 PM Richard Biener
 wrote:
>
> On Fri, Dec 1, 2023 at 3:39 AM liuhongt  wrote:
> >
> > > Hmm, I would suggest you put reg_needed into the class and accumulate
> > > over all vec_construct, with your patch you pessimize a single v32qi
> > > over two separate v16qi for example.  Also currently the whole block is
> > > gated with INTEGRAL_TYPE_P but register pressure would be also
> > > a concern for floating point vectors.  finish_cost would then apply an
> > > adjustment.
> >
> > Changed.
> >
> > > 'target_avail_regs' is for GENERAL_REGS, does that include APX regs?
> > > I don't see anything similar for FP regs, but I guess the target should
> > > know or maybe there's a #regs in regclass query already.
> > Haven't seen any, so I use the setting below.
> >
> > unsigned target_avail_sse = TARGET_64BIT ? (TARGET_AVX512F ? 32 : 16) : 8;
> >
> > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > No big impact on SPEC2017.
> > Observed one big improvement on another benchmark by avoiding
> > vectorization with vec_construct v32qi, which caused lots of spills.
> >
> > Ok for trunk?
>
> LGTM, let's see what x86 maintainers think.
+Honza and Uros.
Any comments?
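
For readers following the thread, a minimal sketch of the accounting
discussed in the quoted text. This is not the code from the patch:
avail_sse_regs mirrors the quoted target_avail_sse expression, while
spill_penalty and its cost_per_spill parameter are hypothetical, and
the linear penalty is an assumption made only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the quoted expression:
   TARGET_64BIT ? (TARGET_AVX512F ? 32 : 16) : 8.  */
static unsigned
avail_sse_regs (bool is_64bit, bool has_avx512f)
{
  return is_64bit ? (has_avx512f ? 32u : 16u) : 8u;
}

/* Hypothetical penalty: every component that must be live at once
   beyond the available registers is charged one spill.  */
static unsigned
spill_penalty (unsigned needed, unsigned avail, unsigned cost_per_spill)
{
  return needed > avail ? (needed - avail) * cost_per_spill : 0u;
}
```

Under this model a v32qi vec_construct from 32 live scalars on a
16-register target is charged for 16 spills, while two separate v16qi
constructs are only charged if their accumulated count exceeds the
limit, which is the reason for accumulating in the class.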
>
> Richard.
>
> > For vec_construct, the components must be live at the same time if
> > they're not loaded from memory; when the number of those components
> > exceeds the available registers, spills happen. Try to account for
> > that with a rough estimation.
> > ??? Ideally, we should have an overall estimation of register pressure
> > if we know the live range of all variables.
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> > Count sse_reg/gpr_regs for components not loaded from memory.
> > (ix86_vector_costs::ix86_vector_costs): New constructor.
> > (ix86_vector_costs::m_num_gpr_needed[3]): New private member.
> > (ix86_vector_costs::m_num_sse_needed[3]): Ditto.
> > (ix86_vector_costs::finish_cost): Estimate overall register
> > pressure cost.
> > (ix86_vector_costs::ix86_vect_estimate_reg_pressure): New
> > function.
> > ---
> >  gcc/config/i386/i386.cc | 54 ++---
> >  1 file changed, 50 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 9390f525b99..dcaea6c2096 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -24562,15 +24562,34 @@ ix86_noce_conversion_profitable_p (rtx_insn *seq, 
> > struct noce_if_info *if_info)
> >  /* x86-specific vector costs.  */
> >  class ix86_vector_costs : public vector_costs
> >  {
> > -  using vector_costs::vector_costs;
> > +public:
> > +  ix86_vector_costs (vec_info *, bool);
> >
> >unsigned int add_stmt_cost (int count, vect_cost_for_stmt kind,
> >   stmt_vec_info stmt_info, slp_tree node,
> >   tree vectype, int misalign,
> >   vect_cost_model_location where) override;
> >void finish_cost (const vector_costs *) override;
> > +
> > +private:
> > +
> > +  /* Estimate register pressure of the vectorized code.  */
> > +  void ix86_vect_estimate_reg_pressure ();
> > +  /* Number of GENERAL_REGS/SSE_REGS used in the vectorizer, it's used for
> > + estimation of register pressure.
> > + ??? Currently it's only used by vec_construct/scalar_to_vec
> > + where we know it's not loaded from memory.  */
> > +  unsigned m_num_gpr_needed[3];
> > +  unsigned m_num_sse_needed[3];
> >  };
> >
> > +ix86_vector_costs::ix86_vector_costs (vec_info* vinfo, bool 
> > costing_for_scalar)
> > +  : vector_costs (vinfo, costing_for_scalar),
> > +m_num_gpr_needed (),
> > +m_num_sse_needed ()
> > +{
> > +}
> > +
> >  /* Implement targetm.vectorize.create_costs.  */
> >
> >  static vector_costs *
> > @@ -24748,8 +24767,7 @@ ix86_vector_costs::add_stmt_cost (int count, 
> > vect_cost_for_stmt kind,
> >  }
> >else if ((kind == vec_construct || kind == scalar_to_vec)
> >&& node
> > -  && SLP_TREE_DEF_TYPE (node) == vect_external_def
> > -  && INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
> > +  && SLP_TREE_DEF_TYPE (node) == vect_external_def)
> >  {
> >stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, 
> > misalign);
> >unsigned i;
> > @@ -24785,7 +24803,15 @@ ix86_vector_costs::add_stmt_cost (int count, 
> > vect_cost_for_stmt kind,
> >   && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
> >   || !VECTOR_TYPE_P (TREE_TYPE
> > (TREE_OPERAND (gimple_assign_rhs1 (def), 
> > 0))
> > -   stmt_cost += ix86_cost->sse_to_integer;
> > +   {
> > + if (fp)
> > +   m_num_sse_needed[where]++;
> > + else
> > +   {
> > + m_num_gpr_needed[where]++;
> > + 

[PATCH] Support udot_prodv*qi with emulation sdot_prodv*hi

2023-12-03 Thread liuhongt
Like r14-5990-gb4a7c1c8c59d19, but this patch handles udot_prod.

This works because (zero_extend) (unsigned char) -> int is equal
to (zero_extend) (unsigned char) -> short
+ (sign_extend) (short) -> int

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready to push to trunk.

It should be safe to emulate udot_prodv*qi with

 vec_unpacku_lo_v32qi
 vec_unpacku_lo_v32qi
 vec_unpacku_hi_v32qi
 vec_unpacku_hi_v32qi
 sdot_prodv16hi
 sdot_prodv16hi
 add3v8si
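
A scalar sketch of why the emulation is safe, assuming nothing beyond
standard C. The function names udot_ref and udot_emulated are
hypothetical. A zero-extended byte is in [0, 255], so it is
non-negative as a signed 16-bit value and the signed 16-bit dot
product gives the same result as the unsigned 8-bit one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reference: unsigned 8-bit dot product accumulated into 32 bits.  */
static int32_t
udot_ref (const uint8_t *a, const uint8_t *b, size_t n, int32_t acc)
{
  for (size_t i = 0; i < n; i++)
    acc += (int32_t) ((uint32_t) a[i] * (uint32_t) b[i]);
  return acc;
}

/* Emulation: zero-extend each byte to 16 bits (the unpacku step),
   then do a signed 16-bit multiply-accumulate (the sdot_prod step).  */
static int32_t
udot_emulated (const uint8_t *a, const uint8_t *b, size_t n, int32_t acc)
{
  for (size_t i = 0; i < n; i++)
    {
      int16_t wa = (int16_t) a[i];  /* value stays in [0, 255] */
      int16_t wb = (int16_t) b[i];
      acc += (int32_t) wa * (int32_t) wb;
    }
  return acc;
}
```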

gcc/ChangeLog:

* config/i386/sse.md (udot_prodv64qi): New expander.
(udot_prod): Emulate with VEC_UNPACKU_EXPR +
DOT_PROD (short, int).

gcc/testsuite/ChangeLog:

* gcc.target/i386/udotprodint8_emulate.c: New test.
---
 gcc/config/i386/sse.md| 82 ---
 .../gcc.target/i386/udotprodint8_emulate.c| 15 
 2 files changed, 85 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/udotprodint8_emulate.c

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a1d4fec42a2..3244cef483a 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -30835,20 +30835,78 @@ (define_expand "sdot_prodv64qi"
 
 (define_expand "udot_prod"
   [(match_operand: 0 "register_operand")
-   (match_operand:VI1 1 "register_operand")
-   (match_operand:VI1 2 "register_operand")
+   (match_operand:VI1_AVX2 1 "register_operand")
+   (match_operand:VI1_AVX2 2 "register_operand")
(match_operand: 3 "register_operand")]
-  "TARGET_AVXVNNIINT8"
+  "TARGET_SSE2"
 {
-  operands[1] = lowpart_subreg (mode,
-force_reg (mode, operands[1]),
-mode);
-  operands[2] = lowpart_subreg (mode,
-force_reg (mode, operands[2]),
-mode);
-  emit_insn (gen_rtx_SET (operands[0], operands[3]));
-  emit_insn (gen_vpdpbuud_ (operands[0], operands[3],
-  operands[1], operands[2]));
+  if (TARGET_AVXVNNIINT8)
+{
+  operands[1] = lowpart_subreg (mode,
+   force_reg (mode, operands[1]),
+   mode);
+  operands[2] = lowpart_subreg (mode,
+   force_reg (mode, operands[2]),
+   mode);
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+  emit_insn (gen_vpdpbuud_ (operands[0], operands[3],
+ operands[1], operands[2]));
+   }
+ else
+   {
+ /* Emulate with vpdpwssd.  */
+ rtx op1_lo = gen_reg_rtx (mode);
+ rtx op1_hi = gen_reg_rtx (mode);
+ rtx op2_lo = gen_reg_rtx (mode);
+ rtx op2_hi = gen_reg_rtx (mode);
+
+ emit_insn (gen_vec_unpacku_lo_ (op1_lo, operands[1]));
+ emit_insn (gen_vec_unpacku_lo_ (op2_lo, operands[2]));
+ emit_insn (gen_vec_unpacku_hi_ (op1_hi, operands[1]));
+ emit_insn (gen_vec_unpacku_hi_ (op2_hi, operands[2]));
+
+ rtx res1 = gen_reg_rtx (mode);
+ rtx res2 = gen_reg_rtx (mode);
+ rtx sum = gen_reg_rtx (mode);
+
+ emit_move_insn (sum, CONST0_RTX (mode));
+ emit_insn (gen_sdot_prod (res1, op1_lo,
+   op2_lo, sum));
+ emit_insn (gen_sdot_prod (res2, op1_hi,
+   op2_hi, operands[3]));
+ emit_insn (gen_add3 (operands[0], res1, res2));
+   }
+
+  DONE;
+})
+
+(define_expand "udot_prodv64qi"
+  [(match_operand:V16SI 0 "register_operand")
+   (match_operand:V64QI 1 "register_operand")
+   (match_operand:V64QI 2 "register_operand")
+   (match_operand:V16SI 3 "register_operand")]
+  "(TARGET_AVX512VNNI || TARGET_AVX512BW) && TARGET_EVEX512"
+{
+  /* Emulate with vpdpwssd.  */
+  rtx op1_lo = gen_reg_rtx (V32HImode);
+  rtx op1_hi = gen_reg_rtx (V32HImode);
+  rtx op2_lo = gen_reg_rtx (V32HImode);
+  rtx op2_hi = gen_reg_rtx (V32HImode);
+
+  emit_insn (gen_vec_unpacku_lo_v64qi (op1_lo, operands[1]));
+  emit_insn (gen_vec_unpacku_lo_v64qi (op2_lo, operands[2]));
+  emit_insn (gen_vec_unpacku_hi_v64qi (op1_hi, operands[1]));
+  emit_insn (gen_vec_unpacku_hi_v64qi (op2_hi, operands[2]));
+
+  rtx res1 = gen_reg_rtx (V16SImode);
+  rtx res2 = gen_reg_rtx (V16SImode);
+  rtx sum = gen_reg_rtx (V16SImode);
+
+  emit_move_insn (sum, CONST0_RTX (V16SImode));
+  emit_insn (gen_sdot_prodv32hi (res1, op1_lo, op2_lo, sum));
+  emit_insn (gen_sdot_prodv32hi (res2, op1_hi, op2_hi, operands[3]));
+
+  emit_insn (gen_addv16si3 (operands[0], res1, res2));
   DONE;
 })
 
diff --git a/gcc/testsuite/gcc.target/i386/udotprodint8_emulate.c 
b/gcc/testsuite/gcc.target/i386/udotprodint8_emulate.c
new file mode 100644
index 000..1e8f2cfe521
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/udotprodint8_emulate.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-mavxvnni -O2 -fdump-tree-optimized" } */
+/* { dg-final { scan-tree-dump-times "DOT_PROD_EXPR" 1 

Re: [PATCH v2] RISC-V: Update crypto vector ISA info with latest spec

2023-12-03 Thread Kito Cheng
LGTM again :)

On Mon, Dec 4, 2023 at 2:44 PM Feng Wang  wrote:
>
> Rebased and resent because this patch was not added to patchwork
> before. Kito has already reviewed it. Please refer to
> https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327499.html
>
> This patch adds the Zvkb subset of the crypto vector extension. The
> corresponding test cases have also been modified.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Add zvkb ISA info.
> * config/riscv/riscv.opt: Add Mask(ZVKB)
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zvkn-1.c: Replace zvbb with zvkb.
> * gcc.target/riscv/zvkn.c:   Ditto.
> * gcc.target/riscv/zvknc-1.c:Ditto.
> * gcc.target/riscv/zvknc-2.c:Ditto.
> * gcc.target/riscv/zvknc.c:  Ditto.
> * gcc.target/riscv/zvkng-1.c:Ditto.
> * gcc.target/riscv/zvkng-2.c:Ditto.
> * gcc.target/riscv/zvkng.c:  Ditto.
> * gcc.target/riscv/zvks-1.c: Ditto.
> * gcc.target/riscv/zvks.c:   Ditto.
> * gcc.target/riscv/zvksc-1.c:Ditto.
> * gcc.target/riscv/zvksc-2.c:Ditto.
> * gcc.target/riscv/zvksc.c:  Ditto.
> * gcc.target/riscv/zvksg-1.c:Ditto.
> * gcc.target/riscv/zvksg-2.c:Ditto.
> * gcc.target/riscv/zvksg.c:  Ditto.
> ---
>  gcc/common/config/riscv/riscv-common.cc  | 6 --
>  gcc/config/riscv/riscv.opt   | 2 ++
>  gcc/testsuite/gcc.target/riscv/zvkn-1.c  | 8 
>  gcc/testsuite/gcc.target/riscv/zvkn.c| 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvknc-1.c | 8 
>  gcc/testsuite/gcc.target/riscv/zvknc-2.c | 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvknc.c   | 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvkng-1.c | 8 
>  gcc/testsuite/gcc.target/riscv/zvkng-2.c | 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvkng.c   | 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvks-1.c  | 8 
>  gcc/testsuite/gcc.target/riscv/zvks.c| 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvksc-1.c | 8 
>  gcc/testsuite/gcc.target/riscv/zvksc-2.c | 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvksc.c   | 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvksg-1.c | 8 
>  gcc/testsuite/gcc.target/riscv/zvksg-2.c | 4 ++--
>  gcc/testsuite/gcc.target/riscv/zvksg.c   | 4 ++--
>  18 files changed, 50 insertions(+), 46 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index ded85b4c578..6c210412515 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -106,7 +106,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>
>{"zvkn", "zvkned"},
>{"zvkn", "zvknhb"},
> -  {"zvkn", "zvbb"},
> +  {"zvkn", "zvkb"},
>{"zvkn", "zvkt"},
>{"zvknc", "zvkn"},
>{"zvknc", "zvbc"},
> @@ -114,7 +114,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
>{"zvkng", "zvkg"},
>{"zvks", "zvksed"},
>{"zvks", "zvksh"},
> -  {"zvks", "zvbb"},
> +  {"zvks", "zvkb"},
>{"zvks", "zvkt"},
>{"zvksc", "zvks"},
>{"zvksc", "zvbc"},
> @@ -253,6 +253,7 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>
>{"zvbb", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zvbc", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zvkb", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zvkg", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zvkned", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zvknha", ISA_SPEC_CLASS_NONE, 1, 0},
> @@ -1624,6 +1625,7 @@ static const riscv_ext_flag_table_t 
> riscv_ext_flag_table[] =
>
>{"zvbb", _options::x_riscv_zvb_subext, MASK_ZVBB},
>{"zvbc", _options::x_riscv_zvb_subext, MASK_ZVBC},
> +  {"zvkb", _options::x_riscv_zvb_subext, MASK_ZVKB},
>{"zvkg", _options::x_riscv_zvk_subext, MASK_ZVKG},
>{"zvkned",   _options::x_riscv_zvk_subext, MASK_ZVKNED},
>{"zvknha",   _options::x_riscv_zvk_subext, MASK_ZVKNHA},
> diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
> index 0c6517bdc8b..78186fff6c5 100644
> --- a/gcc/config/riscv/riscv.opt
> +++ b/gcc/config/riscv/riscv.opt
> @@ -319,6 +319,8 @@ Mask(ZVBB) Var(riscv_zvb_subext)
>
>  Mask(ZVBC) Var(riscv_zvb_subext)
>
> +Mask(ZVKB) Var(riscv_zvb_subext)
> +
>  TargetVariable
>  int riscv_zvk_subext
>
> diff --git a/gcc/testsuite/gcc.target/riscv/zvkn-1.c 
> b/gcc/testsuite/gcc.target/riscv/zvkn-1.c
> index 23b255b4779..069a8f66c92 100644
> --- a/gcc/testsuite/gcc.target/riscv/zvkn-1.c
> +++ b/gcc/testsuite/gcc.target/riscv/zvkn-1.c
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
> -/* { dg-options "-march=rv64gc_zvkned_zvknhb_zvbb_zvkt" { target { rv64 } } 
> } */
> -/* { dg-options "-march=rv32gc_zvkned_zvknhb_zvbb_zvkt" { target { rv32 } } 
> } */
> +/* { dg-options "-march=rv64gc_zvkned_zvknhb_zvkb_zvkt" { target { rv64 } } 
> } */
> +/* { dg-options "-march=rv32gc_zvkned_zvknhb_zvkb_zvkt" { target { rv32 } } 
> } */
>
>  #ifndef __riscv_zvkn
>  #error "Feature macro for `Zvkn' not defined"
> @@ -14,8 +14,8 @@
>  #error 

Re: Re: [PATCH 1/4] [RISC-V] prefer Zicond primitive semantics to SFB

2023-12-03 Thread Fei Gao
Committed.  Thanks Kito and Jeff.

BR
Fei

On 2023-11-28 13:03  Jeff Law  wrote:
>
>
>
>On 11/27/23 20:09, Kito Cheng wrote:
>> Personally I don't like to play with the pattern order to tweak the
>> code gen since it kinda introduces implicit relation/rule here, but I
>> guess the only way to prevent that is to duplicate the pattern for SFB
>> again, which is not an ideal solution...
>I won't object to this patch, but I don't really like it either.
>
>This patch highlights that the SFB code is not well integrated with the
>rest of the conditional move support.
>
>Jeff

[PATCH v2] RISC-V: Update crypto vector ISA info with latest spec

2023-12-03 Thread Feng Wang
Rebased and resent because this patch was not added to patchwork
before. Kito has already reviewed it. Please refer to
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg327499.html

This patch adds the Zvkb subset of the crypto vector extension. The
corresponding test cases have also been modified.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add zvkb ISA info.
* config/riscv/riscv.opt: Add Mask(ZVKB)

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvkn-1.c: Replace zvbb with zvkb.
* gcc.target/riscv/zvkn.c:   Ditto.
* gcc.target/riscv/zvknc-1.c:Ditto.
* gcc.target/riscv/zvknc-2.c:Ditto.
* gcc.target/riscv/zvknc.c:  Ditto.
* gcc.target/riscv/zvkng-1.c:Ditto.
* gcc.target/riscv/zvkng-2.c:Ditto.
* gcc.target/riscv/zvkng.c:  Ditto.
* gcc.target/riscv/zvks-1.c: Ditto.
* gcc.target/riscv/zvks.c:   Ditto.
* gcc.target/riscv/zvksc-1.c:Ditto.
* gcc.target/riscv/zvksc-2.c:Ditto.
* gcc.target/riscv/zvksc.c:  Ditto.
* gcc.target/riscv/zvksg-1.c:Ditto.
* gcc.target/riscv/zvksg-2.c:Ditto.
* gcc.target/riscv/zvksg.c:  Ditto.
---
 gcc/common/config/riscv/riscv-common.cc  | 6 --
 gcc/config/riscv/riscv.opt   | 2 ++
 gcc/testsuite/gcc.target/riscv/zvkn-1.c  | 8 
 gcc/testsuite/gcc.target/riscv/zvkn.c| 4 ++--
 gcc/testsuite/gcc.target/riscv/zvknc-1.c | 8 
 gcc/testsuite/gcc.target/riscv/zvknc-2.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/zvknc.c   | 4 ++--
 gcc/testsuite/gcc.target/riscv/zvkng-1.c | 8 
 gcc/testsuite/gcc.target/riscv/zvkng-2.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/zvkng.c   | 4 ++--
 gcc/testsuite/gcc.target/riscv/zvks-1.c  | 8 
 gcc/testsuite/gcc.target/riscv/zvks.c| 4 ++--
 gcc/testsuite/gcc.target/riscv/zvksc-1.c | 8 
 gcc/testsuite/gcc.target/riscv/zvksc-2.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/zvksc.c   | 4 ++--
 gcc/testsuite/gcc.target/riscv/zvksg-1.c | 8 
 gcc/testsuite/gcc.target/riscv/zvksg-2.c | 4 ++--
 gcc/testsuite/gcc.target/riscv/zvksg.c   | 4 ++--
 18 files changed, 50 insertions(+), 46 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index ded85b4c578..6c210412515 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -106,7 +106,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
 
   {"zvkn", "zvkned"},
   {"zvkn", "zvknhb"},
-  {"zvkn", "zvbb"},
+  {"zvkn", "zvkb"},
   {"zvkn", "zvkt"},
   {"zvknc", "zvkn"},
   {"zvknc", "zvbc"},
@@ -114,7 +114,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvkng", "zvkg"},
   {"zvks", "zvksed"},
   {"zvks", "zvksh"},
-  {"zvks", "zvbb"},
+  {"zvks", "zvkb"},
   {"zvks", "zvkt"},
   {"zvksc", "zvks"},
   {"zvksc", "zvbc"},
@@ -253,6 +253,7 @@ static const struct riscv_ext_version 
riscv_ext_version_table[] =
 
   {"zvbb", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvbc", ISA_SPEC_CLASS_NONE, 1, 0},
+  {"zvkb", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvkg", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvkned", ISA_SPEC_CLASS_NONE, 1, 0},
   {"zvknha", ISA_SPEC_CLASS_NONE, 1, 0},
@@ -1624,6 +1625,7 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
 
   {"zvbb", _options::x_riscv_zvb_subext, MASK_ZVBB},
   {"zvbc", _options::x_riscv_zvb_subext, MASK_ZVBC},
+  {"zvkb", _options::x_riscv_zvb_subext, MASK_ZVKB},
   {"zvkg", _options::x_riscv_zvk_subext, MASK_ZVKG},
   {"zvkned",   _options::x_riscv_zvk_subext, MASK_ZVKNED},
   {"zvknha",   _options::x_riscv_zvk_subext, MASK_ZVKNHA},
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index 0c6517bdc8b..78186fff6c5 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -319,6 +319,8 @@ Mask(ZVBB) Var(riscv_zvb_subext)
 
 Mask(ZVBC) Var(riscv_zvb_subext)
 
+Mask(ZVKB) Var(riscv_zvb_subext)
+
 TargetVariable
 int riscv_zvk_subext
 
diff --git a/gcc/testsuite/gcc.target/riscv/zvkn-1.c 
b/gcc/testsuite/gcc.target/riscv/zvkn-1.c
index 23b255b4779..069a8f66c92 100644
--- a/gcc/testsuite/gcc.target/riscv/zvkn-1.c
+++ b/gcc/testsuite/gcc.target/riscv/zvkn-1.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=rv64gc_zvkned_zvknhb_zvbb_zvkt" { target { rv64 } } } 
*/
-/* { dg-options "-march=rv32gc_zvkned_zvknhb_zvbb_zvkt" { target { rv32 } } } 
*/
+/* { dg-options "-march=rv64gc_zvkned_zvknhb_zvkb_zvkt" { target { rv64 } } } 
*/
+/* { dg-options "-march=rv32gc_zvkned_zvknhb_zvkb_zvkt" { target { rv32 } } } 
*/
 
 #ifndef __riscv_zvkn
 #error "Feature macro for `Zvkn' not defined"
@@ -14,8 +14,8 @@
 #error "Feature macro for `Zvknhb' not defined"
 #endif
 
-#ifndef __riscv_zvbb
-#error "Feature macro for `Zvbb' not defined"
+#ifndef __riscv_zvkb
+#error "Feature macro for `Zvkb' not defined"
 #endif
 
 #ifndef __riscv_zvkt
diff --git a/gcc/testsuite/gcc.target/riscv/zvkn.c 

[committed] RISC-V: Add sifive-x280 to -mcpu

2023-12-03 Thread Kito Cheng
The x280 is one of SiFive's cores. It has been released for a while, and
upstream LLVM already supports it.

[1] https://www.sifive.com/cores/intelligence-x280

gcc/ChangeLog:

* config/riscv/riscv-cores.def: Add sifive-x280.
* doc/invoke.texi (RISC-V Options): Add sifive-x280

gcc/testsuite/ChangeLog:

* gcc.target/riscv/mcpu-sifive-x280.c: New test.
---
 gcc/config/riscv/riscv-cores.def  |  1 +
 gcc/doc/invoke.texi   |  2 +-
 .../gcc.target/riscv/mcpu-sifive-x280.c   | 20 +++
 3 files changed, 22 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/mcpu-sifive-x280.c

diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 91deabb6079..34df59e8d61 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -73,6 +73,7 @@ RISCV_CORE("sifive-s76",  "rv64imafdc", "sifive-7-series")
 
 RISCV_CORE("sifive-u54",  "rv64imafdc", "sifive-5-series")
 RISCV_CORE("sifive-u74",  "rv64imafdc", "sifive-7-series")
+RISCV_CORE("sifive-x280", "rv64imafdcv_zfh_zba_zbb_zvfh_zvl512b", 
"sifive-7-series")
 
 RISCV_CORE("thead-c906",  
"rv64imafdc_xtheadba_xtheadbb_xtheadbs_xtheadcmo_"
  "xtheadcondmov_xtheadfmemidx_xtheadmac_"
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 2fab4c5d71f..6fe63b5f999 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -29776,7 +29776,7 @@ by particular CPU name.
 Permissible values for this option are: @samp{sifive-e20}, @samp{sifive-e21},
 @samp{sifive-e24}, @samp{sifive-e31}, @samp{sifive-e34}, @samp{sifive-e76},
 @samp{sifive-s21}, @samp{sifive-s51}, @samp{sifive-s54}, @samp{sifive-s76},
-@samp{sifive-u54}, and @samp{sifive-u74}.
+@samp{sifive-u54}, @samp{sifive-u74}, and @samp{sifive-x280}.
 
 @opindex mtune
 @item -mtune=@var{processor-string}
diff --git a/gcc/testsuite/gcc.target/riscv/mcpu-sifive-x280.c 
b/gcc/testsuite/gcc.target/riscv/mcpu-sifive-x280.c
new file mode 100644
index 000..be6e13f810b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/mcpu-sifive-x280.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-skip-if "-march given" { *-*-* } { "-march=*" } } */
+/* { dg-options "-mcpu=sifive-x280 -mabi=lp64" } */
+/* SiFive x280 => rv64imafdcv_zfh_zba_zbb_zvfh_zvl512b */
+
+#if !((__riscv_xlen == 64) \
+  && !defined(__riscv_32e) \
+  && (__riscv_flen == 64)  \
+  && defined(__riscv_c)\
+  && defined(__riscv_zfh)  \
+  && defined(__riscv_zvfh) \
+  && defined(__riscv_zvl512b)  \
+  && defined(__riscv_v))
+#error "unexpected arch"
+#endif
+
+int main()
+{
+  return 0;
+}
-- 
2.40.1



[committed] RISC-V: Refactor riscv_implied_info_t to make it able to handle conditional implication [NFC]

2023-12-03 Thread Kito Cheng
RISC-V ISA implication rules have become a bit more complicated than before:
an implication may now come with a condition. This commit extends
riscv_implied_info_t to handle that, and also makes it more C++-like.
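
A conditional implication table of this kind can be sketched in isolation as
follows. This is a simplified model under stated assumptions, not the GCC
code: SubsetList, ImpliedInfo, implied_table, rv32_with_f and apply_implied
are hypothetical stand-ins, and the one conditional rule is taken from the
comment this patch removes from parse ("Zce only implies zcf when RV32 and
'f' extension exist").

```cpp
#include <cstring>
#include <set>
#include <string>

// Hypothetical stand-in for riscv_subset_list: an XLEN plus extension names.
struct SubsetList {
  int xlen;
  std::set<std::string> exts;
  bool has(const char *e) const { return exts.count(e) != 0; }
};

using Predicate = bool (*)(const SubsetList &);

// One implication rule; a null predicate means "unconditional".
struct ImpliedInfo {
  const char *ext;
  const char *implied_ext;
  Predicate predicate;

  bool match(const SubsetList &list, const char *name) const {
    if (std::strcmp(name, ext) != 0)
      return false;
    return predicate == nullptr || predicate(list);
  }
};

// Condition from the removed comment: RV32 and 'f' present.
static bool rv32_with_f(const SubsetList &l) {
  return l.xlen == 32 && l.has("f");
}

static const ImpliedInfo implied_table[] = {
    {"zcmp", "zca", nullptr},
    {"zce", "zcf", rv32_with_f},
};

// Apply the rules repeatedly until no new extension is added.
void apply_implied(SubsetList &list) {
  bool changed = true;
  while (changed) {
    changed = false;
    const std::set<std::string> snapshot = list.exts; // iterate over a copy
    for (const std::string &name : snapshot)
      for (const ImpliedInfo &info : implied_table)
        if (info.match(list, name.c_str()) && !list.has(info.implied_ext)) {
          list.exts.insert(info.implied_ext);
          changed = true;
        }
  }
}
```

Keeping the condition next to the rule means callers only need match, so
special cases no longer have to be hand-coded after the table walk.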

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc (riscv_implied_predicator_t): New.
(riscv_implied_info_t::riscv_implied_info_t): New.
(riscv_implied_info_t::match): New.
(riscv_implied_info): New entry for zcf.
(riscv_subset_list::handle_implied_ext): Use
riscv_implied_info_t::match.
(riscv_subset_list::check_implied_ext): Ditto.
(riscv_subset_list::handle_combine_ext): Ditto.
(riscv_subset_list::parse): Move zcf implication handling to
riscv_implied_infos.
---
 gcc/common/config/riscv/riscv-common.cc | 44 ++---
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index de793f96fa5..aecb342b164 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -38,11 +38,36 @@ along with GCC; see the file COPYING3.  If not see
 #define TARGET_DEFAULT_TARGET_FLAGS (MASK_BIG_ENDIAN)
 #endif
 
+typedef bool (*riscv_implied_predicator_t) (const riscv_subset_list *);
+
 /* Type for implied ISA info.  */
 struct riscv_implied_info_t
 {
+  constexpr riscv_implied_info_t (const char *ext, const char *implied_ext,
+ riscv_implied_predicator_t predicator
+ = nullptr)
+: ext (ext), implied_ext (implied_ext), predicator (predicator){};
+
+  bool match (const riscv_subset_list *subset_list, const char *ext_name) const
+  {
+if (strcmp (ext_name, ext) != 0)
+  return false;
+
+if (predicator && !predicator (subset_list))
+  return false;
+
+return true;
+  }
+
+  bool match (const riscv_subset_list *subset_list,
+ const riscv_subset_t *subset) const
+  {
+return match (subset_list, subset->name.c_str());
+  }
+
   const char *ext;
   const char *implied_ext;
+  riscv_implied_predicator_t predicator;
 };
 
 /* Implied ISA info, must end with NULL sentinel.  */
@@ -143,6 +168,11 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zcmp", "zca"},
   {"zcmt", "zca"},
   {"zcmt", "zicsr"},
+  {"zcf", "f",
+   [] (const riscv_subset_list *subset_list) -> bool
+   {
+ return subset_list->xlen () == 32 && subset_list->lookup ("f");
+   }},
 
   {"smaia", "ssaia"},
   {"smstateen", "ssstateen"},
@@ -1093,7 +1123,7 @@ riscv_subset_list::handle_implied_ext (const char *ext)
implied_info->ext;
++implied_info)
 {
-  if (strcmp (ext, implied_info->ext) != 0)
+  if (!implied_info->match (this, ext))
continue;
 
   /* Skip if implied extension already present.  */
@@ -1131,7 +1161,7 @@ riscv_subset_list::check_implied_ext ()
   for (implied_info = _implied_info[0]; implied_info->ext;
   ++implied_info)
{
- if (strcmp (itr->name.c_str(), implied_info->ext) != 0)
+ if (!implied_info->match (this, itr))
continue;
 
  if (!lookup (implied_info->implied_ext))
@@ -1160,8 +1190,7 @@ riscv_subset_list::handle_combine_ext ()
   for (implied_info = _implied_info[0]; implied_info->ext;
   ++implied_info)
{
- /* Skip if implied extension don't match combine extension */
- if (strcmp (combine_info->name, implied_info->ext) != 0)
+ if (!implied_info->match (this, combine_info->name))
continue;
 
  if (lookup (implied_info->implied_ext))
@@ -1502,13 +1531,6 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
   subset_list->handle_implied_ext (itr->name.c_str ());
 }
 
-  /* Zce only implies zcf when RV32 and 'f' extension exist.  */
-  if (subset_list->lookup ("zce") != NULL
-   && subset_list->m_xlen == 32
-   && subset_list->lookup ("f") != NULL
-   && subset_list->lookup ("zcf") == NULL)
-subset_list->add ("zcf", false);
-
   /* Make sure all implied extensions are included. */
   gcc_assert (subset_list->check_implied_ext ());
 
-- 
2.40.1



[committed] RISC-V: Refine riscv_subset_list::parse [NFC]

2023-12-03 Thread Kito Cheng
Extract the logic for checking conflicting extensions into a standalone
function, in preparation for adding more checks.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc
(riscv_subset_list::check_conflict_ext): New.
(riscv_subset_list::parse): Move checking conflict ext. to
check_conflict_ext.
* config/riscv/riscv-subset.h:
Add riscv_subset_list::check_conflict_ext.
---
 gcc/common/config/riscv/riscv-common.cc | 31 +++--
 gcc/config/riscv/riscv-subset.h |  1 +
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index ded85b4c578..de793f96fa5 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1185,6 +1185,24 @@ riscv_subset_list::handle_combine_ext ()
 }
 }
 
+void
+riscv_subset_list::check_conflict_ext ()
+{
+  if (lookup ("zcf") && m_xlen == 64)
+error_at (m_loc, "%<-march=%s%>: zcf extension supports in rv32 only",
+ m_arch);
+
+  if (lookup ("zfinx") && lookup ("f"))
+error_at (m_loc,
+ "%<-march=%s%>: z*inx conflicts with floating-point "
+ "extensions",
+ m_arch);
+
+  /* 'H' hypervisor extension requires base ISA with 32 registers.  */
+  if (lookup ("e") && lookup ("h"))
+error_at (m_loc, "%<-march=%s%>: h extension requires i extension", 
m_arch);
+}
+
 /* Parsing function for multi-letter extensions.
 
Return Value:
@@ -1495,18 +1513,7 @@ riscv_subset_list::parse (const char *arch, location_t 
loc)
   gcc_assert (subset_list->check_implied_ext ());
 
   subset_list->handle_combine_ext ();
-
-  if (subset_list->lookup ("zcf") && subset_list->m_xlen == 64)
-error_at (loc, "%<-march=%s%>: zcf extension supports in rv32 only"
- , arch);
-
-  if (subset_list->lookup ("zfinx") && subset_list->lookup ("f"))
-error_at (loc, "%<-march=%s%>: z*inx conflicts with floating-point "
-  "extensions", arch);
-
-  /* 'H' hypervisor extension requires base ISA with 32 registers.  */
-  if (subset_list->lookup ("e") && subset_list->lookup ("h"))
-error_at (loc, "%<-march=%s%>: h extension requires i extension", arch);
+  subset_list->check_conflict_ext ();
 
   return subset_list;
 
diff --git a/gcc/config/riscv/riscv-subset.h b/gcc/config/riscv/riscv-subset.h
index d2a4bd20530..ad1cab2aa24 100644
--- a/gcc/config/riscv/riscv-subset.h
+++ b/gcc/config/riscv/riscv-subset.h
@@ -79,6 +79,7 @@ private:
   void handle_implied_ext (const char *);
   bool check_implied_ext ();
   void handle_combine_ext ();
+  void check_conflict_ext ();
 
 public:
   ~riscv_subset_list ();
-- 
2.40.1



Re: [PATCH 2/7] RISC-V: Add intrinsic functions for crypto vector Zvbc extension

2023-12-03 Thread Feng Wang
2023-12-04 11:37 juzhe.zhong  wrote:


Will split again as you mentioned. Thanks.
Feng Wang

>Hi, eswin.
>
>Thanks for contributing vector crypto support.
>
>It seems the patches are messed up. Could you rebase your patch cleanly onto
>GCC trunk and send it again?
>
>The patches look odd to me, for example:
>
> // ZVBB
>-DEF_VECTOR_CRYPTO_FUNCTION (vandn, zvbb, full_preds, u_vvv_ops, zvkb_or_zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vandn, zvbb, full_preds, u_vvx_ops, zvkb_or_zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vbrev, zvbb, full_preds, u_vv_ops, zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vbrev8, zvbb, full_preds, u_vv_ops, zvkb_or_zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vrev8, zvbb, full_preds, u_vv_ops, zvkb_or_zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vclz, zvbb, none_m_preds, u_vv_ops, zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vctz, zvbb, none_m_preds, u_vv_ops, zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vrol, zvbb, full_preds, u_vvv_ops, zvkb_or_zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vrol, zvbb, full_preds, u_shift_vvx_ops, zvkb_or_zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vror, zvbb, full_preds, u_vvv_ops, zvkb_or_zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vror, zvbb, full_preds, u_shift_vvx_ops, zvkb_or_zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vwsll, zvbb, full_preds, u_wvv_ops, zvbb)
>-DEF_VECTOR_CRYPTO_FUNCTION (vwsll, zvbb, full_preds, u_shift_wvx_ops, zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vandn,  zvbb_zvbc, full_preds, u_vvv_ops, zvkb_or_zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vandn,  zvbb_zvbc, full_preds, u_vvx_ops, zvkb_or_zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vbrev,  zvbb_zvbc, full_preds, u_vv_ops,  zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vbrev8, zvbb_zvbc, full_preds, u_vv_ops,  zvkb_or_zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vrev8,  zvbb_zvbc, full_preds, u_vv_ops,  zvkb_or_zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vclz,   zvbb_zvbc, none_m_preds, u_vv_ops, zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vctz,   zvbb_zvbc, none_m_preds, u_vv_ops, zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vrol,   zvbb_zvbc, full_preds, u_vvv_ops, zvkb_or_zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vrol,   zvbb_zvbc, full_preds, u_shift_vvx_ops, zvkb_or_zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vror,   zvbb_zvbc, full_preds, u_vvv_ops, zvkb_or_zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vror,   zvbb_zvbc, full_preds, u_shift_vvx_ops, zvkb_or_zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vwsll,  zvbb_zvbc, full_preds, u_wvv_ops, zvbb)
>+DEF_VECTOR_CRYPTO_FUNCTION (vwsll,  zvbb_zvbc, full_preds, u_shift_wvx_ops, zvbb)
>
>It seems you messed up your local development tree, which makes this hard to
>review.
>
>I would expect patches as follows:
>
>1. Add crypto march support (riscv-common.cc)
>2. Add crypto machine descriptions (vector-cryptio.md)
>3. Add crypto builtin.
>4. Add testcases.
>
>Thanks.
>
>juzhe.zh...@rivai.ai


[PATCH] Don't vectorize when vector stmts are only vec_contruct and stores

2023-12-03 Thread liuhongt
i.e. for cases like the one below:
   a[0] = b1;
   a[1] = b2;
   ..
   a[n] = bn;

Constructing the vector introduces extra dependences that the scalar
stores do not have. According to experiments, the vectorized form is
generally worse.

The patch adds a cut-off heuristic for the case where the vector stmts
are only vec_construct and vector stores. It improves SPEC2017 a little bit.

BenchMarks          Ratio
500.perlbench_r     2.60%
502.gcc_r           0.30%
505.mcf_r           0.40%
520.omnetpp_r       -1.00%
523.xalancbmk_r     0.90%
525.x264_r          0.00%
531.deepsjeng_r     0.30%
541.leela_r         0.90%
548.exchange2_r     3.20%
557.xz_r            1.40%
503.bwaves_r        0.00%
507.cactuBSSN_r     0.00%
508.namd_r          0.30%
510.parest_r        0.00%
511.povray_r        0.20%
519.lbm_r           SAME BIN
521.wrf_r           -0.30%
526.blender_r       -1.20%
527.cam4_r          -0.20%
538.imagick_r       4.00%
544.nab_r           0.40%
549.fotonik3d_r     0.00%
554.roms_r          0.00%
Geomean-int         0.90%
Geomean-fp          0.30%
Geomean-all         0.50%
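
The cut-off condition described above can be sketched roughly as follows.
This is a simplified model, not the code in i386.cc: StmtKind and
construct_store_only are hypothetical names, and requiring that both kinds
were actually seen is an assumption made for this illustration.

```cpp
#include <vector>

// Hypothetical statement kinds recorded while costing a vectorized region.
enum class StmtKind { VecConstruct, VectorStore, VectorLoad, VectorStmt };

// Returns true when every recorded vector statement is a vec_construct or a
// vector store, and at least one of each was seen (assumption of this sketch).
bool construct_store_only(const std::vector<StmtKind> &stmts) {
  bool saw_construct = false;
  bool saw_store = false;
  for (StmtKind k : stmts) {
    switch (k) {
    case StmtKind::VecConstruct:
      saw_construct = true;
      break;
    case StmtKind::VectorStore:
      saw_store = true;
      break;
    default:
      return false; // any other vector stmt means real vector work exists
    }
  }
  return saw_construct && saw_store;
}
```

A cost model could then reject vectorization whenever this predicate holds,
which corresponds to the a[0] = b1; ... a[n] = bn; pattern at the top.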

Regressed testcases:

gcc.target/i386/part-vect-absneghf.c
gcc.target/i386/part-vect-copysignhf.c
gcc.target/i386/part-vect-xorsignhf.c

These regressed under -m32 since it now generates 2 vector
.ABS/NEG/XORSIGN/COPYSIGN vs the original 1 64-bit vec_construct. The
original testcases are used to test vectorization capability for
.ABS/NEG/XORSIGN/COPYSIGN, so just restrict the testcases to TARGET_64BIT.

gcc.target/i386/pr111023-2.c
gcc.target/i386/pr111023.c
These regressed under -m32.

The testcase is as below:

void
v8hi_v8qi (v8hi *dst, v16qi src)
{
  short tem[8];
  tem[0] = src[0];
  tem[1] = src[1];
  tem[2] = src[2];
  tem[3] = src[3];
  tem[4] = src[4];
  tem[5] = src[5];
  tem[6] = src[6];
  tem[7] = src[7];
  dst[0] = *(v8hi *) tem;
}

On a 64-bit target, the vectorizer realizes it is just a permutation of
the original src vector, but under -m32 the vectorizer relies on
vec_construct for vectorization. I think optimization of this case on a
32-bit target may not matter much, so just add
-fno-vect-cost-model.

gcc.target/i386/pr91446.c: This testcase guards the cost model of
vector stores, not vectorization capability, so just adjust the testcase.

gcc.target/i386/pr108938-3.c: This testcase relies on vec_construct to
optimize for bswap; as with other optimizations, the vectorizer cannot
foresee optimizations that happen after it. So the current solution is to
add -fno-vect-cost-model to the testcase.

costmodel-pr104582-1.c
costmodel-pr104582-3.c
costmodel-pr104582-4.c

These failed since the code is no longer vectorized. Looking at the PR,
that is exactly what is wanted, so adjust the testcases to scan-tree-dump-not.


Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/99881
PR target/104582
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
Check if kind is vec_construct or vector store.
(ix86_vector_costs::finish_cost): Don't do vectorization when
vector stmts are only vec_construct and stores.
(ix86_vector_costs::ix86_vect_construct_store_only_p): New
function.
(ix86_vector_costs::ix86_vect_cut_off): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-absneghf.c: Restrict testcase to
TARGET_64BIT.
* gcc.target/i386/part-vect-copysignhf.c: Ditto.
* gcc.target/i386/part-vect-xorsignhf.c: Ditto.
* gcc.target/i386/pr91446.c: Adjust testcase.
* gcc.target/i386/pr108938-3.c: Add -fno-vect-cost-model.
* gcc.target/i386/pr111023-2.c: Ditto.
* gcc.target/i386/pr111023.c: Ditto.
* gcc.target/i386/pr99881.c: Remove xfail.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-1.c: Changed
to Scan-tree-dump-not.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-3.c: Ditto.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr104582-4.c: Ditto.
---
 gcc/config/i386/i386.cc   | 81 ++-
 .../costmodel/x86_64/costmodel-pr104582-1.c   |  2 +-
 .../costmodel/x86_64/costmodel-pr104582-3.c   |  2 +-
 .../costmodel/x86_64/costmodel-pr104582-4.c   |  2 +-
 .../gcc.target/i386/part-vect-absneghf.c  |  4 +-
 .../gcc.target/i386/part-vect-copysignhf.c|  4 +-
 .../gcc.target/i386/part-vect-xorsignhf.c |  4 +-
 gcc/testsuite/gcc.target/i386/pr108938-3.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr111023-2.c|  2 +-
 gcc/testsuite/gcc.target/i386/pr111023.c  |  2 +-
 gcc/testsuite/gcc.target/i386/pr91446.c   | 14 ++--
 gcc/testsuite/gcc.target/i386/pr99881.c   |  2 +-
 12 files changed, 99 insertions(+), 22 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index dcaea6c2096..a4b23e29eba 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -24573,6 +24573,10 @@ public:
 
 private:
 
+  /* Don't do vectorization for certain patterns.  */
+  void ix86_vect_cut_off ();
+

[PATCH] RISC-V: Fix overlap group incorrect overlap on v0

2023-12-03 Thread Juzhe-Zhong
In a case with severe register pressure (the testcase is appended in this
patch), we see

  vluxei8.v   v0,(s1),v1,v0.t

which is not allowed, since according to the RVV ISA:

  The destination vector register group for a masked vector instruction
  cannot overlap the source mask register (v0), unless the destination
  vector register is being written with a mask value (e.g., compares) or
  the scalar result of a reduction.

This case has no spills; however, we expect the data to be spilled and
reloaded here.

The root cause is a mistake I made in a previous patch when matching the
dest operand and mask operand constraints:

dest: "=vr"
mask: "vmWc1"

After this patch:

dest: "vd,vr"
mask: "vm,Wc1"

This makes the EEW widening patterns the same as the other instruction patterns.
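
The ISA rule quoted above can be written as a small predicate. This is only
an illustrative sketch, not GCC code: dest_overlaps_mask and its parameters
are hypothetical names, and register groups are modelled simply as a first
register number plus a register count.

```cpp
// Returns true when a masked instruction's destination register group
// illegally overlaps the mask register v0.
//   vd     first register number of the destination group
//   nregs  number of registers in the group (1, 2, 4 or 8)
//   masked true when the instruction uses v0.t
//   writes_mask_or_scalar true for the exceptions named by the ISA text
//          (mask-producing ops such as compares, scalar reduction results)
bool dest_overlaps_mask(int vd, int nregs, bool masked,
                        bool writes_mask_or_scalar) {
  if (!masked || writes_mask_or_scalar)
    return false;
  const int mask_reg = 0; // v0 is the mask register
  return vd <= mask_reg && mask_reg < vd + nregs;
}
```

For the instruction shown above (destination v0, masked by v0.t), the
predicate returns true, which is the overlap that the split "vd,vr" /
"vm,Wc1" alternatives are meant to rule out.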

PR target/112431

gcc/ChangeLog:

* config/riscv/vector-iterators.md: New attributes.
* config/riscv/vector.md: Fix incorrect overlap.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr112431-34.c: New test.

---
 gcc/config/riscv/vector-iterators.md  | 1077 +
 gcc/config/riscv/vector.md|  268 ++--
 .../gcc.target/riscv/rvv/base/pr112431-34.c   |  101 ++
 3 files changed, 1312 insertions(+), 134 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/pr112431-34.c

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 56080ed1f5f..f97f33f98ee 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -3916,3 +3916,1080 @@
   (V1024BI "riscv_vector::vls_mode_valid_p (V1024BImode) && TARGET_MIN_VLEN >= 
1024")
   (V2048BI "riscv_vector::vls_mode_valid_p (V2048BImode) && TARGET_MIN_VLEN >= 
2048")
   (V4096BI "riscv_vector::vls_mode_valid_p (V4096BImode) && TARGET_MIN_VLEN >= 
4096")])
+
+;; The following attributes are used by EEW widening instructions.
+;; Since according to RVV ISA:
+;; The destination vector register group for a masked vector instruction 
cannot overlap the source mask register (v0),
+;; unless the destination vector register is being written with a mask value 
(e.g., compares) or the scalar result of a reduction. 
+;; We don't allow v v0,...v0.t happens for widening instructions.
+ 
+(define_mode_attr widen_eew_dest_constraint [
+  (RVVM8QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM4QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM2QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM1QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF2QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF4QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF8QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM8HI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM4HI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM2HI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM1HI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF2HI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF4HI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM8HF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM4HF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM2HF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM1HF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF2HF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF4HF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM8SI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM4SI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM2SI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM1SI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF2SI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM8SF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM4SF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM2SF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM1SF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVMF2SF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM8DI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM4DI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM2DI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM1DI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM8DF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM4DF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM2DF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (RVVM1DF "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (V1QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, ?, ?")
+  (V2QI "=vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, vd, vr, 

Re: [PATCH] c++: #pragma GCC unroll C++ fixes [PR112795]

2023-12-03 Thread Jason Merrill

On 12/2/23 05:51, Jakub Jelinek wrote:

Hi!

foo in the unroll-5.C testcase ICEs because cp_parser_pragma_unroll
during parsing calls maybe_constant_value unconditionally, which is
fine if !processing_template_decl, but can ICE otherwise.

While just calling fold_non_dependent_expr there instead could be enough
to fix the ICE (and I guess the right thing to do for backports if any),
I don't see a reason why we couldn't handle a dependent #pragma GCC unroll
argument as well, the unrolling isn't done in the FE and all the middle-end
cares about is that ANNOTATE_EXPR has a 1..65534 last operand when it is
annot_expr_unroll_kind.

So, the following patch changes all the unsigned short unroll arguments
to tree unroll (and thus avoids the tree -> unsigned short -> tree
conversions), does the type and value checking during parsing only if
the argument isn't dependent and repeats it during instantiation.
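
For readers unfamiliar with the pragma, here is a minimal sketch of the
non-dependent case, assuming GCC's `#pragma GCC unroll n` form placed
immediately before a loop. The function name `sum_unrolled` is made up for
illustration and is not part of the patch. The dependent form that the patch
handles is only mentioned in a comment, because the message above reports an
ICE for that case without the fix.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not taken from the patch: sums the first n elements.
// The pragma is a hint asking the compiler to unroll the loop by a factor
// of 4; the front end only records the factor, the unrolling itself happens
// later in the middle end.
int sum_unrolled(const int *data, std::size_t n) {
  assert(data != nullptr || n == 0);
  int total = 0;
#pragma GCC unroll 4
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}

// The dependent case discussed in this thread would look like
//   template <int N> ... #pragma GCC unroll N
// and is deliberately left out of the compiled code here.
```

The pragma does not change what the loop computes, so the function behaves
the same on compilers that ignore it.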

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2023-12-02  Jakub Jelinek  

PR c++/112795
gcc/cp/
* cp-tree.h (cp_convert_range_for): Change UNROLL type from
unsigned short to tree.
(finish_while_stmt_cond, finish_do_stmt, finish_for_cond): Likewise.
* parser.cc (cp_parser_statement): Pass NULL_TREE rather than 0 to
cp_parser_iteration_statement UNROLL argument.
(cp_parser_for, cp_parser_c_for): Change UNROLL type from
unsigned short to tree.
(cp_parser_range_for): Likewise.  Set RANGE_FOR_UNROLL to just UNROLL
rather than build_int_cst from it.
(cp_convert_range_for, cp_parser_iteration_statement): Change UNROLL
type from unsigned short to tree.
(cp_parser_omp_loop_nest): Pass NULL_TREE rather than 0 to
cp_parser_range_for UNROLL argument.
(cp_parser_pragma_unroll): Return tree rather than unsigned short.
If parsed expression is type dependent, just return it, don't diagnose
issues with value if it is value dependent.
(cp_parser_pragma): Change UNROLL type from unsigned short to tree.
* semantics.cc (finish_while_stmt_cond): Change UNROLL type from
unsigned short to tree.  Build ANNOTATE_EXPR with UNROLL as its last
operand rather than build_int_cst from it.
(finish_do_stmt, finish_for_cond): Likewise.
* pt.cc (tsubst_stmt) : Change UNROLL type from
unsigned short to tree and set it to RECUR on RANGE_FOR_UNROLL (t).
(tsubst_expr) : For annot_expr_unroll_kind repeat
checks on UNROLL value from cp_parser_pragma_unroll.
gcc/testsuite/
* g++.dg/ext/unroll-5.C: New test.
* g++.dg/ext/unroll-6.C: New test.

--- gcc/cp/cp-tree.h.jj 2023-12-01 08:10:42.707324577 +0100
+++ gcc/cp/cp-tree.h 2023-12-01 16:08:20.152165244 +0100
@@ -7371,7 +7371,7 @@ extern bool maybe_clone_body  (tree);
  
  /* In parser.cc */

  extern tree cp_convert_range_for (tree, tree, tree, cp_decomp *, bool,
- unsigned short, bool);
+ tree, bool);
  extern void cp_convert_omp_range_for (tree &, tree &, tree &,
  tree &, tree &, tree &, tree &, tree &);
  extern void cp_finish_omp_range_for (tree, tree);
@@ -7692,19 +7692,16 @@ extern void begin_else_clause   (tree);
  extern void finish_else_clause(tree);
  extern void finish_if_stmt(tree);
  extern tree begin_while_stmt  (void);
-extern void finish_while_stmt_cond (tree, tree, bool, unsigned short,
-bool);
+extern void finish_while_stmt_cond (tree, tree, bool, tree, bool);
  extern void finish_while_stmt (tree);
  extern tree begin_do_stmt (void);
  extern void finish_do_body(tree);
-extern void finish_do_stmt (tree, tree, bool, unsigned short,
-bool);
+extern void finish_do_stmt (tree, tree, bool, tree, bool);
  extern tree finish_return_stmt(tree);
  extern tree begin_for_scope   (tree *);
  extern tree begin_for_stmt(tree, tree);
  extern void finish_init_stmt  (tree);
-extern void finish_for_cond(tree, tree, bool, unsigned short,
-bool);
+extern void finish_for_cond(tree, tree, bool, tree, bool);
  extern void finish_for_expr   (tree, tree);
  extern void finish_for_stmt   (tree);
  extern tree begin_range_for_stmt  (tree, tree);
--- gcc/cp/parser.cc.jj 2023-12-01 08:10:42.800323262 +0100
+++ gcc/cp/parser.cc 2023-12-02 08:52:45.254387503 +0100
@@ -2391,15 +2391,15 @@ static tree cp_parser_selection_statemen
  static tree cp_parser_condition
(cp_parser *);
  static tree 

Re: [PATCH] c++: decltype of (non-captured variable) [PR83167]

2023-12-03 Thread Jason Merrill

On 12/1/23 17:42, Patrick Palka wrote:

On Fri, 1 Dec 2023, Jason Merrill wrote:


On 12/1/23 12:32, Patrick Palka wrote:

On Tue, 14 Nov 2023, Jason Merrill wrote:


On 11/14/23 11:10, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

-- >8 --

For decltype((x)) within a lambda where x is not captured, we dubiously
require that the lambda has a capture default, unlike for decltype(x).
This patch fixes this inconsistency; I couldn't find a justification for
it in the standard.


The relevant passage seems to be

https://eel.is/c++draft/expr.prim#id.unqual-3

"If naming the entity from outside of an unevaluated operand within S
would refer to an entity captured by copy in some intervening
lambda-expression, then let E be the innermost such lambda-expression.

If there is such a lambda-expression and if P is in E's function parameter
scope but not its parameter-declaration-clause, then the type of the
expression is the type of a class member access expression ([expr.ref])
naming the non-static data member that would be declared for such a capture
in the object parameter ([dcl.fct]) of the function call operator of E."

In this case I guess there is no such lambda-expression because naming x
won't refer to a capture by copy if the lambda doesn't capture anything, so
we ignore the lambda.

Maybe refer to that in a comment?  OK with that change.

I'm surprised that it refers specifically to capture by copy, but I guess a
capture by reference should have the same decltype as the captured variable?


Ah, seems like it.  So maybe we should get rid of the redundant
by-reference capture-default handling, to more closely mirror the
standard?
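
To make the cases under discussion concrete, here is a small sketch,
assuming a C++11 compiler. The function names are invented for illustration.
It covers only lambdas that have a capture-default; the capture-less `[]`
case is left out on purpose, since its handling is exactly what this thread
changes.

```cpp
#include <cassert>
#include <type_traits>

// [=], non-mutable: x would be captured by copy and the call operator is
// const, so decltype((x)) is the type of a const member access.
bool by_copy_is_const_ref() {
  int x = 0;
  (void)x;
  auto f = [=] { return std::is_same<decltype((x)), const int&>::value; };
  return f();
}

// [=] with mutable: the call operator is non-const, so the const goes away.
bool by_copy_mutable_is_ref() {
  int x = 0;
  (void)x;
  auto f = [=]() mutable { return std::is_same<decltype((x)), int&>::value; };
  return f();
}

// [&]: same type as decltype((x)) would have outside the lambda.
bool by_reference_is_ref() {
  int x = 0;
  (void)x;
  auto f = [&] { return std::is_same<decltype((x)), int&>::value; };
  return f();
}

// Without the extra parentheses, decltype(x) is just the declared type.
bool plain_decltype_is_declared_type() {
  int x = 0;
  (void)x;
  auto f = [=] { return std::is_same<decltype(x), int>::value; };
  return f();
}

// Runs every case above; aborts if any expectation does not hold.
void check_decltype_cases() {
  assert(by_copy_is_const_ref());
  assert(by_copy_mutable_is_ref());
  assert(by_reference_is_ref());
  assert(plain_decltype_is_declared_type());
}
```

Since decltype is an unevaluated operand, x is never actually captured in
any of these lambdas; only the type that a capture would have is observed.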

Also now that r14-6026-g73e2bdbf9bed48 made capture_decltype return
NULL_TREE to mean the capture is dependent, it seems we should just
inline capture_decltype into finish_decltype_type rather than
introducing another special return value to mean "fall back to ordinary
handling".

How does the following look?  Bootstrapped and regtested on
x86_64-pc-linux-gnu.

-- >8 --

PR c++/83167

gcc/cp/ChangeLog:

* semantics.cc (capture_decltype): Inline into its only caller ...
(finish_decltype_type): ... here.  Update nearby comment to refer
to recent standard.  Restrict uncaptured variable handling to just
lambdas with a by-copy capture-default.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-decltype4.C: New test.
---
   gcc/cp/semantics.cc   | 107 +++---
   .../g++.dg/cpp0x/lambda/lambda-decltype4.C|  15 +++
   2 files changed, 55 insertions(+), 67 deletions(-)
   create mode 100644 gcc/testsuite/g++.dg/cpp0x/lambda/lambda-decltype4.C

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index fbbc18336a0..fb4c3992e34 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -53,7 +53,6 @@ along with GCC; see the file COPYING3.  If not see
 static tree maybe_convert_cond (tree);
   static tree finalize_nrv_r (tree *, int *, void *);
-static tree capture_decltype (tree);
 /* Used for OpenMP non-static data member privatization.  */
   @@ -11856,21 +11855,48 @@ finish_decltype_type (tree expr, bool
id_expression_or_member_access_p,
   }
 else
   {
-  /* Within a lambda-expression:
-
-Every occurrence of decltype((x)) where x is a possibly
-parenthesized id-expression that names an entity of
-automatic storage duration is treated as if x were
-transformed into an access to a corresponding data member
-of the closure type that would have been declared if x
-were a use of the denoted entity.  */
 if (outer_automatic_var_p (STRIP_REFERENCE_REF (expr))
  && current_function_decl
  && LAMBDA_FUNCTION_P (current_function_decl))
{
- type = capture_decltype (STRIP_REFERENCE_REF (expr));
- if (!type)
-   goto dependent;
+ /* [expr.prim.id.unqual]/3: If naming the entity from outside of an
+unevaluated operand within S would refer to an entity captured by
+copy in some intervening lambda-expression, then let E be the
+innermost such lambda-expression.
+
+If there is such a lambda-expression and if P is in E's function
+parameter scope but not its parameter-declaration-clause, then the
+type of the expression is the type of a class member access
+expression naming the non-static data member that would be declared
+for such a capture in the object parameter of the function call
+operator of E."  */


Hmm, looks like this code is only checking the innermost lambda, it needs to
check all containing lambdas for one that would capture it by copy.


Unfortunately this seems to be a can of worms, since IIUC we also have
to check that there's no non-default-capture lambda in the stack as
well, e.g.

   int 

Re: [PATCH] RISC-V: Document optimization parameter riscv-strcmp-inline-limit

2023-12-03 Thread Kito Cheng
Wait, I get these errors on my machine:

../../../../riscv-gnu-toolchain-trunk/gcc/gcc/doc/invoke.texi:29774: misplaced }
../../../../riscv-gnu-toolchain-trunk/gcc/gcc/doc/invoke.texi:29786: misplaced }


On Mon, Dec 4, 2023 at 10:43 AM Kito Cheng  wrote:
>
> LGTM
>
> On Sun, Dec 3, 2023 at 5:16 AM Christoph Müllner 
>  wrote:
>>
>> This patch documents the optimization parameter
>> riscv-strcmp-inline-limit, which can be used to tweak the behaviour
>> of -minline-strcmp and -minline-strncmp.
>>
>> gcc/ChangeLog:
>>
>> PR target/112650
>> * doc/invoke.texi: Document riscv-strcmp-inline-limit.
>>
>> Signed-off-by: Christoph Müllner 
>> ---
>>  gcc/doc/invoke.texi | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 2fab4c5d71f..ba2d843b484 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -29846,6 +29846,10 @@ Inlining will only be done if the strings are 
>> properly aligned
>>  and instructions for accelerated processing are available.
>>  The default is to not inline strcmp calls.
>>
>> +The @option{--param riscv-strcmp-inline-limit=@{n}} parameter controls
>> +the maximum number of bytes compared by the inlined code.
>> +The default value is 64.
>> +
>>  @opindex minline-strncmp
>>  @item -minline-strncmp
>>  @itemx -mno-inline-strncmp
>> @@ -29854,6 +29858,10 @@ Inlining will only be done if the strings are 
>> properly aligned
>>  and instructions for accelerated processing are available.
>>  The default is to not inline strncmp calls.
>>
>> +The @option{--param riscv-strcmp-inline-limit=@{n}} parameter controls
>> +the maximum number of bytes compared by the inlined code.
>> +The default value is 64.
>> +
>>  @opindex mshorten-memrefs
>>  @item -mshorten-memrefs
>>  @itemx -mno-shorten-memrefs
>> --
>> 2.41.0
>>


[PATCH 2/7] RISC-V: Add intrinsic functions for crypto vector Zvbc extension

2023-12-03 Thread juzhe.zh...@rivai.ai
Hi, eswin.

Thanks for contributing vector crypto support.

It seems the patches are messed up. Could you rebase your patch cleanly onto
trunk GCC and send it again?

The patches look odd to me, for example:

 // ZVBB
-DEF_VECTOR_CRYPTO_FUNCTION (vandn, zvbb, full_preds, u_vvv_ops, zvkb_or_zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vandn, zvbb, full_preds, u_vvx_ops, zvkb_or_zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vbrev, zvbb, full_preds, u_vv_ops, zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vbrev8, zvbb, full_preds, u_vv_ops, zvkb_or_zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vrev8, zvbb, full_preds, u_vv_ops, zvkb_or_zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vclz, zvbb, none_m_preds, u_vv_ops, zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vctz, zvbb, none_m_preds, u_vv_ops, zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vrol, zvbb, full_preds, u_vvv_ops, zvkb_or_zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vrol, zvbb, full_preds, u_shift_vvx_ops, 
zvkb_or_zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vror, zvbb, full_preds, u_vvv_ops, zvkb_or_zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vror, zvbb, full_preds, u_shift_vvx_ops, 
zvkb_or_zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vwsll, zvbb, full_preds, u_wvv_ops, zvbb)
-DEF_VECTOR_CRYPTO_FUNCTION (vwsll, zvbb, full_preds, u_shift_wvx_ops, zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vandn,  zvbb_zvbc, full_preds, u_vvv_ops, 
zvkb_or_zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vandn,  zvbb_zvbc, full_preds, u_vvx_ops, 
zvkb_or_zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vbrev,  zvbb_zvbc, full_preds, u_vv_ops,  zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vbrev8, zvbb_zvbc, full_preds, u_vv_ops,  
zvkb_or_zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vrev8,  zvbb_zvbc, full_preds, u_vv_ops,  
zvkb_or_zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vclz,   zvbb_zvbc, none_m_preds, u_vv_ops, zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vctz,   zvbb_zvbc, none_m_preds, u_vv_ops, zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vrol,   zvbb_zvbc, full_preds, u_vvv_ops, 
zvkb_or_zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vrol,   zvbb_zvbc, full_preds, u_shift_vvx_ops, 
zvkb_or_zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vror,   zvbb_zvbc, full_preds, u_vvv_ops, 
zvkb_or_zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vror,   zvbb_zvbc, full_preds, u_shift_vvx_ops, 
zvkb_or_zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vwsll,  zvbb_zvbc, full_preds, u_wvv_ops, zvbb)
+DEF_VECTOR_CRYPTO_FUNCTION (vwsll,  zvbb_zvbc, full_preds, u_shift_wvx_ops, 
zvbb)

It seems your local development tree is messed up, which makes the patches hard to review.

I would expect the patches to be split as follows:

1. Add crypto march support (riscv-common.cc)
2. Add crypto machine descriptions (vector-crypto.md)
3. Add crypto builtin.
4. Add testcases.

Thanks.



juzhe.zh...@rivai.ai


[PATCH 7/7] RISC-V: Add intrinsic functions for crypto vector Zvksh extension

2023-12-03 Thread Feng Wang
This patch adds the intrinsic functions (according to
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md)
for the crypto vector Zvksh extension. Test cases for all of them are added
for API testing.

Co-Authored by: Songhe Zhu 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zvksh in riscv_implied_info.
* config/riscv/riscv-vector-builtins-bases.cc (class vaeskf2): Add new 
function_base for Zvksh.
(class vaeskf2_vsm3c): Ditto.
(class vsm3me): Ditto.
(BASE): Add Zvksh BASE declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct zvbb_zvbc_def): 
Add function_builder for Zvksh.
(struct crypto_vv_def): Ditto.
* config/riscv/riscv-vector-crypto-builtins-avail.h (AVAIL): Add enable 
condition.
* config/riscv/riscv-vector-crypto-builtins-functions.def (vsm4r): Add 
intrinsic def.
(vsm3me): Ditto.
(vsm3c): Ditto.
* config/riscv/riscv.md: Add Zvksh ins name.
* config/riscv/vector-crypto.md (sm3c): Add Zvksh md patterns.
(@pred_vaeskf2_scalar): Ditto.
(@pred_vi_nomaskedoff_scalar): Ditto.
(@pred_vsm3me): Ditto.
* config/riscv/vector.md: Add the corresponding attribute for Zvksh.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvk/zvk.exp:
* gcc.target/riscv/zvk/zvksh/vsm3c.c: New test.
* gcc.target/riscv/zvk/zvksh/vsm3c_overloaded.c: New test.
* gcc.target/riscv/zvk/zvksh/vsm3me.c: New test.
* gcc.target/riscv/zvk/zvksh/vsm3me_overloaded.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |  1 +
 .../riscv/riscv-vector-builtins-bases.cc  | 26 --
 .../riscv/riscv-vector-builtins-bases.h   |  2 +
 .../riscv/riscv-vector-builtins-shapes.cc | 10 ++--
 .../riscv-vector-crypto-builtins-avail.h  |  1 +
 ...riscv-vector-crypto-builtins-functions.def |  5 +-
 gcc/config/riscv/riscv.md |  4 +-
 gcc/config/riscv/vector-crypto.md | 43 +---
 gcc/config/riscv/vector.md| 12 ++---
 gcc/testsuite/gcc.target/riscv/zvk/zvk.exp|  2 +
 .../gcc.target/riscv/zvk/zvksh/vsm3c.c| 51 +++
 .../riscv/zvk/zvksh/vsm3c_overloaded.c| 51 +++
 .../gcc.target/riscv/zvk/zvksh/vsm3me.c   | 51 +++
 .../riscv/zvk/zvksh/vsm3me_overloaded.c   | 51 +++
 14 files changed, 286 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvksh/vsm3c.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvksh/vsm3c_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvksh/vsm3me.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvksh/vsm3me_overloaded.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 87595b135ef..dbb42ca2f1e 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -128,6 +128,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvknha",   "v"},
   {"zvknhb",   "v"},
   {"zvksed",   "v"},
+  {"zvksh","v"},
 
   {"zfh", "zfhmin"},
   {"zfhmin", "f"},
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 83309f07661..07a9dc49104 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2302,8 +2302,9 @@ public:
   }
 };
 
-/* Implements vaeskf2. */
-class vaeskf2 : public function_base
+/* Implements vaeskf2/vsm3c. */
+template
+class vaeskf2_vsm3c : public function_base
 {
 public:
   bool apply_mask_policy_p () const override { return false; }
@@ -2312,7 +2313,20 @@ public:
 
   rtx expand (function_expander ) const override
   {
-return e.use_exact_insn (code_for_pred_vaeskf2_scalar (e.vector_mode ()));
+return e.use_exact_insn (code_for_pred_vi_nomaskedoff_scalar (UNSPEC, 
e.vector_mode ()));
+  }
+};
+
+/* Implements vsm3me. */
+class vsm3me : public function_base
+{
+public:
+  bool apply_mask_policy_p () const override { return false; }
+  bool use_mask_predication_p () const override { return false; }
+
+  rtx expand (function_expander ) const override
+  {
+return e.use_exact_insn (code_for_pred_vsm3me (e.vector_mode ()));
   }
 };
 
@@ -2593,12 +2607,14 @@ static CONSTEXPR const crypto_vv  
vaesdf_obj;
 static CONSTEXPR const crypto_vv  vaesdm_obj;
 static CONSTEXPR const crypto_vv   vaesz_obj;
 static CONSTEXPR const crypto_vi vaeskf1_obj;
-static CONSTEXPR const vaeskf2 vaeskf2_obj;
+static CONSTEXPR const vaeskf2_vsm3c vaeskf2_obj;
 static CONSTEXPR const vg_nhab   vsha2ms_obj;
 static CONSTEXPR const vg_nhab   vsha2ch_obj;
 static CONSTEXPR const vg_nhab   vsha2cl_obj;
 static CONSTEXPR const crypto_vi   vsm4k_obj;
 static 

[PATCH 5/7] RISC-V: Add intrinsic functions for crypto vector Zvknh[ab] extension

2023-12-03 Thread Feng Wang
This patch adds the intrinsic functions (according to
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md)
for the crypto vector Zvknh[ab] extension. Test cases for all of them are
added for API testing.

Co-Authored by: Songhe Zhu 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zvknh[ab] in 
riscv_implied_info.
* config/riscv/riscv-vector-builtins-bases.cc (class vghsh): Add new 
function_base for Zvknh[ab].
(class vg_nhab): Ditto.
(BASE): Add Zvknh[ab] BASE declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def): 
Add function_builder for Zvknh[ab].
* config/riscv/riscv-vector-builtins.cc: Define new data struct for 
Zvknh[ab].
* config/riscv/riscv-vector-crypto-builtins-avail.h (AVAIL): Add enable 
condition.
* config/riscv/riscv-vector-crypto-builtins-functions.def (vaeskf2): 
Add intrinsic def.
(vsha2ms): Ditto.
(vsha2ch): Ditto.
(vsha2cl): Ditto.
* config/riscv/riscv.md: Add Zvknh[ab] ins name.
* config/riscv/vector-crypto.md (sha2ms): Add Zvknh[ab] md patterns.
(@pred_vghsh): Ditto.
(@pred_v): Ditto.
(@pred_vgmul): Ditto.
* config/riscv/vector.md: Add the corresponding attribute for Zvknh[ab].

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvk/zvk.exp:
* gcc.target/riscv/zvk/zvknha/vsha2ch.c: New test.
* gcc.target/riscv/zvk/zvknha/vsha2ch_overloaded.c: New test.
* gcc.target/riscv/zvk/zvknha/vsha2cl.c: New test.
* gcc.target/riscv/zvk/zvknha/vsha2cl_overloaded.c: New test.
* gcc.target/riscv/zvk/zvknha/vsha2ms.c: New test.
* gcc.target/riscv/zvk/zvknha/vsha2ms_overloaded.c: New test.
* gcc.target/riscv/zvk/zvknhb/vsha2ch.c: New test.
* gcc.target/riscv/zvk/zvknhb/vsha2ch_overloaded.c: New test.
* gcc.target/riscv/zvk/zvknhb/vsha2cl.c: New test.
* gcc.target/riscv/zvk/zvknhb/vsha2cl_overloaded.c: New test.
* gcc.target/riscv/zvk/zvknhb/vsha2ms.c: New test.
* gcc.target/riscv/zvk/zvknhb/vsha2ms_overloaded.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |  2 +
 .../riscv/riscv-vector-builtins-bases.cc  | 15 +++-
 .../riscv/riscv-vector-builtins-bases.h   |  3 +
 .../riscv/riscv-vector-builtins-shapes.cc |  7 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  6 ++
 .../riscv-vector-crypto-builtins-avail.h  |  2 +
 ...riscv-vector-crypto-builtins-functions.def | 10 ++-
 gcc/config/riscv/riscv.md | 27 +++---
 gcc/config/riscv/vector-crypto.md | 55 +---
 gcc/config/riscv/vector.md| 12 +--
 gcc/testsuite/gcc.target/riscv/zvk/zvk.exp|  4 +
 .../gcc.target/riscv/zvk/zvknha/vsha2ch.c | 51 
 .../riscv/zvk/zvknha/vsha2ch_overloaded.c | 51 
 .../gcc.target/riscv/zvk/zvknha/vsha2cl.c | 51 
 .../riscv/zvk/zvknha/vsha2cl_overloaded.c | 51 
 .../gcc.target/riscv/zvk/zvknha/vsha2ms.c | 51 
 .../riscv/zvk/zvknha/vsha2ms_overloaded.c | 51 
 .../gcc.target/riscv/zvk/zvknhb/vsha2ch.c | 83 +++
 .../riscv/zvk/zvknhb/vsha2ch_overloaded.c | 83 +++
 .../gcc.target/riscv/zvk/zvknhb/vsha2cl.c | 83 +++
 .../riscv/zvk/zvknhb/vsha2cl_overloaded.c | 83 +++
 .../gcc.target/riscv/zvk/zvknhb/vsha2ms.c | 83 +++
 .../riscv/zvk/zvknhb/vsha2ms_overloaded.c | 83 +++
 23 files changed, 892 insertions(+), 55 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvknha/vsha2ch.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zvk/zvknha/vsha2ch_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvknha/vsha2cl.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zvk/zvknha/vsha2cl_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvknha/vsha2ms.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zvk/zvknha/vsha2ms_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvknhb/vsha2ch.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zvk/zvknhb/vsha2ch_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvknhb/vsha2cl.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zvk/zvknhb/vsha2cl_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvknhb/vsha2ms.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/zvk/zvknhb/vsha2ms_overloaded.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 60a174d4801..7201ac3866c 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -125,6 +125,8 @@ static const riscv_implied_info_t riscv_implied_info[] =
   

[PATCH 6/7] RISC-V: Add intrinsic functions for crypto vector Zvksed extension.

2023-12-03 Thread Feng Wang
This patch adds the intrinsic functions (according to
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md)
for the crypto vector Zvksed extension. Test cases for all of them are added
for API testing.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zvksed in riscv_implied_info.
* config/riscv/riscv-vector-builtins-bases.cc (class vaeskf1): Add new 
function_base for Zvksed.
(class crypto_vi): Ditto.
(BASE): Add Zvksed BASE declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def): 
Add function_builder for Zvksed.
* config/riscv/riscv-vector-crypto-builtins-avail.h (AVAIL): Add enable 
condition.
* config/riscv/riscv-vector-crypto-builtins-functions.def (vsha2cl): 
Add intrinsic def.
(vsm4k): Ditto.
(vsm4r): Ditto.
* config/riscv/riscv.md: Add Zvksed ins name.
* config/riscv/vector-crypto.md (sm4k): Add Zvksed md patterns.
(@pred_vaeskf1_scalar): Ditto.
(@pred_crypto_vi_scalar): Ditto.
* config/riscv/vector.md: Add the corresponding attribute for Zvksed.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvk/zvk.exp:
* gcc.target/riscv/zvk/zvksed/vsm4k.c: New test.
* gcc.target/riscv/zvk/zvksed/vsm4k_overloaded.c: New test.
* gcc.target/riscv/zvk/zvksed/vsm4r.c: New test.
* gcc.target/riscv/zvk/zvksed/vsm4r_overloaded.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   1 +
 .../riscv/riscv-vector-builtins-bases.cc  |  13 +-
 .../riscv/riscv-vector-builtins-bases.h   |   2 +
 .../riscv/riscv-vector-builtins-shapes.cc |   2 +-
 .../riscv-vector-crypto-builtins-avail.h  |   1 +
 ...riscv-vector-crypto-builtins-functions.def |  10 +-
 gcc/config/riscv/riscv.md |   5 +-
 gcc/config/riscv/vector-crypto.md |  40 +++--
 gcc/config/riscv/vector.md|  20 ++-
 gcc/testsuite/gcc.target/riscv/zvk/zvk.exp|   3 +-
 .../gcc.target/riscv/zvk/zvksed/vsm4k.c   |  50 ++
 .../riscv/zvk/zvksed/vsm4k_overloaded.c   |  50 ++
 .../gcc.target/riscv/zvk/zvksed/vsm4r.c   | 170 ++
 .../riscv/zvk/zvksed/vsm4r_overloaded.c   | 170 ++
 14 files changed, 505 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvksed/vsm4k.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvksed/vsm4k_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvksed/vsm4r.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvksed/vsm4r_overloaded.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 7201ac3866c..87595b135ef 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -127,6 +127,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvkned",   "v"},
   {"zvknha",   "v"},
   {"zvknhb",   "v"},
+  {"zvksed",   "v"},
 
   {"zfh", "zfhmin"},
   {"zfhmin", "f"},
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index a3670ec5b38..83309f07661 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2288,8 +2288,9 @@ public:
   }
 };
 
-/* Implements vaeskf1. */
-class vaeskf1 : public function_base
+/* Implements vaeskf1/vsm4k. */
+template
+class crypto_vi : public function_base
 {
 public:
   bool apply_mask_policy_p () const override { return false; }
@@ -2297,7 +2298,7 @@ public:
 
   rtx expand (function_expander ) const override
   {
-return e.use_exact_insn (code_for_pred_vaeskf1_scalar (e.vector_mode ()));
+return e.use_exact_insn (code_for_pred_crypto_vi_scalar (UNSPEC, 
e.vector_mode ()));
   }
 };
 
@@ -2591,11 +2592,13 @@ static CONSTEXPR const crypto_vv  
vaesem_obj;
 static CONSTEXPR const crypto_vv  vaesdf_obj;
 static CONSTEXPR const crypto_vv  vaesdm_obj;
 static CONSTEXPR const crypto_vv   vaesz_obj;
-static CONSTEXPR const vaeskf1 vaeskf1_obj;
+static CONSTEXPR const crypto_vi vaeskf1_obj;
 static CONSTEXPR const vaeskf2 vaeskf2_obj;
 static CONSTEXPR const vg_nhab   vsha2ms_obj;
 static CONSTEXPR const vg_nhab   vsha2ch_obj;
 static CONSTEXPR const vg_nhab   vsha2cl_obj;
+static CONSTEXPR const crypto_vi   vsm4k_obj;
+static CONSTEXPR const crypto_vv   vsm4r_obj;
 
 /* Declare the function base NAME, pointing it to an instance
of class _obj.  */
@@ -2882,4 +2885,6 @@ BASE (vaeskf2)
 BASE (vsha2ms)
 BASE (vsha2ch)
 BASE (vsha2cl)
+BASE (vsm4k)
+BASE (vsm4r)
 } // end namespace riscv_vector
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.h 
b/gcc/config/riscv/riscv-vector-builtins-bases.h
index 0560b0008f0..e9e6d7bfe7f 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.h
+++ 

[PATCH 4/7] RISC-V: Add intrinsic functions for crypto vector Zvkned extension

2023-12-03 Thread Feng Wang
This patch adds the intrinsic functions (according to
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md)
for the crypto vector Zvkned extension. Test cases for all of them are added
for API testing.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zvkned in riscv_implied_info.
* config/riscv/riscv-vector-builtins-bases.cc (class crypto_vv): Add 
new function_base for Zvkned.
(class vaeskf1): Ditto.
(class vgmul): Ditto.
(class vaeskf2): Ditto.
(BASE): Add Zvkned BASE declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct zvbb_zvbc_def): 
Add new function_builder for Zvkned.
(struct crypto_vi_def): Ditto. 
(SHAPE): Add Zvkned SHAPE declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc 
(registered_function::overloaded_hash): Process the overloaded of size_t.
* config/riscv/riscv-vector-builtins.def (vi): Add new operator type.
* config/riscv/riscv-vector-crypto-builtins-avail.h (AVAIL): Add enable 
condition.
* config/riscv/riscv-vector-crypto-builtins-functions.def (vgmul): 
Optimize vgmul.
(vaesef): Add intrinsic def.
(vaesem): Ditto.
(vaesdf): Ditto.
(vaesdm): Ditto.
(vaesz): Ditto.
(vaeskf1): Ditto.
(vaeskf2): Ditto.
* config/riscv/riscv.md: Add Zvkned ins name.
* config/riscv/vector-crypto.md (aesef): Add Zvkned md patterns.
(vv): Ditto.
(@pred_crypto_vv): Ditto.
(@pred_crypto_vvx1_scalar): Ditto.
(@pred_crypto_vvx2_scalar): Ditto.
(@pred_crypto_vvx4_scalar): Ditto.
(@pred_crypto_vvx8_scalar): Ditto.
(@pred_crypto_vvx16_scalar): Ditto.
(@pred_vaeskf1_scalar): Ditto.
(@pred_vaeskf2_scalar): Ditto.
* config/riscv/vector-iterators.md: Add new iterators for Zvkned.
* config/riscv/vector.md: Add the corresponding attribute for Zvkned.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvk/zvk.exp:
* gcc.target/riscv/zvk/zvkned/vaesdf.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesdf_overloaded.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesdm.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesdm_overloaded.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesef.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesef_overloaded.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesem.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesem_overloaded.c: New test.
* gcc.target/riscv/zvk/zvkned/vaeskf1.c: New test.
* gcc.target/riscv/zvk/zvkned/vaeskf1_overloaded.c: New test.
* gcc.target/riscv/zvk/zvkned/vaeskf2.c: New test.
* gcc.target/riscv/zvk/zvkned/vaeskf2_overloaded.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesz.c: New test.
* gcc.target/riscv/zvk/zvkned/vaesz_overloaded.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   1 +
 .../riscv/riscv-vector-builtins-bases.cc  |  80 +++-
 .../riscv/riscv-vector-builtins-bases.h   |   7 +
 .../riscv/riscv-vector-builtins-shapes.cc |  41 +++-
 .../riscv/riscv-vector-builtins-shapes.h  |   1 +
 gcc/config/riscv/riscv-vector-builtins.cc |  62 +-
 gcc/config/riscv/riscv-vector-builtins.def|   1 +
 .../riscv-vector-crypto-builtins-avail.h  |   1 +
 ...riscv-vector-crypto-builtins-functions.def |  34 +++-
 gcc/config/riscv/riscv.md |   8 +-
 gcc/config/riscv/vector-crypto.md | 184 ++
 gcc/config/riscv/vector-iterators.md  |  32 +++
 gcc/config/riscv/vector.md|  23 ++-
 gcc/testsuite/gcc.target/riscv/zvk/zvk.exp|   2 +
 .../gcc.target/riscv/zvk/zvkned/vaesdf.c  | 169 
 .../riscv/zvk/zvkned/vaesdf_overloaded.c  | 169 
 .../gcc.target/riscv/zvk/zvkned/vaesdm.c  | 170 
 .../riscv/zvk/zvkned/vaesdm_overloaded.c  | 170 
 .../gcc.target/riscv/zvk/zvkned/vaesef.c  | 170 
 .../riscv/zvk/zvkned/vaesef_overloaded.c  | 170 
 .../gcc.target/riscv/zvk/zvkned/vaesem.c  | 170 
 .../riscv/zvk/zvkned/vaesem_overloaded.c  | 170 
 .../gcc.target/riscv/zvk/zvkned/vaeskf1.c |  50 +
 .../riscv/zvk/zvkned/vaeskf1_overloaded.c |  50 +
 .../gcc.target/riscv/zvk/zvkned/vaeskf2.c |  50 +
 .../riscv/zvk/zvkned/vaeskf2_overloaded.c |  50 +
 .../gcc.target/riscv/zvk/zvkned/vaesz.c   | 130 +
 .../riscv/zvk/zvkned/vaesz_overloaded.c   | 130 +
 28 files changed, 2278 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvkned/vaesdf.c
 create mode 100644 

[PATCH 3/7] RISC-V: Add intrinsic functions for crypto vector Zvkg extension

2023-12-03 Thread Feng Wang
This patch adds the intrinsic functions (according to
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md)
for the crypto vector Zvkg extension. All the test cases are added for
API testing.

Co-Authored by: Songhe Zhu 
Co-Authored by: Ciyan Pan 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zvkg in riscv_implied_info.
* config/riscv/riscv-vector-builtins-bases.cc (class vghsh): Add new
function_base for Zvkg.
(class vgmul): Ditto.
(BASE): Add Zvkg BASE declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct crypto_vv_def):
Add function_builder for Zvkg.
(SHAPE): Add Zvkg SHAPE declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc: Define new data struct for
Zvkg.
* config/riscv/riscv-vector-crypto-builtins-avail.h (AVAIL): Add enable
condition.
* config/riscv/riscv-vector-crypto-builtins-functions.def (vghsh): Add
intrinsic def.
(vgmul): Ditto.
* config/riscv/riscv.md: Add Zvkg ins name.
* config/riscv/vector-crypto.md (@pred_vghsh): Add Zvkg md 
patterns.
(@pred_vgmul): Ditto.
* config/riscv/vector-iterators.md: Add new iterators for Zvkg.
* config/riscv/vector.md: Add the corresponding attribute for Zvkg.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvk/zvk.exp:
* gcc.target/riscv/zvk/zvkg/vghsh.c: New test.
* gcc.target/riscv/zvk/zvkg/vghsh_overloaded.c: New test.
* gcc.target/riscv/zvk/zvkg/vgmul.c: New test.
* gcc.target/riscv/zvk/zvkg/vgmul_overloaded.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |  1 +
 .../riscv/riscv-vector-builtins-bases.cc  | 29 +++
 .../riscv/riscv-vector-builtins-bases.h   |  2 +
 .../riscv/riscv-vector-builtins-shapes.cc | 23 +
 .../riscv/riscv-vector-builtins-shapes.h  |  1 +
 gcc/config/riscv/riscv-vector-builtins.cc | 15 ++
 .../riscv-vector-crypto-builtins-avail.h  |  1 +
 ...riscv-vector-crypto-builtins-functions.def |  3 ++
 gcc/config/riscv/riscv.md |  4 +-
 gcc/config/riscv/vector-crypto.md | 43 +++-
 gcc/config/riscv/vector-iterators.md  |  4 ++
 gcc/config/riscv/vector.md| 19 +++
 gcc/testsuite/gcc.target/riscv/zvk/zvk.exp|  2 +
 .../gcc.target/riscv/zvk/zvkg/vghsh.c | 51 +++
 .../riscv/zvk/zvkg/vghsh_overloaded.c | 51 +++
 .../gcc.target/riscv/zvk/zvkg/vgmul.c | 51 +++
 .../riscv/zvk/zvkg/vgmul_overloaded.c | 51 +++
 17 files changed, 340 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvkg/vghsh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvkg/vghsh_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvkg/vgmul.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvkg/vgmul_overloaded.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index 296500e15df..3eefd0263f9 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -123,6 +123,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvbb",  "zvkb"},
   {"zvbc", "v"},
   {"zvkb", "v"},
+  {"zvkg", "v"},
 
   {"zfh", "zfhmin"},
   {"zfhmin", "f"},
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 45b1e563ff4..0cb9b2925af 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2227,6 +2227,31 @@ public:
   }
 };
 
+class vghsh : public function_base
+{
+public:
+  bool apply_mask_policy_p () const override { return false; }
+  bool use_mask_predication_p () const override { return false; }
+  bool has_merge_operand_p () const override { return false; }
+  rtx expand (function_expander &e) const override
+  {
+  return e.use_exact_insn (code_for_pred_vghsh (e.vector_mode ()));
+  }
+};
+
+
+class vgmul : public function_base
+{
+public:
+  bool apply_mask_policy_p () const override { return false; }
+  bool use_mask_predication_p () const override { return false; }
+  bool has_merge_operand_p () const override { return false; }
+  rtx expand (function_expander &e) const override
+  {
+  return e.use_exact_insn (code_for_pred_vgmul (e.vector_mode ()));
+  }
+};
+
 static CONSTEXPR const vsetvl vsetvl_obj;
 static CONSTEXPR const vsetvl vsetvlmax_obj;
 static CONSTEXPR const loadstore vle_obj;
@@ -2496,6 +2521,8 @@ static CONSTEXPR const vcltzvctz_obj;
 static CONSTEXPR const vwsll vwsll_obj;
 static CONSTEXPR const clmul  vclmul_obj;
 static CONSTEXPR const clmul 

[PATCH 2/7] RISC-V: Add intrinsic functions for crypto vector Zvbc extension

2023-12-03 Thread Feng Wang
This patch adds the intrinsic functions (according to
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/vector-crypto/auto-generated/vector-crypto/intrinsic_funcs.md)
for the crypto vector Zvbc extension. All the test cases are added for
API testing.

Co-Authored by: Songhe Zhu 

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Add Zvbc in riscv_implied_info.
* config/riscv/riscv-vector-builtins-bases.cc (class clmul): Add new
function_base for Zvbc.
(BASE): Add Zvbc BASE declaration.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-shapes.cc (struct zvbb_def): Add 
new function_builder for Zvbc.
(struct zvbb_zvbc_def): Combine function_base of Zvbb and Zvbc.
(SHAPE): Add Zvbc SHAPE declaration.
* config/riscv/riscv-vector-builtins-shapes.h: Ditto.
* config/riscv/riscv-vector-builtins.cc
(DEF_RVV_CRYPTO_SEW32_OPS): Define new data struct for Zvbc.
(DEF_RVV_CRYPTO_SEW64_OPS): Ditto.
* config/riscv/riscv-vector-crypto-builtins-avail.h (AVAIL): Add enable 
condition.
* config/riscv/riscv-vector-crypto-builtins-functions.def (vandn): Add
intrinsic def.
(vbrev): Ditto.
(vbrev8): Ditto.
(vrev8): Ditto.
(vclz): Ditto.
(vctz): Ditto.
(vrol): Ditto.
(vror): Ditto.
(vwsll): Ditto.
(vclmul): Ditto.
(vclmulh): Ditto.
* config/riscv/riscv.md: Add Zvbc ins name.
* config/riscv/vector-crypto.md (h): Add Zvbc md patterns.
(@pred_vclmul): Ditto.
(@pred_vclmul_scalar): Ditto.
* config/riscv/vector-iterators.md: Add new iterators for Zvbc.
* config/riscv/vector.md: Add the corresponding attribute for Zvbc.
* config/riscv/riscv-vector-crypto-builtins-types.def: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/zvk/zvk.exp:
* gcc.target/riscv/zvk/zvbc/vclmul.c: New test.
* gcc.target/riscv/zvk/zvbc/vclmul_overloaded.c: New test.
* gcc.target/riscv/zvk/zvbc/vclmulh.c: New test.
* gcc.target/riscv/zvk/zvbc/vclmulh_overloaded.c: New test.
---
 gcc/common/config/riscv/riscv-common.cc   |   1 +
 .../riscv/riscv-vector-builtins-bases.cc  |  22 ++
 .../riscv/riscv-vector-builtins-bases.h   |   2 +
 .../riscv/riscv-vector-builtins-shapes.cc |   6 +-
 .../riscv/riscv-vector-builtins-shapes.h  |   2 +-
 gcc/config/riscv/riscv-vector-builtins.cc |  29 +++
 .../riscv-vector-crypto-builtins-avail.h  |   1 +
 ...riscv-vector-crypto-builtins-functions.def |  31 +--
 .../riscv-vector-crypto-builtins-types.def|  21 ++
 gcc/config/riscv/riscv.md |   5 +-
 gcc/config/riscv/vector-crypto.md |  50 +
 gcc/config/riscv/vector-iterators.md  |   5 +
 gcc/config/riscv/vector.md|  14 +-
 .../gcc.target/riscv/zvk/zvbc/vclmul.c| 208 ++
 .../riscv/zvk/zvbc/vclmul_overloaded.c| 208 ++
 .../gcc.target/riscv/zvk/zvbc/vclmulh.c   | 208 ++
 .../riscv/zvk/zvbc/vclmulh_overloaded.c   | 208 ++
 gcc/testsuite/gcc.target/riscv/zvk/zvk.exp|   2 +
 18 files changed, 998 insertions(+), 25 deletions(-)
 create mode 100755 gcc/config/riscv/riscv-vector-crypto-builtins-types.def
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvbc/vclmul.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvbc/vclmul_overloaded.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvbc/vclmulh.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/zvk/zvbc/vclmulh_overloaded.c

diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index a5fb492c690..296500e15df 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -121,6 +121,7 @@ static const riscv_implied_info_t riscv_implied_info[] =
   {"zvksg", "zvks"},
   {"zvksg", "zvkg"},
   {"zvbb",  "zvkb"},
+  {"zvbc", "v"},
   {"zvkb", "v"},
 
   {"zfh", "zfhmin"},
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index e41343b4a1a..45b1e563ff4 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -2209,6 +2209,24 @@ public:
   }
 };
 
+template<int UNSPEC>
+class clmul : public function_base
+{
+public:
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vv:
+return e.use_exact_insn (code_for_pred_vclmul (UNSPEC, e.vector_mode ()));
+  case OP_TYPE_vx:
+return e.use_exact_insn (code_for_pred_vclmul_scalar (UNSPEC, e.vector_mode ()));
+  default:
+gcc_unreachable ();
+  }
+  }
+};
+
 static CONSTEXPR const vsetvl vsetvl_obj;
 static CONSTEXPR const vsetvl vsetvlmax_obj;
 static CONSTEXPR const 

Re: [PATCH] pro_and_epilogue: Call df_note_add_problem () if SHRINK_WRAPPING_ENABLED [PR112760]

2023-12-03 Thread Andrew Pinski
On Sat, Dec 2, 2023 at 3:04 AM Richard Sandiford
 wrote:
>
> Jakub Jelinek  writes:
> > Hi!
> >
> > The following testcase ICEs on x86_64-linux since df_note_add_problem ()
> > call has been added to mode switching.
> > The problem is that the pro_and_epilogue pass in
> > prepare_shrink_wrap -> copyprop_hardreg_forward_bb_without_debug_insn
> > uses regcprop.cc infrastructure which relies on accurate REG_DEAD/REG_UNUSED
> > notes.  E.g. regcprop.cc
> >   /* We need accurate notes.  Earlier passes such as if-conversion may
> >  leave notes in an inconsistent state.  */
> >   df_note_add_problem ();
> > documents that.  On the testcase below it is in particular the
> >   /* Detect obviously dead sets (via REG_UNUSED notes) and remove them. 
> >  */
> >   if (set
> >   && !RTX_FRAME_RELATED_P (insn)
> >   && NONJUMP_INSN_P (insn)
> >   && !may_trap_p (set)
> >   && find_reg_note (insn, REG_UNUSED, SET_DEST (set))
> >   && !side_effects_p (SET_SRC (set))
> >   && !side_effects_p (SET_DEST (set)))
> > {
> >   bool last = insn == BB_END (bb);
> >   delete_insn (insn);
> >   if (last)
> > break;
> >   continue;
> > }
> > case where a stale REG_UNUSED note breaks stuff up (added in vzeroupper
> > pass, redundant insn after it deleted later).
> >
> > The following patch makes sure the notes are not stale if shrink wrapping
> > is enabled.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> I still maintain that so much stuff relies on the lack of false-positive
> REG_UNUSED notes that (whatever the intention might have been) we need
> to prevent the false positive.  Like Andrew says, any use of single_set
> is suspect if there's a REG_UNUSED note for something that is in fact used.
>
> So sorry to be awkward, but I don't think this is the way to go.  I think
> we'll just end up playing whack-a-mole and adding df_note_add_problem to
> lots of passes.
>
> (FTR, I'm not saying passes have to avoid false negatives, just false
> positives.  If a pass updates an instruction with a REG_UNUSED note,
> and the pass is no longer sure whether the register is unused or not,
> the pass can just delete the note.)

Just FYI. This issue with single_use did come up back in 2009 but it
seems like it was glossed over until now.
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=40209#c5 and the
other comments in that bug report, where the patch which fixed the ICE
just added a call to df_note_add_problem.
We definitely need to figure out what should be done here for
single_use; split off the use of REG_UNUSED from single_use and use
df_single_use for that like what was mentioned there. Or just add
df_note_add_problem in other places or fix postreload not to do the wrong
thing (but there are definitely other passes, like those mentioned in that
bug report, that do not update REG_UNUSED, e.g. gcse).
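
To make the false-positive concern concrete, here is a toy model (not GCC
code; Insn, recompute_unused_notes and delete_unused_sets are hypothetical
names made up for illustration). It shows a consumer that trusts a stale
"unused" note deleting a set that is in fact used, and a backward liveness
scan over one straight-line block, loosely in the spirit of what
df_note_add_problem plus df_analyze provide, avoiding that.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy instruction: sets one register from some source registers.
struct Insn
{
  int dest;
  std::vector<int> srcs;
  bool unused_note;   // models a REG_UNUSED note for `dest`
};

// Recompute the notes with a backward scan over one straight-line block.
// `live` starts as the set of registers still needed after the block.
inline void recompute_unused_notes (std::vector<Insn> &block, std::set<int> live)
{
  for (std::size_t i = block.size (); i-- > 0;)
    {
      Insn &insn = block[i];
      insn.unused_note = live.count (insn.dest) == 0;
      live.erase (insn.dest);
      live.insert (insn.srcs.begin (), insn.srcs.end ());
    }
}

// Delete every instruction whose note claims the result is unused,
// trusting the note blindly.  Returns the number of deleted instructions.
inline std::size_t delete_unused_sets (std::vector<Insn> &block)
{
  std::size_t removed = 0;
  for (std::size_t i = 0; i < block.size ();)
    if (block[i].unused_note)
      {
        block.erase (block.begin () + i);
        ++removed;
      }
    else
      ++i;
  return removed;
}

// r1 = ...; r2 = r1.  The first insn carries a stale note: r1 is used.
inline std::vector<Insn> block_with_stale_note ()
{
  std::vector<Insn> block;
  block.push_back (Insn{1, {}, true});
  block.push_back (Insn{2, {1}, false});
  return block;
}
```

With the stale note left in place, delete_unused_sets removes the set of r1
even though the next instruction reads it; after recompute_unused_notes with
r2 live on exit, nothing is deleted.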

Thanks,
Andrew


>
> Richard
>
> > 2023-12-02  Jakub Jelinek  
> >
> >   PR rtl-optimization/112760
> >   * function.cc (thread_prologue_and_epilogue_insns): If
> >   SHRINK_WRAPPING_ENABLED, call df_note_add_problem before calling
> >   df_analyze.
> >
> >   * gcc.dg/pr112760.c: New test.
> >
> > --- gcc/function.cc.jj2023-11-07 08:32:01.699254744 +0100
> > +++ gcc/function.cc   2023-12-01 13:42:51.885189341 +0100
> > @@ -6036,6 +6036,11 @@ make_epilogue_seq (void)
> >  void
> >  thread_prologue_and_epilogue_insns (void)
> >  {
> > +  /* prepare_shrink_wrap uses 
> > copyprop_hardreg_forward_bb_without_debug_insn
> > + which uses regcprop.cc functions which rely on accurate REG_UNUSED
> > + and REG_DEAD notes.  */
> > +  if (SHRINK_WRAPPING_ENABLED)
> > +df_note_add_problem ();
> >df_analyze ();
> >
> >/* Can't deal with multiple successors of the entry block at the
> > --- gcc/testsuite/gcc.dg/pr112760.c.jj2023-12-01 13:46:57.444746529 
> > +0100
> > +++ gcc/testsuite/gcc.dg/pr112760.c   2023-12-01 13:46:36.729036971 +0100
> > @@ -0,0 +1,22 @@
> > +/* PR rtl-optimization/112760 */
> > +/* { dg-do run } */
> > +/* { dg-options "-O2 -fno-dce -fno-guess-branch-probability 
> > --param=max-cse-insns=0" } */
> > +/* { dg-additional-options "-m8bit-idiv -mavx" { target i?86-*-* 
> > x86_64-*-* } } */
> > +
> > +unsigned g;
> > +
> > +__attribute__((__noipa__)) unsigned short
> > +foo (unsigned short a, unsigned short b)
> > +{
> > +  unsigned short x = __builtin_add_overflow_p (a, g, (unsigned short) 0);
> > +  g -= g / b;
> > +  return x;
> > +}
> > +
> > +int
> > +main ()
> > +{
> > +  unsigned short x = foo (40, 6);
> > +  if (x != 0)
> > +__builtin_abort ();
> > +}
> >
> >   Jakub


Re: [PATCH] RISC-V: Document optimization parameter riscv-strcmp-inline-limit

2023-12-03 Thread Kito Cheng
LGTM

On Sun, Dec 3, 2023 at 5:16 AM Christoph Müllner <
christoph.muell...@vrull.eu> wrote:

> This patch documents the optimization parameter
> riscv-strcmp-inline-limit, which can be used to tweak the behaviour
> of -minline-strcmp and -minline-strncmp.
>
> gcc/ChangeLog:
>
> PR target/112650
> * doc/invoke.texi: Document riscv-strcmp-inline-limit.
>
> Signed-off-by: Christoph Müllner 
> ---
>  gcc/doc/invoke.texi | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 2fab4c5d71f..ba2d843b484 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -29846,6 +29846,10 @@ Inlining will only be done if the strings are
> properly aligned
>  and instructions for accelerated processing are available.
>  The default is to not inline strcmp calls.
>
> +The @option{--param riscv-strcmp-inline-limit=@{n}} parameter controls
> +the maximum number of bytes compared by the inlined code.
> +The default value is 64.
> +
>  @opindex minline-strncmp
>  @item -minline-strncmp
>  @itemx -mno-inline-strncmp
> @@ -29854,6 +29858,10 @@ Inlining will only be done if the strings are
> properly aligned
>  and instructions for accelerated processing are available.
>  The default is to not inline strncmp calls.
>
> +The @option{--param riscv-strcmp-inline-limit=@{n}} parameter controls
> +the maximum number of bytes compared by the inlined code.
> +The default value is 64.
> +
>  @opindex mshorten-memrefs
>  @item -mshorten-memrefs
>  @itemx -mno-shorten-memrefs
> --
> 2.41.0
>
>
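
As a rough illustration of what "the maximum number of bytes compared by the
inlined code" means, here is a conceptual model in portable C++. It is only a
sketch: strcmp_with_inline_limit is a hypothetical name, and the fallback to
the library routine once the limit is reached is an assumption about the
expansion, not something the patch above states.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model: compare at most `limit` bytes "inline", then defer
// to the library strcmp for whatever remains (assumed behaviour).
inline int strcmp_with_inline_limit (const char *a, const char *b,
                                     std::size_t limit)
{
  for (std::size_t i = 0; i < limit; ++i)
    {
      unsigned char ca = static_cast<unsigned char> (a[i]);
      unsigned char cb = static_cast<unsigned char> (b[i]);
      if (ca != cb)
        return ca < cb ? -1 : 1;   // first differing byte decides
      if (ca == '\0')
        return 0;                  // both strings ended together
    }
  // Limit reached without a decision: hand the rest to the library.
  return std::strcmp (a + limit, b + limit);
}
```

In this model a larger limit lets more comparisons finish without a call,
at the cost of more inlined code.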


Re: [PATCH v2] rs6000: Add new pass for replacement of contiguous addresses vector load lxv with lxvp

2023-12-03 Thread Kewen.Lin
Hi Ajit,

on 2023/12/1 17:10, Ajit Agarwal wrote:
> Hello Kewen:
> 
> On 24/11/23 3:01 pm, Kewen.Lin wrote:
>> Hi Ajit,
>>
>> Don't forget to CC David (CC-ed) :), some comments are inlined below.
>>
>> on 2023/10/8 03:04, Ajit Agarwal wrote:
>>> Hello All:
>>>
>>> This patch adds a new pass to replace contiguous-address vector loads (lxv)
>>> with the MMA instruction lxvp.
>>
>> IMHO the current binding of lxvp (and lxvpx, stxvp{x,}) to MMA looks wrong:
>> only Power10 and VSX are required, and these instructions should perform
>> well without MMA support.  So a patch separating their support from MMA
>> should go first.
>>
> 
> I will make the changes for Power10 and VSX.
> 
>>> This patch addresses one regression failure on the ARM architecture.
>>
>> Could you explain this?  I don't see any test case for this.
> 
> I submitted v1 of the patch and there were regression failures for
> Linaro.  I have fixed them in v2.

OK, thanks for clarifying.  So some unexpected changes on generic code in v1
caused the failure exposed on arm.

> 
>  
>> Besides, it seems a bad idea to put this pass after reload: once register
>> allocation finishes, this pairing has to be restricted by the reg No. (I
>> didn't see any checking on the reg No. relationship for pairing, btw.)
>>
> 
> Adding before reload pass deletes one of the lxv and replaced with lxvp. This
> fails in reload pass while freeing reg_eqivs as ira populates them and then

I can't find reg_eqivs; I guess you meant reg_equivs and moved this pass
right before pass_reload (between pass_ira and pass_reload)?  IMHO that's
unexpected, as those two passes are closely correlated.  I was expecting it
to be put somewhere before ira.

> vecload pass deletes some of insns and while freeing in reload pass as insn
> is already deleted in vecload pass reload pass segfaults.
> 
> Moving vecload pass before ira will not make register pairs with lxvp and
> in ira and that will be a problem.

Could you elaborate on the obstacle to moving such a pass before pass_ira?

Based on the status quo, lxvp is bundled with OOmode, so I'd expect
we can generate an OOmode move (load) and use the components with unspec (or
subreg with Peter's patch) to replace all the previous use places; it looks
doable to me.
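
For contrast, the register-number restriction that a post-reload pass would
have to check can be sketched as below. This is a toy model, not rs6000 code:
VecLoad and can_pair_after_reload are hypothetical names, the even/odd
adjacent-register requirement is the assumption being illustrated, and
endianness (which half lands in which register) is deliberately ignored.

```cpp
#include <cassert>

// Toy description of a 16-byte vector load after register allocation.
struct VecLoad
{
  int base_reg;   // address base register
  long offset;    // byte offset from the base
  int dest_vsr;   // hard VSX register assigned to the result
};

// Two loads can be fused into one paired load only if they read 32
// contiguous bytes AND their results already sit in an even/odd pair of
// adjacent registers (assumed constraint for this sketch).
inline bool can_pair_after_reload (const VecLoad &lo, const VecLoad &hi)
{
  bool contiguous = lo.base_reg == hi.base_reg
                    && hi.offset == lo.offset + 16;
  bool adjacent_regs = lo.dest_vsr % 2 == 0
                       && hi.dest_vsr == lo.dest_vsr + 1;
  return contiguous && adjacent_regs;
}
```

Before register allocation only the contiguity half of the check would be
needed, since the allocator could then be asked for a suitable pair.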

> 
> Making after reload pass is the only solution I see as ira and reload pass
> makes register pairs and vecload pass will be easier with generation of
> lxvp.
> 
> Please suggest.
>  
>> Looking forward to the comments from Segher/David/Peter/Mike etc.

Still looking forward. :)

BR,
Kewen


[PATCH] Workaround array_slice constructor portability issues (with older g++).

2023-12-03 Thread Roger Sayle

The recent change to represent language and target attribute tables using
vec.h's array_slice template class triggers an issue/bug in older g++
compilers, specifically the g++ 4.8.5 system compiler of older RedHat
distributions.  This exhibits as the following compilation errors during
bootstrap:

../../gcc/gcc/c/c-lang.cc:55:2661: error: could not convert '(const
scoped_attribute_specs* const*)(& c_objc_attribute_table)' from 'const
scoped_attribute_specs* const*' to 'array_slice'
 struct lang_hooks lang_hooks = LANG_HOOKS_INITIALIZER;

../../gcc/gcc/c/c-decl.cc:4657:1: error: could not convert '(const
attribute_spec*)(& std_attributes)' from 'const attribute_spec*' to
'array_slice'

Here the issue is with constructors of the form:

static const int table[] = { 1, 2, 3 };
array_slice<const int> t = table;

Perhaps there's a fix possible in vec.h (an additional constructor?), but
the patch below fixes this issue by using one of array_slice's constructors
(that takes a size) explicitly, rather than rely on template resolution.
In the example above this looks like:

array_slice<const int> t (table, 3);

or equivalently

array_slice<const int> t = array_slice<const int>(table, 3);

or equivalently

array_slice<const int> t = array_slice<const int>(table, ARRAY_SIZE (table));


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap,
where these changes allow the bootstrap to complete.  Ok for mainline?
This fix might not be ideal, but it both draws attention to the problem
and restores bootstrap whilst better approaches are investigated.  For
example, an ARRAY_SLICE(table) macro might be appropriate if there isn't
an easy/portable template resolution solution.  Thoughts?
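
To illustrate the kind of helper meant here, below is a minimal sketch using
a simplified stand-in class. Everything in it is hypothetical: slice is not
vec.h's array_slice, and SKETCH_ARRAY_SIZE and make_slice are made-up names.
It shows the explicit pointer-plus-size construction next to a function
template that deduces element type and length from the array, which is one
possible alternative to a macro.

```cpp
#include <cassert>
#include <cstddef>

// Simplified, hypothetical stand-in for an array-slice class.
template <typename T>
class slice
{
public:
  // Explicit pointer-plus-length constructor: nothing to deduce.
  slice (T *base, std::size_t size) : m_base (base), m_size (size) {}

  std::size_t size () const { return m_size; }

  T &operator[] (std::size_t i) const
  {
    assert (i < m_size);
    return m_base[i];
  }

private:
  T *m_base;
  std::size_t m_size;
};

// Element count of a built-in array.
#define SKETCH_ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0]))

// Hypothetical helper: deduce element type and length from the array.
template <typename T, std::size_t N>
slice<T> make_slice (T (&array)[N])
{
  return slice<T> (array, N);
}

static const int table[] = { 1, 2, 3 };

// Sum the elements through the slice interface.
inline int sum (slice<const int> s)
{
  int total = 0;
  for (std::size_t i = 0; i < s.size (); ++i)
    total += s[i];
  return total;
}
```

Both sum (slice<const int> (table, SKETCH_ARRAY_SIZE (table))) and
sum (make_slice (table)) walk the same three elements; whether a helper of
this shape is acceptable to the older compilers in question is not verified
here.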


2023-12-03  Roger Sayle  

gcc/c-family/ChangeLog
* c-attribs.cc (c_common_gnu_attribute_table): Use an explicit
array_slice constructor with an explicit size argument.
(c_common_format_attribute_table): Likewise.

gcc/c/ChangeLog
* c-decl.cc (std_attribute_table): Use an explicit
array_slice constructor with an explicit size argument.
* c-objc-common.h (LANG_HOOKS_ATTRIBUTE_TABLE): Likewise.

gcc/ChangeLog
* config/i386/i386-options.cc (ix86_gnu_attribute_table): Use an
explicit array_slice constructor with an explicit size argument.
* config/i386/i386.cc (TARGET_ATTRIBUTE_TABLE): Likewise.

gcc/cp/ChangeLog
* cp-objcp-common.h (LANG_HOOKS_ATTRIBUTE_TABLE): Use an
explicit array_slice constructor with an explicit size argument.
* tree.cc (cxx_gnu_attribute_table): Likewise.
(std_attribute_table): Likewise.

gcc/lto/ChangeLog
* lto-lang.cc (lto_gnu_attribute_table): Use an explicit
array_slice constructor with an explicit size argument.
(lto_format_attribute_table): Likewise.
(LANG_HOOKS_ATTRIBUTE_TABLE): Likewise.


Thanks in advance,
Roger
--

diff --git a/gcc/c-family/c-attribs.cc b/gcc/c-family/c-attribs.cc
index 45af074..af83588 100644
--- a/gcc/c-family/c-attribs.cc
+++ b/gcc/c-family/c-attribs.cc
@@ -584,7 +584,9 @@ const struct attribute_spec c_common_gnu_attributes[] =
 
 const struct scoped_attribute_specs c_common_gnu_attribute_table =
 {
-  "gnu", c_common_gnu_attributes
+  "gnu",
+  array_slice(c_common_gnu_attributes,
+   ARRAY_SIZE (c_common_gnu_attributes))
 };
 
 /* Give the specifications for the format attributes, used by C and all
@@ -603,7 +605,9 @@ const struct attribute_spec c_common_format_attributes[] =
 
 const struct scoped_attribute_specs c_common_format_attribute_table =
 {
-  "gnu", c_common_format_attributes
+  "gnu",
+  array_slice(c_common_format_attributes,
+   ARRAY_SIZE (c_common_format_attributes))
 };
 
 /* Returns TRUE iff the attribute indicated by ATTR_ID takes a plain
diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
index 248d1bb..a6984b0 100644
--- a/gcc/c/c-decl.cc
+++ b/gcc/c/c-decl.cc
@@ -4653,7 +4653,8 @@ static const attribute_spec std_attributes[] =
 
 const scoped_attribute_specs std_attribute_table =
 {
-  nullptr, std_attributes
+  nullptr, array_slice(std_attributes,
+ARRAY_SIZE (std_attributes))
 };
 
 /* Create the predefined scalar types of C,
diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
index 426d938..021c651 100644
--- a/gcc/c/c-objc-common.h
+++ b/gcc/c/c-objc-common.h
@@ -83,7 +83,8 @@ static const scoped_attribute_specs *const 
c_objc_attribute_table[] =
 };
 
 #undef LANG_HOOKS_ATTRIBUTE_TABLE
-#define LANG_HOOKS_ATTRIBUTE_TABLE c_objc_attribute_table
+#define LANG_HOOKS_ATTRIBUTE_TABLE \
+array_slice (c_objc_attribute_table, 
ARRAY_SIZE (c_objc_attribute_table))
 
 #undef LANG_HOOKS_TREE_DUMP_DUMP_TREE_FN
 #define LANG_HOOKS_TREE_DUMP_DUMP_TREE_FN c_dump_tree
diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 8776592..50b3425 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -4171,7 +4171,9 @@ static const 

[Committed] RISC-V: Robostify the W43, W86, W87 constraint enabled attribute

2023-12-03 Thread Juzhe-Zhong
Committed as it is an obvious fix.

gcc/ChangeLog:

* config/riscv/riscv.md: Robustify the constraints.

---
 gcc/config/riscv/riscv.md | 19 +--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 4c6f63677df..ff521454876 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -515,13 +515,28 @@
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) 
!= 2"))
 (const_string "no")
 
- (and (eq_attr "group_overlap" "W42,W43")
+ (and (eq_attr "group_overlap" "W42")
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) 
!= 4"))
 (const_string "no")
 
- (and (eq_attr "group_overlap" "W84,W86,W87")
+ (and (eq_attr "group_overlap" "W84")
  (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) 
!= 8"))
 (const_string "no")
+
+ ;; According to RVV ISA:
+ ;; The destination EEW is greater than the source EEW, the source 
EMUL is at least 1,
+;; and the overlap is in the highest-numbered part of the destination 
register group
+;; (e.g., when LMUL=8, vzext.vf4 v0, v6 is legal, but a source of v0, 
v2, or v4 is not).
+;; So the source operand should have LMUL >= 1.
+ (and (eq_attr "group_overlap" "W43")
+ (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) 
!= 4
+  && riscv_get_v_regno_alignment (GET_MODE 
(operands[3])) >= 1"))
+(const_string "no")
+
+ (and (eq_attr "group_overlap" "W86,W87")
+ (match_test "riscv_get_v_regno_alignment (GET_MODE (operands[0])) 
!= 8
+  && riscv_get_v_regno_alignment (GET_MODE 
(operands[3])) >= 1"))
+(const_string "no")
 ]
(const_string "yes")))
 
-- 
2.36.3
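
The RVV overlap rule quoted in the comment above can be modelled with a small
standalone check. This is only a sketch of the quoted rule, not the riscv.md
logic: RegGroup, groups_overlap and widening_overlap_ok are hypothetical
names, and register groups are counted in whole registers, so a source EMUL
of at least 1 is assumed by construction.

```cpp
#include <cassert>

// Register group [first, first + emul); emul counts whole registers.
struct RegGroup
{
  int first;
  int emul;
};

inline bool groups_overlap (RegGroup a, RegGroup b)
{
  return a.first < b.first + b.emul && b.first < a.first + a.emul;
}

// Widening case (destination EEW greater than source EEW): an overlap is
// acceptable only when the source occupies the highest-numbered part of
// the destination register group.
inline bool widening_overlap_ok (RegGroup dest, RegGroup src)
{
  assert (dest.emul >= 1 && src.emul >= 1 && src.emul <= dest.emul);
  if (!groups_overlap (dest, src))
    return true;   // disjoint groups need no further check
  return src.first + src.emul == dest.first + dest.emul;
}
```

For the example in the comment (LMUL=8, vzext.vf4 with destination v0), the
source group is two registers wide: a source starting at v6 passes, while
sources starting at v0, v2 or v4 do not.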



Re: [committed] Fix gnu23-builtins-no-dfp

2023-12-03 Thread Jeff Law




On 12/3/23 05:23, Thomas Schwinge wrote:

Hi!

On 2023-12-03T08:41:59+0100, Florian Weimer  wrote:

* Jeff Law:


Anyway, this test was the one I was most concerned about.  Basically
we're testing that on a !dfp target the builtins are not available.
It expects a warning, but gets an error by default now.  I just
changed the test to use -fpermissive, so that the test behaves as it did
previously.


In these ambiguous cases, I cloned tests into -fpermissive and error
variants.  This might be appropriate here as well (or I should remove
the clones again if those are the wrong thing to do).


For that test case, it did seem appropriate to me to simply
's%dg-warning%dg-error', which I already had posted in

"c: Turn -Wimplicit-function-declaration into a permerror: Fix 
'gcc.dg/gnu23-builtins-no-dfp-1.c'",
awaiting review.  Rationale: For this test case it's secondary *how*
"implicit declaration of function" is diagnosed, so I'd test the standard
way, which instead of "warning" now is "error".  (But no strong feelings
either way.)  ;-)
Sorry, I missed your fix.  I like it better than mine.  Approved, along
with reverting my bits.


jeff


Re: [PATCH] testsuite: Fix up gcc.target/aarch64/pr112406.c for modern C [PR112406]

2023-12-03 Thread Richard Biener



> On 03.12.2023 at 19:32, Jakub Jelinek wrote:
> 
> On Fri, Nov 17, 2023 at 02:04:01PM +0100, Robin Dapp wrote:
>>> Yes, your version is also OK.
>> 
>> The attached was bootstrapped and regtested on aarch64, x86 and
>> regtested on riscv.  Going to commit it later unless somebody objects.
> 
> Unfortunately the aarch64/pr112406.c was reduced too much and is rejected
> since the switch to modern C patchset.
> 
> The following patch fixes that, I've verified the testcase
> before/after the changes still ICEs in r14-5563 and doesn't with
> r14-5564 and after the changes compiles fine with even latest trunk.
> Everything admittedly with a cross-compiler, but that shouldn't change
> anything.
> 
> Ok for trunk?

Ok

> Note, one of the modern C changes is that at least when people use
> cvise/creduce/delta scripts which ensure no further errors are introduced
> during the reduction than expected originally such reductions will not
> appear anymore.
> 
> 2023-12-03  Jakub Jelinek  
> 
>PR middle-end/112406
>* gcc.target/aarch64/pr112406.c (MagickPixelPacket): Add missing
>semicolon.
>(GetImageChannelMoments_image): Avoid using implicit int.
>(SetMagickPixelPacket): Use void return type instead of implicit int.
>(GetImageChannelMoments): Likewise.  Use __builtin_atan instead of
>atan.
> 
> --- gcc/testsuite/gcc.target/aarch64/pr112406.c.jj2023-11-18 
> 09:35:20.944084686 +0100
> +++ gcc/testsuite/gcc.target/aarch64/pr112406.c2023-12-03 
> 19:05:16.109365791 +0100
> @@ -2,10 +2,10 @@
> /* { dg-options "-march=armv8-a+sve -w -Ofast" } */
> 
> typedef struct {
> -  int red
> +  int red;
> } MagickPixelPacket;
> 
> -GetImageChannelMoments_image, GetImageChannelMoments_image_0,
> +int GetImageChannelMoments_image, GetImageChannelMoments_image_0,
> GetImageChannelMoments___trans_tmp_1, GetImageChannelMoments_M11_0,
> GetImageChannelMoments_pixel_3, GetImageChannelMoments_y,
> GetImageChannelMoments_p;
> @@ -15,10 +15,12 @@ double GetImageChannelMoments_M00_0, Get
> 
> MagickPixelPacket GetImageChannelMoments_pixel;
> 
> +void
> SetMagickPixelPacket(int color, MagickPixelPacket *pixel) {
>   pixel->red = color;
> }
> 
> +void
> GetImageChannelMoments() {
>   for (; GetImageChannelMoments_y; GetImageChannelMoments_y++) {
> SetMagickPixelPacket(GetImageChannelMoments_p,
> @@ -33,5 +35,5 @@ GetImageChannelMoments() {
> GetImageChannelMoments_M01_1 +=
> GetImageChannelMoments_y * GetImageChannelMoments_p++;
>   }
> -  GetImageChannelMoments___trans_tmp_1 = atan(GetImageChannelMoments_M11_0);
> +  GetImageChannelMoments___trans_tmp_1 = 
> __builtin_atan(GetImageChannelMoments_M11_0);
> }
> 
> 
>Jakub
> 


[PATCH] testsuite: Fix up gcc.target/s390/pr96127.c test for modern C [PR96127]

2023-12-03 Thread Jakub Jelinek
Hi!

I've noticed this test regressed on s390x-linux with the addition of the
switch to modern C patchset.  Haven't tried to reproduce the ICE, but as it
was a backend ICE and FE after warning used to add such casts before (now
errors), I think this ought to keep the testcase testing what was intended
before.

Ok for trunk?

2023-12-03  Jakub Jelinek  

PR target/96127
* gcc.target/s390/pr96127.c (c1): Add casts to long int *.

--- gcc/testsuite/gcc.target/s390/pr96127.c.jj  2020-07-28 15:39:10.058755540 
+0200
+++ gcc/testsuite/gcc.target/s390/pr96127.c 2023-12-03 19:19:52.140110428 
+0100
@@ -7,7 +7,7 @@ void
 c1 (int oz, int dk, int ub)
 {
   int *hd = 0;
-  long int *th = 
+  long int *th = (long int *) 
 
   while (ub < 1)
 {
@@ -17,7 +17,7 @@ c1 (int oz, int dk, int ub)
 
   while (oz < 2)
 {
-  long int *lq = 
+  long int *lq = (long int *) 
 
   (*hd < (*lq = *th)) < oz;
 

Jakub



[PATCH] testsuite: Fix up gcc.target/aarch64/pr112406.c for modern C [PR112406]

2023-12-03 Thread Jakub Jelinek
On Fri, Nov 17, 2023 at 02:04:01PM +0100, Robin Dapp wrote:
> > Yes, your version is also OK.
> 
> The attached was bootstrapped and regtested on aarch64, x86 and
> regtested on riscv.  Going to commit it later unless somebody objects.

Unfortunately the aarch64/pr112406.c was reduced too much and is rejected
since the switch to modern C patchset.

The following patch fixes that, I've verified the testcase
before/after the changes still ICEs in r14-5563 and doesn't with
r14-5564 and after the changes compiles fine with even latest trunk.
Everything admittedly with a cross-compiler, but that shouldn't change
anything.

Ok for trunk?

Note, one consequence of the modern C changes is that, at least when people
use cvise/creduce/delta scripts which ensure that no errors beyond those
expected originally are introduced during the reduction, such reductions
will not appear anymore.
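
As a rough illustration of the categories of fix listed in the ChangeLog below, here is a sketch with hypothetical names (it is not the testcase). Each comment describes the pre-C99 form that an over-reduced test tends to contain. `__builtin_abs` is used instead of `__builtin_atan` purely so that the sketch does not depend on libm.

```c
#include <assert.h>

typedef struct {
  int red;              /* Member declarations need their semicolon.  */
} pixel_packet;         /* Hypothetical type name.  */

/* Reduced form: `counter, limit;` with no type, relying on implicit int.  */
int counter, limit;

/* Reduced form: no return type written, so it was implicitly int.  */
void
set_pixel (int color, pixel_packet *pixel)
{
  pixel->red = color;
}

/* Reduced form: a library function called with no declaration in scope.
   Calling the builtin directly avoids both the implicit declaration and
   the need for a header.  */
int
magnitude (int v)
{
  return __builtin_abs (v);
}
```

With the old forms restored, a compiler that treats these constructs as errors would reject the file before reaching the code the test is meant to exercise.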

2023-12-03  Jakub Jelinek  

PR middle-end/112406
* gcc.target/aarch64/pr112406.c (MagickPixelPacket): Add missing
semicolon.
(GetImageChannelMoments_image): Avoid using implicit int.
(SetMagickPixelPacket): Use void return type instead of implicit int.
(GetImageChannelMoments): Likewise.  Use __builtin_atan instead of
atan.

--- gcc/testsuite/gcc.target/aarch64/pr112406.c.jj  2023-11-18 09:35:20.944084686 +0100
+++ gcc/testsuite/gcc.target/aarch64/pr112406.c 2023-12-03 19:05:16.109365791 +0100
@@ -2,10 +2,10 @@
 /* { dg-options "-march=armv8-a+sve -w -Ofast" } */
 
 typedef struct {
-  int red
+  int red;
 } MagickPixelPacket;
 
-GetImageChannelMoments_image, GetImageChannelMoments_image_0,
+int GetImageChannelMoments_image, GetImageChannelMoments_image_0,
 GetImageChannelMoments___trans_tmp_1, GetImageChannelMoments_M11_0,
 GetImageChannelMoments_pixel_3, GetImageChannelMoments_y,
 GetImageChannelMoments_p;
@@ -15,10 +15,12 @@ double GetImageChannelMoments_M00_0, Get
 
 MagickPixelPacket GetImageChannelMoments_pixel;
 
+void
 SetMagickPixelPacket(int color, MagickPixelPacket *pixel) {
   pixel->red = color;
 }
 
+void
 GetImageChannelMoments() {
   for (; GetImageChannelMoments_y; GetImageChannelMoments_y++) {
 SetMagickPixelPacket(GetImageChannelMoments_p,
@@ -33,5 +35,5 @@ GetImageChannelMoments() {
 GetImageChannelMoments_M01_1 +=
 GetImageChannelMoments_y * GetImageChannelMoments_p++;
   }
-  GetImageChannelMoments___trans_tmp_1 = atan(GetImageChannelMoments_M11_0);
+  GetImageChannelMoments___trans_tmp_1 = __builtin_atan(GetImageChannelMoments_M11_0);
 }


Jakub



Re: [PATCH] gcc/doc: spelling mistakes and example

2023-12-03 Thread David Malcolm
On Sun, 2023-12-03 at 11:59 +0000, Jonny Grant wrote:
> 
> 
> On 03/12/2023 04:03, Xi Ruoyao wrote:
> > On Sun, 2023-12-03 at 00:17 +0000, Jonny Grant wrote:
> > > @@ -733,7 +733,7 @@ To configure GCC:
> > >  @smallexample
> > >  % mkdir @var{objdir}
> > >  % cd @var{objdir}
> > > -% @var{srcdir}/configure [@var{options}] [@var{target}]
> > > +% ../@var{srcdir}/configure [@var{options}] [@var{target}]
> > >  @end smallexample
> > 
> > No, this is definitely incorrect.  srcdir is the path (it may be
> > relative or absolute) to the GCC source tree.  It's not necessary to
> > be placed in the parent directory of objdir.
> > 
> 
> Fair enough.
> 
> Can the spelling corrections still be merged? Or should I re-submit
> the patch without that line?

The spelling corrections look OK to me.

Do you have an account that can push commits, or would you need this
done for you?

Please can you add Signed-off-by lines to your patches/commits
(via -s); see https://gcc.gnu.org/dco.html

Thanks
Dave

> 
> Kind regards, Jonny
> 



Re: [PATCH v2] testsuite, arm: Fix up pr112337.c test

2023-12-03 Thread Richard Sandiford
Saurabh Jha  writes:
> On 12/1/2023 2:10 PM, Richard Earnshaw (lists) wrote:
>> On 01/12/2023 13:45, Christophe Lyon wrote:
>>> On Fri, 1 Dec 2023 at 13:44, Richard Earnshaw (lists)
>>>  wrote:
>>>> On 01/12/2023 11:28, Saurabh Jha wrote:
>>>>> Hey,
>>>>>
>>>>> I introduced this test "gcc/testsuite/gcc.target/arm/mve/pr112337.c" in 
>>>>> this commit 2365aae84de030bbb006edac18c9314812fc657b before. This had an 
>>>>> error which I unfortunately missed. This patch fixes that test.
>>>>>
>>>>> Did regression testing on arm-none-eabi and found no regressions. Output 
>>>>> of running gcc/contrib/compare_tests is this:
>>>>>
>>>>> """
>>>>> Tests that now work, but didn't before (2 tests):
>>>>>
>>>>> arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp: 
>>>>> gcc.target/arm/mve/pr112337.c (test for excess errors)
>>>>> arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard:
>>>>>  gcc.target/arm/mve/pr112337.c (test for excess errors)
>>>>> """
>>>>>
>>>>> Ok for trunk? I don't have commit access so could someone please commit 
>>>>> on my behalf?
>>>>>
>>>>> Regards,
>>>>> Saurabh
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>>
>>>>>  * gcc.target/arm/mve/pr112337.c: Fix the testcase
>>>>
>>>> Hmm, could this be related to the changes Christophe made recently to 
>>>> change the way MVE vector types were set up internally?  If so, this might 
>>>> indicate an issue that's going to affect real users with existing code.
>>>>
>>> My change was only about vector types, here the problem is with a
>>> pointer to a scalar.
>>> Anyway, I ran the test with my commit reverted and it still fails in
>>> the same way, so I think this patch is needed.
>>>
>>> Thanks,
>>>
>>> Christophe
>>>
>>>> Christophe?
>>>>
>>>> R.
>> Ok, thanks for checking.  In that case, Saurabh, your patch is OK, but 
>> please change 'Fix testcase' to 'Use int32_t instead of int.'
>>
>> Note that ChangeLog entries end with a full stop.
>>
>> R.
>
> Thank you for the feedback. Please find the updated ChangeLog below.
>
> gcc/testsuite/ChangeLog:
>
>  * gcc.target/arm/mve/pr112337.c: Use int32_t instead of int.

Thanks, pushed to trunk.

Richard


[PATCH] c++/modules: Prevent treating suppressed debug info as extern template [PR112820]

2023-12-03 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

The TYPE_DECL_SUPPRESS_DEBUG and DECL_EXTERNAL flags use the same
underlying bit. This is causing confusion when attempting to determine
the interface for a streamed-in class type, since the modules code
currently assumes that all DECL_EXTERNAL types are extern templates.
However, when -g is specified then TYPE_DECL_SUPPRESS_DEBUG (and hence
DECL_EXTERNAL) is marked on various other kinds of declarations, such as
vtables, which causes them to never be emitted.

This patch constrains the DECL_EXTERNAL check here to consider only
template instantiations, thus avoiding the issue.
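
The aliasing problem can be modelled in a few lines. This is only a sketch: the struct, the macro names and the helper functions are hypothetical stand-ins, not GCC's tree representation. It assumes nothing beyond what is described above, namely two flag names reading one bit, and a check that is made safe by an extra predicate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a declaration node.  */
struct decl {
  unsigned shared_flag : 1;        /* One storage bit, two meanings.  */
  bool is_template_instantiation;
};

/* Two accessor names that read the same bit, mirroring how
   DECL_EXTERNAL and TYPE_DECL_SUPPRESS_DEBUG overlap.  */
#define MODEL_DECL_EXTERNAL(d)   ((d)->shared_flag)
#define MODEL_SUPPRESS_DEBUG(d)  ((d)->shared_flag)

/* Old check: any set bit is read as "extern template".  */
bool
extern_template_old (const struct decl *d)
{
  return MODEL_DECL_EXTERNAL (d);
}

/* New check: only template instantiations can be extern templates, so a
   bit that was set to suppress debug info no longer matches.  */
bool
extern_template_new (const struct decl *d)
{
  return MODEL_DECL_EXTERNAL (d) && d->is_template_instantiation;
}
```

In this model, a non-template class whose bit was set for debug-info reasons is misclassified by the old check and correctly ignored by the new one.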

PR c++/102607
PR c++/112820

gcc/cp/ChangeLog:

* module.cc (trees_in::read_class_def): Only set interface for
template instantiations.

gcc/testsuite/ChangeLog:

* g++.dg/modules/debug-2_a.C: New test.
* g++.dg/modules/debug-2_b.C: New test.
* g++.dg/modules/debug-2_c.C: New test.
* g++.dg/modules/debug-3_a.C: New test.
* g++.dg/modules/debug-3_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/module.cc                         | 4 +++-
 gcc/testsuite/g++.dg/modules/debug-2_a.C | 9 +++++++++
 gcc/testsuite/g++.dg/modules/debug-2_b.C | 8 ++++++++
 gcc/testsuite/g++.dg/modules/debug-2_c.C | 9 +++++++++
 gcc/testsuite/g++.dg/modules/debug-3_a.C | 8 ++++++++
 gcc/testsuite/g++.dg/modules/debug-3_b.C | 9 +++++++++
 6 files changed, 46 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-2_c.C
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/debug-3_b.C

diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 33fcf396875..257f39421d0 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -12041,7 +12041,9 @@ trees_in::read_class_def (tree defn, tree maybe_template)
   bool installing = maybe_dup && !TYPE_SIZE (type);
   if (installing)
     {
-      if (DECL_EXTERNAL (defn) && TYPE_LANG_SPECIFIC (type))
+      if (DECL_EXTERNAL (defn)
+          && TYPE_LANG_SPECIFIC (type)
+          && CLASSTYPE_TEMPLATE_INSTANTIATION (type))
         {
           /* We don't deal with not-really-extern, because, for a
              module you want the import to be the interface, and for a
diff --git a/gcc/testsuite/g++.dg/modules/debug-2_a.C b/gcc/testsuite/g++.dg/modules/debug-2_a.C
new file mode 100644
index 00000000000..eed0905542b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/debug-2_a.C
@@ -0,0 +1,9 @@
+// PR c++/112820
+// { dg-additional-options "-fmodules-ts -g" }
+// { dg-module-cmi io }
+
+export module io;
+
+export struct error {
+  virtual const char* what() const noexcept;
+};
diff --git a/gcc/testsuite/g++.dg/modules/debug-2_b.C b/gcc/testsuite/g++.dg/modules/debug-2_b.C
new file mode 100644
index 00000000000..fc9afbc02e0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/debug-2_b.C
@@ -0,0 +1,8 @@
+// PR c++/112820
+// { dg-additional-options "-fmodules-ts -g" }
+
+module io;
+
+const char* error::what() const noexcept {
+  return "bla";
+}
diff --git a/gcc/testsuite/g++.dg/modules/debug-2_c.C b/gcc/testsuite/g++.dg/modules/debug-2_c.C
new file mode 100644
index 00000000000..37117f69dcd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/debug-2_c.C
@@ -0,0 +1,9 @@
+// PR c++/112820
+// { dg-module-do link }
+// { dg-additional-options "-fmodules-ts -g" }
+
+import io;
+
+int main() {
+  error{};
+}
diff --git a/gcc/testsuite/g++.dg/modules/debug-3_a.C b/gcc/testsuite/g++.dg/modules/debug-3_a.C
new file mode 100644
index 00000000000..9e33d8260fd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/debug-3_a.C
@@ -0,0 +1,8 @@
+// PR c++/102607
+// { dg-additional-options "-fmodules-ts -g" }
+// { dg-module-cmi mod }
+
+export module mod;
+export struct B {
+  virtual ~B() = default;
+};
diff --git a/gcc/testsuite/g++.dg/modules/debug-3_b.C b/gcc/testsuite/g++.dg/modules/debug-3_b.C
new file mode 100644
index 00000000000..03c78b71b5d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/debug-3_b.C
@@ -0,0 +1,9 @@
+// PR c++/102607
+// { dg-module-do link }
+// { dg-additional-options "-fmodules-ts -g" }
+
+import mod;
+int main() {
+  struct D : B {};
+  (void)D{};
+}
-- 
2.42.0



Re: [committed] Fix gnu23-builtins-no-dfp

2023-12-03 Thread Thomas Schwinge
Hi!

On 2023-12-03T08:41:59+0100, Florian Weimer  wrote:
> * Jeff Law:
>
>> Anyway, this test was the one I was most concerned about.  Basically
>> we're testing that on a !dfp target that the builtins are not available.
>>   It expects a warning, but gets an error by default now.  I just
>> changed the test to use -fpermissive, so that the test behaves as it did
>> previously.
>
> In these ambiguous cases, I cloned tests into -fpermissive and error
> variants.  This might be appropriate here as well (or I should remove
> the clones again if those are the wrong thing to do).

For that test case, it did seem appropriate to me to simply
's%dg-warning%dg-error', which I already had posted in

"c: Turn -Wimplicit-function-declaration into a permerror: Fix 
'gcc.dg/gnu23-builtins-no-dfp-1.c'",
awaiting review.  Rationale: For this test case it's secondary *how*
"implicit declaration of function" is diagnosed, so I'd test the standard
way, which instead of "warning" now is "error".  (But no strong feelings
either way.)  ;-)


Grüße
 Thomas


Re: [PATCH] gcc/doc: spelling mistakes and example

2023-12-03 Thread Jonny Grant



On 03/12/2023 04:03, Xi Ruoyao wrote:
> On Sun, 2023-12-03 at 00:17 +0000, Jonny Grant wrote:
>> @@ -733,7 +733,7 @@ To configure GCC:
>>  @smallexample
>>  % mkdir @var{objdir}
>>  % cd @var{objdir}
>> -% @var{srcdir}/configure [@var{options}] [@var{target}]
>> +% ../@var{srcdir}/configure [@var{options}] [@var{target}]
>>  @end smallexample
> 
> No, this is definitely incorrect.  srcdir is the path (it may be
> relative or absolute) to the GCC source tree.  It's not necessary to be
> placed in the parent directory of objdir.
> 

Fair enough.

Can the spelling corrections still be merged? Or should I re-submit the patch 
without that line?

Kind regards, Jonny


[PATCH] lra: Updates of biggest mode for hard regs [PR112278]

2023-12-03 Thread Richard Sandiford
[Gah.  In my head I'd sent this a few weeks ago, but it turns out
 that I hadn't even got to the stage of writing the changelog...]

LRA keeps track of the biggest mode for both hard registers and
pseudos.  The updates assume that the modes are ordered, i.e. that
we can tell whether one is no bigger than the other at compile time.

That is (or at least seemed to be) a reasonable restriction for pseudos.
But it isn't necessarily so for hard registers, since the uses of hard
registers can be logically distinct.  The testcase is an example of this.

The biggest mode of hard registers is also special for other reasons.
As the existing comment says:

  /* A reg can have a biggest_mode of VOIDmode if it was only ever seen as
 part of a multi-word register.  In that case, just use the reg_rtx
 mode.  Do the same also if the biggest mode was larger than a register
 or we can not compare the modes.  Otherwise, limit the size to that of
 the biggest access in the function or to the natural mode at least.  */

This patch applies the same approach to the updates.
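
To make the "ordered" condition concrete, here is a small model with hypothetical names. It does not use GCC's poly_int API. It assumes a size of the form base + per_unit * N bytes, where N >= 0 is only known at run time, and it compares sizes directly where the patch uses partial_subreg_p.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a size that may depend on a runtime factor N:
   the size is base + per_unit * N bytes, with N >= 0.  */
struct poly_size {
  unsigned base;
  unsigned per_unit;
};

/* True if A <= B for every possible N.  */
bool
known_le (struct poly_size a, struct poly_size b)
{
  return a.base <= b.base && a.per_unit <= b.per_unit;
}

/* Two sizes are ordered if one is known to be no bigger than the other.  */
bool
ordered (struct poly_size a, struct poly_size b)
{
  return known_le (a, b) || known_le (b, a);
}

/* Model of the update: when the sizes cannot be ordered, fall back to the
   register's natural size; otherwise keep whichever is bigger.  */
struct poly_size
update_biggest (struct poly_size biggest, struct poly_size seen,
                struct poly_size natural)
{
  if (!ordered (biggest, seen))
    return natural;
  if (known_le (biggest, seen))
    return seen;
  return biggest;
}
```

For example, a fixed 32-byte access {32, 0} and a scalable access {16, 16} are not ordered: the scalable one is smaller when N is 0 and larger when N is 2. That is the situation in which the model falls back to the natural size.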

Tested on aarch64-linux-gnu (with and without SVE) and on x86_64-linux-gnu.
OK to install?

Richard


gcc/
PR rtl-optimization/112278
* lra-int.h (lra_update_biggest_mode): New function.
* lra-coalesce.cc (merge_pseudos): Use it.
* lra-lives.cc (process_bb_lives): Likewise.
* lra.cc (new_insn_reg): Likewise.

gcc/testsuite/
PR rtl-optimization/112278
* gcc.target/aarch64/sve/pr112278.c: New test.
---
 gcc/lra-coalesce.cc                             |  4 +---
 gcc/lra-int.h                                   | 15 +++++++++++++++
 gcc/lra-lives.cc                                |  4 +---
 gcc/lra.cc                                      |  5 ++---
 gcc/testsuite/gcc.target/aarch64/sve/pr112278.c | 15 +++++++++++++++
 5 files changed, 34 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/pr112278.c

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index 04a5bbd714b..d8ca096c35f 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -112,9 +112,7 @@ merge_pseudos (int regno1, int regno2)
 = (lra_merge_live_ranges
(lra_reg_info[first].live_ranges,
lra_copy_live_range_list (lra_reg_info[first2].live_ranges)));
-  if (partial_subreg_p (lra_reg_info[first].biggest_mode,
-   lra_reg_info[first2].biggest_mode))
-lra_reg_info[first].biggest_mode = lra_reg_info[first2].biggest_mode;
+  lra_update_biggest_mode (first, lra_reg_info[first2].biggest_mode);
 }
 
 /* Change pseudos in *LOC on their coalescing group
diff --git a/gcc/lra-int.h b/gcc/lra-int.h
index d7ec7c7dc7f..5cdf92be7fc 100644
--- a/gcc/lra-int.h
+++ b/gcc/lra-int.h
@@ -535,4 +535,19 @@ lra_assign_reg_val (int from, int to)
   lra_reg_info[to].offset = lra_reg_info[from].offset;
 }
 
+/* Update REGNO's biggest recorded mode so that it includes a reference
+   in mode MODE.  */
+inline void
+lra_update_biggest_mode (int regno, machine_mode mode)
+{
+  if (!ordered_p (GET_MODE_SIZE (lra_reg_info[regno].biggest_mode),
+                  GET_MODE_SIZE (mode)))
+    {
+      gcc_checking_assert (HARD_REGISTER_NUM_P (regno));
+      lra_reg_info[regno].biggest_mode = reg_raw_mode[regno];
+    }
+  else if (partial_subreg_p (lra_reg_info[regno].biggest_mode, mode))
+    lra_reg_info[regno].biggest_mode = mode;
+}
+
 #endif /* GCC_LRA_INT_H */
diff --git a/gcc/lra-lives.cc b/gcc/lra-lives.cc
index f60e564da82..0b204232849 100644
--- a/gcc/lra-lives.cc
+++ b/gcc/lra-lives.cc
@@ -770,9 +770,7 @@ process_bb_lives (basic_block bb, int &curr_point, bool dead_insn_p)
{
  int regno = reg->regno;
 
- if (partial_subreg_p (lra_reg_info[regno].biggest_mode,
-   reg->biggest_mode))
-   lra_reg_info[regno].biggest_mode = reg->biggest_mode;
+ lra_update_biggest_mode (regno, reg->biggest_mode);
  if (HARD_REGISTER_NUM_P (regno))
lra_hard_reg_usage[regno] += freq;
}
diff --git a/gcc/lra.cc b/gcc/lra.cc
index c917a1adee2..29e2a3506e1 100644
--- a/gcc/lra.cc
+++ b/gcc/lra.cc
@@ -581,9 +581,8 @@ new_insn_reg (rtx_insn *insn, int regno, enum op_type type,
   lra_insn_reg *ir = lra_insn_reg_pool.allocate ();
   ir->type = type;
   ir->biggest_mode = mode;
-  if (NONDEBUG_INSN_P (insn)
-  && partial_subreg_p (lra_reg_info[regno].biggest_mode, mode))
-lra_reg_info[regno].biggest_mode = mode;
+  if (NONDEBUG_INSN_P (insn))
+lra_update_biggest_mode (regno, mode);
   ir->subreg_p = subreg_p;
   ir->early_clobber_alts = early_clobber_alts;
   ir->regno = regno;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr112278.c b/gcc/testsuite/gcc.target/aarch64/sve/pr112278.c
new file mode 100644
index 00000000000..4f56add2b0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr112278.c
@@ -0,0 +1,15 @@
+#include 
+#include 
+
+void
+f (void)
+{
+  {
+register svint8_t v0