RE: [PATCH v1] RISC-V: Allow rounding mode control for RVV floating-point add

2023-06-27 Thread Li, Pan2 via Gcc-patches
Ack, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Tuesday, June 27, 2023 3:00 PM
To: Li, Pan2 ; gcc-patches 
Cc: Kito.cheng ; Li, Pan2 ; Wang, 
Yanzhang ; jeffreyalaw 
Subject: Re: [PATCH v1] RISC-V: Allow rounding mode control for RVV 
floating-point add

LGTM.
You can go ahead and implement the rounding mode of floating-point by mode-switching.

I suggest you implement the rounding mode for floating-point as follows:

1st step: Implement mode-switching for the floating-point rounding modes except
DYNAMIC, which should be exactly the same as for fixed-point.
2nd step: Support the DYNAMIC rounding mode on mode-switching, which may require
modifying the mode-switching pass.

Thanks.

juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-06-27 14:06
To: gcc-patches
CC: juzhe.zhong; 
kito.cheng; pan2.li; 
yanzhang.wang; 
jeffreyalaw
Subject: [PATCH v1] RISC-V: Allow rounding mode control for RVV floating-point 
add
From: Pan Li <pan2...@intel.com>

According to the doc below, we need to support the rounding mode of
RVV floating-point, both the static and dynamic frm.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/226

To keep this trackable and development-friendly, we will take several steps to
support all rounding modes for RVV floating-point.

1. Allow rounding mode control by one intrinsic (aka this patch), vfadd.
2. Support static rounding mode control by mode switching, like fixed-point.
3. Support dynamic rounding mode control by mode switching.
4. Support the rest of the floating-point instructions for frm.

Please *NOTE* this patch only allows rounding mode control for the
vfadd intrinsic API; the related frm handling will be covered by step 2.

Signed-off-by: Pan Li <pan2...@intel.com>
Co-authored-by: Juzhe-Zhong <juzhe.zh...@rivai.ai>

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum floating_point_rounding_mode):
Add macro for static frm min and max.
* config/riscv/riscv-vector-builtins-bases.cc
(class binop_frm): New class for floating-point with frm.
(BASE): Add vfadd for frm.
* config/riscv/riscv-vector-builtins-bases.h: Likewise.
* config/riscv/riscv-vector-builtins-functions.def
(vfadd_frm): Likewise.
* config/riscv/riscv-vector-builtins-shapes.cc
(struct alu_frm_def): New struct for alu with frm.
(SHAPE): Add alu with frm.
* config/riscv/riscv-vector-builtins-shapes.h: Likewise.
* config/riscv/riscv-vector-builtins.cc
(function_checker::report_out_of_range_and_not): New function to
report a value that is out of range and not equal to a given value.
(function_checker::require_immediate_range_or): New function to
check that an immediate is in range or equals one specific value.
* config/riscv/riscv-vector-builtins.h: Add function decl.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-error.c: New test.
* gcc.target/riscv/rvv/base/float-point-frm.c: New test.
---
gcc/config/riscv/riscv-protos.h   |  2 +
.../riscv/riscv-vector-builtins-bases.cc  | 25 +++
.../riscv/riscv-vector-builtins-bases.h   |  1 +
.../riscv/riscv-vector-builtins-functions.def |  2 +
.../riscv/riscv-vector-builtins-shapes.cc | 68 +++
.../riscv/riscv-vector-builtins-shapes.h  |  1 +
gcc/config/riscv/riscv-vector-builtins.cc | 41 +++
gcc/config/riscv/riscv-vector-builtins.h  |  4 ++
.../riscv/rvv/base/float-point-frm-error.c| 15 
.../riscv/rvv/base/float-point-frm.c  | 30 
10 files changed, 189 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-error.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm.c

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f686edab3d1..bee64eee504 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -278,6 +278,8 @@ enum floating_point_rounding_mode
   FRM_RUP = 3, /* Aka 0b011.  */
   FRM_RMM = 4, /* Aka 0b100.  */
   FRM_DYN = 7, /* Aka 0b111.  */
+  FRM_STATIC_MIN = FRM_RNE,
+  FRM_STATIC_MAX = FRM_RMM,
};
opt_machine_mode vectorize_related_mode (machine_mode, scalar_mode,
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index 5c8deda900d..1b4c2c6ad66 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -281,6 +281,29 @@ public:
   }
};
+/* Implements below instructions for now.
+   - vfadd
+*/
+template
+class binop_frm : public function_base
+{
+public:
+  bool has_rounding_mode_operand_p () const override { return true; }
+
+  rtx expand (function_expander &e) const override
+  {
+switch (e.op_info->op)
+  {
+  case OP_TYPE_vf:
+ return e.use_exact_insn (code_for_pred_scalar (CODE, e.vector_mode ()));
+  case OP_TYP

Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, Robin Dapp wrote:

> > Can you push the element_mode change separately please?
> 
> OK.
> 
> > I'd like to hear more reasoning of why target_supports_op_p is wanted
> > here.  Doesn't target_supports_op_p return false if this is for example
> > a soft-fp target?  So if at all, shouldn't the test only be carried
> > out if the original operation was supported by the target?
> 
> Tamar and I already discussed yesterday whether a target check is appropriate
> here and were torn.  How or where would we expect a non-supported
> operation to be discarded, though?  In my case the expression is generated
> during a ranger fold and survives until expand, where we ICE.

Why does the expander not have a fallback here?  If we put up
restrictions like this, as we do for vector operations (after
vector lowering!), we need to document this.  Your check also covers
more than just FP16 types, which I think is undesirable.

So it seems for FP16 we need this for correctness (to not ICE)
while for other modes it might be appropriate for performance
(though I cannot imagine a target supporting say long double
not supporting float).

Richard.


Re: [PATCH] Mark asm goto with outputs as volatile

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, Jun 27, 2023 at 5:26 AM Andrew Pinski via Gcc-patches
 wrote:
>
> The manual already documents asm goto as being implicitly volatile,
> and that was written when asm goto could not have outputs. When outputs
> were added to `asm goto`, only asm goto without outputs was still being
> marked as volatile. Now some parts of GCC decide that removing the `asm goto`
> is OK if the output is not used, though they do not update the CFG (this
> happens at both the RTL level and the gimple level). Since the biggest user
> of `asm goto` is the Linux kernel and they expect these asms to be volatile
> (they use them to copy to/from userspace), we should just mark the inline
> asm as volatile.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu.

OK.

> PR middle-end/110420
> PR middle-end/103979
> PR middle-end/98619
>
> gcc/ChangeLog:
>
> * gimplify.cc (gimplify_asm_expr): Mark asm with labels as volatile.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/asmgoto-6.c: New test.
> ---
>  gcc/gimplify.cc   |  7 -
>  .../gcc.c-torture/compile/asmgoto-6.c | 26 +++
>  2 files changed, 32 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
>
> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 0e24b915b8f..dc6a00e8bd9 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -6935,7 +6935,12 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p)
>stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING (expr)),
>inputs, outputs, clobbers, labels);
>
> -  gimple_asm_set_volatile (stmt, ASM_VOLATILE_P (expr) || noutputs == 0);
> +  /* asm is volatile if it was marked by the user as volatile or
> +there is no outputs or this is an asm goto.  */
> +  gimple_asm_set_volatile (stmt,
> +  ASM_VOLATILE_P (expr)
> +  || noutputs == 0
> +  || labels);
>gimple_asm_set_input (stmt, ASM_INPUT_P (expr));
>gimple_asm_set_inline (stmt, ASM_INLINE_P (expr));
>
> diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c 
> b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
> new file mode 100644
> index 000..0652bd4e4e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
> @@ -0,0 +1,26 @@
> +
> +/* { dg-do compile } */
> +/* PR middle-end/110420 */
> +/* PR middle-end/103979 */
> +/* PR middle-end/98619 */
> +/* Test that the middle-end does not remove the asm goto
> +   with an output. */
> +
> +static int t;
> +void g(void);
> +
> +void f(void)
> +{
> +  int  __gu_val;
> +  asm goto("#my asm "
> + : "=&r"(__gu_val)
> + :
> + :
> + : Efault);
> +  t = __gu_val;
> +  g();
> +Efault:
> +}
> +
> +/* Make sure "my asm " is still in the assembly. */
> +/* { dg-final { scan-assembler "my asm " } } */
> --
> 2.31.1
>


Re: [PATCH] [x86] Refine maskstore patterns with UNSPEC_MASKMOV.

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, Jun 27, 2023 at 7:38 AM liuhongt  wrote:
>
> At the RTL level, we cannot guarantee that the maskstore is not optimized
> to other full-memory accesses, since the current implementations are
> equivalent in terms of pattern.  To solve this potential problem, this patch
> refines the patterns of the maskstore and the intrinsics with an unspec.
>
> One thing I'm not sure is VCOND_EXPR, should VCOND_EXPR also expect
> fault suppression for masked-out elements?

You mean the vcond and vcond_eq optabs?  No, those do not expect
fault suppression.

>
> Currently we're still using vec_merge for both AVX2 and AVX512 target.
>
> 
> Similar like r14-2070-gc79476da46728e
>
> If mem_addr points to a memory region with less than whole vector size
> bytes of accessible memory and k is a mask that would prevent reading
> the inaccessible bytes from mem_addr, add UNSPEC_MASKMOV to prevent
> it from being transformed to any other whole-memory-access instruction.
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ready to push to trunk.
>
> gcc/ChangeLog:
>
> PR rtl-optimization/110237
> * config/i386/sse.md (_store_mask): Refine with
> UNSPEC_MASKMOV.
> (maskstore (*_store_mask): New define_insn, renamed
> from the original _store_mask.
> ---
>  gcc/config/i386/sse.md | 69 ++
>  1 file changed, 57 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 3b50c7117f8..812cfca4b92 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -1608,7 +1608,7 @@ (define_insn "_blendm"
> (set_attr "prefix" "evex")
> (set_attr "mode" "")])
>
> -(define_insn "_store_mask"
> +(define_insn "*_store_mask"
>[(set (match_operand:V48_AVX512VL 0 "memory_operand" "=m")
> (vec_merge:V48_AVX512VL
>   (match_operand:V48_AVX512VL 1 "register_operand" "v")
> @@ -1636,7 +1636,7 @@ (define_insn "_store_mask"
> (set_attr "memory" "store")
> (set_attr "mode" "")])
>
> -(define_insn "_store_mask"
> +(define_insn "*_store_mask"
>[(set (match_operand:VI12HFBF_AVX512VL 0 "memory_operand" "=m")
> (vec_merge:VI12HFBF_AVX512VL
>   (match_operand:VI12HFBF_AVX512VL 1 "register_operand" "v")
> @@ -27008,21 +27008,66 @@ (define_expand "maskstore"
>"TARGET_AVX")
>
>  (define_expand "maskstore"
> -  [(set (match_operand:V48H_AVX512VL 0 "memory_operand")
> -   (vec_merge:V48H_AVX512VL
> - (match_operand:V48H_AVX512VL 1 "register_operand")
> - (match_dup 0)
> - (match_operand: 2 "register_operand")))]
> +  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
> +   (unspec:V48_AVX512VL
> + [(match_operand:V48_AVX512VL 1 "register_operand")
> +  (match_dup 0)
> +  (match_operand: 2 "register_operand")]
> + UNSPEC_MASKMOV))]
>"TARGET_AVX512F")
>
>  (define_expand "maskstore"
> -  [(set (match_operand:VI12_AVX512VL 0 "memory_operand")
> -   (vec_merge:VI12_AVX512VL
> - (match_operand:VI12_AVX512VL 1 "register_operand")
> - (match_dup 0)
> - (match_operand: 2 "register_operand")))]
> +  [(set (match_operand:VI12HFBF_AVX512VL 0 "memory_operand")
> +   (unspec:VI12HFBF_AVX512VL
> + [(match_operand:VI12HFBF_AVX512VL 1 "register_operand")
> +  (match_dup 0)
> +  (match_operand: 2 "register_operand")]
> + UNSPEC_MASKMOV))]
>"TARGET_AVX512BW")
>
> +(define_insn "_store_mask"
> +  [(set (match_operand:V48_AVX512VL 0 "memory_operand" "=m")
> +   (unspec:V48_AVX512VL
> + [(match_operand:V48_AVX512VL 1 "register_operand" "v")
> +  (match_dup 0)
> +  (match_operand: 2 "register_operand" "Yk")]
> + UNSPEC_MASKMOV))]
> +  "TARGET_AVX512F"
> +{
> +  if (FLOAT_MODE_P (GET_MODE_INNER (mode)))
> +{
> +  if (misaligned_operand (operands[0], mode))
> +   return "vmovu\t{%1, %0%{%2%}|%0%{%2%}, %1}";
> +  else
> +   return "vmova\t{%1, %0%{%2%}|%0%{%2%}, %1}";
> +}
> +  else
> +{
> +  if (misaligned_operand (operands[0], mode))
> +   return "vmovdqu\t{%1, %0%{%2%}|%0%{%2%}, %1}";
> +  else
> +   return "vmovdqa\t{%1, %0%{%2%}|%0%{%2%}, %1}";
> +}
> +}
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "memory" "store")
> +   (set_attr "mode" "")])
> +
> +(define_insn "_store_mask"
> +  [(set (match_operand:VI12HFBF_AVX512VL 0 "memory_operand" "=m")
> +   (unspec:VI12HFBF_AVX512VL
> + [(match_operand:VI12HFBF_AVX512VL 1 "register_operand" "v")
> +  (match_dup 0)
> +  (match_operand: 2 "register_operand" "Yk")]
> +  UNSPEC_MASKMOV))]
> +  "TARGET_AVX512BW"
> +  "vmovdqu\t{%1, %0%{%2%}|%0%{%2%}, %1}"
> +  [(set_attr "type" "ssemov")
> +   (set_attr "prefix" "evex")
> +   (set_attr "memory" "store")
> +   (set_attr "mode" "")])
> +
>  (define_expand "cbranch4"
>[(set (reg

Re: [PATCH] gengtype: Handle braced initialisers in structs

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, Jun 27, 2023 at 7:38 AM Richard Sandiford via Gcc-patches
 wrote:
>
> I have a patch that adds braced initializers to a GTY structure.
> gengtype didn't accept that, because it parsed the "{ ... }" in
> " = { ... };" as the end of a statement (as "{ ... }" would be in
> a function definition) and so it didn't expect the following ";".
>
> This patch explicitly handles initialiser-like sequences.
>
> Arguably, the parser should also skip redundant ";", but that
> feels more like a workaround rather than the real fix.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

OK.

> Richard
>
>
> gcc/
> * gengtype-parse.cc (consume_until_comma_or_eos): Parse "= { ... }"
> as a probable initializer rather than a probable complete statement.
> ---
>  gcc/gengtype-parse.cc | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/gengtype-parse.cc b/gcc/gengtype-parse.cc
> index 2b2156c5f45..19184d77899 100644
> --- a/gcc/gengtype-parse.cc
> +++ b/gcc/gengtype-parse.cc
> @@ -450,6 +450,12 @@ consume_until_comma_or_eos ()
> parse_error ("unexpected end of file while scanning for ',' or ';'");
> return false;
>
> +  case '=':
> +   advance ();
> +   if (token () == '{')
> + consume_balanced ('{', '}');
> +   break;
> +
>default:
> advance ();
> break;
> --
> 2.25.1
>


Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, Jun 27, 2023 at 8:30 AM Tejas Belagod  wrote:
>
>
>
>
>
> From: Richard Biener 
> Date: Monday, June 26, 2023 at 2:23 PM
> To: Tejas Belagod 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: Re: [RFC] GNU Vector Extension -- Packed Boolean Vectors
>
> On Mon, Jun 26, 2023 at 8:24 AM Tejas Belagod via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > Packed Boolean Vectors
> > --
> >
> > I'd like to propose a feature addition to GNU Vector extensions to add 
> > packed
> > boolean vectors (PBV).  This has been discussed in the past here[1] and a 
> > variant has
> > been implemented in Clang recently[2].
> >
> > With predication features being added to vector architectures (SVE, MVE, 
> > AVX),
> > it is a useful feature to have to model predication on targets.  This could
> > find its use in intrinsics or just used as is as a GNU vector extension 
> > being
> > mapped to underlying target features.  For example, the packed boolean 
> > vector
> > could directly map to a predicate register on SVE.
> >
> > Also, this new packed boolean type GNU extension can be used with SVE ACLE
> > intrinsics to replace a fixed-length svbool_t.
> >
> > Here are a few options to represent the packed boolean vector type.
>
> The GIMPLE frontend uses a new 'vector_mask' attribute:
>
> typedef int v8si __attribute__((vector_size(8 * sizeof(int))));
> typedef v8si v8sib __attribute__((vector_mask));
>
> it gets you a vector type that's the appropriate (dependent on the
> target) vector mask type for the vector data type (v8si in this case).
>
>
>
> Thanks Richard.
>
> Having had a quick look at the implementation, it does seem to tick the boxes.
> I must admit I haven't dug deep, but if the target hook allows the mask to be
> defined in a way that is target-friendly (and I don't know how much effort it
> will be to migrate the attribute to more front-ends), it should do the job
> nicely.  Let me go back and dig a bit deeper and get back with questions if
> any.

Let me add that the advantage of this is the compiler doesn't need
to support weird explicitly laid out packed boolean vectors that do
not match what the target supports, and the user doesn't need to know
what the target supports (and thus have an #ifdef maze around explicitly
specified layouts).

It does remove some flexibility though; for example, with -mavx512f -mavx512vl
you'll get AVX512-style masks for V4SImode data vectors, but of course the
target still supports SSE2/AVX2-style masks as well.  Those would not be
available as "packed boolean vectors", though they are of course in fact
equal to V4SImode data vectors with -1 or 0 values, so in this particular
case it might not matter.

That said, the vector_mask attribute will get you V4SImode vectors with
signed boolean elements of 32 bits for V4SImode data vectors with
SSE2/AVX2.

Richard.

>
>
> Thanks,
>
> Tejas.
>
>
>
>
>
>
>
> > 1. __attribute__((vector_size (n))) where n represents bytes
> >
> >   typedef bool vbool __attribute__ ((vector_size (1)));
> >
> > In this approach, the shape of the boolean vector is unclear. IoW, it is not
> > clear if each bit in 'n' controls a byte or an element. On targets
> > like SVE, it would be natural to have each bit control a byte of the target
> > vector (therefore resulting in an 'unpacked' layout of the PBV) and on AVX, 
> > each
> > bit would control one element/lane of the target vector (therefore resulting 
> > in a
> > 'packed' layout with all significant bits at the LSB).
> >
> > 2. __attribute__((vector_size (n))) where n represents num of lanes
> >
> >   typedef int v4si __attribute__ ((vector_size (4 * sizeof (int)));
> >   typedef bool v4bi __attribute__ ((vector_size (sizeof v4si / sizeof 
> > (v4si){0}[0])));
> >
> > Here the 'n' in the vector_size attribute represents the number of bits
> > needed to represent a vector quantity.  In this case, this packed boolean
> > vector can represent up to 'n' vector lanes.  The size of the type is
> > rounded up to the nearest byte.  For example, the sizeof v4bi in the above
> > example is 1.
> >
> > In this approach, because of the nature of the representation, the n bits 
> > required
> > to represent the n lanes of the vector are packed at the LSB. This does not 
> > naturally
> > align with the SVE approach of each bit representing a byte of the target 
> > vector
> > and PBV therefore having an 'unpacked' layout.
> >
> > More importantly, another drawback here is that the change in units for 
> > vector_size
> > might be confusing to programmers.  The units will have to be interpreted 
> > based on the
> > base type of the typedef. It does not offer any flexibility in terms of the 
> > layout of
> > the bool vector - it is fixed.
> >
> > 3. Combination of 1 and 2.
> >
> > Combining the best of 1 and 2, we can introduce extra parameters to 
> > vector_size that will
> > unambiguously represent the layout of the PBV. Consider
> >
> >   typedef bool vbool __attribute__((vector_size (s, n[, w])

Re: [PATCH] [x86] Refine maskstore patterns with UNSPEC_MASKMOV.

2023-06-27 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 27, 2023 at 3:20 PM Richard Biener via Gcc-patches
 wrote:
>
> On Tue, Jun 27, 2023 at 7:38 AM liuhongt  wrote:
> >
> > At the rtl level, we cannot guarantee that the maskstore is not optimized
> > to other full-memory accesses, as the current implementations are equivalent
> > in terms of pattern, to solve this potential problem, this patch refines
> > the pattern of the maskstore and the intrinsics with unspec.
> >
> > One thing I'm not sure is VCOND_EXPR, should VCOND_EXPR also expect
> > fault suppression for masked-out elements?
>
> You mean the vcond and vcond_eq optabs?  No, those do not expect
> fault suppression.
Yes, vcond/vcond_eq, thanks for clarifying.

Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:

> From: Ju-Zhe Zhong 
> 
> Hi, Richi.
> 
> I tried to understand your last email and to refactor the do-while loop using 
> VECTOR_CST_NELTS.
> 
> This patch works fine for LEN_MASK_STORE and compiler can CSE redundant store.
> I have appended testcase in this patch to test VN for LEN_MASK_STORE.
> 
> I am not sure whether I am on the same page with you.
> 
> Feel free to correct me, Thanks.
> 
> gcc/ChangeLog:
> 
> * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and
> fix LEN_STORE.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New test.
> 
> ---
>  .../rvv/autovec/partial/len_maskstore_vn-1.c  | 30 +++
>  gcc/tree-ssa-sccvn.cc | 24 +++
>  2 files changed, 49 insertions(+), 5 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> 
> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> new file mode 100644
> index 000..0b2d03693dc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param 
> riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */
> +
> +void __attribute__((noinline,noclone))
> +foo (int *out, int *res)
> +{
> +  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> +  int i;
> +  for (i = 0; i < 16; ++i)
> +{
> +  if (mask[i])
> +out[i] = i;
> +}
> +  int o0 = out[0];
> +  int o7 = out[7];
> +  int o14 = out[14];
> +  int o15 = out[15];
> +  res[0] = o0;
> +  res[2] = o7;
> +  res[4] = o14;
> +  res[6] = o15;
> +}
> +
> +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two
> +   vector iterations.  FRE5 after that should be able to CSE
> +   out[7] and out[15], but leave out[0] and out[14] alone.  */
> +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */
> diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> index 11061a374a2..242d82d6274 100644
> --- a/gcc/tree-ssa-sccvn.cc
> +++ b/gcc/tree-ssa-sccvn.cc
> @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
> if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
>   return (void *)-1;
> break;
> + case IFN_LEN_MASK_STORE:
> +   len = gimple_call_arg (call, 2);
> +   bias = gimple_call_arg (call, 5);
> +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> + return (void *)-1;
> +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> +   mask = vn_valueize (mask);
> +   if (TREE_CODE (mask) != VECTOR_CST)
> + return (void *)-1;
> +   break;
>   default:
> return (void *)-1;
>   }
> @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
> tree vectype = TREE_TYPE (def_rhs);
> unsigned HOST_WIDE_INT elsz
>   = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> +   /* Set the initial len value to UINT_MAX, so mask_idx < actual_len
> +  is always true for MASK_STORE.  */
> +   unsigned actual_len = UINT_MAX;
> +   if (len)
> + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias);
> +   unsigned nunits
> + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]);

No, that's not correct and not what I meant.  There's
vector_cst_encoded_nelts (mask), but for example for an
all-ones mask that would be 1.  You'd then also not access
VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT.  What I'm not sure
about is how to recover the implicit walk over all elements for the
purpose of computing start/length, and how generally this will
work for variable-length vectors, where enumerating "all"
elements touched is required for correctness.

The simplest thing would be to make this all conditional
on a constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
asking for VL vector testcases.

Richard.

> if (mask)
>   {
> HOST_WIDE_INT start = 0, length = 0;
> -   unsigned mask_idx = 0;
> -   do
> +   for (unsigned mask_idx = 0; mask_idx < nunits; mask_idx++)
>   {
> if (integer_zerop (VECTOR_CST_ELT (mask, mask_idx)))
>   {
> @@ -3371,9 +3387,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>   }
> else
>   length += elsz;
> -   

Re: [PATCH] Mark asm goto with outputs as volatile

2023-06-27 Thread Andreas Schwab via Gcc-patches
On Jun 26 2023, Andrew Pinski via Gcc-patches wrote:

> diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> index 0e24b915b8f..dc6a00e8bd9 100644
> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -6935,7 +6935,12 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p)
>stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING (expr)),
>  inputs, outputs, clobbers, labels);
>  
> -  gimple_asm_set_volatile (stmt, ASM_VOLATILE_P (expr) || noutputs == 0);
> +  /* asm is volatile if it was marked by the user as volatile or
> +  there is no outputs or this is an asm goto.  */
   are

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread juzhe.zh...@rivai.ai
Hi, Richi.

When I try vector_cst_encoded_nelts (mask), it is 2 for the testcase I
appended, but the actual nunits is 8 in that case, so I fail to walk all
the elements in the analysis.

>> The most simple thing would be to make this all conditional
>> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
>> asking for VL vector testcases.
Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in the
intrinsics, so I am not sure how to reproduce VL vectors in C code.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-27 15:33
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford; pan2.li
Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> *data_,
>tree vectype = TREE_TYPE (def_rhs);
>unsigned HOST_WIDE_INT elsz
>  = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> +   /* Set the initial len value to UINT_MAX, so mask_idx < actual_len
> + is always true for MASK_STORE.  */
> +   unsigned actual_len = UINT_MAX;
> +   if (len)
> + actual_len = tree_to_uhwi (len) + tree_to_shwi (bias);
> +   unsigned nunits
> + = MIN (actual_len, VECTOR_CST_NELTS (mask).coeffs[0]);
 
No, that's not correct and what I meant.  There's
vector_cst_encoded_nelts (mask), but for example for an
all-ones mask that would be 1.  You'd then also not access
VECTOR_CST_ELT but VECTOR_CST_ENCODED_ELT.  What I'm not sure
is how to recover the implicit walk over all elements for the
purpose of computing start/length and how generally this will
work for variable-length vectors where enumerating "all"
elements touched is required for correctness.
 
The most simple thing would be to make this all conditional
to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
asking for VL vector testcases.
 
Richard.
 
>if (mask)
>  {
>   

Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Robin Dapp via Gcc-patches
> Why does the expander not have a fallback here?  If we put up
> restrictions like this like we do for vector operations (after
> vector lowering!), we need to document this.  Your check covers
> more than just FP16 types as well which I think is undesirable.

I'm not sure I follow.  What would we fall back to if
(_Float16)a + (_Float16)b is not supported?  Should I provide
a (_Float16)((float)a + (float)b) fallback?  But that would just
undo the simplification we performed.  Or do you mean in optabs
already?

> So it seems for FP16 we need this for correctness (to not ICE)
> while for other modes it might be appropriate for performance
> (though I cannot imagine a target supporting say long double
> not supporting float).

What about something like:

-  && target_supports_op_p (newtype, op, optab_default)
+  && (!target_supports_op_p (itype, op, optab_default)
+  || element_mode (newtype) != HFmode
+  || target_supports_op_p (newtype, op, optab_default))
?

Regards
 Robin



Re: [PATCH] [x86] Refine maskstore patterns with UNSPEC_MASKMOV.

2023-06-27 Thread Hongtao Liu via Gcc-patches
On Tue, Jun 27, 2023 at 3:28 PM Hongtao Liu  wrote:
>
> On Tue, Jun 27, 2023 at 3:20 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On Tue, Jun 27, 2023 at 7:38 AM liuhongt  wrote:
> > >
> > > At the rtl level, we cannot guarantee that the maskstore is not optimized
> > > to other full-memory accesses, as the current implementations are 
> > > equivalent
> > > in terms of pattern.  To solve this potential problem, this patch refines
> > > the pattern of the maskstore and the intrinsics with unspec.
> > >
> > > One thing I'm not sure about is VCOND_EXPR: should VCOND_EXPR also expect
> > > fault suppression for masked-out elements?
> >
> > You mean the vcond and vcond_eq optabs?  No, those do not expect
> > fault suppression.
> Yes, vcond/vcond_eq, thanks for clarifying.
> >
> > >
> > > Currently we're still using vec_merge for both AVX2 and AVX512 target.
> > >
> > > 
> > > Similar like r14-2070-gc79476da46728e
> > >
> > > If mem_addr points to a memory region with less than whole vector size
> > > bytes of accessible memory and k is a mask that would prevent reading
> > > the inaccessible bytes from mem_addr, add UNSPEC_MASKMOV to prevent
> > > it to be transformed to any other whole memory access instructions.
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ready to push to trunk.
I'm going to backport this patch and the maskload one [1] to GCC11/GCC12/GCC13

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622410.html
> > >
> > > gcc/ChangeLog:
> > >
> > > PR rtl-optimization/110237
> > > * config/i386/sse.md (_store_mask): Refine with
> > > UNSPEC_MASKMOV.
> > > (maskstore > > (*_store_mask): New define_insn, it's renamed
> > > from original _store_mask.
> > > ---
> > >  gcc/config/i386/sse.md | 69 ++
> > >  1 file changed, 57 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > > index 3b50c7117f8..812cfca4b92 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -1608,7 +1608,7 @@ (define_insn "_blendm"
> > > (set_attr "prefix" "evex")
> > > (set_attr "mode" "")])
> > >
> > > -(define_insn "_store_mask"
> > > +(define_insn "*_store_mask"
> > >[(set (match_operand:V48_AVX512VL 0 "memory_operand" "=m")
> > > (vec_merge:V48_AVX512VL
> > >   (match_operand:V48_AVX512VL 1 "register_operand" "v")
> > > @@ -1636,7 +1636,7 @@ (define_insn "_store_mask"
> > > (set_attr "memory" "store")
> > > (set_attr "mode" "")])
> > >
> > > -(define_insn "_store_mask"
> > > +(define_insn "*_store_mask"
> > >[(set (match_operand:VI12HFBF_AVX512VL 0 "memory_operand" "=m")
> > > (vec_merge:VI12HFBF_AVX512VL
> > >   (match_operand:VI12HFBF_AVX512VL 1 "register_operand" "v")
> > > @@ -27008,21 +27008,66 @@ (define_expand 
> > > "maskstore"
> > >"TARGET_AVX")
> > >
> > >  (define_expand "maskstore"
> > > -  [(set (match_operand:V48H_AVX512VL 0 "memory_operand")
> > > -   (vec_merge:V48H_AVX512VL
> > > - (match_operand:V48H_AVX512VL 1 "register_operand")
> > > - (match_dup 0)
> > > - (match_operand: 2 "register_operand")))]
> > > +  [(set (match_operand:V48_AVX512VL 0 "memory_operand")
> > > +   (unspec:V48_AVX512VL
> > > + [(match_operand:V48_AVX512VL 1 "register_operand")
> > > +  (match_dup 0)
> > > +  (match_operand: 2 "register_operand")]
> > > + UNSPEC_MASKMOV))]
> > >"TARGET_AVX512F")
> > >
> > >  (define_expand "maskstore"
> > > -  [(set (match_operand:VI12_AVX512VL 0 "memory_operand")
> > > -   (vec_merge:VI12_AVX512VL
> > > - (match_operand:VI12_AVX512VL 1 "register_operand")
> > > - (match_dup 0)
> > > - (match_operand: 2 "register_operand")))]
> > > +  [(set (match_operand:VI12HFBF_AVX512VL 0 "memory_operand")
> > > +   (unspec:VI12HFBF_AVX512VL
> > > + [(match_operand:VI12HFBF_AVX512VL 1 "register_operand")
> > > +  (match_dup 0)
> > > +  (match_operand: 2 "register_operand")]
> > > + UNSPEC_MASKMOV))]
> > >"TARGET_AVX512BW")
> > >
> > > +(define_insn "_store_mask"
> > > +  [(set (match_operand:V48_AVX512VL 0 "memory_operand" "=m")
> > > +   (unspec:V48_AVX512VL
> > > + [(match_operand:V48_AVX512VL 1 "register_operand" "v")
> > > +  (match_dup 0)
> > > +  (match_operand: 2 "register_operand" "Yk")]
> > > + UNSPEC_MASKMOV))]
> > > +  "TARGET_AVX512F"
> > > +{
> > > +  if (FLOAT_MODE_P (GET_MODE_INNER (mode)))
> > > +{
> > > +  if (misaligned_operand (operands[0], mode))
> > > +   return "vmovu\t{%1, %0%{%2%}|%0%{%2%}, %1}";
> > > +  else
> > > +   return "vmova\t{%1, %0%{%2%}|%0%{%2%}, %1}";
> > > +}
> > > +  else
> > > +{
> > > +  if (misaligned_operand (operands[0], mode))
> > > +   return "vmovdqu\t{%1, %0%{%2%}|%0%{%2%}, %1}";
> >

Re: [PATCH 1/2] Mid engine setup [SU]ABDL

2023-06-27 Thread Richard Sandiford via Gcc-patches
Richard Sandiford  writes:
>> - VTYPE x, y, out;
>> + VTYPE x, y;
>> + WTYPE out;
>>   type diff;
>> loop i in range:
>>   S1 diff = x[i] - y[i]
>>   S2 out[i] = ABS_EXPR ;
>>  
>> -   where 'type' is a integer and 'VTYPE' is a vector of integers
>> -   the same size as 'type'
>> +   where 'VTYPE' and 'WTYPE' are vectors of integers.
>> + 'WTYPE' may be wider than 'VTYPE'.
>> + 'type' is as wide as 'WTYPE'.
>
> I don't think the existing comment is right about the types.  What we're
> matching is scalar code, so VTYPE and (now) WTYPE are integers rather
> than vectors of integers.

Gah, sorry, I realise now that the point was that VTYPE and WTYPE
are sequences rather than scalars.  But patterns are used for SLP
as well as loops, and the inputs and outputs might not be memory
objects.  So:

> I think it would be clearer to write:
>
>S1 diff = (type) x[i] - (type) y[i]
>S2 out[i] = ABS_EXPR <(WTYPE) diff>;
>
> since the promotions happen on the operands.
>
> It'd be good to keep the part about 'type' being an integer.
>
> Rather than:
>
>'WTYPE' may be wider than 'VTYPE'.
>'type' is as wide as 'WTYPE'.
>
> maybe:
>
>'type' is no narrower than 'VTYPE' (but may be wider)
>'WTYPE' is no narrower than 'type' (but may be wider)

...how about:

  TYPE1 x;
  TYPE2 y;
  TYPE3 x_cast = (TYPE3) x;  // widening or no-op
  TYPE3 y_cast = (TYPE3) y;  // widening or no-op
  TYPE3 diff = x_cast - y_cast;
  TYPE4 diff_cast = (TYPE4) diff;// widening or no-op
  TYPE5 abs = ABS(U)_EXPR ;

(based on the comment above vect_recog_widen_op_pattern).

Thanks,
Richard


Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but 
> actual nunits is 8 in that case,
> then I failed to walk all elements analysis.
> 
> >> The most simple thing would be to make this all conditional
> >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> >> asking for VL vector testcases.
> Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in 
> intrinsics. I am not sure how to reproduce VL vectors
> in C code.

The original motivation was from unrolled vector code for the
conditional masking case.  Does RVV allow MASK_STORE from
intrinsics?

Otherwise how about

 for (i = 0; i < 8; ++i)
   a[i] = 42;
 for (i = 2; i < 6; ++i)
   b[i] = a[i];

with disabling complete unrolling -fdisable-tree-cunrolli both
loops should get vectorized, hopefully not iterating(?)  Then
we end up with the overlapping CSE opportunity, no?  Maybe
it needs support for fixed size vectors and using VL vectors
just for the epilog.

Richard.


> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 15:33
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, Richi.
> > 
> > I tried to understand your last email and to refactor the do-while loop 
> > using VECTOR_CST_NELTS.
> > 
> > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant 
> > store.
> > I have appended testcase in this patch to test VN for LEN_MASK_STORE.
> > 
> > I am not sure whether I am on the same page with you.
> > 
> > Feel free to correct me, Thanks.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> > fix LEN_STORE
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New 
> > test.
> > 
> > ---
> >  .../rvv/autovec/partial/len_maskstore_vn-1.c  | 30 +++
> >  gcc/tree-ssa-sccvn.cc | 24 +++
> >  2 files changed, 49 insertions(+), 5 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > 
> > diff --git 
> > a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > new file mode 100644
> > index 000..0b2d03693dc
> > --- /dev/null
> > +++ 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param 
> > riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */
> > +
> > +void __attribute__((noinline,noclone))
> > +foo (int *out, int *res)
> > +{
> > +  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> > +  int i;
> > +  for (i = 0; i < 16; ++i)
> > +{
> > +  if (mask[i])
> > +out[i] = i;
> > +}
> > +  int o0 = out[0];
> > +  int o7 = out[7];
> > +  int o14 = out[14];
> > +  int o15 = out[15];
> > +  res[0] = o0;
> > +  res[2] = o7;
> > +  res[4] = o14;
> > +  res[6] = o15;
> > +}
> > +
> > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two
> > +   vector iterations.  FRE5 after that should be able to CSE
> > +   out[7] and out[15], but leave out[0] and out[14] alone.  */
> > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 11061a374a2..242d82d6274 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> >  return (void *)-1;
> >break;
> > + case IFN_LEN_MASK_STORE:
> > +   len = gimple_call_arg (call, 2);
> > +   bias = gimple_call_arg (call, 5);
> > +   if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_p (bias))
> > + return (void *)-1;
> > +   mask = gimple_call_arg (call, internal_fn_mask_index (fn));
> > +   mask = vn_valueize (mask);
> > +   if (TREE_CODE (mask) != VECTOR_CST)
> > + return (void *)-1;
> > +   break;
> >  default:
> >return (void *)-1;
> >  }
> > @@ -3344,11 +3354,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >tree vectype = TREE_TYPE (def_rhs);
> >unsigned HOST_WIDE_INT elsz
> >  = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (vectype)));
> > +   /* Set initial len value is the UINT_MAX, so mask_idx < actual_len
> > + is always true for MASK_STORE.  */
> > +   unsigned actual_len =

Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> Does RVV allow MASK_STORE from
>> intrinsics?
No, RVV doesn't use any internal_fn in intrinsics.

>>with disabling complete unrolling -fdisable-tree-cunrolli both
>>loops should get vectorized, hopefully not iterating(?)  Then
>>we end up with the overlapping CSE opportunity, no?  Maybe
>>it needs support for fixed size vectors and using VL vectors
>>just for the epilog.

I tried on ARM:
https://godbolt.org/z/cTET8f7W9 

It seems that it can't reproduce the CSE opportunity.
I am not sure whether I am doing something wrong.
Besides, RVV currently can't have VLS modes (V4SI) due to an LTO issue.
I am trying to find a CSE opportunity on VL vectors.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-27 15:47
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; pan2.li
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but 
> actual nunits is 8 in that case,
> then I failed to walk all elements analysis.
> 
> >> The most simple thing would be to make this all conditional
> >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> >> asking for VL vector testcases.
> Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in 
> intrinsics. I am not sure how to reproduce VL vectors
> in C code.
 
The original motivation was from unrolled vector code for the
conditional masking case.  Does RVV allow MASK_STORE from
intrinsics?
 
Otherwise how about
 
for (i = 0; i < 8; ++i)
   a[i] = 42;
for (i = 2; i < 6; ++i)
   b[i] = a[i];
 
with disabling complete unrolling -fdisable-tree-cunrolli both
loops should get vectorized, hopefully not iterating(?)  Then
we end up with the overlapping CSE opportunity, no?  Maybe
it needs support for fixed size vectors and using VL vectors
just for the epilog.
 
Richard.
 
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 15:33
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, Richi.
> > 
> > I tried to understand your last email and to refactor the do-while loop 
> > using VECTOR_CST_NELTS.
> > 
> > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant 
> > store.
> > I have appended testcase in this patch to test VN for LEN_MASK_STORE.
> > 
> > I am not sure whether I am on the same page with you.
> > 
> > Feel free to correct me, Thanks.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> > fix LEN_STORE
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New 
> > test.
> > 
> > ---
> >  .../rvv/autovec/partial/len_maskstore_vn-1.c  | 30 +++
> >  gcc/tree-ssa-sccvn.cc | 24 +++
> >  2 files changed, 49 insertions(+), 5 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > 
> > diff --git 
> > a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > new file mode 100644
> > index 000..0b2d03693dc
> > --- /dev/null
> > +++ 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param 
> > riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */
> > +
> > +void __attribute__((noinline,noclone))
> > +foo (int *out, int *res)
> > +{
> > +  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> > +  int i;
> > +  for (i = 0; i < 16; ++i)
> > +{
> > +  if (mask[i])
> > +out[i] = i;
> > +}
> > +  int o0 = out[0];
> > +  int o7 = out[7];
> > +  int o14 = out[14];
> > +  int o15 = out[15];
> > +  res[0] = o0;
> > +  res[2] = o7;
> > +  res[4] = o14;
> > +  res[6] = o15;
> > +}
> > +
> > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two
> > +   vector iterations.  FRE5 after that should be able to CSE
> > +   out[7] and out[15], but leave out[0] and out[14] alone.  */
> > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = 15;" "fre5" } } */
> > diff --git a/gcc/tree-ssa-sccvn.cc b/gcc/tree-ssa-sccvn.cc
> > index 11061a374a2..242d82d6274 100644
> > --- a/gcc/tree-ssa-sccvn.cc
> > +++ b/gcc/tree-ssa-sccvn.cc
> > @@ -3304,6 +3304,16 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void 
> > *data_,
> >if (!tree_fits_uhwi_p (len) || !tree_fits_shwi_

Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread juzhe.zh...@rivai.ai
Hi, Richi.
After several tries, I found a case that comes "close to" having a CSE
opportunity in RVV for VL vectors:

void __attribute__((noinline,noclone))
foo (uint16_t *out, uint16_t *res)
{
  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 };
  int i;
  for (i = 0; i < 8; ++i)
{
  if (mask[i])
out[i] = 33;
}
  uint16_t o0 = out[0];
  uint16_t o7 = out[3];
  uint16_t o14 = out[6];
  uint16_t o15 = out[7];
  res[0] = o0;
  res[2] = o7;
  res[4] = o14;
  res[6] = o15;
}

The Gimple IR:
  _64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]);
  vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 0);
  mask__7.16_56 = vect__1.15_54 != { 0, ... };
  .LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 0);

You can see the "len" is always a variable produced by SELECT_VL, so the
CSE opportunity is missed.
And I tried in ARM SVE: https://godbolt.org/z/63a6WcT9o
It also fail to have CSE opportunity.

It seems difficult to create such a CSE opportunity with VL vectors.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-27 15:47
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; pan2.li
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> 
> When I try vector_cst_encoded_nelts (mask), the testcase I append is 2 but 
> actual nunits is 8 in that case,
> then I failed to walk all elements analysis.
> 
> >> The most simple thing would be to make this all conditional
> >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> >> asking for VL vector testcases.
> Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in 
> intrinsics. I am not sure how to reproduce VL vectors
> in C code.
 
The original motivation was from unrolled vector code for the
conditional masking case.  Does RVV allow MASK_STORE from
intrinsics?
 
Otherwise how about
 
for (i = 0; i < 8; ++i)
   a[i] = 42;
for (i = 2; i < 6; ++i)
   b[i] = a[i];
 
with disabling complete unrolling -fdisable-tree-cunrolli both
loops should get vectorized, hopefully not iterating(?)  Then
we end up with the overlapping CSE opportunity, no?  Maybe
it needs support for fixed size vectors and using VL vectors
just for the epilog.
 
Richard.
 
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 15:33
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > From: Ju-Zhe Zhong 
> > 
> > Hi, Richi.
> > 
> > I tried to understand your last email and to refactor the do-while loop 
> > using VECTOR_CST_NELTS.
> > 
> > This patch works fine for LEN_MASK_STORE and compiler can CSE redundant 
> > store.
> > I have appended testcase in this patch to test VN for LEN_MASK_STORE.
> > 
> > I am not sure whether I am on the same page with you.
> > 
> > Feel free to correct me, Thanks.
> > 
> > gcc/ChangeLog:
> > 
> > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE and 
> > fix LEN_STORE
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New 
> > test.
> > 
> > ---
> >  .../rvv/autovec/partial/len_maskstore_vn-1.c  | 30 +++
> >  gcc/tree-ssa-sccvn.cc | 24 +++
> >  2 files changed, 49 insertions(+), 5 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > 
> > diff --git 
> > a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > new file mode 100644
> > index 000..0b2d03693dc
> > --- /dev/null
> > +++ 
> > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > @@ -0,0 +1,30 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param 
> > riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre5" } */
> > +
> > +void __attribute__((noinline,noclone))
> > +foo (int *out, int *res)
> > +{
> > +  int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1 };
> > +  int i;
> > +  for (i = 0; i < 16; ++i)
> > +{
> > +  if (mask[i])
> > +out[i] = i;
> > +}
> > +  int o0 = out[0];
> > +  int o7 = out[7];
> > +  int o14 = out[14];
> > +  int o15 = out[15];
> > +  res[0] = o0;
> > +  res[2] = o7;
> > +  res[4] = o14;
> > +  res[6] = o15;
> > +}
> > +
> > +/* Vectorization produces .LEN_MASK_STORE, unrolling will unroll the two
> > +   vector iterations.  FRE5 after that should be able to CSE
> > +   out[7] and out[15], but leave out[0] and out[14] alone.  */
> > +/* { dg-final { scan-tree-dump " = o0_\[0-9\]+;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = 7;" "fre5" } } */
> > +/* { dg-final { scan-tree-dump " = o14_\[0-9\]+;" "fre5" } } */
> > +/* { dg-final {

[PATCH v7] tree-ssa-sink: Improve code sinking pass

2023-06-27 Thread Ajit Agarwal via Gcc-patches
Hello All:

This patch improves the code sinking pass to sink statements before calls to
reduce register pressure.
Review comments are incorporated.

For example :

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d +e + f;
  if (a != 5)
{
  bar();
  j = l;
}
}

Code Sinking does the following:

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  
  if (a != 5)
{
  l = a + b + c + d +e + f; 
  bar();
  j = l;
}
}

Bootstrapped regtested on powerpc64-linux-gnu.

Thanks & Regards
Ajit


tree-ssa-sink: Improve code sinking pass

Currently, code sinking will sink code after function calls.  This increases
register pressure for callee-saved registers.  The following patch improves
code sinking by placing the sunk code before calls in the use block or in
the immediate dominator of the use blocks.

2023-06-01  Ajit Kumar Agarwal  

gcc/ChangeLog:

PR tree-optimization/81953
* tree-ssa-sink.cc (statement_sink_location): Move statements before
calls.
(def_use_same_block): New function.
(select_best_block): Add heuristics to select the best blocks in the
immediate post dominator.

gcc/testsuite/ChangeLog:

PR tree-optimization/81953
* gcc.dg/tree-ssa/ssa-sink-20.c: New testcase.
* gcc.dg/tree-ssa/ssa-sink-21.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c | 15 
 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c | 19 +
 gcc/tree-ssa-sink.cc| 79 ++---
 3 files changed, 87 insertions(+), 26 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
new file mode 100644
index 000..d3b79ca5803
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-20.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_12\s+=\s+_4\s+\+\s+f_11\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
new file mode 100644
index 000..84e7938c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-sink-21.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-sink-stats" } */
+void bar();
+int j, x;
+void foo(int a, int b, int c, int d, int e, int f)
+{
+  int l;
+  l = a + b + c + d +e + f;
+  if (a != 5)
+{
+  bar();
+  if (b != 3)
+x = 3;
+  else
+x = 5;
+  j = l;
+}
+}
+/* { dg-final { scan-tree-dump 
{l_13\s+=\s+_4\s+\+\s+f_12\(D\);\n\s+bar\s+\(\)} sink1 } } */
diff --git a/gcc/tree-ssa-sink.cc b/gcc/tree-ssa-sink.cc
index b1ba7a2ad6c..113c89d0967 100644
--- a/gcc/tree-ssa-sink.cc
+++ b/gcc/tree-ssa-sink.cc
@@ -171,9 +171,28 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
   return commondom;
 }
 
+/* Return TRUE if any immediate definition of STMT's operands is in
+   the same block as STMT, FALSE otherwise.  */
+
+static bool
+def_use_same_block (gimple *stmt)
+{
+  def_operand_p def;
+  ssa_op_iter iter;
+
+  FOR_EACH_SSA_DEF_OPERAND (def, stmt, iter, SSA_OP_DEF)
+{
+  gimple *def_stmt = SSA_NAME_DEF_STMT (DEF_FROM_PTR (def));
+  if (gimple_bb (def_stmt) == gimple_bb (stmt))
+   return true;
+ }
+  return false;
+}
+
 /* Given EARLY_BB and LATE_BB, two blocks in a path through the dominator
tree, return the best basic block between them (inclusive) to place
-   statements.
+   statements.  If the use statement is after a call, the best basic
+   block should be an immediate dominator of the use block.
 
We want the most control dependent block in the shallowest loop nest.
 
@@ -190,11 +209,22 @@ nearest_common_dominator_of_uses (def_operand_p def_p, 
bool *debug_stmts)
 static basic_block
 select_best_block (basic_block early_bb,
   basic_block late_bb,
-  gimple *stmt)
+  gimple *stmt,
+  gimple *use)
 {
   basic_block best_bb = late_bb;
   basic_block temp_bb = late_bb;
   int threshold;
+  /* Get the sinking threshold.  If the statement to be moved has memory
+ operands, then increase the threshold by 7% as those are even more
+ profitable to avoid, clamping at 100%.  */
+  threshold = param_sink_frequency_threshold;
+  if (gimple_vuse (stmt) || gimple_vdef (stmt))
+{
+  threshold += 7;
+  if (threshold > 100)
+   threshold = 100;
+}
 
   while (temp_bb != early_bb)
 {
@@ -203,34 +233,33 @@ select_best_block (basic_block early_bb,
   if (bb_loop_depth (temp_bb)

Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
> >> Does RVV allow MASK_STORE from
> >> intrinsics?
> No, RVV doesn't use any internal_fn in intrinsics.
> 
> >>with disabling complete unrolling -fdisable-tree-cunrolli both
> >>loops should get vectorized, hopefully not iterating(?)  Then
> >>we end up with the overlapping CSE opportunity, no?  Maybe
> >>it needs support for fixed size vectors and using VL vectors
> >>just for the epilog.
> 
> I tried on ARM:
> https://godbolt.org/z/cTET8f7W9 
> 
> It seems that it can't reproduce CSE opportunity.
> I am not sure whether I am doing something wrong.
> Besides, RVV currently can't have VLS modes (V4SI) since LTO issue.
> I am trying to find a CSE opportunity on VL vectors.

So for example

void foo (int * __restrict a, int *b)
{
  for (int i = 0; i < 9; ++i)
a[i] = 42;
  for (int i = 2; i < 6; ++i)
b[i] = a[i];
}

with -O3 -fdisable-tree-cunrolli -march=znver4 --param 
vect-partial-vector-usage=1  -fno-tree-loop-distribute-patterns
gets us before FRE4

  .MASK_STORE (a_14(D), 32B, { -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 
0, 0, 0, 0, 0 }, { 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 42, 
42, 42 });
  vectp_a.10_42 = a_14(D) + 64;
  ivtmp_46 = 249;
  _47 = {ivtmp_46, ivtmp_46, ivtmp_46, ivtmp_46, ivtmp_46, ivtmp_46, 
ivtmp_46, ivtmp_46, ivtmp_46, ivtmp_46, ivtmp_46, ivtmp_46, ivtmp_46, 
ivtmp_46, ivtmp_46, ivtmp_46};
  _48 = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 } < _47;
  vectp_a.6_11 = a_14(D) + 8;
  vectp_b.9_34 = b_15(D) + 8;
  vect__8.7_33 = MEM  [(int *)vectp_a.6_11];
  MEM  [(int *)vectp_b.9_34] = vect__8.7_33;

where we can optimize the non-VL vector non-masked load:

  MEM  [(int *)vectp_b.9_34] = { 42, 42, 42, 42 };

We don't optimize a masked load in this place (GIMPLE currently
doesn't define the value of the masked elements - we want to
nail those to zero though).  So what we are after is VLS mode
loads optimized from VLA masked stores.  IIRC there's a
testcase in the testsuite that triggered for power/s390 with
LEN_STORE.

There's always the possibility to write GIMPLE testcases but
you have to expect to run into parser limitations as I never
tried to write VLA vector code there.

I'd say put this patch on hold until we get a motivating testcase.
That's either when you add VLS modes so you could get the above
(but then you will likely not get len_mask_store but at most
mask_store).

Maybe SVE folks can come up with something useful.

Another thing to test is for example scalar element load CSE
from VLA [masked/with-len] stores if we can for example
compute a lower bound on the number of elements accessed.
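A minimal C sketch of that scalar-CSE candidate (a hypothetical testcase, not from the thread; the function name is made up):

```c
#include <assert.h>

/* If the first loop is vectorized into a VLA .LEN_MASK_STORE of { 42, ... }
   whose length has a provable lower bound of at least one element, the
   scalar read of a[0] is covered by the store, and FRE could replace it
   with the constant 42 instead of reloading it.  */
int first_elem (int *a)
{
  for (int i = 0; i < 8; ++i)
    a[i] = 42;
  return a[0];   /* scalar CSE candidate: always 42 */
}
```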

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 15:47
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > When I try vector_cst_encoded_nelts (mask), the testcase I append gives 2, 
> > but the actual nunits is 8 in that case,
> > so I fail to walk all elements in the analysis.
> > 
> > >> The most simple thing would be to make this all conditional
> > >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > >> asking for VL vector testcases.
> > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in 
> > intrinsics. I am not sure how to reproduce VL vectors
> > in C code.
>  
> The original motivation was from unrolled vector code for the
> conditional masking case.  Does RVV allow MASK_STORE from
> intrinsics?
>  
> Otherwise how about
>  
> for (i = 0; i < 8; ++i)
>a[i] = 42;
> for (i = 2; i < 6; ++i)
>b[i] = a[i];
>  
> with disabling complete unrolling -fdisable-tree-cunrolli both
> loops should get vectorized, hopefully not iterating(?)  Then
> we end up with the overlapping CSE opportunity, no?  Maybe
> it needs support for fixed size vectors and using VL vectors
> just for the epilog.
>  
> Richard.
>  
>  
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-06-27 15:33
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford; pan2.li
> > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> > On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > From: Ju-Zhe Zhong 
> > > 
> > > Hi, Richi.
> > > 
> > > I tried to understand your last email and to refactor the do-while loop 
> > > using VECTOR_CST_NELTS.
> > > 
> > > This patch works fine for LEN_MASK_STORE and the compiler can CSE the 
> > > redundant store.
> > > I have appended a testcase in this patch to test VN for LEN_MASK_STORE.
> > > 
> > > I am not sure whether I am on the same page with you.
> > > 
> > > Feel free to correct me, Thanks.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE 
> > > and fix LEN_STORE
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > 

Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> After several tries, I found a case that is "close to" having a CSE 
> opportunity in RVV for VL vectors:
> 
> void __attribute__((noinline,noclone))
> foo (uint16_t *out, uint16_t *res)
> {
>   int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 };
>   int i;
>   for (i = 0; i < 8; ++i)
> {
>   if (mask[i])
> out[i] = 33;
> }
>   uint16_t o0 = out[0];
>   uint16_t o7 = out[3];
>   uint16_t o14 = out[6];
>   uint16_t o15 = out[7];
>   res[0] = o0;
>   res[2] = o7;
>   res[4] = o14;
>   res[6] = o15;
> }
> 
> The Gimple IR:
>   _64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]);
>   vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 0);
>   mask__7.16_56 = vect__1.15_54 != { 0, ... };
>   .LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 0);
> 
> You can see the "len" is always a variable produced by SELECT_VL, so it 
> fails to have a CSE opportunity.
> And I tried on ARM SVE: https://godbolt.org/z/63a6WcT9o
> It also fails to have a CSE opportunity.
> 
> It seems that it's difficult to have such a CSE opportunity in VL vectors.

Ah.  Nice example.  This shows we fail to constant fold
[vec_unpack_lo_expr] { -1, -1, ..., 0, 0, 0, ... } which means we
fail to fold the .MASK_LOAD from 'mask' (that's not something we
support, see my previous answer) and that means the .MASK_STORE mask
doesn't end up constant.

I understand that SVE cannot easily generate all constant [masks],
so we probably shouldn't aggressively perform elimination based on
constant folding, but for actual constant folding of dependent stmts
it might be nice to have the constant-folded masks.

I will open a bugreport with this testcase.

Richard.

> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 15:47
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > When I try vector_cst_encoded_nelts (mask), the testcase I append gives 2, 
> > but the actual nunits is 8 in that case,
> > so I fail to walk all elements in the analysis.
> > 
> > >> The most simple thing would be to make this all conditional
> > >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > >> asking for VL vector testcases.
> > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in 
> > intrinsics. I am not sure how to reproduce VL vectors
> > in C code.
>  
> The original motivation was from unrolled vector code for the
> conditional masking case.  Does RVV allow MASK_STORE from
> intrinsics?
>  
> Otherwise how about
>  
> for (i = 0; i < 8; ++i)
>a[i] = 42;
> for (i = 2; i < 6; ++i)
>b[i] = a[i];
>  
> with disabling complete unrolling -fdisable-tree-cunrolli both
> loops should get vectorized, hopefully not iterating(?)  Then
> we end up with the overlapping CSE opportunity, no?  Maybe
> it needs support for fixed size vectors and using VL vectors
> just for the epilog.
>  
> Richard.
>  
>  
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-06-27 15:33
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford; pan2.li
> > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> > On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > From: Ju-Zhe Zhong 
> > > 
> > > Hi, Richi.
> > > 
> > > I tried to understand your last email and to refactor the do-while loop 
> > > using VECTOR_CST_NELTS.
> > > 
> > > This patch works fine for LEN_MASK_STORE and the compiler can CSE the 
> > > redundant store.
> > > I have appended a testcase in this patch to test VN for LEN_MASK_STORE.
> > > 
> > > I am not sure whether I am on the same page with you.
> > > 
> > > Feel free to correct me, Thanks.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE 
> > > and fix LEN_STORE
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New 
> > > test.
> > > 
> > > ---
> > >  .../rvv/autovec/partial/len_maskstore_vn-1.c  | 30 +++
> > >  gcc/tree-ssa-sccvn.cc | 24 +++
> > >  2 files changed, 49 insertions(+), 5 deletions(-)
> > >  create mode 100644 
> > > gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > 
> > > diff --git 
> > > a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c 
> > > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > new file mode 100644
> > > index 000..0b2d03693dc
> > > --- /dev/null
> > > +++ 
> > > b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c
> > > @@ -0,0 +1,30 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-march=rv32gcv_zvl256b -mabi=ilp32d --param 
> > > riscv-autovec-preference=fixed-vlmax -O3 -fdump-tree-fre

Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, Robin Dapp wrote:

> > Why does the expander not have a fallback here?  If we put up
> > restrictions like this like we do for vector operations (after
> > vector lowering!), we need to document this.  Your check covers
> > more than just FP16 types as well which I think is undesirable.
> 
> I'm not sure I follow.  What would we fall back to if
> (_Float16)a + (_Float16)b is not supported?  Should I provide
> a (_Float16)((float)a + (float)b) fallback?  But that would just
> undo the simplification we performed.  Or do you mean in optabs
> already?

Yeah, the optab should already have the fallback of WIDENing here?
So why does that fail?

> > So it seems for FP16 we need this for correctness (to not ICE)
> > while for other modes it might be appropriate for performance
> > (though I cannot imagine a target supporting say long double
> > not supporting float).
> 
> What about something like:
> 
> -  && target_supports_op_p (newtype, op, optab_default)
> +  && (!target_supports_op_p (itype, op, optab_default)
> +  || element_mode (newtype) != HFmode
> +  || target_supports_op_p (newtype, op, optab_default))

I'd say

  && (!target_supports_op_p (itype, op, optab_default)
  || target_supports_op_p (newtype, op, optab_default))

would make sense in general.  But as said you'll likely find (many?)
other places affected.  Singling out HFmode probably doesn't work
across targets since this mode isn't defined in generic code.

Richard.


Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread juzhe.zh...@rivai.ai
Hi, Richi.  After reading your emails.

Is it correct that I put support for LEN_MASK_STORE in SCCVN on hold for now 
?

And go ahead with the next RVV auto-vectorization support patterns in the 
middle-end (for example, the patch I sent to add optabs and internal fns for 
LEN_MASK_GATHER_LOAD)? Then, after I finish all the RVV auto-vectorization 
patterns in the middle-end, I come back to take a look at LEN_MASK_STORE in 
SCCVN?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-27 16:34
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; pan2.li
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.
> After several tries, I found a case that is "close to" having a CSE 
> opportunity in RVV for VL vectors:
> 
> void __attribute__((noinline,noclone))
> foo (uint16_t *out, uint16_t *res)
> {
>   int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 };
>   int i;
>   for (i = 0; i < 8; ++i)
> {
>   if (mask[i])
> out[i] = 33;
> }
>   uint16_t o0 = out[0];
>   uint16_t o7 = out[3];
>   uint16_t o14 = out[6];
>   uint16_t o15 = out[7];
>   res[0] = o0;
>   res[2] = o7;
>   res[4] = o14;
>   res[6] = o15;
> }
> 
> The Gimple IR:
>   _64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]);
>   vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 0);
>   mask__7.16_56 = vect__1.15_54 != { 0, ... };
>   .LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 0);
> 
> You can see the "len" is always a variable produced by SELECT_VL, so it 
> fails to have a CSE opportunity.
> And I tried on ARM SVE: https://godbolt.org/z/63a6WcT9o
> It also fails to have a CSE opportunity.
> 
> It seems that it's difficult to have such a CSE opportunity in VL vectors.
 
Ah.  Nice example.  This shows we fail to constant fold
[vec_unpack_lo_expr] { -1, -1, ..., 0, 0, 0, ... } which means we
fail to fold the .MASK_LOAD from 'mask' (that's not something we
support, see my previous answer) and that means the .MASK_STORE mask
doesn't end up constant.
 
I understand that SVE cannot easily generate all constant [masks],
so we probably shouldn't aggressively perform elimination based on
constant folding, but for actual constant folding of dependent stmts
it might be nice to have the constant-folded masks.
 
I will open a bugreport with this testcase.
 
Richard.
 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 15:47
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > 
> > When I try vector_cst_encoded_nelts (mask), the testcase I append gives 2, 
> > but the actual nunits is 8 in that case,
> > so I fail to walk all elements in the analysis.
> > 
> > >> The most simple thing would be to make this all conditional
> > >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > >> asking for VL vector testcases.
> > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in 
> > intrinsics. I am not sure how to reproduce VL vectors
> > in C code.
>  
> The original motivation was from unrolled vector code for the
> conditional masking case.  Does RVV allow MASK_STORE from
> intrinsics?
>  
> Otherwise how about
>  
> for (i = 0; i < 8; ++i)
>a[i] = 42;
> for (i = 2; i < 6; ++i)
>b[i] = a[i];
>  
> with disabling complete unrolling -fdisable-tree-cunrolli both
> loops should get vectorized, hopefully not iterating(?)  Then
> we end up with the overlapping CSE opportunity, no?  Maybe
> it needs support for fixed size vectors and using VL vectors
> just for the epilog.
>  
> Richard.
>  
>  
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-06-27 15:33
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford; pan2.li
> > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> > On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > From: Ju-Zhe Zhong 
> > > 
> > > Hi, Richi.
> > > 
> > > I tried to understand your last email and to refactor the do-while loop 
> > > using VECTOR_CST_NELTS.
> > > 
> > > This patch works fine for LEN_MASK_STORE and the compiler can CSE the 
> > > redundant store.
> > > I have appended a testcase in this patch to test VN for LEN_MASK_STORE.
> > > 
> > > I am not sure whether I am on the same page with you.
> > > 
> > > Feel free to correct me, Thanks.
> > > 
> > > gcc/ChangeLog:
> > > 
> > > * tree-ssa-sccvn.cc (vn_reference_lookup_3): Add LEN_MASK_STORE 
> > > and fix LEN_STORE
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > > * gcc.target/riscv/rvv/autovec/partial/len_maskstore_vn-1.c: New 
> > > test.
> > > 
> > > ---
> > >  .../rvv/autovec/partial/len_maskstore_vn-1.c  | 30 +++
> > >  gcc/tree-ssa-sccvn.cc | 24 +++
> > >  2 files changed, 49 insertions(+), 5 deletions(-)
> > >  create mod

Re: [PATCH v2] mips: Fix overaligned function arguments [PR109435]

2023-06-27 Thread Jovan Dmitrovic
Hi,
I am sending a revised patch, now with different tests for the N64/N32 and O32 
ABIs. For the O32 ABI, I've skipped the -O0 and -Os pipelines, since the exact 
offsets of the store instructions differ (the registers used remain the same).

Skipping -flto isn't really necessary, so I've removed that part.

I've fixed the ChangeLog; hopefully I've corrected the mistakes I made.

Regards,
Jovan

From 05e4ff4d2fbb91ea8040fb10d8d6a130ad24bba7 Mon Sep 17 00:00:00 2001
From: Jovan Dmitrovic 
Date: Mon, 26 Jun 2023 17:00:20 +0200
Subject: [PATCH] mips: Fix overaligned function arguments [PR109435]
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch changes the alignment for typedef types when passed as
arguments, making the alignment equal to the alignment of the
original (aliased) types.

This change makes it impossible for a typedef type to have
alignment that is less than its size.

2023-06-27  Jovan Dmitrović  

gcc/ChangeLog:
PR target/109435
	* config/mips/mips.cc (mips_function_arg_alignment): New function
	that returns the alignment of a function argument.  In case of a
	typedef type, it returns the alignment of the aliased type.
	(mips_function_arg_boundary): Move the calculation of the
	alignment of function arguments to mips_function_arg_alignment.

gcc/testsuite/ChangeLog:

	* gcc.target/mips/align-1-n64.c: New test.
	* gcc.target/mips/align-1-o32.c: New test.
---
 gcc/config/mips/mips.cc | 19 ++-
 gcc/testsuite/gcc.target/mips/align-1-n64.c | 19 +++
 gcc/testsuite/gcc.target/mips/align-1-o32.c | 20 
 3 files changed, 57 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/mips/align-1-n64.c
 create mode 100644 gcc/testsuite/gcc.target/mips/align-1-o32.c

diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index c1d1691306e..20ba35f754c 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -6190,6 +6190,23 @@ mips_arg_partial_bytes (cumulative_args_t cum, const function_arg_info &arg)
   return info.stack_words > 0 ? info.reg_words * UNITS_PER_WORD : 0;
 }
 
+/* Given MODE and TYPE of a function argument, return the alignment in
+   bits.
+   In case of typedef, alignment of its original type is
+   used.  */
+
+static unsigned int
+mips_function_arg_alignment (machine_mode mode, const_tree type)
+{
+  if (!type)
+return GET_MODE_ALIGNMENT (mode);
+
+  if (is_typedef_decl (TYPE_NAME (type)))
+type = DECL_ORIGINAL_TYPE (TYPE_NAME (type));
+
+  return TYPE_ALIGN (type);
+}
+
 /* Implement TARGET_FUNCTION_ARG_BOUNDARY.  Every parameter gets at
least PARM_BOUNDARY bits of alignment, but will be given anything up
to STACK_BOUNDARY bits if the type requires it.  */
@@ -6198,8 +6215,8 @@ static unsigned int
 mips_function_arg_boundary (machine_mode mode, const_tree type)
 {
   unsigned int alignment;
+  alignment = mips_function_arg_alignment (mode, type);
 
-  alignment = type ? TYPE_ALIGN (type) : GET_MODE_ALIGNMENT (mode);
   if (alignment < PARM_BOUNDARY)
 alignment = PARM_BOUNDARY;
   if (alignment > STACK_BOUNDARY)
diff --git a/gcc/testsuite/gcc.target/mips/align-1-n64.c b/gcc/testsuite/gcc.target/mips/align-1-n64.c
new file mode 100644
index 000..46e718d548d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/align-1-n64.c
@@ -0,0 +1,19 @@
+/* Check that typedef alignment does not affect passing of function
+   parameters for N64/N32 ABIs.  */
+/* { dg-do compile { target { "mips*-*-*" } } } */
+/* { dg-options "-mabi=64"  } */
+
+typedef struct ui8
+{
+  unsigned v[8];
+} uint8 __attribute__ ((aligned(64)));
+
+unsigned
+callee (int x, uint8 a)
+{
+  return a.v[0];
+}
+
+/* { dg-final { scan-assembler "\tsd\t\\\$5,0\\(\\\$\[0-9\]\\)" } } */
+/* { dg-final { scan-assembler "\tsd\t\\\$6,8\\(\\\$\[0-9\]\\)" } } */
+/* { dg-final { scan-assembler "\tsd\t\\\$7,16\\(\\\$\[0-9\]\\)" } } */
diff --git a/gcc/testsuite/gcc.target/mips/align-1-o32.c b/gcc/testsuite/gcc.target/mips/align-1-o32.c
new file mode 100644
index 000..a548632b7f6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/mips/align-1-o32.c
@@ -0,0 +1,20 @@
+/* Check that typedef alignment does not affect passing of function
+   parameters for O32 ABI.  */
+/* { dg-do compile { target { "mips*-*-*" } } } */
+/* { dg-options "-mabi=32"  } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Os" } { "" } } */
+
+typedef struct ui8
+{
+  unsigned v[8];
+} uint8 __attribute__ ((aligned(64)));
+
+unsigned
+callee (int x, uint8 a)
+{
+  return a.v[0];
+}
+
+/* { dg-final { scan-assembler "\tsw\t\\\$5,100\\(\\\$sp\\)" } } */
+/* { dg-final { scan-assembler "\tsw\t\\\$6,104\\(\\\$sp\\)" } } */
+/* { dg-final { scan-assembler "\tsw\t\\\$7,108\\(\\\$sp\\)" } } */
-- 
2.34.1



Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.  After reading your emails.
> 
> Is it correct that I put support for LEN_MASK_STORE in SCCVN on hold for 
> now?
> 
> And go ahead with the next RVV auto-vectorization support patterns in the 
> middle-end (for example, the patch I sent to add optabs and internal fns 
> for LEN_MASK_GATHER_LOAD)? Then, after I finish all the RVV 
> auto-vectorization patterns in the middle-end, I come back to take a look at 
> LEN_MASK_STORE in SCCVN?

Yes, I don't think it makes much sense to have the support without
being able to write a testcase that exercises it (and verifies
correctness).

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 16:34
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > After several tries, I found a case that is "close to" having a CSE 
> > opportunity in RVV for VL vectors:
> > 
> > void __attribute__((noinline,noclone))
> > foo (uint16_t *out, uint16_t *res)
> > {
> >   int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 };
> >   int i;
> >   for (i = 0; i < 8; ++i)
> > {
> >   if (mask[i])
> > out[i] = 33;
> > }
> >   uint16_t o0 = out[0];
> >   uint16_t o7 = out[3];
> >   uint16_t o14 = out[6];
> >   uint16_t o15 = out[7];
> >   res[0] = o0;
> >   res[2] = o7;
> >   res[4] = o14;
> >   res[6] = o15;
> > }
> > 
> > The Gimple IR:
> >   _64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]);
> >   vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 
> > 0);
> >   mask__7.16_56 = vect__1.15_54 != { 0, ... };
> >   .LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 
> > 0);
> > 
> > You can see the "len" is always a variable produced by SELECT_VL, so it 
> > fails to have a CSE opportunity.
> > And I tried on ARM SVE: https://godbolt.org/z/63a6WcT9o
> > It also fails to have a CSE opportunity.
> > 
> > It seems that it's difficult to have such a CSE opportunity in VL vectors.
>  
> Ah.  Nice example.  This shows we fail to constant fold
> [vec_unpack_lo_expr] { -1, -1, ..., 0, 0, 0, ... } which means we
> fail to fold the .MASK_LOAD from 'mask' (that's not something we
> support, see my previous answer) and that means the .MASK_STORE mask
> doesn't end up constant.
>  
> I understand that SVE cannot easily generate all constant [masks],
> so we probably shouldn't aggressively perform elimination based on
> constant folding, but for actual constant folding of dependent stmts
> it might be nice to have the constant-folded masks.
>  
> I will open a bugreport with this testcase.
>  
> Richard.
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-06-27 15:47
> > To: juzhe.zh...@rivai.ai
> > CC: gcc-patches; richard.sandiford; pan2.li
> > Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> > On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > Hi, Richi.
> > > 
> > > When I try vector_cst_encoded_nelts (mask), the testcase I append gives 
> > > 2, but the actual nunits is 8 in that case,
> > > so I fail to walk all elements in the analysis.
> > > 
> > > >> The most simple thing would be to make this all conditional
> > > >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > > >> asking for VL vector testcases.
> > > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in 
> > > intrinsics. I am not sure how to reproduce VL vectors
> > > in C code.
> >  
> > The original motivation was from unrolled vector code for the
> > conditional masking case.  Does RVV allow MASK_STORE from
> > intrinsics?
> >  
> > Otherwise how about
> >  
> > for (i = 0; i < 8; ++i)
> >a[i] = 42;
> > for (i = 2; i < 6; ++i)
> >b[i] = a[i];
> >  
> > with disabling complete unrolling -fdisable-tree-cunrolli both
> > loops should get vectorized, hopefully not iterating(?)  Then
> > we end up with the overlapping CSE opportunity, no?  Maybe
> > it needs support for fixed size vectors and using VL vectors
> > just for the epilog.
> >  
> > Richard.
> >  
> >  
> > > Thanks.
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-06-27 15:33
> > > To: Ju-Zhe Zhong
> > > CC: gcc-patches; richard.sandiford; pan2.li
> > > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> > > On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
> > >  
> > > > From: Ju-Zhe Zhong 
> > > > 
> > > > Hi, Richi.
> > > > 
> > > > I tried to understand your last email and to refactor the do-while loop 
> > > > using VECTOR_CST_NELTS.
> > > > 
> > > > This patch works fine for LEN_MASK_STORE and the compiler can CSE the 
> > > > redundant store.
> > > > I have appended a testcase in this patch to test VN for LEN_MASK_STORE.
> > > > 
> > > > I am not sure whether I am on the same page with you.
> > > > 
> > > > Feel free to correct me, Thanks.
> > > > 
> > > 

Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE

2023-06-27 Thread juzhe.zh...@rivai.ai
Thanks so much! Richi, I am going to open a bug so that I won't forget this issue.

And I am going to go ahead with LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE.
This is the first patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622824.html 
which only adds the optabs && internal_fn and documentation (no complicated 
vectorizer support).

Really appreciate your help!


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-27 16:56
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; pan2.li
Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi.  After reading your emails.
> 
> Is it correct that I put support for LEN_MASK_STORE in SCCVN on hold for 
> now?
> 
> And go ahead with the next RVV auto-vectorization support patterns in the 
> middle-end (for example, the patch I sent to add optabs and internal fns 
> for LEN_MASK_GATHER_LOAD)? Then, after I finish all the RVV 
> auto-vectorization patterns in the middle-end, I come back to take a look at 
> LEN_MASK_STORE in SCCVN?
 
Yes, I don't think it makes much sense to have the support without
being able to write a testcase that exercises it (and verifies
correctness).
 
Richard.
 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-06-27 16:34
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; pan2.li
> Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi.
> > After several tries, I found a case that is "close to" having a CSE 
> > opportunity in RVV for VL vectors:
> > 
> > void __attribute__((noinline,noclone))
> > foo (uint16_t *out, uint16_t *res)
> > {
> >   int mask[] = { 0, 1, 0, 1, 0, 1, 0, 1 };
> >   int i;
> >   for (i = 0; i < 8; ++i)
> > {
> >   if (mask[i])
> > out[i] = 33;
> > }
> >   uint16_t o0 = out[0];
> >   uint16_t o7 = out[3];
> >   uint16_t o14 = out[6];
> >   uint16_t o15 = out[7];
> >   res[0] = o0;
> >   res[2] = o7;
> >   res[4] = o14;
> >   res[6] = o15;
> > }
> > 
> > The Gimple IR:
> >   _64 = .SELECT_VL (ivtmp_31, POLY_INT_CST [4, 4]);
> >   vect__1.15_54 = .LEN_MASK_LOAD (vectp_mask.13_21, 32B, _64, { -1, ... }, 
> > 0);
> >   mask__7.16_56 = vect__1.15_54 != { 0, ... };
> >   .LEN_MASK_STORE (vectp_out.17_34, 16B, _64, mask__7.16_56, { 33, ... }, 
> > 0);
> > 
> > You can see the "len" is always a variable produced by SELECT_VL, so it 
> > fails to have a CSE opportunity.
> > And I tried on ARM SVE: https://godbolt.org/z/63a6WcT9o
> > It also fails to have a CSE opportunity.
> > 
> > It seems that it's difficult to have such a CSE opportunity in VL vectors.
>  
> Ah.  Nice example.  This shows we fail to constant fold
> [vec_unpack_lo_expr] { -1, -1, ..., 0, 0, 0, ... } which means we
> fail to fold the .MASK_LOAD from 'mask' (that's not something we
> support, see my previous answer) and that means the .MASK_STORE mask
> doesn't end up constant.
>  
> I understand that SVE cannot easily generate all constant [masks],
> so we probably shouldn't aggressively perform elimination based on
> constant folding, but for actual constant folding of dependent stmts
> it might be nice to have the constant-folded masks.
>  
> I will open a bugreport with this testcase.
>  
> Richard.
>  
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-06-27 15:47
> > To: juzhe.zh...@rivai.ai
> > CC: gcc-patches; richard.sandiford; pan2.li
> > Subject: Re: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE and fix LEN_STORE
> > On Tue, 27 Jun 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > Hi, Richi.
> > > 
> > > When I try vector_cst_encoded_nelts (mask), the testcase I append gives 
> > > 2, but the actual nunits is 8 in that case,
> > > so I fail to walk all elements in the analysis.
> > > 
> > > >> The most simple thing would be to make this all conditional
> > > >> to constant TYPE_VECTOR_SUBPARTS.  Which is also why I was
> > > >> asking for VL vector testcases.
> > > Ok, I understand your point, but RVV doesn't use LEN_MASK_STORE in 
> > > intrinsics. I am not sure how to reproduce VL vectors
> > > in C code.
> >  
> > The original motivation was from unrolled vector code for the
> > conditional masking case.  Does RVV allow MASK_STORE from
> > intrinsics?
> >  
> > Otherwise how about
> >  
> > for (i = 0; i < 8; ++i)
> >a[i] = 42;
> > for (i = 2; i < 6; ++i)
> >b[i] = a[i];
> >  
> > with disabling complete unrolling -fdisable-tree-cunrolli both
> > loops should get vectorized, hopefully not iterating(?)  Then
> > we end up with the overlapping CSE opportunity, no?  Maybe
> > it needs support for fixed size vectors and using VL vectors
> > just for the epilog.
> >  
> > Richard.
> >  
> >  
> > > Thanks.
> > > 
> > > 
> > > juzhe.zh...@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2023-06-27 15:33
> > > To: Ju-Zhe Zhong
> > > CC: gcc-patches; richard.sandiford; pan2.li
> > > Subject: Re: [PATCH V4] SCCVN: Add LEN_MASK_STORE 

Re: [PATCH] i386: Relax inline requirement for functions with different target attrs

2023-06-27 Thread Uros Bizjak via Gcc-patches
On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang  wrote:
>
> Hi,
>
> For functions with different target attributes, the current logic refuses to
> inline the callee when any arch or tune is mismatched. Relax the
> condition to honor just prefer_vector_width_type and other flags that
> may cause safety issues, so the caller can get more optimization opportunities.

I don't think this is desirable. If we inline something with different
ISAs, we get some strange mix of ISAs when the function is inlined.
OTOH - we already inline with mismatched tune flags if the function is
marked with always_inline.

Uros.

> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> tune directly, just check prefer_vector_width_type and make sure
> not to inline if they mismatch.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/inline-target-attr.c: New test.
> ---
>  gcc/config/i386/i386.cc   | 11 +
>  .../gcc.target/i386/inline-target-attr.c  | 24 +++
>  2 files changed, 30 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 0761965344b..1d86384ac06 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
>!= (callee_opts->x_target_flags & ~always_inline_safe_mask))
>  ret = false;
>
> -  /* See if arch, tune, etc. are the same.  */
> -  else if (caller_opts->arch != callee_opts->arch)
> -ret = false;
> -
> -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> +  /* Do not inline when the specified prefer-vector-width mismatches between
> + callee and caller.  */
> +  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> +  && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> +  && callee_opts->x_prefer_vector_width_type
> + != caller_opts->x_prefer_vector_width_type)
>  ret = false;
>
>else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> new file mode 100644
> index 000..995502165f0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> @@ -0,0 +1,24 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
> +
> +__attribute__((target("arch=skylake")))
> +int callee (int n)
> +{
> +  int sum = 0;
> +  for (int i = 0; i < n; i++)
> +{
> +  if (i % 2 == 0)
> +   sum +=i;
> +  else
> +   sum += (i - 1);
> +}
> +  return sum + n;
> +}
> +
> +__attribute__((target("arch=icelake-server")))
> +int caller (int n)
> +{
> +  return callee (n) + n;
> +}
> +
> --
> 2.31.1
>


Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Robin Dapp via Gcc-patches
> Yeah, the optab should already have the fallback of WIDENing here?
> So why does that fail?

We reach
 if (CLASS_HAS_WIDER_MODES_P (mclass))
which returns false because mclass == MODE_VECTOR_FLOAT.
CLASS_HAS_WIDER_MODES_P only handles non-vector classes?
Same for FOR_EACH_WIDER_MODE that follows.

Regards
 Robin



[PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-27 Thread Richard Biener via Gcc-patches
The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
ICEs when tree checking is enabled.  This should avoid wrong-code
in cases like PR110182 and instead ICE.

It also introduces a TYPE_PRECISION_RAW accessor and adjusts
places I found that are eligible to use that.

Bootstrapped and tested on x86_64-unknown-linux-gnu with all
languages enabled.

OK for trunk?  There is definitely going to be fallout but it
should be straightforward to fix with quick fixes using
TYPE_PRECISION_RAW.

Thanks,
Richard.

* tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
(TYPE_PRECISION_RAW): Provide raw access to the precision
field.
* tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
(gimple_canonical_types_compatible_p): Likewise.
* tree-streamer-out.cc (pack_ts_type_common_value_fields):
Stream TYPE_PRECISION_RAW.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields):
Likewise.
* lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.

gcc/lto/
* lto-common.cc (compare_tree_sccs_1): Use TYPE_PRECISION_RAW.
---
 gcc/lto-streamer-out.cc  | 2 +-
 gcc/lto/lto-common.cc| 2 +-
 gcc/tree-streamer-in.cc  | 2 +-
 gcc/tree-streamer-out.cc | 2 +-
 gcc/tree.cc  | 6 +++---
 gcc/tree.h   | 4 +++-
 6 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/gcc/lto-streamer-out.cc b/gcc/lto-streamer-out.cc
index 5ab2eb4301e..3432dd434e2 100644
--- a/gcc/lto-streamer-out.cc
+++ b/gcc/lto-streamer-out.cc
@@ -1373,7 +1373,7 @@ hash_tree (struct streamer_tree_cache_d *cache, 
hash_map *map,
   if (AGGREGATE_TYPE_P (t))
hstate.add_flag (TYPE_TYPELESS_STORAGE (t));
   hstate.commit_flag ();
-  hstate.add_int (TYPE_PRECISION (t));
+  hstate.add_int (TYPE_PRECISION_RAW (t));
   hstate.add_int (TYPE_ALIGN (t));
   hstate.add_int (TYPE_EMPTY_P (t));
 }
diff --git a/gcc/lto/lto-common.cc b/gcc/lto/lto-common.cc
index 537570204b3..afe051edf74 100644
--- a/gcc/lto/lto-common.cc
+++ b/gcc/lto/lto-common.cc
@@ -1280,7 +1280,7 @@ compare_tree_sccs_1 (tree t1, tree t2, tree **map)
   compare_values (TYPE_RESTRICT);
   compare_values (TYPE_USER_ALIGN);
   compare_values (TYPE_READONLY);
-  compare_values (TYPE_PRECISION);
+  compare_values (TYPE_PRECISION_RAW);
   compare_values (TYPE_ALIGN);
   /* Do not compare TYPE_ALIAS_SET.  Doing so introduce ordering issues
 with calls to get_alias_set which may initialize it for streamed
diff --git a/gcc/tree-streamer-in.cc b/gcc/tree-streamer-in.cc
index c803800862c..e6919e463c0 100644
--- a/gcc/tree-streamer-in.cc
+++ b/gcc/tree-streamer-in.cc
@@ -387,7 +387,7 @@ unpack_ts_type_common_value_fields (struct bitpack_d *bp, 
tree expr)
 TYPE_TYPELESS_STORAGE (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_EMPTY_P (expr) = (unsigned) bp_unpack_value (bp, 1);
   TYPE_NO_NAMED_ARGS_STDARG_P (expr) = (unsigned) bp_unpack_value (bp, 1);
-  TYPE_PRECISION (expr) = bp_unpack_var_len_unsigned (bp);
+  TYPE_PRECISION_RAW (expr) = bp_unpack_var_len_unsigned (bp);
   SET_TYPE_ALIGN (expr, bp_unpack_var_len_unsigned (bp));
 #ifdef ACCEL_COMPILER
   if (TYPE_ALIGN (expr) > targetm.absolute_biggest_alignment)
diff --git a/gcc/tree-streamer-out.cc b/gcc/tree-streamer-out.cc
index 5751f77273b..719cbeacf99 100644
--- a/gcc/tree-streamer-out.cc
+++ b/gcc/tree-streamer-out.cc
@@ -356,7 +356,7 @@ pack_ts_type_common_value_fields (struct bitpack_d *bp, 
tree expr)
 bp_pack_value (bp, TYPE_TYPELESS_STORAGE (expr), 1);
   bp_pack_value (bp, TYPE_EMPTY_P (expr), 1);
   bp_pack_value (bp, TYPE_NO_NAMED_ARGS_STDARG_P (expr), 1);
-  bp_pack_var_len_unsigned (bp, TYPE_PRECISION (expr));
+  bp_pack_var_len_unsigned (bp, TYPE_PRECISION_RAW (expr));
   bp_pack_var_len_unsigned (bp, TYPE_ALIGN (expr));
 }
 
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 8e144bc090e..58288efa2e2 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -13423,7 +13423,7 @@ verify_type_variant (const_tree t, tree tv)
}
   verify_variant_match (TYPE_NEEDS_CONSTRUCTING);
 }
-  verify_variant_match (TYPE_PRECISION);
+  verify_variant_match (TYPE_PRECISION_RAW);
   if (RECORD_OR_UNION_TYPE_P (t))
 verify_variant_match (TYPE_TRANSPARENT_AGGR);
   else if (TREE_CODE (t) == ARRAY_TYPE)
@@ -13701,8 +13701,8 @@ gimple_canonical_types_compatible_p (const_tree t1, 
const_tree t2,
   || TREE_CODE (t1) == OFFSET_TYPE
   || POINTER_TYPE_P (t1))
 {
-  /* Can't be the same type if they have different recision.  */
-  if (TYPE_PRECISION (t1) != TYPE_PRECISION (t2))
+  /* Can't be the same type if they have different precision.  */
+  if (TYPE_PRECISION_RAW (t1) != TYPE_PRECISION_RAW (t2))
return false;
 
   /* In some cases the signed and unsigned types are required to be
diff --git a/gcc/tree.h b/gcc/tree.h
index 1854fe4a7d4..1b791335d38 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -2191,7 +2191

Re: [PATCH] Prevent TYPE_PRECISION on VECTOR_TYPEs

2023-06-27 Thread Jakub Jelinek via Gcc-patches
On Tue, Jun 27, 2023 at 11:45:33AM +0200, Richard Biener wrote:
> The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
> ICEs when tree checking is enabled.  This should avoid wrong-code
> in cases like PR110182 and instead ICE.
> 
> It also introduces a TYPE_PRECISION_RAW accessor and adjusts
> places I found that are eligible to use that.
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu with all
> languages enabled.
> 
> OK for trunk?  There is definitely going to be fallout but it
> should be straightforward to fix with quick fixes using
> TYPE_PRECISION_RAW.
> 
> Thanks,
> Richard.
> 
>   * tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
>   (TYPE_PRECISION_RAW): Provide raw access to the precision
>   field.
>   * tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
>   (gimple_canonical_types_compatible_p): Likewise.
>   * tree-streamer-out.cc (pack_ts_type_common_value_fields):
>   Stream TYPE_PRECISION_RAW.
>   * tree-streamer-in.cc (unpack_ts_type_common_value_fields):
>   Likewise.
>   * lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.
> 
> gcc/lto/
>   * lto-common.cc (compare_tree_sccs_1): Use TYPE_PRECISION_RAW.

LGTM.

Jakub



Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, Robin Dapp wrote:

> > Yeah, the optab should already have the fallback of WIDENing here?
> > So why does that fail?
> 
> We reach
>  if (CLASS_HAS_WIDER_MODES_P (mclass))
> which returns false because mclass == MODE_VECTOR_FLOAT.
> CLASS_HAS_WIDER_MODES_P only handles non-vector classes?
> Same for FOR_EACH_WIDER_MODE that follows.

Oh, so this is about vector modes.  So yes, for vectors we need to
perform this test.  In other places we do

&& (!VECTOR_MODE_P (TYPE_MODE (type))
|| (VECTOR_MODE_P (TYPE_MODE (itype))
&& optab_handler (and_optab,
  TYPE_MODE (itype)) != CODE_FOR_nothing)))

so I suggest to do a similar VECTOR_MODE_P check and your original test.
So

  && (!VECTOR_MODE_P (TYPE_MODE (newtype))
  || target_supports_op_p (newtype, op, optab_default))

OK with that change.

Thanks,
Richard.


Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Robin Dapp via Gcc-patches
> so I suggest to do a similar VECTOR_MODE_P check and your original test.
> So
> 
>   && (!VECTOR_MODE_P (TYPE_MODE (newtype))
>   || target_supports_op_p (newtype, op, optab_default))
> 
> OK with that change.

Separate patch or into the original one?  We needed element_mode because
TYPE_MODE wouldn't work for a vector_mode so it still somehow fits.

Apart from that, out of curiosity, do we want the same optab mechanism
(try widening/widened op if the original one failed) for vector types as
well in the future?

Regards
 Robin


Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, 27 Jun 2023, Robin Dapp wrote:

> > so I suggest to do a similar VECTOR_MODE_P check and your original test.
> > So
> > 
> >   && (!VECTOR_MODE_P (TYPE_MODE (newtype))
> >   || target_supports_op_p (newtype, op, optab_default))
> > 
> > OK with that change.
> 
> Separate patch or into the original one?  We needed element_mode because
> TYPE_MODE wouldn't work for a vector_mode so it still somehow fits.

You can put it into the original one.

> Apart from that, out of curiosity, do we want the same optab mechanism
> (try widening/widened op if the original one failed) for vector types as
> well in the future?

With the current design that would belong to vector lowering.  So no,
I don't think so.

Richard.


Re: [Patch, fortran] PR49213 - [OOP] gfortran rejects structure constructor expression

2023-06-27 Thread Paul Richard Thomas via Gcc-patches
Hi Harald,

Let's try again :-)

OK for trunk?

Regards

Paul

Fortran: Enable class expressions in structure constructors [PR49213]

2023-06-27  Paul Thomas  

gcc/fortran
PR fortran/49213
* expr.cc (gfc_is_ptr_fcn): Remove reference to class_pointer.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (get_symbol_decl): Remove extraneous line.
* trans-expr.cc (alloc_scalar_allocatable_subcomponent): Obtain
size of intrinsic and character expressions.
(gfc_trans_subcomponent_assign): Expand assignment to class
components to include intrinsic and character expressions.

gcc/testsuite/
PR fortran/49213
* gfortran.dg/pr49213.f90 : New test

On Sat, 24 Jun 2023 at 20:50, Harald Anlauf  wrote:
>
> Hi Paul!
>
> On 6/24/23 15:18, Paul Richard Thomas via Gcc-patches wrote:
> > I have included the adjustment to 'gfc_is_ptr_fcn' and eliminating the
> > extra blank line, introduced by my last patch. I played safe and went
> > exclusively for class functions with attr.class_pointer set on the
> > grounds that these have had all the accoutrements checked and built
> > (ie. class_ok). I am still not sure if this is necessary or not.
>
> maybe it is my fault, but I find the version in the patch confusing:
>
> @@ -816,7 +816,7 @@ bool
>   gfc_is_ptr_fcn (gfc_expr *e)
>   {
> return e != NULL && e->expr_type == EXPR_FUNCTION
> - && (gfc_expr_attr (e).pointer
> + && ((e->ts.type != BT_CLASS && gfc_expr_attr (e).pointer)
>|| (e->ts.type == BT_CLASS
>&& CLASS_DATA (e)->attr.class_pointer));
>   }
>
> The caller 'gfc_is_ptr_fcn' has e->expr_type == EXPR_FUNCTION, so
> gfc_expr_attr (e) boils down to:
>
>if (e->value.function.esym && e->value.function.esym->result)
> {
>   gfc_symbol *sym = e->value.function.esym->result;
>   attr = sym->attr;
>   if (sym->ts.type == BT_CLASS && sym->attr.class_ok)
> {
>   attr.dimension = CLASS_DATA (sym)->attr.dimension;
>   attr.pointer = CLASS_DATA (sym)->attr.class_pointer;
>   attr.allocatable = CLASS_DATA (sym)->attr.allocatable;
> }
> }
> ...
>else if (e->symtree)
> attr = gfc_variable_attr (e, NULL);
>
> So I thought this should already do what you want if you do
>
> gfc_is_ptr_fcn (gfc_expr *e)
> {
>return e != NULL && e->expr_type == EXPR_FUNCTION && gfc_expr_attr
> (e).pointer;
> }
>
> or what am I missing?  The additional checks in gfc_expr_attr are
> there to avoid ICEs in case CLASS_DATA (sym) has issues, and we all
> know Gerhard who showed that he is an expert in exploiting this.
>
> To sum up, I'd prefer to use the safer form if it works.  If it
> doesn't, I would expect a latent issue.
>
> The rest of the code looked good to me, but I was suspicious about
> the handling of CHARACTER.
>
> Nasty as I am, I modified the testcase to use character(kind=4)
> instead of kind=1 (see attached).  This either fails here (stop 10),
> or if I activate the marked line
>
> !cont = tContainer('hello!')   ! ### ICE! ###
>
> I get an ICE.
>
> Can you have another look?
>
> Thanks,
> Harald
>
> >
>
> > OK for trunk?
> >
> > Paul
> >
> > Fortran: Enable class expressions in structure constructors [PR49213]
> >
> > 2023-06-24  Paul Thomas  
> >
> > gcc/fortran
> > PR fortran/49213
> > * expr.cc (gfc_is_ptr_fcn): Guard pointer attribute to exclude
> > class expressions.
> > * resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
> > associate names with pointer function targets to be used in
> > variable definition context.
> > * trans-decl.cc (get_symbol_decl): Remove extraneous line.
> > * trans-expr.cc (alloc_scalar_allocatable_subcomponent): Obtain
> > size of intrinsic and character expressions.
> > (gfc_trans_subcomponent_assign): Expand assignment to class
> > components to include intrinsic and character expressions.
> >
> > gcc/testsuite/
> > PR fortran/49213
> > * gfortran.dg/pr49213.f90 : New test



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein
! { dg-do run }
!
! Contributed by Neil Carlson  
!
program main
  character(2) :: c

  type :: S
integer :: n
  end type
  type(S) :: Sobj

  type, extends(S) :: S2
integer :: m
  end type
  type(S2) :: S2obj

  type :: T
class(S), allocatable :: x
  end type

  type tContainer
class(*), allocatable :: x
  end type

  type(T) :: Tobj

  Sobj = S(1)
  Tobj = T(Sobj)

  S2obj = S2(1,2)
  Tobj = T(S2obj)! Failed here
  select type (x => Tobj%x)
type is (S2)
  if ((x%n .ne. 1) .or. (x%m .ne. 2)) stop 1
class default
  stop 2
  end select

  c = "  "
  call pass_it (T(Sobj))
  if (c .ne. "S ") stop 3
  call pass_it (T(S2obj))! and here
  if (c .ne. "S2") stop 4

  call bar

contains

  subroutine pass_it (foo)
type(T), intent(i

Re: [PATCH] configure: Implement --enable-host-bind-now

2023-06-27 Thread Martin Jambor
Hello,

On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
> As promised in the --enable-host-pie patch, this patch adds another
> configure option, --enable-host-bind-now, which adds -z now when linking
> the compiler executables in order to extend hardening.  BIND_NOW with RELRO
> allows the GOT to be marked RO; this prevents GOT modification attacks.
>
> This option does not affect linking of target libraries; you can use
> LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
>
> With this patch:
> $ readelf -Wd cc1{,plus} | grep FLAGS
>  0x000000000000001e (FLAGS)    BIND_NOW
>  0x000000006ffffffb (FLAGS_1)  Flags: NOW PIE
>  0x000000000000001e (FLAGS)    BIND_NOW
>  0x000000006ffffffb (FLAGS_1)  Flags: NOW PIE
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>
> c++tools/ChangeLog:
>
>   * configure.ac (--enable-host-bind-now): New check.
>   * configure: Regenerate.
>
> gcc/ChangeLog:
>
>   * configure.ac (--enable-host-bind-now): New check.  Add
>   -Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
>   * configure: Regenerate.
>   * doc/install.texi: Document --enable-host-bind-now.
>
> lto-plugin/ChangeLog:
>
>   * configure.ac (--enable-host-bind-now): New check.  Link with
>   -z,now.
>   * configure: Regenerate.

Our reconfiguration checking script complains about a missing hunk in
lto-plugin/Makefile.in:

diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
index cb568e1e09f..f6f5b020ff5 100644
--- a/lto-plugin/Makefile.in
+++ b/lto-plugin/Makefile.in
@@ -298,6 +298,7 @@ datadir = @datadir@
 datarootdir = @datarootdir@
 docdir = @docdir@
 dvidir = @dvidir@
+enable_host_bind_now = @enable_host_bind_now@
 exec_prefix = @exec_prefix@
 gcc_build_dir = @gcc_build_dir@
 get_gcc_base_ver = @get_gcc_base_ver@


I am somewhat puzzled why the line is not missing in any of the other
Makefile.in files.  Can you please check whether that is the only thing
that is missing (assuming it is actually missing)?

Thanks,

Martin


[SVE] Fold svdupq to VEC_PERM_EXPR if elements are not constant

2023-06-27 Thread Prathamesh Kulkarni via Gcc-patches
Hi Richard,
Sorry I forgot to commit this patch, which you had approved in:
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615308.html

Just for context for the following test:
svint32_t f_s32(int32x4_t x)
{
  return svdupq_s32 (x[0], x[1], x[2], x[3]);
}

-O3 -mcpu=generic+sve generates the following code after the interleave+zip1 patch:
f_s32:
dup s31, v0.s[1]
mov v30.8b, v0.8b
ins v31.s[1], v0.s[3]
ins v30.s[1], v0.s[2]
zip1v0.4s, v30.4s, v31.4s
dup z0.q, z0.q[0]
ret

Code-gen with attached patch:
f_s32:
dup z0.q, z0.q[0]
ret

Bootstrapped+tested on aarch64-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
[SVE] Fold svdupq to VEC_PERM_EXPR if elements are not constant.

gcc/ChangeLog:
* config/aarch64/aarch64-sve-builtins-base.cc
(svdupq_impl::fold_nonconst_dupq): New method.
(svdupq_impl::fold): Call fold_nonconst_dupq.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/general/dupq_11.c: New test.

diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
index 95b4cb8a943..9010ecca6da 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
@@ -817,6 +817,52 @@ public:
 
class svdupq_impl : public quiet<function_base>
 {
+private:
+  gimple *
+  fold_nonconst_dupq (gimple_folder &f) const
+  {
+/* Lower lhs = svdupq (arg0, arg1, ..., argN) into:
+   tmp = {arg0, arg1, ..., argN}
+   lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...})  */
+
+if (f.type_suffix (0).bool_p
+   || BYTES_BIG_ENDIAN)
+  return NULL;
+
+tree lhs = gimple_call_lhs (f.call);
+tree lhs_type = TREE_TYPE (lhs);
+tree elt_type = TREE_TYPE (lhs_type);
+scalar_mode elt_mode = SCALAR_TYPE_MODE (elt_type);
+machine_mode vq_mode = aarch64_vq_mode (elt_mode).require ();
+tree vq_type = build_vector_type_for_mode (elt_type, vq_mode);
+
+unsigned nargs = gimple_call_num_args (f.call);
+vec<constructor_elt, va_gc> *v;
+vec_alloc (v, nargs);
+for (unsigned i = 0; i < nargs; i++)
+  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, gimple_call_arg (f.call, i));
+tree vec = build_constructor (vq_type, v);
+tree tmp = make_ssa_name_fn (cfun, vq_type, 0);
+gimple *g = gimple_build_assign (tmp, vec);
+
+gimple_seq stmts = NULL;
+gimple_seq_add_stmt_without_update (&stmts, g);
+
+poly_uint64 lhs_len = TYPE_VECTOR_SUBPARTS (lhs_type);
+vec_perm_builder sel (lhs_len, nargs, 1);
+for (unsigned i = 0; i < nargs; i++)
+  sel.quick_push (i);
+
+vec_perm_indices indices (sel, 1, nargs);
+tree mask_type = build_vector_type (ssizetype, lhs_len);
+tree mask = vec_perm_indices_to_tree (mask_type, indices);
+
+gimple *g2 = gimple_build_assign (lhs, VEC_PERM_EXPR, tmp, tmp, mask);
+gimple_seq_add_stmt_without_update (&stmts, g2);
+gsi_replace_with_seq (f.gsi, stmts, false);
+return g2;
+  }
+
 public:
   gimple *
   fold (gimple_folder &f) const override
@@ -832,7 +878,7 @@ public:
   {
tree elt = gimple_call_arg (f.call, i);
if (!CONSTANT_CLASS_P (elt))
- return NULL;
+ return fold_nonconst_dupq (f);
builder.quick_push (elt);
for (unsigned int j = 1; j < factor; ++j)
  builder.quick_push (build_zero_cst (TREE_TYPE (vec_type)));
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c
new file mode 100644
index 000..f19f8deb1e5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+#include <arm_neon.h>
+#include <arm_sve.h>
+
+svint8_t f_s8(int8x16_t x)
+{
+  return svdupq_s8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
+   x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15]);
+}
+
+svint16_t f_s16(int16x8_t x)
+{
+  return svdupq_s16 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]);
+}
+
+svint32_t f_s32(int32x4_t x)
+{
+  return svdupq_s32 (x[0], x[1], x[2], x[3]);
+}
+
+svint64_t f_s64(int64x2_t x)
+{
+  return svdupq_s64 (x[0], x[1]);
+}
+
+/* { dg-final { scan-tree-dump "VEC_PERM_EXPR" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "svdupq" "optimized" } } */
+
+/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} 4 } } */


[COMMITTED] ada: Fix expanding container aggregates

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Viljar Indus 

Ensure that container aggregate expressions are expanded as
such and not as records, even if the type of the expression is a
record.

gcc/ada/

* exp_aggr.adb (Expand_N_Aggregate): Ensure that container
aggregate expressions do not get expanded as records but instead
as container aggregates.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_aggr.adb | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
index 5e22fefbc1d..d922c3bf1a4 100644
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -6463,6 +6463,7 @@ package body Exp_Aggr is
 
   if Is_Record_Type (T)
 and then not Is_Private_Type (T)
+and then not Is_Homogeneous_Aggregate (N)
   then
  Expand_Record_Aggregate (N);
 
-- 
2.40.0



[COMMITTED] ada: Plug another loophole in the handling of private views in instances

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This deals with discriminants of types declared in package bodies.

gcc/ada/

* sem_ch12.adb (Check_Private_View): Also check the type of
visible discriminants in record and concurrent types.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb | 17 +
 1 file changed, 17 insertions(+)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index d5280cea712..fbfc2db7f9a 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -7710,6 +7710,9 @@ package body Sem_Ch12 is
 Prepend_Elmt (Typ, Exchanged_Views);
 Exchange_Declarations (Etype (Get_Associated_Node (N)));
 
+ --  Check that the available views of Typ match their respective flag.
+ --  Note that the type of a visible discriminant is never private.
+
  else
 Check_Private_Type (Typ, Has_Private_View (N));
 
@@ -7720,6 +7723,20 @@ package body Sem_Ch12 is
 elsif Is_Array_Type (Typ) then
Check_Private_Type
  (Component_Type (Typ), Has_Secondary_Private_View (N));
+
+elsif (Is_Record_Type (Typ) or else Is_Concurrent_Type (Typ))
+  and then Has_Discriminants (Typ)
+then
+   declare
+  Disc : Entity_Id;
+
+   begin
+  Disc := First_Discriminant (Typ);
+  while Present (Disc) loop
+ Check_Private_Type (Etype (Disc), False);
+ Next_Discriminant (Disc);
+  end loop;
+   end;
 end if;
  end if;
   end if;
-- 
2.40.0



[COMMITTED] ada: Plug small loophole in the handling of private views in instances

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This deals with nested instantiations in package bodies.

gcc/ada/

* sem_ch12.adb (Scope_Within_Body_Or_Same): New predicate.
(Check_Actual_Type): Take into account packages nested in bodies
to compute the enclosing scope by means of
Scope_Within_Body_Or_Same.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb | 46 +---
 1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index fbfc2db7f9a..43fcff2c9d5 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -7001,11 +7001,11 @@ package body Sem_Ch12 is
   --  The enclosing scope of the generic unit
 
   procedure Check_Actual_Type (Typ : Entity_Id);
-  --  If the type of the actual is a private type declared in the
-  --  enclosing scope of the generic unit, but not a derived type
-  --  of a private type declared elsewhere, the body of the generic
-  --  sees the full view of the type (because it has to appear in
-  --  the corresponding package body). If the type is private now,
+  --  If the type of the actual is a private type declared in the enclosing
+  --  scope of the generic, either directly or through packages nested in
+  --  bodies, but not a derived type of a private type declared elsewhere,
+  --  then the body of the generic sees the full view of the type because
+  --  it has to appear in the package body. If the type is private now then
   --  exchange views to restore the proper visibility in the instance.
 
   ---
@@ -7015,16 +7015,48 @@ package body Sem_Ch12 is
   procedure Check_Actual_Type (Typ : Entity_Id) is
  Btyp : constant Entity_Id := Base_Type (Typ);
 
+ function Scope_Within_Body_Or_Same
+   (Inner : Entity_Id;
+Outer : Entity_Id) return Boolean;
+ --  Determine whether scope Inner is within the body of scope Outer
+ --  or is Outer itself.
+
+ ---
+ -- Scope_Within_Body_Or_Same --
+ ---
+
+ function Scope_Within_Body_Or_Same
+   (Inner : Entity_Id;
+Outer : Entity_Id) return Boolean
+ is
+Curr : Entity_Id := Inner;
+
+ begin
+while Curr /= Standard_Standard loop
+   if Curr = Outer then
+  return True;
+
+   elsif Is_Package_Body_Entity (Curr) then
+  Curr := Scope (Curr);
+
+   else
+  exit;
+   end if;
+end loop;
+
+return False;
+ end Scope_Within_Body_Or_Same;
+
   begin
  --  The exchange is only needed if the generic is defined
  --  within a package which is not a common ancestor of the
  --  scope of the instance, and is not already in scope.
 
  if Is_Private_Type (Btyp)
-   and then Scope (Btyp) = Parent_Scope
and then not Has_Private_Ancestor (Btyp)
and then Ekind (Parent_Scope) in E_Package | E_Generic_Package
-   and then Scope (Instance) /= Parent_Scope
+   and then Scope_Within_Body_Or_Same (Parent_Scope, Scope (Btyp))
+   and then Parent_Scope /= Scope (Instance)
and then not Is_Child_Unit (Gen_Id)
  then
 Switch_View (Btyp);
-- 
2.40.0



[COMMITTED] ada: Correct the contract of Ada.Text_IO.Get_Line

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Claire Dross 

Item might not be entirely initialized after a call to Get_Line.

gcc/ada/

* libgnat/a-textio.ads (Get_Line): Use Relaxed_Initialization on
the Item parameter of Get_Line.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/libgnat/a-textio.ads | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/gcc/ada/libgnat/a-textio.ads b/gcc/ada/libgnat/a-textio.ads
index ddbbd8592cc..4318b6c62b8 100644
--- a/gcc/ada/libgnat/a-textio.ads
+++ b/gcc/ada/libgnat/a-textio.ads
@@ -523,24 +523,28 @@ is
   Item : out String;
   Last : out Natural)
with
- Pre   => Is_Open (File) and then Mode (File) = In_File,
- Post  =>
+ Relaxed_Initialization => Item,
+ Pre=> Is_Open (File) and then Mode (File) = In_File,
+ Post   =>
(if Item'Length > 0 then Last in Item'First - 1 .. Item'Last
-else Last = Item'First - 1),
- Global=> (In_Out => File_System),
- Exceptional_Cases => (End_Error => Item'Length'Old > 0);
+else Last = Item'First - 1)
+   and (for all I in Item'First .. Last => Item (I)'Initialized),
+ Global => (In_Out => File_System),
+ Exceptional_Cases  => (End_Error => Item'Length'Old > 0);
 
procedure Get_Line
  (Item : out String;
   Last : out Natural)
with
- Post  =>
+ Relaxed_Initialization => Item,
+ Post   =>
Line_Length'Old = Line_Length
and Page_Length'Old = Page_Length
and (if Item'Length > 0 then Last in Item'First - 1 .. Item'Last
-else Last = Item'First - 1),
- Global=> (In_Out => File_System),
- Exceptional_Cases => (End_Error => Item'Length'Old > 0);
+else Last = Item'First - 1)
+   and (for all I in Item'First .. Last => Item (I)'Initialized),
+ Global=> (In_Out => File_System),
+ Exceptional_Cases => (End_Error => Item'Length'Old > 0);
 
function Get_Line (File : File_Type) return String with SPARK_Mode => Off;
pragma Ada_05 (Get_Line);
-- 
2.40.0



[COMMITTED] ada: Update printing container aggregates for debugging

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Viljar Indus 

All N_Aggregate nodes were printed with parentheses "()". However,
the new container aggregates (homogeneous N_Aggregate nodes) should
be printed with brackets "[]".

gcc/ada/

* sprint.adb (Print_Node_Actual): Print homogeneous N_Aggregate
nodes with brackets.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sprint.adb | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/ada/sprint.adb b/gcc/ada/sprint.adb
index dd4f420af35..f54d409ef96 100644
--- a/gcc/ada/sprint.adb
+++ b/gcc/ada/sprint.adb
@@ -1084,7 +1084,8 @@ package body Sprint is
Write_Str_With_Col_Check_Sloc ("(null record)");
 
 else
-   Write_Str_With_Col_Check_Sloc ("(");
+   Write_Str_With_Col_Check_Sloc
+ (if Is_Homogeneous_Aggregate (Node) then "[" else "(");
 
if Present (Expressions (Node)) then
   Sprint_Comma_List (Expressions (Node));
@@ -1120,7 +1121,8 @@ package body Sprint is
   Indent_End;
end if;
 
-   Write_Char (')');
+   Write_Char
+ (if Is_Homogeneous_Aggregate (Node) then ']' else ')');
 end if;
 
  when N_Allocator =>
-- 
2.40.0



[COMMITTED] ada: Fix too late finalization and secondary stack release in iterator loops

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

Sem_Ch5 contains an entire machinery to deal with finalization actions and
secondary stack releases around iterator loops, so this removes a recent
fix that was made in a narrower case and instead refines the condition under
which this machinery is triggered.

As a side effect, given that finalization and secondary stack management are
still entangled in this machinery, this also fixes the counterpart of a leak
for the former, which is a finalization occurring too late.

gcc/ada/

* exp_ch4.adb (Expand_N_Quantified_Expression): Revert the latest
change as it is subsumed by the machinery in Sem_Ch5.
* sem_ch5.adb (Prepare_Iterator_Loop): Also wrap the loop
statement in a block when the name contains a function call that
returns on the secondary stack.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/exp_ch4.adb | 26 --
 gcc/ada/sem_ch5.adb | 19 ++-
 2 files changed, 14 insertions(+), 31 deletions(-)

diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index 7b6e997e3e7..fdaeb50512f 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -6,32 +6,6 @@ package body Exp_Ch4 is
  Freeze_Before (P, Etype (Var));
   end;
 
-  --  For an expression of the form "for all/some X of F(...) => ...",
-  --  where F(...) is a function call that returns on the secondary stack,
-  --  we need to mark an enclosing scope as Uses_Sec_Stack. We must do
-  --  this before expansion, which can obscure the tree. Note that we
-  --  might be inside another quantified expression. Skip blocks and
-  --  loops that were generated by expansion.
-
-  if Present (Iterator_Specification (N))
-and then Nkind (Name (Iterator_Specification (N))) = N_Function_Call
-and then Needs_Secondary_Stack
-   (Etype (Name (Iterator_Specification (N
-  then
- declare
-Source_Scope : Entity_Id := Current_Scope;
- begin
-while Ekind (Source_Scope) in E_Block | E_Loop
-  and then not Comes_From_Source (Source_Scope)
-loop
-   Source_Scope := Scope (Source_Scope);
-end loop;
-
-Set_Uses_Sec_Stack (Source_Scope);
-Check_Restriction (No_Secondary_Stack, N);
- end;
-  end if;
-
   --  Create the declaration of the flag which tracks the status of the
   --  quantified expression. Generate:
 
diff --git a/gcc/ada/sem_ch5.adb b/gcc/ada/sem_ch5.adb
index fa36a5a0741..72e7d186baa 100644
--- a/gcc/ada/sem_ch5.adb
+++ b/gcc/ada/sem_ch5.adb
@@ -91,9 +91,14 @@ package body Sem_Ch5 is
 
function Has_Sec_Stack_Call (N : Node_Id) return Boolean;
--  N is the node for an arbitrary construct. This function searches the
-   --  construct N to see if any expressions within it contain function
-   --  calls that use the secondary stack, returning True if any such call
-   --  is found, and False otherwise.
+   --  construct N to see if it contains a function call that returns on the
+   --  secondary stack, returning True if any such call is found, and False
+   --  otherwise.
+
+   --  ??? The implementation invokes Sem_Util.Requires_Transient_Scope so it
+   --  will return True if N contains a function call that needs finalization,
+   --  in addition to the above specification. See Analyze_Loop_Statement for
+   --  a similar comment about this entanglement.
 
procedure Preanalyze_Range (R_Copy : Node_Id);
--  Determine expected type of range or domain of iteration of Ada 2012
@@ -3626,9 +3631,13 @@ package body Sem_Ch5 is
Cont_Typ := Etype (Nam_Copy);
 
--  The iterator loop is traversing an array. This case does not
-   --  require any transformation.
+   --  require any transformation, unless the name contains a call
+   --  that returns on the secondary stack since we need to release
+   --  the space allocated there.
 
-   if Is_Array_Type (Cont_Typ) then
+   if Is_Array_Type (Cont_Typ)
+ and then not Has_Sec_Stack_Call (Nam_Copy)
+   then
   null;
 
--  Otherwise unconditionally wrap the loop statement within
-- 
2.40.0



[COMMITTED] ada: Fix double finalization of case expression in concatenation

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This streamlines the expansion of case expressions by not wrapping them in
an Expression_With_Actions node when the type is not by copy, which avoids
the creation of a temporary and the associated finalization issues.

That's the same strategy as the one used for the expansion of if expressions
when the type is by reference, unless Back_End_Handles_Limited_Types is set
to True. Given that it is never set to True, except by a debug switch, and
has never been implemented, this parameter is removed in the process.

gcc/ada/

* debug.adb (d.L): Remove documentation.
* exp_ch4.adb (Expand_N_Case_Expression): In the not-by-copy case,
do not wrap the case statement in an Expression_With_Actions node.
(Expand_N_If_Expression): Do not test
Back_End_Handles_Limited_Types.
* gnat1drv.adb (Adjust_Global_Switches): Do not set it.
* opt.ads (Back_End_Handles_Limited_Types): Delete.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/debug.adb|  6 -
 gcc/ada/exp_ch4.adb  | 58 ++--
 gcc/ada/gnat1drv.adb | 21 
 gcc/ada/opt.ads  | 10 
 4 files changed, 23 insertions(+), 72 deletions(-)

diff --git a/gcc/ada/debug.adb b/gcc/ada/debug.adb
index fd94203faf8..9c2a2b0e8d0 100644
--- a/gcc/ada/debug.adb
+++ b/gcc/ada/debug.adb
@@ -123,7 +123,6 @@ package body Debug is
--  d.I  Do not ignore enum representation clauses in CodePeer mode
--  d.J  Relaxed rules for pragma No_Return
--  d.K  Do not reject components in extensions overlapping with parent
-   --  d.L  Depend on back end for limited types in if and case expressions
--  d.M  Relaxed RM semantics
--  d.N  Use rounding when converting from floating point to fixed point
--  d.O  Dump internal SCO tables
@@ -898,11 +897,6 @@ package body Debug is
--   clause but they cannot be fully supported by the GCC type system.
--   This switch nevertheless allows them for the sake of compatibility.
 
-   --  d.L  Normally the front end generates special expansion for conditional
-   --   expressions of a limited type. This debug flag removes this special
-   --   case expansion, leaving it up to the back end to handle conditional
-   --   expressions correctly.
-
--  d.M  Relaxed RM semantics. This flag sets Opt.Relaxed_RM_Semantics
--   See Opt.Relaxed_RM_Semantics for more details.
 
diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
index fdaeb50512f..7af6dc087a4 100644
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -5306,7 +5306,6 @@ package body Exp_Ch4 is
   Alt: Node_Id;
   Case_Stmt  : Node_Id;
   Decl   : Node_Id;
-  Expr   : Node_Id;
   Target : Entity_Id := Empty;
   Target_Typ : Entity_Id;
 
@@ -5361,7 +5360,6 @@ package body Exp_Ch4 is
 
   --  In all other cases expand into
 
-  --do
   --   type Ptr_Typ is access all Typ;
   --   Target : Ptr_Typ;
   --   case X is
@@ -5371,7 +5369,8 @@ package body Exp_Ch4 is
   -- Target := BX'Unrestricted_Access;
   --  ...
   --   end case;
-  --in Target.all end;
+
+  --  and replace the case expression by a reference to Target.all.
 
   --  This approach avoids extra copies of potentially large objects. It
   --  also allows handling of values of limited or unconstrained types.
@@ -5514,20 +5513,21 @@ package body Exp_Ch4 is
Prepend_List (Actions (Alt), Stmts);
 end if;
 
---  Finalize any transient objects on exit from the alternative.
---  This is done only in the return optimization case because
---  otherwise the case expression is converted into an expression
---  with actions which already contains this form of processing.
-
-if Optimize_Return_Stmt then
-   Process_If_Case_Statements (N, Stmts);
-end if;
-
 Append_To
   (Alternatives (Case_Stmt),
Make_Case_Statement_Alternative (Sloc (Alt),
  Discrete_Choices => Discrete_Choices (Alt),
  Statements   => Stmts));
+
+--  Finalize any transient objects on exit from the alternative.
+--  This needs to be done only when the case expression is _not_
+--  later converted into an expression with actions, which already
+--  contains this form of processing, and after Stmts is attached
+--  to the Alternatives list above (for Safe_To_Capture_Value).
+
+if Optimize_Return_Stmt or else not Is_Copy_Type (Typ) then
+   Process_If_Case_Statements (N, Stmts);
+end if;
  end;
 
  Next (Alt);
@@ -5539,30 +5539,24 @@ package body Exp_Ch4 is
  Rewrite (Par, Case_Stmt);
  Analyze (Par);
 
-  --  Otherwise conv

[COMMITTED] ada: Fix incorrect handling of iterator specifications in recent change

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

Unlike in loop parameter specifications, where the defining identifier
references an index, in iterator specifications it references an element.

gcc/ada/

* sem_ch12.adb (Check_Generic_Actuals): Check the component type
of constants and variables of an array type.
(Copy_Generic_Node): Fix bogus handling of iterator
specifications.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch12.adb | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/gcc/ada/sem_ch12.adb b/gcc/ada/sem_ch12.adb
index 43fcff2c9d5..61e0ec47392 100644
--- a/gcc/ada/sem_ch12.adb
+++ b/gcc/ada/sem_ch12.adb
@@ -7192,11 +7192,16 @@ package body Sem_Ch12 is
 Set_Is_Hidden (E, False);
  end if;
 
- --  Check directly the type of the actual objects
+ --  Check directly the type of the actual objects, including the
+ --  component type for array types.
 
  if Ekind (E) in E_Constant | E_Variable then
 Check_Actual_Type (Etype (E));
 
+if Is_Array_Type (Etype (E)) then
+   Check_Actual_Type (Component_Type (Etype (E)));
+end if;
+
  --  As well as the type of formal parameters of actual subprograms
 
  elsif Ekind (E) in E_Function | E_Procedure
@@ -8520,13 +8525,12 @@ package body Sem_Ch12 is
 Copy_Descendants;
  end;
 
-  --  Iterator and loop parameter specifications do not have an identifier
-  --  denoting the index type, so we must locate it through the expression
-  --  to check whether the views are consistent.
+  --  Loop parameter specifications do not have an identifier denoting the
+  --  index type, so we must locate it through the defining identifier to
+  --  check whether the views are consistent.
 
-  elsif Nkind (N) in N_Iterator_Specification
-   | N_Loop_Parameter_Specification
- and then Instantiating
+  elsif Nkind (N) = N_Loop_Parameter_Specification
+and then Instantiating
   then
  declare
 Id : constant Entity_Id :=
-- 
2.40.0



[COMMITTED] ada: Make the identification of case expressions more robust

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

gcc/ada/

* gcc-interface/trans.cc (Case_Statement_to_gnu): Rename boolean
constant and use From_Conditional_Expression flag for its value.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index ddc7b6dde1e..b74bb0683bf 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -2700,11 +2700,9 @@ Case_Statement_to_gnu (Node_Id gnat_node)
 never been problematic, but not for case expressions in Ada 2012.  */
   if (choices_added_p)
{
- const bool is_case_expression
-   = (Nkind (Parent (gnat_node)) == N_Expression_With_Actions);
- tree group
-   = build_stmt_group (Statements (gnat_when), !is_case_expression);
- bool group_may_fallthru = block_may_fallthru (group);
+ const bool case_expr_p = From_Conditional_Expression (gnat_node);
+ tree group = build_stmt_group (Statements (gnat_when), !case_expr_p);
+ const bool group_may_fallthru = block_may_fallthru (group);
  add_stmt (group);
  if (group_may_fallthru)
{
-- 
2.40.0



[COMMITTED] ada: Fix bad interaction between inlining and thunk generation

2023-06-27 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

This may cause the type of the RESULT_DECL of a function which returns by
invisible reference to be turned into a reference type twice.

gcc/ada/

* gcc-interface/trans.cc (Subprogram_Body_to_gnu): Add guard to the
code turning the type of the RESULT_DECL into a reference type.
(maybe_make_gnu_thunk): Use a more precise guard in the same case.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index b74bb0683bf..f5eadbbc895 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -3902,8 +3902,11 @@ Subprogram_Body_to_gnu (Node_Id gnat_node)
 gnu_return_var_elmt = NULL_TREE;
 
   /* If the function returns by invisible reference, make it explicit in the
- function body.  See gnat_to_gnu_subprog_type for more details.  */
-  if (TREE_ADDRESSABLE (gnu_subprog_type))
+ function body, but beware that maybe_make_gnu_thunk may already have done
+ it if the function is inlined across units.  See gnat_to_gnu_subprog_type
+ for more details.  */
+  if (TREE_ADDRESSABLE (gnu_subprog_type)
+  && TREE_CODE (TREE_TYPE (gnu_result_decl)) != REFERENCE_TYPE)
 {
   TREE_TYPE (gnu_result_decl)
= build_reference_type (TREE_TYPE (gnu_result_decl));
@@ -11015,7 +11018,7 @@ maybe_make_gnu_thunk (Entity_Id gnat_thunk, tree 
gnu_thunk)
  same transformation as Subprogram_Body_to_gnu here.  */
   if (TREE_ADDRESSABLE (TREE_TYPE (gnu_target))
   && DECL_EXTERNAL (gnu_target)
-  && !POINTER_TYPE_P (TREE_TYPE (DECL_RESULT (gnu_target
+  && TREE_CODE (TREE_TYPE (DECL_RESULT (gnu_target))) != REFERENCE_TYPE)
 {
   TREE_TYPE (DECL_RESULT (gnu_target))
= build_reference_type (TREE_TYPE (DECL_RESULT (gnu_target)));
-- 
2.40.0



Enable ranger for ipa-prop

2023-06-27 Thread Jan Hubicka via Gcc-patches
Hi,
as shown in the testcase (which would eventually be useful for
optimizing std::vector's push_back), ipa-prop can use context-dependent ranger
queries for better value range info.

Bootstrapped/regtested x86_64-linux, OK?

Honza

gcc/ChangeLog:

PR middle-end/110377
* ipa-prop.cc (ipa_compute_jump_functions_for_edge): Add ranger
parameter; use ranger instance for range queries.
(ipa_compute_jump_functions_for_bb): Pass around ranger.
(analysis_dom_walker::before_dom_children): Enable ranger.

gcc/testsuite/ChangeLog:

PR middle-end/110377
* gcc.dg/tree-ssa/pr110377.c: New test.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
new file mode 100644
index 000..cbe3441caea
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-ipa-fnsummary" } */
+int test3(int);
+__attribute__ ((noinline))
+void test2(int a)
+{
+   test3(a);
+}
+void
+test(int n)
+{
+if (n > 5)
+  __builtin_unreachable ();
+test2(n);
+}
+/* { dg-final { scan-tree-dump "-INF, 5-INF" "fnsummary" } }  */
diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
index 41c812194ca..693d4805d93 100644
--- a/gcc/ipa-prop.cc
+++ b/gcc/ipa-prop.cc
@@ -2341,7 +2341,8 @@ ipa_set_jfunc_vr (ipa_jump_func *jf, const ipa_vr &vr)
 
 static void
 ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
-struct cgraph_edge *cs)
+struct cgraph_edge *cs,
+gimple_ranger *ranger)
 {
   ipa_node_params *info = ipa_node_params_sum->get (cs->caller);
   ipa_edge_args *args = ipa_edge_args_sum->get_create (cs);
@@ -2386,7 +2387,7 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
 
  if (TREE_CODE (arg) == SSA_NAME
  && param_type
- && get_range_query (cfun)->range_of_expr (vr, arg)
+ && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt)
  && vr.nonzero_p ())
addr_nonzero = true;
  else if (tree_single_nonzero_warnv_p (arg, &strict_overflow))
@@ -2408,7 +2409,7 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
  && Value_Range::supports_type_p (param_type)
  && irange::supports_p (TREE_TYPE (arg))
  && irange::supports_p (param_type)
- && get_range_query (cfun)->range_of_expr (vr, arg)
+ && ranger->range_of_expr (vr, arg, cs->call_stmt)
  && !vr.undefined_p ())
{
  Value_Range resvr (vr);
@@ -2517,7 +2518,8 @@ ipa_compute_jump_functions_for_edge (struct 
ipa_func_body_info *fbi,
from BB.  */
 
 static void
-ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block 
bb)
+ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, basic_block 
bb,
+  gimple_ranger *ranger)
 {
   struct ipa_bb_info *bi = ipa_get_bb_info (fbi, bb);
   int i;
@@ -2536,7 +2538,7 @@ ipa_compute_jump_functions_for_bb (struct 
ipa_func_body_info *fbi, basic_block b
  && !gimple_call_fnspec (cs->call_stmt).known_p ())
continue;
}
-  ipa_compute_jump_functions_for_edge (fbi, cs);
+  ipa_compute_jump_functions_for_edge (fbi, cs, ranger);
 }
 }
 
@@ -3110,19 +3112,27 @@ class analysis_dom_walker : public dom_walker
 {
 public:
   analysis_dom_walker (struct ipa_func_body_info *fbi)
-: dom_walker (CDI_DOMINATORS), m_fbi (fbi) {}
+: dom_walker (CDI_DOMINATORS), m_fbi (fbi)
+  {
+m_ranger = enable_ranger (cfun, false);
+  }
+  ~analysis_dom_walker ()
+  {
+disable_ranger (cfun);
+  }
 
   edge before_dom_children (basic_block) final override;
 
 private:
   struct ipa_func_body_info *m_fbi;
+  gimple_ranger *m_ranger;
 };
 
 edge
 analysis_dom_walker::before_dom_children (basic_block bb)
 {
   ipa_analyze_params_uses_in_bb (m_fbi, bb);
-  ipa_compute_jump_functions_for_bb (m_fbi, bb);
+  ipa_compute_jump_functions_for_bb (m_fbi, bb, m_ranger);
   return NULL;
 }
 


Re: Enable ranger for ipa-prop

2023-06-27 Thread Andrew MacLeod via Gcc-patches



On 6/27/23 09:19, Jan Hubicka wrote:

Hi,
as shown in the testcase (which would eventually be useful for
optimizing std::vector's push_back), ipa-prop can use context dependent ranger
queries for better value range info.

Bootstrapped/regtested x86_64-linux, OK?


Quick question.

When you call enable_ranger(), it gives you a ranger back, but it also
sets the range query for the specified context to that same instance.
So from that point forward, all existing calls to get_range_query(fun)
will now use the context ranger.


enable_ranger (struct function *fun, bool use_imm_uses)
<...>
  gcc_checking_assert (!fun->x_range_query);
  r = new gimple_ranger (use_imm_uses);
  fun->x_range_query = r;
  return r;

So you probably don't have to pass a ranger around?  Or is the ranger
you are passing for a different context?



Andrew




[PATCH] RISC-V: Fix bug of pre-calculated const vector mask

2023-06-27 Thread Juzhe-Zhong
GCC doesn't know that RVV uses a compact mask model.
Consider this following case:

#include <stdint.h>
#include <assert.h>

#define N 16

int
main ()
{
  int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
  int8_t out[N] = {0};
  for (int8_t i = 0; i < N; ++i)
if (mask[i])
  out[i] = i;
  for (int8_t i = 0; i < N; ++i)
{
  if (mask[i])
assert (out[i] == i);
  else
assert (out[i] == 0);
}
}

Before this patch, the pre-calculated mask in the constant memory pool is:
.LC1:
.byte   68 -> 0b01000100

This is incorrect, and the case above fails at execution.

After this patch:
.LC1:
.byte   10 -> 0b00001010

It now passes at execution.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (rvv_builder::compact_mask): New function.
(expand_const_vector): Fix bug.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 61 +--
 gcc/config/riscv/riscv.cc |  6 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-1.c   | 23 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-2.c   | 23 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-3.c   | 23 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-4.c   | 23 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-5.c   | 25 
 .../riscv/rvv/autovec/vls-vlmax/bitmask-6.c   | 27 
 .../riscv/rvv/autovec/vls-vlmax/bitmask-7.c   | 30 +
 9 files changed, 236 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index adb8d7d36a5..54d1904bbe8 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -291,6 +291,7 @@ public:
 
   bool single_step_npatterns_p () const;
   bool npatterns_all_equal_p () const;
+  rtx compact_mask () const;
 
   machine_mode new_mode () const { return m_new_mode; }
   scalar_mode inner_mode () const { return m_inner_mode; }
@@ -505,6 +506,44 @@ rvv_builder::npatterns_all_equal_p () const
   return true;
 }
 
+/* Generate the compact mask.
+
+ E.g: mask = { 0, -1 }, mode = VNx2BI, bitsize = 128bits.
+
+ GCC by default will generate the mask = 0b0001x.
+
+ However, it's not expected mask for RVV since RVV
+ prefers the compact mask = 0b10x.
+*/
+rtx
+rvv_builder::compact_mask () const
+{
+  /* Use the container mode with SEW = 8 and LMUL = 1.  */
+  unsigned container_size
+= MAX (CEIL (npatterns (), 8), BYTES_PER_RISCV_VECTOR.to_constant () / 8);
+  machine_mode container_mode
+= get_vector_mode (QImode, container_size).require ();
+
+  unsigned nunits = GET_MODE_NUNITS (container_mode).to_constant ();
+  rtvec v = rtvec_alloc (nunits);
+  for (unsigned i = 0; i < nunits; i++)
+RTVEC_ELT (v, i) = const0_rtx;
+
+  unsigned char b = 0;
+  for (unsigned i = 0; i < npatterns (); i++)
+{
+  if (INTVAL (elt (i)))
+   b = b | (1 << (i % 8));
+
+  if ((i > 0 && (i % 8) == 7) || (i == (npatterns () - 1)))
+   {
+ RTVEC_ELT (v, ((i + 7) / 8) - 1) = gen_int_mode (b, QImode);
+ b = 0;
+   }
+}
+  return gen_rtx_CONST_VECTOR (container_mode, v);
+}
+
 static unsigned
 get_sew (machine_mode mode)
 {
@@ -1141,11 +1180,23 @@ expand_const_vector (rtx target, rtx src)
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
 {
   rtx elt;
-  gcc_assert (
-   const_vec_duplicate_p (src, &elt)
-   && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-  rtx ops[] = {target, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
+  unsigned int nelts;
+  if (const_vec_duplicate_p (src, &elt))
+   {
+ rtx ops[] = {target, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
+   }
+  else if (GET_MODE_NUNITS (mode).is_constant (&nelts))
+   {
+ rvv_builder builder (mode, nelts, 1);
+  

Re: Enable ranger for ipa-prop

2023-06-27 Thread Martin Jambor
On Tue, Jun 27 2023, Jan Hubicka wrote:
> Hi,
> as shown in the testcase (which would eventually be useful for
> optimizing std::vector's push_back), ipa-prop can use context dependent ranger
> queries for better value range info.
>
> Bootstrapped/regtested x86_64-linux, OK?
>
> Honza
>
> gcc/ChangeLog:
>
>   PR middle-end/110377
>   * ipa-prop.cc (ipa_compute_jump_functions_for_edge): Add ranger
>   parameter; use ranger instance for range queries.
>   (ipa_compute_jump_functions_for_bb): Pass around ranger.
>   (analysis_dom_walker::before_dom_children): Enable ranger.

Looks good to me (with or without passing a ranger parameter around).

Martin


>
> gcc/testsuite/ChangeLog:
>
>   PR middle-end/110377
>   * gcc.dg/tree-ssa/pr110377.c: New test.
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
> new file mode 100644
> index 000..cbe3441caea
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr110377.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-ipa-fnsummary" } */
> +int test3(int);
> +__attribute__ ((noinline))
> +void test2(int a)
> +{
> + test3(a);
> +}
> +void
> +test(int n)
> +{
> +if (n > 5)
> +  __builtin_unreachable ();
> +test2(n);
> +}
> +/* { dg-final { scan-tree-dump "-INF, 5-INF" "fnsummary" } }  */
> diff --git a/gcc/ipa-prop.cc b/gcc/ipa-prop.cc
> index 41c812194ca..693d4805d93 100644
> --- a/gcc/ipa-prop.cc
> +++ b/gcc/ipa-prop.cc
> @@ -2341,7 +2341,8 @@ ipa_set_jfunc_vr (ipa_jump_func *jf, const ipa_vr &vr)
>  
>  static void
>  ipa_compute_jump_functions_for_edge (struct ipa_func_body_info *fbi,
> -  struct cgraph_edge *cs)
> +  struct cgraph_edge *cs,
> +  gimple_ranger *ranger)
>  {
>ipa_node_params *info = ipa_node_params_sum->get (cs->caller);
>ipa_edge_args *args = ipa_edge_args_sum->get_create (cs);
> @@ -2386,7 +2387,7 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
>  
> if (TREE_CODE (arg) == SSA_NAME
> && param_type
> -   && get_range_query (cfun)->range_of_expr (vr, arg)
> +   && get_range_query (cfun)->range_of_expr (vr, arg, cs->call_stmt)
> && vr.nonzero_p ())
>   addr_nonzero = true;
> else if (tree_single_nonzero_warnv_p (arg, &strict_overflow))
> @@ -2408,7 +2409,7 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
> && Value_Range::supports_type_p (param_type)
> && irange::supports_p (TREE_TYPE (arg))
> && irange::supports_p (param_type)
> -   && get_range_query (cfun)->range_of_expr (vr, arg)
> +   && ranger->range_of_expr (vr, arg, cs->call_stmt)
> && !vr.undefined_p ())
>   {
> Value_Range resvr (vr);
> @@ -2517,7 +2518,8 @@ ipa_compute_jump_functions_for_edge (struct 
> ipa_func_body_info *fbi,
> from BB.  */
>  
>  static void
> -ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, 
> basic_block bb)
> +ipa_compute_jump_functions_for_bb (struct ipa_func_body_info *fbi, 
> basic_block bb,
> +gimple_ranger *ranger)
>  {
>struct ipa_bb_info *bi = ipa_get_bb_info (fbi, bb);
>int i;
> @@ -2536,7 +2538,7 @@ ipa_compute_jump_functions_for_bb (struct 
> ipa_func_body_info *fbi, basic_block b
> && !gimple_call_fnspec (cs->call_stmt).known_p ())
>   continue;
>   }
> -  ipa_compute_jump_functions_for_edge (fbi, cs);
> +  ipa_compute_jump_functions_for_edge (fbi, cs, ranger);
>  }
>  }
>  
> @@ -3110,19 +3112,27 @@ class analysis_dom_walker : public dom_walker
>  {
>  public:
>analysis_dom_walker (struct ipa_func_body_info *fbi)
> -: dom_walker (CDI_DOMINATORS), m_fbi (fbi) {}
> +: dom_walker (CDI_DOMINATORS), m_fbi (fbi)
> +  {
> +m_ranger = enable_ranger (cfun, false);
> +  }
> +  ~analysis_dom_walker ()
> +  {
> +disable_ranger (cfun);
> +  }
>  
>edge before_dom_children (basic_block) final override;
>  
>  private:
>struct ipa_func_body_info *m_fbi;
> +  gimple_ranger *m_ranger;
>  };
>  
>  edge
>  analysis_dom_walker::before_dom_children (basic_block bb)
>  {
>ipa_analyze_params_uses_in_bb (m_fbi, bb);
> -  ipa_compute_jump_functions_for_bb (m_fbi, bb);
> +  ipa_compute_jump_functions_for_bb (m_fbi, bb, m_ranger);
>return NULL;
>  }
>  


Re: [PATCH] configure: Implement --enable-host-bind-now

2023-06-27 Thread Marek Polacek via Gcc-patches
On Tue, Jun 27, 2023 at 01:39:16PM +0200, Martin Jambor wrote:
> Hello,
> 
> On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
> > As promised in the --enable-host-pie patch, this patch adds another
> > configure option, --enable-host-bind-now, which adds -z now when linking
> > the compiler executables in order to extend hardening.  BIND_NOW with RELRO
> > allows the GOT to be marked RO; this prevents GOT modification attacks.
> >
> > This option does not affect linking of target libraries; you can use
> > LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
> >
> > With this patch:
> > $ readelf -Wd cc1{,plus} | grep FLAGS
> >  0x001e (FLAGS)  BIND_NOW
> >  0x6ffb (FLAGS_1)Flags: NOW PIE
> >  0x001e (FLAGS)  BIND_NOW
> >  0x6ffb (FLAGS_1)Flags: NOW PIE
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> >
> > c++tools/ChangeLog:
> >
> > * configure.ac (--enable-host-bind-now): New check.
> > * configure: Regenerate.
> >
> > gcc/ChangeLog:
> >
> > * configure.ac (--enable-host-bind-now): New check.  Add
> > -Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
> > * configure: Regenerate.
> > * doc/install.texi: Document --enable-host-bind-now.
> >
> > lto-plugin/ChangeLog:
> >
> > * configure.ac (--enable-host-bind-now): New check.  Link with
> > -z,now.
> > * configure: Regenerate.
> 
> Our reconfiguration checking script complains about a missing hunk in
> lto-plugin/Makefile.in:
> 
> diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
> index cb568e1e09f..f6f5b020ff5 100644
> --- a/lto-plugin/Makefile.in
> +++ b/lto-plugin/Makefile.in
> @@ -298,6 +298,7 @@ datadir = @datadir@
>  datarootdir = @datarootdir@
>  docdir = @docdir@
>  dvidir = @dvidir@
> +enable_host_bind_now = @enable_host_bind_now@
>  exec_prefix = @exec_prefix@
>  gcc_build_dir = @gcc_build_dir@
>  get_gcc_base_ver = @get_gcc_base_ver@
> 
> 
> I am somewhat puzzled why the line is not missing in any of the other
> Makefile.in files.  Can you please check whether that is the only thing
> that is missing (assuming it is actually missing)?

Arg, once again, I'm sorry.  I don't know how this happened.  It would
be trivial to fix it but since

commit 4a48a38fa99f067b8f3a3d1a5dc7a1e602db351f
Author: Eric Botcazou 
Date:   Wed Jun 21 18:19:36 2023 +0200

ada: Fix build of GNAT tools

the build with Ada included fails with --enable-host-pie.  So that needs
to be fixed first.

Eric, I'm not asking you to fix that, but I'm curious, what did the
commit above fix?  The patch looks correct; I'm just puzzled why I
hadn't seen any build failures.

The --enable-host-pie patch has been a nightmare :(.

Marek



Re: [V1][PATCH 1/3] Provide element_count attribute to flexible array member field (PR108896)

2023-06-27 Thread Qing Zhao via Gcc-patches
Hi,

Based on the discussion so far and further consideration, the following is my 
plan for this new attribute:

1.  The syntax of the new attribute will be:

__attribute__((counted_by (count_field_id)));

In the above, count_field_id is the identifier of the field that carries the
number-of-elements information for the FAM within the same structure.

For example:

struct object {
..
size_t count;  /* carries the number of elements in the FAM flex.  */
int flex[] __attribute__((counted_by (count)));
};

2.  Later, if the argument of this attribute needs to be extended to an
expression, we might need to extend the C FE to accept ".count" in the future.

Let me know if you have further comments and suggestions.

thanks.

Qing

> On Jun 20, 2023, at 3:40 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
> 
> 
>> On Jun 16, 2023, at 5:35 PM, Joseph Myers  wrote:
>> 
>> On Fri, 16 Jun 2023, Qing Zhao via Gcc-patches wrote:
>> 
 So for 
 
 struct foo { int c; int buf[(struct { int d; }){ .d = .c }]; };
 
 one knows during parsing that the .d is a designator
 and that .c is not.
>>> 
>>> Therefore, the above should be invalid based on this rule since .c is 
>>> not a member in the current structure.
>> 
>> What do you mean by "current structure"?  I think two different concepts 
>> are being conflated: the structure *being initialized* (what the C 
>> standard calls the "current object" for a brace-enclosed initializer 
>> list),
> 
> I think the concept of “current structure” should stick to this.
> 
>> and the structure *being defined*.
> Not this.
> 
> (Forgive me about my poor English -:)).
> 
> Then it will be cleaner? 
> 
> What’s your opinion?
> 
> 
>> The former is what's relevant 
>> for designators.  The latter is what's relevant for the suggested new 
>> syntax.  And .c *is* a member of the structure being defined in this 
>> example.
>> 
>> Those two structure types are always different, except for corner cases 
>> with C2x tag compatibility (where an object of structure type might be 
>> initialized in the middle of a redefinition of that type).
> 
> Can you give an example on this?  Thanks.
> 
> Qing
>> 
>> -- 
>> Joseph S. Myers
>> jos...@codesourcery.com



Re: [PATCH] configure: Implement --enable-host-bind-now

2023-06-27 Thread Iain Sandoe via Gcc-patches



> On 27 Jun 2023, at 16:31, Marek Polacek via Gcc-patches 
>  wrote:
> 
> On Tue, Jun 27, 2023 at 01:39:16PM +0200, Martin Jambor wrote:
>> Hello,
>> 
>> On Tue, May 16 2023, Marek Polacek via Gcc-patches wrote:
>>> As promised in the --enable-host-pie patch, this patch adds another
>>> configure option, --enable-host-bind-now, which adds -z now when linking
>>> the compiler executables in order to extend hardening.  BIND_NOW with RELRO
>>> allows the GOT to be marked RO; this prevents GOT modification attacks.
>>> 
>>> This option does not affect linking of target libraries; you can use
>>> LDFLAGS_FOR_TARGET=-Wl,-z,relro,-z,now to enable RELRO/BIND_NOW.
>>> 
>>> With this patch:
>>> $ readelf -Wd cc1{,plus} | grep FLAGS
>>> 0x001e (FLAGS)  BIND_NOW
>>> 0x6ffb (FLAGS_1)Flags: NOW PIE
>>> 0x001e (FLAGS)  BIND_NOW
>>> 0x6ffb (FLAGS_1)Flags: NOW PIE
>>> 
>>> Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
>>> 
>>> c++tools/ChangeLog:
>>> 
>>> * configure.ac (--enable-host-bind-now): New check.
>>> * configure: Regenerate.
>>> 
>>> gcc/ChangeLog:
>>> 
>>> * configure.ac (--enable-host-bind-now): New check.  Add
>>> -Wl,-z,now to LD_PICFLAG if --enable-host-bind-now.
>>> * configure: Regenerate.
>>> * doc/install.texi: Document --enable-host-bind-now.
>>> 
>>> lto-plugin/ChangeLog:
>>> 
>>> * configure.ac (--enable-host-bind-now): New check.  Link with
>>> -z,now.
>>> * configure: Regenerate.
>> 
>> Our reconfiguration checking script complains about a missing hunk in
>> lto-plugin/Makefile.in:
>> 
>> diff --git a/lto-plugin/Makefile.in b/lto-plugin/Makefile.in
>> index cb568e1e09f..f6f5b020ff5 100644
>> --- a/lto-plugin/Makefile.in
>> +++ b/lto-plugin/Makefile.in
>> @@ -298,6 +298,7 @@ datadir = @datadir@
>> datarootdir = @datarootdir@
>> docdir = @docdir@
>> dvidir = @dvidir@
>> +enable_host_bind_now = @enable_host_bind_now@
>> exec_prefix = @exec_prefix@
>> gcc_build_dir = @gcc_build_dir@
>> get_gcc_base_ver = @get_gcc_base_ver@
>> 
>> 
>> I am somewhat puzzled why the line is not missing in any of the other
>> Makefile.in files.  Can you please check whether that is the only thing
>> that is missing (assuming it is actually missing)?
> 
> Arg, once again, I'm sorry.  I don't know how this happened.  It would
> be trivial to fix it but since
> 
> commit 4a48a38fa99f067b8f3a3d1a5dc7a1e602db351f
> Author: Eric Botcazou 
> Date:   Wed Jun 21 18:19:36 2023 +0200
> 
>ada: Fix build of GNAT tools
> 
> the build with Ada included fails with --enable-host-pie.  So that needs
> to be fixed first.
> 
> Eric, I'm not asking you to fix that, but I'm curious, what did the
> commit above fix?  The patch looks correct; I'm just puzzled why I
> hadn't seen any build failures.

I am also curious as to why we do not need some logic to do a similar job
in gcc-interface/Make-lang.in:

ifeq ($(STAGE1),True)
  ADA_INCLUDES=$(COMMON_ADA_INCLUDES)
  adalib=$(dir $(shell $(CC) -print-libgcc-file-name))adalib
  GNATLIB=$(adalib)/$(if $(wildcard $(adalib)/libgnat.a), libgnat.a,libgnat.so) $(STAGE1_LIBS)
else

^^^ I would expect us to need to switch to libgnat_pic.a when we are building a 
PIE exe.

Iain

> The --enable-host-pie patch has been a nightmare :(.
> 
> Marek



Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Robin Dapp via Gcc-patches
> You can put it into the original one.

Bootstrap and testsuite run were successful.
I'm going to push the attached, thanks.

Regards
 Robin

diff --git a/gcc/match.pd b/gcc/match.pd
index 33ccda3e7b6..83bcefa914b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7454,10 +7454,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  values representable in the TYPE to be within the
  range of normal values of ITYPE.  */
  (if (element_precision (newtype) < element_precision (itype)
+  && (!VECTOR_MODE_P (TYPE_MODE (newtype))
+  || target_supports_op_p (newtype, op, optab_default))
   && (flag_unsafe_math_optimizations
   || (element_precision (newtype) == element_precision (type)
-  && real_can_shorten_arithmetic (TYPE_MODE (itype),
-  TYPE_MODE (type))
+  && real_can_shorten_arithmetic (element_mode (itype),
+  element_mode (type))
   && !excess_precision_type (newtype)))
   && !types_match (itype, newtype))
 (convert:type (op (convert:newtype @1)



[PATCH] Basic asm blocks should always be volatile

2023-06-27 Thread Julian Waters via Gcc-patches
GCC's documentation mentions that all basic asm blocks are always volatile,
yet the parser fails to account for this, only ever setting volatile_p to
true if the volatile qualifier is found. This patch fixes that by adding a
special-case check for extended_p before finish_asm_stmt is called.
From 3094be39e3e65a6a638f05fafd858b89fefde6b5 Mon Sep 17 00:00:00 2001
From: TheShermanTanker 
Date: Tue, 27 Jun 2023 23:56:38 +0800
Subject: [PATCH] asm not using extended syntax should always be volatile

---
 gcc/cp/parser.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index a6341b9..ef3d06a 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -22355,6 +22355,9 @@ cp_parser_asm_definition (cp_parser* parser)
   /* Create the ASM_EXPR.  */
   if (parser->in_function_body)
  {
+  if (!extended_p) {
+volatile_p = true;
+  }
asm_stmt = finish_asm_stmt (asm_loc, volatile_p, string, outputs,
inputs, clobbers, labels, inline_p);
/* If the extended syntax was not used, mark the ASM_EXPR.  */
-- 
2.35.1.windows.2


Re: Enable ranger for ipa-prop

2023-06-27 Thread Jan Hubicka via Gcc-patches
> 
> On 6/27/23 09:19, Jan Hubicka wrote:
> > Hi,
> > as shown in the testcase (which would eventually be useful for
> > optimizing std::vector's push_back), ipa-prop can use context dependent 
> > ranger
> > queries for better value range info.
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> Quick question.
> 
> When you call enable_ranger(), its gives you a ranger back, but it also sets
> the range query for the specified context to that same instance.  So from
> that point forward  all existing calls to get_range_query(fun) will now use
> the context ranger
> 
> enable_ranger (struct function *fun, bool use_imm_uses)
> <...>
>   gcc_checking_assert (!fun->x_range_query);
>   r = new gimple_ranger (use_imm_uses);
>   fun->x_range_query = r;
>   return r;
> 
> So you probably dont have to pass a ranger around?  or is that ranger you
> are passing for a different context?

I don't need passing ranger around - I just did not know that.  I thought
the default one is the context-insensitive one; I will simplify the
patch.  I need to look more into how ranger works.

Honza
> 
> 
> Andrew
> 
> 


Re: [PATCH] configure: Implement --enable-host-bind-now

2023-06-27 Thread Eric Botcazou via Gcc-patches
> Arg, once again, I'm sorry.  I don't know how this happened.  It would
> be trivial to fix it but since
> 
> commit 4a48a38fa99f067b8f3a3d1a5dc7a1e602db351f
> Author: Eric Botcazou 
> Date:   Wed Jun 21 18:19:36 2023 +0200
> 
> ada: Fix build of GNAT tools
> 
> the build with Ada included fails with --enable-host-pie.  So that needs
> to be fixed first.
> 
> Eric, I'm not asking you to fix that, but I'm curious, what did the
> commit above fix?  The patch looks correct; I'm just puzzled why I
> hadn't seen any build failures.

The GNAT tools were failing to build for a compiler configured with
--disable-host-pie --enable-default-pie.

-- 
Eric Botcazou




[x86 PATCH] Add cbranchti4 pattern to i386.md (for -m32 compare_by_pieces).

2023-06-27 Thread Roger Sayle

This patch fixes some very odd (unanticipated) code generation by
compare_by_pieces with -m32 -mavx, since the recent addition of the
cbranchoi4 pattern.  The issue is that cbranchoi4 is available with
TARGET_AVX, but cbranchti4 is currently conditional on TARGET_64BIT
which results in the odd behaviour (thanks to OPTAB_WIDEN) that with
-m32 -mavx, compare_by_pieces ends up (inefficiently) widening 128-bit
comparisons to 256-bits before performing PTEST.

This patch fixes this by providing a cbranchti4 pattern that's available
with either TARGET_64BIT or TARGET_SSE4_1.

For the test case below (again from PR 104610):

int foo(char *a)
{
static const char t[] = "0123456789012345678901234567890";
return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
}

GCC with -m32 -O2 -mavx currently produces the bonkers:

foo:    pushl   %ebp
        movl    %esp, %ebp
        andl    $-32, %esp
        subl    $64, %esp
        movl    8(%ebp), %eax
        vmovdqa .LC0, %xmm4
        movl    $0, 48(%esp)
        vmovdqu (%eax), %xmm2
        movl    $0, 52(%esp)
        movl    $0, 56(%esp)
        movl    $0, 60(%esp)
        movl    $0, 16(%esp)
        movl    $0, 20(%esp)
        movl    $0, 24(%esp)
        movl    $0, 28(%esp)
        vmovdqa %xmm2, 32(%esp)
        vmovdqa %xmm4, (%esp)
        vmovdqa (%esp), %ymm5
        vpxor   32(%esp), %ymm5, %ymm0
        vptest  %ymm0, %ymm0
        jne     .L2
        vmovdqu 16(%eax), %xmm7
        movl    $0, 48(%esp)
        movl    $0, 52(%esp)
        vmovdqa %xmm7, 32(%esp)
        vmovdqa .LC1, %xmm7
        movl    $0, 56(%esp)
        movl    $0, 60(%esp)
        movl    $0, 16(%esp)
        movl    $0, 20(%esp)
        movl    $0, 24(%esp)
        movl    $0, 28(%esp)
        vmovdqa %xmm7, (%esp)
        vmovdqa (%esp), %ymm1
        vpxor   32(%esp), %ymm1, %ymm0
        vptest  %ymm0, %ymm0
        je      .L6
.L2:    movl    $1, %eax
        xorl    $1, %eax
        vzeroupper
        leave
        ret
.L6:    xorl    %eax, %eax
        xorl    $1, %eax
        vzeroupper
        leave
        ret

with this patch, we now generate the (slightly) more sensible:

foo:    vmovdqa .LC0, %xmm0
        movl    4(%esp), %eax
        vpxor   (%eax), %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        jne     .L2
        vmovdqa .LC1, %xmm0
        vpxor   16(%eax), %xmm0, %xmm0
        vptest  %xmm0, %xmm0
        je      .L5
.L2:    movl    $1, %eax
        xorl    $1, %eax
        ret
.L5:    xorl    %eax, %eax
        xorl    $1, %eax
        ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-27  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_branch): Also use ptest
for TImode comparisons on 32-bit architectures.
* config/i386/i386.md (cbranch<mode>4): Change from SDWIM to
SWIM1248x to exclude/avoid TImode being conditional on -m64.
(cbranchti4): New define_expand for TImode on both TARGET_64BIT
and/or with TARGET_SSE4_1.
* config/i386/predicates.md (ix86_timode_comparison_operator):
New predicate that depends upon TARGET_64BIT.
(ix86_timode_comparison_operand): Likewise.

gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-2.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 9a8d244..567248d 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2365,6 +2365,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
   /* Handle special case - vector comparsion with boolean result, transform
  it using ptest instruction.  */
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+  || (mode == TImode && !TARGET_64BIT)
   || mode == OImode)
 {
   rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
@@ -2372,7 +2373,7 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
 
   gcc_assert (code == EQ || code == NE);
 
-  if (mode == OImode)
+  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
{
  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b50d82b..dcf0ba6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1352,8 +1352,8 @@
 
 (define_expand "cbranch<mode>4"
   [(set (reg:CC FLAGS_REG)
-   (compare:CC (match_operand:SDWIM 1 "nonimmediate_operand")
-   (match_operand:SDWIM 2 "")))
+   (compare:CC (match_operand:SWIM1248x 1 "nonimmediate_operand")
+   (match_operand:SWIM1248x 2 "")))
(set (pc) (if_then_else
   (match_operator 0 "ordered_comparison_operator"
[(reg:CC FLAGS_REG) (const_int 0)])
@@ -1368,6 +1368,22 @@

Re: [PATCH] Mark asm goto with outputs as volatile

2023-06-27 Thread Andrew Pinski via Gcc-patches
On Tue, Jun 27, 2023 at 12:14 AM Richard Biener via Gcc-patches
 wrote:
>
> On Tue, Jun 27, 2023 at 5:26 AM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > The manual references asm goto as being implicitly volatile already
> > and that was done when asm goto could not have outputs. When outputs
> > were added to `asm goto`, only asm goto without outputs were still being
> > marked as volatile. Now some parts of GCC decide, removing the `asm goto`
> > is ok if the output is not used, though not updating the CFG (this happens
> > on both the RTL level and the gimple level). Since the biggest user of `asm 
> > goto`
> > is the Linux kernel and they expect them to be volatile (they use them to
> > copy to/from userspace), we should just mark the inline-asm as volatile.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu.
>
> OK.

Committed to GCC 12 and GCC 13 branches also.

Thanks,
Andrew

>
> > PR middle-end/110420
> > PR middle-end/103979
> > PR middle-end/98619
> >
> > gcc/ChangeLog:
> >
> > * gimplify.cc (gimplify_asm_expr): Mark asm with labels as volatile.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.c-torture/compile/asmgoto-6.c: New test.
> > ---
> >  gcc/gimplify.cc   |  7 -
> >  .../gcc.c-torture/compile/asmgoto-6.c | 26 +++
> >  2 files changed, 32 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
> >
> > diff --git a/gcc/gimplify.cc b/gcc/gimplify.cc
> > index 0e24b915b8f..dc6a00e8bd9 100644
> > --- a/gcc/gimplify.cc
> > +++ b/gcc/gimplify.cc
> > @@ -6935,7 +6935,12 @@ gimplify_asm_expr (tree *expr_p, gimple_seq *pre_p, gimple_seq *post_p)
> >stmt = gimple_build_asm_vec (TREE_STRING_POINTER (ASM_STRING (expr)),
> >inputs, outputs, clobbers, labels);
> >
> > -  gimple_asm_set_volatile (stmt, ASM_VOLATILE_P (expr) || noutputs == 0);
> > +  /* asm is volatile if it was marked by the user as volatile or
> > +there is no outputs or this is an asm goto.  */
> > +  gimple_asm_set_volatile (stmt,
> > +  ASM_VOLATILE_P (expr)
> > +  || noutputs == 0
> > +  || labels);
> >gimple_asm_set_input (stmt, ASM_INPUT_P (expr));
> >gimple_asm_set_inline (stmt, ASM_INLINE_P (expr));
> >
> > diff --git a/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
> > new file mode 100644
> > index 000..0652bd4e4e1
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.c-torture/compile/asmgoto-6.c
> > @@ -0,0 +1,26 @@
> > +
> > +/* { dg-do compile } */
> > +/* PR middle-end/110420 */
> > +/* PR middle-end/103979 */
> > +/* PR middle-end/98619 */
> > +/* Test that the middle-end does not remove the asm goto
> > +   with an output. */
> > +
> > +static int t;
> > +void g(void);
> > +
> > +void f(void)
> > +{
> > +  int  __gu_val;
> > +  asm goto("#my asm "
> > + : "=&r"(__gu_val)
> > + :
> > + :
> > + : Efault);
> > +  t = __gu_val;
> > +  g();
> > +Efault:
> > +}
> > +
> > +/* Make sure "my asm " is still in the assembly. */
> > +/* { dg-final { scan-assembler "my asm " } } */
> > --
> > 2.31.1
> >


Re: Enable ranger for ipa-prop

2023-06-27 Thread Andrew MacLeod via Gcc-patches



On 6/27/23 12:24, Jan Hubicka wrote:

On 6/27/23 09:19, Jan Hubicka wrote:

Hi,
as shown in the testcase (which would eventually be useful for
optimizing std::vector's push_back), ipa-prop can use context dependent ranger
queries for better value range info.

Bootstrapped/regtested x86_64-linux, OK?

Quick question.

When you call enable_ranger(), its gives you a ranger back, but it also sets
the range query for the specified context to that same instance.  So from
that point forward  all existing calls to get_range_query(fun) will now use
the context ranger

enable_ranger (struct function *fun, bool use_imm_uses)
<...>
   gcc_checking_assert (!fun->x_range_query);
   r = new gimple_ranger (use_imm_uses);
   fun->x_range_query = r;
   return r;

So you probably dont have to pass a ranger around?  or is that ranger you
are passing for a different context?

I don't need passing ranger around - I just did not know that.  I thought
the default one is the context-insensitive one; I will simplify the
patch.  I need to look more into how ranger works.



No need. Its magic!

Andrew


PS. well, we tried to provide an interface to make it as seamless as 
possible with the whole range-query thing.

10,000 foot view:

The range_query object (value-range.h) replaces the old 
SSA_NAME_RANGE_INFO macros.  It adds the ability to provide an optional 
context in the form of a stmt or edge to any query.  If no context is 
provided, it simply provides the global value. There are basically 3 
queries:


  virtual bool range_of_expr (vrange &r, tree expr, gimple * = NULL) ;
  virtual bool range_on_edge (vrange &r, edge, tree expr);
  virtual bool range_of_stmt (vrange &r, gimple *, tree name = NULL);

- range_of_stmt evaluates the DEF of the stmt, but can also evaluate 
things like  "if (x < y)" that have an implicit boolean LHS.  If NAME is 
provided, it needs to match the DEF. Thats mostly flexibility for 
dealing with something like multiple defs, you can specify which def.
- range_on_edge provides the range of an ssa-name as it would be valued 
on a specific edge.
- range_of_expr is used to ask for the range of any ssa_name or tree 
expression as it occurs on entry to a specific stmt. Normally we use 
this to ask for the range of an ssa-name as its used on a stmt,  but it 
can evaluate expression trees as well.


These requests are not limited to names which occur on a stmt.. we can 
recompute values by asking for the range of value as they occur at other 
locations in the IL.  ie

x_2 = b_3 + 5
<...>
if (b_3 > 7)
   blah (x_2)
When we ask for the range of x_2 at the call to blah, ranger actually 
recomputes x_2 = b_3 + 5 at the call site by asking for the range of b_3 
on the outgoing edge leading to the block with the call to blah, and 
thus uses b_3 == [8, +INF] to re-evaluate x_2


Internally, ranger uses the exact same API to evaluate everything that 
external clients use.



The default query object is global_range_query, which ignores any 
location (stmt or edge) information provided, and simply returns the 
global value. This amounts to an identical result as the old 
SSA_NAME_RANGE_INFO request, and when get_range_query () is called, this 
is the default range_query that is provided.


When a pass calls enable_ranger(), the default query is changed to this 
new instance (which supports context information), and any further calls 
to get_range_query() will now invoke ranger instead of the 
global_range_query.  It uses its on-demand support to go and answer the 
range question by looking at only what it needs to in order to answer 
the question.  This is the exact same ranger code base that all the VRP 
passes use, so you get almost the same level of power to answer 
questions.  There are just a couple of little things that VRP enables 
because it does a DOM walk, but they are fairly minor for most cases.


if you use the range_query API, and do not provide a stmt or an edge, 
then we can't provide contextual range information, and you'll go back 
to getting just global information again.


I think Aldy has converted everything to the new range_query API...  
which means any pass that could benefit from contextual range 
information , in theory, only needs to enable_ranger() and provide a 
context stmt or edge on the range query call.


Just remember to disable it when done :-)
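For a pass author, the flow described above boils down to something like the
following sketch. This uses GCC-internal APIs and is not compilable outside
the GCC source tree; the pass body and the `name`/`stmt` variables are
assumed context, not code from this thread.

```
/* Sketch only: assumes a pass with access to FUN, a gimple stmt STMT,
   and an SSA name NAME used on that stmt.  */
void
my_pass_on_function (function *fun)
{
  enable_ranger (fun);      /* get_range_query (fun) now returns the ranger.  */

  /* ... later, at some statement in the IL ... */
  int_range_max r;
  if (get_range_query (fun)->range_of_expr (r, name, stmt))
    {
      /* R is the context-sensitive range of NAME on entry to STMT;
         without the stmt argument we would get only the global range.  */
    }

  disable_ranger (fun);     /* Restore the global_range_query.  */
}
```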

Andrew



Re: [testsuite] note pitfall in how outputs.exp sets gld

2023-06-27 Thread Mike Stump via Gcc-patches
On Jun 22, 2023, at 10:35 PM, Alexandre Oliva  wrote:
> 
> This patch documents a glitch in gcc.misc-tests/outputs.exp: it checks
> whether the linker is GNU ld, and uses that to decide whether to
> expect collect2 to create .ld1_args files under -save-temps, but
> collect2 bases that decision on whether HAVE_GNU_LD is set, which may
> be false zero if the linker in use is GNU ld.  Configuring
> --with-gnu-ld fixes this misalignment.  Without that, atsave tests are
> likely to fail, because without HAVE_GNU_LD, collect2 won't use @file
> syntax to run the linker (so it won't create .ld1_args files).
> 
> Long version: HAVE_GNU_LD is set when (i) DEFAULT_LINKER is set during
> configure, pointing at GNU ld; (ii) --with-gnu-ld is passed to
> configure; or (iii) config.gcc sets gnu_ld=yes.  If a port doesn't set
> gnu_ld, and the toolchain isn't configured so as to assume GNU ld,
> configure and thus collect2 conservatively assume the linker doesn't
> support @file arguments.
> 
> But outputs.exp can't see how configure set HAVE_GNU_LD (it may be
> used to test an installed compiler), and upon finding that the linker
> used by the compiler is GNU ld, it will expect collect2 to use @file
> arguments when running the linker.  If that assumption doesn't hold,
> atsave tests will fail.
> 
> Does it make sense to put this in?  I'd like to preserve this knowledge
> somehow, and I suppose this would be most useful for someone observing
> these failures and trying to figure out why they come about, so this
> seems the best place for them.  Ok to install?

Ok.



Re: [SVE] Fold svdupq to VEC_PERM_EXPR if elements are not constant

2023-06-27 Thread Richard Sandiford via Gcc-patches
Prathamesh Kulkarni  writes:
> Hi Richard,
> Sorry I forgot to commit this patch, which you had approved in:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615308.html
>
> Just for context for the following test:
> svint32_t f_s32(int32x4_t x)
> {
>   return svdupq_s32 (x[0], x[1], x[2], x[3]);
> }
>
> -O3 -mcpu=generic+sve generates following code after interleave+zip1 patch:
> f_s32:
> dup s31, v0.s[1]
> mov v30.8b, v0.8b
> ins v31.s[1], v0.s[3]
> ins v30.s[1], v0.s[2]
> zip1v0.4s, v30.4s, v31.4s
> dup z0.q, z0.q[0]
> ret
>
> Code-gen with attached patch:
> f_s32:
> dup z0.q, z0.q[0]
> ret
>
> Bootstrapped+tested on aarch64-linux-gnu.
> OK to commit ?
>
> Thanks,
> Prathamesh
>
> [SVE] Fold svdupq to VEC_PERM_EXPR if elements are not constant.
>
> gcc/ChangeLog:
> * config/aarch64/aarch64-sve-builtins-base.cc
> (svdupq_impl::fold_nonconst_dupq): New method.
> (svdupq_impl::fold): Call fold_nonconst_dupq.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/aarch64/sve/acle/general/dupq_11.c: New test.

OK, thanks.

Richard

> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index 95b4cb8a943..9010ecca6da 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -817,6 +817,52 @@ public:
>  
>  class svdupq_impl : public quiet
>  {
> +private:
> +  gimple *
> +  fold_nonconst_dupq (gimple_folder &f) const
> +  {
> +/* Lower lhs = svdupq (arg0, arg1, ..., argN) into:
> +   tmp = {arg0, arg1, ..., argN}
> +   lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, N-1, ...})  */
> +
> +if (f.type_suffix (0).bool_p
> + || BYTES_BIG_ENDIAN)
> +  return NULL;
> +
> +tree lhs = gimple_call_lhs (f.call);
> +tree lhs_type = TREE_TYPE (lhs);
> +tree elt_type = TREE_TYPE (lhs_type);
> +scalar_mode elt_mode = SCALAR_TYPE_MODE (elt_type);
> +machine_mode vq_mode = aarch64_vq_mode (elt_mode).require ();
> +tree vq_type = build_vector_type_for_mode (elt_type, vq_mode);
> +
> +unsigned nargs = gimple_call_num_args (f.call);
> +vec<constructor_elt, va_gc> *v;
> +vec_alloc (v, nargs);
> +for (unsigned i = 0; i < nargs; i++)
> +  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, gimple_call_arg (f.call, i));
> +tree vec = build_constructor (vq_type, v);
> +tree tmp = make_ssa_name_fn (cfun, vq_type, 0);
> +gimple *g = gimple_build_assign (tmp, vec);
> +
> +gimple_seq stmts = NULL;
> +gimple_seq_add_stmt_without_update (&stmts, g);
> +
> +poly_uint64 lhs_len = TYPE_VECTOR_SUBPARTS (lhs_type);
> +vec_perm_builder sel (lhs_len, nargs, 1);
> +for (unsigned i = 0; i < nargs; i++)
> +  sel.quick_push (i);
> +
> +vec_perm_indices indices (sel, 1, nargs);
> +tree mask_type = build_vector_type (ssizetype, lhs_len);
> +tree mask = vec_perm_indices_to_tree (mask_type, indices);
> +
> +gimple *g2 = gimple_build_assign (lhs, VEC_PERM_EXPR, tmp, tmp, mask);
> +gimple_seq_add_stmt_without_update (&stmts, g2);
> +gsi_replace_with_seq (f.gsi, stmts, false);
> +return g2;
> +  }
> +
>  public:
>gimple *
>fold (gimple_folder &f) const override
> @@ -832,7 +878,7 @@ public:
>{
>   tree elt = gimple_call_arg (f.call, i);
>   if (!CONSTANT_CLASS_P (elt))
> -   return NULL;
> +   return fold_nonconst_dupq (f);
>   builder.quick_push (elt);
>   for (unsigned int j = 1; j < factor; ++j)
> builder.quick_push (build_zero_cst (TREE_TYPE (vec_type)));
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c
> new file mode 100644
> index 000..f19f8deb1e5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c
> @@ -0,0 +1,31 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-optimized" } */
> +
> +#include 
> +#include 
> +
> +svint8_t f_s8(int8x16_t x)
> +{
> +  return svdupq_s8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> + x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15]);
> +}
> +
> +svint16_t f_s16(int16x8_t x)
> +{
> +  return svdupq_s16 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]);
> +}
> +
> +svint32_t f_s32(int32x4_t x)
> +{
> +  return svdupq_s32 (x[0], x[1], x[2], x[3]);
> +}
> +
> +svint64_t f_s64(int64x2_t x)
> +{
> +  return svdupq_s64 (x[0], x[1]);
> +}
> +
> +/* { dg-final { scan-tree-dump "VEC_PERM_EXPR" "optimized" } } */
> +/* { dg-final { scan-tree-dump-not "svdupq" "optimized" } } */
> +
> +/* { dg-final { scan-assembler-times {\tdup\tz[0-9]+\.q, z[0-9]+\.q\[0\]\n} 
> 4 } } */


[x86 PATCH] Fix FAIL of gcc.target/i386/pr78794.c on ia32.

2023-06-27 Thread Roger Sayle

This patch fixes the FAIL of gcc.target/i386/pr78794.c on ia32, which
is caused by minor STV rtx_cost differences with -march=silvermont.
It turns out that generic tuning results in pandn, but the lack of
accurate parameterization for COMPARE in compute_convert_gain combined
with small differences in scalar<->SSE costs on silvermont results in
this DImode chain not being converted.

The solution is to provide more accurate costs/gains for converting
(DImode and SImode) comparisons.

I'd been holding off of doing this as I'd thought it would be possible
to turn pandn;ptestz into ptestc (for an even bigger scalar-to-vector
win) but I've recently realized that these optimizations (as I've
implemented them) occur in the wrong order (stv2 occurs after
combine), so it isn't easy for STV to convert CCZmode into CCCmode.
Doh!  Perhaps something can be done in peephole2...


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-27  Roger Sayle  

gcc/ChangeLog
PR target/78794
* config/i386/i386-features.cc (compute_convert_gain): Provide
more accurate gains for conversion of scalar comparisons to
PTEST.


Thanks for your patience.
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4a3b07a..53bec08 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -631,7 +631,31 @@ general_scalar_chain::compute_convert_gain ()
break;
 
  case COMPARE:
-   /* Assume comparison cost is the same.  */
+   if (XEXP (src, 1) != const0_rtx)
+ {
+   /* cmp vs. pxor;pshufd;ptest.  */
+   igain += COSTS_N_INSNS (m - 3);
+ }
+   else if (GET_CODE (XEXP (src, 0)) != AND)
+ {
+   /* test vs. pshufd;ptest.  */
+   igain += COSTS_N_INSNS (m - 2);
+ }
+   else if (GET_CODE (XEXP (XEXP (src, 0), 0)) != NOT)
+ {
+   /* and;test vs. pshufd;ptest.  */
+   igain += COSTS_N_INSNS (2 * m - 2);
+ }
+   else if (TARGET_BMI)
+ {
+   /* andn;test vs. pandn;pshufd;ptest.  */
+   igain += COSTS_N_INSNS (2 * m - 3);
+ }
+   else
+ {
+   /* not;and;test vs. pandn;pshufd;ptest.  */
+   igain += COSTS_N_INSNS (3 * m - 3);
+ }
break;
 
  case CONST_INT:


[PATCH V3, rs6000] Disable generation of scalar modulo instructions

2023-06-27 Thread Pat Haugen via Gcc-patches
Updated from prior version to address review comments (update rs6000_rtx_cost,
update scan strings of mod-1.c/mod-2.c).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.
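The replacement sequence relies on the truncating-division identity
a % b == a - (a / b) * b, which holds exactly for C integer division (and the
PowerPC divide instructions) because both truncate toward zero. A quick
standalone check, not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Compute a modulo via the div/mul/sub sequence that the expander
   emits instead of the scalar modulo instruction.  Valid for b != 0
   and no division overflow.  */
static int64_t
mod_via_div_mul_sub (int64_t a, int64_t b)
{
  int64_t q = a / b;      /* div */
  int64_t p = q * b;      /* mul */
  return a - p;           /* sub */
}
```

The identity holds for both signed and unsigned operands, which is why the
mod and umod expanders can share the same three-instruction shape.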

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-27  Pat Haugen  

gcc/
* config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
scalar modulo.
* config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
* config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
(define_expand umod<mode>3): New.
(define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
(umodti3, modti3): Disable.

gcc/testsuite/
* gcc.target/powerpc/clone1.c: Add xfails.
* gcc.target/powerpc/clone3.c: Likewise.
* gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
* gcc.target/powerpc/mod-2.c: Likewise.
* gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.


diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..2dae217bf64 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,

*total = rs6000_cost->divsi;
}
   /* Add in shift and subtract for MOD unless we have a mod instruction. */
-  if (!TARGET_MODULO && (code == MOD || code == UMOD))
+  if ((!TARGET_MODULO
+  || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+&& (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;

diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
rs6000_asm_output_opcode (STREAM);  \
 }  \
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..6c2f237a539 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod<mode>3"
FAIL;

   operands[2] = force_reg (<MODE>mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+   {
+ temp1 = gen_reg_rtx (<MODE>mode);
+ temp2 = gen_reg_rtx (<MODE>mode);
+
+ emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+ emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+ emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+ DONE;
+   }
 }
   else
 {
@@ -3440,17 +3451,42 @@ (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
 (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])

+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+   (umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+ (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+   FAIL;

-(define_insn "umod<mode>3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+{
+  temp1 = gen_reg_rtx (<MODE>mode);
+  temp2 = gen_reg_rtx (<MODE>mode);
+
+  emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+  DONE;
+}
+})
+
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
 (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
(set_attr "size" "")])
@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
   [(set_attr "type" "vecdiv")
(set

Re: [PATCH V3, rs6000] Disable generation of scalar modulo instructions

2023-06-27 Thread Pat Haugen via Gcc-patches

On 6/27/23 1:52 PM, Pat Haugen via Gcc-patches wrote:
Updated from prior version to address review comments (update rs6000_rtx_cost,
update scan strings of mod-1.c/mod-2.c).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-27  Pat Haugen  

gcc/
 * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
 scalar modulo.
 * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
 * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
 (define_expand umod<mode>3): New.
 (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
 (umodti3, modti3): Disable.

gcc/testsuite/
 * gcc.target/powerpc/clone1.c: Add xfails.
 * gcc.target/powerpc/clone3.c: Likewise.
 * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
 * gcc.target/powerpc/mod-2.c: Likewise.
 * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.



Attaching patch since my mailer apparently messed up some formatting again.

-Pat
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 546c353029b..2dae217bf64 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22127,7 +22127,9 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int 
outer_code,
*total = rs6000_cost->divsi;
}
   /* Add in shift and subtract for MOD unless we have a mod instruction. */
-  if (!TARGET_MODULO && (code == MOD || code == UMOD))
+  if ((!TARGET_MODULO
+  || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+&& (code == MOD || code == UMOD))
*total += COSTS_N_INSNS (2);
   return false;
 
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@ while (0)
rs6000_asm_output_opcode (STREAM);  \
 }  \
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index b0db8ae508d..6c2f237a539 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3421,6 +3421,17 @@ (define_expand "mod<mode>3"
	FAIL;
 
   operands[2] = force_reg (<MODE>mode, operands[2]);
+
+  if (RS6000_DISABLE_SCALAR_MODULO)
+	{
+	  temp1 = gen_reg_rtx (<MODE>mode);
+	  temp2 = gen_reg_rtx (<MODE>mode);
+
+	  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+	  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+	  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+	  DONE;
+	}
 }
   else
 {
@@ -3440,17 +3451,42 @@ (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods<wd> %0,%1,%2"
   [(set_attr "type" "div")
	(set_attr "size" "<bits>")])
 
+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		  (match_operand:GPR 2 "gpc_reg_operand")))]
+  ""
+{
+  rtx temp1;
+  rtx temp2;
+
+  if (!TARGET_MODULO)
+	FAIL;
 
-(define_insn "umod<mode>3"
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+      temp1 = gen_reg_rtx (<MODE>mode);
+      temp2 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
+})
+
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
		  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu<wd> %0,%1,%2"
   [(set_attr "type" "div")
	(set_attr "size" "<bits>")])
@@ -3507,7 +3543,7 @@ (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POW

Re: [SVE] Fold svdupq to VEC_PERM_EXPR if elements are not constant

2023-06-27 Thread Prathamesh Kulkarni via Gcc-patches
On Wed, 28 Jun 2023 at 00:05, Richard Sandiford
 wrote:
>
> Prathamesh Kulkarni  writes:
> > Hi Richard,
> > Sorry I forgot to commit this patch, which you had approved in:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615308.html
> >
> > Just for context for the following test:
> > svint32_t f_s32(int32x4_t x)
> > {
> >   return svdupq_s32 (x[0], x[1], x[2], x[3]);
> > }
> >
> > -O3 -mcpu=generic+sve generates following code after interleave+zip1 patch:
> > f_s32:
> > dup s31, v0.s[1]
> > mov v30.8b, v0.8b
> > ins v31.s[1], v0.s[3]
> > ins v30.s[1], v0.s[2]
> > zip1v0.4s, v30.4s, v31.4s
> > dup z0.q, z0.q[0]
> > ret
> >
> > Code-gen with attached patch:
> > f_s32:
> > dup z0.q, z0.q[0]
> > ret
> >
> > Bootstrapped+tested on aarch64-linux-gnu.
> > OK to commit ?
> >
> > Thanks,
> > Prathamesh
> >
> > [SVE] Fold svdupq to VEC_PERM_EXPR if elements are not constant.
> >
> > gcc/ChangeLog:
> > * config/aarch64/aarch64-sve-builtins-base.cc
> > (svdupq_impl::fold_nonconst_dupq): New method.
> > (svdupq_impl::fold): Call fold_nonconst_dupq.
> >
> > gcc/testsuite/ChangeLog:
> > * gcc.target/aarch64/sve/acle/general/dupq_11.c: New test.
>
> OK, thanks.
Thanks, pushed to trunk in 231f6b56c77c50f337f2529b3ae51e2083ce461d

Thanks,
Prathamesh
>
> Richard
>
> > diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> > b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > index 95b4cb8a943..9010ecca6da 100644
> > --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> > @@ -817,6 +817,52 @@ public:
> >
> >  class svdupq_impl : public quiet<function_base>
> >  {
> > +private:
> > +  gimple *
> > +  fold_nonconst_dupq (gimple_folder &f) const
> > +  {
> > +/* Lower lhs = svdupq (arg0, arg1, ..., argN) into:
> > +   tmp = {arg0, arg1, ..., argN}
> > +   lhs = VEC_PERM_EXPR (tmp, tmp, {0, 1, 2, ..., N-1, ...})  */
> > +
> > +if (f.type_suffix (0).bool_p
> > + || BYTES_BIG_ENDIAN)
> > +  return NULL;
> > +
> > +tree lhs = gimple_call_lhs (f.call);
> > +tree lhs_type = TREE_TYPE (lhs);
> > +tree elt_type = TREE_TYPE (lhs_type);
> > +scalar_mode elt_mode = SCALAR_TYPE_MODE (elt_type);
> > +machine_mode vq_mode = aarch64_vq_mode (elt_mode).require ();
> > +tree vq_type = build_vector_type_for_mode (elt_type, vq_mode);
> > +
> > +unsigned nargs = gimple_call_num_args (f.call);
> > +vec<constructor_elt, va_gc> *v;
> > +vec_alloc (v, nargs);
> > +for (unsigned i = 0; i < nargs; i++)
> > +  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, gimple_call_arg (f.call, i));
> > +tree vec = build_constructor (vq_type, v);
> > +tree tmp = make_ssa_name_fn (cfun, vq_type, 0);
> > +gimple *g = gimple_build_assign (tmp, vec);
> > +
> > +gimple_seq stmts = NULL;
> > +gimple_seq_add_stmt_without_update (&stmts, g);
> > +
> > +poly_uint64 lhs_len = TYPE_VECTOR_SUBPARTS (lhs_type);
> > +vec_perm_builder sel (lhs_len, nargs, 1);
> > +for (unsigned i = 0; i < nargs; i++)
> > +  sel.quick_push (i);
> > +
> > +vec_perm_indices indices (sel, 1, nargs);
> > +tree mask_type = build_vector_type (ssizetype, lhs_len);
> > +tree mask = vec_perm_indices_to_tree (mask_type, indices);
> > +
> > +gimple *g2 = gimple_build_assign (lhs, VEC_PERM_EXPR, tmp, tmp, mask);
> > +gimple_seq_add_stmt_without_update (&stmts, g2);
> > +gsi_replace_with_seq (f.gsi, stmts, false);
> > +return g2;
> > +  }
> > +
> >  public:
> >gimple *
> >fold (gimple_folder &f) const override
> > @@ -832,7 +878,7 @@ public:
> >{
> >   tree elt = gimple_call_arg (f.call, i);
> >   if (!CONSTANT_CLASS_P (elt))
> > -   return NULL;
> > +   return fold_nonconst_dupq (f);
> >   builder.quick_push (elt);
> >   for (unsigned int j = 1; j < factor; ++j)
> > builder.quick_push (build_zero_cst (TREE_TYPE (vec_type)));
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c 
> > b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c
> > new file mode 100644
> > index 000..f19f8deb1e5
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/dupq_11.c
> > @@ -0,0 +1,31 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -fdump-tree-optimized" } */
> > +
> > +#include <arm_neon.h>
> > +#include <arm_sve.h>
> > +
> > +svint8_t f_s8(int8x16_t x)
> > +{
> > +  return svdupq_s8 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7],
> > + x[8], x[9], x[10], x[11], x[12], x[13], x[14], x[15]);
> > +}
> > +
> > +svint16_t f_s16(int16x8_t x)
> > +{
> > +  return svdupq_s16 (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7]);
> > +}
> > +
> > +svint32_t f_s32(int32x4_t x)
> > +{
> > +  return svdupq_s32 (x[0], x[1], x[2], x[3]);
> > +}
> > +
> > +svint64_t f_s64(int64x2_t x)
> > +{
> > +  return svdupq_s64 (x[0], x[1]);
> > +}
> > +
> > +/*

Re: [Patch, fortran] PR49213 - [OOP] gfortran rejects structure constructor expression

2023-06-27 Thread Harald Anlauf via Gcc-patches

Hi Paul,

this is much better now.

I have only a minor comment left: in the calculation of the
size of a character string you are using an intermediate
gfc_array_index_type, whereas I have learned to use
gfc_charlen_type_node now, which seems like the natural
type here.

OK for trunk, and thanks for your patience!

Harald


On 6/27/23 12:30, Paul Richard Thomas via Gcc-patches wrote:

Hi Harald,

Let's try again :-)

OK for trunk?

Regards

Paul

Fortran: Enable class expressions in structure constructors [PR49213]

2023-06-27  Paul Thomas  

gcc/fortran
PR fortran/49213
* expr.cc (gfc_is_ptr_fcn): Remove reference to class_pointer.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (get_symbol_decl): Remove extraneous line.
* trans-expr.cc (alloc_scalar_allocatable_subcomponent): Obtain
size of intrinsic and character expressions.
(gfc_trans_subcomponent_assign): Expand assignment to class
components to include intrinsic and character expressions.

gcc/testsuite/
PR fortran/49213
* gfortran.dg/pr49213.f90 : New test

On Sat, 24 Jun 2023 at 20:50, Harald Anlauf  wrote:


Hi Paul!

On 6/24/23 15:18, Paul Richard Thomas via Gcc-patches wrote:

I have included the adjustment to 'gfc_is_ptr_fcn' and eliminating the
extra blank line, introduced by my last patch. I played safe and went
exclusively for class functions with attr.class_pointer set on the
grounds that these have had all the accoutrements checked and built
(ie. class_ok). I am still not sure if this is necessary or not.


maybe it is my fault, but I find the version in the patch confusing:

@@ -816,7 +816,7 @@ bool
   gfc_is_ptr_fcn (gfc_expr *e)
   {
 return e != NULL && e->expr_type == EXPR_FUNCTION
- && (gfc_expr_attr (e).pointer
+ && ((e->ts.type != BT_CLASS && gfc_expr_attr (e).pointer)
|| (e->ts.type == BT_CLASS
&& CLASS_DATA (e)->attr.class_pointer));
   }

The caller 'gfc_is_ptr_fcn' has e->expr_type == EXPR_FUNCTION, so
gfc_expr_attr (e) boils down to:

if (e->value.function.esym && e->value.function.esym->result)
 {
   gfc_symbol *sym = e->value.function.esym->result;
   attr = sym->attr;
   if (sym->ts.type == BT_CLASS && sym->attr.class_ok)
 {
   attr.dimension = CLASS_DATA (sym)->attr.dimension;
   attr.pointer = CLASS_DATA (sym)->attr.class_pointer;
   attr.allocatable = CLASS_DATA (sym)->attr.allocatable;
 }
 }
...
else if (e->symtree)
 attr = gfc_variable_attr (e, NULL);

So I thought this should already do what you want if you do

gfc_is_ptr_fcn (gfc_expr *e)
{
return e != NULL && e->expr_type == EXPR_FUNCTION && gfc_expr_attr
(e).pointer;
}

or what am I missing?  The additional checks in gfc_expr_attr are
there to avoid ICEs in case CLASS_DATA (sym) has issues, and we all
know Gerhard who showed that he is an expert in exploiting this.

To sum up, I'd prefer to use the safer form if it works.  If it
doesn't, I would expect a latent issue.

The rest of the code looked good to me, but I was suspicious about
the handling of CHARACTER.

Nasty as I am, I modified the testcase to use character(kind=4)
instead of kind=1 (see attached).  This either fails here (stop 10),
or if I activate the marked line

!cont = tContainer('hello!')   ! ### ICE! ###

I get an ICE.

Can you have another look?

Thanks,
Harald






OK for trunk?

Paul

Fortran: Enable class expressions in structure constructors [PR49213]

2023-06-24  Paul Thomas  

gcc/fortran
PR fortran/49213
* expr.cc (gfc_is_ptr_fcn): Guard pointer attribute to exclude
class expressions.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (get_symbol_decl): Remove extraneous line.
* trans-expr.cc (alloc_scalar_allocatable_subcomponent): Obtain
size of intrinsic and character expressions.
(gfc_trans_subcomponent_assign): Expand assignment to class
components to include intrinsic and character expressions.

gcc/testsuite/
PR fortran/49213
* gfortran.dg/pr49213.f90 : New test








[x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector equality.

2023-06-27 Thread Roger Sayle

Hi Uros,

Hopefully Hongtao will approve my patch to support SUBREG conversions
in STV https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
but for some of the examples described in the above post (and its test
case), I've also come up with an alternate/complementary/supplementary
fix of generating the PTEST during RTL expansion, rather than rely on
this being caught/optimized later during STV.

You may notice in this patch, the tests for TARGET_SSE4_1 and TImode
appear last.  When I was writing this, I initially also added support
for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
support 256-bit OImode (which also explains why we don't have an OImode
to V1OImode scalar-to-vector pass).  Retaining this clause ordering
should minimize the lines changed if things change in future.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-27  Roger Sayle  

gcc/ChangeLog

	* config/i386/i386-expand.cc (ix86_expand_int_compare): If
	testing a TImode SUBREG of a 128-bit vector register against
	zero, use a PTEST instruction instead of first moving it
	to scalar registers.


Please let me know what you think.
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4a3b07a..53bec08 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -631,7 +631,31 @@ general_scalar_chain::compute_convert_gain ()
break;
 
  case COMPARE:
-   /* Assume comparison cost is the same.  */
+   if (XEXP (src, 1) != const0_rtx)
+ {
+   /* cmp vs. pxor;pshufd;ptest.  */
+   igain += COSTS_N_INSNS (m - 3);
+ }
+   else if (GET_CODE (XEXP (src, 0)) != AND)
+ {
+   /* test vs. pshufd;ptest.  */
+   igain += COSTS_N_INSNS (m - 2);
+ }
+   else if (GET_CODE (XEXP (XEXP (src, 0), 0)) != NOT)
+ {
+   /* and;test vs. pshufd;ptest.  */
+   igain += COSTS_N_INSNS (2 * m - 2);
+ }
+   else if (TARGET_BMI)
+ {
+   /* andn;test vs. pandn;pshufd;ptest.  */
+   igain += COSTS_N_INSNS (2 * m - 3);
+ }
+   else
+ {
+   /* not;and;test vs. pandn;pshufd;ptest.  */
+   igain += COSTS_N_INSNS (3 * m - 3);
+ }
break;
 
  case CONST_INT:


RE: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector equality.

2023-06-27 Thread Roger Sayle

Doh! Wrong patch...
Roger
--

From: Roger Sayle  
Sent: 27 June 2023 20:28
To: 'gcc-patches@gcc.gnu.org' 
Cc: 'Uros Bizjak' ; 'Hongtao Liu' 
Subject: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector
equality.


Hi Uros,

Hopefully Hongtao will approve my patch to support SUBREG conversions
in STV https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
but for some of the examples described in the above post (and its test
case), I've also come up with an alternate/complementary/supplementary
fix of generating the PTEST during RTL expansion, rather than rely on
this being caught/optimized later during STV.

You may notice in this patch, the tests for TARGET_SSE4_1 and TImode
appear last.  When I was writing this, I initially also added support
for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
support 256-bit OImode (which also explains why we don't have an OImode
to V1OImode scalar-to-vector pass).  Retaining this clause ordering
should minimize the lines changed if things change in future.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-27  Roger Sayle  

gcc/ChangeLog
    * config/i386/i386-expand.cc (ix86_expand_int_compare): If
    testing a TImode SUBREG of a 128-bit vector register against
    zero, use a PTEST instruction instead of first moving it
    to scalar registers.


Please let me know what you think.
Roger
--

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 9a8d244..814d63b 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2958,9 +2958,26 @@ ix86_expand_int_compare (enum rtx_code code, rtx op0, 
rtx op1)
   cmpmode = SELECT_CC_MODE (code, op0, op1);
   flags = gen_rtx_REG (cmpmode, FLAGS_REG);
 
+  /* Attempt to use PTEST, if available, when testing vector modes for
+ equality/inequality against zero.  */
+  if (op1 == const0_rtx
+  && SUBREG_P (op0)
+  && cmpmode == CCZmode
+  && SUBREG_BYTE (op0) == 0
+  && REG_P (SUBREG_REG (op0))
+  && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
+  && TARGET_SSE4_1
+  && GET_MODE (op0) == TImode
+  && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
+{
+  tmp = SUBREG_REG (op0);
+  tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
+}
+  else
+tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
+
   /* This is very simple, but making the interface the same as in the
  FP case makes the rest of the code easier.  */
-  tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
   emit_insn (gen_rtx_SET (flags, tmp));
 
   /* Return the test that should be put into the flags user, i.e.


Re: [x86 PATCH] Fix FAIL of gcc.target/i386/pr78794.c on ia32.

2023-06-27 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 27, 2023 at 8:40 PM Roger Sayle  wrote:
>
>
> This patch fixes the FAIL of gcc.target/i386/pr78794.c on ia32, which
> is caused by minor STV rtx_cost differences with -march=silvermont.
> It turns out that generic tuning results in pandn, but the lack of
> accurate parameterization for COMPARE in compute_convert_gain combined
> with small differences in scalar<->SSE costs on silvermont results in
> this DImode chain not being converted.
>
> The solution is to provide more accurate costs/gains for converting
> (DImode and SImode) comparisons.
>
> I'd been holding off of doing this as I'd thought it would be possible
> to turn pandn;ptestz into ptestc (for an even bigger scalar-to-vector
> win) but I've recently realized that these optimizations (as I've
> implemented them) occur in the wrong order (stv2 occurs after
> combine), so it isn't easy for STV to convert CCZmode into CCCmode.
> Doh!  Perhaps something can be done in peephole2...
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-27  Roger Sayle  
>
> gcc/ChangeLog
> PR target/78794
> * config/i386/i386-features.cc (compute_convert_gain): Provide
> more accurate gains for conversion of scalar comparisons to
> PTEST.

LGTM.

Thanks,
Uros.

>
> Thanks for your patience.
> Roger
> --
>


Re: [x86 PATCH] Add cbranchti4 pattern to i386.md (for -m32 compare_by_pieces).

2023-06-27 Thread Uros Bizjak via Gcc-patches
On Tue, Jun 27, 2023 at 7:22 PM Roger Sayle  wrote:
>
>
> This patch fixes some very odd (unanticipated) code generation by
> compare_by_pieces with -m32 -mavx, since the recent addition of the
> cbranchoi4 pattern.  The issue is that cbranchoi4 is available with
> TARGET_AVX, but cbranchti4 is currently conditional on TARGET_64BIT
> which results in the odd behaviour (thanks to OPTAB_WIDEN) that with
> -m32 -mavx, compare_by_pieces ends up (inefficiently) widening 128-bit
> comparisons to 256-bits before performing PTEST.
>
> This patch fixes this by providing a cbranchti4 pattern that's available
> with either TARGET_64BIT or TARGET_SSE4_1.
>
> For the test case below (again from PR 104610):
>
> int foo(char *a)
> {
> static const char t[] = "0123456789012345678901234567890";
> return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> }
>
> GCC with -m32 -O2 -mavx currently produces the bonkers:
>
> foo:pushl   %ebp
> movl%esp, %ebp
> andl$-32, %esp
> subl$64, %esp
> movl8(%ebp), %eax
> vmovdqa .LC0, %xmm4
> movl$0, 48(%esp)
> vmovdqu (%eax), %xmm2
> movl$0, 52(%esp)
> movl$0, 56(%esp)
> movl$0, 60(%esp)
> movl$0, 16(%esp)
> movl$0, 20(%esp)
> movl$0, 24(%esp)
> movl$0, 28(%esp)
> vmovdqa %xmm2, 32(%esp)
> vmovdqa %xmm4, (%esp)
> vmovdqa (%esp), %ymm5
> vpxor   32(%esp), %ymm5, %ymm0
> vptest  %ymm0, %ymm0
> jne .L2
> vmovdqu 16(%eax), %xmm7
> movl$0, 48(%esp)
> movl$0, 52(%esp)
> vmovdqa %xmm7, 32(%esp)
> vmovdqa .LC1, %xmm7
> movl$0, 56(%esp)
> movl$0, 60(%esp)
> movl$0, 16(%esp)
> movl$0, 20(%esp)
> movl$0, 24(%esp)
> movl$0, 28(%esp)
> vmovdqa %xmm7, (%esp)
> vmovdqa (%esp), %ymm1
> vpxor   32(%esp), %ymm1, %ymm0
> vptest  %ymm0, %ymm0
> je  .L6
> .L2:movl$1, %eax
> xorl$1, %eax
> vzeroupper
> leave
> ret
> .L6:xorl%eax, %eax
> xorl$1, %eax
> vzeroupper
> leave
> ret
>
> with this patch, we now generate the (slightly) more sensible:
>
> foo:vmovdqa .LC0, %xmm0
> movl4(%esp), %eax
> vpxor   (%eax), %xmm0, %xmm0
> vptest  %xmm0, %xmm0
> jne .L2
> vmovdqa .LC1, %xmm0
> vpxor   16(%eax), %xmm0, %xmm0
> vptest  %xmm0, %xmm0
> je  .L5
> .L2:movl$1, %eax
> xorl$1, %eax
> ret
> .L5:xorl%eax, %eax
> xorl$1, %eax
> ret
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-27  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_branch): Also use ptest
> for TImode comparisons on 32-bit architectures.
> * config/i386/i386.md (cbranch4): Change from SDWIM to
> SWIM1248x to exclude/avoid TImode being conditional on -m64.
> (cbranchti4): New define_expand for TImode on both TARGET_64BIT
> and/or with TARGET_SSE4_1.
> * config/i386/predicates.md (ix86_timode_comparison_operator):
> New predicate that depends upon TARGET_64BIT.
> (ix86_timode_comparison_operand): Likewise.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/pieces-memcmp-2.c: New test case.

OK with a small fix.

Thanks,
Uros.

+;; Return true if this is a valid second operand for a TImode comparison.
+(define_predicate "ix86_timode_comparison_operand"
+  (if_then_else (match_test "TARGET_64BIT")
+(match_operand 0 "x86_64_general_operand")
+(match_operand 0 "nonimmediate_operand")))
+
+

Please remove the duplicate blank line above.


[PATCH] Fix collection and processing of autoprofile data for target libs

2023-06-27 Thread Eugene Rozenfeld via Gcc-patches
cc1, cc1plus, and lto built during STAGEautoprofile need to be built with
debug info since they are used to build target libs. -gtoggle was
turning off debug info for this stage.

create_gcov should be passed prev-gcc/cc1, prev-gcc/cc1plus, and prev-gcc/lto
instead of stage1-gcc/cc1, stage1-gcc/cc1plus, and stage1-gcc/lto when
processing profile data collected while building target libraries.

Tested on x86_64-pc-linux-gnu.

ChangeLog:

* Makefile.in: Remove -gtoggle for STAGEautoprofile
* Makefile.tpl: Remove -gtoggle for STAGEautoprofile

gcc/c/ChangeLog:

* c/Make-lang.in: Pass correct stage cc1 when processing
profile data collected while building target libraries

gcc/cp/ChangeLog:

* cp/Make-lang.in: Pass correct stage cc1plus when processing
profile data collected while building target libraries

gcc/lto/ChangeLog:

* lto/Make-lang.in: Pass correct stage lto when processing
profile data collected while building target libraries
---
 Makefile.in  | 2 +-
 Makefile.tpl | 2 +-
 gcc/c/Make-lang.in   | 4 ++--
 gcc/cp/Make-lang.in  | 4 ++--
 gcc/lto/Make-lang.in | 2 +-
 5 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/Makefile.in b/Makefile.in
index b559454cc90..61e5faf550f 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -635,7 +635,7 @@ STAGEtrain_TFLAGS = $(filter-out 
-fchecking=1,$(STAGE3_TFLAGS))
 STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
-fprofile-reproducible=parallel-runs
 STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
 
-STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
+STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
 STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
diff --git a/Makefile.tpl b/Makefile.tpl
index 6bcee3021c9..3a5b7ed3c92 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -558,7 +558,7 @@ STAGEtrain_TFLAGS = $(filter-out 
-fchecking=1,$(STAGE3_TFLAGS))
 STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
-fprofile-reproducible=parallel-runs
 STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
 
-STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
+STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
 STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
 
 STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in
index 20840aceab6..79bc0dfd1cf 100644
--- a/gcc/c/Make-lang.in
+++ b/gcc/c/Make-lang.in
@@ -113,10 +113,10 @@ create_fdas_for_cc1: ../stage1-gcc/cc1$(exeext) 
../prev-gcc/$(PERF_DATA)
  echo $$perf_path; \
  if [ -f $$perf_path ]; then \
profile_name=cc1_$$component_in_prev_target.fda; \
-   $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
+   $(CREATE_GCOV) -binary ../prev-gcc/cc1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
  fi; \
done;
-#

+#
 # Build hooks:
 
 c.info:
diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
index c08ee91447e..ba5e8766e99 100644
--- a/gcc/cp/Make-lang.in
+++ b/gcc/cp/Make-lang.in
@@ -211,10 +211,10 @@ create_fdas_for_cc1plus: ../stage1-gcc/cc1plus$(exeext) 
../prev-gcc/$(PERF_DATA)
  echo $$perf_path; \
  if [ -f $$perf_path ]; then \
profile_name=cc1plus_$$component_in_prev_target.fda; \
-   $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
+   $(CREATE_GCOV) -binary ../prev-gcc/cc1plus$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
  fi; \
done;
-#

+#
 # Build hooks:
 
 c++.all.cross: g++-cross$(exeext)
diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
index 4f6025100a3..98aa9f4cc39 100644
--- a/gcc/lto/Make-lang.in
+++ b/gcc/lto/Make-lang.in
@@ -130,7 +130,7 @@ create_fdas_for_lto1: ../stage1-gcc/lto1$(exeext) 
../prev-gcc/$(PERF_DATA)
  echo $$perf_path; \
  if [ -f $$perf_path ]; then \
profile_name=lto1_$$component_in_prev_target.fda; \
-   $(CREATE_GCOV) -binary ../stage1-gcc/lto1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
+   $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov 
$$profile_name -profile $$perf_path -gcov_version 2; \
  fi; \
done;
 
-- 
2.25.1



Re: [PATCH v3] Introduce strub: machine-independent stack scrubbing

2023-06-27 Thread Qing Zhao via Gcc-patches
Hi, Alexandre,

Thanks a lot for the work. I think that this will be a valuable feature to be 
added for GCC’s security functionality. 

I have several questions on this patch:


1.  The implementation of register scrubbing, -fzero-call-used-regs, is to
insert the register zeroing sequence in the routine's epilogue, so each
routine is responsible for cleaning its own call-clobbered registers
before returning.
 This is simple and straightforward, with no change to the function's interface.

 I am wondering why stack scrubbing, proposed in this patch series, cannot
do the scrubbing in the routine's epilogue, similarly to
register scrubbing?

 There are the following benefits from doing the stack scrubbing in the
callee's epilogue:
  A.  The size of the stack that needs to be cleaned is known to the routine
itself, so there is no need to pass this information to other routines;
therefore changes to function interfaces can be avoided: no need to
change the caller's body, no need to clone the callee, etc.
  B.  As a result, the runtime overhead of stack scrubbing should be
reduced.
  C.  If we do the stack scrubbing at a very late stage and in the
routine's epilogue, similar to register scrubbing, we don't need
to deal with the complicated call-chain handling anymore, right?


   So, what is the fundamental issue that prevents the stack scrubbing from
being done by the routine itself rather than by its caller?

2.  I have concerns on the runtime performance overhead, do you have any data 
on this for your current implementation?

3. You mentioned that there are several “modes” for this feature, could you 
please provide more details on the modes and their description?

thanks.

Qing



> On Jun 16, 2023, at 2:09 AM, Alexandre Oliva via Gcc-patches 
>  wrote:
> 
> 
> This patch adds the strub attribute for function and variable types,
> command-line options, passes and adjustments to implement it,
> documentation, and tests.
> 
> Stack scrubbing is implemented in a machine-independent way: functions
> with strub enabled are modified so that they take an extra stack
> watermark argument, that they update with their stack use, and the
> caller can then zero it out once it regains control, whether by return
> or exception.  There are two ways to go about it: at-calls, that
> modifies the visible interface (signature) of the function, and
> internal, in which the body is moved to a clone, the clone undergoes
> the interface change, and the function becomes a wrapper, preserving
> its original interface, that calls the clone and then clears the stack
> used by it.
> 
> Variables can also be annotated with the strub attribute, so that
> functions that read from them get stack scrubbing enabled implicitly,
> whether at-calls, for functions only usable within a translation unit,
> or internal, for functions whose interfaces must not be modified.
> 
> There is a strict mode, in which functions that have their stack
> scrubbed can only call other functions with stack-scrubbing
> interfaces, or those explicitly marked as callable from strub
> contexts, so that an entire call chain gets scrubbing, at once or
> piecemeal depending on optimization levels.  In the default mode,
> relaxed, this requirement is not enforced by the compiler.
> 
> The implementation adds two IPA passes, one that assigns strub modes
> early on, another that modifies interfaces and adds calls to the
> builtins that jointly implement stack scrubbing.  Another builtin,
> that obtains the stack pointer, is added for use in the implementation
> of the builtins, whether expanded inline or called in libgcc.
> 
> There are new command-line options to change operation modes and to
> force the feature disabled; it is enabled by default, but it has no
> effect and is implicitly disabled if the strub attribute is never
> used.  There are also options meant to use for testing the feature,
> enabling different strubbing modes for all (viable) functions.
> 
> Regstrapped on x86_64-linux-gnu.  Also tested with gcc-13, and with
> various other targets.  Ok to install?
> 
> There have been only minor changes since v2:
> 
> - scrub the stack in the same direction it grows, inline and out-of-line
> 
> - remove need for stack space in __strub_leave
> 
> - add (ultimately not needed) means to avoid using the red zone in
>  __strub_leave
> 
> - introduce and document TARGET_ macros to tune __strub_leave
> 
> - drop a misoptimization in inlined __strub_enter
> 
> - fix handling of cgraph edges without call stmts
> 
> - adjust some testcases (async stack uses; Ada compiler bug fix)
> 
> - drop bits for compatibility with gcc 10
> 
> - preserve the comdat group when resetting a function into a strub
>  wrapper, coping with a symtab_node::reset change in gcc-13
> 
> 
> for  gcc/ChangeLog
> 
>   * Makefile.in (OBJS): Add ipa-strub.o.
>   * builtins.def (BUILT_IN_STACK_ADDRESS): New.
>   (BUILT_IN___STRUB_ENTER): New.
>   (BUILT_IN___STRUB_UPDATE): New.
>   

[r14-2117 Regression] FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4 on Linux/x86_64

2023-06-27 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

dd86a5a69cbda40cf76388a65d3317c91cb2b501 is the first bad commit
commit dd86a5a69cbda40cf76388a65d3317c91cb2b501
Author: Richard Biener 
Date:   Thu Jun 22 11:40:46 2023 +0200

tree-optimization/96208 - SLP of non-grouped loads

caused

FAIL: gcc.dg/vect/slp-46.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4

with GCC configured with

../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2117/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-46.c --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-46.c --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: [PATCH] match.pd: Use element_mode instead of TYPE_MODE.

2023-06-27 Thread Andrew Pinski via Gcc-patches
On Tue, Jun 27, 2023 at 8:56 AM Robin Dapp via Gcc-patches
 wrote:
>
> > You can put it into the original one.
>
> Bootstrap and testsuite run were successful.
> I'm going to push the attached, thanks.

I am reducing a bug report (PR 110444) which I think will be fixed by
this change. I will double-check whether it fixes the issue once I
finish reducing it, and will commit a testcase if it does.

Thanks,
Andrew Pinski

>
> Regards
>  Robin
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 33ccda3e7b6..83bcefa914b 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -7454,10 +7454,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   values representable in the TYPE to be within the
>   range of normal values of ITYPE.  */
>   (if (element_precision (newtype) < element_precision (itype)
> +  && (!VECTOR_MODE_P (TYPE_MODE (newtype))
> +  || target_supports_op_p (newtype, op, optab_default))
>&& (flag_unsafe_math_optimizations
>|| (element_precision (newtype) == element_precision 
> (type)
> -  && real_can_shorten_arithmetic (TYPE_MODE (itype),
> -  TYPE_MODE (type))
> +  && real_can_shorten_arithmetic (element_mode 
> (itype),
> +  element_mode 
> (type))
>&& !excess_precision_type (newtype)))
>&& !types_match (itype, newtype))
>  (convert:type (op (convert:newtype @1)
>


Re: [PATCH] i386: Relax inline requirement for functions with different target attrs

2023-06-27 Thread Hongyu Wang via Gcc-patches
> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.

Previously ix86_can_inline_p has

if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
 != callee_opts->x_ix86_isa_flags)
|| ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
!= callee_opts->x_ix86_isa_flags2))
  ret = false;

It makes sure the caller's ISA is a superset of the callee's, and the
inlined code should follow the caller's ISA specification.

IMHO I cannot give a real example where the caller's performance gets
harmed after inlining. I added the PVW check since some callees may
want to limit their vector size while the caller has a larger
preferred vector size. At least with the current change we get more
optimization opportunities for different target_clones.

But I agree the tuning setting may be a factor that affects
performance. One possible choice is: if the callee's tune is
unspecified or default, just inline it into the caller with the
caller's specified arch and tune.
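The superset check quoted above can be restated as a one-line bitwise test (illustrative Python with made-up flag values, not the GCC source):

```python
def isa_superset_ok(caller_isa: int, callee_isa: int) -> bool:
    """True when every ISA flag the callee needs is also set for the
    caller, i.e. the caller's flag set is a superset of the callee's."""
    return (caller_isa & callee_isa) == callee_isa

# Made-up flag bits for illustration only.
AVX2, AVX512F = 1 << 0, 1 << 1

print(isa_superset_ok(AVX2 | AVX512F, AVX2))   # True: caller is a superset
print(isa_superset_ok(AVX2, AVX2 | AVX512F))   # False: callee needs more
```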

Uros Bizjak via Gcc-patches wrote on Tue, Jun 27, 2023 at 17:16:



>
> On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang  wrote:
> >
> > Hi,
> >
> > For functions with different target attributes, the current logic refuses
> > to inline the callee when any arch or tune is mismatched. Relax the
> > condition to honor just prefer_vector_width_type and other flags that may
> > cause safety issues, so the caller can get more optimization opportunities.
>
> I don't think this is desirable. If we inline something with different
> ISAs, we get some strange mix of ISAs when the function is inlined.
> OTOH - we already inline with mismatched tune flags if the function is
> marked with always_inline.
>
> Uros.
>
> > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> >
> > Ok for trunk?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> > tune directly, just check prefer_vector_width_type and make sure
> > not to inline if they mismatch.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/inline-target-attr.c: New test.
> > ---
> >  gcc/config/i386/i386.cc   | 11 +
> >  .../gcc.target/i386/inline-target-attr.c  | 24 +++
> >  2 files changed, 30 insertions(+), 5 deletions(-)
> >  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
> >
> > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > index 0761965344b..1d86384ac06 100644
> > --- a/gcc/config/i386/i386.cc
> > +++ b/gcc/config/i386/i386.cc
> > @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
> >!= (callee_opts->x_target_flags & ~always_inline_safe_mask))
> >  ret = false;
> >
> > -  /* See if arch, tune, etc. are the same.  */
> > -  else if (caller_opts->arch != callee_opts->arch)
> > -ret = false;
> > -
> > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > +  /* Do not inline when the specified prefer-vector-width mismatches
> > + between callee and caller.  */
> > +  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> > +  && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> > +  && callee_opts->x_prefer_vector_width_type
> > + != caller_opts->x_prefer_vector_width_type)
> >  ret = false;
> >
> >else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c 
> > b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > new file mode 100644
> > index 000..995502165f0
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > @@ -0,0 +1,24 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +/* { dg-final { scan-assembler-not "call\[ \t\]callee" } } */
> > +
> > +__attribute__((target("arch=skylake")))
> > +int callee (int n)
> > +{
> > +  int sum = 0;
> > +  for (int i = 0; i < n; i++)
> > +{
> > +  if (i % 2 == 0)
> > +   sum +=i;
> > +  else
> > +   sum += (i - 1);
> > +}
> > +  return sum + n;
> > +}
> > +
> > +__attribute__((target("arch=icelake-server")))
> > +int caller (int n)
> > +{
> > +  return callee (n) + n;
> > +}
> > +
> > --
> > 2.31.1
> >


[PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask

2023-06-27 Thread Juzhe-Zhong
This bug blocks the following patches.

GCC doesn't know that RVV uses a compact mask model.
Consider this following case:

#include <stdint.h>
#include <assert.h>

#define N 16

int
main ()
{
  int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
  int8_t out[N] = {0};
  for (int8_t i = 0; i < N; ++i)
if (mask[i])
  out[i] = i;
  for (int8_t i = 0; i < N; ++i)
{
  if (mask[i])
assert (out[i] == i);
  else
assert (out[i] == 0);
}
}

Before this patch, the pre-calculated mask in the constant memory pool was:
.LC1:
.byte   68 --> 0b01000100

This is incorrect; such a case fails at execution.

After this patch:
.LC1:
.byte   10 --> 0b00001010

It passes at execution.
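As a sanity check, the compact bit-packing described above can be modeled in a few lines of Python (an illustrative re-implementation of the packing loop in rvv_builder::get_compact_mask, not part of the patch):

```python
def compact_mask_bytes(pattern):
    """Pack a boolean mask pattern into compact mask bytes: element i
    becomes bit (i % 8) of byte i // 8, LSB first."""
    out, b = [], 0
    for i, elt in enumerate(pattern):
        if elt:
            b |= 1 << (i % 8)
        if i % 8 == 7 or i == len(pattern) - 1:
            out.append(b)
            b = 0
    return out

# The repeating {0, 1, 0, 1} pattern from the testcase packs to
# 0b1010 == 10, matching the expected .byte 10 after the patch.
print(compact_mask_bytes([0, 1, 0, 1]))  # [10]
```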

gcc/ChangeLog:

* config/riscv/riscv-v.cc (rvv_builder::get_compact_mask): New function.
(expand_const_vector): Ditto.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.

---
 gcc/config/riscv/riscv-v.cc   | 64 +--
 gcc/config/riscv/riscv.cc |  6 ++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-1.c   | 23 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-2.c   | 23 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-3.c   | 23 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-4.c   | 23 +++
 .../riscv/rvv/autovec/vls-vlmax/bitmask-5.c   | 25 
 .../riscv/rvv/autovec/vls-vlmax/bitmask-6.c   | 27 
 .../riscv/rvv/autovec/vls-vlmax/bitmask-7.c   | 30 +
 .../riscv/rvv/autovec/vls-vlmax/bitmask-8.c   | 30 +
 .../riscv/rvv/autovec/vls-vlmax/bitmask-9.c   | 30 +
 11 files changed, 299 insertions(+), 5 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index adb8d7d36a5..5da0dc5e998 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -291,6 +291,7 @@ public:
 
   bool single_step_npatterns_p () const;
   bool npatterns_all_equal_p () const;
+  rtx get_compact_mask () const;
 
   machine_mode new_mode () const { return m_new_mode; }
   scalar_mode inner_mode () const { return m_inner_mode; }
@@ -505,6 +506,47 @@ rvv_builder::npatterns_all_equal_p () const
   return true;
 }
 
+/* Generate the compact mask.
+
+ E.g: mask = { 0, -1 }, mode = VNx2BI, bitsize = 128bits.
+
+ GCC by default will generate the mask = 0b0001x.
+
+ However, it's not the expected mask for RVV, since RVV
+ prefers the compact mask = 0b10x.
+*/
+rtx
+rvv_builder::get_compact_mask () const
+{
+  /* If TARGET_MIN_VLEN == 32, the minimum LMUL = 1/4.
+ Otherwise, the minimum LMUL = 1/8.  */
+  unsigned min_lmul = TARGET_MIN_VLEN == 32 ? 4 : 8;
+  unsigned min_container_size
+= BYTES_PER_RISCV_VECTOR.to_constant () / min_lmul;
+  unsigned container_size = MAX (CEIL (npatterns (), 8), min_container_size);
+  machine_mode container_mode
+= get_vector_mode (QImode, container_size).require ();
+
+  unsigned nunits = GET_MODE_NUNITS (container_mode).to_constant ();
+  rtvec v = rtvec_alloc (nunits);
+  for (unsigned i = 0; i < nunits; i++)
+RTVEC_ELT (v, i) = const0_rtx;
+
+  unsigned char b = 0;
+  for (unsigned i = 0; i < npatterns (); i++)
+{
+  if (INTVAL (elt (i)))
+   b = b | (1 << (i % 8));
+
+  if ((i > 0 && (i % 8) == 7) || (i == (npatterns () - 1)))
+   {
+ RTVEC_ELT (v, ((i + 7) / 8) - 1) = gen_int_mode (b, QImode);
+ b = 0;
+   }
+}
+  return gen_rtx_CONST_VECTOR (container_mode, v);
+}
+
 static unsigned
 get_sew (machine_mode mode)
 {
@@ -1141,11 +1183,23 @@ expand_const_vector (rtx target

Re: [PATCH] i386: Sync tune_string with arch_string for target attribute arch=*

2023-06-27 Thread Hongyu Wang via Gcc-patches
The testcase fails with a --with-arch=native build on cascadelake;
here is a patch to adjust it:

gcc/testsuite/ChangeLog:

* gcc.target/i386/mvc17.c: Add -march=x86-64 to dg-options.
---
 gcc/testsuite/gcc.target/i386/mvc17.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/i386/mvc17.c
b/gcc/testsuite/gcc.target/i386/mvc17.c
index 2c7cc2fdace..8b83c1aecb3 100644
--- a/gcc/testsuite/gcc.target/i386/mvc17.c
+++ b/gcc/testsuite/gcc.target/i386/mvc17.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-ifunc "" } */
-/* { dg-options "-O2" } */
+/* { dg-options "-O2 -march=x86-64" } */
 /* { dg-final { scan-assembler-times "rep mov" 1 } } */

 __attribute__((target_clones("default","arch=icelake-server")))
---

Will push it as an obvious fix, also will apply to the pending backports.

Hongyu Wang wrote on Tue, Jun 27, 2023 at 13:43:
>
> Thanks, I'll backport it down to GCC 10 after it passes all
> bootstrap/regtest.
>
> Uros Bizjak via Gcc-patches wrote on Mon, Jun 26, 2023 at 14:05:
> >
> > On Mon, Jun 26, 2023 at 4:31 AM Hongyu Wang  wrote:
> > >
> > > Hi,
> > >
> > > For a function with the target attribute arch=*, the current logic sets
> > > its tune to the -mtune value from the command line, so all target_clones
> > > get the same tuning flags, which can hurt the performance of each clone.
> > > Override tune with arch if tune was not explicitly specified, to get
> > > proper tuning flags for target_clones.
> > >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> > >
> > > Ok for trunk and backport to active release branches?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-options.cc (ix86_valid_target_attribute_tree):
> > > Override tune_string with arch_string if tune_string is not
> > > explicitly specified.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/mvc17.c: New test.
> >
> > LGTM.
> >
> > Thanks,
> > Uros.
> >
> > > ---
> > >  gcc/config/i386/i386-options.cc   |  6 +-
> > >  gcc/testsuite/gcc.target/i386/mvc17.c | 11 +++
> > >  2 files changed, 16 insertions(+), 1 deletion(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/mvc17.c
> > >
> > > diff --git a/gcc/config/i386/i386-options.cc 
> > > b/gcc/config/i386/i386-options.cc
> > > index 2cb0bddcd35..7f593cebe76 100644
> > > --- a/gcc/config/i386/i386-options.cc
> > > +++ b/gcc/config/i386/i386-options.cc
> > > @@ -1400,7 +1400,11 @@ ix86_valid_target_attribute_tree (tree fndecl, 
> > > tree args,
> > >if (option_strings[IX86_FUNCTION_SPECIFIC_TUNE])
> > > opts->x_ix86_tune_string
> > >   = ggc_strdup (option_strings[IX86_FUNCTION_SPECIFIC_TUNE]);
> > > -  else if (orig_tune_defaulted)
> > > +  /* If we have explicit arch string and no tune string specified, 
> > > set
> > > +tune_string to NULL and later it will be overridden by arch_string
> > > +so target clones can get proper optimization.  */
> > > +  else if (option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
> > > +  || orig_tune_defaulted)
> > > opts->x_ix86_tune_string = NULL;
> > >
> > >/* If fpmath= is not set, and we now have sse2 on 32-bit, use it.  
> > > */
> > > diff --git a/gcc/testsuite/gcc.target/i386/mvc17.c 
> > > b/gcc/testsuite/gcc.target/i386/mvc17.c
> > > new file mode 100644
> > > index 000..2c7cc2fdace
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/mvc17.c
> > > @@ -0,0 +1,11 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-require-ifunc "" } */
> > > +/* { dg-options "-O2" } */
> > > +/* { dg-final { scan-assembler-times "rep mov" 1 } } */
> > > +
> > > +__attribute__((target_clones("default","arch=icelake-server")))
> > > +void
> > > +foo (char *a, char *b, int size)
> > > +{
> > > +  __builtin_memcpy (a, b, size & 0x7F);
> > > +}
> > > --
> > > 2.31.1
> > >


[PATCH] RISC-V: Support floating-point vfwadd/vfwsub vv/wv combine lowering

2023-06-27 Thread Juzhe-Zhong
gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Adapt expand.
* config/riscv/vector.md (@pred_single_widen_): 
Remove.
(@pred_single_widen_add): New pattern.
(@pred_single_widen_sub): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-1.c: Add floating-point.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-6.c: New test.

---
 .../riscv/riscv-vector-builtins-bases.cc  |  8 +++--
 gcc/config/riscv/vector.md| 31 ---
 .../riscv/rvv/autovec/widen/widen-1.c |  7 +++--
 .../riscv/rvv/autovec/widen/widen-2.c |  7 +++--
 .../riscv/rvv/autovec/widen/widen-5.c |  7 +++--
 .../riscv/rvv/autovec/widen/widen-6.c |  7 +++--
 .../rvv/autovec/widen/widen-complicate-1.c|  7 +++--
 .../rvv/autovec/widen/widen-complicate-2.c|  7 +++--
 .../riscv/rvv/autovec/widen/widen_run-1.c |  5 +--
 .../riscv/rvv/autovec/widen/widen_run-2.c |  5 +--
 .../riscv/rvv/autovec/widen/widen_run-5.c |  5 +--
 .../riscv/rvv/autovec/widen/widen_run-6.c |  5 +--
 .../rvv/autovec/widen/widen_run_zvfh-1.c  | 28 +
 .../rvv/autovec/widen/widen_run_zvfh-2.c  | 28 +
 .../rvv/autovec/widen/widen_run_zvfh-5.c  | 28 +
 .../rvv/autovec/widen/widen_run_zvfh-6.c  | 28 +
 16 files changed, 187 insertions(+), 26 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-6.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index fb74cb36ebd..f4a061a831b 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -390,8 +390,12 @@ public:
return e.use_exact_insn (
  code_for_pred_dual_widen_scalar (CODE, e.vector_mode ()));
   case OP_TYPE_wv:
-   return e.use_exact_insn (
- code_for_pred_single_widen (CODE, e.vector_mode ()));
+   if (CODE == PLUS)
+ return e.use_exact_insn (
+   code_for_pred_single_widen_add (e.vector_mode ()));
+   else
+ return e.use_exact_insn (
+   code_for_pred_single_widen_sub (e.vector_mode ()));
   case OP_TYPE_wf:
return e.use_exact_insn (
  code_for_pred_single_widen_scalar (CODE, e.vector_mode ()));
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index b0b3b0ed977..406f96439ec 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -6574,7 +6574,7 @@
   [(set_attr "type" "vf")
(set_attr "mode" "")])
 
-(define_insn "@pred_single_widen_"
+(define_insn "@pred_single_widen_add"
   [(set (match_operand:VWEXTF 0 "register_operand"  "=&vr,  
&vr")
(if_then_else:VWEXTF
  (unspec:
@@ -6587,14 +6587,37 @@
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
- (plus_minus:VWEXTF
+ (plus:VWEXTF
+   (float_extend:VWEXTF
+ (match_operand: 4 "register_operand" "   vr,   
vr"))
+   (match_operand:VWEXTF 3 "register_operand" "   vr,   
vr"))
+ (match_operand:VWEXTF 2 "vector_merge_operand"   "   vu,
0")))]
+  "TARGET_VECTOR"
+  "vfwadd.wv\t%0,%3,%4%p1"
+  [(set_attr "type" "vfwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub"
+  [(set (match_operand:VWEXTF 0 "register_operand"  "=&vr,  
&vr")
+   (if_then_else:VWEXTF
+ (unspec:
+   [(match_operand: 1 "vector_mask_operand"   
"vmWc1,vmWc1")
+(match_operand 5 "vector_length_operand"  "   rK,   
rK")
+(match_operand 6 "const_int_operand"  "i,
i")
+(match_o

Re: [PATCH] RISC-V: Support floating-point vfwadd/vfwsub vv/wv combine lowering

2023-06-27 Thread Kito Cheng via Gcc-patches
It seems to be because of the canonical form of RTL, right?

LGTM, but plz add some more comments about the reason into the commit log.

On Wed, Jun 28, 2023 at 11:00 AM Juzhe-Zhong  wrote:
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Adapt expand.
> * config/riscv/vector.md 
> (@pred_single_widen_): Remove.
> (@pred_single_widen_add): New pattern.
> (@pred_single_widen_sub): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/widen/widen-1.c: Add floating-point.
> * gcc.target/riscv/rvv/autovec/widen/widen-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen-5.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen-6.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: Ditto.
> * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-1.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-2.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-5.c: New test.
> * gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-6.c: New test.
>
> ---
>  .../riscv/riscv-vector-builtins-bases.cc  |  8 +++--
>  gcc/config/riscv/vector.md| 31 ---
>  .../riscv/rvv/autovec/widen/widen-1.c |  7 +++--
>  .../riscv/rvv/autovec/widen/widen-2.c |  7 +++--
>  .../riscv/rvv/autovec/widen/widen-5.c |  7 +++--
>  .../riscv/rvv/autovec/widen/widen-6.c |  7 +++--
>  .../rvv/autovec/widen/widen-complicate-1.c|  7 +++--
>  .../rvv/autovec/widen/widen-complicate-2.c|  7 +++--
>  .../riscv/rvv/autovec/widen/widen_run-1.c |  5 +--
>  .../riscv/rvv/autovec/widen/widen_run-2.c |  5 +--
>  .../riscv/rvv/autovec/widen/widen_run-5.c |  5 +--
>  .../riscv/rvv/autovec/widen/widen_run-6.c |  5 +--
>  .../rvv/autovec/widen/widen_run_zvfh-1.c  | 28 +
>  .../rvv/autovec/widen/widen_run_zvfh-2.c  | 28 +
>  .../rvv/autovec/widen/widen_run_zvfh-5.c  | 28 +
>  .../rvv/autovec/widen/widen_run_zvfh-6.c  | 28 +
>  16 files changed, 187 insertions(+), 26 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-5.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-6.c
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index fb74cb36ebd..f4a061a831b 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -390,8 +390,12 @@ public:
> return e.use_exact_insn (
>   code_for_pred_dual_widen_scalar (CODE, e.vector_mode ()));
>case OP_TYPE_wv:
> -   return e.use_exact_insn (
> - code_for_pred_single_widen (CODE, e.vector_mode ()));
> +   if (CODE == PLUS)
> + return e.use_exact_insn (
> +   code_for_pred_single_widen_add (e.vector_mode ()));
> +   else
> + return e.use_exact_insn (
> +   code_for_pred_single_widen_sub (e.vector_mode ()));
>case OP_TYPE_wf:
> return e.use_exact_insn (
>   code_for_pred_single_widen_scalar (CODE, e.vector_mode ()));
> diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
> index b0b3b0ed977..406f96439ec 100644
> --- a/gcc/config/riscv/vector.md
> +++ b/gcc/config/riscv/vector.md
> @@ -6574,7 +6574,7 @@
>[(set_attr "type" "vf")
> (set_attr "mode" "")])
>
> -(define_insn "@pred_single_widen_"
> +(define_insn "@pred_single_widen_add"
>[(set (match_operand:VWEXTF 0 "register_operand"  "=&vr,  
> &vr")
> (if_then_else:VWEXTF
>   (unspec:
> @@ -6587,14 +6587,37 @@
>  (reg:SI VL_REGNUM)
>  (reg:SI VTYPE_REGNUM)
>  (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
> - (plus_minus:VWEXTF
> + (plus:VWEXTF
> +   (float_extend:VWEXTF
> + (match_operand: 4 "register_operand" "   vr,   
> vr"))
> +   (match_operand:VWEXTF 3 "register_operand" "   vr,   
> vr"))
> + (match_operand:VWEXTF 2 "vector_merge_operand"   "   vu,
> 0")))]
> +  "TARGET_VECTOR"
> +  "vfwadd.wv\t%0,%3,%4%p1"
> +  [(set_attr "type" "vfwalu")
> +   (set_attr "mode" "")])
> +
> +(define_insn "@pred_single_widen_sub"
> +  [(set (m

[PATCH V2] RISC-V: Support floating-point vfwadd/vfwsub vv/wv combine lowering

2023-06-27 Thread Juzhe-Zhong
Currently, vfwadd.wv is the pattern with (set (reg) (float_extend:(reg))),
which makes the combine pass fail to combine.

Change the RTL format of vfwadd.wv --> (set (float_extend:(reg)) (reg)) so
that the combine pass can combine.
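The effect of operand ordering on pattern matching can be illustrated abstractly (a toy Python model of expression matching; GCC's combine pass and RTL matcher work differently in detail): a matcher keyed on one canonical operand order never sees the swapped form.

```python
# Toy expressions as nested tuples: ("plus", lhs, rhs).
def matches_widen_add(expr):
    """Match only the form with the float_extend as the first operand,
    analogous to an insn pattern written for one operand order."""
    op, lhs, _rhs = expr
    return op == "plus" and isinstance(lhs, tuple) and lhs[0] == "float_extend"

canonical = ("plus", ("float_extend", "v4"), "v3")
swapped = ("plus", "v3", ("float_extend", "v4"))

print(matches_widen_add(canonical))  # True
print(matches_widen_add(swapped))    # False: same semantics, no match
```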

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Adapt expand.
* config/riscv/vector.md (@pred_single_widen_): 
Remove.
(@pred_single_widen_add): New pattern.
(@pred_single_widen_sub): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-1.c: Add floating-point.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-1.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-2.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-5.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-6.c: New test.

---
 .../riscv/riscv-vector-builtins-bases.cc  |  8 +++--
 gcc/config/riscv/vector.md| 31 ---
 .../riscv/rvv/autovec/widen/widen-1.c |  7 +++--
 .../riscv/rvv/autovec/widen/widen-2.c |  7 +++--
 .../riscv/rvv/autovec/widen/widen-5.c |  7 +++--
 .../riscv/rvv/autovec/widen/widen-6.c |  7 +++--
 .../rvv/autovec/widen/widen-complicate-1.c|  7 +++--
 .../rvv/autovec/widen/widen-complicate-2.c|  7 +++--
 .../riscv/rvv/autovec/widen/widen_run-1.c |  5 +--
 .../riscv/rvv/autovec/widen/widen_run-2.c |  5 +--
 .../riscv/rvv/autovec/widen/widen_run-5.c |  5 +--
 .../riscv/rvv/autovec/widen/widen_run-6.c |  5 +--
 .../rvv/autovec/widen/widen_run_zvfh-1.c  | 28 +
 .../rvv/autovec/widen/widen_run_zvfh-2.c  | 28 +
 .../rvv/autovec/widen/widen_run_zvfh-5.c  | 28 +
 .../rvv/autovec/widen/widen_run_zvfh-6.c  | 28 +
 16 files changed, 187 insertions(+), 26 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-5.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-6.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index fb74cb36ebd..f4a061a831b 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -390,8 +390,12 @@ public:
return e.use_exact_insn (
  code_for_pred_dual_widen_scalar (CODE, e.vector_mode ()));
   case OP_TYPE_wv:
-   return e.use_exact_insn (
- code_for_pred_single_widen (CODE, e.vector_mode ()));
+   if (CODE == PLUS)
+ return e.use_exact_insn (
+   code_for_pred_single_widen_add (e.vector_mode ()));
+   else
+ return e.use_exact_insn (
+   code_for_pred_single_widen_sub (e.vector_mode ()));
   case OP_TYPE_wf:
return e.use_exact_insn (
  code_for_pred_single_widen_scalar (CODE, e.vector_mode ()));
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index b0b3b0ed977..406f96439ec 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -6574,7 +6574,7 @@
   [(set_attr "type" "vf")
(set_attr "mode" "")])
 
-(define_insn "@pred_single_widen_"
+(define_insn "@pred_single_widen_add"
   [(set (match_operand:VWEXTF 0 "register_operand"  "=&vr,  
&vr")
(if_then_else:VWEXTF
  (unspec:
@@ -6587,14 +6587,37 @@
 (reg:SI VL_REGNUM)
 (reg:SI VTYPE_REGNUM)
 (reg:SI FRM_REGNUM)] UNSPEC_VPREDICATE)
- (plus_minus:VWEXTF
+ (plus:VWEXTF
+   (float_extend:VWEXTF
+ (match_operand: 4 "register_operand" "   vr,   
vr"))
+   (match_operand:VWEXTF 3 "register_operand" "   vr,   
vr"))
+ (match_operand:VWEXTF 2 "vector_merge_operand"   "   vu,
0")))]
+  "TARGET_VECTOR"
+  "vfwadd.wv\t%0,%3,%4%p1"
+  [(set_attr "type" "vfwalu")
+   (set_attr "mode" "")])
+
+(define_insn "@pred_single_widen_sub"
+  [(set (match_operand:VWEXTF 0 "register_operand"  "=&vr,  
&vr")
+   (if_then_else:VWEXTF
+ (unspec:
+   [(match_operand: 1 "vector_ma

Re: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask

2023-06-27 Thread juzhe.zh...@rivai.ai
This patch is critical for the following patches, since it fixes a bug
which I already addressed in rvv-next.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-06-28 09:59
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask
This bug blocks the following patches.
 
GCC doesn't know that RVV uses a compact mask model.
Consider this following case:
 
#include <stdint.h>
#include <assert.h>

#define N 16
 
int
main ()
{
  int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
  int8_t out[N] = {0};
  for (int8_t i = 0; i < N; ++i)
if (mask[i])
  out[i] = i;
  for (int8_t i = 0; i < N; ++i)
{
  if (mask[i])
assert (out[i] == i);
  else
assert (out[i] == 0);
}
}
 
Before this patch, the pre-calculated mask in the constant memory pool was:
.LC1:
.byte   68 --> 0b01000100

This is incorrect; such a case fails at execution.

After this patch:
.LC1:
.byte 10 --> 0b00001010

It passes at execution.
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (rvv_builder::get_compact_mask): New function.
(expand_const_vector): Ditto.
* config/riscv/riscv.cc (riscv_const_insns): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.
 
---
gcc/config/riscv/riscv-v.cc   | 64 +--
gcc/config/riscv/riscv.cc |  6 ++
.../riscv/rvv/autovec/vls-vlmax/bitmask-1.c   | 23 +++
.../riscv/rvv/autovec/vls-vlmax/bitmask-2.c   | 23 +++
.../riscv/rvv/autovec/vls-vlmax/bitmask-3.c   | 23 +++
.../riscv/rvv/autovec/vls-vlmax/bitmask-4.c   | 23 +++
.../riscv/rvv/autovec/vls-vlmax/bitmask-5.c   | 25 
.../riscv/rvv/autovec/vls-vlmax/bitmask-6.c   | 27 
.../riscv/rvv/autovec/vls-vlmax/bitmask-7.c   | 30 +
.../riscv/rvv/autovec/vls-vlmax/bitmask-8.c   | 30 +
.../riscv/rvv/autovec/vls-vlmax/bitmask-9.c   | 30 +
11 files changed, 299 insertions(+), 5 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index adb8d7d36a5..5da0dc5e998 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -291,6 +291,7 @@ public:
   bool single_step_npatterns_p () const;
   bool npatterns_all_equal_p () const;
+  rtx get_compact_mask () const;
   machine_mode new_mode () const { return m_new_mode; }
   scalar_mode inner_mode () const { return m_inner_mode; }
@@ -505,6 +506,47 @@ rvv_builder::npatterns_all_equal_p () const
   return true;
}
+/* Generate the compact mask.
+
+   E.g.: mask = { 0, -1 }, mode = VNx2BI, bitsize = 128 bits.
+
+   GCC by default will generate the mask = 0b0001x.
+
+   However, that is not the expected mask for RVV, since RVV
+   prefers the compact mask = 0b10x.
+*/
+rtx
+rvv_builder::get_compact_mask () const
+{
+  /* If TARGET_MIN_VLEN == 32, the minimum LMUL = 1/4.
+ Otherwise, the minimum LMUL = 1/8.  */
+  unsigned min_lmul = TARGET_MIN_VLEN == 32 ? 4 : 8;
+  unsigned min_container_size
+= BYTES_PER_RISCV_VECTOR.to_constant () / min_lmul;
+  unsigned container_size = MAX (CEIL (npatterns (), 8), min_container_size);
+  machine_mode container_mode
+= get_vector_mode (QImode, container_size).require ();
+
+  unsigned nunits = GET_MODE_NUNITS (container_mode).to_constant ();
+  rtvec v = rtvec_alloc (nunits);
+  for (unsigned i = 0; i < nunits; i++)
+RTVEC_ELT (v, i) = const0_rtx;
+
+  unsigned char b = 0;
+  for (unsigned i = 0; i < npatterns (); i++)
+{
+  if (INTVAL (elt (i)))
+ b = b | (1 << (i % 8));
+
+  if ((i > 0 && (i % 8) == 7) || (i == (npatterns () - 
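The packing loop above can be modeled outside GCC as a short, self-contained routine. This is a hedged sketch: the function name and the std::vector interface are illustrative, not GCC's, but the bit layout matches get_compact_mask — one bit per mask element, LSB-first within each byte.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Pack a boolean element mask into RVV's compact layout: one bit per
// element, least-significant bit first within each byte.
std::vector<uint8_t> pack_compact_mask(const std::vector<bool> &elts)
{
  std::vector<uint8_t> bytes((elts.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < elts.size(); ++i)
    if (elts[i])
      bytes[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
  return bytes;
}
```

For the four-element mask { 0, -1, 0, -1 } this routine produces the single byte 10 (0b1010) shown in the commit message; for a 16-element alternating mask it produces two 0xAA bytes.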

Re: [PATCH V2] RISC-V: Support floating-point vfwadd/vfwsub vv/wv combine lowering

2023-06-27 Thread Kito Cheng via Gcc-patches
LGTM with a minor comment.

> Currently, vfwadd.wv is the pattern with (set (reg) (float_extend:(reg)) 
> which makes

it's minor so you can just go commit after the fix: this should be
(set (plus (reg) (float_extend:(reg)))

> combine pass fails to combine.
>
> change RTL format of vfwadd.wv --> (set (float_extend:(reg) (reg)) so 
> that combine

and (set (plus (float_extend:(reg) (reg)))

> PASS can combine.


Re: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask

2023-06-27 Thread Kito Cheng via Gcc-patches
Do you mind giving some comments about the difference between the
two versions?

On Wed, Jun 28, 2023 at 11:14 AM juzhe.zh...@rivai.ai
 wrote:
>
> This patch is the critical patch for the following patches, since it fixes a bug 
> which I already addressed in rvv-next.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Juzhe-Zhong
> Date: 2023-06-28 09:59
> To: gcc-patches
> CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; 
> Juzhe-Zhong
> Subject: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask
> This bug blocks the following patches.
>
> GCC doesn't know RVV is using the compact mask model.
> Consider this following case:
>
> #define N 16
>
> int
> main ()
> {
>   int8_t mask[N] = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
>   int8_t out[N] = {0};
>   for (int8_t i = 0; i < N; ++i)
> if (mask[i])
>   out[i] = i;
>   for (int8_t i = 0; i < N; ++i)
> {
>   if (mask[i])
> assert (out[i] == i);
>   else
> assert (out[i] == 0);
> }
> }
>
> Before this patch, the pre-calculated mask in the constant memory pool was:
> .LC1:
> .byte   68 > 0b01000100
>
> This is incorrect; such a case fails at execution.
>
> After this patch:
> .LC1:
> .byte 10 > 0b1010
>
> Passes on execution.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc (rvv_builder::get_compact_mask): New 
> function.
> (expand_const_vector): Ditto.
> * config/riscv/riscv.cc (riscv_const_insns): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c: New test.
> * gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c: New test.
>
> ---
> gcc/config/riscv/riscv-v.cc   | 64 +--
> gcc/config/riscv/riscv.cc |  6 ++
> .../riscv/rvv/autovec/vls-vlmax/bitmask-1.c   | 23 +++
> .../riscv/rvv/autovec/vls-vlmax/bitmask-2.c   | 23 +++
> .../riscv/rvv/autovec/vls-vlmax/bitmask-3.c   | 23 +++
> .../riscv/rvv/autovec/vls-vlmax/bitmask-4.c   | 23 +++
> .../riscv/rvv/autovec/vls-vlmax/bitmask-5.c   | 25 
> .../riscv/rvv/autovec/vls-vlmax/bitmask-6.c   | 27 
> .../riscv/rvv/autovec/vls-vlmax/bitmask-7.c   | 30 +
> .../riscv/rvv/autovec/vls-vlmax/bitmask-8.c   | 30 +
> .../riscv/rvv/autovec/vls-vlmax/bitmask-9.c   | 30 +
> 11 files changed, 299 insertions(+), 5 deletions(-)
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-1.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-2.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-3.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-4.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-5.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-6.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-7.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-8.c
> create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/bitmask-9.c
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index adb8d7d36a5..5da0dc5e998 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -291,6 +291,7 @@ public:
>bool single_step_npatterns_p () const;
>bool npatterns_all_equal_p () const;
> +  rtx get_compact_mask () const;
>machine_mode new_mode () const { return m_new_mode; }
>scalar_mode inner_mode () const { return m_inner_mode; }
> @@ -505,6 +506,47 @@ rvv_builder::npatterns_all_equal_p () const
>return true;
> }
> +/* Generate the compact mask.
> +
> + E.g: mask = { 0, -1 }, mode = VNx2BI, bitsize = 128bits.
> +
> +   GCC by default will generate the mask = 0b0001x.
> +
> +   However, that is not the expected mask for RVV, since RVV
> +   prefers the compact mask = 0b10x.
> +*/
> +rtx
> +rvv_builder::get_compact_mask () const
> +{
> +  /* If TARGET_MIN_VLEN == 32, the minimum LMUL = 1/4.
> + Otherwise, the minimum LMUL = 1/8.  */
> +  unsigned min_lmul = TARGET_MIN_VLEN == 32 ? 4 : 8;
> +  unsigned min_container_size
> += BYTES_PER_RISCV_VECTOR.to_constant () / min_lmul;
> +  unsigned container_size = MAX (CEIL (npatterns (), 8), min_container_size);
> +  machine_mode container_mode
> += get_vector_mod

Re: Re: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask

2023-06-27 Thread juzhe.zh...@rivai.ai
I have commented in the commit log:

Before this patch, the mask was:
.LC1:
.byte   68 > 0b01000100

However, this is incorrect for RVV, since RVV always uses a 1-bit compact mask.
After this patch:
.LC1:
.byte   10 > 0b1010



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-28 11:16
To: juzhe.zh...@rivai.ai
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; Robin Dapp
Subject: Re: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask
Do you mind giving some comments about the difference between the
two versions?
 

Re: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector equality.

2023-06-27 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 28, 2023 at 3:32 AM Roger Sayle  wrote:
>
>
> Doh! Wrong patch...
> Roger
> --
>
> From: Roger Sayle 
> Sent: 27 June 2023 20:28
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Uros Bizjak' ; 'Hongtao Liu' 
> Subject: [x86 PATCH] Tweak ix86_expand_int_compare to use PTEST for vector
> equality.
>
>
> Hi Uros,
>
> Hopefully Hongtao will approve my patch to support SUBREG conversions
> in STV https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
> but for some of the examples described in the above post (and its test
> case), I've also come up with an alternate/complementary/supplementary
> fix of generating the PTEST during RTL expansion, rather than rely on
> this being caught/optimized later during STV.
>
> You may notice in this patch, the tests for TARGET_SSE4_1 and TImode
> appear last.  When I was writing this, I initially also added support
> for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
> support 256-bit OImode (which also explains why we don't have an OImode
> to V1OImode scalar-to-vector pass).  Retaining this clause ordering
> should minimize the lines changed if things change in future.
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2023-06-27  Roger Sayle  
>
> gcc/ChangeLog
> * config/i386/i386-expand.cc (ix86_expand_int_compare): If
> testing a TImode SUBREG of a 128-bit vector register against
> zero, use a PTEST instruction instead of first moving it
> to scalar registers.
>
>
> Please let me know what you think.
> Roger
> --
>

+  /* Attempt to use PTEST, if available, when testing vector modes for
+ equality/inequality against zero.  */
+  if (op1 == const0_rtx
+  && SUBREG_P (op0)
+  && cmpmode == CCZmode
+  && SUBREG_BYTE (op0) == 0
+  && REG_P (SUBREG_REG (op0))
Just register_operand (op0, TImode),
+  && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
+  && TARGET_SSE4_1
+  && GET_MODE (op0) == TImode
+  && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
+{
+  tmp = SUBREG_REG (op0);
and tmp = lowpart_subreg (V1TImode, force_reg (TImode, op0));?
I think RA can handle SUBREG correctly, no need for extra predicates.
+  tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
+}
+  else
+tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
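Conceptually, the patch replaces a scalar zero test of a 128-bit value — move both 64-bit halves into general registers, OR them, compare — with a single PTEST, whose ZF output is ((a AND b) == 0); with both operands equal, that tests a == 0. A hedged, intrinsic-free model of the equivalence (the u128 struct and function names are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a TImode value living in a vector register.
struct u128 { uint64_t lo, hi; };

// What GCC emitted before: both halves go through scalar registers.
bool is_zero_scalar(const u128 &v) { return (v.lo | v.hi) == 0; }

// What PTEST computes: ZF = ((a & b) == 0); with a == b, this tests a == 0.
bool is_zero_ptest_model(const u128 &v)
{
  return ((v.lo & v.lo) | (v.hi & v.hi)) == 0;
}
```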



--
BR,
Hongtao


[pushed] testsuite: std_list handling for { target c++26 }

2023-06-27 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

As with c++23, we want to run { target c++26 } tests even though it isn't
part of the default std_list.

C++17 with Concepts TS is no longer an interesting target configuration.

And bump the impcx target to use C++26 mode instead of 23.

gcc/testsuite/ChangeLog:

* lib/g++-dg.exp (g++-dg-runtest): Update for C++26.
---
 gcc/testsuite/lib/g++-dg.exp | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/testsuite/lib/g++-dg.exp b/gcc/testsuite/lib/g++-dg.exp
index 08185a8987e..046d63170c8 100644
--- a/gcc/testsuite/lib/g++-dg.exp
+++ b/gcc/testsuite/lib/g++-dg.exp
@@ -58,17 +58,17 @@ proc g++-dg-runtest { testcases flags default-extra-flags } 
{
# single test.  This should be updated or commented
# out whenever the default std_list is updated or newer
# C++ effective target is added.
-   if [search_for $test "{ dg-do * { target c++23 } }"] {
-   set std_list { 23 }
+   if [search_for $test "\{ dg-do * \{ target c++23"] {
+   set std_list { 23 26 }
+   } elseif [search_for $test "\{ dg-do * \{ target c++26"] {
+   set std_list { 26 }
} else {
set std_list { 98 14 17 20 }
}
}
set option_list { }
foreach x $std_list {
-   # Handle "concepts" as C++17 plus Concepts TS.
-   if { $x eq "concepts" } then { set x "17 -fconcepts"
-   } elseif { $x eq "impcx" } then { set x "23 
-fimplicit-constexpr" }
+   if { $x eq "impcx" } then { set x "26 -fimplicit-constexpr" }
lappend option_list "${std_prefix}$x"
}
} else {

base-commit: ebe7c586f62b1c5218b19c3c6853163287b3c887
-- 
2.39.3
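The dispatch logic the hunk implements can be sketched as a small lookup; this is a hedged model written in C++ rather than the testsuite's Tcl (names are illustrative), showing which -std modes each kind of test now runs under:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Model of the std_list selection: a test targeting c++23 also runs in
// C++26 mode, a c++26-only test runs just there, and everything else
// gets the default list.
std::vector<int> pick_std_list(const std::string &target)
{
  if (target == "c++23")
    return {23, 26};
  if (target == "c++26")
    return {26};
  return {98, 14, 17, 20};
}
```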



[pushed] c++: C++26 constexpr cast from void* [PR110344]

2023-06-27 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

P2738 allows static_cast from void* to an object pointer type in constant evaluation if the
pointer does in fact point to an object of the appropriate type.
cxx_fold_indirect_ref already does the work of finding such an object if it
happens to be a subobject rather than the outermost object at that address,
as in constexpr-voidptr2.C.

P2738
PR c++/110344

gcc/c-family/ChangeLog:

* c-cppbuiltin.cc (c_cpp_builtins): Update __cpp_constexpr.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression): In C++26, allow cast
from void* to the type of a pointed-to object.

gcc/testsuite/ChangeLog:

* g++.dg/cpp26/constexpr-voidptr1.C: New test.
* g++.dg/cpp26/constexpr-voidptr2.C: New test.
* g++.dg/cpp26/feat-cxx26.C: New test.
---
 gcc/c-family/c-cppbuiltin.cc  |   8 +-
 gcc/cp/constexpr.cc   |  11 +
 .../g++.dg/cpp26/constexpr-voidptr1.C |  35 +
 .../g++.dg/cpp26/constexpr-voidptr2.C |  15 +
 gcc/testsuite/g++.dg/cpp26/feat-cxx26.C   | 597 ++
 5 files changed, 665 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp26/constexpr-voidptr1.C
 create mode 100644 gcc/testsuite/g++.dg/cpp26/constexpr-voidptr2.C
 create mode 100644 gcc/testsuite/g++.dg/cpp26/feat-cxx26.C

diff --git a/gcc/c-family/c-cppbuiltin.cc b/gcc/c-family/c-cppbuiltin.cc
index 5d64625fcd7..6bd4c1261a7 100644
--- a/gcc/c-family/c-cppbuiltin.cc
+++ b/gcc/c-family/c-cppbuiltin.cc
@@ -1075,12 +1075,18 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_size_t_suffix=202011L");
  cpp_define (pfile, "__cpp_if_consteval=202106L");
  cpp_define (pfile, "__cpp_auto_cast=202110L");
- cpp_define (pfile, "__cpp_constexpr=202211L");
+ if (cxx_dialect <= cxx23)
+   cpp_define (pfile, "__cpp_constexpr=202211L");
  cpp_define (pfile, "__cpp_multidimensional_subscript=202211L");
  cpp_define (pfile, "__cpp_named_character_escapes=202207L");
  cpp_define (pfile, "__cpp_static_call_operator=202207L");
  cpp_define (pfile, "__cpp_implicit_move=202207L");
}
+  if (cxx_dialect > cxx23)
+   {
+ /* Set feature test macros for C++26.  */
+ cpp_define (pfile, "__cpp_constexpr=202306L");
+   }
   if (flag_concepts)
 {
  if (cxx_dialect >= cxx20)
diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index 432b3a275e8..cca0435bafc 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -7681,6 +7681,17 @@ cxx_eval_constant_expression (const constexpr_ctx *ctx, 
tree t,
&& !is_std_construct_at (ctx->call)
&& !is_std_allocator_allocate (ctx->call))
  {
+   /* P2738 (C++26): a conversion from a prvalue P of type "pointer to
+  cv void" to a pointer-to-object type T unless P points to an
+  object whose type is similar to T.  */
+   if (cxx_dialect > cxx23)
+ if (tree ob
+ = cxx_fold_indirect_ref (ctx, loc, TREE_TYPE (type), op))
+   {
+ r = build1 (ADDR_EXPR, type, ob);
+ break;
+   }
+
/* Likewise, don't error when casting from void* when OP is
   &heap uninit and similar.  */
tree sop = tree_strip_nop_conversions (op);
diff --git a/gcc/testsuite/g++.dg/cpp26/constexpr-voidptr1.C 
b/gcc/testsuite/g++.dg/cpp26/constexpr-voidptr1.C
new file mode 100644
index 000..ce0ccbef5f9
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp26/constexpr-voidptr1.C
@@ -0,0 +1,35 @@
+// PR c++/110344
+// { dg-do compile { target c++26 } }
+
+#include 
+struct Sheep {
+  constexpr std::string_view speak() const noexcept { return "Baa"; }
+};
+struct Cow {
+  constexpr std::string_view speak() const noexcept { return "Mooo"; }
+};
+class Animal_View {
+private:
+  const void *animal;
+  std::string_view (*speak_function)(const void *);
+public:
+  template 
+  constexpr Animal_View(const Animal &a)
+: animal{&a}, speak_function{[](const void *object) {
+  return static_cast(object)->speak();
+}} {}
+  constexpr std::string_view speak() const noexcept {
+return speak_function(animal);
+  }
+};
+// This is the key bit here. This is a single concrete function
+// that can take anything that happens to have the "Animal_View"
+// interface
+constexpr std::string_view do_speak(Animal_View av) { return av.speak(); }
+int main() {
+  // A Cow is a cow. The only think that makes it special
+  // is that it has a "std::string_view speak() const" member
+  constexpr Cow cow;
+  constexpr auto result = do_speak(cow);
+  return static_cast(result.size());
+}
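The Animal_View pattern in the test above is the classic run-time type-erasure idiom; P2738 only extends the final static_cast to constant evaluation. Stripped of constexpr, the same machinery compiles as plain C++17 — a hedged sketch following the shapes in the test:

```cpp
#include <cassert>
#include <string_view>

// Run-time version of the type-erasure idiom from constexpr-voidptr1.C.
struct Cow {
  std::string_view speak() const noexcept { return "Mooo"; }
};

class Animal_View {
  const void *animal;
  std::string_view (*speak_function)(const void *);
public:
  template <typename Animal>
  Animal_View(const Animal &a)
    : animal{&a}, speak_function{[](const void *object) {
        // The cast P2738 legalizes during constant evaluation; at run
        // time it has always been fine when object really points to an
        // Animal.
        return static_cast<const Animal *>(object)->speak();
      }} {}
  std::string_view speak() const noexcept { return speak_function(animal); }
};
```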
diff --git a/gcc/testsuite/g++.dg/cpp26/constexpr-voidptr2.C 
b/gcc/testsuite/g++.dg/cpp26/constexpr-voidptr2.C
new file mode 100644

Re: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask

2023-06-27 Thread Jeff Law via Gcc-patches




On 6/27/23 21:16, Kito Cheng wrote:

Do you mind giving some comments about what the difference between the
two versions?
And I'd like a before/after assembly code with the example in the commit 
message.  I didn't see the same behavior when I tried it earlier today 
and ran out of time to dig into it further.


Juzhe -- most folks wait ~1wk to ping patches, even codegen bugfixes. 
Pinging this fast runs the risk of irritating others.  Please be patient.


Jeff


[PATCH] RISC-V: Support vfwmul.vv combine lowering

2023-06-27 Thread Juzhe-Zhong
Consider the following complicated case:
#define TEST_TYPE(TYPE1, TYPE2)\
  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 ( \
TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,  \
TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n) \
  {\
for (int i = 0; i < n; i++)\
  {\
dst[i] = (TYPE1) a[i] * (TYPE1) b[i];  \
dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];\
dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];\
dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];\
  }\
  }

TEST_TYPE (double, float)

In such a complicated situation, the combine pass cannot combine the extensions
of both operands on the fly. So the combine pass will first try to combine one
of the extensions, and then combine the other. The combine flow is as follows:

Original IR:
(set (reg 0) (float_extend: (reg 1))
(set (reg 3) (float_extend: (reg 2)) 
(set (reg 4) (mult: (reg 0) (reg 3))

First step of combine:
(set (reg 3) (float_extend: (reg 2))
(set (reg 4) (mult: (float_extend: (reg 1) (reg 3))

Second step of combine:
(set (reg 4) (mult: (float_extend: (reg 1) (float_extend: (reg 2))

So, to enhance the combine optimization, we add a "pseudo vfwmul.wv" RTL
pattern in autovec-opt.md,
which is (set (reg 0) (mult (float_extend (reg 1)) (reg 2))).
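Reduced to scalar code, the RTL being combined is just a multiplication of two extended floats — the body of each loop iteration in the macro above boils down to:

```cpp
#include <cassert>

// Scalar kernel the vectorizer turns into vfwmul.vv: each float operand
// is extended to double (one float_extend RTX per operand), then the
// doubles are multiplied.
double widen_mul(float a, float b)
{
  return static_cast<double>(a) * static_cast<double>(b);
}
```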

gcc/ChangeLog:

* config/riscv/autovec-opt.md 
(@pred_single_widen_mul): Change "@" into "*" in the pattern 
name, which simplifies build files.
(*pred_single_widen_mul): Ditto.
(*pred_single_widen_mul): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-3.c: Add floating-point.
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 41 ++-
 .../riscv/rvv/autovec/widen/widen-3.c |  7 +++-
 .../riscv/rvv/autovec/widen/widen-7.c |  7 +++-
 .../rvv/autovec/widen/widen-complicate-3.c|  7 +++-
 .../riscv/rvv/autovec/widen/widen_run-3.c |  5 ++-
 .../riscv/rvv/autovec/widen/widen_run-7.c |  5 ++-
 .../rvv/autovec/widen/widen_run_zvfh-3.c  | 28 +
 .../rvv/autovec/widen/widen_run_zvfh-7.c  | 28 +
 8 files changed, 117 insertions(+), 11 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-7.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 28040805b23..1fcd55ac2a0 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -21,7 +21,7 @@
 ;; We don't have vwmul.wv instruction like vwadd.wv in RVV.
 ;; This pattern is an intermediate RTL IR as a pseudo vwmul.wv to enhance
 ;; optimization of instructions combine.
-(define_insn_and_split "@pred_single_widen_mul"
+(define_insn_and_split "*pred_single_widen_mul"
   [(set (match_operand:VWEXTI 0 "register_operand"  "=&vr,&vr")
(if_then_else:VWEXTI
  (unspec:
@@ -405,3 +405,42 @@
   "vmv.x.s\t%0,%1"
   [(set_attr "type" "vimovvx")
(set_attr "mode" "")])
+
+;; We don't have vfwmul.wv instruction like vfwadd.wv in RVV.
+;; This pattern is an intermediate RTL IR as a pseudo vfwmul.wv to enhance
+;; optimization of instructions combine.
+(define_insn_and_split "*pred_single_widen_mul"
+  [(set (match_operand:VWEXTF 0 "register_operand"  "=&vr,  
&vr")
+   (if_then_else:VWEXTF
+ (unspec:
+   [(match_operand: 1 "vector_mask_operand"   
"vmWc1,vmWc1")
+(match_operand 5 "vector_length_operand"  "   rK,   
rK")
+(match_operand 6 "const_int_operand"  "i,
i")
+(match_operand 7 "const_int_operand"  "i,
i")
+(match_operand 8 "const_int_operand"  "i,
i")
+(match_operand 9 "const_int_operand"  "i,
i")
+(reg:SI VL_REGNUM)
+(reg:SI VTYPE_REGNUM)
+(reg:SI 

[pushed] c++: inherited constructor attributes

2023-06-27 Thread Jason Merrill via Gcc-patches
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

Inherited constructors are like constructor clones; they don't exist from
the language perspective, so they should copy the attributes in the same
way.  But it doesn't make sense to copy alias or ifunc attributes in either
case.  Unlike handle_copy_attribute, we do want to copy inlining attributes.

The discussion of PR110334 pointed out that we weren't copying the
always_inline attribute, leading to poor inlining choices.

PR c++/110334

gcc/cp/ChangeLog:

* cp-tree.h (clone_attrs): Declare.
* method.cc (implicitly_declare_fn): Use it for inherited
constructor.
* optimize.cc (clone_attrs): New.
(maybe_clone_body): Use it.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nodiscard-inh1.C: New test.
---
 gcc/cp/cp-tree.h|  1 +
 gcc/cp/method.cc|  2 ++
 gcc/cp/optimize.cc  | 26 -
 gcc/testsuite/g++.dg/cpp1z/nodiscard-inh1.C | 15 
 4 files changed, 43 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp1z/nodiscard-inh1.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 83982233111..0d7a6c153dc 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7282,6 +7282,7 @@ extern void module_preprocess_options (cpp_reader *);
 extern bool handle_module_option (unsigned opt, const char *arg, int value);
 
 /* In optimize.cc */
+extern tree clone_attrs(tree);
 extern bool maybe_clone_body   (tree);
 
 /* In parser.cc */
diff --git a/gcc/cp/method.cc b/gcc/cp/method.cc
index 91cf943f110..8ed967ddb21 100644
--- a/gcc/cp/method.cc
+++ b/gcc/cp/method.cc
@@ -3294,6 +3294,8 @@ implicitly_declare_fn (special_function_kind kind, tree 
type,
   /* Copy constexpr from the inherited constructor even if the
 inheriting constructor doesn't satisfy the requirements.  */
   constexpr_p = DECL_DECLARED_CONSTEXPR_P (inherited_ctor);
+  /* Also copy any attributes.  */
+  DECL_ATTRIBUTES (fn) = clone_attrs (DECL_ATTRIBUTES (inherited_ctor));
 }
 
   /* Add the "this" parameter.  */
diff --git a/gcc/cp/optimize.cc b/gcc/cp/optimize.cc
index f73d86b6c6b..9e8926e4cc6 100644
--- a/gcc/cp/optimize.cc
+++ b/gcc/cp/optimize.cc
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "debug.h"
 #include "tree-inline.h"
 #include "tree-iterator.h"
+#include "attribs.h"
 
 /* Prototypes.  */
 
@@ -446,6 +447,29 @@ maybe_thunk_body (tree fn, bool force)
   return 1;
 }
 
+/* Copy most attributes from ATTRS, omitting attributes that can really only
+   apply to a single decl.  */
+
+tree
+clone_attrs (tree attrs)
+{
+  tree new_attrs = NULL_TREE;
+  tree *p = &new_attrs;
+
+  for (tree a = attrs; a; a = TREE_CHAIN (a))
+{
+  tree aname = get_attribute_name (a);
+  if (is_attribute_namespace_p ("", a)
+ && (is_attribute_p ("alias", aname)
+ || is_attribute_p ("ifunc", aname)))
+   continue;
+  *p = copy_node (a);
+  p = &TREE_CHAIN (*p);
+}
+  *p = NULL_TREE;
+  return new_attrs;
+}
+
 /* FN is a function that has a complete body.  Clone the body as
necessary.  Returns nonzero if there's no longer any need to
process the main body.  */
@@ -503,7 +527,7 @@ maybe_clone_body (tree fn)
   DECL_VISIBILITY (clone) = DECL_VISIBILITY (fn);
   DECL_VISIBILITY_SPECIFIED (clone) = DECL_VISIBILITY_SPECIFIED (fn);
   DECL_DLLIMPORT_P (clone) = DECL_DLLIMPORT_P (fn);
-  DECL_ATTRIBUTES (clone) = copy_list (DECL_ATTRIBUTES (fn));
+  DECL_ATTRIBUTES (clone) = clone_attrs (DECL_ATTRIBUTES (fn));
   DECL_DISREGARD_INLINE_LIMITS (clone) = DECL_DISREGARD_INLINE_LIMITS (fn);
   set_decl_section_name (clone, fn);
 
diff --git a/gcc/testsuite/g++.dg/cpp1z/nodiscard-inh1.C 
b/gcc/testsuite/g++.dg/cpp1z/nodiscard-inh1.C
new file mode 100644
index 000..bc2555930f1
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1z/nodiscard-inh1.C
@@ -0,0 +1,15 @@
+// [[nodiscard]] should apply to inherited constructors.
+// { dg-do compile { target c++11 } }
+
+struct A {
+  [[nodiscard]] A(int);
+};
+
+struct B: A {
+  using A::A;
+};
+
+int main()
+{
+  B(42);   // { dg-warning nodiscard }
+}

base-commit: a1c6e9631ca33990a2b7411060ca4d18db081a7d
-- 
2.39.3
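Outside GCC's TREE_LIST representation, the policy of clone_attrs reduces to copy-everything-except-single-decl-attributes. A hedged sketch over plain strings (the container and the model's name are illustrative):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirror of clone_attrs' policy: copy the attribute list, dropping
// attributes that only make sense on a single decl (alias, ifunc), while
// keeping the rest -- including inlining attributes such as always_inline.
std::vector<std::string>
clone_attrs_model(const std::vector<std::string> &attrs)
{
  std::vector<std::string> copy;
  for (const std::string &name : attrs)
    if (name != "alias" && name != "ifunc")
      copy.push_back(name);
  return copy;
}
```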



Re: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask

2023-06-27 Thread Kito Cheng via Gcc-patches
I mean the difference between the v1 and v2 patches.

On Wed, Jun 28, 2023 at 12:09 PM Jeff Law  wrote:
>
>
>
> On 6/27/23 21:16, Kito Cheng wrote:
> > Do you mind giving some comments about what the difference between the
> > two versions?
> And I'd like a before/after assembly code with the example in the commit
> message.  I didn't see the same behavior when I tried it earlier today
> and ran out of time to dig into it further.
>
> Juzhe -- most folks wait ~1wk to ping patches, even codegen bugfixes.
> Pinging this fast runs the risk of irritating others.  Please be patient.
>
> Jeff


Re: Re: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask

2023-06-27 Thread juzhe.zh...@rivai.ai
The difference between v1 and v2 is the compact mask generation:

v1 : 
+rtx
+rvv_builder::compact_mask () const
+{
+  /* Use the container mode with SEW = 8 and LMUL = 1.  */
+  unsigned container_size
+= MAX (CEIL (npatterns (), 8), BYTES_PER_RISCV_VECTOR.to_constant () / 8);
+  machine_mode container_mode
+= get_vector_mode (QImode, container_size).require ();...

v2:
+rtx
+rvv_builder::get_compact_mask () const
+{
+  /* If TARGET_MIN_VLEN == 32, the minimum LMUL = 1/4.
+ Otherwise, the minimum LMUL = 1/8.  */
+  unsigned min_lmul = TARGET_MIN_VLEN == 32 ? 4 : 8;
+  unsigned min_container_size
+= BYTES_PER_RISCV_VECTOR.to_constant () / min_lmul;
+  unsigned container_size = MAX (CEIL (npatterns (), 8), min_container_size);
+  machine_mode container_mode
+= get_vector_mode (QImode, container_size).require ();...

The difference is that v1:
unsigned container_size = MAX (CEIL (npatterns (), 8), 
BYTES_PER_RISCV_VECTOR.to_constant () / 8);


v2: 

+  /* If TARGET_MIN_VLEN == 32, the minimum LMUL = 1/4.
+ Otherwise, the minimum LMUL = 1/8.  */
+  unsigned min_lmul = TARGET_MIN_VLEN == 32 ? 4 : 8;
+  unsigned min_container_size
+= BYTES_PER_RISCV_VECTOR.to_constant () / min_lmul;
+  unsigned container_size = MAX (CEIL (npatterns (), 8), min_container_size);
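Side by side, the two container-size computations differ only in the floor used for the minimum container size; this hedged model mirrors the quoted snippets (parameter and function names are illustrative):

```cpp
#include <algorithm>
#include <cassert>

// v1: the floor is always BYTES_PER_RISCV_VECTOR / 8, i.e. it assumes a
// minimum LMUL of 1/8 regardless of TARGET_MIN_VLEN.
unsigned container_size_v1(unsigned npatterns, unsigned vector_bytes)
{
  unsigned min_container = vector_bytes / 8;
  return std::max((npatterns + 7) / 8, min_container);
}

// v2: the floor honors TARGET_MIN_VLEN -- the minimum LMUL is 1/4 when
// VLEN is 32, and 1/8 otherwise.
unsigned container_size_v2(unsigned npatterns, unsigned vector_bytes,
                           unsigned min_vlen)
{
  unsigned min_lmul = min_vlen == 32 ? 4 : 8;
  unsigned min_container = vector_bytes / min_lmul;
  return std::max((npatterns + 7) / 8, min_container);
}
```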




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-28 14:01
To: Jeff Law
CC: juzhe.zh...@rivai.ai; gcc-patches; kito.cheng; palmer; palmer; Robin Dapp
Subject: Re: [PATCH V2] RISC-V: Fix bug of pre-calculated const vector mask
I mean the difference between the v1 and v2 patches.
 
On Wed, Jun 28, 2023 at 12:09 PM Jeff Law  wrote:
>
>
>
> On 6/27/23 21:16, Kito Cheng wrote:
> > Do you mind giving some comments about what the difference between the
> > two versions?
> And I'd like a before/after assembly code with the example in the commit
> message.  I didn't see the same behavior when I tried it earlier today
> and ran out of time to dig into it further.
>
> Juzhe -- most folks wait ~1wk to ping patches, even codegen bugfixes.
> Pinging this fast runs the risk of irritating others.  Please be patient.
>
> Jeff
 


Re: [PATCH] Fix collection and processing of autoprofile data for target libs

2023-06-27 Thread Richard Biener via Gcc-patches
On Tue, Jun 27, 2023 at 11:31 PM Eugene Rozenfeld via Gcc-patches
 wrote:
>
> cc1, cc1plus, and lto built during STAGEautoprofile need to be built with
> debug info since they are used to build target libs. -gtoggle was
> turning off debug info for this stage.
>
> create_gcov should be passed prev-gcc/cc1, prev-gcc/cc1plus, and prev-gcc/lto
> instead of stage1-gcc/cc1, stage1-gcc/cc1plus, and stage1-gcc/lto when
> processing profile data collected while building target libraries.
>
> Tested on x86_64-pc-linux-gnu.

OK.

Thanks,
Richard.

> ChangeLog:
>
> * Makefile.in: Remove -gtoggle for STAGEautoprofile
> * Makefile.tpl: Remove -gtoggle for STAGEautoprofile
>
> gcc/c/ChangeLog:
>
> * c/Make-lang.in: Pass correct stage cc1 when processing
> profile data collected while building target libraries
>
> gcc/cp/ChangeLog:
>
> * cp/Make-lang.in: Pass correct stage cc1plus when processing
> profile data collected while building target libraries
>
> gcc/lto/ChangeLog:
>
> * lto/Make-lang.in: Pass correct stage lto when processing
> profile data collected while building target libraries
> ---
>  Makefile.in  | 2 +-
>  Makefile.tpl | 2 +-
>  gcc/c/Make-lang.in   | 4 ++--
>  gcc/cp/Make-lang.in  | 4 ++--
>  gcc/lto/Make-lang.in | 2 +-
>  5 files changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/Makefile.in b/Makefile.in
> index b559454cc90..61e5faf550f 100644
> --- a/Makefile.in
> +++ b/Makefile.in
> @@ -635,7 +635,7 @@ STAGEtrain_TFLAGS = $(filter-out 
> -fchecking=1,$(STAGE3_TFLAGS))
>  STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
> -fprofile-reproducible=parallel-runs
>  STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
>
> -STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
> +STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
>  STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
>
>  STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
> diff --git a/Makefile.tpl b/Makefile.tpl
> index 6bcee3021c9..3a5b7ed3c92 100644
> --- a/Makefile.tpl
> +++ b/Makefile.tpl
> @@ -558,7 +558,7 @@ STAGEtrain_TFLAGS = $(filter-out 
> -fchecking=1,$(STAGE3_TFLAGS))
>  STAGEfeedback_CFLAGS = $(STAGE4_CFLAGS) -fprofile-use 
> -fprofile-reproducible=parallel-runs
>  STAGEfeedback_TFLAGS = $(STAGE4_TFLAGS)
>
> -STAGEautoprofile_CFLAGS = $(STAGE2_CFLAGS) -g
> +STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g
>  STAGEautoprofile_TFLAGS = $(STAGE2_TFLAGS)
>
>  STAGEautofeedback_CFLAGS = $(STAGE3_CFLAGS)
> diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in
> index 20840aceab6..79bc0dfd1cf 100644
> --- a/gcc/c/Make-lang.in
> +++ b/gcc/c/Make-lang.in
> @@ -113,10 +113,10 @@ create_fdas_for_cc1: ../stage1-gcc/cc1$(exeext) 
> ../prev-gcc/$(PERF_DATA)
>   echo $$perf_path; \
>   if [ -f $$perf_path ]; then \
> profile_name=cc1_$$component_in_prev_target.fda; \
> -   $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov 
> $$profile_name -profile $$perf_path -gcov_version 2; \
> +   $(CREATE_GCOV) -binary ../prev-gcc/cc1$(exeext) -gcov 
> $$profile_name -profile $$perf_path -gcov_version 2; \
>   fi; \
> done;
> -#
>
> +#
>  # Build hooks:
>
>  c.info:
> diff --git a/gcc/cp/Make-lang.in b/gcc/cp/Make-lang.in
> index c08ee91447e..ba5e8766e99 100644
> --- a/gcc/cp/Make-lang.in
> +++ b/gcc/cp/Make-lang.in
> @@ -211,10 +211,10 @@ create_fdas_for_cc1plus: ../stage1-gcc/cc1plus$(exeext) 
> ../prev-gcc/$(PERF_DATA)
>   echo $$perf_path; \
>   if [ -f $$perf_path ]; then \
> profile_name=cc1plus_$$component_in_prev_target.fda; \
> -   $(CREATE_GCOV) -binary ../stage1-gcc/cc1plus$(exeext) -gcov 
> $$profile_name -profile $$perf_path -gcov_version 2; \
> +   $(CREATE_GCOV) -binary ../prev-gcc/cc1plus$(exeext) -gcov 
> $$profile_name -profile $$perf_path -gcov_version 2; \
>   fi; \
> done;
> -#
>
> +#
>  # Build hooks:
>
>  c++.all.cross: g++-cross$(exeext)
> diff --git a/gcc/lto/Make-lang.in b/gcc/lto/Make-lang.in
> index 4f6025100a3..98aa9f4cc39 100644
> --- a/gcc/lto/Make-lang.in
> +++ b/gcc/lto/Make-lang.in
> @@ -130,7 +130,7 @@ create_fdas_for_lto1: ../stage1-gcc/lto1$(exeext) 
> ../prev-gcc/$(PERF_DATA)
>   echo $$perf_path; \
>   if [ -f $$perf_path ]; then \
> profile_name=lto1_$$component_in_prev_target.fda; \
> -   $(CREATE_GCOV) -binary ../stage1-gcc/lto1$(exeext) -gcov 
> $$profile_name -profile $$perf_path -gcov_version 2; \
> +   $(CREATE_GCOV) -binary ../prev-gcc/lto1$(exeext) -gcov 
> $$profile_name -profile $$perf_path -gcov_version 2; \
>   fi; \
> done;
>
> --
> 2.25.1
>
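As an aside, the effect of the $(filter-out ...) change above can be illustrated with a minimal GNU make fragment. The flag values here are made up; only the filter-out pattern mirrors the actual patch:

```make
# Hypothetical fragment illustrating the STAGEautoprofile change:
# filter-out drops -gtoggle from the stage-2 flags, and -g is re-added,
# so the stage is built with debug info.
STAGE2_CFLAGS = -O2 -gtoggle -pipe
STAGEautoprofile_CFLAGS = $(filter-out -gtoggle,$(STAGE2_CFLAGS)) -g

# "make all" echoes: -O2 -pipe -g
all:
	@echo $(STAGEautoprofile_CFLAGS)
```

Since -gtoggle inverts the debug-info setting, filtering it out is what guarantees the trailing -g actually takes effect for this stage.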


Re: [PATCH] i386: Relax inline requirement for functions with different target attrs

2023-06-27 Thread Uros Bizjak via Gcc-patches
On Wed, Jun 28, 2023 at 3:56 AM Hongyu Wang  wrote:

>
> > I don't think this is desirable. If we inline something with different
> > ISAs, we get some strange mix of ISAs when the function is inlined.
> > OTOH - we already inline with mismatched tune flags if the function is
> > marked with always_inline.
>
> Previously ix86_can_inline_p has
>
> if (((caller_opts->x_ix86_isa_flags & callee_opts->x_ix86_isa_flags)
>  != callee_opts->x_ix86_isa_flags)
> || ((caller_opts->x_ix86_isa_flags2 & callee_opts->x_ix86_isa_flags2)
> != callee_opts->x_ix86_isa_flags2))
>   ret = false;
>
> It makes sure the caller's ISA is a superset of the callee's, and the
> inlined code should follow the caller's ISA specification.
>
> IMHO I cannot give a real example where the caller's performance is
> harmed after inlining. I added the PVW check since some callees may
> want to limit their vector size while the caller has a larger
> preferred vector size. At least with the current change we get more
> optimization opportunities for different target_clones.
>
> But I agree the tuning setting may be a factor that affects
> performance. One possible choice is that if the tune for the callee
> is unspecified or default, we just inline it into the caller with the
> caller's specified arch and tune.

If the user specified a different arch for callee than the caller,
then the compiler will switch on different ISAs (-march is just a
shortcut for different ISA packs), and the programmer is aware that
inlining isn't intended here (we have -mtune, which is not as strong
as -march, but even functions with different -mtune are not inlined
without always_inline attribute). This is documented as:

--q--
On the x86, the inliner does not inline a function that has different
target options than the caller, unless the callee has a subset of the
target options of the caller. For example a function declared with
target("sse3") can inline a function with target("sse2"), since -msse3
implies -msse2.
--/q--

I don't think arch=skylake can be considered as a subset of arch=icelake-server.

I agree that the compiler should reject inlining functions with different PVW.
This is also in accordance with the documentation.

Uros.

>
> Uros Bizjak via Gcc-patches  wrote on Tue, Jun 27, 2023 at 17:16:
>
>
>
> >
> > On Mon, Jun 26, 2023 at 4:36 AM Hongyu Wang  wrote:
> > >
> > > Hi,
> > >
> > > For functions with different target attributes, the current logic refuses
> > > to inline the callee when any arch or tune is mismatched. Relax the
> > > condition to honor just prefer_vector_width_type and other flags that
> > > may cause safety issues, so the caller can get more optimization opportunities.
> >
> > I don't think this is desirable. If we inline something with different
> > ISAs, we get some strange mix of ISAs when the function is inlined.
> > OTOH - we already inline with mismatched tune flags if the function is
> > marked with always_inline.
> >
> > Uros.
> >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,}
> > >
> > > Ok for trunk?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386.cc (ix86_can_inline_p): Do not check arch or
> > > tune directly, just check prefer_vector_width_type and make sure
> > > not to inline if they mismatch.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/inline-target-attr.c: New test.
> > > ---
> > >  gcc/config/i386/i386.cc   | 11 +
> > >  .../gcc.target/i386/inline-target-attr.c  | 24 +++
> > >  2 files changed, 30 insertions(+), 5 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > >
> > > diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > index 0761965344b..1d86384ac06 100644
> > > --- a/gcc/config/i386/i386.cc
> > > +++ b/gcc/config/i386/i386.cc
> > > @@ -605,11 +605,12 @@ ix86_can_inline_p (tree caller, tree callee)
> > >!= (callee_opts->x_target_flags & 
> > > ~always_inline_safe_mask))
> > >  ret = false;
> > >
> > > -  /* See if arch, tune, etc. are the same.  */
> > > -  else if (caller_opts->arch != callee_opts->arch)
> > > -ret = false;
> > > -
> > > -  else if (!always_inline && caller_opts->tune != callee_opts->tune)
> > > +  /* Do not inline when the specified prefer-vector-width mismatches
> > > + between callee and caller.  */
> > > +  else if ((callee_opts->x_prefer_vector_width_type != PVW_NONE
> > > +  && caller_opts->x_prefer_vector_width_type != PVW_NONE)
> > > +  && callee_opts->x_prefer_vector_width_type
> > > + != caller_opts->x_prefer_vector_width_type)
> > >  ret = false;
> > >
> > >else if (caller_opts->x_ix86_fpmath != callee_opts->x_ix86_fpmath
> > > diff --git a/gcc/testsuite/gcc.target/i386/inline-target-attr.c 
> > > b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > new file mode 100644
> > > index 000..995502165f0
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/i386/inline-target-attr.c
> > > @@ -0,0 +1

Re: [r14-2117 Regression] FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4 on Linux/x86_64

2023-06-27 Thread Richard Biener via Gcc-patches
On Wed, 28 Jun 2023, haochen.jiang wrote:

> On Linux/x86_64,
> 
> dd86a5a69cbda40cf76388a65d3317c91cb2b501 is the first bad commit
> commit dd86a5a69cbda40cf76388a65d3317c91cb2b501
> Author: Richard Biener 
> Date:   Thu Jun 22 11:40:46 2023 +0200
> 
> tree-optimization/96208 - SLP of non-grouped loads
> 
> caused
> 
> FAIL: gcc.dg/vect/slp-46.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-46.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 4
> 
> with GCC configured with
> 
> ../../gcc/configure 
> --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-2117/usr 
> --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
> --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
> --enable-libmpx x86_64-linux --disable-bootstrap
> 
> To reproduce:
> 
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-46.c --target_board='unix{-m32\ 
> -march=cascadelake}'"
> $ cd {build_dir}/gcc && make check 
> RUNTESTFLAGS="vect.exp=gcc.dg/vect/slp-46.c --target_board='unix{-m64\ 
> -march=cascadelake}'"
> 
> (Please do not reply to this email, for question about this report, contact 
> me at haochen dot jiang at intel.com)

I have opened PR110445 for the missed optimization.