date:20230824

Re: [PATCH V2] RISC-V: Add conditional autovec convert(INT<->INT) patterns

2023-08-24 Thread Robin Dapp via Gcc-patches

Hi Lehua,

thanks, LGTM.

One thing maybe for the next patches:  It seems to me that we lump all of
the COND_... tests into the cond subdirectory when IMHO they would also
fit into the respective directories of their operations (binop, unop etc).
Right now we will have a lot of rather unrelated tests (or just related
by their use of COND_) in one dir.  What do you think?  

Regards
 Robin

Re: [PATCH 3/3] PHIOPT: Allow BIT_AND and BIT_IOR in early phiopt

2023-08-24 Thread Richard Biener via Gcc-patches

On Thu, Aug 24, 2023 at 9:16 PM Andrew Pinski via Gcc-patches
 wrote:
>
> Now that MIN/MAX can sometimes be transformed into BIT_AND/BIT_IOR,
> we should allow BIT_AND and BIT_IOR in the early phiopt.
> Also we produce BIT_AND/BIT_IOR for things like `bool0 ? bool1 : 0`
> which seems like a good thing to allow early on too.

Hum.

I think if we allow AND/IOR we should also allow XOR and NOT.

Can you add dumping for replacements we disallow?  I'm esp. curious
for those otherwise being "singleton".  I know when doing early phiopt
I wanted to be very conservative (also to reduce testsuite fallout), and
I was mostly interested in MIN/MAX which I then extended to similar
things like ABS.  But maybe we can revisit this if we understand which
cases we definitely do not want to do early?

> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> * tree-ssa-phiopt.cc (phiopt_early_allow): Allow
> BIT_AND_EXPR and BIT_IOR_EXPR.
> ---
>  gcc/tree-ssa-phiopt.cc | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 54706f4c7e7..7e63fb115db 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -469,6 +469,9 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
>  {
>case MIN_EXPR:
>case MAX_EXPR:
> +  /* MIN/MAX could be convert into these. */
> +  case BIT_IOR_EXPR:
> +  case BIT_AND_EXPR:
>case ABS_EXPR:
>case ABSU_EXPR:
>case NEGATE_EXPR:
> --
> 2.31.1
>
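
As a quick sanity check of the `bool0 ? bool1 : 0` style rewrites that make BIT_AND/BIT_IOR show up in phiopt, the identities can be verified exhaustively over 0/1 inputs. This is only a sketch in plain C; the function name `check_bool_select_rewrites` is made up for illustration and is not GCC code.

```c
#include <assert.h>

/* Exhaustively compare each conditional form with the bitwise form it
   can be rewritten to, for operands known to be 0 or 1.
   Returns 1 when every combination agrees.  */
int
check_bool_select_rewrites (void)
{
  for (int b0 = 0; b0 <= 1; b0++)
    for (int b = 0; b <= 1; b++)
      {
        /* bool0 ? bool1 : 0 -> bool0 & bool1 */
        assert ((b0 ? b : 0) == (b0 & b));
        /* bool0 ? 0 : bool2 -> (bool0^1) & bool2 */
        assert ((b0 ? 0 : b) == ((b0 ^ 1) & b));
        /* bool0 ? 1 : bool2 -> bool0 | bool2 */
        assert ((b0 ? 1 : b) == (b0 | b));
        /* bool0 ? bool1 : 1 -> (bool0^1) | bool1 */
        assert ((b0 ? b : 1) == ((b0 ^ 1) | b));
      }
  return 1;
}
```

The XOR with 1 in two of the forms is the 0/1 equivalent of a logical NOT, which is consistent with the suggestion to consider XOR and NOT alongside AND/IOR.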

[PATCH-2, rs6000] Implement 32bit inline lrint [PR88558]

2023-08-24 Thread HAO CHEN GUI via Gcc-patches

Hi,
  This patch implements 32bit inline lrint by "fctiw". It depends on
patch 1 to do SImode moves from FP registers on P7.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: support 32bit inline lrint

gcc/
PR target/88558
* config/rs6000/rs6000.md (lrintdi2): Remove TARGET_FPRND
from insn condition.
(lrintsi2): New insn pattern for 32bit lrint.

gcc/testsuite/
PR target/106769
* gcc.target/powerpc/pr88558.h: New.
* gcc.target/powerpc/pr88558-p7.c: New.
* gcc.target/powerpc/pr88558-p8v.c: New.

patch.diff
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index fd263e8dfe3..b36304de8c6 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6655,10 +6655,18 @@ (define_insn "lrintdi2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT"
   "fctid %0,%1"
   [(set_attr "type" "fp")])

+(define_insn "lrintsi2"
+  [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+   (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "")]
+  UNSPEC_FCTIW))]
+  "TARGET_HARD_FLOAT && TARGET_POPCNTD"
+  "fctiw %0,%1"
+  [(set_attr "type" "fp")])
+
 (define_insn "btrunc2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" "d,wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c 
b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
new file mode 100644
index 000..6437c55fa61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p7.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=power7" } */
+
+#include "pr88558.h"
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mfctid\M} 1 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558-p8v.c 
b/gcc/testsuite/gcc.target/powerpc/pr88558-p8v.c
new file mode 100644
index 000..fd22123ffb6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558-p8v.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -fno-math-errno -mdejagnu-cpu=power8" } */
+
+long int foo (double a)
+{
+  return __builtin_lrint (a);
+}
+
+long long bar (double a)
+{
+  return __builtin_llrint (a);
+}
+
+int baz (double a)
+{
+  return __builtin_irint (a);
+}
+
+/* { dg-final { scan-assembler-times {\mfctid\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mfctid\M} 1 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88558.h 
b/gcc/testsuite/gcc.target/powerpc/pr88558.h
new file mode 100644
index 000..0cc0c68dd4e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr88558.h
@@ -0,0 +1,14 @@
+long int foo (double a)
+{
+  return __builtin_lrint (a);
+}
+
+long long bar (double a)
+{
+  return __builtin_llrint (a);
+}
+
+int baz (double a)
+{
+  return __builtin_irint (a);
+}

[PATCH-1, rs6000] Enable SImode in FP register on P7 [PR88558]

2023-08-24 Thread HAO CHEN GUI via Gcc-patches

Hi,
  This patch enables SImode in FP registers on P7. The "fctiw" instruction
stores its integer output in an FP register, so SImode in FP registers
needs to be enabled on P7 if we want to support "fctiw" there.

  The test case is in the second patch which implements 32bit inline
lrint.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
rs6000: enable SImode in FP register on P7

gcc/
PR target/88558
* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
Enable SImode in FP register for P7.
* config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode
move between FP register.  Set attribute isa of stfiwx to "*"
and attribute of stxsiwx to "p7".

patch.diff
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 44b448d2ba6..99085c2cdd7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1903,7 +1903,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, 
machine_mode mode)
  if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
return 1;

- if (TARGET_P8_VECTOR && (mode == SImode))
+ if (TARGET_POPCNTD && mode == SImode)
return 1;

  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..ac5d29a2cf8 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7566,7 +7566,7 @@ (define_split

 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
- "=r, r,
+ "=r, r,  ^d,
   r,  d,  v,
   m,  ?Z, ?Z,
   r,  r,  r,  r,
@@ -7575,7 +7575,7 @@ (define_insn "*movsi_internal1"
   wa, r,
   r,  *h, *h")
(match_operand:SI 1 "input_operand"
- "r,  U,
+ "r,  U,  ^d,
   m,  ?Z, ?Z,
   r,  d,  v,
   I,  L,  eI, n,
@@ -7588,6 +7588,7 @@ (define_insn "*movsi_internal1"
   "@
mr %0,%1
la %0,%a1
+   fmr %0,%1
lwz%U1%X1 %0,%1
lfiwzx %0,%y1
lxsiwzx %x0,%y1
@@ -7611,7 +7612,7 @@ (define_insn "*movsi_internal1"
mt%0 %1
nop"
   [(set_attr "type"
- "*,  *,
+ "*,  *,  fpsimple,
   load,   fpload, fpload,
   store,  fpstore,fpstore,
   *,  *,  *,  *,
@@ -7620,7 +7621,7 @@ (define_insn "*movsi_internal1"
   mtvsr,  mfvsr,
   *,  *,  *")
(set_attr "length"
- "*,  *,
+ "*,  *,  *,
   *,  *,  *,
   *,  *,  *,
   *,  *,  *,  8,
@@ -7629,9 +7630,9 @@ (define_insn "*movsi_internal1"
   *,  *,
   *,  *,  *")
(set_attr "isa"
- "*,  *,
-  *,  p8v,p8v,
-  *,  p8v,p8v,
+ "*,  *,  *,
+  *,  p7, p8v,
+  *,  *,  p8v,
   *,  *,  p10,*,
   p8v,p9v,p9v,p8v,
   p9v,p8v,p9v,

Re: [PATCH 1/3] MATCH: Move `a ? one_zero : one_zero` matching after min/max matching

2023-08-24 Thread Richard Biener via Gcc-patches

On Thu, Aug 24, 2023 at 9:16 PM Andrew Pinski via Gcc-patches
 wrote:
>
> In PR 106677, I noticed that on the trunk we were producing:
> ```
>   _25 = SR.116_117 == 0;
>   _27 = (unsigned char) _25;
>   _32 = _27 | SR.116_117;
> ```
> From `SR.115_117 != 0 ? SR.115_117 : 1`
> Rather than:
> ```
>   _119 = MAX_EXPR <1, SR.115_117>;
> ```
> Or (rather)
> ```
>   _119 = SR.115_117 | 1;
> ```
> Due to the order of the patterns.

Hmm, that means the former when present in source isn't optimized?

> OK? Bootstrapped and tested on x86_64-linux-gnu with no
> regressions.

OK, but please add a comment indicating the ordering requirement.

Can you also add a testcase?

Richard.

> gcc/ChangeLog:
>
> * match.pd (`a ? one_zero : one_zero`): Move
> below detection of minmax.
> ---
>  gcc/match.pd | 38 --
>  1 file changed, 20 insertions(+), 18 deletions(-)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 890f050cbad..c87a0795667 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -4950,24 +4950,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   )
>  )
>
> -(simplify
> - (cond @0 zero_one_valued_p@1 zero_one_valued_p@2)
> - (switch
> -  /* bool0 ? bool1 : 0 -> bool0 & bool1 */
> -  (if (integer_zerop (@2))
> -   (bit_and (convert @0) @1))
> -  /* bool0 ? 0 : bool2 -> (bool0^1) & bool2 */
> -  (if (integer_zerop (@1))
> -   (bit_and (bit_xor (convert @0) { build_one_cst (type); } ) @2))
> -  /* bool0 ? 1 : bool2 -> bool0 | bool2 */
> -  (if (integer_onep (@1))
> -   (bit_ior (convert @0) @2))
> -  /* bool0 ? bool1 : 1 -> (bool0^1) | bool1 */
> -  (if (integer_onep (@2))
> -   (bit_ior (bit_xor (convert @0) @2) @1))
> - )
> -)
> -
>  /* Optimize
> # x_5 in range [cst1, cst2] where cst2 = cst1 + 1
> x_5 ? cstN ? cst4 : cst3
> @@ -5298,6 +5280,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node, @3, 
> @1)))
>(max @2 @4))
>
> +#if GIMPLE
> +(simplify
> + (cond @0 zero_one_valued_p@1 zero_one_valued_p@2)
> + (switch
> +  /* bool0 ? bool1 : 0 -> bool0 & bool1 */
> +  (if (integer_zerop (@2))
> +   (bit_and (convert @0) @1))
> +  /* bool0 ? 0 : bool2 -> (bool0^1) & bool2 */
> +  (if (integer_zerop (@1))
> +   (bit_and (bit_xor (convert @0) { build_one_cst (type); } ) @2))
> +  /* bool0 ? 1 : bool2 -> bool0 | bool2 */
> +  (if (integer_onep (@1))
> +   (bit_ior (convert @0) @2))
> +  /* bool0 ? bool1 : 1 -> (bool0^1) | bool1 */
> +  (if (integer_onep (@2))
> +   (bit_ior (bit_xor (convert @0) @2) @1))
> + )
> +)
> +#endif
> +
>  /* X != C1 ? -X : C2 simplifies to -X when -C1 == C2.  */
>  (simplify
>   (cond (ne @0 INTEGER_CST@1) (negate@3 @0) INTEGER_CST@2)
> --
> 2.31.1
>
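
To make the forms from the PR 106677 discussion concrete, here is a small C sketch comparing them. It assumes an `unsigned char` operand; the helper names are invented for illustration. Note that the `x | 1` form is only checked for operands that are 0 or 1, which is what the `zero_one_valued_p` pattern requires.

```c
#include <assert.h>

/* Source form: x != 0 ? x : 1.  */
static unsigned char
select_form (unsigned char x)
{
  return x != 0 ? x : 1;
}

/* The sequence reported on trunk: (unsigned char) (x == 0) | x.  */
static unsigned char
ior_of_compare (unsigned char x)
{
  return (unsigned char) ((x == 0) | x);
}

/* MAX_EXPR <1, x>.  */
static unsigned char
max_form (unsigned char x)
{
  return x > 1 ? x : 1;
}

/* Returns 1 when all forms agree on the inputs where they should.  */
int
check_select_forms (void)
{
  for (unsigned v = 0; v <= 255; v++)
    {
      unsigned char x = (unsigned char) v;
      assert (select_form (x) == max_form (x));
      assert (select_form (x) == ior_of_compare (x));
      /* x | 1 only matches when x is known to be 0 or 1.  */
      if (x <= 1)
        assert (select_form (x) == (unsigned char) (x | 1));
    }
  return 1;
}
```

For a 0/1 operand every form evaluates to 1, which is the case the follow-up `a | C -> C` pattern in patch 2/3 can then fold.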

Re: [PATCH 2/3] MATCH: `a | C -> C` when we know that `a & ~C == 0`

2023-08-24 Thread Richard Biener via Gcc-patches

On Thu, Aug 24, 2023 at 9:16 PM Andrew Pinski via Gcc-patches
 wrote:
>
> Even though this is handled by other code inside both VRP and CCP,
> sometimes we want to optimize this outside of VRP and CCP.
> An example is given in PR 106677 where phiopt will happen
> after VRP (which removes a cast for a comparison) and then
> phiopt will optimize the phi to be `a | 1` which can then
> be optimized to `1` due to this patch.

Also works for xor, no?

> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK with or without adding XOR.

Richard.

> Note Similar code already exists in simplify_rtx for the RTL level;
> it was moved from combine to simplify_rtx in r0-72539-gbd1ef757767f6d.
> gcc/ChangeLog:
>
> * match.pd (`a | C -> C`): New pattern.
> ---
>  gcc/match.pd | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index c87a0795667..3bbeceb37b4 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1456,6 +1456,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
>&& wi::bit_and_not (get_nonzero_bits (@0), wi::to_wide (@1)) == 0)
>@0))
> +/* x | C -> C if we know that x & ~C == 0.  */
> +(simplify
> + (bit_ior SSA_NAME@0 INTEGER_CST@1)
> + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  && wi::bit_and_not (get_nonzero_bits (@0), wi::to_wide (@1)) == 0)
> +  @1))
>  #endif
>
>  /* ~(~X - Y) -> X + Y and ~(~X + Y) -> X - Y.  */
> --
> 2.31.1
>
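
The condition in the new pattern can be checked by brute force on 8-bit values. The sketch below is not GCC code: `ior_folds_to_constant` is a hypothetical stand-in for the `wi::bit_and_not (get_nonzero_bits (@0), wi::to_wide (@1)) == 0` test, with `nonzero` playing the role of the mask of bits that may be set in `x`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the match.pd condition: true when no bit that
   may be set in x lies outside the constant C.  */
static int
ior_folds_to_constant (uint8_t nonzero, uint8_t c)
{
  return (nonzero & (uint8_t) ~c) == 0;
}

/* For every mask/constant pair accepted by the condition, confirm that
   x | C == C for every x consistent with the mask.  Returns 1 on success.  */
int
check_ior_fold (void)
{
  for (unsigned nz = 0; nz <= 255; nz++)
    for (unsigned c = 0; c <= 255; c++)
      {
        if (!ior_folds_to_constant ((uint8_t) nz, (uint8_t) c))
          continue;
        for (unsigned x = 0; x <= 255; x++)
          if ((x & ~nz) == 0)   /* x only uses bits allowed by the mask */
            assert ((x | c) == c);
      }
  return 1;
}
```

With a mask of 0x01 and C equal to 1 the condition holds, which is the `a | 1` to `1` case from the patch description.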

Re: [PATCH v1] LoongArch: Remove the symbolic extension instruction due to the SLT directive.

2023-08-24 Thread chenglulu




On 2023/8/25 at 12:16 PM, WANG Xuerui wrote:

On 8/25/23 12:01, Lulu Cheng wrote:
Since the slt instruction does not distinguish between 32-bit and 64-bit
operations under the LoongArch 64-bit architecture, if the operands of slt
are of SImode, symbol expansion is required before operation.
Hint:“符号扩展” is "sign extension" (as noun) or "sign-extend" (as verb), 
not "symbol expansion".


But similar to the following test case, symbol expansion can be omitted:

extern int src1, src2, src3;

int
test (void)
{
  int data1 = src1 + src2;
  int data2 = src1 + src3;
  return data1 > data2 ? data1 : data2;
}
Assembly code before optimization:
  ...
add.w    $r4,$r4,$r14
add.w    $r13,$r13,$r14
slli.w    $r12,$r4,0
slli.w    $r14,$r13,0
slt    $r12,$r12,$r14
masknez    $r4,$r4,$r12
maskeqz    $r12,$r13,$r12
or    $r4,$r4,$r12
slli.w    $r4,$r4,0
...

After optimization:
...
add.w    $r12,$r12,$r14
add.w    $r13,$r13,$r14
slt    $r4,$r12,$r13
masknez    $r12,$r12,$r4
maskeqz    $r4,$r13,$r4
or    $r4,$r12,$r4
...

Similar to this test example, the two operands of SLT are obtained by the
addition operation, and the addition operation "add.w" is an implicit
symbolic extension function, so the two operands of SLT do not require


more naturally: "and add.w implicitly sign-extends" -- brevity is 
often desired and clearer ;-)


Sorry I'll revise it soon!

Thanks!:-)




symbolic expansion.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_conditional_move):
Optimize the function implementation.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/slt-sign-extend.c: New test.
---
  gcc/config/loongarch/loongarch.cc | 53 +--
  .../gcc.target/loongarch/slt-sign-extend.c    | 14 +
  2 files changed, 63 insertions(+), 4 deletions(-)
  create mode 100644 
gcc/testsuite/gcc.target/loongarch/slt-sign-extend.c


diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc

index 86d58784113..1905599b9e8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4384,14 +4384,30 @@ loongarch_expand_conditional_move (rtx 
*operands)

    enum rtx_code code = GET_CODE (operands[1]);
    rtx op0 = XEXP (operands[1], 0);
    rtx op1 = XEXP (operands[1], 1);
+  rtx op0_extend = op0;
+  rtx op1_extend = op1;
+
+  /* Record whether operands[2] and operands[3] modes are promoted 
to word_mode.  */

+  bool promote_p = false;
+  machine_mode mode = GET_MODE (operands[0]);
      if (FLOAT_MODE_P (GET_MODE (op1)))
  loongarch_emit_float_compare (&code, &op0, &op1);
    else
  {
+  if ((REGNO (op0) == REGNO (operands[2])
+   || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
+  && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+    {
+  mode = word_mode;
+  promote_p = true;
+    }
+
    loongarch_extend_comparands (code, &op0, &op1);
      op0 = force_reg (word_mode, op0);
+  op0_extend = op0;
+  op1_extend = force_reg (word_mode, op1);
      if (code == EQ || code == NE)
  {
@@ -4418,23 +4434,52 @@ loongarch_expand_conditional_move (rtx 
*operands)

    && register_operand (operands[2], VOIDmode)
    && register_operand (operands[3], VOIDmode))
  {
-  machine_mode mode = GET_MODE (operands[0]);
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];
+
+  if (promote_p)
+    {
+  if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+    op2 = op0_extend;
+  else
+    {
+  loongarch_extend_comparands (code, &op2, &const0_rtx);
+  op2 = force_reg (mode, op2);
+    }
+
+  if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+    op3 = op1_extend;
+  else
+    {
+  loongarch_extend_comparands (code, &op3, &const0_rtx);
+  op3 = force_reg (mode, op3);
+    }
+    }
+
    rtx temp = gen_reg_rtx (mode);
    rtx temp2 = gen_reg_rtx (mode);
      emit_insn (gen_rtx_SET (temp,
    gen_rtx_IF_THEN_ELSE (mode, cond,
-    operands[2], const0_rtx)));
+    op2, const0_rtx)));
      /* Flip the test for the second operand.  */
    cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, GET_MODE 
(op0), op0, op1);

      emit_insn (gen_rtx_SET (temp2,
    gen_rtx_IF_THEN_ELSE (mode, cond,
-    operands[3], const0_rtx)));
+    op3, const0_rtx)));
      /* Merge the two results, at least one is guaranteed to be 
zero.  */
-  emit_insn (gen_rtx_SET (operands[0], gen_rtx_IOR (mode, temp, 
temp2)));

+  if (promote_p)
+    {
+  rtx temp3 = gen_reg_rtx (mode);
+  emit_insn (gen_rtx_SET (temp3, gen_rtx_IOR (mode, temp, temp2)));
+  temp3 = gen_lowpart (GET_MODE (operands[0]), temp3);
+  loongarch_emit_move (opera

Re: [PING][PATCH] LoongArch: initial ada support on linux

2023-08-24 Thread Yujie Yang

Hi!

I'd like to ping this patch for acknowledgement from the Ada team.

We have successfully compiled a cross-native toolchain with Ada enabled
for loongarch64-linux-gnuf64 (or loongarch64-linux-gnu), and have run the
regtests with the following results:

While the failures are being worked on, we would like to merge this patch
first so we can have basic ada support for debian test-builds.

=== gnat Summary ===

# of expected passes            3376
# of unexpected failures        1
# of expected failures          23
# of unsupported tests          25

FAIL: gnat.dg/prot7.adb (test for excess errors)

=== acats Summary ===
# of expected passes            2325
# of unexpected failures        3

*** FAILURES: c35503d c35503f c4a007a

Sincerely,
Yujie

[PATCH v2] RISC-V: Enable Hoist to GCSE simple constants

2023-08-24 Thread Vineet Gupta

Hoist's want_to_gcse_p () calls rtx_cost () to compute the max distance for
hoist candidates. For a simple const (say 6, which needs a separate insn "LI 6")
the backend currently returns 0, causing Hoist to bail and elide GCSE.

Note that constants requiring more than 1 insn to set up were working
fine since riscv_rtx_costs () was returning non-zero (although that
itself might need refining: see bugzilla 39).

To keep testsuite parity, some V tests that started failing under the
new costing regime need updating.
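
A toy model may help show why a zero cost makes Hoist give up. The sketch below is a simplification written for illustration, not the code in gcse.cc or riscv.cc: the parameter values are assumed, and `hoist_wants_gcse`, `const_cost_before` and `const_cost_after` are invented names. Only the `cost == 1 && outer_code == SET` special case and the `COSTS_N_INSNS (1)` result are taken from the diff below.

```c
#include <assert.h>

/* Same scaling as GCC's COSTS_N_INSNS macro.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Assumed parameter values, for illustration only.  */
enum { UNRESTRICTED_COST = 3, COST_DISTANCE_RATIO = 10 };

/* Simplified model of the hoisting decision: a cheap expression gets a
   max distance proportional to its cost, and a distance of zero means
   the candidate is rejected.  Returns 0 when rejected.  */
static int
hoist_wants_gcse (int cost)
{
  if (cost < COSTS_N_INSNS (UNRESTRICTED_COST))
    {
      long max_distance = ((long) COST_DISTANCE_RATIO * cost) / 10;
      if (max_distance == 0)
        return 0;
    }
  return 1;
}

/* Cost of a constant under SET needing INSNS instructions, before the
   patch: single-insn constants were reported as free.  */
static int
const_cost_before (int insns)
{
  return insns == 1 ? 0 : COSTS_N_INSNS (1);
}

/* After the patch: always the cost of one instruction.  */
static int
const_cost_after (int insns)
{
  (void) insns;
  return COSTS_N_INSNS (1);
}

/* Returns 1 when the model shows the behaviour change described above.  */
int
check_hoist_model (void)
{
  assert (!hoist_wants_gcse (const_cost_before (1)));
  assert (hoist_wants_gcse (const_cost_after (1)));
  return 1;
}
```

In this model a multi-insn constant is accepted both before and after the change, matching the note that those cases were already working.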

gcc/ChangeLog:
* gcc/config/riscv.cc (riscv_rtx_costs): Adjust const_int cost.
Add some comments about different constants handling.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/gcse-const.c: New Test
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c: Remove test
  for Jump.
* gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c: Ditto.

Signed-off-by: Vineet Gupta 
---
Changes since v1
  - Simplified code under case CONST.
  - Added some comments for handling of CONST_INT in 2 places.
---
 gcc/config/riscv/riscv.cc  | 18 +-
 gcc/testsuite/gcc.target/riscv/gcse-const.c| 13 +
 .../riscv/rvv/vsetvl/vlmax_conflict-7.c|  1 -
 .../riscv/rvv/vsetvl/vlmax_conflict-8.c|  1 -
 4 files changed, 22 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/gcse-const.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 13166d19619c..98a46b00ceb5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2490,6 +2490,8 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   switch (GET_CODE (x))
 {
 case CONST_INT:
+  /* trivial constants checked using OUTER_CODE in case they are
+encodable in insn itself w/o need for additional insn(s).  */
   if (riscv_immediate_operand_p (outer_code, INTVAL (x)))
{
  *total = 0;
@@ -2507,17 +2509,15 @@ riscv_rtx_costs (rtx x, machine_mode mode, int 
outer_code, int opno ATTRIBUTE_UN
   /* Fall through.  */
 
 case CONST:
+  /* Non trivial CONST_INT Fall through: check if need multiple insns.  */
   if ((cost = riscv_const_insns (x)) > 0)
{
- /* If the constant is likely to be stored in a GPR, SETs of
-single-insn constants are as cheap as register sets; we
-never want to CSE them.  */
- if (cost == 1 && outer_code == SET)
-   *total = 0;
- /* When we load a constant more than once, it usually is better
-to duplicate the last operation in the sequence than to CSE
-the constant itself.  */
- else if (outer_code == SET || GET_MODE (x) == VOIDmode)
+ /* 1. Hoist will GCSE constants only if TOTAL returned is non-zero.
+2. For constants loaded more than once, the approach so far has
+   been to duplicate the operation than to CSE the constant.
+3. TODO: make cost more accurate specially if riscv_const_insns
+   returns > 1.  */
+ if (outer_code == SET || GET_MODE (x) == VOIDmode)
*total = COSTS_N_INSNS (1);
}
   else /* The instruction will be fetched from the constant pool.  */
diff --git a/gcc/testsuite/gcc.target/riscv/gcse-const.c 
b/gcc/testsuite/gcc.target/riscv/gcse-const.c
new file mode 100644
index ..b04707ce9745
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/gcse-const.c
@@ -0,0 +1,13 @@
+/* Slightly modified copy of gcc.target/arm/pr40956.c.  */
+/* { dg-options "-Os" }  */
+/* Make sure the constant "6" is loaded into register only once.  */
+/* { dg-final { scan-assembler-times "\tli.*6" 1 } } */
+
+int foo(int p, int* q)
+{
+  if (p!=9)
+*q = 6;
+  else
+*(q+1) = 6;
+  return 3;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
index 60ad108666f8..085ca9db8542 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-7.c
@@ -21,5 +21,4 @@ void f (int32_t * restrict in, int32_t * restrict out, size_t 
n, size_t cond, si
 }
 
 /* { dg-final { scan-assembler-times {vsetvli} 4 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
-/* { dg-final { scan-assembler-times {j\s+\.L[0-9]+\s+\.L[0-9]+:\s+vlm\.v} 1 { 
target { no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
 /* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*m8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_conflict-8.c
index 7b9574cc33

Re: RISC-V: Fix stack_save_restore_1/2 test cases

2023-08-24 Thread Vineet Gupta


Hi Jivan,

On 8/24/23 08:45, Jivan Hakobyan via Gcc-patches wrote:

This patch fixes failing stack_save_restore_1/2 test cases.
After commit 6619b3d4c15c the size of the frame changed.


gcc/testsuite/ChangeLog:
 * gcc.target/riscv/stack_save_restore_1.c: Update frame size
 * gcc.target/riscv/stack_save_restore_2.c: Likewise.


Do you mind sending your patches inline using git send-email or some such?

Thx,
-Vineet

[PING][PATCH] LoongArch: initial ada support on linux

2023-08-24 Thread Yang Yujie

gcc/ChangeLog:

* ada/Makefile.rtl: Add LoongArch support.
* ada/libgnarl/s-linux__loongarch.ads: New.
* ada/libgnat/system-linux-loongarch.ads: New.
* config/loongarch/loongarch.h: mark normalized options
passed from driver to gnat1 as explicit for multilib.
---
 gcc/ada/Makefile.rtl   |  49 +++
 gcc/ada/libgnarl/s-linux__loongarch.ads| 134 +++
 gcc/ada/libgnat/system-linux-loongarch.ads | 145 +
 gcc/config/loongarch/loongarch.h   |   4 +-
 4 files changed, 330 insertions(+), 2 deletions(-)
 create mode 100644 gcc/ada/libgnarl/s-linux__loongarch.ads
 create mode 100644 gcc/ada/libgnat/system-linux-loongarch.ads

diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
index 96306f8cc9a..3f63c4893fd 100644
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -2111,6 +2111,55 @@ ifeq ($(strip $(filter-out cygwin% mingw32% 
pe,$(target_os))),)
   LIBRARY_VERSION := $(LIB_VERSION)
 endif
 
+# LoongArch Linux
+ifeq ($(strip $(filter-out loongarch% linux%,$(target_cpu) $(target_os))),)
+  LIBGNAT_TARGET_PAIRS = \
+  a-exetim.adbhttp://www.gnu.org/licenses/>.  --
+--  --
+--
+
+--  This is the LoongArch version of this package
+
+--  This package encapsulates cpu specific differences between implementations
+--  of GNU/Linux, in order to share s-osinte-linux.ads.
+
+--  PLEASE DO NOT add any with-clauses to this package or remove the pragma
+--  Preelaborate. This package is designed to be a bottom-level (leaf) package
+
+with Interfaces.C;
+with System.Parameters;
+
+package System.Linux is
+   pragma Preelaborate;
+
+   --
+   -- Time --
+   --
+
+   subtype int is Interfaces.C.int;
+   subtype long        is Interfaces.C.long;
+   subtype suseconds_t is Interfaces.C.long;
+   type time_t is range -2 ** (System.Parameters.time_t_bits - 1)
+ .. 2 ** (System.Parameters.time_t_bits - 1) - 1;
+   subtype clockid_t   is Interfaces.C.int;
+
+   type timespec is record
+  tv_sec  : time_t;
+  tv_nsec : long;
+   end record;
+   pragma Convention (C, timespec);
+
+   type timeval is record
+  tv_sec  : time_t;
+  tv_usec : suseconds_t;
+   end record;
+   pragma Convention (C, timeval);
+
+   ---
+   -- Errno --
+   ---
+
+   EAGAIN: constant := 11;
+   EINTR : constant := 4;
+   EINVAL: constant := 22;
+   ENOMEM: constant := 12;
+   EPERM : constant := 1;
+   ETIMEDOUT : constant := 110;
+
+   -
+   -- Signals --
+   -
+
+   SIGHUP : constant := 1; --  hangup
+   SIGINT : constant := 2; --  interrupt (rubout)
+   SIGQUIT: constant := 3; --  quit (ASCD FS)
+   SIGILL : constant := 4; --  illegal instruction (not reset)
+   SIGTRAP: constant := 5; --  trace trap (not reset)
+   SIGIOT : constant := 6; --  IOT instruction
+   SIGABRT: constant := 6; --  used by abort, replace SIGIOT in the  future
+   SIGBUS : constant := 7; --  bus error
+   SIGFPE : constant := 8; --  floating point exception
+   SIGKILL: constant := 9; --  kill (cannot be caught or ignored)
+   SIGUSR1: constant := 10; --  user defined signal 1
+   SIGSEGV: constant := 11; --  segmentation violation
+   SIGUSR2: constant := 12; --  user defined signal 2
+   SIGPIPE: constant := 13; --  write on a pipe with no one to read it
+   SIGALRM: constant := 14; --  alarm clock
+   SIGTERM: constant := 15; --  software termination signal from kill
+   SIGSTKFLT  : constant := 16; --  coprocessor stack fault (Linux)
+   SIGCLD : constant := 17; --  alias for SIGCHLD
+   SIGCHLD: constant := 17; --  child status change
+   SIGCONT: constant := 18; --  stopped process has been continued
+   SIGSTOP: constant := 19; --  stop (cannot be caught or ignored)
+   SIGTSTP: constant := 20; --  user stop requested from tty
+   SIGTTIN: constant := 21; --  background tty read attempted
+   SIGTTOU: constant := 22; --  background tty write attempted
+   SIGURG : constant := 23; --  urgent condition on IO channel
+   SIGXCPU: constant := 24; --  CPU time limit exceeded
+   SIGXFSZ: constant := 25; --  filesize limit exceeded
+   SIGVTALRM  : constant := 26; --  virtual timer expired
+   SIGPROF: constant := 27; --  profiling timer expired
+   SIGWINCH   : constant := 28; --  window size change
+   SIGPOLL: constant := 29; --  pollable event occurred
+   SIGIO  : constant := 29; --  I/O now possible (4.2 BSD)
+   SIGPWR : constant := 30; --  power-fail restart
+   SIGSYS : constant := 31; --  bad system call
+   SIG32  : constant := 32; --  glibc internal signal
+   SIG33  : constant := 33; --  glibc internal signal
+   SIG34  : constant :=

[PATCH] Use vmaskmov{ps, pd} for VI48_128_256 when TARGET_AVX2 is not available.

2023-08-24 Thread liuhongt via Gcc-patches

vpmaskmov{d,q} is available for TARGET_AVX2, and vmaskmov{ps,pd} is
available for TARGET_AVX; without TARGET_AVX2, we can use vmaskmov{ps,pd}
for VI48_128_256.

Bootstrapped and regtested on x86_64-pc-linux{-m32,}.
Ready to push to trunk.
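
The reason the floating-point masked moves can stand in for the integer ones is that the operation is purely bitwise. The scalar C model below is a sketch under that assumption (each element is selected by the most significant bit of its mask element, and unselected elements read as zero on a load); `maskload32` is an invented helper, not an intrinsic.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a masked load with 32-bit elements: element i is
   copied when the top bit of mask[i] is set, otherwise it is zeroed.
   The data bits are never interpreted, only moved.  */
static void
maskload32 (uint32_t *dst, const uint32_t *mem, const uint32_t *mask, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = (mask[i] >> 31) ? mem[i] : 0;
}

/* Load integer data through the "float-sized" masked move and confirm
   the selected integers come back unchanged.  Returns 1 on success.  */
int
check_maskload_is_type_agnostic (void)
{
  const int32_t ints[4] = { 1, -2, 3, -4 };
  const uint32_t mask[4] = { 0x80000000u, 0, 0xffffffffu, 0 };
  uint32_t raw[4], out[4];
  int32_t result[4];

  memcpy (raw, ints, sizeof raw);
  maskload32 (out, raw, mask, 4);
  memcpy (result, out, sizeof result);

  assert (result[0] == 1 && result[1] == 0);
  assert (result[2] == 3 && result[3] == 0);
  return 1;
}
```

The same reasoning applies to 64-bit elements with the pd form.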

gcc/ChangeLog:

PR target/19
* config/i386/sse.md (V48_AVX2): Rename to ..
(V48_128_256): .. this.
(ssefltmodesuffix): Extend to V4SF/V8SF/V2DF/V4DF.
(_maskload): Change
V48_AVX2 to V48_128_256, also generate vmaskmov{ps,pd} for
integral modes when TARGET_AVX2 is not available.
(_maskstore): Ditto.
(maskload): Change V48_AVX2 to
V48_128_256.
(maskstore): Ditto.
---
 gcc/config/i386/sse.md | 48 ++
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 59a0eb1c63f..414a807aa6c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -700,11 +700,12 @@ (define_mode_iterator VI12_AVX_AVX512F
   [ (V64QI "TARGET_AVX512F") (V32QI "TARGET_AVX") V16QI
 (V32HI "TARGET_AVX512F") (V16HI "TARGET_AVX") V8HI])
 
-(define_mode_iterator V48_AVX2
+(define_mode_iterator V48_128_256
   [V4SF V2DF
+   V4DI V2DI
V8SF V4DF
-   (V4SI "TARGET_AVX2") (V2DI "TARGET_AVX2")
-   (V8SI "TARGET_AVX2") (V4DI "TARGET_AVX2")])
+   V8SI V4SI])
+
 
 (define_mode_iterator VF4_128_8_256
   [V4DF V4SF])
@@ -22300,7 +22301,8 @@ (define_insn_and_split 
"*_blendv_lt"
(set_attr "mode" "")])
 
 (define_mode_attr ssefltmodesuffix
-  [(V2DI "pd") (V4DI "pd") (V4SI "ps") (V8SI "ps")])
+  [(V2DI "pd") (V4DI "pd") (V4SI "ps") (V8SI "ps")
+   (V2DF "pd") (V4DF "pd") (V4SF "ps") (V8SF "ps")])
 
 (define_mode_attr ssefltvecmode
   [(V2DI "V2DF") (V4DI "V4DF") (V4SI "V4SF") (V8SI "V8SF")])
@@ -27411,13 +27413,18 @@ (define_insn "vec_set_hi_v32qi"
(set_attr "mode" "OI")])
 
 (define_insn "_maskload"
-  [(set (match_operand:V48_AVX2 0 "register_operand" "=x")
-   (unspec:V48_AVX2
+  [(set (match_operand:V48_128_256 0 "register_operand" "=x")
+   (unspec:V48_128_256
  [(match_operand: 2 "register_operand" "x")
-  (match_operand:V48_AVX2 1 "memory_operand" "m")]
+  (match_operand:V48_128_256 1 "memory_operand" "m")]
  UNSPEC_MASKMOV))]
   "TARGET_AVX"
-  "vmaskmov\t{%1, %2, %0|%0, %2, %1}"
+{
+  if (TARGET_AVX2)
+return "vmaskmov\t{%1, %2, %0|%0, %2, %1}";
+  else
+return "vmaskmov\t{%1, %2, %0|%0, %2, %1}";
+}
   [(set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "vex")
@@ -27425,14 +27432,19 @@ (define_insn 
"_maskload"
(set_attr "mode" "")])
 
 (define_insn "_maskstore"
-  [(set (match_operand:V48_AVX2 0 "memory_operand" "+m")
-   (unspec:V48_AVX2
+  [(set (match_operand:V48_128_256 0 "memory_operand" "+m")
+   (unspec:V48_128_256
  [(match_operand: 1 "register_operand" "x")
-  (match_operand:V48_AVX2 2 "register_operand" "x")
+  (match_operand:V48_128_256 2 "register_operand" "x")
   (match_dup 0)]
  UNSPEC_MASKMOV))]
   "TARGET_AVX"
-  "vmaskmov\t{%2, %1, %0|%0, %1, %2}"
+{
+  if (TARGET_AVX2)
+return "vmaskmov\t{%2, %1, %0|%0, %1, %2}";
+  else
+return "vmaskmov\t{%2, %1, %0|%0, %1, %2}";
+}
   [(set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "vex")
@@ -27440,10 +27452,10 @@ (define_insn 
"_maskstore"
(set_attr "mode" "")])
 
 (define_expand "maskload"
-  [(set (match_operand:V48_AVX2 0 "register_operand")
-   (unspec:V48_AVX2
+  [(set (match_operand:V48_128_256 0 "register_operand")
+   (unspec:V48_128_256
  [(match_operand: 2 "register_operand")
-  (match_operand:V48_AVX2 1 "memory_operand")]
+  (match_operand:V48_128_256 1 "memory_operand")]
  UNSPEC_MASKMOV))]
   "TARGET_AVX")
 
@@ -27468,10 +27480,10 @@ (define_expand "maskload"
   "TARGET_AVX512BW")
 
 (define_expand "maskstore"
-  [(set (match_operand:V48_AVX2 0 "memory_operand")
-   (unspec:V48_AVX2
+  [(set (match_operand:V48_128_256 0 "memory_operand")
+   (unspec:V48_128_256
  [(match_operand: 2 "register_operand")
-  (match_operand:V48_AVX2 1 "register_operand")
+  (match_operand:V48_128_256 1 "register_operand")
   (match_dup 0)]
  UNSPEC_MASKMOV))]
   "TARGET_AVX")
-- 
2.31.1

Re: [PATCH v1] LoongArch: Remove the symbolic extension instruction due to the SLT directive.

2023-08-24 Thread WANG Xuerui


On 8/25/23 12:01, Lulu Cheng wrote:

Since the slt instruction does not distinguish between 32-bit and 64-bit 
operations
under the LoongArch 64-bit architecture, if the operands of slt are of SImode, 
symbol
expansion is required before operation.
Hint: “符号扩展” is "sign extension" (as noun) or "sign-extend" (as verb),
not "symbol expansion".


But similar to the following test case, symbol expansion can be omitted:

extern int src1, src2, src3;

int
test (void)
{
  int data1 = src1 + src2;
  int data2 = src1 + src3;
  return data1 > data2 ? data1 : data2;
}
Assembly code before optimization:
...
add.w   $r4,$r4,$r14
add.w   $r13,$r13,$r14
slli.w  $r12,$r4,0
slli.w  $r14,$r13,0
slt $r12,$r12,$r14
masknez $r4,$r4,$r12
maskeqz $r12,$r13,$r12
or  $r4,$r4,$r12
slli.w  $r4,$r4,0
...

After optimization:
...
add.w   $r12,$r12,$r14
add.w   $r13,$r13,$r14
slt $r4,$r12,$r13
masknez $r12,$r12,$r4
maskeqz $r4,$r13,$r4
or  $r4,$r12,$r4
...

Similar to this test example, the two operands of SLT are obtained by the
addition operation, and the addition operation "add.w" is an implicit
symbolic extension function, so the two operands of SLT do not require


more naturally: "and add.w implicitly sign-extends" -- brevity is often
desired and clearer ;-)



symbolic expansion.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_conditional_move):
Optimize the function implementation.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/slt-sign-extend.c: New test.
---
  gcc/config/loongarch/loongarch.cc | 53 +--
  .../gcc.target/loongarch/slt-sign-extend.c| 14 +
  2 files changed, 63 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/slt-sign-extend.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 86d58784113..1905599b9e8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4384,14 +4384,30 @@ loongarch_expand_conditional_move (rtx *operands)
enum rtx_code code = GET_CODE (operands[1]);
rtx op0 = XEXP (operands[1], 0);
rtx op1 = XEXP (operands[1], 1);
+  rtx op0_extend = op0;
+  rtx op1_extend = op1;
+
+  /* Record whether operands[2] and operands[3] modes are promoted to 
word_mode.  */
+  bool promote_p = false;
+  machine_mode mode = GET_MODE (operands[0]);
  
if (FLOAT_MODE_P (GET_MODE (op1)))

  loongarch_emit_float_compare (&code, &op0, &op1);
else
  {
+  if ((REGNO (op0) == REGNO (operands[2])
+  || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
+ && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+   {
+ mode = word_mode;
+ promote_p = true;
+   }
+
loongarch_extend_comparands (code, &op0, &op1);
  
op0 = force_reg (word_mode, op0);

+  op0_extend = op0;
+  op1_extend = force_reg (word_mode, op1);
  
if (code == EQ || code == NE)

{
@@ -4418,23 +4434,52 @@ loongarch_expand_conditional_move (rtx *operands)
&& register_operand (operands[2], VOIDmode)
&& register_operand (operands[3], VOIDmode))
  {
-  machine_mode mode = GET_MODE (operands[0]);
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];
+
+  if (promote_p)
+   {
+ if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+   op2 = op0_extend;
+ else
+   {
+ loongarch_extend_comparands (code, &op2, &const0_rtx);
+ op2 = force_reg (mode, op2);
+   }
+
+ if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+   op3 = op1_extend;
+ else
+   {
+ loongarch_extend_comparands (code, &op3, &const0_rtx);
+ op3 = force_reg (mode, op3);
+   }
+   }
+
rtx temp = gen_reg_rtx (mode);
rtx temp2 = gen_reg_rtx (mode);
  
emit_insn (gen_rtx_SET (temp,

  gen_rtx_IF_THEN_ELSE (mode, cond,
-   operands[2], const0_rtx)));
+   op2, const0_rtx)));
  
/* Flip the test for the second operand.  */

cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, GET_MODE (op0), op0, 
op1);
  
emit_insn (gen_rtx_SET (temp2,

  gen_rtx_IF_THEN_ELSE (mode, cond,
-   operands[3], const0_rtx)));
+   op3, const0_rtx)));
  
/* Merge the two results, at least one is guaranteed to be zero.  */

-  emit_insn (gen_rtx_SET (operands[0], gen_rtx_IOR (mode, temp, temp2)));
+  if (promote_p)
+   {
+

[PATCH v1] LoongArch: Remove the symbolic extension instruction due to the SLT directive.

2023-08-24 Thread Lulu Cheng

Since the slt instruction does not distinguish between 32-bit and 64-bit
operations under the LoongArch 64-bit architecture, operands of slt that are
of SImode must be sign-extended before the operation.

However, in cases like the following test case, the sign extension can be
omitted:

extern int src1, src2, src3;

int
test (void)
{
  int data1 = src1 + src2;
  int data2 = src1 + src3;
  return data1 > data2 ? data1 : data2;
}
Assembly code before optimization:
...
add.w   $r4,$r4,$r14
add.w   $r13,$r13,$r14
slli.w  $r12,$r4,0
slli.w  $r14,$r13,0
slt $r12,$r12,$r14
masknez $r4,$r4,$r12
maskeqz $r12,$r13,$r12
or  $r4,$r4,$r12
slli.w  $r4,$r4,0
...

After optimization:
...
add.w   $r12,$r12,$r14
add.w   $r13,$r13,$r14
slt $r4,$r12,$r13
masknez $r12,$r12,$r4
maskeqz $r4,$r13,$r4
or  $r4,$r12,$r4
...

In this test case, the two operands of slt are produced by add.w, and add.w
implicitly sign-extends its result, so the two operands of slt do not require
a separate sign extension.
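
To illustrate why slt needs sign-extended SImode operands and why the result of add.w already qualifies, here is a minimal C model. It is only a sketch: the helper names (as_zero_extended, add_w, slt) are made up for illustration and are not GCC or LoongArch APIs, and the zero-filled upper half is just one example of a register that has not been sign-extended.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit register that holds a 32-bit value
   WITHOUT sign extension (upper 32 bits are zero, for illustration).  */
static int64_t as_zero_extended (int32_t v)
{
  return (int64_t) (uint32_t) v;
}

/* Hypothetical model of add.w: a wrapping 32-bit add whose result is
   sign-extended into the 64-bit register.  The add is done in unsigned
   arithmetic to avoid signed-overflow undefined behavior; the conversion
   back to int32_t is implementation-defined (modular on GCC).  */
static int64_t add_w (int32_t a, int32_t b)
{
  uint32_t sum = (uint32_t) a + (uint32_t) b;
  return (int64_t) (int32_t) sum;
}

/* Hypothetical model of slt: a full 64-bit signed "set on less than".  */
static int slt (int64_t rj, int64_t rk)
{
  return rj < rk;
}
```

With these helpers, slt (add_w (-3, 1), add_w (2, 3)) agrees with the 32-bit comparison -2 < 5, whereas comparing the same two values through as_zero_extended gives the opposite answer, because a register that was not sign-extended makes -2 look like a large positive number.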

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_conditional_move):
Optimize the function implementation.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/slt-sign-extend.c: New test.
---
 gcc/config/loongarch/loongarch.cc | 53 +--
 .../gcc.target/loongarch/slt-sign-extend.c| 14 +
 2 files changed, 63 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/slt-sign-extend.c

diff --git a/gcc/config/loongarch/loongarch.cc 
b/gcc/config/loongarch/loongarch.cc
index 86d58784113..1905599b9e8 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -4384,14 +4384,30 @@ loongarch_expand_conditional_move (rtx *operands)
   enum rtx_code code = GET_CODE (operands[1]);
   rtx op0 = XEXP (operands[1], 0);
   rtx op1 = XEXP (operands[1], 1);
+  rtx op0_extend = op0;
+  rtx op1_extend = op1;
+
+  /* Record whether operands[2] and operands[3] modes are promoted to 
word_mode.  */
+  bool promote_p = false;
+  machine_mode mode = GET_MODE (operands[0]);
 
   if (FLOAT_MODE_P (GET_MODE (op1)))
 loongarch_emit_float_compare (&code, &op0, &op1);
   else
 {
+  if ((REGNO (op0) == REGNO (operands[2])
+  || (REGNO (op1) == REGNO (operands[3]) && (op1 != const0_rtx)))
+ && (GET_MODE_SIZE (GET_MODE (op0)) < word_mode))
+   {
+ mode = word_mode;
+ promote_p = true;
+   }
+
   loongarch_extend_comparands (code, &op0, &op1);
 
   op0 = force_reg (word_mode, op0);
+  op0_extend = op0;
+  op1_extend = force_reg (word_mode, op1);
 
   if (code == EQ || code == NE)
{
@@ -4418,23 +4434,52 @@ loongarch_expand_conditional_move (rtx *operands)
   && register_operand (operands[2], VOIDmode)
   && register_operand (operands[3], VOIDmode))
 {
-  machine_mode mode = GET_MODE (operands[0]);
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];
+
+  if (promote_p)
+   {
+ if (REGNO (XEXP (operands[1], 0)) == REGNO (operands[2]))
+   op2 = op0_extend;
+ else
+   {
+ loongarch_extend_comparands (code, &op2, &const0_rtx);
+ op2 = force_reg (mode, op2);
+   }
+
+ if (REGNO (XEXP (operands[1], 1)) == REGNO (operands[3]))
+   op3 = op1_extend;
+ else
+   {
+ loongarch_extend_comparands (code, &op3, &const0_rtx);
+ op3 = force_reg (mode, op3);
+   }
+   }
+
   rtx temp = gen_reg_rtx (mode);
   rtx temp2 = gen_reg_rtx (mode);
 
   emit_insn (gen_rtx_SET (temp,
  gen_rtx_IF_THEN_ELSE (mode, cond,
-   operands[2], const0_rtx)));
+   op2, const0_rtx)));
 
   /* Flip the test for the second operand.  */
   cond = gen_rtx_fmt_ee ((code == EQ) ? NE : EQ, GET_MODE (op0), op0, op1);
 
   emit_insn (gen_rtx_SET (temp2,
  gen_rtx_IF_THEN_ELSE (mode, cond,
-   operands[3], const0_rtx)));
+   op3, const0_rtx)));
 
   /* Merge the two results, at least one is guaranteed to be zero.  */
-  emit_insn (gen_rtx_SET (operands[0], gen_rtx_IOR (mode, temp, temp2)));
+  if (promote_p)
+   {
+ rtx temp3 = gen_reg_rtx (mode);
+ emit_insn (gen_rtx_SET (temp3, gen_rtx_IOR (mode, temp, temp2)));
+ temp3 = gen_lowpart (GET_MODE (operands[0]), temp3);
+ loongarch_emit_move (operands[0], temp3);
+   }
+  else
+   emit_insn (gen_

Re: [PATCH] rs6000: Disable PCREL for unsupported targets [PR111045]

2023-08-24 Thread Peter Bergner via Gcc-patches

On 8/24/23 12:56 AM, Kewen.Lin wrote:
> By looking into the uses of function rs6000_pcrel_p, I think we can
> just replace it with TARGET_PCREL.  Previously we don't require PCREL
> unset for any unsupported target/OS, so we need rs6000_pcrel_p() to
> ensure it's really supported in those use places, now if we can guarantee
> TARGET_PCREL only hold for all places where it's supported, it looks
> we can just check TARGET_PCREL only?

I think you're correct on only needing TARGET_PCREL instead of rs6000_pcrel_p()
as long as we correctly disable PCREL on the targets that either don't support
it or those that do, but don't due to explicit options (i.e., -mno-pcrel or
ELFv2 and -mcmodel != medium, etc.).

> Then the code structure can look like:
> 
> if (PCREL_SUPPORTED_BY_OS
> && (rs6000_isa_flags_explicit & OPTION_MASK_PCREL) == 0)
>// enable
> else if (TARGET_PCREL && DEFAULT_ABI != ABI_ELFv2)
>// disable
> else if (TARGET_PCREL && TARGET_CMODEL != CMODEL_MEDIUM)
>// disable

I don't like that first conditional expression. The problem is,
PCREL_SUPPORTED_BY_OS could be true or false for the following
tests, because it's anded with the explicit option test, and
sometimes that won't make sense.  I think something safer is
something like:

if (PCREL_SUPPORTED_BY_OS)
  { 
/* PCREL on ELFv2 requires -mcmodel=medium.  */
if (DEFAULT_ABI == ABI_ELFv2 && TARGET_CMODEL != CMODEL_MEDIUM)
  error ("%qs requires %qs", "-mpcrel", "-mcmodel=medium");

if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) == 0)
  rs6000_isa_flags |= OPTION_MASK_PCREL;
  } 
else
  {
if ((rs6000_isa_flags_explicit & OPTION_MASK_PCREL) != 0
&& TARGET_PCREL)
  error ("use of %qs is invalid for this target", "-mpcrel");
rs6000_isa_flags &= ~OPTION_MASK_PCREL;
  }

...although, even the cmodel != medium test above should probably have
some extra tests to ensure TARGET_PCREL is true (and explicit?) and
mcmodel != medium before giving an error???  Ie, an ELFv2 P10 compile
with an implicit -mpcrel and explicit -mcmodel={small,large} probably
should not error and just silently disable pcrel???  Meaning only
when we explicitly say -mpcrel -mcmodel={small,large} should we give
that error.  Thoughts on that?
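
To make the question concrete, the behavior being asked about can be sketched as a small standalone C decision function. This is only a model under stated assumptions: struct pcrel_opts, enum pcrel_result and resolve_pcrel are hypothetical names, not rs6000 code, and "supported_by_os" is assumed to already cover the CPU/OS checks behind PCREL_SUPPORTED_BY_OS.

```c
#include <stdbool.h>

/* Hypothetical outcome of option processing for -mpcrel.  */
enum pcrel_result { PCREL_ON, PCREL_OFF, PCREL_ERROR };

/* Hypothetical summary of the inputs discussed in this thread.  */
struct pcrel_opts
{
  bool supported_by_os;   /* Stand-in for PCREL_SUPPORTED_BY_OS.  */
  bool elfv2;             /* DEFAULT_ABI == ABI_ELFv2.  */
  bool cmodel_medium;     /* TARGET_CMODEL == CMODEL_MEDIUM.  */
  bool pcrel_explicit;    /* User wrote -mpcrel or -mno-pcrel.  */
  bool pcrel_requested;   /* Value of the explicit option, if any.  */
};

static enum pcrel_result resolve_pcrel (struct pcrel_opts o)
{
  /* Unsupported target: error only on an explicit -mpcrel,
     otherwise silently keep PCREL off.  */
  if (!o.supported_by_os)
    return (o.pcrel_explicit && o.pcrel_requested) ? PCREL_ERROR : PCREL_OFF;

  /* An explicit -mno-pcrel always wins.  */
  if (o.pcrel_explicit && !o.pcrel_requested)
    return PCREL_OFF;

  /* ELFv2 with a non-medium code model: error only when -mpcrel was
     explicit, otherwise silently disable the implicit default.  */
  if (o.elfv2 && !o.cmodel_medium)
    return o.pcrel_explicit ? PCREL_ERROR : PCREL_OFF;

  return PCREL_ON;
}
```

Under this model, an implicit -mpcrel with -mcmodel=small resolves to PCREL_OFF without a diagnostic, and only the explicit combination -mpcrel -mcmodel=small yields PCREL_ERROR.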

Peter

[PATCH V3] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-24 Thread Juzhe-Zhong

This patch refactors Phase 3 (Demand fusion) and renames it to Earliest
fusion.
I did the refactoring for the following reasons:

  1. The current implementation of Phase 3 does too many things, which makes
     the code messy and hard to maintain.
  2. The demand fusion I implemented previously handles everything explicitly:
     how to fuse VSETVLs, where the VSETVL fusion should happen, whether the
     VSETVL fusion point (location) is correct and optimal, etc.

     That is too much work, which is why I had added the following functions:

	enum fusion_type get_backward_fusion_type (const bb_info *,
						   const vector_insn_info &);
	bool hard_empty_block_p (const bb_info *,
				 const vector_insn_info &) const;
	bool backward_demand_fusion (void);
	bool forward_demand_fusion (void);
	bool cleanup_illegal_dirty_blocks (void);

     to make sure the VSETVL fusion is optimal and correct. In my downstream
     testing I found that this approach is neither reliable nor optimal.

     Instead, this patch uses 'compute_earliest', the LCM function, to fuse
     multiple 'compatible' VSETVL demand infos if they have the same earliest
     edge.  We let LCM decide almost everything about demand fusion for us.
     The only thing we do ourselves (not LCM) is check whether the VSETVL
     demand infos are compatible.  That's all we need to do.
     I believe this approach is much more reliable and optimal than before (we
     already have many testcases to check this refactoring patch).
  3. Using the LCM approach for demand fusion is more reliable and gives a
     better CFG than before.
  ...

Here are the basics of this patch's approach:

Consider the following case:

for
  for 
for
  ...
 for
   if (...)
 VSETVL 1 demand: RATIO = 32 and TU policy.
   else if (...)
 VSETVL 2 demand: SEW = 16.
   else
 VSETVL 3 demand: MU policy.

   - 'compute_earliest' outputs the earliest edge of VSETVL 1, VSETVL 2 and
     VSETVL 3.  They have the same earliest edge, which is outside the first
     inner-most loop.

   - Then we check that these 3 VSETVL demand infos are compatible, so we fuse
     them into a single VSETVL info:
     demand SEW = 16, LMUL = MF2, TU, MU.

   - Then the later phase (Phase 4), LCM PRE (partial redundancy elimination),
     will hoist this VSETVL to the outer-most loop, so that we get optimal
     codegen.
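
The fused result in the example can be checked with a small standalone C model. This is a deliberately simplified sketch, not the riscv-vsetvl.cc implementation: struct demand, compatible_p, fuse and lmul_x8 are hypothetical names, and the compatibility rule used here (a field is compatible if at least one side leaves it undemanded or both sides agree) is an assumption that ignores AVL and the other details the real pass tracks. It relies only on RATIO = SEW / LMUL, so RATIO = 32 with SEW = 16 gives LMUL = 16 / 32 = 1/2, i.e. MF2.

```c
#include <assert.h>
#include <stdbool.h>

/* POLICY_ANY means "not demanded".  */
enum policy { POLICY_ANY, POLICY_AGNOSTIC, POLICY_UNDISTURBED };

/* Hypothetical, reduced view of one VSETVL demand.  */
struct demand
{
  int sew;            /* Element width in bits, 0 if not demanded.  */
  int ratio;          /* SEW / LMUL, 0 if not demanded.  */
  enum policy tail;   /* Tail policy (TU = POLICY_UNDISTURBED).  */
  enum policy mask;   /* Mask policy (MU = POLICY_UNDISTURBED).  */
};

static bool int_ok (int a, int b)
{
  return a == 0 || b == 0 || a == b;
}

static bool policy_ok (enum policy a, enum policy b)
{
  return a == POLICY_ANY || b == POLICY_ANY || a == b;
}

/* Two demands are compatible if no field is demanded differently.  */
static bool compatible_p (const struct demand *a, const struct demand *b)
{
  return int_ok (a->sew, b->sew) && int_ok (a->ratio, b->ratio)
	 && policy_ok (a->tail, b->tail) && policy_ok (a->mask, b->mask);
}

/* Fuse two compatible demands: each field takes whichever side demands it.  */
static struct demand fuse (const struct demand *a, const struct demand *b)
{
  struct demand r;
  assert (compatible_p (a, b));
  r.sew = a->sew ? a->sew : b->sew;
  r.ratio = a->ratio ? a->ratio : b->ratio;
  r.tail = a->tail != POLICY_ANY ? a->tail : b->tail;
  r.mask = a->mask != POLICY_ANY ? a->mask : b->mask;
  return r;
}

/* LMUL scaled by 8 so fractional LMULs stay integral (MF2 -> 4, M1 -> 8).
   Returns 0 if SEW or RATIO is not demanded.  */
static int lmul_x8 (const struct demand *d)
{
  return (d->sew && d->ratio) ? 8 * d->sew / d->ratio : 0;
}
```

Fusing { ratio 32, TU }, { SEW 16 } and { MU } in this model gives SEW = 16, RATIO = 32, TU, MU, and lmul_x8 returns 4, which corresponds to MF2.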

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p): New 
function.
(after_or_same_p): Ditto.
(find_reg_killed_by): Delete.
(has_vsetvl_killed_avl_p): Ditto.
(anticipatable_occurrence_p): Refactor.
(any_set_in_bb_p): Delete.
(count_regno_occurrences): Ditto.
(backward_propagate_worthwhile_p): Ditto.
(demands_can_be_fused_p): Ditto.
(earliest_pred_can_be_fused_p): New function.
(vsetvl_dominated_by_p): Ditto.
(vector_insn_info::parse_insn): Refactor.
(vector_insn_info::merge): Refactor.
(vector_insn_info::dump): Refactor.
(vector_infos_manager::vector_infos_manager): Refactor.
(vector_infos_manager::all_empty_predecessor_p): Delete.
(vector_infos_manager::all_same_avl_p): Ditto.
(vector_infos_manager::create_bitmap_vectors): Refactor.
(vector_infos_manager::free_bitmap_vectors): Refactor.
(vector_infos_manager::dump): Refactor.
(pass_vsetvl::update_block_info): New function.
(enum fusion_type): Ditto.
(pass_vsetvl::get_backward_fusion_type): Delete.
(pass_vsetvl::hard_empty_block_p): Ditto.
(pass_vsetvl::backward_demand_fusion): Ditto.
(pass_vsetvl::forward_demand_fusion): Ditto.
(pass_vsetvl::demand_fusion): Ditto.
(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
(pass_vsetvl::compute_local_properties): Ditto.
(pass_vsetvl::earliest_fusion): New function.
(pass_vsetvl::vsetvl_fusion): Ditto.
(pass_vsetvl::commit_vsetvls): Refactor.
(get_first_vsetvl_before_rvv_insns): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_earliest_vsetvls): New function.
(pass_vsetvl::df_post_optimization): Refactor.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vxrm-8.c: Adapt test.
* gcc.target/riscv/rvv/base/vxrm-9.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-7.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_multiple-8.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-102.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-14.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-15.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-27.c: Ditto.

Re: [PATCH v1] RISC-V: Support rounding mode for VFNMADD/VFNMACC autovec

2023-08-24 Thread Kito Cheng via Gcc-patches

lgtm

On Fri, Aug 25, 2023 at 9:49 AM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> There will be a case like below for intrinsic and autovec combination.
>
> vfadd RTZ   <- intrinsic static rounding
> vfnmadd <- autovec/autovec-opt
>
> The autovec generated vfnmadd should take DYN mode, and the
> frm must be restored before the vfnmadd insn. This patch
> would like to fix this issue by:
>
> * Add the frm operand to the autovec/autovec-opt pattern.
> * Set the frm_mode attr to DYN.
>
> Thus, the frm flow when combine autovec and intrinsic should be.
>
> +
> | frrm  a5
> | ...
> | fsrmi 4
> | vfadd   <- intrinsic static rounding.
> | ...
> | fsrm  a5
> | vfnmadd <- autovec/autovec-opt
> | ...
> +
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmadd/vfnmacc.
> * config/riscv/autovec.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-autovec-4.c: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 38 
>  gcc/config/riscv/autovec.md   | 34 ---
>  .../rvv/base/float-point-frm-autovec-4.c  | 88 +++
>  3 files changed, 130 insertions(+), 30 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-4.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 54ca6df721c..2922f370a17 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -655,14 +655,16 @@ (define_insn_and_split "*single_widen_fms"
>  ;; vect__13.182_33 = .FNMS (vect__11.180_35, vect__8.176_40, vect__4.172_45);
>  (define_insn_and_split "*double_widen_fnms"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (neg:VWEXTF
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (neg:VWEXTF
> + (float_extend:VWEXTF
> +   (match_operand: 2 "register_operand")))
> (float_extend:VWEXTF
> - (match_operand: 2 "register_operand")))
> - (float_extend:VWEXTF
> -   (match_operand: 3 "register_operand"))
> - (neg:VWEXTF
> -   (match_operand:VWEXTF 1 "register_operand"]
> + (match_operand: 3 "register_operand"))
> +   (neg:VWEXTF
> + (match_operand:VWEXTF 1 "register_operand")))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -673,18 +675,21 @@ (define_insn_and_split "*double_widen_fnms"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; This helps to match ext + fnms.
>  (define_insn_and_split "*single_widen_fnms"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (neg:VWEXTF
> -   (float_extend:VWEXTF
> - (match_operand: 2 "register_operand")))
> - (match_operand:VWEXTF 3 "register_operand")
> - (neg:VWEXTF
> -   (match_operand:VWEXTF 1 "register_operand"]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (neg:VWEXTF
> + (float_extend:VWEXTF
> +   (match_operand: 2 "register_operand")))
> +   (match_operand:VWEXTF 3 "register_operand")
> +   (neg:VWEXTF
> + (match_operand:VWEXTF 1 "register_operand")))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -701,4 +706,5 @@ (define_insn_and_split "*single_widen_fnms"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 28396c6175d..5f16ac53712 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1274,26 +1274,31 @@ (define_insn_and_split "*fms"
>  (define_expand "fnms4"
>[(parallel
>  [(set (match_operand:VF 0 "register_operand")
> - (fma:VF
> -   (neg:VF
> - (match_operand:VF 1 "register_operand"))
> -   (match_operand:VF 2 "register_operand")
> -   (neg:VF
> - (match_operand:VF 3 "register_operand"
> + (unspec:VF
> +   [(fma:VF
> + (neg:VF
> +   (match_operand:VF 1 "register_operand"))
> + (match_operand:VF 2 "register_operand")
> + (neg:VF
> +   (match_operand:VF 3 "register_operand")))
> +(reg:SI FRM_REGNUM)] UNSPEC_VFFMA))
>   (clobber (match_dup 4))])]
>"TARGET_VECTOR"
>{
>  operands[4] = gen_reg_rtx (Pmode);
> -  })
> +  }
> +  [(set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))

Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2023-08-24 Thread Peter Bergner via Gcc-patches

On 8/24/23 12:35 PM, Michael Meissner wrote:
> On Thu, Jul 20, 2023 at 10:05:28AM +0530, jeevitha wrote:
>> gcc/
>>  PR target/110411
>>  * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
>>  to hold PTImode type.
>>  * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
>>  for PTImode type.
> 
> It is good as far as it goes, but I suspect we will eventually need to extend
> it.  In particular, the reason people need PTImode is they need the even/odd
> register layout.  What you've done enables users to declare this value.

Sure, it could be extended, but that is not what this patch is about.
It's purely to allow the kernel team access to the guaranteed even/odd
register layout for some inline asm code.  Any extension would be a
follow-on patch to this.

On 8/9/23 3:48 AM, Kewen.Lin wrote:
> IIUC, this builtin type registering makes this type expose to users, so
> I wonder if we want to actually expose this type for users' uses.
> If yes, we need to update the documentation (and not sure if the current
> name is good enough); otherwise, I wonder if there is some existing
> practice to declare a builtin type with a name which users can't actually
> use and is just for shadowing a mode.

Segher, Mike, Jeevitha and I talked about the patch and Segher mentioned
that under some conditions, it's fine to keep the type undocumented.
Hopefully he'll weigh in on whether this particular patch is one of
those cases or not.  

Peter

[PATCH V2] RISC-V: Add conditional autovec convert(INT<->INT) patterns

2023-08-24 Thread Lehua Ding

V2 changes: Address comments from Robin.

Hi,

This patch adds conditional sign/zero extension and truncation autovec
patterns by combining convert and vcond_mask patterns.

For quad truncation, two vncvt instructions are generated. This patch
combines the second vncvt and vmerge to form a masked vncvt, while the
first vncvt remains unchanged. Of course, it is possible to convert the
first vncvt to the masked form as well, but I don't think it is necessary.
It is a similar story with 8x truncation.
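
For reference, the kind of scalar source these patterns target looks roughly like the loops below. This is only an illustrative sketch: the function names are made up, the real tests are in the cond_convert_int2int-*.h headers added by this patch, and whether a given loop is actually vectorized into a masked extension or masked vncvt depends on the compiler options used.

```c
#include <stddef.h>
#include <stdint.h>

/* Conditional sign extension (vf2): widen a[i] where pred[i] is set,
   otherwise take the already-wide b[i].  */
void cond_sext (int32_t *restrict r, const int16_t *restrict a,
		const int32_t *restrict b, const int16_t *restrict pred,
		size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? (int32_t) a[i] : b[i];
}

/* Conditional quad truncation (int32_t -> int8_t): narrow a[i] where
   pred[i] is set, otherwise take the already-narrow b[i].  The narrowing
   conversion is implementation-defined in C (modular on GCC).  */
void cond_trunc4 (int8_t *restrict r, const int32_t *restrict a,
		  const int8_t *restrict b, const int8_t *restrict pred,
		  size_t n)
{
  for (size_t i = 0; i < n; i++)
    r[i] = pred[i] ? (int8_t) a[i] : b[i];
}
```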

--
Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_):
Add combine pattern.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_trunc): Ditto.
* config/riscv/autovec.md (2):
Change define_expand to define_insn_and_split.
(2): Ditto.
* config/riscv/riscv-protos.h (emit_vlmax_masked_insn): Exported.
* config/riscv/riscv-v.cc (emit_vlmax_masked_insn): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Adjust.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c: New 
test.
---
 gcc/config/riscv/autovec-opt.md   | 69 +++
 gcc/config/riscv/autovec.md   | 37 --
 gcc/config/riscv/riscv-protos.h   |  1 +
 gcc/config/riscv/riscv-v.cc   |  2 +-
 .../riscv/rvv/autovec/binop/narrow-3.c|  2 +-
 .../rvv/autovec/cond/cond_convert_int2int-1.h | 47 +
 .../rvv/autovec/cond/cond_convert_int2int-2.h | 46 +
 .../cond/cond_convert_int2int-rv32-1.c| 14 
 .../cond/cond_convert_int2int-rv32-2.c| 14 
 .../cond/cond_convert_int2int-rv64-1.c| 14 
 .../cond/cond_convert_int2int-rv64-2.c| 14 
 .../autovec/cond/cond_convert_int2int_run-1.c | 31 +
 .../autovec/cond/cond_convert_int2int_run-2.c | 30 
 13 files changed, 296 insertions(+), 25 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2int_run-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 8247eb87ddb..f3ef3a839df 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -723,3 +723,72 @@
  riscv_vector::RVV_BINOP, operands);
   DONE;
 })
+
+;; Combine sign_extend/zero_extend(vf2) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VWEXTI 0 "register_operand")
+(if_then_else:VWEXTI
+  (match_operand: 1 "register_operand")
+  (any_extend:VWEXTI (match_operand: 3 
"register_operand"))
+  (match_operand:VWEXTI 2 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_vf2 (, mode);
+  riscv_vector::emit_vlmax_masked_insn (icode, riscv_vector::RVV_UNOP_M, 
operands);
+  DONE;
+})
+
+;; Combine sign_extend/zero_extend(vf4) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VQEXTI 0 "register_operand")
+(if_then_else:VQEXTI
+  (match_operand: 1 "register_operand")
+  (any_extend:VQEXTI (match_operand: 3 
"register_operand"))
+  (match_operand:VQEXTI 2 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred_vf4 (, mode);
+  riscv_vector::emit_vlmax_masked_insn (icode, riscv_vector::RVV_UNOP_M, 
operands);
+  DONE;
+})
+
+;; Combine sign_extend/zero_extend(vf8) and vcond_mask
+(define_insn_and_split "*cond_"
+  [(set (match_operand:VOEXTI 0 "register_operand")
+(if_then_else:VOEXTI
+  (match_operand: 1

[PATCH] RISC-V: Add early continue for ENTRY and EXIT block

2023-08-24 Thread Juzhe-Zhong

Committed.

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (pass_vsetvl::compute_local_properties): 
Add early continue.

---
 gcc/config/riscv/riscv-vsetvl.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index f7558cad2e2..7923702144c 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -3720,6 +3720,8 @@ pass_vsetvl::compute_local_properties (void)
   for (const bb_info *bb : crtl->ssa->bbs ())
 {
   unsigned int curr_bb_idx = bb->index ();
+  if (curr_bb_idx == ENTRY_BLOCK || curr_bb_idx == EXIT_BLOCK)
+   continue;
   const auto local_dem
= m_vector_manager->vector_block_infos[curr_bb_idx].local_dem;
   const auto reaching_out
-- 
2.36.3

[PATCH v1] RISC-V: Support rounding mode for VFNMADD/VFNMACC autovec

2023-08-24 Thread Pan Li via Gcc-patches

From: Pan Li 

There can be a case like the one below when intrinsic and autovec code are
combined.

vfadd RTZ   <- intrinsic static rounding
vfnmadd <- autovec/autovec-opt

The autovec-generated vfnmadd should take DYN mode, and the
frm must be restored before the vfnmadd insn. This patch
fixes this issue by:

* Adding the frm operand to the autovec/autovec-opt pattern.
* Setting the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsic code should be:

+
| frrm  a5
| ...
| fsrmi 4
| vfadd   <- intrinsic static rounding.
| ...
| fsrm  a5
| vfnmadd <- autovec/autovec-opt
| ...
+
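
The save/set/restore sequence in this flow can be modeled in plain C as below. This is a simplified sketch only: struct hart, struct trace and run_flow are hypothetical names, the frm CSR is reduced to a single integer, and the helpers mimic only the "write" side of frrm/fsrmi/fsrm (the real fsrm/fsrmi can also return the old value).

```c
/* Hypothetical model of the hart state relevant here: just the frm CSR.  */
struct hart
{
  unsigned frm;
};

static unsigned frrm (const struct hart *h) { return h->frm; }
static void fsrmi (struct hart *h, unsigned imm) { h->frm = imm; }
static void fsrm (struct hart *h, unsigned value) { h->frm = value; }

/* Rounding modes observed by the two vector instructions in the flow.  */
struct trace
{
  unsigned vfadd_mode;
  unsigned vfnmadd_mode;
};

static struct trace run_flow (struct hart *h, unsigned static_mode)
{
  struct trace t;
  unsigned a5 = frrm (h);      /* frrm  a5: save the dynamic mode.  */
  fsrmi (h, static_mode);      /* fsrmi 4: install the static mode.  */
  t.vfadd_mode = h->frm;       /* vfadd: intrinsic static rounding.  */
  fsrm (h, a5);                /* fsrm  a5: restore the dynamic mode.  */
  t.vfnmadd_mode = h->frm;     /* vfnmadd: autovec, DYN mode.  */
  return t;
}
```

In this model the vfadd sees the static mode that was installed, while the vfnmadd sees whatever frm held on entry, which is the property the DYN frm_mode attribute is meant to guarantee.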

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmadd/vfnmacc.
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-4.c: New test.
---
 gcc/config/riscv/autovec-opt.md   | 38 
 gcc/config/riscv/autovec.md   | 34 ---
 .../rvv/base/float-point-frm-autovec-4.c  | 88 +++
 3 files changed, 130 insertions(+), 30 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-4.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 54ca6df721c..2922f370a17 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -655,14 +655,16 @@ (define_insn_and_split "*single_widen_fms"
 ;; vect__13.182_33 = .FNMS (vect__11.180_35, vect__8.176_40, vect__4.172_45);
 (define_insn_and_split "*double_widen_fnms"
   [(set (match_operand:VWEXTF 0 "register_operand")
-   (fma:VWEXTF
- (neg:VWEXTF
+   (unspec:VWEXTF
+ [(fma:VWEXTF
+   (neg:VWEXTF
+ (float_extend:VWEXTF
+   (match_operand: 2 "register_operand")))
(float_extend:VWEXTF
- (match_operand: 2 "register_operand")))
- (float_extend:VWEXTF
-   (match_operand: 3 "register_operand"))
- (neg:VWEXTF
-   (match_operand:VWEXTF 1 "register_operand"]
+ (match_operand: 3 "register_operand"))
+   (neg:VWEXTF
+ (match_operand:VWEXTF 1 "register_operand")))
+  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -673,18 +675,21 @@ (define_insn_and_split "*double_widen_fnms"
 DONE;
   }
   [(set_attr "type" "vfwmuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
 
 ;; This helps to match ext + fnms.
 (define_insn_and_split "*single_widen_fnms"
   [(set (match_operand:VWEXTF 0 "register_operand")
-   (fma:VWEXTF
- (neg:VWEXTF
-   (float_extend:VWEXTF
- (match_operand: 2 "register_operand")))
- (match_operand:VWEXTF 3 "register_operand")
- (neg:VWEXTF
-   (match_operand:VWEXTF 1 "register_operand"]
+   (unspec:VWEXTF
+ [(fma:VWEXTF
+   (neg:VWEXTF
+ (float_extend:VWEXTF
+   (match_operand: 2 "register_operand")))
+   (match_operand:VWEXTF 3 "register_operand")
+   (neg:VWEXTF
+ (match_operand:VWEXTF 1 "register_operand")))
+  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -701,4 +706,5 @@ (define_insn_and_split "*single_widen_fnms"
 DONE;
   }
   [(set_attr "type" "vfwmuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 28396c6175d..5f16ac53712 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1274,26 +1274,31 @@ (define_insn_and_split "*fms"
 (define_expand "fnms4"
   [(parallel
 [(set (match_operand:VF 0 "register_operand")
- (fma:VF
-   (neg:VF
- (match_operand:VF 1 "register_operand"))
-   (match_operand:VF 2 "register_operand")
-   (neg:VF
- (match_operand:VF 3 "register_operand"
+ (unspec:VF
+   [(fma:VF
+ (neg:VF
+   (match_operand:VF 1 "register_operand"))
+ (match_operand:VF 2 "register_operand")
+ (neg:VF
+   (match_operand:VF 3 "register_operand")))
+(reg:SI FRM_REGNUM)] UNSPEC_VFFMA))
  (clobber (match_dup 4))])]
   "TARGET_VECTOR"
   {
 operands[4] = gen_reg_rtx (Pmode);
-  })
+  }
+  [(set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
 
 (define_insn_and_split "*fnms"
   [(set (match_operand:VF 0 "register_operand" "=vr, vr, ?&vr")
-   (fma:VF
- (neg:VF
-   (match_operand:VF 1 "register_operand" " %0, vr,   vr"))
- (match_operand:VF 2 "register_operand"   " vr, vr,   vr")
- (neg:VF
-   (match_operand:VF 3 "registe

RE: [PATCH v1] RISC-V: Support rounding mode for VFNMSAC/VFNMSUB autovec

2023-08-24 Thread Li, Pan2 via Gcc-patches

Thanks Kito, will commit it after VFMADD, VFMSAC.

Pan

-Original Message-
From: Kito Cheng  
Sent: Thursday, August 24, 2023 10:24 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH v1] RISC-V: Support rounding mode for VFNMSAC/VFNMSUB 
autovec

LGTM

On Thu, Aug 24, 2023 at 5:35 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> There will be a case like below for intrinsic and autovec combination.
>
> vfadd RTZ   <- intrinsic static rounding
> vfnmsub <- autovec/autovec-opt
>
> The autovec generated vfnmsub should take DYN mode, and the
> frm must be restored before the vfnmsub insn. This patch
> would like to fix this issue by:
>
> * Add the frm operand to the autovec/autovec-opt pattern.
> * Set the frm_mode attr to DYN.
>
> Thus, the frm flow when combining autovec and intrinsic should be:
>
> +
> | frrm  a5
> | ...
> | fsrmi 4
> | vfadd   <- intrinsic static rounding.
> | ...
> | fsrm  a5
> | vfnmsub <- autovec/autovec-opt
> | ...
> +
>
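A rough host-side model of the flow above, for readers without RVV
hardware.  It is only a sketch: it uses the <cfenv> rounding mode of the
host in place of the RISC-V frm CSR, and the helper names
add_static_rounding and add_dynamic_rounding are made up for
illustration.  The point it shows is the save / set / operate / restore
sequence around the statically rounded operation, so that the following
"dynamic" operation sees the original mode again.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical stand-in for an intrinsic with a static rounding mode.
// The host floating-point environment plays the role of the frm CSR.
double add_static_rounding (double a, double b, int static_mode)
{
  const int saved = std::fegetround ();          // like "frrm a5"
  const int rc = std::fesetround (static_mode);  // like "fsrmi <mode>"
  assert (rc == 0);
  (void) rc;
  volatile double va = a, vb = b;   // volatile keeps the add at run time
  volatile double sum = va + vb;    // like the statically rounded vfadd
  std::fesetround (saved);          // like "fsrm a5": restore dynamic mode
  return sum;
}

// Hypothetical stand-in for an autovectorized operation: it just uses
// whatever rounding mode is current (the DYN case).
double add_dynamic_rounding (double a, double b)
{
  volatile double va = a, vb = b;
  volatile double sum = va + vb;
  return sum;
}
```

If the restore step were missing, add_dynamic_rounding would silently
inherit the static mode of the preceding call, which is the situation the
patch description is about.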
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmsac/vfnmsub
> * config/riscv/autovec.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 34 ---
>  gcc/config/riscv/autovec.md   | 30 ---
>  .../rvv/base/float-point-frm-autovec-3.c  | 88 +++
>  3 files changed, 126 insertions(+), 26 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 732a51edacd..54ca6df721c 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -523,13 +523,15 @@ (define_insn_and_split "*single_widen_fma"
>  ;; vect__13.182_33 = .FNMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
>  (define_insn_and_split "*double_widen_fnma"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (neg:VWEXTF
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (neg:VWEXTF
> + (float_extend:VWEXTF
> +   (match_operand: 2 "register_operand")))
> (float_extend:VWEXTF
> - (match_operand: 2 "register_operand")))
> - (float_extend:VWEXTF
> -   (match_operand: 3 "register_operand"))
> - (match_operand:VWEXTF 1 "register_operand")))]
> + (match_operand: 3 "register_operand"))
> +   (match_operand:VWEXTF 1 "register_operand"))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -540,17 +542,20 @@ (define_insn_and_split "*double_widen_fnma"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; This helps to match ext + fnma.
>  (define_insn_and_split "*single_widen_fnma"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (neg:VWEXTF
> -   (float_extend:VWEXTF
> - (match_operand: 2 "register_operand")))
> - (match_operand:VWEXTF 3 "register_operand")
> - (match_operand:VWEXTF 1 "register_operand")))]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (neg:VWEXTF
> + (float_extend:VWEXTF
> +   (match_operand: 2 "register_operand")))
> +   (match_operand:VWEXTF 3 "register_operand")
> +   (match_operand:VWEXTF 1 "register_operand"))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -567,7 +572,8 @@ (define_insn_and_split "*single_widen_fnma"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; -
>  ;;  [FP] VFWMSAC
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 0c1c546817a..28396c6175d 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1174,24 +1174,29 @@ (define_insn_and_split "*fma"
>  (define_expand "fnma4"
>[(parallel
>  [(set (match_operand:VF 0 "register_operand")
> - (fma:VF
> -   (neg:VF
> - (match_operand:VF 1 "register_operand"))
> -   (match_operand:VF 2 "register_operand")
> -   (match_operand:VF 3 "register_operand")))
> + (unspec:VF
> +   [(fma:VF
> + (neg:VF
> +   (match_operand:VF 1 "register_operand"))
> + (match_operand:VF 2 "register_operand")
> + (match_operand:VF 3 "register

Re: Fix profile update in tree-ssa-reassoc

2023-08-24 Thread Hans-Peter Nilsson via Gcc-patches

> Date: Wed, 23 Aug 2023 11:10:02 +0200
> From: Jan Hubicka via Gcc-patches 

> Hi,
> this patch adds missing profile update to maybe_optimize_range_tests.

[...]

> Jakub, it seems that the code is originally yours.  Any idea why those are 
> not turned to
> constant true or false conditionals?
> 
> Bootstrapped/regtested x86_64-linux, does it seem to make sense?
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/110628
>   * tree-ssa-reassoc.cc (maybe_optimize_range_tests): Add profile update.

Hi.  Feeling somewhat guilty for not noticing that you had
posted a patch before me xfailing it, I went ahead and
tested this patch for cris-elf against
r14-3431-g7e05cd632fab, but unfortunately it regresses a few
tests, and it appears it's not just testcase (dumps) that
need tweaking.  Four test-cases regress (counting multiple
runs as just one):

Running /x/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp ...
FAIL: gcc.c-torture/execute/pr95731.c   -O1  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O3 -fomit-frame-pointer -funroll-loops 
-fpeel-loops -ftracer -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -Os  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr95731.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
...

Running /x/gcc/gcc/testsuite/gcc.dg/dg.exp ...
FAIL: gcc.dg/pr46309-2.c scan-tree-dump-times reassoc2 "Optimizing range tests 
[^\r\n]*_[0-9]* -.0, 0. and -.128, 128.[\n\r]* into" 1
...

Running /x/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ...
FAIL: gcc.dg/torture/pr63464.c   -Os  execution test
...

Running /x/gcc/gcc/testsuite/gcc.dg/tree-ssa/tree-ssa.exp ...
...
FAIL: gcc.dg/tree-ssa/pr95731.c scan-tree-dump-times optimized " >= 0| < 0" 6

brgds, H-P

[Committed] RISC-V: Move vector-abi testcases into rvv/base folder

2023-08-24 Thread Patrick O'Neill


On 8/24/23 11:28, Palmer Dabbelt wrote:

Reviewed-by: Palmer Dabbelt 

I think Joern is still looking into fixing up all these explicit ISA 
strings in the tests, but I don't see any reason to block fixing 
failing tests on that.


Thanks!


Committed

Patrick

Re: [PATCH] Fortran: improve bounds checking for DATA with implied-do [PR35095]

2023-08-24 Thread Jerry D via Gcc-patches


On 8/24/23 2:28 PM, Harald Anlauf via Fortran wrote:

Dear all,

the attached patch adds stricter bounds-checking for DATA statements
with implied-do.  I chose to allow overindexing (for arrays of rank
greater than 1) for -std=legacy, as there might be codes in the wild
that need this (and this is accepted by some other compilers, while
NAG is strict here).  We now get a warning with -std=gnu, and an
error with -std=f.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

(The PR is over 15 years old, so no backport intended... ;-)

Thanks,
Harald



Looks good Harald, OK for mainline.

Re: [PATCH] Fix avx512ne2ps2bf16 wrong code [PR 111127]

2023-08-24 Thread Hongtao Liu via Gcc-patches

On Thu, Aug 24, 2023 at 5:05 PM Hongyu Wang via Gcc-patches
 wrote:
>
> Hi,
>
> For PR27, the wrong code was caused by a wrong expander for maskz.
> Correct the parameter order for the avx512ne2ps2bf16_maskz expander.
>
> Bootstrapped/regtested on x86-64-pc-linux-gnu{m32,}.
> OK for master and backport to GCC13?
Ok.
>
> gcc/ChangeLog:
>
> PR target/27
> * config/i386/sse.md (avx512f_cvtne2ps2bf16__maskz):
> Adjust parameter order.
>
> gcc/testsuite/ChangeLog:
>
> PR target/27
> * gcc.target/i386/pr27.c: New test.
> ---
>  gcc/config/i386/sse.md   |  4 ++--
>  gcc/testsuite/gcc.target/i386/pr27.c | 24 
>  2 files changed, 26 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/pr27.c
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index da85223a9b4..194dab9a9d0 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -30006,8 +30006,8 @@ (define_expand "avx512f_cvtne2ps2bf16__maskz"
> (match_operand: 3 "register_operand")]
>"TARGET_AVX512BF16"
>  {
> -  emit_insn (gen_avx512f_cvtne2ps2bf16__mask(operands[0], operands[2],
> -operands[1], CONST0_RTX(mode), operands[3]));
> +  emit_insn (gen_avx512f_cvtne2ps2bf16__mask(operands[0], operands[1],
> +operands[2], CONST0_RTX(mode), operands[3]));
>DONE;
>  })
>
> diff --git a/gcc/testsuite/gcc.target/i386/pr27.c 
> b/gcc/testsuite/gcc.target/i386/pr27.c
> new file mode 100644
> index 000..c124bc18bc4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr27.c
> @@ -0,0 +1,24 @@
> +/* PR target/27 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512bf16 -mavx512vl" } */
> +/* { dg-final { scan-assembler-times "vcvtne2ps2bf16\[ 
> \\t\]+\[^\{\n\]*%zmm1, %zmm0, %zmm0\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ 
> \\t\]+#)" 1 } } */
> +/* { dg-final { scan-assembler-times "vcvtne2ps2bf16\[ 
> \\t\]+\[^\{\n\]*%ymm1, %ymm0, %ymm0\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ 
> \\t\]+#)" 1 } } */
> +/* { dg-final { scan-assembler-times "vcvtne2ps2bf16\[ 
> \\t\]+\[^\{\n\]*%xmm1, %xmm0, %xmm0\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[ 
> \\t\]+#)" 1 } } */
> +
> +#include 
> +
> +__m512bh cvttest(__mmask32 k, __m512 a, __m512 b)
> +{
> +  return _mm512_maskz_cvtne2ps_pbh (k,a,b);
> +}
> +
> +__m256bh cvttest2(__mmask16 k, __m256 a, __m256 b)
> +{
> +  return _mm256_maskz_cvtne2ps_pbh (k,a,b);
> +}
> +
> +__m128bh cvttest3(__mmask8 k, __m128 a, __m128 b)
> +{
> +  return _mm_maskz_cvtne2ps_pbh (k,a,b);
> +}
> +
> --
> 2.31.1
>


-- 
BR,
Hongtao

Re: [PATCH] analyzer: Move gcc.dg/analyzer tests to c-c++-common (1).

2023-08-24 Thread David Malcolm via Gcc-patches

> From: benjamin priour 
> 
> Hi,
> 
> Below is the first batch of a series of patches to transition
> the analyzer testsuite from gcc.dg/analyzer to c-c++-common/analyzer.
> I do not know how long this series will be, thus the patch was not
> numbered.
> 
> For the vast majority of the tests, the transition only required some
> adjustment of the syntax and casts to be C++-friendly, or adjusting
> the warning regexes to fit the C++ FE.
> 
> The most noteworthy change is in the handling of known_functions,
> as described in the below patch.

Hi Benjamin.

Many thanks for putting this together, it looks like it was a lot of
work.

> Successfully regstrapped on x86_64-linux-gnu off trunk
> 18befd6f050e70f11ecca1dd58624f0ee3c68cc7.

Did you compare the before/after results from DejaGnu somehow?

Note that I've pushed 9 patches to the analyzer since
18befd6f050e70f11ecca1dd58624f0ee3c68cc7 and some of those touch the
files below, so it's worth rebasing and double-checking the results.

> Is it OK for trunk ?

It's *almost* ready; various comments inline below, throughout...

> 
> Thanks,
> Benjamin.
> 
> Patch below.
> ---
> 
> First batch of moving tests from under gcc.dg/analyzer into
> c-c++-common/analyzer.
> 
> C builtins are not recognized as such by C++, therefore
> this patch no longer uses tree.h:fndecl_built_in_p to recognize
> a builtin function, but rather the function names.
> 
> Thus functions named as C builtins - such as calloc, sprintf ... -
> are recognized as such both in C and C++ sources by the analyzer.
> 
> For user-declared functions named after builtins, the builtin's function_decl
> tree is now preferred over the function_decl the user declared, even
> when the FE considers their declarations to mismatch
> (Wbuiltin-declaration-mismatch emitted). This mainly matters
> in the handling of function attributes: the analyzer uses
> the builtin's attributes defined in gcc/builtins.def.
> 
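A minimal sketch of the name-based lookup described above, assuming
nothing about the real known_function_manager interface.  All names here
(builtin_code_sketch, known_function_registry_sketch,
make_registry_sketch) are hypothetical; the only idea taken from the
patch description is that the key is the function name, so the same entry
is found whether or not the front end marked the decl as a builtin.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for enum built_in_function.
enum class builtin_code_sketch { none, calloc_fn, malloc_fn, sprintf_fn };

// Hypothetical registry keyed by function name rather than by a
// front-end "is this decl a builtin" flag.
class known_function_registry_sketch
{
public:
  void add (const std::string &name, builtin_code_sketch code)
  {
    m_by_name[name] = code;
  }

  // Returns builtin_code_sketch::none for names that were not registered.
  builtin_code_sketch lookup (const std::string &name) const
  {
    const auto it = m_by_name.find (name);
    return it == m_by_name.end () ? builtin_code_sketch::none : it->second;
  }

private:
  std::unordered_map<std::string, builtin_code_sketch> m_by_name;
};

known_function_registry_sketch make_registry_sketch ()
{
  known_function_registry_sketch reg;
  reg.add ("calloc", builtin_code_sketch::calloc_fn);
  reg.add ("malloc", builtin_code_sketch::malloc_fn);
  reg.add ("sprintf", builtin_code_sketch::sprintf_fn);
  return reg;
}
```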
> Signed-off-by: benjamin priour 
> 
> gcc/analyzer/ChangeLog:

Please add
PR analyzer/96395
to the ChangeLog entries, and [PR96395] to the end of the Subject of
the commit, so that these get tracked within that bug as they get
pushed.

> 
>   * analyzer.h (class known_function): Add virtual casts to
>   builtin_known_function.
>   (class builtin_known_function): New subclass of known_function
>   for builtins.
>   * kf.cc (class kf_alloca): Now derived from
>   builtin_known_function
>   (class kf_calloc): Likewise.
>   (class kf_free): Likewise.
>   (class kf_malloc): Likewise.
>   (class kf_memcpy_memmove): Likewise.
>   (class kf_memset): Likewise.
>   (class kf_realloc): Likewise.
>   (class kf_strchr): Likewise.
>   (class kf_sprintf): Likewise.
>   (class kf_strcpy): Likewise.
>   (class kf_strdup): Likewise.
>   (class kf_strlen): Likewise.
>   (class kf_strndup): Likewise.
>   (register_known_functions): Builtins are now registered as
>   known_functions by name rather than by their BUILTIN_CODE.
>   * known-function-manager.cc (get_normal_builtin): New overload.
>   * known-function-manager.h: New overload declaration.
>   * region-model.cc (region_model::get_builtin_kf): New function.
>   * region-model.h (class region_model): Add declaration of
>   get_builtin_kf.
>   * sm-fd.cc: For called recognized as builtins, use the attributes
>   of that builtin as defined in gcc/builtins.def rather than the user's.
>   * sm-malloc.cc (malloc_state_machine::on_stmt): Likewise.
> 
> gcc/testsuite/ChangeLog:

Add
PR analyzer/96395
here, as well, please.

> 
>   * gcc.dg/analyzer/aliasing-3.c: Moved to...
>   * c-c++-common/analyzer/aliasing-3.c: ...here.

[...snip...]

> diff --git a/gcc/analyzer/analyzer.h b/gcc/analyzer/analyzer.h
> index 93a28b4b5cf..63a220c9b6d 100644
> --- a/gcc/analyzer/analyzer.h
> +++ b/gcc/analyzer/analyzer.h
> @@ -128,6 +128,10 @@ struct interesting_t;
>  
>  class feasible_node;
>  
> +class known_function;
> +  class builtin_known_function;
> +  class internal_known_function;
> +
>  /* Forward decls of functions.  */
>  
>  extern void dump_tree (pretty_printer *pp, tree t);
> @@ -279,6 +283,28 @@ public:
>{
>  return;
>}
> +
> +  virtual const builtin_known_function *
> +  dyn_cast_builtin_kf () const { return NULL; }
> +  virtual builtin_known_function *
> +  dyn_cast_builtin_kf () { return NULL; }

I don't think we ever work with non-const known_function pointers, so
we don't need this non-const version of the vfunc.
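To illustrate the suggestion, here is a self-contained sketch of the
const-only form of the vfunc.  The class names carry a _sketch suffix
because they are simplified stand-ins, not the analyzer's classes, and
builtin_code () returns a plain int here instead of
enum built_in_function.

```cpp
#include <cassert>

class builtin_known_function_sketch;

// Simplified stand-in for known_function: only the const downcast hook.
class known_function_sketch
{
public:
  virtual ~known_function_sketch () = default;

  virtual const builtin_known_function_sketch *
  dyn_cast_builtin_kf () const { return nullptr; }
};

// Simplified stand-in for builtin_known_function.
class builtin_known_function_sketch : public known_function_sketch
{
public:
  virtual int builtin_code () const = 0;

  const builtin_known_function_sketch *
  dyn_cast_builtin_kf () const final { return this; }
};

// Hypothetical builtin handler and hypothetical non-builtin handler.
class kf_malloc_sketch : public builtin_known_function_sketch
{
public:
  int builtin_code () const override { return 1; }
};

class kf_user_sketch : public known_function_sketch
{
};

// Callers only ever hold const pointers or references, so the const
// overload is sufficient.  Returns -1 for non-builtin handlers.
int builtin_code_or_minus_one (const known_function_sketch &kf)
{
  if (const builtin_known_function_sketch *bkf = kf.dyn_cast_builtin_kf ())
    return bkf->builtin_code ();
  return -1;
}
```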

> +};
> +
> +/* Subclass of known_function for builtin functions.  */
> +
> +class builtin_known_function : public known_function
> +{
> +public:
> +  virtual enum built_in_function builtin_code () const = 0;
> +  tree builtin_decl () const {
> +gcc_assert (builtin_code () < END_BUILTINS);
> +return builtin_info[builtin_code ()].decl;
> +  }
> +
> +  virtual const builtin_known_function *
> +

[PATCH] Fortran: improve bounds checking for DATA with implied-do [PR35095]

2023-08-24 Thread Harald Anlauf via Gcc-patches

Dear all,

the attached patch adds stricter bounds-checking for DATA statements
with implied-do.  I chose to allow overindexing (for arrays of rank
greater than 1) for -std=legacy, as there might be codes in the wild
that need this (and this is accepted by some other compilers, while
NAG is strict here).  We now get a warning with -std=gnu, and an
error with -std=f.
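The intended behaviour can be modelled roughly as follows.  This is an
illustrative C++ sketch, not the code in data.cc: the names
std_mode_sketch, dim_sketch, index_result_sketch and array_offset_sketch
are invented, plain long stands in for mpz_t, and the rank restriction
mentioned above is not modelled.  It computes the column-major element
offset and treats an out-of-range subscript as accepted (legacy), accepted
with a warning (gnu), or rejected (strict).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class std_mode_sketch { legacy, gnu, strict };

struct dim_sketch { long lower; long upper; };

struct index_result_sketch { bool ok; long offset; int warnings; };

// Column-major offset of the element addressed by SUBS, with a bounds
// check in every dimension.
index_result_sketch
array_offset_sketch (const std::vector<dim_sketch> &dims,
                     const std::vector<long> &subs, std_mode_sketch mode)
{
  assert (dims.size () == subs.size ());
  index_result_sketch res = { true, 0, 0 };
  long delta = 1;
  for (std::size_t i = 0; i < dims.size (); ++i)
    {
      const bool out_of_bounds
        = subs[i] < dims[i].lower || subs[i] > dims[i].upper;
      if (out_of_bounds)
        {
          if (mode == std_mode_sketch::strict)
            return { false, 0, res.warnings };  // error: stop early
          if (mode == std_mode_sketch::gnu)
            ++res.warnings;                     // warn, but carry on
        }
      res.offset += (subs[i] - dims[i].lower) * delta;
      delta *= dims[i].upper - dims[i].lower + 1;
    }
  return res;
}
```

For a 2x2 array, the subscript pair (3,1) overindexes the first dimension
and lands on offset 2, which is the same storage location as (1,2).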

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

(The PR is over 15 years old, so no backport intended... ;-)

Thanks,
Harald

From 420804e7399dbc307a80f084cfb840444b8ebfe7 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Thu, 24 Aug 2023 23:16:25 +0200
Subject: [PATCH] Fortran: improve bounds checking for DATA with implied-do
 [PR35095]

gcc/fortran/ChangeLog:

	PR fortran/35095
	* data.cc (get_array_index): Add bounds-checking code and return error
	status.  Overindexing will be allowed as an extension for -std=legacy
	and generate an error in standard-conforming mode.
	(gfc_assign_data_value): Use error status from get_array_index for
	graceful error recovery.

gcc/testsuite/ChangeLog:

	PR fortran/35095
	* gfortran.dg/data_bounds_1.f90: Adjust options to disable warnings.
	* gfortran.dg/data_bounds_2.f90: New test.
---
 gcc/fortran/data.cc | 47 ++---
 gcc/testsuite/gfortran.dg/data_bounds_1.f90 |  2 +-
 gcc/testsuite/gfortran.dg/data_bounds_2.f90 |  9 
 3 files changed, 51 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/data_bounds_2.f90

diff --git a/gcc/fortran/data.cc b/gcc/fortran/data.cc
index 7c2537dd3f0..0589fc3906f 100644
--- a/gcc/fortran/data.cc
+++ b/gcc/fortran/data.cc
@@ -43,13 +43,14 @@ static void formalize_init_expr (gfc_expr *);

 /* Calculate the array element offset.  */

-static void
+static bool
 get_array_index (gfc_array_ref *ar, mpz_t *offset)
 {
   gfc_expr *e;
   int i;
   mpz_t delta;
   mpz_t tmp;
+  bool ok = true;

   mpz_init (tmp);
   mpz_set_si (*offset, 0);
@@ -59,13 +60,42 @@ get_array_index (gfc_array_ref *ar, mpz_t *offset)
   e = gfc_copy_expr (ar->start[i]);
   gfc_simplify_expr (e, 1);

-  if ((gfc_is_constant_expr (ar->as->lower[i]) == 0)
-	  || (gfc_is_constant_expr (ar->as->upper[i]) == 0)
-	  || (gfc_is_constant_expr (e) == 0))
-	gfc_error ("non-constant array in DATA statement %L", &ar->where);
+  if (!gfc_is_constant_expr (ar->as->lower[i])
+	  || !gfc_is_constant_expr (ar->as->upper[i])
+	  || !gfc_is_constant_expr (e))
+	{
+	  gfc_error ("non-constant array in DATA statement %L", &ar->where);
+	  ok = false;
+	  break;
+	}

   mpz_set (tmp, e->value.integer);
   gfc_free_expr (e);
+
+  /* Overindexing is only allowed as a legacy extension.  */
+  if (mpz_cmp (tmp, ar->as->lower[i]->value.integer) < 0
+	  && !gfc_notify_std (GFC_STD_LEGACY,
+			  "Subscript at %L below array lower bound "
+			  "(%ld < %ld) in dimension %d", &ar->c_where[i],
+			  mpz_get_si (tmp),
+			  mpz_get_si (ar->as->lower[i]->value.integer),
+			  i+1))
+	{
+	  ok = false;
+	  break;
+	}
+  if (mpz_cmp (tmp, ar->as->upper[i]->value.integer) > 0
+	  && !gfc_notify_std (GFC_STD_LEGACY,
+			  "Subscript at %L above array upper bound "
+			  "(%ld > %ld) in dimension %d", &ar->c_where[i],
+			  mpz_get_si (tmp),
+			  mpz_get_si (ar->as->upper[i]->value.integer),
+			  i+1))
+	{
+	  ok = false;
+	  break;
+	}
+
   mpz_sub (tmp, tmp, ar->as->lower[i]->value.integer);
   mpz_mul (tmp, tmp, delta);
   mpz_add (*offset, tmp, *offset);
@@ -77,6 +107,8 @@ get_array_index (gfc_array_ref *ar, mpz_t *offset)
 }
   mpz_clear (delta);
   mpz_clear (tmp);
+
+  return ok;
 }

 /* Find if there is a constructor which component is equal to COM.
@@ -298,7 +330,10 @@ gfc_assign_data_value (gfc_expr *lvalue, gfc_expr *rvalue, mpz_t index,
 	}

 	  if (ref->u.ar.type == AR_ELEMENT)
-	get_array_index (&ref->u.ar, &offset);
+	{
+	  if (!get_array_index (&ref->u.ar, &offset))
+		goto abort;
+	}
 	  else
 	mpz_set (offset, index);

diff --git a/gcc/testsuite/gfortran.dg/data_bounds_1.f90 b/gcc/testsuite/gfortran.dg/data_bounds_1.f90
index 24cdc7c9815..1e6321a2884 100644
--- a/gcc/testsuite/gfortran.dg/data_bounds_1.f90
+++ b/gcc/testsuite/gfortran.dg/data_bounds_1.f90
@@ -1,5 +1,5 @@
 ! { dg-do compile }
-! { dg-options "-std=gnu" }
+! { dg-options "-std=gnu -w" }
 ! Checks the fix for PR32315, in which the bounds checks below were not being done.
 !
 ! Contributed by Tobias Burnus 
diff --git a/gcc/testsuite/gfortran.dg/data_bounds_2.f90 b/gcc/testsuite/gfortran.dg/data_bounds_2.f90
new file mode 100644
index 000..1aa9fd4c423
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/data_bounds_2.f90
@@ -0,0 +1,9 @@
+! { dg-do compile }
+! { dg-options "-std=f2018" }
+! PR fortran/35095 - Improve bounds checking for DATA with implied-do
+
+program chkdata
+  character(len=2), dimension(2,2) :: str
+  data (str(i,1),i=1,3) / 'A','B','C' / ! { dg-error "a

[PATCH V2] RISC-V: Add Types to Un-Typed Sync Instructions:

2023-08-24 Thread Edwin Lu

Related Discussion:
https://inbox.sourceware.org/gcc-patches/12fb5088-3f28-0a69-de1e-f387371a5...@gmail.com/

This patch updates the sync instructions to ensure that no insn is left
without a type attribute. Updates a total of 6 insns to have type "atomic"

Tested for regressions using rv32/64 multilib with newlib/linux. 

gcc/Changelog:

* config/riscv/sync-rvwmo.md: updated types to "multi" or
"atomic" based on number of assembly lines generated
* config/riscv/sync-ztso.md: likewise
* config/riscv/sync.md: likewise

Signed-off-by: Edwin Lu 
---
Changes in V2:
  - Update insns that were typed "atomic" to "multi" if insn
can generate multiple lines of assembly following
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628055.html
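As a reading aid for the hunks below, the typing rule can be summarised
by the following sketch.  It is an assumption-laden simplification, not
something the .md files contain: it takes the "length" attribute as a
proxy for the number of assembly lines and assumes 4-byte instructions.
The function name sync_insn_type_sketch is made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a pattern that emits a single 4-byte instruction
// is typed "atomic"; anything that can emit more is typed "multi".
std::string sync_insn_type_sketch (int length_bytes)
{
  assert (length_bytes > 0 && length_bytes % 4 == 0);
  return length_bytes == 4 ? "atomic" : "multi";
}
```

This agrees with the lengths visible in the diff: 4 for the fences, and
8 to 32 for the load, store, exchange and compare-and-swap sequences.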
---
 gcc/config/riscv/sync-rvwmo.md |  7 ---
 gcc/config/riscv/sync-ztso.md  |  7 ---
 gcc/config/riscv/sync.md   | 14 +-
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/gcc/config/riscv/sync-rvwmo.md b/gcc/config/riscv/sync-rvwmo.md
index 1fc7cf16b5b..cb641ea9ec3 100644
--- a/gcc/config/riscv/sync-rvwmo.md
+++ b/gcc/config/riscv/sync-rvwmo.md
@@ -41,7 +41,8 @@ (define_insn "mem_thread_fence_rvwmo"
 else
gcc_unreachable ();
   }
-  [(set (attr "length") (const_int 4))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 4))])
 
 ;; Atomic memory operations.
 
@@ -66,7 +67,7 @@ (define_insn "atomic_load_rvwmo"
 else
   return "l\t%0,%1";
   }
-  [(set_attr "type" "atomic")
+  [(set_attr "type" "multi")
(set (attr "length") (const_int 12))])
 
 ;; Implement atomic stores with conservative fences.
@@ -92,5 +93,5 @@ (define_insn "atomic_store_rvwmo"
 else
   return "s\t%z1,%0";
   }
-  [(set_attr "type" "atomic")
+  [(set_attr "type" "multi")
(set (attr "length") (const_int 12))])
diff --git a/gcc/config/riscv/sync-ztso.md b/gcc/config/riscv/sync-ztso.md
index ed94471b96b..7bb15b7ab8c 100644
--- a/gcc/config/riscv/sync-ztso.md
+++ b/gcc/config/riscv/sync-ztso.md
@@ -35,7 +35,8 @@ (define_insn "mem_thread_fence_ztso"
 else
gcc_unreachable ();
   }
-  [(set (attr "length") (const_int 4))])
+  [(set_attr "type" "atomic")
+   (set (attr "length") (const_int 4))])
 
 ;; Atomic memory operations.
 
@@ -56,7 +57,7 @@ (define_insn "atomic_load_ztso"
 else
   return "l\t%0,%1";
   }
-  [(set_attr "type" "atomic")
+  [(set_attr "type" "multi")
(set (attr "length") (const_int 12))])
 
 (define_insn "atomic_store_ztso"
@@ -76,5 +77,5 @@ (define_insn "atomic_store_ztso"
 else
   return "s\t%z1,%0";
   }
-  [(set_attr "type" "atomic")
+  [(set_attr "type" "multi")
(set (attr "length") (const_int 8))])
diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
index 2f85951508f..6ff3493b5ce 100644
--- a/gcc/config/riscv/sync.md
+++ b/gcc/config/riscv/sync.md
@@ -136,7 +136,8 @@ (define_insn "subword_atomic_fetch_strong_"
   "sc.w%J3\t%6, %7, %1\;"
   "bnez\t%6, 1b";
   }
-  [(set (attr "length") (const_int 28))])
+  [(set_attr "type" "multi")
+   (set (attr "length") (const_int 28))])
 
 (define_expand "atomic_fetch_nand"
   [(match_operand:SHORT 0 "register_operand")	;; old value at mem
@@ -203,7 +204,8 @@ (define_insn "subword_atomic_fetch_strong_nand"
   "sc.w%J3\t%6, %7, %1\;"
   "bnez\t%6, 1b";
   }
-  [(set (attr "length") (const_int 32))])
+  [(set_attr "type" "multi")
+   (set (attr "length") (const_int 32))])
 
 (define_expand "atomic_fetch_"
   [(match_operand:SHORT 0 "register_operand")	;; old value at mem
@@ -310,7 +312,8 @@ (define_insn "subword_atomic_exchange_strong"
   "sc.w%J3\t%5, %5, %1\;"
   "bnez\t%5, 1b";
   }
-  [(set (attr "length") (const_int 20))])
+  [(set_attr "type" "multi")
+   (set (attr "length") (const_int 20))])
 
 (define_insn "atomic_cas_value_strong"
   [(set (match_operand:GPR 0 "register_operand" "=&r")
@@ -336,7 +339,7 @@ (define_insn "atomic_cas_value_strong"
   "bnez\t%6,1b\;"
   "1:";
   }
-  [(set_attr "type" "atomic")
+  [(set_attr "type" "multi")
(set (attr "length") (const_int 16))])
 
 (define_expand "atomic_compare_and_swap"
@@ -497,7 +500,8 @@ (define_insn "subword_atomic_cas_strong"
   "bnez\t%7, 1b\;"
   "1:";
   }
-  [(set (attr "length") (const_int 28))])
+  [(set_attr "type" "multi")
+   (set (attr "length") (const_int 28))])
 
 (define_expand "atomic_test_and_set"
   [(match_operand:QI 0 "register_operand" "") ;; bool output
-- 
2.34.1

[committed] i386: Optimize pinsrq of 0 with index 1 into movq [PR94866]

2023-08-24 Thread Uros Bizjak via Gcc-patches

Add new pattern involving vec_merge RTX that is produced by combine from the
combination of sse4_1_pinsrq and *movdi_internal:

7: r86:DI=0
8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2)
  REG_DEAD r87:V2DI
  REG_DEAD r86:DI
Successfully matched this instruction:
(set (reg:V2DI 85 [ a ])
(vec_merge:V2DI (reg:V2DI 87)
(const_vector:V2DI [
(const_int 0 [0]) repeated x2
])
(const_int 1 [0x1])))
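To see why the two vec_merge forms above describe the same value, here is
a small scalar model.  It is a sketch only: v2di_sketch and the function
names are invented, and vec_merge_sketch assumes the RTL convention that
bit i of the mask selects element i from the first operand and a clear
bit selects it from the second.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using v2di_sketch = std::array<long long, 2>;

// Scalar model of vec_merge (x, y, mask): set bit -> x[i], clear -> y[i].
v2di_sketch vec_merge_sketch (const v2di_sketch &x, const v2di_sketch &y,
                              unsigned mask)
{
  v2di_sketch r{};
  for (std::size_t i = 0; i < r.size (); ++i)
    r[i] = ((mask >> i) & 1u) ? x[i] : y[i];
  return r;
}

// Before combine: vec_merge (vec_duplicate (0), a, 0x2),
// i.e. insert 0 at index 1 and keep index 0 of a.
v2di_sketch pinsrq_zero_index1_sketch (const v2di_sketch &a)
{
  return vec_merge_sketch ({0, 0}, a, 0x2u);
}

// After combine: vec_merge (a, const_vector 0, 0x1),
// i.e. keep index 0 of a and zero the rest, which is what movq does.
v2di_sketch movq_sketch (const v2di_sketch &a)
{
  return vec_merge_sketch (a, {0, 0}, 0x1u);
}
```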

PR target/94866

gcc/ChangeLog:

* config/i386/sse.md (*sse2_movq128__1): New insn pattern.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr94866.C: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index da85223a9b4..52104f8d1c9 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1770,6 +1770,18 @@ (define_insn "*sse2_movq128_"
(set_attr "prefix" "maybe_vex")
(set_attr "mode" "TI")])
 
+(define_insn "*sse2_movq128__1"
+  [(set (match_operand:VI8F_128 0 "register_operand" "=v")
+   (vec_merge:VI8F_128
+ (match_operand:VI8F_128 1 "nonimmediate_operand" "vm")
+ (match_operand:VI8F_128 2 "const0_operand")
+ (const_int 1)))]
+  "TARGET_SSE2"
+  "%vmovq\t{%1, %0|%0, %q1}"
+  [(set_attr "type" "ssemov")
+   (set_attr "prefix" "maybe_vex")
+   (set_attr "mode" "TI")])
+
 ;; Move a DI from a 32-bit register pair (e.g. %edx:%eax) to an xmm.
 ;; We'd rather avoid this entirely; if the 32-bit reg pair was loaded
 ;; from memory, we'd prefer to load the memory directly into the %xmm
diff --git a/gcc/testsuite/g++.target/i386/pr94866.C 
b/gcc/testsuite/g++.target/i386/pr94866.C
new file mode 100644
index 000..eb0f5ef11c5
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr94866.C
@@ -0,0 +1,13 @@
+// PR target/94866
+// { dg-do compile }
+// { dg-options "-O2 -msse4.1" }
+// { dg-require-effective-target c++11 }
+
+typedef long long v2di __attribute__((vector_size(16)));
+
+v2di _mm_move_epi64(v2di a)
+{
+return v2di{a[0], 0LL};
+}
+
+// { dg-final { scan-assembler-times "movq\[ \\t\]+\[^\n\]*%xmm" 1 } }

[PATCH ver 3] rs6000, add overloaded DFP quantize support

2023-08-24 Thread Carl Love via Gcc-patches

GCC maintainers:

Version 3, fixed the built-in instance names.  Missed removing the "n"
from the name.  Added the tighter constraints on the predicates for the
define_insn.  Updated the wording for the built-ins in the
documentation file.  Changed the test file name again.  Updated the
ChangeLog file, added the PR target line.  Retested the patch on Power
10LE and Power 8 and Power 9.

Version 2, renamed the built-in instances.  Changed the name of the
overloaded built-in.  Added the missing documentation for the new
built-ins.  Fixed typos.  Changed name of the test.  Updated the
effective target for the test.  Retested the patch on Power 10LE and
Power 8 and Power 9.

The following patch adds four built-ins for the decimal floating point
(DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
and 128-bit DFP operands.

The patch also adds a test case for the new builtins.

The patch has been tested on Power 10LE and Power 9 LE/BE.

Please let me know if the patch is acceptable for mainline.  Thanks.

 Carl Love


---
rs6000, add overloaded DFP quantize support

Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
and 128-bit DFP operands.  In each case, there is an immediate version and a
variable version of the built-in.  The RM value is a 2-bit constant int
which specifies the rounding mode to use.  For the immediate versions of
the built-in, the TE field is a 5-bit constant that specifies the value of
the ideal exponent for the result.  The built-in specifications are:

  _Decimal64 __builtin_dfp_quantize (_Decimal64, _Decimal64,
                                     const int RM)
  _Decimal64 __builtin_dfp_quantize (const int TE, _Decimal64,
                                     const int RM)
  _Decimal128 __builtin_dfp_quantize (_Decimal128, _Decimal128,
                                      const int RM)
  _Decimal128 __builtin_dfp_quantize (const int TE, _Decimal128,
                                      const int RM)

A testcase is added for the new built-in definitions.
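
For readers unfamiliar with the operation, the following is a portable
sketch of what "quantize to an exponent" means.  It does not call the new
built-ins and does not model the RM or TE encodings: decimal_sketch,
rounding_sketch and quantize_sketch are hypothetical names, a value is
represented as coeff * 10^exp, overflow is ignored, and only two rounding
behaviours are shown as examples.

```cpp
#include <cassert>

// Hypothetical decimal value: coeff * 10^exp.
struct decimal_sketch { long long coeff; int exp; };

// Example rounding behaviours only; not the RM field encoding.
enum class rounding_sketch { toward_zero, nearest_even };

// Re-express V with exponent TARGET_EXP, rounding if digits are dropped.
decimal_sketch
quantize_sketch (decimal_sketch v, int target_exp, rounding_sketch rm)
{
  // Finer target exponent: scaling the coefficient up is exact.
  while (v.exp > target_exp)
    {
      v.coeff *= 10;
      --v.exp;
    }
  if (v.exp == target_exp)
    return v;

  // Coarser target exponent: divide and round the dropped digits.
  long long divisor = 1;
  for (int e = v.exp; e < target_exp; ++e)
    divisor *= 10;
  long long q = v.coeff / divisor;        // truncates toward zero
  const long long r = v.coeff % divisor;  // same sign as v.coeff
  if (rm == rounding_sketch::nearest_even)
    {
      const long long twice = 2 * (r < 0 ? -r : r);
      if (twice > divisor || (twice == divisor && q % 2 != 0))
        q += v.coeff < 0 ? -1 : 1;
    }
  return { q, target_exp };
}
```

For example, 12.345 quantized to two fractional digits gives 12.34 under
both behaviours shown here (the tie goes to the even digit), while 12.355
gives 12.35 toward zero and 12.36 under round-to-nearest-even.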

gcc/ChangeLog:
* config/rs6000/dfp.md: New UNSPEC_DQUAN.
(dfp_dqua_, dfp_dqua_i): New define_insn.
* config/rs6000/rs6000-builtins.def (__builtin_dfp_dqua,
__builtin_dfp_dquai, __builtin_dfp_dquaq, __builtin_dfp_dquaqi):
New built-in definitions.
* config/rs6000/rs6000-overload.def (__builtin_dfp_quantize): New
overloaded definition.
* doc/extend.texi: Add documentation for __builtin_dfp_quantize.

gcc/testsuite/
* gcc.target/powerpc/pr93448.c: New test case.

PR target/93448
---
 gcc/config/rs6000/dfp.md   |  25 ++-
 gcc/config/rs6000/rs6000-builtins.def  |  15 ++
 gcc/config/rs6000/rs6000-overload.def  |  10 ++
 gcc/doc/extend.texi|  17 ++
 gcc/testsuite/gcc.target/powerpc/pr93448.c | 200 +
 5 files changed, 266 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448.c

diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
index 5ed8a73ac51..052dc0946d3 100644
--- a/gcc/config/rs6000/dfp.md
+++ b/gcc/config/rs6000/dfp.md
@@ -271,7 +271,8 @@ (define_c_enum "unspec"
UNSPEC_DIEX
UNSPEC_DSCLI
UNSPEC_DTSTSFI
-   UNSPEC_DSCRI])
+   UNSPEC_DSCRI
+   UNSPEC_DQUAN])
 
 (define_code_iterator DFP_TEST [eq lt gt unordered])
 
@@ -395,3 +396,25 @@ (define_insn "dfp_dscri_"
   "dscri %0,%1,%2"
   [(set_attr "type" "dfp")
(set_attr "size" "")])
+
+(define_insn "dfp_dqua_"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dqua %0,%1,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
+
+(define_insn "dfp_dqua_i"
+  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
+(unspec:DDTD [(match_operand:SI 1 "s5bit_cint_operand" "n")
+ (match_operand:DDTD 2 "gpc_reg_operand" "d")
+ (match_operand:SI 3 "const_0_to_3_operand" "n")]
+ UNSPEC_DQUAN))]
+  "TARGET_DFP"
+  "dquai %1,%0,%2,%3"
+  [(set_attr "type" "dfp")
+   (set_attr "size" "")])
diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def
index 8a294d6c934..81a0de88b9c 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -2983,6 +2983,21 @@
   const unsigned long long __builtin_unpack_dec128 (_Decimal128, const int<1>);
 UNPACK_TD unpacktd {}
 
+  const _Decimal64 __builtin_dfp_dqua (_Decimal64, _Decimal64, \
+  const int<2>);
+DFPQUAN_64 dfp_dqua_dd {}
+
+  const _Decimal64 __builtin_dfp_dquai (const int<5>, _Decimal64, \
+

Re: [PATCH ver 2] rs6000, add overloaded DFP quantize support

2023-08-24 Thread Carl Love via Gcc-patches

Kewen, Peter:

> on 2023/8/17 08:19, Carl Love wrote:
> > GCC maintainers:
> > 
> > Version 2, renamed the built-in instances.  Changed the name of the
> > overloaded built-in.  Added the missing documentation for the new
> > built-ins.  Fixed typos.  Changed name of the test.  Updated the
> > effective target for the test.  Retested the patch on Power 10LE and
> > Power 8 and Power 9.
> > 
> > The following patch adds four built-ins for the decimal floating point
> > (DFP) quantize instructions on rs6000.  The built-ins are for 64-bit
> > and 128-bit DFP operands.
> > 
> > The patch also adds a test case for the new builtins.
> > 
> > The patch has been tested on Power 10LE and Power 9 LE/BE.
> > 
> > Please let me know if the patch is acceptable for mainline.  Thanks.
> > 
> >  Carl Love
> > 
> > 
> > 
> > --
> > [PATCH] rs6000, add overloaded DFP quantize support
> > 
> > Add decimal floating point (DFP) quantize built-ins for both 64-bit DFP
> > and 128-bit DFP operands.  In each case, there is an immediate version and a
> > variable version of the built-in.  The RM value is a 2-bit constant int
> > which specifies the rounding mode to use.  For the immediate versions of
> > the built-in, the TE field is a 5-bit constant that specifies the value of
> > the ideal exponent for the result.  The built-in specifications are:
> > 
> >   __Decimal64 builtin_dfp_quantize (_Decimal64, _Decimal64,
> > const int RM)
> >   __Decimal64 builtin_dfp_quantize (const int TE, _Decimal64,
> > const int)
> >   __Decimal128 builtin_dfp_quantize (_Decimal128, _Decimal128,
> >  const int RM)
> >   __Decimal128 builtin_dfp_quantize (const int TE, _Decimal128,
> >  const int)
> 
> Nit: Add the parameter name "RM" for all instances, otherwise the readers
> might feel confused what do the other two without RM mean. :)

Yes, they all should have the parameter name RM.  Fixed.

> 
> > A testcase is added for the new built-in definitions.
> 
> Nit: A PR marker line like:
> 
>   PR target/93448
> 
> > gcc/ChangeLog:
> > * config/rs6000/dfp.md: New UNSPECDQUAN.
> > (dfp_quan_, dfp_quan_i): New define_insn.
> > * config/rs6000/rs6000-builtins.def (__builtin_dfp_quantize_64,
> > __builtin_dfp_quantize_64i, __builtin_dfp_quantize_128,
> > __builtin_dfp_quantize_128i): New built-in definitions.
> > * config/rs6000/rs6000-overload.def (__builtin_dfp_quantize,
> > __builtin_dfpq_quantize): New overloaded definitions.
> 
> These entries need updates with this new revision, also miss one
> entry for documentation update.

Fixed with the new names, added the documentation entry.
> 
> > gcc/testsuite/
> >  * gcc.target/powerpc/builtin-dfp-quantize-runnable.c: New test
> > case.
> 
> Ditto, inconsistent name.

Fixed with the new name of the file, pr93448.c.

> 
> > ---
> >  gcc/config/rs6000/dfp.md  |  25 ++-
> >  gcc/config/rs6000/rs6000-builtins.def |  15 ++
> >  gcc/config/rs6000/rs6000-overload.def |  10 +
> >  gcc/doc/extend.texi   |  15 ++
> >  .../gcc.target/powerpc/pr93448-dfp-quantize.c | 199
> ++
> >  5 files changed, 263 insertions(+), 1 deletion(-)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr93448-dfp-
> quantize.c
> > diff --git a/gcc/config/rs6000/dfp.md b/gcc/config/rs6000/dfp.md
> > index 5ed8a73ac51..abd21c5db75 100644
> > --- a/gcc/config/rs6000/dfp.md
> > +++ b/gcc/config/rs6000/dfp.md
> > @@ -271,7 +271,8 @@
> > UNSPEC_DIEX
> > UNSPEC_DSCLI
> > UNSPEC_DTSTSFI
> > -   UNSPEC_DSCRI])
> > +   UNSPEC_DSCRI
> > +   UNSPEC_DQUAN])
> >  
> >  (define_code_iterator DFP_TEST [eq lt gt unordered])
> >  
> > @@ -395,3 +396,25 @@
> >"dscri %0,%1,%2"
> >[(set_attr "type" "dfp")
> > (set_attr "size" "")])
> > +
> > +(define_insn "dfp_dquan_"
> 
> I guess I mentioned this previously, I prefer "dfp_dqua_"
> which aligns with the most others ...

Yes, I missed that I had the extra "n" and didn't fix that part of the
name.  Sorry about that.  Updated both define_insn definitions.

> 
> > +  [(set (match_operand:DDTD 0 "gpc_reg_operand" "=d")
> > +(unspec:DDTD [(match_operand:DDTD 1 "gpc_reg_operand" "d")
> > + (match_operand:DDTD 2 "gpc_reg_operand" "d")
> > + (match_operand:QI 3 "immediate_operand" "i")]
> > + UNSPEC_DQUAN))]
> > +  "TARGET_DFP"
> > +  "dqua %0,%1,%2,%3"
> > +  [(set_attr "type" "dfp")
> > +   (set_attr "size" "")])
> > +
> > +(define_insn "dfp_dquan_i"
> 
> ..., also prefer "dfp_dquai_" here.

Ditto on the name change fix.

> 
> Please also incorporate Peter's insightful comments on predicates
> and constraints on this part.

OK, changed to the stricter predicate constraints.

> 
> > +  [(set

[PATCH 3/3] PHIOPT: Allow BIT_AND and BIT_IOR in early phiopt

2023-08-24 Thread Andrew Pinski via Gcc-patches

Now that MIN/MAX can sometimes be transformed into BIT_AND/BIT_IOR,
we should allow BIT_AND and BIT_IOR in the early phiopt.
Also we produce BIT_AND/BIT_IOR for things like `bool0 ? bool1 : 0`
which seems like a good thing to allow early on too.
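
For anyone following along, here is a minimal C++ sketch (not part of the
patch) that checks the four `a ? one_zero : one_zero` rewrites exhaustively
for 0/1-valued inputs.  The function names are made up for illustration; the
second helper shows why the operands must be known to be 0 or 1.
```cpp
#include <cassert>

// Checks the four rewrites for every combination of 0/1 inputs:
//   c ? b : 0  ->  c & b
//   c ? 0 : b  ->  (c ^ 1) & b
//   c ? 1 : b  ->  c | b
//   c ? b : 1  ->  (c ^ 1) | b
bool zero_one_identities_hold() {
  for (unsigned c = 0; c <= 1; ++c) {
    for (unsigned b = 0; b <= 1; ++b) {
      if ((c ? b : 0u) != (c & b)) return false;
      if ((c ? 0u : b) != ((c ^ 1u) & b)) return false;
      if ((c ? 1u : b) != (c | b)) return false;
      if ((c ? b : 1u) != ((c ^ 1u) | b)) return false;
    }
  }
  return true;
}

// Checks only the first rewrite, for arbitrary values.  With inputs
// outside {0, 1} the rewrite can fail, e.g. c = 2, b = 1.
bool first_rewrite_holds_for(unsigned c, unsigned b) {
  return (c ? b : 0u) == (c & b);
}
```
The sketch only models the value-level identities; it says nothing about
when phiopt or match.pd actually applies them.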

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* tree-ssa-phiopt.cc (phiopt_early_allow): Allow
BIT_AND_EXPR and BIT_IOR_EXPR.
---
 gcc/tree-ssa-phiopt.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 54706f4c7e7..7e63fb115db 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -469,6 +469,9 @@ phiopt_early_allow (gimple_seq &seq, gimple_match_op &op)
 {
   case MIN_EXPR:
   case MAX_EXPR:
+  /* MIN/MAX could be converted into these.  */
+  case BIT_IOR_EXPR:
+  case BIT_AND_EXPR:
   case ABS_EXPR:
   case ABSU_EXPR:
   case NEGATE_EXPR:
-- 
2.31.1

[PATCH 1/3] MATCH: Move `a ? one_zero : one_zero` matching after min/max matching

2023-08-24 Thread Andrew Pinski via Gcc-patches

In PR 106677, I noticed that on the trunk we were producing:
```
  _25 = SR.116_117 == 0;
  _27 = (unsigned char) _25;
  _32 = _27 | SR.116_117;
```
From `SR.115_117 != 0 ? SR.115_117 : 1`
Rather than:
```
  _119 = MAX_EXPR <1, SR.115_117>;
```
Or (rather)
```
  _119 = SR.115_117 | 1;
```
Due to the order of the patterns.
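
As a rough illustration (a sketch, not taken from the PR), the forms above
can be compared at the value level in C++.  The helper names are
hypothetical.  Note that the `x | 1` form only matches the others when x is
known to be 0 or 1, which is assumed here to be the case for the SSA name in
the dump.
```cpp
#include <algorithm>
#include <cassert>

// x != 0 ? x : 1, the source-level form.
unsigned ternary_form(unsigned x) { return x != 0 ? x : 1u; }

// (x == 0) | x, the shape of the three-statement sequence on trunk.
unsigned ior_of_compare_form(unsigned x) {
  return static_cast<unsigned>(x == 0) | x;
}

// MAX_EXPR <1, x>.
unsigned max_form(unsigned x) { return std::max(1u, x); }

// x | 1, valid as a replacement only when x is known to be 0 or 1.
unsigned ior_one_form(unsigned x) { return x | 1u; }

// True when the first three forms agree for every x in [0, limit].
bool general_forms_agree_up_to(unsigned limit) {
  for (unsigned x = 0; x <= limit; ++x) {
    if (ternary_form(x) != ior_of_compare_form(x)) return false;
    if (ternary_form(x) != max_form(x)) return false;
  }
  return true;
}
```
For x = 2 the ternary gives 2 while `x | 1` gives 3, so the last form really
does depend on the range of x.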

OK? Bootstrapped and tested on x86_64-linux-gnu with no
regressions.

gcc/ChangeLog:

* match.pd (`a ? one_zero : one_zero`): Move
below detection of minmax.
---
 gcc/match.pd | 38 --
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/gcc/match.pd b/gcc/match.pd
index 890f050cbad..c87a0795667 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4950,24 +4950,6 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  )
 )
 
-(simplify
- (cond @0 zero_one_valued_p@1 zero_one_valued_p@2)
- (switch
-  /* bool0 ? bool1 : 0 -> bool0 & bool1 */
-  (if (integer_zerop (@2))
-   (bit_and (convert @0) @1))
-  /* bool0 ? 0 : bool2 -> (bool0^1) & bool2 */
-  (if (integer_zerop (@1))
-   (bit_and (bit_xor (convert @0) { build_one_cst (type); } ) @2))
-  /* bool0 ? 1 : bool2 -> bool0 | bool2 */
-  (if (integer_onep (@1))
-   (bit_ior (convert @0) @2))
-  /* bool0 ? bool1 : 1 -> (bool0^1) | bool1 */
-  (if (integer_onep (@2))
-   (bit_ior (bit_xor (convert @0) @2) @1))
- )
-)
-
 /* Optimize
# x_5 in range [cst1, cst2] where cst2 = cst1 + 1
x_5 ? cstN ? cst4 : cst3
@@ -5298,6 +5280,26 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && integer_nonzerop (fold_build2 (GE_EXPR, boolean_type_node, @3, @1)))
   (max @2 @4))
 
+#if GIMPLE
+(simplify
+ (cond @0 zero_one_valued_p@1 zero_one_valued_p@2)
+ (switch
+  /* bool0 ? bool1 : 0 -> bool0 & bool1 */
+  (if (integer_zerop (@2))
+   (bit_and (convert @0) @1))
+  /* bool0 ? 0 : bool2 -> (bool0^1) & bool2 */
+  (if (integer_zerop (@1))
+   (bit_and (bit_xor (convert @0) { build_one_cst (type); } ) @2))
+  /* bool0 ? 1 : bool2 -> bool0 | bool2 */
+  (if (integer_onep (@1))
+   (bit_ior (convert @0) @2))
+  /* bool0 ? bool1 : 1 -> (bool0^1) | bool1 */
+  (if (integer_onep (@2))
+   (bit_ior (bit_xor (convert @0) @2) @1))
+ )
+)
+#endif
+
 /* X != C1 ? -X : C2 simplifies to -X when -C1 == C2.  */
 (simplify
  (cond (ne @0 INTEGER_CST@1) (negate@3 @0) INTEGER_CST@2)
-- 
2.31.1

[PATCH 2/3] MATCH: `a | C -> C` when we know that `a & ~C == 0`

2023-08-24 Thread Andrew Pinski via Gcc-patches

Even though this is handled by other code inside both VRP and CCP,
sometimes we want to optimize this outside of VRP and CCP.
An example is given in PR 106677 where phiopt will happen
after VRP (which removes a cast for a comparison) and then
phiopt will optimize the phi to be `a | 1` which can then
be optimized to `1` due to this patch.
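
A small C++ sketch of the condition, for illustration only: `nonzero_bits`
below is a plain mask standing in for what get_nonzero_bits would report,
and both function names are hypothetical.  The second helper brute-forces
every value compatible with the mask to confirm the fold is sound.
```cpp
#include <cassert>
#include <cstdint>

// True when every bit that may be set in x is already set in c,
// i.e. x & ~c == 0 for all admissible x, so x | c == c.
bool ior_folds_to_constant(std::uint32_t nonzero_bits, std::uint32_t c) {
  return (nonzero_bits & ~c) == 0;
}

// Enumerates every submask x of nonzero_bits and checks x | c == c
// whenever the fold condition holds.  Intended for small masks only.
bool fold_is_sound(std::uint32_t nonzero_bits, std::uint32_t c) {
  if (!ior_folds_to_constant(nonzero_bits, c)) return true;  // nothing claimed
  std::uint32_t x = nonzero_bits;
  while (true) {
    if ((x | c) != c) return false;
    if (x == 0) break;
    x = (x - 1) & nonzero_bits;  // next smaller submask
  }
  return true;
}
```
With nonzero_bits = 1 and c = 1 this is the `a | 1 -> 1` case from the PR.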

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note: similar code already exists in simplify_rtx for the RTL level;
it was moved from combine to simplify_rtx in r0-72539-gbd1ef757767f6d.

gcc/ChangeLog:

* match.pd (`a | C -> C`): New pattern.
---
 gcc/match.pd | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/match.pd b/gcc/match.pd
index c87a0795667..3bbeceb37b4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1456,6 +1456,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
   && wi::bit_and_not (get_nonzero_bits (@0), wi::to_wide (@1)) == 0)
   @0))
+/* x | C -> C if we know that x & ~C == 0.  */
+(simplify
+ (bit_ior SSA_NAME@0 INTEGER_CST@1)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+  && wi::bit_and_not (get_nonzero_bits (@0), wi::to_wide (@1)) == 0)
+  @1))
 #endif
 
 /* ~(~X - Y) -> X + Y and ~(~X + Y) -> X - Y.  */
-- 
2.31.1

Re: [PATCH] RISC-V: Move vector-abi testcases into rvv/base folder

2023-08-24 Thread Palmer Dabbelt


On Thu, 24 Aug 2023 11:04:59 PDT (-0700), Patrick O'Neill wrote:

Resolves failures like this on rv32gcv linux:
compiler exited with status 1
output is:
In file included from /tc-baseline/build-linux-gcv/sysroot/usr/include/features.h:515,
                 from /tc-baseline/build-linux-gcv/sysroot/usr/include/bits/libc-header-start.h:33,
                 from /tc-baseline/build-linux-gcv/sysroot/usr/include/stdint.h:26,
                 from /tc-baseline/build-linux-gcv/lib/gcc/riscv32-unknown-linux-gnu/14.0.0/include/stdint.h:9,
                 from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/stdint.h:9,
                 from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/riscv_vector.h:28,
                 from /tc-baseline/gcc/gcc/testsuite/gcc.target/riscv/vector-abi-1.c:4:
/tc-baseline/build-linux-gcv/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.

Tested using:
rv{32/64}{gc/gcv} newlib
rv{32/64}gcv linux

gcc/testsuite/ChangeLog:

* gcc.target/riscv/vector-abi-1.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-1.c: ...here.
* gcc.target/riscv/vector-abi-2.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-2.c: ...here.
* gcc.target/riscv/vector-abi-3.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-3.c: ...here.
* gcc.target/riscv/vector-abi-4.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-4.c: ...here.
* gcc.target/riscv/vector-abi-5.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-5.c: ...here.
* gcc.target/riscv/vector-abi-6.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-6.c: ...here.
* gcc.target/riscv/vector-abi-7.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-7.c: ...here.
* gcc.target/riscv/vector-abi-8.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-8.c: ...here.
* gcc.target/riscv/vector-abi-9.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-9.c: ...here.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-1.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-2.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-3.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-4.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-5.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-6.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-7.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-8.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-9.c | 0
 9 files changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-1.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-2.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-3.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-4.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-5.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-6.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-7.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-8.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-9.c (100%)

diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-1.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-2.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-2.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-2.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-3.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-3.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-3.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-4.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-4.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-4.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-5.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-5.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-5.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-5.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-6.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-6.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-6.c
rename to

[PATCH] RISC-V: Move vector-abi testcases into rvv/base folder

2023-08-24 Thread Patrick O'Neill

Resolves failures like this on rv32gcv linux:
compiler exited with status 1
output is:
In file included from /tc-baseline/build-linux-gcv/sysroot/usr/include/features.h:515,
                 from /tc-baseline/build-linux-gcv/sysroot/usr/include/bits/libc-header-start.h:33,
                 from /tc-baseline/build-linux-gcv/sysroot/usr/include/stdint.h:26,
                 from /tc-baseline/build-linux-gcv/lib/gcc/riscv32-unknown-linux-gnu/14.0.0/include/stdint.h:9,
                 from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/stdint.h:9,
                 from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/riscv_vector.h:28,
                 from /tc-baseline/gcc/gcc/testsuite/gcc.target/riscv/vector-abi-1.c:4:
/tc-baseline/build-linux-gcv/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory
compilation terminated.

Tested using:
rv{32/64}{gc/gcv} newlib
rv{32/64}gcv linux

gcc/testsuite/ChangeLog:

* gcc.target/riscv/vector-abi-1.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-1.c: ...here.
* gcc.target/riscv/vector-abi-2.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-2.c: ...here.
* gcc.target/riscv/vector-abi-3.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-3.c: ...here.
* gcc.target/riscv/vector-abi-4.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-4.c: ...here.
* gcc.target/riscv/vector-abi-5.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-5.c: ...here.
* gcc.target/riscv/vector-abi-6.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-6.c: ...here.
* gcc.target/riscv/vector-abi-7.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-7.c: ...here.
* gcc.target/riscv/vector-abi-8.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-8.c: ...here.
* gcc.target/riscv/vector-abi-9.c: Moved to...
* gcc.target/riscv/rvv/base/vector-abi-9.c: ...here.

Signed-off-by: Patrick O'Neill 
---
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-1.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-2.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-3.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-4.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-5.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-6.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-7.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-8.c | 0
 gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-9.c | 0
 9 files changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-1.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-2.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-3.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-4.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-5.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-6.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-7.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-8.c (100%)
 rename gcc/testsuite/gcc.target/riscv/{ => rvv/base}/vector-abi-9.c (100%)

diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-1.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-2.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-2.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-2.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-3.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-3.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-3.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-4.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-4.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-4.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-5.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-5.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-5.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-5.c
diff --git a/gcc/testsuite/gcc.target/riscv/vector-abi-6.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-6.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/vector-abi-6.c
rename to gcc/testsuite/gcc.target/riscv/rvv/base/vector-abi-6.c
diff --git a

Re: [PATCH] rs6000: Fix issue in specifying PTImode as an attribute [PR106895]

2023-08-24 Thread Michael Meissner via Gcc-patches

On Thu, Jul 20, 2023 at 10:05:28AM +0530, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> When the user specifies PTImode as an attribute, it breaks. Created
> a tree node to handle PTImode types. PTImode attribute helps in generating
> even/odd register pairs on 128 bits.
> 
> 2023-07-20  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/110411
>   * config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Add fields
>   to hold PTImode type.
>   * config/rs6000/rs6000-builtin.cc (rs6000_init_builtins): Add node
>   for PTImode type.
> 
> gcc/testsuite/
>   PR target/106895
>   * gcc.target/powerpc/pr106895.c: New testcase.

It is good as far as it goes, but I suspect we will eventually need to extend
it.  In particular, the reason people need PTImode is they need the even/odd
register layout.  What you've done enables users to declare this value.

However, it is likely that users (mostly kernel users) will want to use it with
the atomic built-in functions that take 16-byte values.  So I suspect we will
need to add overloads for those built-ins to allow either TImode or PTImode to
be used.  Note, the PTImode built-in would bypass the TImode parts where they
convert a TImode into PTImode.

This is the reason PTImode was created in the first place.  Due to the calling
sequence, TImode could be passed in odd/even (as well as even/odd) register
pairs, but the atomic insns and lq/stq need even/odd register pairs.  But if
you are calling a built-in with PTImode, you don't have to convert it to
PTImode.

But then the next problem is what happens when people start using it.  Do we
need to add all of the TImode insns (Add, subtract, and, ior, xor, shifts at
the very least)?  These are the things I expect people might want to do for
memory accessed via atomic insns.

Then we get to the thorny problems of load/store on little endian systems, and
whether we define the order of the two registers.  Unfortunately, the lq/stq
instructions will load words in the opposite order from plq/pstq.  I imagine the
kernel folk want to use lq/stq, but we may have to figure out exactly what they
want.

If we define any form of operation on PTImode, we likely need to define whether
register 0 has the high bits or low bits.

Sorry to be so negative, but those are a lot of the issues that might come up
as people use it.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com

Re: [PATCH] sso-string@gnu-versioned-namespace [PR83077]

2023-08-24 Thread François Dumont via Gcc-patches


I've now prepared the patch to support the following config:

--disable-libstdcxx-dual-abi --with-default-libstdcxx-abi=new

and in doing so found yet another problem with src/c++98/compatibility.cc. We
need the basic_istream<>::ignore(streamsize) definitions that live there, but
not the rest of the file.


François

On 17/08/2023 19:17, François Dumont wrote:


Another fix to define __cow_string(const std::string&) in 
cxx11-stdexcept.cc even if ! _GLIBCXX_USE_DUAL_ABI.


On 13/08/2023 21:51, François Dumont wrote:


Here is another version with enhanced sizeof/alignof static_assert in 
string-inst.cc for the std::__cow_string definition from . 
The assertions in cow-stdexcept.cc are now checking the definition 
which is in the same file.


On 13/08/2023 15:27, François Dumont wrote:


Here is the fixed patch tested in all 3 modes:

- _GLIBCXX_USE_DUAL_ABI

- !_GLIBCXX_USE_DUAL_ABI && !_GLIBCXX_USE_CXX11_ABI

- !_GLIBCXX_USE_DUAL_ABI && _GLIBCXX_USE_CXX11_ABI

I don't know what you have in mind for the change below but I wanted 
to let you know that I tried to put COW std::basic_string into a 
nested __cow namespace when _GLIBCXX_USE_CXX11_ABI. But it had more 
impact on string-inst.cc so I preferred the macro substitution approach.


There are some tests failing when !_GLIBCXX_USE_CXX11_ABI that are
unrelated to my changes. I'll propose fixes in the coming days.


    libstdc++: [_GLIBCXX_INLINE_VERSION] Use cxx11 abi [PR83077]

    Use cxx11 abi when activating versioned namespace mode. To do so, support
    a new configuration mode where !_GLIBCXX_USE_DUAL_ABI and
    _GLIBCXX_USE_CXX11_ABI.

    The main change is that std::__cow_string is now defined whenever
    _GLIBCXX_USE_DUAL_ABI or _GLIBCXX_USE_CXX11_ABI is true. The implementation
    uses the available std::string in case of dual abi and a subset of it when
    it's not.

    On the other side std::__sso_string is defined only when
    _GLIBCXX_USE_DUAL_ABI is true and _GLIBCXX_USE_CXX11_ABI is false. Meaning
    that std::__sso_string is a typedef for the cow std::string implementation
    when dual abi is disabled and cow string is being used.


    libstdc++-v3/ChangeLog:

    PR libstdc++/83077
    * acinclude.m4 [GLIBCXX_ENABLE_LIBSTDCXX_DUAL_ABI]: 
Default to "new" libstdcxx abi.
    * config/locale/dragonfly/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Define money_base

    members.
    * config/locale/generic/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Likewise.
    * config/locale/gnu/monetary_members.cc 
[!_GLIBCXX_USE_DUAL_ABI]: Likewise.

    * config/locale/gnu/numeric_members.cc
[!_GLIBCXX_USE_DUAL_ABI](__narrow_multibyte_chars): Define.
    * configure: Regenerate.
    * include/bits/c++config
[_GLIBCXX_INLINE_VERSION](_GLIBCXX_NAMESPACE_CXX11, 
_GLIBCXX_BEGIN_NAMESPACE_CXX11):

    Define empty.
[_GLIBCXX_INLINE_VERSION](_GLIBCXX_END_NAMESPACE_CXX11, 
_GLIBCXX_DEFAULT_ABI_TAG):

    Likewise.
    * include/bits/cow_string.h [!_GLIBCXX_USE_CXX11_ABI]: 
Define a light version of COW

    basic_string as __std_cow_string for use in stdexcept.
    * include/std/stdexcept [_GLIBCXX_USE_CXX11_ABI]: Define 
__cow_string.

    (__cow_string(const char*)): New.
    (__cow_string::c_str()): New.
    * python/libstdcxx/v6/printers.py 
(StdStringPrinter::__init__): Set self.new_string to True

    when std::__8::basic_string type is found.
    * src/Makefile.am 
[ENABLE_SYMVERS_GNU_NAMESPACE](ldbl_alt128_compat_sources): Define 
empty.

    * src/Makefile.in: Regenerate.
    * src/c++11/Makefile.am (cxx11_abi_sources): Rename into...
    (dual_abi_sources): ...this. Also move 
cow-local_init.cc, cxx11-hash_tr1.cc,

    cxx11-ios_failure.cc entries to...
    (sources): ...this.
    (extra_string_inst_sources): Move cow-fstream-inst.cc, 
cow-sstream-inst.cc, cow-string-inst.cc,
    cow-string-io-inst.cc, cow-wtring-inst.cc, 
cow-wstring-io-inst.cc, cxx11-locale-inst.cc,

    cxx11-wlocale-inst.cc entries to...
    (inst_sources): ...this.
    * src/c++11/Makefile.in: Regenerate.
    * src/c++11/cow-fstream-inst.cc 
[_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
    * src/c++11/cow-locale_init.cc [_GLIBCXX_USE_CXX11_ABI]: 
Skip definitions.
    * src/c++11/cow-sstream-inst.cc 
[_GLIBCXX_USE_CXX11_ABI]: Skip definitions.
    * src/c++11/cow-stdexcept.cc [_GLIBCXX_USE_CXX11_ABI]: 
Include .
    [_GLIBCXX_USE_DUAL_ABI || 
_GLIBCXX_USE_CXX11_ABI](__cow_string): Redefine before
    including . Define 
_GLIBCXX_DEFINE_STDEXCEPT_INSTANTIATIONS so that

    __cow_string definition in  is skipped.
    [_GLIBCXX_USE_CXX11_ABI]: Skip Transaction Memory TS 
definitions.
    Move static_assert to check std::_cow_string abi layout 
to...

    * src/c++11/string-inst.cc: ...here

Re: [PATCH V2] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-24 Thread Kito Cheng via Gcc-patches

>
>-  Phase 3 - Backward && forward demanded info propagation and fusion across
>   blocks.
>

The comment here needs updating.

>-  Phase 6 - Propagate AVL between vsetvl instructions.

The comment here needs updating too.

> +/* Return true if the current VSETVL is dominated by preceding VSETVL.  */
> +static bool
> +vsetvl_dominated_by_p (const basic_block cfg_bb,
> +  const vector_insn_info &vsetvl1,
> +  const vector_insn_info &vsetvl2, bool fuse_p)

Should this read "VSETVL1 is dominated by the preceding VSETVL2"?
Also, what is the definition of "dominated" here?
It does not seem to be the traditional sense of "dominate".


> vector_insn_info::merge (const vector_insn_info &merge_info,
> -enum merge_type type) const
> +enum merge_type type, int bb_index) const

I would suggest just splitting this into two functions, local_merge and
global_merge, and removing merge_type.  Generally I like generalizing such
functions through arguments, but those two are different enough after this
change.


> +  /* Recompute the AVL source when bb_index*/

This sentence seems to be incomplete?


> + if (dest_block_info.probability > src_block_info.probability)
> +   prob = dest_block_info.probability;

prob = std::max(dest_block_info.probability, src_block_info.probability);
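
To illustrate the suggestion, here is a minimal sketch using a made-up
`Probability` struct as a stand-in for the real probability type; it assumes
the type is ordered by `operator<` and that the branch not shown in the quoted
hunk keeps the source block's value.  Neither assumption is confirmed by the
patch.
```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the block probability type.
struct Probability {
  int permille;
  friend bool operator<(const Probability& a, const Probability& b) {
    return a.permille < b.permille;
  }
};

// The explicit form: start from src, switch to dest when dest is larger.
Probability pick_with_branch(Probability dest, Probability src) {
  Probability prob = src;
  if (src < dest)
    prob = dest;
  return prob;
}

// The suggested one-liner.
Probability pick_with_max(Probability dest, Probability src) {
  return std::max(dest, src);
}
```
Both helpers return the larger of the two values; std::max just states that
intent directly.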

> @@ -3720,6 +3138,8 @@ pass_vsetvl::compute_local_properties (void)
>for (const bb_info *bb : crtl->ssa->bbs ())
>  {
>unsigned int curr_bb_idx = bb->index ();
> +  if (curr_bb_idx == ENTRY_BLOCK || curr_bb_idx == EXIT_BLOCK)
> +   continue;
>const auto local_dem
> = m_vector_manager->vector_block_infos[curr_bb_idx].local_dem;
>const auto reaching_out

This small change looks like a small optimization (an early exit for this
loop) and could be a separate patch?  If so, please send it separately;
it is pre-approved :)



> + if (src_block_info.reaching_out.empty_p ())
> +   {
...
> + else if (src_block_info.reaching_out.dirty_p ())

Could you add more comments explaining each condition?

> +   {
> + rtx vl = NULL_RTX;
> + if (!reaching_out.get_avl_source ())
> +   {
> + gcc_assert (vsetvl_insn_p (reaching_out.get_insn ()->rtl ()));
> + vl = get_vl (reaching_out.get_insn ()->rtl ());
> +   }
> + else
> +   vl = reaching_out.get_avl_reg_rtx ();
> + new_pat = gen_vsetvl_pat (VSETVL_NORMAL, reaching_out, vl);
> +   }

Need more comments here too.

> +  edge eg;
> +  edge_iterator eg_iterator;
> +  FOR_EACH_EDGE (eg, eg_iterator, cfg_bb->succs)
> {
> - fprintf (dump_file,
> -  "\nInsert vsetvl insn %d at the end of :\n",
> -  INSN_UID (new_insn), cfg_bb->index);
> - print_rtl_single (dump_file, new_insn);
> + /* We should not get an abnormal edge here.  */
> + gcc_assert (!(eg->flags & EDGE_ABNORMAL));
> + if (m_vector_manager->vsetvl_dominated_by_all_preds_p (cfg_bb,
> +
> reaching_out))
> +   continue;
> +

Also need more comments here.

Re: [PATCH] c++: Fix up mangling of function/block scope static structured bindings [PR111069]

2023-08-24 Thread Jakub Jelinek via Gcc-patches

On Wed, Aug 23, 2023 at 04:23:00PM -0400, Jason Merrill wrote:
> I'd be surprised if this would affect any real code, but I suppose so. In
> any case I'd like to fix this at the same time as the local statics, to
> avoid changing their mangled name twice.

Ok.
I'm now running into a problem with abi tags, because cp_maybe_mangle_decomp
is called before the type of the structured binding is finalized (the sequence
is cp_maybe_mangle_decomp; cp_finish_decl; cp_finish_decomp).  I vaguely
remember the reason was to have the name already mangled by cp_finish_decl
time, so that it can add it into varpool etc.
I will see if I can e.g. pass the initializer expression to
cp_maybe_mangle_decomp and figure out the tags from that.

> > @@ -9049,6 +9050,25 @@ cp_maybe_mangle_decomp (tree decl, tree
> > tree d = first;
> > for (unsigned int i = 0; i < count; i++, d = DECL_CHAIN (d))
> > v[count - i - 1] = d;
> > +  if (DECL_FUNCTION_SCOPE_P (decl))
> > +   {
> > + size_t sz = 3;
> > + for (unsigned int i = 0; i < count; ++i)
> > +   sz += IDENTIFIER_LENGTH (DECL_NAME (v[i])) + 1;
> > + char *name = XALLOCAVEC (char, sz);
> > + name[0] = 'D';
> > + name[1] = 'C';
> > + char *p = name + 2;
> > + for (unsigned int i = 0; i < count; ++i)
> > +   {
> > + size_t len = IDENTIFIER_LENGTH (DECL_NAME (v[i]));
> > + *p++ = ' ';
> > + memcpy (p, IDENTIFIER_POINTER (DECL_NAME (v[i])), len);
> > + p += len;
> > +   }
> > + *p = '\0';
> > + determine_local_discriminator (decl, get_identifier (name));
> > +   }
> 
> Maybe do this in mangle_decomp, based on the actual mangling in process
> instead of this pseudo-mangling?

Not sure that is possible, for 2 reasons:
1) determine_local_discriminator otherwise works on DECL_NAME, not mangled
   names, so if one uses (albeit implementation reserved)
   _ZZN1N3fooI1TB3bazEEivEDC1h1iEB6foobar and similar identifiers, they
   could clash with the counting of the structured bindings
2) seems the local discriminator counting shouldn't take into account
   details like abi tags, e.g. if I have:
struct [[gnu::abi_tag ("foobar")]] S { int a; };
namespace N {
  template <typename T>
  inline int
  foo ()
  {
    static int h = 42; int r = ++h;
    { static S h = { 42 }; r += ++h.a; }
    { static int h = 42; r += ++h; }
    { static S h = { 42 }; r += ++h.a; }
    return r;
  }
}
int (*p) () = N::foo<int>;
  then both GCC and Clang mangle these as
  _ZZN1N3fooIiEEivE1h
  _ZZN1N3fooIiEEivE1hB6foobar_0
  _ZZN1N3fooIiEEivE1h_1
  _ZZN1N3fooIiEEivE1hB6foobar_2
  so whether abi tags appear in the mangled name or not shouldn't result
  in separate buckets for counting.
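
The counting rule in point 2 can be sketched in C++ as follows.  This is
only an illustration with hypothetical names, not how
determine_local_discriminator is implemented: the counter is keyed on the
source-level name alone, and the abi tag is appended afterwards without
affecting the count.
```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper that hands out Itanium-style local discriminators,
// keyed purely by the declared name (abi tags are not part of the key).
class LocalDiscriminators {
public:
  // First occurrence gets no suffix, the second "_0", the third "_1", ...
  // Discriminators of 10 or more use the "__<number>_" form.
  std::string next_suffix(const std::string& name) {
    int seen = counts_[name]++;
    if (seen == 0) return "";
    int d = seen - 1;
    if (d < 10) return "_" + std::to_string(d);
    return "__" + std::to_string(d) + "_";
  }

private:
  std::map<std::string, int> counts_;
};

// Builds the mangled name of a local static `h` inside N::foo<int>(),
// with an optional abi tag such as "B6foobar" placed before the suffix.
std::string mangle_local_h(LocalDiscriminators& disc,
                           const std::string& abi_tag) {
  return "_ZZN1N3fooIiEEivE1h" + abi_tag + disc.next_suffix("h");
}
```
Calling mangle_local_h four times with tags "", "B6foobar", "", "B6foobar"
reproduces the four names listed above, which is the point: tagged and
untagged `h` share one counter.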

> 
> > @@ -4564,6 +4519,13 @@ write_guarded_var_name (const tree varia
> >   /* The name of a guard variable for a reference temporary should refer
> >  to the reference, not the temporary.  */
> >   write_string (IDENTIFIER_POINTER (DECL_NAME (variable)) + 4);
> > +  else if (DECL_DECOMPOSITION_P (variable)
> > +  && DECL_NAME (variable) == NULL_TREE
> > +  && startswith (IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (variable)),
> > + "_Z"))
> 
> Maybe add a startswith overload that takes an identifier?

Ok.

> > @@ -4630,7 +4592,10 @@ mangle_ref_init_variable (const tree var
> > start_mangling (variable);
> > write_string ("_ZGR");
> > check_abi_tags (variable);
> > -  write_name (variable, /*ignore_local_scope=*/0);
> > +  if (DECL_DECOMPOSITION_P (variable))
> > +write_guarded_var_name (variable);
> > +  else
> > +write_name (variable, /*ignore_local_scope=*/0);
> 
> Why not use write_guarded_name unconditionally?

Ok.

Jakub

Re: [PATCH V2 5/5] OpenMP: Fortran support for imperfectly-nested loops

2023-08-24 Thread Tobias Burnus


On 22.08.23 15:37, Jakub Jelinek wrote:

On Sun, Jul 23, 2023 at 04:15:21PM -0600, Sandra Loosemore wrote:

[...]
In the Fortran front end, most of the semantic processing happens during
the translation phase, so the parse phase just collects the intervening
statements, checks them for errors, and splices them around the loop body.

LGTM, but please let Tobias have a second look unless he has done so
already.


LGTM but some minor comments:

@@ -9764,13 +9820,12 @@ gfc_resolve_do_iterator (gfc_code *code, gfc_symbol *sym, bool add_clause)
...
-  c = c->block->next;
-}
+  c = find_nested_loop_in_chain (c->block->next);
+   }

Here the indentation of '}' is now in col 3 instead of 4.

* * *

I was wondering whether any removed error message (due to early return on error)
will cause missed error checking, but it looks as if all are covered elsewhere.
:-)

* * *

Can you update the implementation status, either as part of this patch (as it
is the last in the series) or as a follow-up? It feels good to see a 'Y'
getting added there.

https://gcc.gnu.org/onlinedocs/libgomp/OpenMP-5_002e0.html
"Collapse of associated loops that are imperfectly nested loops"

i.e. libgomp/libgomp.texi, line 203.

Thanks,

Tobias


Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-24 Thread Jakub Jelinek via Gcc-patches

On Thu, Aug 24, 2023 at 05:47:09PM +0200, Richard Biener via Gcc-patches wrote:
> > Do you think that the pass is worthy of inclusion into upstream GCC? What
> > are some things that I should change? Should I try to put the pass in
> > different places in passes.def?
> 
> The most obvious places would be right after SSA construction and before RTL
> expansion.
> Can you provide measurements for those positions?
> Can the pass somehow be used as part of propagations like during value
> numbering?

Could the new file be called gimple-ssa-sccp.cc or something similar?
Removing some PHIs is nice, but it would also be interesting to know what
the effects are on generated code size and/or performance,
and also whether it has any effect on debug information coverage.

Jakub

Re: [PATCH] testsuite: aarch64: Adjust SVE ACLE tests to new generated code

2023-08-24 Thread Prathamesh Kulkarni via Gcc-patches

On Thu, 24 Aug 2023 at 08:27, Thiago Jung Bauermann
 wrote:
>
> Since commit e7a36e4715c7 "[PATCH] RISC-V: Support simplify (-1-x) for
> vector." these tests fail on aarch64-linux:
>
> === g++ tests ===
>
> Running g++:g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp ...
> FAIL: gcc.target/aarch64/sve/acle/asm/subr_s8.c -std=gnu++98 -O2 
> -fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_FULL  
> check-function-bodies subr_m1_s8_m
> FAIL: gcc.target/aarch64/sve/acle/asm/subr_s8.c -std=gnu++98 -O2 
> -fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_OVERLOADS  
> check-function-bodies subr_m1_s8_m
> FAIL: gcc.target/aarch64/sve/acle/asm/subr_u8.c -std=gnu++98 -O2 
> -fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_FULL  
> check-function-bodies subr_m1_u8_m
> FAIL: gcc.target/aarch64/sve/acle/asm/subr_u8.c -std=gnu++98 -O2 
> -fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_OVERLOADS  
> check-function-bodies subr_m1_u8_m
>
> === gcc tests ===
>
> Running gcc:gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp ...
> FAIL: gcc.target/aarch64/sve/acle/asm/subr_s8.c -std=gnu90 -O2 
> -fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_FULL  
> check-function-bodies subr_m1_s8_m
> FAIL: gcc.target/aarch64/sve/acle/asm/subr_s8.c -std=gnu90 -O2 
> -fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_OVERLOADS  
> check-function-bodies subr_m1_s8_m
> FAIL: gcc.target/aarch64/sve/acle/asm/subr_u8.c -std=gnu90 -O2 
> -fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_FULL  
> check-function-bodies subr_m1_u8_m
> FAIL: gcc.target/aarch64/sve/acle/asm/subr_u8.c -std=gnu90 -O2 
> -fno-schedule-insns -DCHECK_ASM --save-temps -DTEST_OVERLOADS  
> check-function-bodies subr_m1_u8_m
>
> Andrew Pinski's analysis in PR testsuite/111071 is that the new code is
> better and the testcase should be updated. I also asked Prathamesh Kulkarni
> in private and he agreed.
>
> Here is the update. With this change, all tests in
> gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp pass on aarch64-linux.
>
> gcc/testsuite/
> PR testsuite/111071
> * gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c: Adjust to 
> new code.
> * gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c: Likewise.
>
> Suggested-by: Andrew Pinski 
> ---
>  gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c | 3 +--
>  gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c | 3 +--
>  2 files changed, 2 insertions(+), 4 deletions(-)
Hi Thiago,
The patch looks OK to me, but can't approve.

Thanks,
Prathamesh
>
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
> index b9615de6655f..3e521bc9ae32 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_s8.c
> @@ -76,8 +76,7 @@ TEST_UNIFORM_Z (subr_1_s8_m_untied, svint8_t,
>
>  /*
>  ** subr_m1_s8_m:
> -** mov (z[0-9]+\.b), #-1
> -** subr z0\.b, p0/m, z0\.b, \1
> +** not z0\.b, p0/m, z0\.b
>  ** ret
>  */
>  TEST_UNIFORM_Z (subr_m1_s8_m, svint8_t,
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
> index 65606b6dda03..4922bdbacc47 100644
> --- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/subr_u8.c
> @@ -76,8 +76,7 @@ TEST_UNIFORM_Z (subr_1_u8_m_untied, svuint8_t,
>
>  /*
>  ** subr_m1_u8_m:
> -** mov (z[0-9]+\.b), #-1
> -** subr z0\.b, p0/m, z0\.b, \1
> +** not z0\.b, p0/m, z0\.b
>  ** ret
>  */
>  TEST_UNIFORM_Z (subr_m1_u8_m, svuint8_t,
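
The expected-code change quoted above (mov #-1 followed by subr becoming a
single not) relies on the identity (-1) - x == ~x in modular arithmetic.  A
minimal scalar sketch of that identity for one 8-bit lane follows; the helper
names subr_m1, not8 and check_identity are made up for this illustration and
do not model predication or the SVE intrinsics themselves.

```cpp
#include <cassert>
#include <cstdint>

// Reversed subtract with an immediate of -1: computes (-1) - x, wrapping
// modulo 2^8 like a single 8-bit vector lane.
std::uint8_t subr_m1(std::uint8_t x) {
  return static_cast<std::uint8_t>(0xFFu - x);
}

// Bitwise NOT of one 8-bit lane.
std::uint8_t not8(std::uint8_t x) {
  return static_cast<std::uint8_t>(~x);
}

// Exhaustively compare the two operations for every 8-bit value.
void check_identity() {
  for (unsigned v = 0; v < 256; ++v) {
    const std::uint8_t x = static_cast<std::uint8_t>(v);
    assert(subr_m1(x) == not8(x));
  }
}
```

Because the comparison is done on the unsigned bit pattern, the same argument
covers both the s8 and the u8 variants of the test.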

Re: [RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-24 Thread Richard Biener via Gcc-patches




> On 24.08.2023 at 17:07, Filip Kastl wrote:
> 
> Hi,
> 
> As a part of my bachelor thesis under the supervision of Honza (Jan Hubicka),
> I implemented a new PHI elimination algorithm into GCC. The algorithm is
> described in the following article:
> 
> Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
> (2013). Simple and Efficient Construction of Static Single Assignment
> Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
> 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
> Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6
> 
> In the article the PHI elimination algorithm is only considered a part of
> another algorithm. However, with Honza we tried running the algorithm as a
> standalone pass and found that there is a surprisingly big number of PHI
> functions it is able to remove -- sometimes even ~13% of PHI functions or
> more.
> This email contains a patch with the pass and with the placement in passes.def
> we used to measure this.
> 
> Do you think that the pass is worthy of inclusion into upstream GCC? What are
> some things that I should change? Should I try to put the pass in different
> places in passes.def?

The most obvious places would be right after SSA construction and before RTL 
expansion.  Can you provide measurements for those positions?  Can the pass 
somehow be used as part of propagations like during value numbering?

Richard 

> Things I already know I'd like to change:
> - Split the patch into two (one for sccp, one for the utility functions)
> - Change the name SCCP to something else since there already is a pass with
>  that name (any suggestions?)
> - Add a comment into sccp.cc explaining the algorithm
> 
> I successfully bootstrapped and tested GCC with the patch applied (with the
> commit 3b691e0190c6e7291f8a52e1e14d8293a28ff4ce checked out). 
> 
> Here are my measurements. I measured the number of PHIs before the PHI
> elimination algorithm was run and after it was run. I measured on the standard
> 2017 benchmarks with -O3. Since the pass is present in passes.def twice,
> results of the first run are marked (1) and results of the second are marked
> (2). Honza also did measurements with profile feedback and got even bigger
> percentages.
> 
> 500.perlbench_r
> Started with (1) 30287
> Ended with (1) 26188
> Removed PHI % (1) 13.53385941162875161000
> Started with (2) 38005
> Ended with (2) 37897
> Removed PHI % (2) .28417313511380081600
> 
> 502.gcc_r
> Started with (1) 148187
> Ended with (1) 140292
> Removed PHI % (1) 5.32772780338356266100
> Started with (2) 211479
> Ended with (2) 210635
> Removed PHI % (2) .39909399987705635100
> 
> 505.mcf_r
> Started with (1) 341
> Ended with (1) 303
> Removed PHI % (1) 11.14369501466275659900
> Started with (2) 430
> Ended with (2) 426
> Removed PHI % (2) .93023255813953488400
> 
> 523.xalancbmk_r
> Started with (1) 62514
> Ended with (1) 57785
> Removed PHI % (1) 7.5647055059346800
> Started with (2) 132561
> Ended with (2) 131726
> Removed PHI % (2) .62989868815111533600
> 
> 531.deepsjeng_r
> Started with (1) 1388
> Ended with (1) 1250
> Removed PHI % (1) 9.94236311239193083600
> Started with (2) 1887
> Ended with (2) 1879
> Removed PHI % (2) .42395336512983571900
> 
> 541.leela_r
> Started with (1) 3332
> Ended with (1) 2994
> Removed PHI % (1) 10.14405762304921968800
> Started with (2) 4372
> Ended with (2) 4352
> Removed PHI % (2) .45745654162854528900
> 
> Here is the patch:
> 
> -- >8 --
> 
> This patch introduces two things:
> - A new PHI elimination pass (major)
> - New utility functions for passes that replace one ssa name with a
>  different one (minor)
> 
> Strongly-connected copy propagation (SCCP) pass
> 
> The PHI elimination pass is a lightweight optimization pass based on
> strongly-connected components. Some set of PHIs may be redundant because
> the PHIs only refer to each other or to a single value from outside the
> set. This pass finds and eliminates these sets. As a bonus the pass also
> does some basic copy propagation because it considers a copy statement
> to be a PHI with a single argument.
> 
> SCCP uses an algorithm from this article:
> Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
> (2013). Simple and Efficient Construction of Static Single Assignment
> Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
> 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
> Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6
> 
> cleanup_after_replace and cleanup_after_all_replaces_done
> 
> Whenever you replace all uses of an ssa name by a different ssa name,
> some GCC internal structures have to be updated. To streamline this
> process, the patch adds the cleanup_after_replace function that should
> be called after an ssa name is replaced by a different one and the
> cleanup_after_all_replaces_done that should be called before a pass that
> replaced one or more ssa names e

RISC-V: Fix stack_save_restore_1/2 test cases

2023-08-24 Thread Jivan Hakobyan via Gcc-patches

This patch fixes the failing stack_save_restore_1/2 test cases.
After commit 6619b3d4c15c the size of the frame changed.


gcc/testsuite/ChangeLog:
* gcc.target/riscv/stack_save_restore_1.c: Update frame size
* gcc.target/riscv/stack_save_restore_2.c: Likewise.


-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c b/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
index 255ce5f40c9e300cbcc245d69a045bed2b65d02b..0bf64bac767203685ec88c72394ada617d6940d5 100644
--- a/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_1.c
@@ -8,7 +8,7 @@ float getf();
 /*
 ** bar:
 **	call	t0,__riscv_save_(3|4)
-**	addi	sp,sp,-2032
+**	addi	sp,sp,-2016
 **	...
 **	li	t0,-12288
 **	add	sp,sp,t0
@@ -16,7 +16,7 @@ float getf();
 **	li	t0,12288
 **	add	sp,sp,t0
 **	...
-**	addi	sp,sp,2032
+**	addi	sp,sp,2016
 **	tail	__riscv_restore_(3|4)
 */
 int bar()
diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
index 4ce5e0118a499136f625c0333c71e98417014851..f076a68613006e19d8110e975391299e48e89441 100644
--- a/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
+++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore_2.c
@@ -8,7 +8,7 @@ float getf();
 /*
 ** bar:
 **	call	t0,__riscv_save_(3|4)
-**	addi	sp,sp,-2032
+**	addi	sp,sp,-2016
 **	...
 **	li	t0,-12288
 **	add	sp,sp,t0
@@ -16,7 +16,7 @@ float getf();
 **	li	t0,12288
 **	add	sp,sp,t0
 **	...
-**	addi	sp,sp,2032
+**	addi	sp,sp,2016
 **	tail	__riscv_restore_(3|4)
 */
 int bar()

Re: Check that passes do not forget to define profile

2023-08-24 Thread Jan Hubicka via Gcc-patches

> On Thu, Aug 24, 2023 at 3:15 PM Jan Hubicka via Gcc-patches
>  wrote:
> >
> > Hi,
> > this patch extends the verifier to check that all probabilities and counts
> > are initialized if a profile is supposed to be present.  This is a bit
> > complicated by the possibility that we inline a
> > !flag_guess_branch_probability function into a function with a profile
> > defined, and in this case we need to stop verification.  For this reason I
> > added a flag to the cfg structure tracking this.
> >
> > Bootstrapped/regtested x86_64-linux, committed.
> 
> Couldn't we have massaged profile_status to avoid extra full_profile?
> Aka add PROFILE_{READ,GUESSED}_PARTIAL?

I am working towards removing profile_status.  We mostly use it
to determine whether the profile is reliable (or present at all).
This is available locally in the profile quality info of profile_count and
profile_probability.

Most existing tests of that value go wrong when we inline functions
with one profile status into functions with another, so they should be
replaced by more local tests.

Honza

Re: [PATCH V4] Add warning options -W[no-]compare-distinct-pointer-types

2023-08-24 Thread Jose E. Marchesi via Gcc-patches



Hi Marek.

> On Thu, Aug 17, 2023 at 05:37:03PM +0200, Jose E. Marchesi via Gcc-patches wrote:
>> 
>> > On Thu, 17 Aug 2023, Jose E. Marchesi via Gcc-patches wrote:
>> >
>> >> +@opindex Wcompare-distinct-pointer-types
>> >> +@item -Wcompare-distinct-pointer-types
>> >
>> > This @item should say @r{(C and Objective-C only)}, since the option isn't
>> > implemented for C++.  OK with that change.
>> 
>> Pushed with that change.
>> Thanks for the prompt review!
>
> I see the following failures:
>
> FAIL: gcc.c-torture/compile/pr106537-1.c   -Os   (test for warnings, line 28)
> FAIL: gcc.c-torture/compile/pr106537-1.c   -Os   (test for warnings, line 30)
> FAIL: gcc.c-torture/compile/pr106537-1.c -O2 -flto
> -fno-use-linker-plugin -flto-partition=none (test for warnings, line
> 28)
> FAIL: gcc.c-torture/compile/pr106537-1.c -O2 -flto
> -fno-use-linker-plugin -flto-partition=none (test for warnings, line
> 30)
> FAIL: gcc.c-torture/compile/pr106537-1.c -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects (test for warnings, line 28)
> FAIL: gcc.c-torture/compile/pr106537-1.c -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects (test for warnings, line 30)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -O0   (test for warnings, line 26)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -O0   (test for warnings, line 28)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -O1   (test for warnings, line 26)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -O1   (test for warnings, line 28)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -O2   (test for warnings, line 26)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -O2   (test for warnings, line 28)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -O3 -g   (test for warnings, line 
> 26)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -O3 -g   (test for warnings, line 
> 28)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -Os   (test for warnings, line 26)
> FAIL: gcc.c-torture/compile/pr106537-2.c   -Os   (test for warnings, line 28)
> FAIL: gcc.c-torture/compile/pr106537-2.c -O2 -flto
> -fno-use-linker-plugin -flto-partition=none (test for warnings, line
> 26)
> FAIL: gcc.c-torture/compile/pr106537-2.c -O2 -flto
> -fno-use-linker-plugin -flto-partition=none (test for warnings, line
> 28)
> FAIL: gcc.c-torture/compile/pr106537-2.c -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects (test for warnings, line 26)
> FAIL: gcc.c-torture/compile/pr106537-2.c -O2 -flto -fuse-linker-plugin
> -fno-fat-lto-objects (test for warnings, line 28)
>
> The problem is that for ==/!=, when one of the types is void*,
> build_binary_op goes to the branch attempting to warn about
> comparing void* with a function pointer, and never gets to the 
> -Wcompare-distinct-pointer-types warning.

Oof, I wonder what happened with my regtesting.

I just pushed the patch below as obvious, which adjusts the tests to
conform to GCC's behavior of not emitting that pedwarn for
equality/inequality of void pointers with non-function pointers.

Sorry about this.  And thanks for reporting.

From 721f7e2c4e5eed645593258624dd91e6c39f3bd2 Mon Sep 17 00:00:00 2001
From: "Jose E. Marchesi" 
Date: Thu, 24 Aug 2023 17:10:52 +0200
Subject: [PATCH] Fix tests for PR 106537.

This patch fixes the tests for PR 106537 (support for
-W[no-]compare-distinct-pointer-types) which were expecting the
warning when checking for equality/inequality of void pointers with
non-function pointers.

gcc/testsuite/ChangeLog:

PR c/106537
* gcc.c-torture/compile/pr106537-1.c: Comparing void pointers to
non-function pointers is legit.
* gcc.c-torture/compile/pr106537-2.c: Likewise.
---
 gcc/testsuite/gcc.c-torture/compile/pr106537-1.c | 6 ++++--
 gcc/testsuite/gcc.c-torture/compile/pr106537-2.c | 6 ++++--
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr106537-1.c 
b/gcc/testsuite/gcc.c-torture/compile/pr106537-1.c
index 3f3b06577d5..b67b6090dc3 100644
--- a/gcc/testsuite/gcc.c-torture/compile/pr106537-1.c
+++ b/gcc/testsuite/gcc.c-torture/compile/pr106537-1.c
@@ -25,9 +25,11 @@ int xdp_context (struct xdp_md *xdp)
 return 3;
   if (metadata + 1 <= data) /* { dg-warning "comparison of distinct pointer 
types" } */
 return 4;
-  if (metadata + 1 == data) /* { dg-warning "comparison of distinct pointer 
types" } */
+  /* Note that it is ok to check for equality or inequality betewen void
+ pointers and any other non-function pointers.  */
+  if ((int*) (metadata + 1) == (long*) data) /* { dg-warning "comparison of 
distinct pointer types" } */
 return 5;
-  if (metadata + 1 != data) /* { dg-warning "comparison of distinct pointer 
types" } */
+  if ((int*) metadata + 1 != (long*) data) /* { dg-warning "comparison of 
distinct pointer types" } */
 return 5;
 
   return 1;
diff --git a/gcc/testsuite/gcc.c-torture/compile/pr106537-2.c 
b/gcc/testsuite/gcc.c-torture/compile/pr106537-2.c
index 6876adf3aab..d4223c25c94 100644
--- a/gcc/testsuite/gcc.c-torture/com

[RFC] gimple ssa: SCCP - A new PHI optimization pass

2023-08-24 Thread Filip Kastl

Hi,

As a part of my bachelor thesis under the supervision of Honza (Jan Hubicka), I
implemented a new PHI elimination algorithm into GCC. The algorithm is
described in the following article:

Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
(2013). Simple and Efficient Construction of Static Single Assignment
Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6

In the article the PHI elimination algorithm is only considered a part of
another algorithm. However, with Honza we tried running the algorithm as a
standalone pass and found that there is a surprisingly big number of PHI
functions it is able to remove -- sometimes even ~13% of PHI functions or more.
This email contains a patch with the pass and with the placement in passes.def
we used to measure this.

Do you think that the pass is worthy of inclusion into upstream GCC? What are
some things that I should change? Should I try to put the pass in different
places in passes.def?

Things I already know I'd like to change:
- Split the patch into two (one for sccp, one for the utility functions)
- Change the name SCCP to something else since there already is a pass with
  that name (any suggestions?)
- Add a comment into sccp.cc explaining the algorithm

I successfully bootstrapped and tested GCC with the patch applied (with the
commit 3b691e0190c6e7291f8a52e1e14d8293a28ff4ce checked out). 

Here are my measurements. I measured the number of PHIs before the PHI
elimination algorithm was run and after it was run. I measured on the standard
2017 benchmarks with -O3. Since the pass is present in passes.def twice,
results of the first run are marked (1) and results of the second are marked
(2). Honza also did measurements with profile feedback and got even bigger
percentages.

500.perlbench_r
Started with (1) 30287
Ended with (1) 26188
Removed PHI % (1) 13.53385941162875161000
Started with (2) 38005
Ended with (2) 37897
Removed PHI % (2) .28417313511380081600

502.gcc_r
Started with (1) 148187
Ended with (1) 140292
Removed PHI % (1) 5.32772780338356266100
Started with (2) 211479
Ended with (2) 210635
Removed PHI % (2) .39909399987705635100

505.mcf_r
Started with (1) 341
Ended with (1) 303
Removed PHI % (1) 11.14369501466275659900
Started with (2) 430
Ended with (2) 426
Removed PHI % (2) .93023255813953488400

523.xalancbmk_r
Started with (1) 62514
Ended with (1) 57785
Removed PHI % (1) 7.5647055059346800
Started with (2) 132561
Ended with (2) 131726
Removed PHI % (2) .62989868815111533600

531.deepsjeng_r
Started with (1) 1388
Ended with (1) 1250
Removed PHI % (1) 9.94236311239193083600
Started with (2) 1887
Ended with (2) 1879
Removed PHI % (2) .42395336512983571900

541.leela_r
Started with (1) 3332
Ended with (1) 2994
Removed PHI % (1) 10.14405762304921968800
Started with (2) 4372
Ended with (2) 4352
Removed PHI % (2) .45745654162854528900
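
For reference, the "Removed PHI %" figures follow directly from the two counts
above them.  A small sketch of the arithmetic, with removed_phi_percent being a
name chosen only for this illustration:

```cpp
#include <cassert>

// Percentage of PHIs removed by one run of the pass:
// 100 * (started - ended) / started.
double removed_phi_percent(long started_with, long ended_with) {
  assert(started_with > 0 && ended_with <= started_with);
  return 100.0 * static_cast<double>(started_with - ended_with)
         / static_cast<double>(started_with);
}
```

For example, the first run on 500.perlbench_r removes 30287 - 26188 = 4099
PHIs out of 30287, which is roughly 13.53%.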

Here is the patch:

-- >8 --

This patch introduces two things:
- A new PHI elimination pass (major)
- New utility functions for passes that replace one ssa name with a
  different one (minor)

Strongly-connected copy propagation (SCCP) pass

The PHI elimination pass is a lightweight optimization pass based on
strongly-connected components. Some set of PHIs may be redundant because
the PHIs only refer to each other or to a single value from outside the
set. This pass finds and eliminates these sets. As a bonus the pass also
does some basic copy propagation because it considers a copy statement
to be a PHI with a single argument.
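
As an illustration of the redundancy condition described above (and not the
code in sccp.cc), the following sketch checks whether a set of PHIs refers only
to its own members or to one single outside value.  Value, PhiMap and
single_outer_value are hypothetical names; SSA names are modelled as plain
integers, and the SCC computation and the recursion into inner components from
the article are left out.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <set>
#include <vector>

using Value = int;  // hypothetical stand-in for an SSA name
using PhiMap = std::map<Value, std::vector<Value>>;  // PHI result -> arguments

// If every argument of the PHIs in `scc` is either another member of `scc`
// or one single outside value, return that outside value; otherwise nullopt.
std::optional<Value> single_outer_value(const std::set<Value> &scc,
                                        const PhiMap &phis) {
  std::optional<Value> outer;
  for (Value phi : scc) {
    auto it = phis.find(phi);
    assert(it != phis.end());  // every member of the set must be a PHI
    for (Value arg : it->second) {
      if (scc.count(arg))
        continue;              // reference inside the set
      if (outer && *outer != arg)
        return std::nullopt;   // two different outside values: not redundant
      outer = arg;
    }
  }
  return outer;
}
```

When a value is returned, all uses of the PHIs in the set could be replaced by
it.  A copy statement modelled as a one-argument PHI trivially satisfies the
condition, which is where the copy propagation effect comes from.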

SCCP uses an algorithm from this article:
Braun, M., Buchwald, S., Hack, S., Leißa, R., Mallon, C., Zwinkau, A.
(2013). Simple and Efficient Construction of Static Single Assignment
Form. In: Jhala, R., De Bosschere, K. (eds) Compiler Construction. CC
2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin,
Heidelberg. https://doi.org/10.1007/978-3-642-37051-9_6

cleanup_after_replace and cleanup_after_all_replaces_done

Whenever you replace all uses of an ssa name by a different ssa name,
some GCC internal structures have to be updated. To streamline this
process, the patch adds the cleanup_after_replace function that should
be called after an ssa name is replaced by a different one and the
cleanup_after_all_replaces_done that should be called before a pass that
replaced one or more ssa names exits. The SCCP pass uses these
functions.

Signed-off-by: Filip Kastl 

gcc/ChangeLog:

* Makefile.in: Added sccp pass.
* passes.def: Added sccp pass to early and late optimizations.
* tree-pass.h (make_pass_sccp): Added sccp pass.
* tree-ssa-propagate.cc (cleanup_after_replace): New function.
(cleanup_after_all_replaces_done): New function.
* tree-ssa-propagate.h (cleanup_after_replace): New function.
(cleanup_after_all_replaces_done): New function.
* sccp.cc: N

[PATCH 8/9] analyzer: handle strlen(BITS_WITHIN) [PR105899]

2023-08-24 Thread David Malcolm via Gcc-patches

gcc/analyzer/ChangeLog:
PR analyzer/105899
* region-model.cc (fragment::has_null_terminator): Handle
SK_BITS_WITHIN.
---
 gcc/analyzer/region-model.cc | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 6574ec140074..025b555d7b97 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3357,10 +3357,29 @@ struct fragment
}
break;
 
+  case SK_BITS_WITHIN:
+   {
+ const bits_within_svalue *bits_within_sval
+   = (const bits_within_svalue *)m_sval;
+ byte_range bytes (0, 0);
+ if (bits_within_sval->get_bits ().as_byte_range (&bytes))
+   {
+ const svalue *inner_sval = bits_within_sval->get_inner_svalue ();
+ fragment f (byte_range
+ (start_read_offset - bytes.get_start_bit_offset (),
+  std::max (bytes.m_size_in_bytes,
+ available_bytes)),
+ inner_sval);
+ return f.has_null_terminator (start_read_offset, out_bytes_read);
+   }
+   }
+   break;
+
   default:
// TODO: it may be possible to handle other cases here.
-   return tristate::TS_UNKNOWN;
+   break;
   }
+return tristate::TS_UNKNOWN;
   }
 
   static tristate
-- 
2.26.3

[PATCH 7/9] analyzer: handle INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) [PR105899]

2023-08-24 Thread David Malcolm via Gcc-patches

gcc/analyzer/ChangeLog:
PR analyzer/105899
* region-model-manager.cc
(region_model_manager::get_or_create_initial_value): Simplify
INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) to
CONSTANT_SVAL(STRING[N]).
---
 gcc/analyzer/region-model-manager.cc | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/gcc/analyzer/region-model-manager.cc 
b/gcc/analyzer/region-model-manager.cc
index 65b719056c84..22246876f8f9 100644
--- a/gcc/analyzer/region-model-manager.cc
+++ b/gcc/analyzer/region-model-manager.cc
@@ -310,6 +310,25 @@ region_model_manager::get_or_create_initial_value (const 
region *reg,
 get_or_create_initial_value (original_reg));
 }
 
+  /* Simplify:
+   INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL)
+ to:
+   CONSTANT_SVAL(STRING[N]).  */
+  if (const element_region *element_reg = reg->dyn_cast_element_region ())
+if (tree cst_idx = element_reg->get_index ()->maybe_get_constant ())
+  if (const string_region *string_reg
+ = element_reg->get_parent_region ()->dyn_cast_string_region ())
+   if (tree_fits_shwi_p (cst_idx))
+ {
+   HOST_WIDE_INT idx = tree_to_shwi (cst_idx);
+   tree string_cst = string_reg->get_string_cst ();
+   if (idx >= 0 && idx <= TREE_STRING_LENGTH (string_cst))
+ {
+   int ch = TREE_STRING_POINTER (string_cst)[idx];
+   return get_or_create_int_cst (reg->get_type (), ch);
+ }
+ }
+
   /* INIT_VAL (*UNKNOWN_PTR) -> UNKNOWN_VAL.  */
   if (reg->symbolic_for_unknown_ptr_p ())
 return get_or_create_unknown_svalue (reg->get_type ());
-- 
2.26.3
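
The simplification in this patch folds the initial value of a constant-indexed
element of a string literal to a character constant.  A minimal model of that
idea outside the analyzer is sketched below; initial_char is a hypothetical
helper, and the literal is represented here by its characters without the
terminator, which is an assumption of this sketch rather than how STRING_CST
stores its length.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: the initial value of element `idx` of a string
// literal whose visible characters are `literal`.  Index literal.size()
// denotes the implicit terminating NUL.
std::optional<int> initial_char(std::string_view literal, long idx) {
  if (idx < 0 || static_cast<std::size_t>(idx) > literal.size())
    return std::nullopt;  // outside the literal: nothing to fold
  if (static_cast<std::size_t>(idx) == literal.size())
    return 0;             // the terminating NUL
  return literal[static_cast<std::size_t>(idx)];
}
```

With this model, element 1 of "hello" folds to 'e' and element 5 folds to 0,
while any other out-of-range index is left unfolded.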

[pushed 0/9] analyzer: strlen, strcpy, and strcat [PR105899]

2023-08-24 Thread David Malcolm via Gcc-patches

This patch kit makes improvements to the analyzer's new strlen
implementation, and wires it up to strcpy and strcat.

For example, given:

  #include <string.h>

  void test (void)
  {
    char buf[10];
    strcpy (buf, "hello world!");
  }

we now emit:

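demo.c: In function ‘test’:
demo.c:6:3: warning: stack-based buffer overflow [CWE-121] [-Wanalyzer-out-of-bounds]
    6 |   strcpy (buf, "hello world!");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  ‘test’: events 1-2
    |
    |    5 |   char buf[10];
    |      |        ^~~
    |      |        |
    |      |        (1) capacity: 10 bytes
    |    6 |   strcpy (buf, "hello world!");
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (2) out-of-bounds write from byte 10 till byte 12 but ‘buf’ ends at byte 10
    |
demo.c:6:3: note: write of 3 bytes to beyond the end of ‘buf’
    6 |   strcpy (buf, "hello world!");
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
demo.c:6:3: note: valid subscripts for ‘buf’ are ‘[0]’ to ‘[9]’

  ┌─────┬─────┬────┬────┬────┬────┬────┬────┬────┬────┐┌─────┬─────┬─────┐
  │ [0] │ [1] │[2] │[3] │[4] │[5] │[6] │[7] │[8] │[9] ││[10] │[11] │[12] │
  ├─────┼─────┼────┼────┼────┼────┼────┼────┼────┼────┤├─────┼─────┼─────┤
  │ ‘h’ │ ‘e’ │‘l’ │‘l’ │‘o’ │‘ ’ │‘w’ │‘o’ │‘r’ │‘l’ ││ ‘d’ │ ‘!’ │ NUL │
  ├─────┴─────┴────┴────┴────┴────┴────┴────┴────┴────┴┴─────┴─────┴─────┤
  │                  string literal (type: ‘char[13]’)                   │
  └──────────────────────────────────────────────────────────────────────┘
     │     │    │    │    │    │    │    │    │    │      │     │     │
     │     │    │    │    │    │    │    │    │    │      │     │     │
     v     v    v    v    v    v    v    v    v    v      v     v     v
  ┌─────┬────────────────────────────────────────┬────┐┌─────────────────┐
  │ [0] │                  ...                   │[9] ││                 │
  ├─────┴────────────────────────────────────────┴────┤│after valid range│
  │             ‘buf’ (type: ‘char[10]’)              ││                 │
  └───────────────────────────────────────────────────┘└─────────────────┘
  ├─────────────────────────┬─────────────────────────┤├────────┬────────┤
                            │                                   │
                  ╭─────────┴────────╮              ╭───────────┴──────────╮
                  │capacity: 10 bytes│              │⚠️  overflow of 3 bytes│
                  ╰──────────────────╯              ╰──────────────────────╯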

in addition to the pre-existing:

demo.c:6:3: warning: ‘__builtin_memcpy’ writing 13 bytes into a region of size 10 overflows the destination [-Wstringop-overflow=]
demo.c:5:8: note: destination object ‘buf’ of size 10
    5 |   char buf[10];
      |        ^~~

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-3461-g9aaec66917c96a through to
r14-3469-gbbdc0e0d0042ae.
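
The numbers in the strcpy diagnostic earlier in this message can be checked by
hand: the copy writes strlen (src) + 1 bytes, and whatever exceeds the
destination capacity is the overflow.  A small sketch of that arithmetic
follows; CopyCheck and check_strcpy are illustrative names and not analyzer
APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Result of checking a strcpy-style copy against the destination capacity.
struct CopyCheck {
  std::size_t bytes_written;   // strlen (src) + 1 for the terminating NUL
  std::size_t overflow_bytes;  // bytes written past the end; 0 if it fits
};

CopyCheck check_strcpy(std::size_t capacity, const char *src) {
  assert(src != nullptr);
  const std::size_t n = std::strlen(src) + 1;
  return {n, n > capacity ? n - capacity : 0};
}
```

For a 10-byte buffer and "hello world!" this gives 13 bytes written and 3
bytes of overflow, i.e. bytes 10 through 12, matching the diagram.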

David Malcolm (9):
  analyzer: add logging to impl_path_context
  analyzer: handle symbolic bindings in scan_for_null_terminator
[PR105899]
  analyzer: reimplement kf_strcpy [PR105899]
  analyzer: eliminate region_model::get_string_size [PR105899]
  analyzer: reimplement kf_memcpy_memmove
  analyzer: handle strlen(INIT_VAL(STRING_REG)) [PR105899]
  analyzer: handle INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL)
[PR105899]
  analyzer: handle strlen(BITS_WITHIN) [PR105899]
  analyzer: implement kf_strcat [PR105899]

 gcc/analyzer/call-details.cc  |  12 +-
 gcc/analyzer/call-details.h   |   5 +-
 gcc/analyzer/engine.cc|  13 +-
 gcc/analyzer/kf.cc| 116 +---
 gcc/analyzer/region-model-manager.cc  |  19 ++
 gcc/analyzer/region-model.cc  | 261 +-
 gcc/analyzer/region-model.h   |  22 +-
 gcc/doc/invoke.texi   |   1 +
 .../analyzer/out-of-bounds-diagram-16.c   |  31 +++
 gcc/testsuite/gcc.dg/analyzer/sprintf-1.c |  11 +
 gcc/testsuite/gcc.dg/analyzer/strcat-1.c  | 136 +
 gcc/testsuite/gcc.dg/analyzer/strcpy-1.c  |  22 ++
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c  |   8 +
 gcc/testsuite/gcc.dg/analyzer/strcpy-4.c  |  51 
 14 files changed, 601 insertions(+), 107 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-16.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcat-1.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-4.c

-- 
2.26.3

[PATCH 6/9] analyzer: handle strlen(INIT_VAL(STRING_REG)) [PR105899]

2023-08-24 Thread David Malcolm via Gcc-patches

gcc/analyzer/ChangeLog:
PR analyzer/105899
* region-model.cc (fragment::has_null_terminator): Move STRING_CST
handling to fragment::string_cst_has_null_terminator; also use it to
handle INIT_VAL(STRING_REG).
(fragment::string_cst_has_null_terminator): New, from above.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/strcpy-3.c (test_2): New.
---
 gcc/analyzer/region-model.cc | 68 
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c |  7 +++
 2 files changed, 54 insertions(+), 21 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 00c306ab7dae..6574ec140074 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3310,27 +3310,10 @@ struct fragment
  switch (TREE_CODE (cst))
{
case STRING_CST:
- {
-   /* Look for the first 0 byte within STRING_CST
-  from START_READ_OFFSET onwards.  */
-   const HOST_WIDE_INT num_bytes_to_search
- = std::min ((TREE_STRING_LENGTH (cst)
- - rel_start_read_offset_hwi),
-available_bytes_hwi);
-   const char *start = (TREE_STRING_POINTER (cst)
-+ rel_start_read_offset_hwi);
-   if (num_bytes_to_search >= 0)
- if (const void *p = memchr (start, 0,
- num_bytes_to_search))
-   {
- *out_bytes_read = (const char *)p - start + 1;
- return tristate (true);
-   }
-
-   *out_bytes_read = available_bytes;
-   return tristate (false);
- }
- break;
+ return string_cst_has_null_terminator (cst,
+rel_start_read_offset_hwi,
+available_bytes_hwi,
+out_bytes_read);
case INTEGER_CST:
  if (rel_start_read_offset_hwi == 0
  && integer_onep (TYPE_SIZE_UNIT (TREE_TYPE (cst
@@ -3357,12 +3340,55 @@ struct fragment
}
}
break;
+
+  case SK_INITIAL:
+   {
+ const initial_svalue *initial_sval = (const initial_svalue *)m_sval;
+ const region *reg = initial_sval->get_region ();
+ if (const string_region *string_reg = reg->dyn_cast_string_region ())
+   {
+ tree string_cst = string_reg->get_string_cst ();
+ return string_cst_has_null_terminator (string_cst,
+rel_start_read_offset_hwi,
+available_bytes_hwi,
+out_bytes_read);
+   }
+ return tristate::TS_UNKNOWN;
+   }
+   break;
+
   default:
// TODO: it may be possible to handle other cases here.
return tristate::TS_UNKNOWN;
   }
   }
 
+  static tristate
+  string_cst_has_null_terminator (tree string_cst,
+ HOST_WIDE_INT rel_start_read_offset_hwi,
+ HOST_WIDE_INT available_bytes_hwi,
+ byte_offset_t *out_bytes_read)
+  {
+/* Look for the first 0 byte within STRING_CST
+   from START_READ_OFFSET onwards.  */
+const HOST_WIDE_INT num_bytes_to_search
+  = std::min ((TREE_STRING_LENGTH (string_cst)
+ - rel_start_read_offset_hwi),
+available_bytes_hwi);
+const char *start = (TREE_STRING_POINTER (string_cst)
++ rel_start_read_offset_hwi);
+if (num_bytes_to_search >= 0)
+  if (const void *p = memchr (start, 0,
+ num_bytes_to_search))
+   {
+ *out_bytes_read = (const char *)p - start + 1;
+ return tristate (true);
+   }
+
+*out_bytes_read = available_bytes_hwi;
+return tristate (false);
+  }
+
   byte_range m_byte_range;
   const svalue *m_sval;
 };
diff --git a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
index abb49bc39f27..a7b324fc445e 100644
--- a/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
+++ b/gcc/testsuite/gcc.dg/analyzer/strcpy-3.c
@@ -22,3 +22,10 @@ void test_1 (void)
   __analyzer_eval (result[5] == 0); /* { dg-warning "TRUE" } */
   __analyzer_eval (strlen (result) == 5); /* { dg-warning "TRUE" } */
 }
+
+void test_2 (void)
+{
+  char buf[16];
+  __builtin_strcpy (buf, "abc");
+  __analyzer_eval (strlen (buf) == 3); /* { dg-warning "TRUE" } */
+}
-- 
2.26.3
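
For readers skimming the diff above, the terminator scan that the patch factors out can be sketched as standalone C++.  This is only an illustration under stated assumptions: bytes_until_terminator is a hypothetical name, std::optional stands in for the analyzer's tristate plus out_bytes_read pair, and the "not found" case simply yields an empty optional rather than reporting the available byte count as the patch does.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <optional>

// Hypothetical stand-in for the helper in the patch.
// BUF_LEN counts the trailing 0 byte, as TREE_STRING_LENGTH does.
// Returns the number of bytes read up to and including the first
// 0 byte at or after START_OFFSET, or an empty optional if no 0 byte
// lies within the searchable range.
std::optional<std::size_t>
bytes_until_terminator (const char *buf, std::ptrdiff_t buf_len,
                        std::ptrdiff_t start_offset,
                        std::ptrdiff_t available_bytes)
{
  // Reject offsets that would put the start pointer outside the buffer.
  if (start_offset < 0 || start_offset > buf_len)
    return std::nullopt;

  // Never search past the end of the string or past the bytes available.
  const std::ptrdiff_t num_bytes_to_search
    = std::min (buf_len - start_offset, available_bytes);
  if (num_bytes_to_search < 0)
    return std::nullopt;

  const char *start = buf + start_offset;
  if (const void *p
        = std::memchr (start, 0,
                       static_cast<std::size_t> (num_bytes_to_search)))
    return static_cast<std::size_t>
      (static_cast<const char *> (p) - start + 1);

  return std::nullopt;
}
```

With the literal "abc" (4 bytes including the terminator), a scan from offset 0 reads 4 bytes, which matches the strlen of 3 checked by test_2.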

[PATCH 2/9] analyzer: handle symbolic bindings in scan_for_null_terminator [PR105899]

2023-08-24 Thread David Malcolm via Gcc-patches

gcc/analyzer/ChangeLog:
PR analyzer/105899
* region-model.cc (iterable_cluster::iterable_cluster): Add
symbolic binding keys to m_symbolic_bindings.
(iterable_cluster::has_symbolic_bindings_p): New.
(iterable_cluster::m_symbolic_bindings): New field.
(region_model::scan_for_null_terminator): Treat clusters with
symbolic bindings as having unknown strlen.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/sprintf-1.c: Include "analyzer-decls.h".
(test_strlen_1): New.
---
 gcc/analyzer/region-model.cc  | 15 +++
 gcc/testsuite/gcc.dg/analyzer/sprintf-1.c | 11 +++
 2 files changed, 26 insertions(+)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 99817aee3a93..7a2f81f36e0f 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3420,6 +3420,8 @@ public:
if (concrete_key->get_byte_range (&fragment_bytes))
  m_fragments.safe_push (fragment (fragment_bytes, sval));
  }
+   else
+ m_symbolic_bindings.safe_push (key);
   }
 m_fragments.qsort (fragment::cmp_ptrs);
   }
@@ -3440,8 +3442,14 @@ public:
 return false;
   }
 
+  bool has_symbolic_bindings_p () const
+  {
+return !m_symbolic_bindings.is_empty ();
+  }
+
 private:
   auto_vec<fragment> m_fragments;
+  auto_vec<const binding_key *> m_symbolic_bindings;
 };
 
 /* Simulate reading the bytes at BYTES from BASE_REG.
@@ -3610,6 +3618,13 @@ region_model::scan_for_null_terminator (const region *reg,
   /* No binding for this base_region, or no binding at src_byte_offset
  (or a symbolic binding).  */
 
+  if (c.has_symbolic_bindings_p ())
+{
+  if (out_sval)
+   *out_sval = m_mgr->get_or_create_unknown_svalue (NULL_TREE);
+  return m_mgr->get_or_create_unknown_svalue (size_type_node);
+}
+
   /* TODO: the various special-cases seen in
  region_model::get_store_value.  */
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c b/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
index f8dc806d6192..e7c2b3089c5b 100644
--- a/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/sprintf-1.c
@@ -1,6 +1,8 @@
 /* See e.g. https://en.cppreference.com/w/c/io/fprintf
and https://www.man7.org/linux/man-pages/man3/sprintf.3.html */
 
+#include "analyzer-decls.h"
+
 extern int
 sprintf(char* dst, const char* fmt, ...)
   __attribute__((__nothrow__));
@@ -64,3 +66,12 @@ test_fmt_not_terminated (char *dst)
   return sprintf (dst, fmt); /* { dg-warning "stack-based buffer over-read" } */
   /* { dg-message "while looking for null terminator for argument 2 \\('&fmt'\\) of 'sprintf'..." "event" { target *-*-* } .-1 } */
 }
+
+void
+test_strlen_1 (void)
+{
+  char buf[10];
+  sprintf (buf, "msg: %s\n", "abc");
+  __analyzer_eval (__builtin_strlen (buf) == 8); /* { dg-warning "UNKNOWN" } */
+  // TODO: ideally would be TRUE  
+}
-- 
2.26.3

[PATCH 9/9] analyzer: implement kf_strcat [PR105899]

2023-08-24 Thread David Malcolm via Gcc-patches

gcc/analyzer/ChangeLog:
PR analyzer/105899
* call-details.cc
(call_details::check_for_null_terminated_string_arg): Split into
overloads, one taking just an arg_idx, the other a new
"include_terminator" param.
* call-details.h: Likewise.
* kf.cc (class kf_strcat): New.
(kf_strcpy::impl_call_pre): Update for change to
check_for_null_terminated_string_arg.
(register_known_functions): Register kf_strcat.
* region-model.cc
(region_model::check_for_null_terminated_string_arg): Split into
overloads, one taking just an arg_idx, the other a new
"include_terminator" param.  When returning an svalue, handle
"include_terminator" being false by subtracting one.
* region-model.h
(region_model::check_for_null_terminated_string_arg): Split into
overloads, one taking just an arg_idx, the other a new
"include_terminator" param.

gcc/ChangeLog:
PR analyzer/105899
* doc/invoke.texi (Static Analyzer Options): Add "strcat" to the
list of functions known to the analyzer.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/strcat-1.c: New test.
---
 gcc/analyzer/call-details.cc |  12 +-
 gcc/analyzer/call-details.h  |   5 +-
 gcc/analyzer/kf.cc   |  72 ++--
 gcc/analyzer/region-model.cc |  63 +--
 gcc/analyzer/region-model.h  |   6 +-
 gcc/doc/invoke.texi  |   1 +
 gcc/testsuite/gcc.dg/analyzer/strcat-1.c | 136 +++
 7 files changed, 275 insertions(+), 20 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcat-1.c

diff --git a/gcc/analyzer/call-details.cc b/gcc/analyzer/call-details.cc
index 8f5b28ce6c26..ce1f859c9996 100644
--- a/gcc/analyzer/call-details.cc
+++ b/gcc/analyzer/call-details.cc
@@ -386,13 +386,23 @@ call_details::lookup_function_attribute (const char *attr_name) const
   return lookup_attribute (attr_name, TYPE_ATTRIBUTES (allocfntype));
 }
 
+void
+call_details::check_for_null_terminated_string_arg (unsigned arg_idx) const
+{
+  check_for_null_terminated_string_arg (arg_idx, false, nullptr);
+}
+
 const svalue *
 call_details::
 check_for_null_terminated_string_arg (unsigned arg_idx,
+ bool include_terminator,
  const svalue **out_sval) const
 {
   region_model *model = get_model ();
-  return model->check_for_null_terminated_string_arg (*this, arg_idx, out_sval);
+  return model->check_for_null_terminated_string_arg (*this,
+ arg_idx,
+ include_terminator,
+ out_sval);
 }
 
 } // namespace ana
diff --git a/gcc/analyzer/call-details.h b/gcc/analyzer/call-details.h
index 58b5ccd2acde..ae528e4ab116 100644
--- a/gcc/analyzer/call-details.h
+++ b/gcc/analyzer/call-details.h
@@ -72,9 +72,12 @@ public:
 
   tree lookup_function_attribute (const char *attr_name) const;
 
+  void
+  check_for_null_terminated_string_arg (unsigned arg_idx) const;
   const svalue *
   check_for_null_terminated_string_arg (unsigned arg_idx,
-   const svalue **out_sval = nullptr) const;
+   bool include_terminator,
+   const svalue **out_sval) const;
 
 private:
   const gcall *m_call;
diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 3eddbe200387..36d9d10bb013 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -1106,6 +1106,61 @@ public:
   /* Currently a no-op.  */
 };
 
+/* Handler for "strcat" and "__builtin_strcat_chk".  */
+
+class kf_strcat : public known_function
+{
+public:
+  kf_strcat (unsigned int num_args) : m_num_args (num_args) {}
+  bool matches_call_types_p (const call_details &cd) const final override
+  {
+return (cd.num_args () == m_num_args
+   && cd.arg_is_pointer_p (0)
+   && cd.arg_is_pointer_p (1));
+  }
+
+  void impl_call_pre (const call_details &cd) const final override
+  {
+region_model *model = cd.get_model ();
+region_model_manager *mgr = cd.get_manager ();
+
+const svalue *dest_sval = cd.get_arg_svalue (0);
+const region *dest_reg = model->deref_rvalue (dest_sval, cd.get_arg_tree (0),
+ cd.get_ctxt ());
+
+const svalue *dst_strlen_sval
+  = cd.check_for_null_terminated_string_arg (0, false, nullptr);
+if (!dst_strlen_sval)
+  {
+   if (cd.get_ctxt ())
+ cd.get_ctxt ()->terminate_path ();
+   return;
+  }
+
+const svalue *bytes_to_copy;
+const svalue *num_src_bytes_read_sval
+  = cd.check_for_null_terminated_string_arg (1, true, &bytes_to_copy);
+if (!num_src_bytes_read_sval)
+  {
+
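
The diff is cut off here, but the two calls above already show the idea: the destination length is taken without its terminator and the source byte count with it.  A minimal sketch on plain buffers, assuming hypothetical helpers scan_string and model_strcat (neither is analyzer API) and assuming that a failed scan or a too-small destination simply reports failure, where the analyzer would instead emit diagnostics or terminate the path:

```cpp
#include <cstddef>
#include <cstring>
#include <optional>

// Hypothetical helper: look for a 0 byte within the first CAP bytes of
// BUF.  Returns the string length, plus one if INCLUDE_TERMINATOR is
// set, or an empty optional if the buffer is unterminated.
std::optional<std::size_t>
scan_string (const char *buf, std::size_t cap, bool include_terminator)
{
  const void *p = std::memchr (buf, 0, cap);
  if (!p)
    return std::nullopt;
  const std::size_t len
    = static_cast<std::size_t> (static_cast<const char *> (p) - buf);
  return include_terminator ? len + 1 : len;
}

// Hypothetical model of strcat on buffers of known capacity.
// Returns false where the analyzer would give up on the path.
bool
model_strcat (char *dst, std::size_t dst_cap,
              const char *src, std::size_t src_cap)
{
  // Length of DST excluding its terminator: where the copy starts.
  const auto dst_len = scan_string (dst, dst_cap, false);
  if (!dst_len)
    return false;

  // Bytes to copy from SRC, including its terminator.
  const auto src_bytes = scan_string (src, src_cap, true);
  if (!src_bytes)
    return false;

  // Assumption of this sketch: refuse writes past the end of DST.
  if (*dst_len + *src_bytes > dst_cap)
    return false;

  std::memcpy (dst + *dst_len, src, *src_bytes);
  return true;
}
```

Counting the terminator on the source side but not on the destination side means the copied bytes overwrite the old terminator and supply the new one.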

[PATCH 3/9] analyzer: reimplement kf_strcpy [PR105899]

2023-08-24 Thread David Malcolm via Gcc-patches

This patch reimplements the analyzer's implementation of strcpy using
the region_model::scan_for_null_terminator infrastructure, so that e.g.
it can complain about out-of-bounds reads/writes, unterminated strings,
etc.

gcc/analyzer/ChangeLog:
PR analyzer/105899
* kf.cc (kf_strcpy::impl_call_pre): Reimplement using
check_for_null_terminated_string_arg.
* region-model.cc (region_model::get_store_bytes): Shortcut
reading all of a string_region.
(region_model::scan_for_null_terminator): Use get_store_value for
the bytes rather than "unknown" when returning an unknown length.
(region_model::write_bytes): New.
* region-model.h (region_model::write_bytes): New decl.

gcc/testsuite/ChangeLog:
PR analyzer/105899
* gcc.dg/analyzer/out-of-bounds-diagram-16.c: New test.
* gcc.dg/analyzer/strcpy-1.c: Add test coverage.
* gcc.dg/analyzer/strcpy-3.c: Likewise.
* gcc.dg/analyzer/strcpy-4.c: New test.
---
 gcc/analyzer/kf.cc| 32 +---
 gcc/analyzer/region-model.cc  | 32 ++--
 gcc/analyzer/region-model.h   |  4 ++
 .../analyzer/out-of-bounds-diagram-16.c   | 31 +++
 gcc/testsuite/gcc.dg/analyzer/strcpy-1.c  | 22 
 gcc/testsuite/gcc.dg/analyzer/strcpy-3.c  |  1 +
 gcc/testsuite/gcc.dg/analyzer/strcpy-4.c  | 51 +++
 7 files changed, 150 insertions(+), 23 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-diagram-16.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/strcpy-4.c

diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 59f46bab581c..6b33cd159dac 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -1135,29 +1135,25 @@ void
 kf_strcpy::impl_call_pre (const call_details &cd) const
 {
   region_model *model = cd.get_model ();
-  region_model_manager *mgr = cd.get_manager ();
+  region_model_context *ctxt = cd.get_ctxt ();
 
   const svalue *dest_sval = cd.get_arg_svalue (0);
   const region *dest_reg = model->deref_rvalue (dest_sval, cd.get_arg_tree (0),
-cd.get_ctxt ());
-  const svalue *src_sval = cd.get_arg_svalue (1);
-  const region *src_reg = model->deref_rvalue (src_sval, cd.get_arg_tree (1),
-   cd.get_ctxt ());
-  const svalue *src_contents_sval = model->get_store_value (src_reg,
-   cd.get_ctxt ());
-  cd.check_for_null_terminated_string_arg (1);
-
+   ctxt);
+  /* strcpy returns the initial param.  */
   cd.maybe_set_lhs (dest_sval);
 
-  /* Try to get the string size if SRC_REG is a string_region.  */
-  const svalue *copied_bytes_sval = model->get_string_size (src_reg);
-  /* Otherwise, check if the contents of SRC_REG is a string.  */
-  if (copied_bytes_sval->get_kind () == SK_UNKNOWN)
-copied_bytes_sval = model->get_string_size (src_contents_sval);
-
-  const region *sized_dest_reg
-= mgr->get_sized_region (dest_reg, NULL_TREE, copied_bytes_sval);
-  model->set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
+  const svalue *bytes_to_copy;
+  if (const svalue *num_bytes_read_sval
+   = cd.check_for_null_terminated_string_arg (1, &bytes_to_copy))
+{
+  model->write_bytes (dest_reg, num_bytes_read_sval, bytes_to_copy, ctxt);
+}
+  else
+{
+  if (cd.get_ctxt ())
+   cd.get_ctxt ()->terminate_path ();
+}
 }
 
 /* Handler for "strdup" and "__builtin_strdup".  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 7a2f81f36e0f..cc8d895d9665 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3460,6 +3460,13 @@ region_model::get_store_bytes (const region *base_reg,
   const byte_range &bytes,
   region_model_context *ctxt) const
 {
+  /* Shortcut reading all of a string_region.  */
+  if (bytes.get_start_byte_offset () == 0)
+if (const string_region *string_reg = base_reg->dyn_cast_string_region ())
+  if (bytes.m_size_in_bytes
+ == TREE_STRING_LENGTH (string_reg->get_string_cst ()))
+   return m_mgr->get_or_create_initial_value (base_reg);
+
   const svalue *index_sval
 = m_mgr->get_or_create_int_cst (size_type_node,
bytes.get_start_byte_offset ());
@@ -3533,14 +3540,14 @@ region_model::scan_for_null_terminator (const region *reg,
   if (offset.symbolic_p ())
 {
   if (out_sval)
-   *out_sval = m_mgr->get_or_create_unknown_svalue (NULL_TREE);
+   *out_sval = get_store_value (reg, nullptr);
   return m_mgr->get_or_create_unknown_svalue (size_type_node);
 }
   byte_offset_t src_byte_offset;
   if (!offset.get_concrete_byte_offset (&src_byte_offset))
 {
   if (out_sval)
-   *out_sval = m_mgr->get_or_create_unk

[PATCH 5/9] analyzer: reimplement kf_memcpy_memmove

2023-08-24 Thread David Malcolm via Gcc-patches

gcc/analyzer/ChangeLog:
* kf.cc (kf_memcpy_memmove::impl_call_pre): Reimplement using
region_model::copy_bytes.
* region-model.cc (region_model::read_bytes): New.
(region_model::copy_bytes): New.
* region-model.h (region_model::read_bytes): New decl.
(region_model::copy_bytes): New decl.
---
 gcc/analyzer/kf.cc   | 14 --
 gcc/analyzer/region-model.cc | 35 +++
 gcc/analyzer/region-model.h  |  9 +
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/gcc/analyzer/kf.cc b/gcc/analyzer/kf.cc
index 6b33cd159dac..3eddbe200387 100644
--- a/gcc/analyzer/kf.cc
+++ b/gcc/analyzer/kf.cc
@@ -541,7 +541,6 @@ kf_memcpy_memmove::impl_call_pre (const call_details &cd) const
   const svalue *num_bytes_sval = cd.get_arg_svalue (2);
 
   region_model *model = cd.get_model ();
-  region_model_manager *mgr = cd.get_manager ();
 
   const region *dest_reg
 = model->deref_rvalue (dest_ptr_sval, cd.get_arg_tree (0), cd.get_ctxt ());
@@ -550,15 +549,10 @@ kf_memcpy_memmove::impl_call_pre (const call_details &cd) const
 
   cd.maybe_set_lhs (dest_ptr_sval);
 
-  const region *sized_src_reg
-= mgr->get_sized_region (src_reg, NULL_TREE, num_bytes_sval);
-  const region *sized_dest_reg
-= mgr->get_sized_region (dest_reg, NULL_TREE, num_bytes_sval);
-  const svalue *src_contents_sval
-= model->get_store_value (sized_src_reg, cd.get_ctxt ());
-  model->check_for_poison (src_contents_sval, cd.get_arg_tree (1),
-  sized_src_reg, cd.get_ctxt ());
-  model->set_value (sized_dest_reg, src_contents_sval, cd.get_ctxt ());
+  model->copy_bytes (dest_reg,
+src_reg, cd.get_arg_tree (1),
+num_bytes_sval,
+cd.get_ctxt ());
 }
 
 /* Handler for "memset" and "__builtin_memset".  */
diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 1fe66f4719fa..00c306ab7dae 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -3794,6 +3794,41 @@ region_model::write_bytes (const region *dest_reg,
   set_value (sized_dest_reg, sval, ctxt);
 }
 
+/* Read NUM_BYTES_SVAL from SRC_REG.
+   Use CTXT to report any warnings associated with the copy
+   (e.g. out-of-bounds reads, copying of uninitialized values, etc).  */
+
+const svalue *
+region_model::read_bytes (const region *src_reg,
+ tree src_ptr_expr,
+ const svalue *num_bytes_sval,
+ region_model_context *ctxt) const
+{
+  const region *sized_src_reg
+= m_mgr->get_sized_region (src_reg, NULL_TREE, num_bytes_sval);
+  const svalue *src_contents_sval = get_store_value (sized_src_reg, ctxt);
+  check_for_poison (src_contents_sval, src_ptr_expr,
+   sized_src_reg, ctxt);
+  return src_contents_sval;
+}
+
+/* Copy NUM_BYTES_SVAL bytes from SRC_REG to DEST_REG.
+   Use CTXT to report any warnings associated with the copy
+   (e.g. out-of-bounds reads/writes, copying of uninitialized values,
+   etc).  */
+
+void
+region_model::copy_bytes (const region *dest_reg,
+ const region *src_reg,
+ tree src_ptr_expr,
+ const svalue *num_bytes_sval,
+ region_model_context *ctxt)
+{
+  const svalue *data_sval
+= read_bytes (src_reg, src_ptr_expr, num_bytes_sval, ctxt);
+  write_bytes (dest_reg, num_bytes_sval, data_sval, ctxt);
+}
+
 /* Mark REG as having unknown content.  */
 
 void
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 41df1885ad5b..b1c705e22c28 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -371,6 +371,15 @@ class region_model
const svalue *num_bytes_sval,
const svalue *sval,
region_model_context *ctxt);
+  const svalue *read_bytes (const region *src_reg,
+   tree src_ptr_expr,
+   const svalue *num_bytes_sval,
+   region_model_context *ctxt) const;
+  void copy_bytes (const region *dest_reg,
+  const region *src_reg,
+  tree src_ptr_expr,
+  const svalue *num_bytes_sval,
+  region_model_context *ctxt);
   void mark_region_as_unknown (const region *reg, uncertainty_t *uncertainty);
 
   tristate eval_condition (const svalue *lhs,
-- 
2.26.3
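
The shape of this refactoring, a copy expressed as a checked read followed by a checked write, can be illustrated outside the analyzer.  The sketch below is an assumption-laden analogy: it works on concrete byte vectors rather than regions and svalues, reuses the names read_bytes, write_bytes and copy_bytes only for readability, and reports out-of-bounds accesses by return value where the analyzer would warn through its context.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

using bytes = std::vector<unsigned char>;

// Read the first N bytes of SRC, or nothing if that would read out of
// bounds.
std::optional<bytes>
read_bytes (const bytes &src, std::size_t n)
{
  if (n > src.size ())
    return std::nullopt;
  return bytes (src.begin (),
                src.begin () + static_cast<std::ptrdiff_t> (n));
}

// Write DATA to the start of DEST; fail if DEST is too small.
bool
write_bytes (bytes &dest, const bytes &data)
{
  if (data.size () > dest.size ())
    return false;
  std::copy (data.begin (), data.end (), dest.begin ());
  return true;
}

// Copy N bytes from SRC to DEST by composing the two steps above.
bool
copy_bytes (bytes &dest, const bytes &src, std::size_t n)
{
  const auto data = read_bytes (src, n);
  return data && write_bytes (dest, *data);
}
```

Keeping the read and the write separate lets each side carry its own checks, which is what allows the memcpy/memmove handler in the patch to shrink to a single call.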

[PATCH 4/9] analyzer: eliminate region_model::get_string_size [PR105899]

2023-08-24 Thread David Malcolm via Gcc-patches

gcc/analyzer/ChangeLog:
PR analyzer/105899
* region-model.cc (region_model::get_string_size): Delete both.
* region-model.h (region_model::get_string_size): Delete both
decls.
---
 gcc/analyzer/region-model.cc | 29 -
 gcc/analyzer/region-model.h  |  3 ---
 2 files changed, 32 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index cc8d895d9665..1fe66f4719fa 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -2794,35 +2794,6 @@ region_model::get_capacity (const region *reg) const
   return m_mgr->get_or_create_unknown_svalue (sizetype);
 }
 
-/* Return the string size, including the 0-terminator, if SVAL is a
-   constant_svalue holding a string.  Otherwise, return an unknown_svalue.  */
-
-const svalue *
-region_model::get_string_size (const svalue *sval) const
-{
-  tree cst = sval->maybe_get_constant ();
-  if (!cst || TREE_CODE (cst) != STRING_CST)
-return m_mgr->get_or_create_unknown_svalue (size_type_node);
-
-  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
-  return m_mgr->get_or_create_constant_svalue (out);
-}
-
-/* Return the string size, including the 0-terminator, if REG is a
-   string_region.  Otherwise, return an unknown_svalue.  */
-
-const svalue *
-region_model::get_string_size (const region *reg) const
-{
-  const string_region *str_reg = dyn_cast <const string_region *> (reg);
-  if (!str_reg)
-return m_mgr->get_or_create_unknown_svalue (size_type_node);
-
-  tree cst = str_reg->get_string_cst ();
-  tree out = build_int_cst (size_type_node, TREE_STRING_LENGTH (cst));
-  return m_mgr->get_or_create_constant_svalue (out);
-}
-
 /* If CTXT is non-NULL, use it to warn about any problems accessing REG,
using DIR to determine if this access is a read or write.
Return TRUE if an OOB access was detected.
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 9c6e60bbe824..41df1885ad5b 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -469,9 +469,6 @@ class region_model
 
   const svalue *get_capacity (const region *reg) const;
 
-  const svalue *get_string_size (const svalue *sval) const;
-  const svalue *get_string_size (const region *reg) const;
-
   bool replay_call_summary (call_summary_replay &r,
const region_model &summary);
 
-- 
2.26.3

[PATCH 1/9] analyzer: add logging to impl_path_context

2023-08-24 Thread David Malcolm via Gcc-patches

gcc/analyzer/ChangeLog:
* engine.cc (impl_path_context::impl_path_context): Add logger
param.
(impl_path_context::bifurcate): Add log message.
(impl_path_context::terminate_path): Likewise.
(impl_path_context::m_logger): New field.
(exploded_graph::process_node): Pass logger to path_ctxt ctor.
---
 gcc/analyzer/engine.cc | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/gcc/analyzer/engine.cc b/gcc/analyzer/engine.cc
index 3700154eec2c..a1908cdb364e 100644
--- a/gcc/analyzer/engine.cc
+++ b/gcc/analyzer/engine.cc
@@ -3848,8 +3848,10 @@ exploded_graph::maybe_create_dynamic_call (const gcall *call,
 class impl_path_context : public path_context
 {
 public:
-  impl_path_context (const program_state *cur_state)
+  impl_path_context (const program_state *cur_state,
+logger *logger)
   : m_cur_state (cur_state),
+m_logger (logger),
 m_terminate_path (false)
   {
   }
@@ -3868,6 +3870,9 @@ public:
   void
   bifurcate (std::unique_ptr<custom_edge_info> info) final override
   {
+if (m_logger)
+  m_logger->log ("bifurcating path");
+
 if (m_state_at_bifurcation)
   /* Verify that the state at bifurcation is consistent when we
 split into multiple out-edges.  */
@@ -3884,6 +3889,8 @@ public:
 
   void terminate_path () final override
   {
+if (m_logger)
+  m_logger->log ("terminating path");
 m_terminate_path = true;
   }
 
@@ -3900,6 +3907,8 @@ public:
 private:
   const program_state *m_cur_state;
 
+  logger *m_logger;
+
   /* Lazily-created copy of the state before the split.  */
   std::unique_ptr<program_state> m_state_at_bifurcation;
 
@@ -4044,7 +4053,7 @@ exploded_graph::process_node (exploded_node *node)
   exactly one stmt, the one that caused the change. */
program_state next_state (state);
 
-   impl_path_context path_ctxt (&next_state);
+   impl_path_context path_ctxt (&next_state, logger);
 
uncertainty_t uncertainty;
const supernode *snode = point.get_supernode ();
-- 
2.26.3

[PATCH] c++: Implement C++26 P2741R3 - user-generated static_assert messages [PR110348]

2023-08-24 Thread Jakub Jelinek via Gcc-patches

Hi!

The following patch, on top of the PR110349 patch (weak dependency,
only for -Wc++26-extensions; I could split that part into an independent
patch) and the PR110342 patch (again a weak dependency, this time mainly
because it touches the same code in cp_parser_static_assert and a
nearby spot in the udlit-error1.C testcase), implements user-generated
static_assert messages in addition to string literals.

As I wrote already in the PR, in addition to looking through the paper
I looked at the clang++ testcase for this feature, implemented there by the
paper's author, and played on godbolt with various parts of the testcase
coverage below.  There are 4 differences between what the patch
implements and what clang++ implements.

The first is that clang++ diagnoses if M.size () or M.data () methods
are present, but aren't constexpr; while the paper introduction talks about
that, the standard wording changes don't seem to require that; all they say
is that those methods need to exist (assuming accessible and the like)
and their results be implicitly convertible to std::size_t or const char *;
the rest applies only if the static assertion fails.  If there is intent to change that
wording, the question is how far to go, e.g. while M.size () could be
constexpr, they could e.g. return some class object which wouldn't have
constexpr conversion operator to size_t/const char * and tons of other
reasons why the constant evaluation could fail.  Without actually evaluating
it I don't see how we could guarantee anything for non-failed static_assert.
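
To make the requirement concrete, here is a minimal sketch of a message type with the two members the paper asks for.  The names message_text and fixed_message are made up for illustration, and the static_assert using it is guarded by the feature-test macro this patch bumps, so the snippet should also compile with compilers that lack the feature.

```cpp
#include <cstddef>

// Hypothetical message text; any constexpr character array would do.
constexpr char message_text[] = "unexpected int width";

// Hypothetical message type: size () converts to std::size_t and
// data () to const char *, and both are usable in constant expressions.
struct fixed_message
{
  constexpr std::size_t size () const { return sizeof (message_text) - 1; }
  constexpr const char *data () const { return message_text; }
};

#if defined (__cpp_static_assert) && __cpp_static_assert >= 202306L
// With P2741R3 the second operand may be such an object instead of a
// string literal.  The assertion holds, so no message is produced.
static_assert (sizeof (int) >= 2, fixed_message{});
#endif
```

Here both members are constexpr, so this example does not touch the open question above about what should happen when they are not.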

The second and most important is that clang++ has a couple of tests (and the
testcase below as well) where M.data () is not a core constant expression
but M.data ()[0] ... M.data ()[M.size () - 1] are integer constant
expressions.  From my reading of http://eel.is/c++draft/dcl.pre#11.2.2
that means those should be rejected.  Examples of these are e.g.
static_assert (false, T{});
in the testcase, where T{}.data () returns a pointer returned from a new
expression, but T{}'s destructor then deletes it, making it point to
a no longer live object.  Or
static_assert (false, a);
where a.data () returns &a.a but because a is a constexpr automatic variable,
that isn't a valid core constant expression, while a.data ()[0] is.
There are a couple of others.  Now, it seems allowing that is quite useful
in real-world, but the question is with what standard changes to achieve
that.  One possibility would be s/a core constant/an/; from implementation
POV that would mean that if M.size () is 0, then M.data () doesn't have
to be constexpr at all.  Otherwise, implementation could try to evaluate
silently M.data () as constant expression, if it would be one, it could
just use c_getstr in the GCC case as the patch does + optionally the 2
M.data ()[0] and M.data ()[M.size () - 1] tests to verify boundary cases
more carefully.  And if it wouldn't be one, it would need to evaluate
M.data ()[i] for i in [0, M.size () - 1] to get all the characters one by
one.  Another possibility would be to require that say ((void) (M.data ()), 0)
is a constant expression, that doesn't help much with the optimized way
to get at the message characters, but would require that data () is
constexpr even for the 0 case etc.

The third difference is that 
static_assert (false, "foo"_myd);
in the testcase is normal failed static assertion and
static_assert (true, "foo"_myd);
would be accepted, while clang++ rejects it.  IMHO
"foo"_myd doesn't match the syntactic requirements of unevaluated-string
as mentioned in http://eel.is/c++draft/dcl.pre#10 , and because
a constexpr udlit operator can return something which is valid, it shouldn't
be rejected just in case.

The last is that clang++ ICEs on non-static data members named size/data.

The patch implements what I see in the paper, because it is unclear what
further changes will be voted in (and the changes can be done at that
point).
The patch uses tf_none in 6 spots so that just the static_assert specific
errors are emitted and not others, but it would certainly be possible to
use complain instead of tf_none there, get more errors in some cases, but
perhaps help users figure out what exactly is wrong in detail.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-08-24  Jakub Jelinek  

PR c++/110348
gcc/c-family/
* c-cppbuiltin.cc (c_cpp_builtins): For C++26 predefine
__cpp_static_assert to 202306L rather than 201411L.
gcc/cp/
* parser.cc: Implement C++26 P2741R3 - user-generated static_assert
messages.
(cp_parser_static_assert): Parse message argument as
conditional-expression if it is not a pure string literal or
several of them concatenated followed by closing paren.
* semantics.cc (finish_static_assert): Handle message which is not
STRING_CST.
* pt.cc (tsubst_expr) : Also tsubst_expr
message and make sure that if it wasn't originally STRING_CST, it
isn't after tsubst_expr either.
gcc/testsuite/
* g++.dg

Re: [PATCH v1] RISC-V: Support rounding mode for VFNMSAC/VFNMSUB autovec

2023-08-24 Thread Kito Cheng via Gcc-patches

LGTM

On Thu, Aug 24, 2023 at 5:35 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> There will be a case like below for intrinsic and autovec combination.
>
> vfadd RTZ   <- intrinsic static rounding
> vfnmsub <- autovec/autovec-opt
>
> The autovec generated vfnmsub should take DYN mode, and the
> frm must be restored before the vfnmsub insn. This patch
> would like to fix this issue by:
>
> * Add the frm operand to the autovec/autovec-opt pattern.
> * Set the frm_mode attr to DYN.
>
> Thus, the frm flow when combining autovec and intrinsic code should be:
>
> +
> | frrm  a5
> | ...
> | fsrmi 4
> | vfadd   <- intrinsic static rounding.
> | ...
> | fsrm  a5
> | vfnmsub <- autovec/autovec-opt
> | ...
> +
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmsac/vfnmsub.
> * config/riscv/autovec.md: Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c: New test.
> ---
>  gcc/config/riscv/autovec-opt.md   | 34 ---
>  gcc/config/riscv/autovec.md   | 30 ---
>  .../rvv/base/float-point-frm-autovec-3.c  | 88 +++
>  3 files changed, 126 insertions(+), 26 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c
>
> diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
> index 732a51edacd..54ca6df721c 100644
> --- a/gcc/config/riscv/autovec-opt.md
> +++ b/gcc/config/riscv/autovec-opt.md
> @@ -523,13 +523,15 @@ (define_insn_and_split "*single_widen_fma"
>  ;; vect__13.182_33 = .FNMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
>  (define_insn_and_split "*double_widen_fnma"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (neg:VWEXTF
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (neg:VWEXTF
> + (float_extend:VWEXTF
> +   (match_operand: 2 "register_operand")))
> (float_extend:VWEXTF
> - (match_operand: 2 "register_operand")))
> - (float_extend:VWEXTF
> -   (match_operand: 3 "register_operand"))
> - (match_operand:VWEXTF 1 "register_operand")))]
> + (match_operand: 3 "register_operand"))
> +   (match_operand:VWEXTF 1 "register_operand"))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -540,17 +542,20 @@ (define_insn_and_split "*double_widen_fnma"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; This helps to match ext + fnma.
>  (define_insn_and_split "*single_widen_fnma"
>[(set (match_operand:VWEXTF 0 "register_operand")
> -   (fma:VWEXTF
> - (neg:VWEXTF
> -   (float_extend:VWEXTF
> - (match_operand: 2 "register_operand")))
> - (match_operand:VWEXTF 3 "register_operand")
> - (match_operand:VWEXTF 1 "register_operand")))]
> +   (unspec:VWEXTF
> + [(fma:VWEXTF
> +   (neg:VWEXTF
> + (float_extend:VWEXTF
> +   (match_operand: 2 "register_operand")))
> +   (match_operand:VWEXTF 3 "register_operand")
> +   (match_operand:VWEXTF 1 "register_operand"))
> +  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
>"TARGET_VECTOR && can_create_pseudo_p ()"
>"#"
>"&& 1"
> @@ -567,7 +572,8 @@ (define_insn_and_split "*single_widen_fnma"
>  DONE;
>}
>[(set_attr "type" "vfwmuladd")
> -   (set_attr "mode" "")])
> +   (set_attr "mode" "")
> +   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  ;; -
>  ;;  [FP] VFWMSAC
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 0c1c546817a..28396c6175d 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -1174,24 +1174,29 @@ (define_insn_and_split "*fma"
>  (define_expand "fnma4"
>[(parallel
>  [(set (match_operand:VF 0 "register_operand")
> - (fma:VF
> -   (neg:VF
> - (match_operand:VF 1 "register_operand"))
> -   (match_operand:VF 2 "register_operand")
> -   (match_operand:VF 3 "register_operand")))
> + (unspec:VF
> +   [(fma:VF
> + (neg:VF
> +   (match_operand:VF 1 "register_operand"))
> + (match_operand:VF 2 "register_operand")
> + (match_operand:VF 3 "register_operand"))
> +(reg:SI FRM_REGNUM)] UNSPEC_VFFMA))
>   (clobber (match_dup 4))])]
>"TARGET_VECTOR"
>{
>  operands[4] = gen_reg_rtx (Pmode);
> -  })
> +  }
> +  [(set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
>
>  (define_insn_and_split "*fnma"
>[(set (match_ope

Re: [PATCH] fwprop: Allow UNARY_P and check register pressure.

2023-08-24 Thread Robin Dapp via Gcc-patches

Ping.  I refined the code and some comments a bit and added a test
case.

My question in general would still be:  Is this something we want
given that we potentially move some of combine's work a bit towards
the front of the RTL pipeline?

Regards
 Robin

Subject: [PATCH] fwprop: Allow UNARY_P and check register pressure.

This patch enables the forwarding of UNARY_P sources.  As this
involves potentially replacing a vector register with a scalar register
the ira_hoist_pressure machinery is used to calculate the change in
register pressure.  If the propagation would increase the pressure
beyond the number of hard regs, we don't perform it.
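
As a rough illustration of the check described above (a simplified model, not
the code in the patch; `pressure_class` and `propagation_allowed` are
hypothetical names), the decision could be sketched as:

```cpp
#include <cassert>

// Hypothetical, simplified model of one register pressure class.
struct pressure_class
{
  int live_regs;  // registers of this class currently live in a block
  int hard_regs;  // number of hard registers available in this class
};

// Assumption for this sketch: forwarding makes EXTRA_REGS additional
// registers of class CL live.  The propagation is rejected if that would
// push the pressure beyond the number of hard registers.
bool
propagation_allowed (const pressure_class &cl, int extra_regs)
{
  return cl.live_regs + extra_regs <= cl.hard_regs;
}
```

For example, with 32 hard registers a block with 30 live registers could
still take one more, while a block that already has 32 live could not.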

gcc/ChangeLog:

* fwprop.cc (fwprop_propagation::profitable_p): Add unary
handling.
(fwprop_propagation::update_register_pressure): New function.
(fwprop_propagation::register_pressure_high_p): New function.
(reg_single_def_for_src_p): Look through unary expressions.
(try_fwprop_subst_pattern): Check register pressure.
(forward_propagate_into): Call new function.
(fwprop_init): Init register pressure.
(fwprop_done): Clean up register pressure.
(fwprop_insn): Add comment.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c: New test.
---
 gcc/fwprop.cc | 314 +-
 .../riscv/rvv/autovec/binop/vadd-vx-fwprop.c  |  64 
 2 files changed, 371 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-vx-fwprop.c

diff --git a/gcc/fwprop.cc b/gcc/fwprop.cc
index 0707a234726..b49d4e4ced4 100644
--- a/gcc/fwprop.cc
+++ b/gcc/fwprop.cc
@@ -36,6 +36,10 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-pass.h"
 #include "rtl-iter.h"
 #include "target.h"
+#include "dominance.h"
+
+#include "ira.h"
+#include "regpressure.h"
 
 /* This pass does simple forward propagation and simplification when an
operand of an insn can only come from a single def.  This pass uses
@@ -103,6 +107,10 @@ using namespace rtl_ssa;
 
 static int num_changes;
 
+/* Keep track of which registers already increased the pressure to avoid double
+   booking.  */
+sbitmap pressure_accounted;
+
 /* Do not try to replace constant addresses or addresses of local and
argument slots.  These MEM expressions are made only once and inserted
in many instructions, as well as being used to control symbol table
@@ -181,6 +189,8 @@ namespace
 bool changed_mem_p () const { return result_flags & CHANGED_MEM; }
 bool folded_to_constants_p () const;
 bool profitable_p () const;
+bool register_pressure_high_p (rtx, rtx, rtx_insn *, rtx_insn *) const;
+bool update_register_pressure (rtx, rtx, rtx_insn *, rtx_insn *) const;
 
 bool check_mem (int, rtx) final override;
 void note_simplification (int, uint16_t, rtx, rtx) final override;
@@ -332,25 +342,247 @@ fwprop_propagation::profitable_p () const
   && (result_flags & PROFITABLE))
 return true;
 
-  if (REG_P (to))
+  /* Only continue with an unary operation if we consider register
+ pressure.  */
+  rtx what = copy_rtx (to);
+  if (UNARY_P (what) && flag_ira_hoist_pressure)
+what = XEXP (what, 0);
+
+  if (REG_P (what))
 return true;
 
-  if (GET_CODE (to) == SUBREG
-  && REG_P (SUBREG_REG (to))
-  && !paradoxical_subreg_p (to))
+  if (GET_CODE (what) == SUBREG
+  && REG_P (SUBREG_REG (what))
+  && !paradoxical_subreg_p (what))
 return true;
 
-  if (CONSTANT_P (to))
+  if (CONSTANT_P (what))
 return true;
 
   return false;
 }
 
-/* Check that X has a single def.  */
+/* Check if the register pressure in any predecessor block of USE's block
+   until DEF's block is equal or higher to the number of hardregs in NU's
+   register class.  */
+bool
+fwprop_propagation::register_pressure_high_p (rtx nu, rtx old, rtx_insn *def,
+ rtx_insn *use) const
+{
+  enum reg_class nu_class, old_class;
+  int nu_nregs, old_nregs;
+  nu_class = regpressure_get_regno_pressure_class (REGNO (nu), &nu_nregs);
+  old_class
+= regpressure_get_regno_pressure_class (REGNO (old), &old_nregs);
+
+  if (nu_class == NO_REGS && old_class == NO_REGS)
+return true;
+
+  if (nu_class == old_class)
+return false;
+
+  basic_block bbfrom = BLOCK_FOR_INSN (def);
+  basic_block bbto = BLOCK_FOR_INSN (use);
+
+  basic_block bb;
+
+  sbitmap visited = sbitmap_alloc (last_basic_block_for_fn (cfun));
+  bitmap_clear (visited);
+  auto_vec<basic_block> q;
+  q.safe_push (bbto);
+
+  while (!q.is_empty ())
+{
+  bb = q.pop ();
+
+  if (bitmap_bit_p (visited, bb->index))
+   continue;
+
+  /* Nothing to do if the register to be replaced is not live
+in this BB.  */
+  if (bb != bbfrom && !regpressure_is_live_in (bb, REGNO (old)))
+   continue;
+
+  /* Nothing to do if the replacement register is already live in
+this BB.  */

[PATCH] c++: Implement C++26 P2361R6 - Unevaluated strings [PR110342]

2023-08-24 Thread Jakub Jelinek via Gcc-patches

Hi!

The following patch implements C++26 unevaluated-string.
As it seems to me to be just extra pedantry, it is implemented only for
-std=c++26 or -std=gnu++26 and later, and only with -pedantic/-pedantic-errors.
Nothing is done for inline asm; while the spec changes those, it changes them
to a balanced token sequence with implementation-defined rules on what is
and isn't allowed (so pedantically accepting asm ("" : "+m" (x));
was accepts-invalid before C++26, but we didn't diagnose anything).
For the other spots mentioned in the paper (static_assert message,
linkage specification, deprecated/nodiscard attributes) it enforces the
requirements: no prefixes, no udlit suffixes, no octal/hexadecimal escapes
(conditional escape sequences were already rejected with -pedantic before).
For the deprecated operator "" identifier case I've kept things as is,
because everything seems to have been diagnosed already (a lot being implied
from the string having to be empty).
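
To make the affected spots concrete, here is a small sketch (my own example,
not taken from the patch or its tests; `answer` is a made-up function).  The
active code uses only plain narrow literals, which are fine before and after
the change; the commented-out lines show the kind of strings that, per the
description above, would be expected to be diagnosed with -std=c++26 -pedantic:

```cpp
#include <cassert>

// Plain narrow literal without prefix, suffix or numeric escapes.
static_assert (sizeof (int) >= 2, "int is at least 16 bits");

// Linkage specification with an ordinary string literal.
extern "C" int answer () { return 42; }

// Expected to be diagnosed under -std=c++26 -pedantic (assumption based on
// the requirements listed above: no prefixes, no octal/hex escapes):
// static_assert (true, L"wide prefix");
// static_assert (true, "octal \101 escape");
```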

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-08-24  Jakub Jelinek  

PR c++/110342
gcc/cp/
* parser.cc: Implement C++26 P2361R6 - Unevaluated strings.
(uneval_string_attr): New enumerator.
(cp_parser_string_literal_common): Add UNEVAL argument.  If true,
pass CPP_UNEVAL_STRING rather than CPP_STRING to
cpp_interpret_string_notranslate.
(cp_parser_string_literal, cp_parser_userdef_string_literal): Adjust
callers of cp_parser_string_literal_common.
(cp_parser_unevaluated_string_literal): New function.
(cp_parser_parenthesized_expression_list): Handle uneval_string_attr.
(cp_parser_linkage_specification): Use
cp_parser_unevaluated_string_literal for C++26.
(cp_parser_static_assert): Likewise.
(cp_parser_std_attribute): Use uneval_string_attr for standard
deprecated and nodiscard attributes.
gcc/testsuite/
* g++.dg/cpp26/unevalstr1.C: New test.
* g++.dg/cpp26/unevalstr2.C: New test.
* g++.dg/cpp0x/udlit-error1.C (lol): Expect an error for C++26
about user-defined literal in deprecated attribute.
libcpp/
* include/cpplib.h (TTYPE_TABLE): Add CPP_UNEVAL_STRING literal
entry.  Use C++11 instead of C++-0x in comments.
* charset.cc (convert_escape): Add UNEVAL argument, if true,
pedantically diagnose numeric escape sequences.
(cpp_interpret_string_1): Formatting fix.  Adjust convert_escape
caller.
(cpp_interpret_string): Formatting string.
(cpp_interpret_string_notranslate): Pass type through to
cpp_interpret_string if it is CPP_UNEVAL_STRING.

--- gcc/cp/parser.cc.jj 2023-08-23 11:22:28.006593913 +0200
+++ gcc/cp/parser.cc 2023-08-23 12:21:31.384232520 +0200
@@ -2267,7 +2267,8 @@ static vec *cp_parser_paren
   (cp_parser *, int, bool, bool, bool *, location_t * = NULL,
bool = false);
 /* Values for the second parameter of cp_parser_parenthesized_expression_list.  */
-enum { non_attr = 0, normal_attr = 1, id_attr = 2, assume_attr = 3 };
+enum { non_attr = 0, normal_attr = 1, id_attr = 2, assume_attr = 3,
+   uneval_string_attr = 4 };
 static void cp_parser_pseudo_destructor_name
   (cp_parser *, tree, tree *, tree *);
 static cp_expr cp_parser_unary_expression
@@ -4409,7 +4410,8 @@ cp_parser_identifier (cp_parser* parser)
 return error_mark_node;
 }
 
-/* Worker for cp_parser_string_literal and cp_parser_userdef_string_literal.
+/* Worker for cp_parser_string_literal, cp_parser_userdef_string_literal
+   and cp_parser_unevaluated_string_literal.
Do not call this directly; use either of the above.
 
Parse a sequence of adjacent string constants.  Return a
@@ -4417,7 +4419,8 @@ cp_parser_identifier (cp_parser* parser)
constant.  If TRANSLATE is true, translate the string to the
execution character set.  If WIDE_OK is true, a wide string is
valid here.  If UDL_OK is true, a string literal with user-defined
-   suffix can be used in this context.
+   suffix can be used in this context.  If UNEVAL is true, diagnose
+   numeric and conditional escape sequences in it if pedantic.
 
C++98 [lex.string] says that if a narrow string literal token is
adjacent to a wide string literal token, the behavior is undefined.
@@ -4431,7 +4434,7 @@ cp_parser_identifier (cp_parser* parser)
 static cp_expr
 cp_parser_string_literal_common (cp_parser *parser, bool translate,
 bool wide_ok, bool udl_ok,
-bool lookup_udlit)
+bool lookup_udlit, bool uneval)
 {
   tree value;
   size_t count;
@@ -4584,6 +4587,8 @@ cp_parser_string_literal_common (cp_pars
   cp_parser_error (parser, "a wide string is invalid in this context");
   type = CPP_STRING;
 }
+  if (uneval)
+type = CPP_UNEVAL_STRING;
 
   if ((translate ? cpp_interpret_string : cpp_interpret_string_notranslate)
   (parse_in, strs, coun

Re: [PATCH] RISC-V: Enable Hoist to GCSE simple constants

2023-08-24 Thread Jeff Law via Gcc-patches





On 8/23/23 18:42, Vineet Gupta wrote:



Seriously, I detest it too, but the irony is I've now made my 2nd change 
in there and keep adding to ugliness :-(

Happens to all of us sometimes.





So I think your change makes sense.   But I think it can be refined to 
simplify the larger chunk of code we're looking at:



  /* If the constant is likely to be stored in a GPR, SETs of
     single-insn constants are as cheap as register sets; we
     never want to CSE them.  */
  if (cost == 1 && outer_code == SET)
    *total = 0;
  /* When we load a constant more than once, it usually is better
     to duplicate the last operation in the sequence than to CSE
     the constant itself.  */
  else if (outer_code == SET || GET_MODE (x) == VOIDmode)
    *total = COSTS_N_INSNS (1);


Turns into
  if (outer_code == SET || GET_MODE (x) == VOIDmode)
    *total = COSTS_N_INSNS (1);


Yep that's what I started with too but then left it, leaving it as a
visual indication to fix things up when ultimately cost model returns
the actual num of insns for a non trivial large const. Leaving the code
there meant we  But I agree I'll fold it and add a TODO comment for
improving the cost model.


For the current proposal I do want to understand/reason what is left 
there - what cases are we trying to filter out with #2467 ?


|    case CONST:
|  if ((cost = riscv_const_insns (x)) > 0) # 2465
| {
|        if (outer_code == SET || GET_MODE (x) == VOIDmode)  # 2467
|        *total = COSTS_N_INSNS (1);    # 2468


(1) AFAIU, VOIDmode is for const_int - and supposedly true for symbolic 
addresses etc whose actual values are not known at compile time ? Or is 
it needed just as an artifact of the weird fall through.
I'd expect it to filter away some symbolics and perhaps floating point 
constants.





(2) outer_code SET will kick in for set_src_cost( ) call from Hoist, 
which passes a const_int rtx directly.

   But in case of say expand_mult () -> set_src_cost ( ) called for say
     (mult:DI (reg:DI 134)
     (const_int [0x202020202020202]))

   In the eventual call for const_int operand, outer_code is MULT, 
so we elide #2468


   But wait we don't even hit #2467 since #2465 has a weird cap 
inside which caps 6 to 0.


|    case CONST_INT:
|  {
|          int cost = riscv_integer_cost (INTVAL (x));
|          /* Force complicated constants to memory.  */
|          return cost < 4 ? cost : 0;  #1321
|  }

This definitely needs to be tracked separately in a PR
And when we solve it, it'll likely need to be uarch driven.  I've hinted 
before that there'll be changes in this area that will allow some 
targets to handle most, if not all, of those complicated constants in a 
single cycle.
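
To make the interaction between the two quoted snippets easier to follow,
here is a stand-alone model (a sketch only; `const_insns`, `const_cost` and
`costs_n_insns` are made-up helpers, and the real hook has further branches
that are omitted here).  It shows why a constant needing 6 insns never reaches
the simplified SET/VOIDmode branch: the cap turns its cost into 0 first.

```cpp
#include <cassert>
#include <optional>

// Local stand-in for GCC's COSTS_N_INSNS macro (N * 4).
constexpr int costs_n_insns (int n) { return n * 4; }

// Models the quoted CONST_INT case: constants needing 4 or more insns
// report 0, i.e. "force to memory".
constexpr int const_insns (int synth_insns)
{
  return synth_insns < 4 ? synth_insns : 0;
}

// Models the simplified branch discussed above.  Returns the cost assigned
// by that branch, or nullopt if the branch is not taken (the real code
// would then fall through to other handling).
std::optional<int>
const_cost (int synth_insns, bool outer_is_set, bool is_voidmode)
{
  int cost = const_insns (synth_insns);
  if (cost > 0 && (outer_is_set || is_voidmode))
    return costs_n_insns (1);
  return std::nullopt;
}
```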






With a suitable comment about GCSE and the general desire to duplicate 
the last op rather than CSE the constant for multi instruction 
constant synthesis cases.


Hmm, do we not prefer GCSE/CSE a 3-4 insn const sequence. It seems the 
RA is capable enough of undoing that ;-)

For now I can keep the comment with current philosophy
I mentally set aside the multi-insn sequence concern.  Presumably the 
code was written that way for a reason and without a good one, backed by 
data, I was hesitant to push back on that choice.






If you agree, then consider the patch pre-approved with that change. 
If not, then state why and your original patch is OK as well.


I'm afraid I have more questions than either of us were hoping for :-)

But it seems we can chunk up the work for just Hoist enabling and then 
improve the const cost model with that new PR.

I'll spin up a buildroot testbed to get a wider impact of that change.
Exactly.  There are multiple issues, but I think they can be tackled 
independently.


jeff

Re: [PATCH] c++: refine CWG 2369 satisfaction vs non-dep convs [PR99599]

2023-08-24 Thread Patrick Palka via Gcc-patches

On Wed, 23 Aug 2023, Jason Merrill wrote:

> On 8/21/23 21:51, Patrick Palka wrote:
> > Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look like
> > a reasonable approach?  I didn't observe any compile time/memory impact
> > of this change.
> > 
> > -- >8 --
> > 
> > As described in detail in the PR, CWG 2369 has the surprising
> > consequence of introducing constraint recursion in seemingly valid and
> > innocent code.
> > 
> > This patch attempts to fix this surprising behavior for the majority
> > of problematic use cases.  Rather than checking satisfaction before
> > _all_ non-dependent conversions, as specified by the CWG issue,
> > this patch makes us first check "safe" non-dependent conversions,
> > then satisfaction, then followed by "unsafe" non-dependent conversions.
> > In this case, a conversion is "safe" if computing it is guaranteed
> > to not induce template instantiation.  This patch heuristically
> > determines "safety" by checking for a constructor template or conversion
> > function template in the (class) parm or arg types respectively.
> > If neither type has such a member, then computing the conversion
> > should not induce instantiation (modulo satisfaction checking of
> > non-template constructor and conversion functions I suppose).
> > 
> > + /* We're checking only non-instantiating conversions.
> > +A conversion may instantiate only if it's to/from a
> > +class type that has a constructor template/conversion
> > +function template.  */
> > + tree parm_nonref = non_reference (parm);
> > + tree type_nonref = non_reference (type);
> > +
> > + if (CLASS_TYPE_P (parm_nonref))
> > +   {
> > + if (!COMPLETE_TYPE_P (parm_nonref)
> > + && CLASSTYPE_TEMPLATE_INSTANTIATION (parm_nonref))
> > +   return unify_success (explain_p);
> > +
> > + tree ctors = get_class_binding (parm_nonref,
> > + complete_ctor_identifier);
> > + for (tree ctor : lkp_range (ctors))
> > +   if (TREE_CODE (ctor) == TEMPLATE_DECL)
> > + return unify_success (explain_p);
> 
> Today we discussed maybe checking CLASSTYPE_NON_AGGREGATE?

Done; all dups of this PR seem to use tag types that are aggregates, so this
seems like a good simplification.  I also made us punt if the arg type has a
constrained non-template conversion function.

> 
> Also, instantiation can also happen when checking for conversion to a pointer
> or reference to base class.

Oops, I suppose we just need to strip pointer types upfront as well.  The
!COMPLETE_TYPE_P && CLASSTYPE_TEMPLATE_INSTANTIATION tests will then make
sure we deem a potential derived-to-base conversion unsafe if appropriate
IIUC.
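
For readers following along, the "safe conversion" heuristic from the original
description can be modelled roughly like this (a toy sketch with made-up
types; `type_info_model` and `conversion_may_instantiate` are hypothetical and
do not correspond to anything in pt.cc, and the sketch does not include the
CLASSTYPE_NON_AGGREGATE simplification mentioned above):

```cpp
#include <cassert>

// Toy summary of the properties the heuristic looks at.
struct type_info_model
{
  bool is_class = false;
  bool complete = true;
  bool template_instantiation = false;
  bool has_ctor_template = false;     // relevant for the parameter type
  bool has_conv_fn_template = false;  // relevant for the argument type
};

// Returns true if computing the conversion might induce template
// instantiation, i.e. the conversion is "unsafe" and should only be
// checked after satisfaction.
bool
conversion_may_instantiate (const type_info_model &parm,
                            const type_info_model &arg)
{
  auto needs_completion = [] (const type_info_model &t) {
    return t.is_class && !t.complete && t.template_instantiation;
  };
  if (needs_completion (parm) || needs_completion (arg))
    return true;
  if (parm.is_class && parm.has_ctor_template)
    return true;
  if (arg.is_class && arg.has_conv_fn_template)
    return true;
  return false;
}
```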

How does the following look?

-- >8 --

Subject: [PATCH] c++: refine CWG 2369 satisfaction vs non-dep convs [PR99599]

PR c++/99599

gcc/cp/ChangeLog:

* config-lang.in (gtfiles): Add search.cc.
* pt.cc (check_non_deducible_conversions): Add bool parameter
passed down to check_non_deducible_conversion.
(fn_type_unification): Call check_non_deducible_conversions
an extra time before satisfaction with noninst_only_p=true.
(check_non_deducible_conversion): Add bool parameter controlling
whether to compute only conversions that are guaranteed to
not induce template instantiation.
* search.cc (conversions_cache): Define.
(lookup_conversions): Use it to cache the lookup.  Improve cache
rate by considering TYPE_MAIN_VARIANT of the type.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-nondep4.C: New test.
---
 gcc/cp/config-lang.in |  1 +
 gcc/cp/pt.cc  | 81 +--
 gcc/cp/search.cc  | 14 +++-
 gcc/testsuite/g++.dg/cpp2a/concepts-nondep4.C | 21 +
 4 files changed, 110 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-nondep4.C

diff --git a/gcc/cp/config-lang.in b/gcc/cp/config-lang.in
index a6c7883cc24..e34c392d208 100644
--- a/gcc/cp/config-lang.in
+++ b/gcc/cp/config-lang.in
@@ -52,6 +52,7 @@ gtfiles="\
 \$(srcdir)/cp/name-lookup.cc \
 \$(srcdir)/cp/parser.cc \$(srcdir)/cp/pt.cc \
 \$(srcdir)/cp/rtti.cc \
+\$(srcdir)/cp/search.cc \
 \$(srcdir)/cp/semantics.cc \
 \$(srcdir)/cp/tree.cc \$(srcdir)/cp/typeck2.cc \
 \$(srcdir)/cp/vtable-class-hierarchy.cc \
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index a4809f034dc..3c77d2466eb 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -151,7 +151,7 @@ static tree get_partial_spec_bindings (tree, tree, tree);
 static void tsubst_enum(tree, tree, tree);
 static bool check_instantiated_args (tree, tree, tsubst_flags_t);
 static int check_non_deducible_conversion (tree, tree, unification_kind_t, int,
-  struct conversion **, bool);
+  struct con

Re: Check that passes do not forget to define profile

2023-08-24 Thread Richard Biener via Gcc-patches

On Thu, Aug 24, 2023 at 3:15 PM Jan Hubicka via Gcc-patches
 wrote:
>
> Hi,
> this patch extends verifier to check that all probabilities and counts are
> initialized if profile is supposed to be present.  This is a bit complicated
> by the possibility that we inline !flag_guess_branch_probability function
> into function with profile defined and in this case we need to stop
> verification.  For this reason I added flag to cfg structure tracking this.
>
> Bootstrapped/regtested x86_64-linux, committed.

Couldn't we have massaged profile_status to avoid extra full_profile?
Aka add PROFILE_{READ,GUESSED}_PARTIAL?

> gcc/ChangeLog:
>
> * cfg.h (struct control_flow_graph): New field full_profile.
> * auto-profile.cc (afdo_annotate_cfg): Set full_profile to true.
> * cfg.cc (init_flow): Set full_profile to false.
> * graphite.cc (graphite_transform_loops): Set full_profile to false.
> * lto-streamer-in.cc (input_cfg): Initialize full_profile flag.
> * predict.cc (pass_profile::execute): Set full_profile to true.
> * symtab-thunks.cc (expand_thunk): Set full_profile to true.
> * tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full
> if full_profile is set.
> * tree-inline.cc (initialize_cfun): Initialize full_profile.
> (expand_call_inline): Combine full_profile.
>
>
> diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
> index e3af3555e75..ff3b763945c 100644
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1578,6 +1578,7 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
>  }
>update_max_bb_count ();
>profile_status_for_fn (cfun) = PROFILE_READ;
> +  cfun->cfg->full_profile = true;
>if (flag_value_profile_transformations)
>  {
>gimple_value_profile_transformations ();
> diff --git a/gcc/cfg.cc b/gcc/cfg.cc
> index 9eb9916f61a..b7865f14e7f 100644
> --- a/gcc/cfg.cc
> +++ b/gcc/cfg.cc
> @@ -81,6 +81,7 @@ init_flow (struct function *the_fun)
>  = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
>the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
>the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
> +  the_fun->cfg->full_profile = false;
>  }
>
>  /* Helper function for remove_edge and free_cffg.  Frees edge structure
> diff --git a/gcc/cfg.h b/gcc/cfg.h
> index a0e944979c8..53e2553012c 100644
> --- a/gcc/cfg.h
> +++ b/gcc/cfg.h
> @@ -78,6 +78,9 @@ struct GTY(()) control_flow_graph {
>/* Dynamically allocated edge/bb flags.  */
>int edge_flags_allocated;
>int bb_flags_allocated;
> +
> +  /* Set if the profile is computed on every edge and basic block.  */
> +  bool full_profile;
>  };
>
>
> diff --git a/gcc/graphite.cc b/gcc/graphite.cc
> index 19f8975ffa2..2b387d5b016 100644
> --- a/gcc/graphite.cc
> +++ b/gcc/graphite.cc
> @@ -512,6 +512,8 @@ graphite_transform_loops (void)
>
>if (changed)
>  {
> +  /* FIXME: Graphite does not update profile meaningfully currently.  */
> +  cfun->cfg->full_profile = false;
>cleanup_tree_cfg ();
>profile_status_for_fn (cfun) = PROFILE_ABSENT;
>release_recorded_exits (cfun);
> diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
> index 0cce14414ca..d3128fcebe4 100644
> --- a/gcc/lto-streamer-in.cc
> +++ b/gcc/lto-streamer-in.cc
> @@ -1030,6 +1030,7 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
>basic_block p_bb;
>unsigned int i;
>int index;
> +  bool full_profile = false;
>
>init_empty_tree_cfg_for_function (fn);
>
> @@ -1071,6 +1072,8 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
>   data_in->location_cache.input_location_and_block (&e->goto_locus,
> &bp, ib, data_in);
>   e->probability = profile_probability::stream_in (ib);
> + if (!e->probability.initialized_p ())
> +   full_profile = false;
>
> }
>
> @@ -1145,6 +1148,7 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
>
>/* Rebuild the loop tree.  */
>flow_loops_find (loops);
> +  cfun->cfg->full_profile = full_profile;
>  }
>
>
> diff --git a/gcc/predict.cc b/gcc/predict.cc
> index 5a1a561cc24..396746cbfd1 100644
> --- a/gcc/predict.cc
> +++ b/gcc/predict.cc
> @@ -4131,6 +4131,7 @@ pass_profile::execute (function *fun)
>  scev_initialize ();
>
>tree_estimate_probability (false);
> +  cfun->cfg->full_profile = true;
>
>if (nb_loops > 1)
>  scev_finalize ();
> diff --git a/gcc/symtab-thunks.cc b/gcc/symtab-thunks.cc
> index 4c04235c41b..23ead0d2138 100644
> --- a/gcc/symtab-thunks.cc
> +++ b/gcc/symtab-thunks.cc
> @@ -648,6 +648,7 @@ expand_thunk (cgraph_node *node, bool output_asm_thunks,
>   ? PROFILE_READ : PROFILE_GUESSED;
>/* FIXME: C++ FE should stop setting TREE_ASM_WRITTEN on thunks.  */
>TREE_ASM_WRITTEN (thunk_fndecl) = false;
> +  cfun->cfg->full_profile = true;
>delete_unreachable_

Check that passes do not forget to define profile

2023-08-24 Thread Jan Hubicka via Gcc-patches

Hi,
this patch extends verifier to check that all probabilities and counts are
initialized if profile is supposed to be present.  This is a bit complicated
by the possibility that we inline !flag_guess_branch_probability function
into function with profile defined and in this case we need to stop
verification.  For this reason I added flag to cfg structure tracking this.
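
A minimal model of the idea (a sketch with made-up types, not the verifier
code itself; `verify_profile` and `combine_full_profile` are hypothetical
names, and combining the flags with a logical AND on inlining is my
assumption about what "Combine full_profile" amounts to):

```cpp
#include <cassert>
#include <vector>

struct Edge  { bool probability_initialized; };
struct Block { bool count_initialized; std::vector<Edge> succs; };
struct Cfg   { bool full_profile; std::vector<Block> blocks; };

// Only complain about uninitialized counts/probabilities when the CFG
// claims to carry a full profile.
bool
verify_profile (const Cfg &cfg)
{
  if (!cfg.full_profile)
    return true;
  for (const Block &bb : cfg.blocks)
    {
      if (!bb.count_initialized)
        return false;
      for (const Edge &e : bb.succs)
        if (!e.probability_initialized)
          return false;
    }
  return true;
}

// Assumed behavior on inlining: the result has a full profile only if
// both caller and callee had one.
bool
combine_full_profile (bool caller, bool callee)
{
  return caller && callee;
}
```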

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

* cfg.h (struct control_flow_graph): New field full_profile.
* auto-profile.cc (afdo_annotate_cfg): Set full_profile to true.
* cfg.cc (init_flow): Set full_profile to false.
* graphite.cc (graphite_transform_loops): Set full_profile to false.
* lto-streamer-in.cc (input_cfg): Initialize full_profile flag.
* predict.cc (pass_profile::execute): Set full_profile to true.
* symtab-thunks.cc (expand_thunk): Set full_profile to true.
* tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full
if full_profile is set.
* tree-inline.cc (initialize_cfun): Initialize full_profile.
(expand_call_inline): Combine full_profile.


diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index e3af3555e75..ff3b763945c 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1578,6 +1578,7 @@ afdo_annotate_cfg (const stmt_set &promoted_stmts)
 }
   update_max_bb_count ();
   profile_status_for_fn (cfun) = PROFILE_READ;
+  cfun->cfg->full_profile = true;
   if (flag_value_profile_transformations)
 {
   gimple_value_profile_transformations ();
diff --git a/gcc/cfg.cc b/gcc/cfg.cc
index 9eb9916f61a..b7865f14e7f 100644
--- a/gcc/cfg.cc
+++ b/gcc/cfg.cc
@@ -81,6 +81,7 @@ init_flow (struct function *the_fun)
 = ENTRY_BLOCK_PTR_FOR_FN (the_fun);
   the_fun->cfg->edge_flags_allocated = EDGE_ALL_FLAGS;
   the_fun->cfg->bb_flags_allocated = BB_ALL_FLAGS;
+  the_fun->cfg->full_profile = false;
 }
 
 /* Helper function for remove_edge and free_cffg.  Frees edge structure
diff --git a/gcc/cfg.h b/gcc/cfg.h
index a0e944979c8..53e2553012c 100644
--- a/gcc/cfg.h
+++ b/gcc/cfg.h
@@ -78,6 +78,9 @@ struct GTY(()) control_flow_graph {
   /* Dynamically allocated edge/bb flags.  */
   int edge_flags_allocated;
   int bb_flags_allocated;
+
+  /* Set if the profile is computed on every edge and basic block.  */
+  bool full_profile;
 };
 
 
diff --git a/gcc/graphite.cc b/gcc/graphite.cc
index 19f8975ffa2..2b387d5b016 100644
--- a/gcc/graphite.cc
+++ b/gcc/graphite.cc
@@ -512,6 +512,8 @@ graphite_transform_loops (void)
 
   if (changed)
 {
+  /* FIXME: Graphite does not update profile meaningfully currently.  */
+  cfun->cfg->full_profile = false;
   cleanup_tree_cfg ();
   profile_status_for_fn (cfun) = PROFILE_ABSENT;
   release_recorded_exits (cfun);
diff --git a/gcc/lto-streamer-in.cc b/gcc/lto-streamer-in.cc
index 0cce14414ca..d3128fcebe4 100644
--- a/gcc/lto-streamer-in.cc
+++ b/gcc/lto-streamer-in.cc
@@ -1030,6 +1030,7 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
   basic_block p_bb;
   unsigned int i;
   int index;
+  bool full_profile = false;
 
   init_empty_tree_cfg_for_function (fn);
 
@@ -1071,6 +1072,8 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
  data_in->location_cache.input_location_and_block (&e->goto_locus,
&bp, ib, data_in);
  e->probability = profile_probability::stream_in (ib);
+ if (!e->probability.initialized_p ())
+   full_profile = false;
 
}
 
@@ -1145,6 +1148,7 @@ input_cfg (class lto_input_block *ib, class data_in *data_in,
 
   /* Rebuild the loop tree.  */
   flow_loops_find (loops);
+  cfun->cfg->full_profile = full_profile;
 }
 
 
diff --git a/gcc/predict.cc b/gcc/predict.cc
index 5a1a561cc24..396746cbfd1 100644
--- a/gcc/predict.cc
+++ b/gcc/predict.cc
@@ -4131,6 +4131,7 @@ pass_profile::execute (function *fun)
 scev_initialize ();
 
   tree_estimate_probability (false);
+  cfun->cfg->full_profile = true;
 
   if (nb_loops > 1)
 scev_finalize ();
diff --git a/gcc/symtab-thunks.cc b/gcc/symtab-thunks.cc
index 4c04235c41b..23ead0d2138 100644
--- a/gcc/symtab-thunks.cc
+++ b/gcc/symtab-thunks.cc
@@ -648,6 +648,7 @@ expand_thunk (cgraph_node *node, bool output_asm_thunks,
  ? PROFILE_READ : PROFILE_GUESSED;
   /* FIXME: C++ FE should stop setting TREE_ASM_WRITTEN on thunks.  */
   TREE_ASM_WRITTEN (thunk_fndecl) = false;
+  cfun->cfg->full_profile = true;
   delete_unreachable_blocks ();
   update_ssa (TODO_update_ssa);
   checking_verify_flow_info ();
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 272d5ce321e..ffab7518b15 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -5684,6 +5684,26 @@ gimple_verify_flow_info (void)
error ("fallthru to exit from bb %d", e->src->index);
err = true;
   }
+  if (cfun->cfg->full_profile
+  && !ENTRY

[committed] libstdc++: Add test for illegal pointer arithmetic in format [PR111102]

2023-08-24 Thread Jonathan Wakely via Gcc-patches

From: Paul Dreik 

Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/111102
* testsuite/std/format/string.cc: Check wide character format
strings with out-of-range widths.
---
 libstdc++-v3/testsuite/std/format/string.cc | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/libstdc++-v3/testsuite/std/format/string.cc b/libstdc++-v3/testsuite/std/format/string.cc
index fef55b9bcd9..a472f8d588c 100644
--- a/libstdc++-v3/testsuite/std/format/string.cc
+++ b/libstdc++-v3/testsuite/std/format/string.cc
@@ -16,6 +16,18 @@ is_format_string_for(const char* str, Args&&... args)
   }
 }
 
+template<typename... Args>
+bool
+is_format_string_for(const wchar_t* str, Args&&... args)
+{
+  try {
+(void) std::vformat(str, std::make_wformat_args(args...));
+return true;
+  } catch (const std::format_error&) {
+return false;
+  }
+}
+
 void
 test_no_args()
 {
@@ -124,8 +136,11 @@ test_format_spec()
 
   // Maximum integer value supported for widths and precisions is USHRT_MAX.
   VERIFY( is_format_string_for("{:65535}", 1) );
+  VERIFY( is_format_string_for(L"{:65535}", 1) );
   VERIFY( ! is_format_string_for("{:65536}", 1) );
+  VERIFY( ! is_format_string_for(L"{:65536}", 1) );
   VERIFY( ! is_format_string_for("{:999}", 1) );
+  VERIFY( ! is_format_string_for(L"{:999}", 1) );
 }
 
 void
-- 
2.41.0

Re: [PATCH] Fix for bug libstdc++/111102 pointer arithmetic on nullptr

2023-08-24 Thread Jonathan Wakely via Gcc-patches

On Wed, 23 Aug 2023 at 19:48, Paul Dreik via Libstdc++
 wrote:
>
> This fixes pointer arithmetic made on a null pointer, which I found
> through fuzzing.
> Tested on debian/amd64.
>
> Thanks, Paul

Thanks. Pushed to trunk, backport to gcc-13 to follow.

I also added your testcase from the bug report to the testsuite.



>
> 
> commit 78ac41590432f4f01036797fd9d661f6ed80cf37 (HEAD -> master)
> Author: Paul Dreik 
> Date:   Tue Aug 22 19:16:57 2023 +0200
>
>  libstdc++: fix illegal pointer arithmetic in format
>
>  when parsing a format string, the width is parsed into an unsigned short
>  but the result is not checked in the case the format string is not a
>  char string (such as a wide string). in case the parse fails,
>  a null pointer is returned which is used for pointer arithmetic
>  which is undefined behaviour.
>
>  Signed-off-by: Paul Dreik 
>
> diff --git a/libstdc++-v3/include/std/format b/libstdc++-v3/include/std/format
> index f3d9ae152f..fe2caa5868 100644
> --- a/libstdc++-v3/include/std/format
> +++ b/libstdc++-v3/include/std/format
> @@ -285,7 +285,8 @@ namespace __format
>for (int __i = 0; __i < __n && (__first + __i) != __last; ++__i)
>  __buf[__i] = __first[__i];
>auto [__v, __ptr] = __format::__parse_integer(__buf, __buf + __n);
> - return {__v, __first + (__ptr - __buf)};
> + if (__ptr) [[likely]]
> +   return {__v, __first + (__ptr - __buf)};
>  }
> return {0, nullptr};
>   }
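
For reference, the pattern behind the fix can be shown in isolation.  This is
a stand-alone sketch, not the libstdc++ code: `parse_u16` and `parse_u16_wide`
are made-up helpers, and `std::from_chars` stands in for the internal
`__parse_integer`.  The point is that the narrow-buffer end pointer is only
translated back into the wide range when the parse succeeded.

```cpp
#include <cassert>
#include <charconv>
#include <system_error>
#include <utility>

// Parse an unsigned short from [first, last).  Returns {value, end of
// digits}, or {0, nullptr} if there are no digits or the value does not fit.
inline std::pair<unsigned short, const char*>
parse_u16 (const char* first, const char* last)
{
  unsigned short v = 0;
  auto res = std::from_chars (first, last, v);
  if (res.ec != std::errc{})
    return {0, nullptr};
  return {v, res.ptr};
}

// Wide-string path: copy up to 32 code units into a narrow buffer, parse
// that, and map the end pointer back into the wide range.
inline std::pair<unsigned short, const wchar_t*>
parse_u16_wide (const wchar_t* first, const wchar_t* last)
{
  constexpr int n = 32;
  char buf[n] = {};
  int len = 0;
  for (; len < n && first + len != last; ++len)
    buf[len] = static_cast<char> (first[len]);
  auto [v, ptr] = parse_u16 (buf, buf + len);
  if (ptr) // no pointer arithmetic on a null pointer
    return {v, first + (ptr - buf)};
  return {0, nullptr};
}
```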

[committed] libstdc++: Tweak some preprocessor conditions for feature tests

2023-08-24 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

Update a preprocessor condition using __cplusplus and _GLIBCXX_HOSTED
to use the relevant feature test macro for <syncstream>.

Also add comments to some conditions saying which C++ standard revision
the check corresponds to.
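
As a user-level illustration of the same idea (a sketch, not part of this
patch; `thread_api` is a made-up function), testing the feature test macro
rather than __cplusplus lets the code ask directly whether the facility is
available:

```cpp
#if __has_include(<version>)
# include <version>   // defines the __cpp_lib_* macros where available
#endif
#include <cassert>
#include <string_view>

// Report which thread facility this translation unit would use.
constexpr std::string_view
thread_api ()
{
#ifdef __cpp_lib_jthread // C++ >= 20
  return "std::jthread";
#else
  return "std::thread";
#endif
}
```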

libstdc++-v3/ChangeLog:

* include/std/atomic: Add comment to #ifdef and fix indentation.
* include/std/ostream: Check __glibcxx_syncbuf instead of
__cplusplus and _GLIBCXX_HOSTED.
* include/std/thread: Add comment to #ifdef.
---
 libstdc++-v3/include/std/atomic  | 28 ++--
 libstdc++-v3/include/std/ostream |  6 +++---
 libstdc++-v3/include/std/thread  |  2 +-
 3 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index da99169cab5..713ee2cc539 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -388,23 +388,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return compare_exchange_strong(__e, __i, __m,
__cmpexch_failure_order(__m)); }
 
-#if __cpp_lib_atomic_wait
-void
-wait(_Tp __old, memory_order __m = memory_order_seq_cst) const noexcept
-{
-  std::__atomic_wait_address_v(&_M_i, __old,
-[__m, this] { return this->load(__m); });
-}
+#if __cpp_lib_atomic_wait // C++ >= 20
+  void
+  wait(_Tp __old, memory_order __m = memory_order_seq_cst) const noexcept
+  {
+   std::__atomic_wait_address_v(&_M_i, __old,
+  [__m, this] { return this->load(__m); });
+  }
 
-// TODO add const volatile overload
+  // TODO add const volatile overload
 
-void
-notify_one() noexcept
-{ std::__atomic_notify_address(&_M_i, false); }
+  void
+  notify_one() noexcept
+  { std::__atomic_notify_address(&_M_i, false); }
 
-void
-notify_all() noexcept
-{ std::__atomic_notify_address(&_M_i, true); }
+  void
+  notify_all() noexcept
+  { std::__atomic_notify_address(&_M_i, true); }
 #endif // __cpp_lib_atomic_wait
 
 };
diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 4711b8a3d96..5f973fa11ed 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -39,6 +39,7 @@
 
 #include 
 #include 
+#include  // __glibcxx_syncbuf
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
@@ -804,7 +805,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return std::move(__os);
 }
 
-#if __cplusplus > 201703L && _GLIBCXX_USE_CXX11_ABI
+#ifdef __glibcxx_syncbuf // C++ >= 20 && HOSTED && CXX11ABI
   template
 class __syncbuf_base : public basic_streambuf<_CharT, _Traits>
 {
@@ -869,8 +870,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __os.flush();
   return __os;
 }
-
-#endif // C++20
+#endif // __glibcxx_syncbuf
 
 #endif // C++11
 
diff --git a/libstdc++-v3/include/std/thread b/libstdc++-v3/include/std/thread
index 2c049edcbd6..28582c9df5c 100644
--- a/libstdc++-v3/include/std/thread
+++ b/libstdc++-v3/include/std/thread
@@ -106,7 +106,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
   /// @}
 
-#ifdef __cpp_lib_jthread
+#ifdef __cpp_lib_jthread // C++ >= 20
 
   /// @cond undocumented
 #ifndef __STRICT_ANSI__
-- 
2.41.0

[committed] libstdc++: Fix -Wunused-but-set-variable in std::format_to test

2023-08-24 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* testsuite/std/format/functions/format_to.cc: Avoid warning for
unused variables.
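
As a rough illustration of the pattern (not the test itself), a result that is stored only to exercise the call can be marked [[maybe_unused]] so that unused-variable diagnostics stay quiet. The helper name append_copy is hypothetical, and std::copy stands in for std::format_to so the sketch does not depend on <format>.

```cpp
#include <algorithm>
#include <iterator>
#include <string>

// Hypothetical helper: appends src to dst.  The iterator returned by
// std::copy is kept in a variable that is never read afterwards, so it
// is annotated to document that this is intentional.
std::string append_copy(std::string dst, const std::string& src)
{
  [[maybe_unused]] auto res
    = std::copy(src.begin(), src.end(), std::back_inserter(dst));
  return dst;
}
```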
---
 libstdc++-v3/testsuite/std/format/functions/format_to.cc | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/testsuite/std/format/functions/format_to.cc 
b/libstdc++-v3/testsuite/std/format/functions/format_to.cc
index a35568954e2..c5c3c503625 100644
--- a/libstdc++-v3/testsuite/std/format/functions/format_to.cc
+++ b/libstdc++-v3/testsuite/std/format/functions/format_to.cc
@@ -69,14 +69,14 @@ test_move_only()
 {
   std::string str;
   move_only_iterator mo(std::back_inserter(str));
-  auto res = std::format_to(std::move(mo), "for{:.3} that{:c}",
-"matte", (int)'!');
+  [[maybe_unused]] auto res
+= std::format_to(std::move(mo), "for{:.3} that{:c}", "matte", (int)'!');
   VERIFY( str == "format that!" );
 
   std::vector vec;
   move_only_iterator wmo(std::back_inserter(vec));
-  auto wres = std::format_to(std::move(wmo), L"for{:.3} hat{:c}",
-  L"matte", (long)L'!');
+  [[maybe_unused]] auto wres
+= std::format_to(std::move(wmo), L"for{:.3} hat{:c}", L"matte", 
(long)L'!');
   VERIFY( std::wstring_view(vec.data(), vec.size()) == L"format hat!" );
 }
 
-- 
2.41.0

[committed] libstdc++: Implement new SI prefixes in <ratio> for C++23 (P2734R0)

2023-08-24 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

This is a no-op for libstdc++, because our intmax_t is a 64-bit type and
so is incapable of representing the largest and smallest ratios from
C++11, let alone the new ones. I've added them to the file anyway (and
defined the feature test macro) so that if somebody ports libstdc++ to a
target with 128-bit intmax_t then they'll be present.
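
The size limit can be checked directly. This is a small sketch, assuming nothing beyond standard headers; the names max_pow10_exponent and can_represent_zetta are invented for the example.

```cpp
#include <cstdint>
#include <limits>
#include <ratio>

// Largest n such that 10^n fits in intmax_t (needs C++14 constexpr).
constexpr int max_pow10_exponent()
{
  std::intmax_t v = 1;
  int e = 0;
  while (v <= std::numeric_limits<std::intmax_t>::max() / 10)
    {
      v *= 10;
      ++e;
    }
  return e;
}

// exa (10^18) always fits, since intmax_t has at least 64 bits.
static_assert(std::exa::num == 1000000000000000000LL, "exa is 10^18");
static_assert(max_pow10_exponent() >= 18, "intmax_t has at least 64 bits");

// zetta (10^21) and the larger prefixes need a wider intmax_t.
constexpr bool can_represent_zetta = max_pow10_exponent() >= 21;
```

With a 64-bit intmax_t the function yields 18, which is why std::zetta and the new prefixes are not defined there.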

libstdc++-v3/ChangeLog:

* include/bits/version.def (__cpp_lib_ratio): Define.
* include/bits/version.h: Regenerate.
* include/std/ratio (quecto, ronto, yocto, zepto)
(zetta, yotta, ronna, quetta): Define.
* testsuite/20_util/ratio/operations/ops_overflow_neg.cc: Adjust
dg-error line numbers.
---
 libstdc++-v3/include/bits/version.def |  8 +++
 libstdc++-v3/include/bits/version.h   | 11 
 libstdc++-v3/include/std/ratio| 56 +--
 .../ratio/operations/ops_overflow_neg.cc  |  6 +-
 4 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/libstdc++-v3/include/bits/version.def 
b/libstdc++-v3/include/bits/version.def
index b50050440d9..80c13d4a447 100644
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -1582,6 +1582,14 @@ ftms = {
   };
 };
 
+ftms = {
+  name = ratio;
+  values = {
+v = 202306;
+cxxmin = 26;
+  };
+};
+
 ftms = {
   name = to_string;
   values = {
diff --git a/libstdc++-v3/include/bits/version.h 
b/libstdc++-v3/include/bits/version.h
index 8b8c70a6e53..5bddb4b8adc 100644
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -1940,6 +1940,17 @@
 #undef __glibcxx_want_string_resize_and_overwrite
 
 // from version.def line 1586
+#if !defined(__cpp_lib_ratio)
+# if (__cplusplus >  202302L)
+#  define __glibcxx_ratio 202306L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_ratio)
+#   define __cpp_lib_ratio 202306L
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_ratio) && defined(__glibcxx_want_ratio) */
+#undef __glibcxx_want_ratio
+
+// from version.def line 1594
 #if !defined(__cpp_lib_to_string)
 # if (__cplusplus >  202302L) && _GLIBCXX_HOSTED && (__glibcxx_to_chars)
 #  define __glibcxx_to_string 202306L
diff --git a/libstdc++-v3/include/std/ratio b/libstdc++-v3/include/std/ratio
index 1d285bf916f..c87f54fe1a2 100644
--- a/libstdc++-v3/include/std/ratio
+++ b/libstdc++-v3/include/std/ratio
@@ -39,6 +39,9 @@
 #include 
 #include  // intmax_t, uintmax_t
 
+#define __glibcxx_want_ratio
+#include 
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -602,23 +605,42 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 using ratio_subtract = typename __ratio_subtract<_R1, _R2>::type;
 
-
-  typedef ratio<1, 1000000000000000000> atto;
-  typedef ratio<1,    1000000000000000> femto;
-  typedef ratio<1,       1000000000000> pico;
-  typedef ratio<1,          1000000000> nano;
-  typedef ratio<1,             1000000> micro;
-  typedef ratio<1,                1000> milli;
-  typedef ratio<1,                 100> centi;
-  typedef ratio<1,                  10> deci;
-  typedef ratio<                 10, 1> deca;
-  typedef ratio<                100, 1> hecto;
-  typedef ratio<               1000, 1> kilo;
-  typedef ratio<            1000000, 1> mega;
-  typedef ratio<         1000000000, 1> giga;
-  typedef ratio<      1000000000000, 1> tera;
-  typedef ratio<   1000000000000000, 1> peta;
-  typedef ratio<1000000000000000000, 1> exa;
+#if __INTMAX_WIDTH__ >= 96
+# if __cpp_lib_ratio >= 202306L
+#  if __INTMAX_WIDTH__ >= 128
+  using quecto = ratio<1, 1000000000000000000000000000000>;
+#  endif
+  using ronto  = ratio<1,    1000000000000000000000000000>;
+# endif
+  using yocto  = ratio<1,       1000000000000000000000000>;
+  using zepto  = ratio<1,          1000000000000000000000>;
+#endif
+  using atto   = ratio<1,             1000000000000000000>;
+  using femto  = ratio<1,                1000000000000000>;
+  using pico   = ratio<1,                   1000000000000>;
+  using nano   = ratio<1,                      1000000000>;
+  using micro  = ratio<1,                         1000000>;
+  using milli  = ratio<1,                            1000>;
+  using centi  = ratio<1,                             100>;
+  using deci   = ratio<1,                              10>;
+  using deca   = ratio<                             10, 1>;
+  using hecto  = ratio<                            100, 1>;
+  using kilo   = ratio<                           1000, 1>;
+  using mega   = ratio<                        1000000,

Re: [PATCH] tree-optimization/111115 - SLP of masked stores

2023-08-24 Thread Richard Biener via Gcc-patches

On Thu, 24 Aug 2023, Robin Dapp wrote:

> This causes an ICE in
> gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
> (internal compiler error: in get_group_load_store_type, at 
> tree-vect-stmts.cc:2121)
> 
> #include 
> 
> #define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                      \
>   void __attribute__ ((noinline, noclone))                                    \
>   f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x, \
>                                 INDEX_TYPE *restrict index,                   \
>                                 INDEX_TYPE *restrict cond)                    \
>   {                                                                           \
>     for (int i = 0; i < 100; ++i)                                             \
>       {                                                                       \
>         if (cond[i * 2])                                                      \
>           y[i * 2] = x[index[i * 2]] + 1;                                     \
>         if (cond[i * 2 + 1])                                                  \
>           y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                             \
>       }                                                                       \
>   }
> 
> TEST_LOOP (int8_t, int8_t)
> 
> Is there now a mismatch with the LEN_ IFNs somewhere?

Can you open a bugreport, produce a preprocessed testcase and state
the cc1 commandline to debug this in a cross?

Richard.

[committed] libstdc++: Add pretty printer for std::locale

2023-08-24 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk. Maybe worth backporting.

-- >8 --

Print the locale's name, except when it uses the same named C locale for
all categories except one, in which case print something like:
std::locale = "en_GB.UTF-8" with "LC_CTYPE=en_US.UTF-8"
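
The naming rule can be modelled outside GDB. The sketch below is a C++ re-expression of that rule for illustration only; the committed printer is the Python class in the patch. The names category_names and describe_locale are hypothetical, and an empty list is used here to stand in for an unnamed locale.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <utility>
#include <vector>

// (category, locale name) pairs, e.g. {"LC_CTYPE", "en_US.UTF-8"}.
using category_names = std::vector<std::pair<std::string, std::string>>;

std::string describe_locale(const category_names& cats)
{
  if (cats.empty())                       // model of an unnamed locale
    return "std::locale = \"*\"";

  std::set<std::string> uniq;
  std::string joined;                     // "CAT=name;CAT=name;..."
  for (const auto& c : cats)
    {
      uniq.insert(c.second);
      if (!joined.empty())
        joined += ';';
      joined += c.first + '=' + c.second;
    }

  if (uniq.size() == 1)                   // same name everywhere
    return "std::locale = \"" + cats.front().second + "\"";

  if (uniq.size() == 2)
    {
      const std::string& n1 = *uniq.begin();
      const std::string& n2 = *uniq.rbegin();
      auto count = [&](const std::string& n) {
        return std::count_if(cats.begin(), cats.end(),
                             [&](const auto& c) { return c.second == n; });
      };
      const std::string* name = nullptr;
      const std::string* other = nullptr; // name used by exactly one category
      if (count(n1) == 1)
        { name = &n2; other = &n1; }
      else if (count(n2) == 1)
        { name = &n1; other = &n2; }
      if (other)
        {
          auto it = std::find_if(cats.begin(), cats.end(),
                                 [&](const auto& c)
                                 { return c.second == *other; });
          return "std::locale = \"" + *name + "\" with \""
                 + it->first + '=' + *other + "\"";
        }
    }
  return "std::locale = \"" + joined + "\"";  // fall back to the full list
}
```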

libstdc++-v3/ChangeLog:

* python/libstdcxx/v6/printers.py (StdLocalePrinter): New
printer class.
* testsuite/libstdc++-prettyprinters/locale.cc: New test.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py  | 45 +++
 .../libstdc++-prettyprinters/locale.cc| 36 +++
 2 files changed, 81 insertions(+)
 create mode 100644 libstdc++-v3/testsuite/libstdc++-prettyprinters/locale.cc

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 1a8017adb22..37a447b514b 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -2131,6 +2131,50 @@ class StdChronoTimeZoneRulePrinter:
 return 'time_zone rule {} from {} to {} starting on {}'.format(
 self.val['name'], self.val['from'], self.val['to'], start)
 
+class StdLocalePrinter:
+"Print a std::locale"
+
+def __init__(self, typename, val):
+self.val = val
+self.typename = typename
+
+def to_string(self):
+names = self.val['_M_impl']['_M_names']
+mod = ''
+if names[0] == 0:
+name = '*'
+else:
+cats = gdb.parse_and_eval(self.typename + '::_S_categories')
+ncat = gdb.parse_and_eval(self.typename + '::_S_categories_size')
+n = names[0].string();
+cat = cats[0].string()
+name = '{}={}'.format(cat, n)
+cat_names = {cat: n}
+i = 1
+while i < ncat and names[i] != 0:
+n = names[i].string()
+cat = cats[i].string()
+name = '{};{}={}'.format(name, cat, n)
+cat_names[cat] = n
+i = i + 1
+uniq_names = set(cat_names.values())
+if len(uniq_names) == 1:
+name = n
+elif len(uniq_names) == 2:
+n1, n2 = (uniq_names)
+name_list = list(cat_names.values())
+other = None
+if name_list.count(n1) == 1:
+name = n2
+other = n1
+elif name_list.count(n2) == 1:
+name = n1
+other = n2
+if other is not None:
+cat = next(c for c,n in cat_names.items() if n == other)
+mod = ' with "{}={}"'.format(cat, other)
+return 'std::locale = "{}"{}'.format(name, mod)
+
 
 # A "regular expression" printer which conforms to the
 # "SubPrettyPrinter" protocol from gdb.printing.
@@ -2585,6 +2629,7 @@ def build_libstdcxx_dictionary ():
 libstdcxx_printer.add_version('std::', 'unique_ptr', UniquePointerPrinter)
 libstdcxx_printer.add_container('std::', 'vector', StdVectorPrinter)
 # vector
+libstdcxx_printer.add_version('std::', 'locale', StdLocalePrinter)
 
 if hasattr(gdb.Value, 'dynamic_type'):
 libstdcxx_printer.add_version('std::', 'error_code',
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/locale.cc 
b/libstdc++-v3/testsuite/libstdc++-prettyprinters/locale.cc
new file mode 100644
index 00000000000..66d42f99432
--- /dev/null
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/locale.cc
@@ -0,0 +1,36 @@
+// { dg-do run }
+// { dg-options "-g -O0" }
+// { dg-require-namedlocale "fr_FR.ISO8859-15" }
+// { dg-require-namedlocale "de_DE.ISO8859-15" }
+// { dg-require-namedlocale "en_US.ISO8859-1" }
+
+#include 
+#include  // for ISO_8859 macro
+
+int main()
+{
+  std::locale l1 = std::locale::classic();
+// { dg-final { note-test l1 {std::locale = "C"} } }
+
+  std::locale l2(ISO_8859(15,fr_FR));
+// { dg-final { regexp-test l2 {std::locale = "fr_FR.ISO8859-15(@euro)?"} } }
+
+  std::locale l3(l2, ISO_8859(15,de_DE), std::locale::time);
+// { dg-final { regexp-test l3 {std::locale = "fr_FR.ISO8859-15(@euro)?" with 
"LC_TIME=de_DE.ISO8859-15(@euro)?"} } }
+
+  std::locale l4(l3, ISO_8859(1,en_US), std::locale::monetary);
+// We don't know which order the categories will occur in the string,
+// so test three times, checking for the required substring each time:
+// { dg-final { regexp-test l4 {std::locale = 
"(.*;)?LC_CTYPE=fr_FR.ISO8859-15(@euro)?(;.*)?"} } }
+  std::locale l5 = l4;
+// { dg-final { regexp-test l5 {std::locale = 
"(.*;)?LC_TIME=de_DE.ISO8859-15(@euro)?(;.*)?"} } }
+  std::locale l6 = l5;
+// { dg-final { regexp-test l6 {std::locale = 
"(.*;)?LC_MONETARY=en_US.ISO8859-1(;.*)?"} } }
+
+  std::locale l7(l1, &std::use_facet >(l1));
+// { dg-final { regexp-test l7 {std::locale = "\*"} } }
+
+  return 0;// Mark SPOT
+}
+
+// { dg-final { gdb-test SPOT } }
-- 
2.41.0

[PATCH] Fix confusion about load_p in vect_build_slp_tree_1

2023-08-24 Thread Richard Biener via Gcc-patches

load_p is set and used to record whether the stmt is a memory operation,
not whether it is only a load.  The following renames it to ldst_p
to avoid this confusion.  It also replaces checking for a VUSE
with checking STMT_VINFO_DATA_REF since VUSE checking doesn't
work for pattern matched stores where no virtual operands are
present.  Where we want to distinguish between loads and stores
we then check DR_IS_READ/WRITE.
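
The distinction can be pictured with a toy model. This is only a sketch: data_ref, stmt_info, access_kind and classify are hypothetical stand-ins and are not the vectorizer's actual types or functions.

```cpp
// Hypothetical stand-ins for the vectorizer's data structures.
struct data_ref { bool is_read; };
struct stmt_info { const data_ref* dr; };  // dr == nullptr: no memory access

enum class access_kind { none, load, store };

// First ask "is this a memory operation at all?" (the ldst_p question),
// and only then tell loads from stores via the data reference.
access_kind classify(const stmt_info& s)
{
  const bool ldst_p = s.dr != nullptr;
  if (!ldst_p)
    return access_kind::none;
  return s.dr->is_read ? access_kind::load : access_kind::store;
}
```

Keeping the two questions separate is what avoids treating a store as "not a load, therefore not a memory operation".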

I've made a classification mistake with .MASK_STORE support and
this hits other complications when dealing with single-lane SLP.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-vect-slp.cc (vect_build_slp_tree_1): Rename
load_p to ldst_p, fix mistakes and rely on
STMT_VINFO_DATA_REF.
---
 gcc/tree-vect-slp.cc | 42 --
 1 file changed, 24 insertions(+), 18 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 0b1c2233017..0cf6e02285e 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -984,7 +984,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
   bool need_same_oprnds = false;
   tree vectype = NULL_TREE, first_op1 = NULL_TREE;
   stmt_vec_info first_load = NULL, prev_first_load = NULL;
-  bool first_stmt_load_p = false, load_p = false;
+  bool first_stmt_ldst_p = false, ldst_p = false;
   bool first_stmt_phi_p = false, phi_p = false;
   bool maybe_soft_fail = false;
   tree soft_fail_nunits_vectype = NULL_TREE;
@@ -1074,9 +1074,12 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  if (cfn == CFN_MASK_LOAD
  || cfn == CFN_GATHER_LOAD
  || cfn == CFN_MASK_GATHER_LOAD)
-   load_p = true;
+   ldst_p = true;
  else if (cfn == CFN_MASK_STORE)
-   rhs_code = CFN_MASK_STORE;
+   {
+ ldst_p = true;
+ rhs_code = CFN_MASK_STORE;
+   }
  else if ((internal_fn_p (cfn)
&& !vectorizable_internal_fn_p (as_internal_fn (cfn)))
   || gimple_call_tail_p (call_stmt)
@@ -1102,7 +1105,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   else
{
  rhs_code = gimple_assign_rhs_code (stmt);
- load_p = gimple_vuse (stmt);
+ ldst_p = STMT_VINFO_DATA_REF (stmt_info) != nullptr;
}
 
   /* Check the operation.  */
@@ -1110,7 +1113,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
{
  *node_vectype = vectype;
  first_stmt_code = rhs_code;
- first_stmt_load_p = load_p;
+ first_stmt_ldst_p = ldst_p;
  first_stmt_phi_p = phi_p;
 
  /* Shift arguments should be equal in all the packed stmts for a
@@ -1144,7 +1147,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   need_same_oprnds = true;
   first_op1 = gimple_assign_rhs2 (stmt);
 }
- else if (!load_p
+ else if (!ldst_p
   && rhs_code == BIT_FIELD_REF)
{
  tree vec = TREE_OPERAND (gimple_assign_rhs1 (stmt), 0);
@@ -1207,7 +1210,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
|| rhs_code == INDIRECT_REF
|| rhs_code == COMPONENT_REF
|| rhs_code == MEM_REF)))
- || first_stmt_load_p != load_p
+ || first_stmt_ldst_p != ldst_p
  || first_stmt_phi_p != phi_p)
{
  if (dump_enabled_p ())
@@ -1222,7 +1225,7 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
  continue;
}
 
- if (!load_p
+ if (!ldst_p
  && first_stmt_code == BIT_FIELD_REF
  && (TREE_OPERAND (gimple_assign_rhs1 (first_stmt_info->stmt), 0)
  != TREE_OPERAND (gimple_assign_rhs1 (stmt_info->stmt), 0)))
@@ -1291,12 +1294,13 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   /* Grouped store or load.  */
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
{
- if (!load_p)
+ gcc_assert (ldst_p);
+ if (DR_IS_WRITE (STMT_VINFO_DATA_REF (stmt_info)))
{
  /* Store.  */
  gcc_assert (rhs_code == CFN_MASK_STORE
- || REFERENCE_CLASS_P (lhs));
- ;
+ || REFERENCE_CLASS_P (lhs)
+ || DECL_P (lhs));
}
  else
{
@@ -1321,10 +1325,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char 
*swap,
   else
 prev_first_load = first_load;
}
-} /* Grouped access.  */
-  else
+   }
+  /* Non-grouped store or load.  */
+  else if (ldst_p)
{
- if (load_p
+ if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
  && rhs_code != CFN_GATHER_LOAD
  && rhs_code != CFN_MASK_GA

[committed] libstdc++: Declutter std::optional and std::variant pretty printers [PR110944]

2023-08-24 Thread Jonathan Wakely via Gcc-patches

Tested x86_64-linux. Pushed to trunk.

-- >8 --

As the PR says, including the template arguments in the GDB output of
these class templates can result in very long names, especially for
std::variant. You can use 'whatis' or other GDB commands to get details
of the type, we don't need to include it in the value.

We could consider including the type if it's not too long, but I think
consistency is better (and we already omit the template arguments for
std::vector and other class templates).

libstdc++-v3/ChangeLog:

PR libstdc++/110944
* python/libstdcxx/v6/printers.py (StdExpOptionalPrinter): Do
not show template arguments.
(StdVariantPrinter): Likewise.
* testsuite/libstdc++-prettyprinters/compat.cc: Adjust expected
output.
* testsuite/libstdc++-prettyprinters/cxx17.cc: Likewise.
* testsuite/libstdc++-prettyprinters/libfundts.cc: Likewise.
---
 libstdc++-v3/python/libstdcxx/v6/printers.py  |  3 +--
 .../libstdc++-prettyprinters/compat.cc|  8 +++
 .../libstdc++-prettyprinters/cxx17.cc | 22 +--
 .../libstdc++-prettyprinters/libfundts.cc | 12 +-
 4 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py 
b/libstdc++-v3/python/libstdcxx/v6/printers.py
index 0187c4b60e6..1a8017adb22 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -1343,7 +1343,7 @@ class StdExpOptionalPrinter(SingleObjContainerPrinter):
 def __init__ (self, typename, val):
 valtype = self._recognize (val.type.template_argument(0))
 typename = strip_versioned_namespace(typename)
-self.typename = 
re.sub('^std::(experimental::|)(fundamentals_v\d::|)(.*)', r'std::\1\3<%s>' % 
valtype, typename, 1)
+self.typename = 
re.sub('^std::(experimental::|)(fundamentals_v\d::|)(.*)', r'std::\1\3', 
typename, 1)
 payload = val['_M_payload']
 if self.typename.startswith('std::experimental'):
 engaged = val['_M_engaged']
@@ -1375,7 +1375,6 @@ class StdVariantPrinter(SingleObjContainerPrinter):
 def __init__(self, typename, val):
 alternatives = get_template_arg_list(val.type)
 self.typename = strip_versioned_namespace(typename)
-self.typename = "%s<%s>" % (self.typename, ', 
'.join([self._recognize(alt) for alt in alternatives]))
 self.index = val['_M_index']
 if self.index >= len(alternatives):
 self.contained_type = None
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/compat.cc 
b/libstdc++-v3/testsuite/libstdc++-prettyprinters/compat.cc
index 34022cf1459..acc20a30d8e 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/compat.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/compat.cc
@@ -102,13 +102,13 @@ main()
   using std::optional;
 
   optional o;
-// { dg-final { note-test o {std::optional [no contained value]} } }
+// { dg-final { note-test o {std::optional [no contained value]} } }
   optional ob{false};
-// { dg-final { note-test ob {std::optional = {[contained value] = 
false}} } }
+// { dg-final { note-test ob {std::optional = {[contained value] = false}} } }
   optional oi{5};
-// { dg-final { note-test oi {std::optional = {[contained value] = 5}} } }
+// { dg-final { note-test oi {std::optional = {[contained value] = 5}} } }
   optional op{nullptr};
-// { dg-final { note-test op {std::optional = {[contained value] = 
0x0}} } }
+// { dg-final { note-test op {std::optional = {[contained value] = 0x0}} } }
 
   __builtin_puts("");
   return 0;// Mark SPOT
diff --git a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc 
b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
index 3962a5e9b7e..eb8dc957a43 100644
--- a/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
+++ b/libstdc++-v3/testsuite/libstdc++-prettyprinters/cxx17.cc
@@ -50,18 +50,18 @@ main()
 // { dg-final { note-test str "\"string\"" } }
 
   optional o;
-// { dg-final { note-test o {std::optional [no contained value]} } }
+// { dg-final { note-test o {std::optional [no contained value]} } }
   optional ob{false};
-// { dg-final { note-test ob {std::optional = {[contained value] = 
false}} } }
+// { dg-final { note-test ob {std::optional = {[contained value] = false}} } }
   optional oi{5};
-// { dg-final { note-test oi {std::optional = {[contained value] = 5}} } }
+// { dg-final { note-test oi {std::optional = {[contained value] = 5}} } }
   optional op{nullptr};
-// { dg-final { note-test op {std::optional = {[contained value] = 
0x0}} } }
+// { dg-final { note-test op {std::optional = {[contained value] = 0x0}} } }
   optional> om;
   om = std::map{ {1, 2.}, {3, 4.}, {5, 6.} };
-// { dg-final { regexp-test om {std::optional> containing std::(__debug::)?map with 3 elements = {\[1\] = 2, \[3\] = 
4, \[5\] = 6}} } }
+// { dg-final { regexp-test om {std::optional containing std:

Re: [PATCH] tree-optimization/111115 - SLP of masked stores

2023-08-24 Thread Robin Dapp via Gcc-patches

This causes an ICE in
gcc.target/riscv/rvv/autovec/gather-scatter/mask_gather_load-11.c
(internal compiler error: in get_group_load_store_type, at 
tree-vect-stmts.cc:2121)

#include 

#define TEST_LOOP(DATA_TYPE, INDEX_TYPE)                                      \
  void __attribute__ ((noinline, noclone))                                    \
  f_##DATA_TYPE##_##INDEX_TYPE (DATA_TYPE *restrict y, DATA_TYPE *restrict x, \
                                INDEX_TYPE *restrict index,                   \
                                INDEX_TYPE *restrict cond)                    \
  {                                                                           \
    for (int i = 0; i < 100; ++i)                                             \
      {                                                                       \
        if (cond[i * 2])                                                      \
          y[i * 2] = x[index[i * 2]] + 1;                                     \
        if (cond[i * 2 + 1])                                                  \
          y[i * 2 + 1] = x[index[i * 2 + 1]] + 2;                             \
      }                                                                       \
  }

TEST_LOOP (int8_t, int8_t)

Is there now a mismatch with the LEN_ IFNs somewhere?

Regards
 Robin

Re: [PATCH V2 2/5] OpenMP: C front end support for imperfectly-nested loops

2023-08-24 Thread Jakub Jelinek via Gcc-patches

On Tue, Aug 22, 2023 at 12:53:19PM -0600, Sandra Loosemore wrote:
> > All these c-c++-common testsuite changes will now FAIL after the C patch but
> > before the C++.  It is nice to have the new c-c++-common tests in a separate
> > patch, but these tweaks which can't be just avoided need the temporary
> > { target c } vs. { target c++} hacks undone later in the C++ patch.
> 
> In spite of being in the c-c++-common subdirectory, this particular testcase
> is presently run only for C:
> 
> /* { dg-skip-if "not yet" { c++ } } */

Oh, sorry, I didn't think of such a possibility; it is just weird.
Please ignore my comments on this topic then.

Jakub

RE: [PATCH] RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS testcases

2023-08-24 Thread Li, Pan2 via Gcc-patches

Committed, thanks Robin.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Robin Dapp via Gcc-patches
Sent: Thursday, August 24, 2023 7:03 PM
To: 钟居哲 ; gcc-patches 
Cc: rdapp@gmail.com; kito.cheng ; kito.cheng 
; Jeff Law 
Subject: Re: [PATCH] RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS 
testcases

OK.

Regards
 Robin

Re: [PATCH] RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS testcases

2023-08-24 Thread Robin Dapp via Gcc-patches

OK.

Regards
 Robin

[PATCH] tree-optimization/111125 - avoid BB vectorization in novector loops

2023-08-24 Thread Richard Biener via Gcc-patches

When a loop is marked with

  #pragma GCC novector

the following makes sure to also skip BB vectorization for contained
blocks.  That avoids gcc.dg/vect/bb-slp-29.c failing on aarch64
because of extra BB vectorization therein.  I'm not specifically
dealing with sub-loops of novector loops; the desired semantics
aren't documented.
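
For illustration, this is the kind of loop the change affects. It is a sketch only: add_groups_of_four is a made-up function, and compilers that do not know the pragma will typically ignore it with a warning.

```cpp
// Adds two arrays four elements at a time.  The unrolled body is
// straight-line code that basic-block SLP could otherwise pick up even
// though loop vectorization was disabled for this loop.
void add_groups_of_four(int* dst, const int* a, const int* b, int groups)
{
#pragma GCC novector
  for (int g = 0; g < groups; ++g)
    {
      const int i = 4 * g;
      dst[i + 0] = a[i + 0] + b[i + 0];
      dst[i + 1] = a[i + 1] + b[i + 1];
      dst[i + 2] = a[i + 2] + b[i + 2];
      dst[i + 3] = a[i + 3] + b[i + 3];
    }
}
```

The results are the same either way; the pragma only influences which optimizations are attempted.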

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

PR tree-optimization/111125
* tree-vect-slp.cc (vect_slp_function): Split at novector
loop entry, do not push blocks in novector loops.
---
 gcc/tree-vect-slp.cc | 41 +
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index ace0ff3ef60..0b1c2233017 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -7802,6 +7802,17 @@ vect_slp_function (function *fun)
 bbs[0]->loop_father->num, bb->index);
  split = true;
}
+  else if (!bbs.is_empty ()
+  && bb->loop_father->header == bb
+  && bb->loop_father->dont_vectorize)
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"splitting region at dont-vectorize loop %d "
+"entry at bb%d\n",
+bb->loop_father->num, bb->index);
+ split = true;
+   }
 
   if (split && !bbs.is_empty ())
{
@@ -7809,19 +7820,25 @@ vect_slp_function (function *fun)
  bbs.truncate (0);
}
 
-  /* We need to be able to insert at the head of the region which
-we cannot for region starting with a returns-twice call.  */
   if (bbs.is_empty ())
-   if (gcall *first = safe_dyn_cast  (first_stmt (bb)))
- if (gimple_call_flags (first) & ECF_RETURNS_TWICE)
-   {
- if (dump_enabled_p ())
-   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-"skipping bb%d as start of region as it "
-"starts with returns-twice call\n",
-bb->index);
- continue;
-   }
+   {
+ /* We need to be able to insert at the head of the region which
+we cannot for region starting with a returns-twice call.  */
+ if (gcall *first = safe_dyn_cast  (first_stmt (bb)))
+   if (gimple_call_flags (first) & ECF_RETURNS_TWICE)
+ {
+   if (dump_enabled_p ())
+ dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+  "skipping bb%d as start of region as it "
+  "starts with returns-twice call\n",
+  bb->index);
+   continue;
+ }
+ /* If the loop this BB belongs to is marked as not to be vectorized
+honor that also for BB vectorization.  */
+ if (bb->loop_father->dont_vectorize)
+   continue;
+   }
 
   bbs.safe_push (bb);
 
-- 
2.35.3

[PATCH] RISC-V: Add conditional autovec convert(INT<->FP) patterns

2023-08-24 Thread Lehua Ding

Hi,

This patch adds conditional autovec conversions between INT and FP
by combining the convert and vcond_mask patterns.
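
As a rough sketch of the source shape involved (an assumption modelled on the test names, not copied from the patch), a conversion selected per element by a condition looks like this. The function name cond_float_to_int is hypothetical, and whether a given compiler actually emits a masked conversion depends on the target and options.

```cpp
#include <cstdint>

// Converts a[i] to an integer only where cond[i] is nonzero; all other
// elements take the value from b[i] instead.
void cond_float_to_int(std::int32_t* r, const float* a,
                       const std::int32_t* b, const std::int32_t* cond,
                       int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = cond[i] ? static_cast<std::int32_t>(a[i]) : b[i];
}
```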

Best,
Lehua

gcc/ChangeLog:

* config/riscv/autovec-opt.md (*cond_):
New combine pattern.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_): Ditto.
(*cond_2): Ditto.
* config/riscv/autovec.md (2): Adjust.
(2): Ditto.
(2): Ditto.
(2): Ditto.
(2): Ditto.
(2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c: 
New test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c: New 
test.

---
 gcc/config/riscv/autovec-opt.md   | 108 ++
 gcc/config/riscv/autovec.md   |  42 +--
 .../autovec/cond/cond_convert_float2int-1.h   |  51 +
 .../autovec/cond/cond_convert_float2int-2.h   |  50 
 .../cond/cond_convert_float2int-rv32-1.c  |  15 +++
 .../cond/cond_convert_float2int-rv32-2.c  |  15 +++
 .../cond/cond_convert_float2int-rv64-1.c  |  15 +++
 .../cond/cond_convert_float2int-rv64-2.c  |  15 +++
 .../cond/cond_convert_float2int_run-1.c   |  32 ++
 .../cond/cond_convert_float2int_run-2.c   |  31 +
 .../autovec/cond/cond_convert_int2float-1.h   |  45 
 .../autovec/cond/cond_convert_int2float-2.h   |  44 +++
 .../cond/cond_convert_int2float-rv32-1.c  |  13 +++
 .../cond/cond_convert_int2float-rv32-2.c  |  13 +++
 .../cond/cond_convert_int2float-rv64-1.c  |  13 +++
 .../cond/cond_convert_int2float-rv64-2.c  |  13 +++
 .../cond/cond_convert_int2float_run-1.c   |  32 ++
 .../cond/cond_convert_int2float_run-2.c   |  31 +
 18 files changed, 566 insertions(+), 12 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_float2int_run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-1.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-2.h
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv32-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float-rv64-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_convert_int2float_run-2.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 8f9a6317592..2797207e653 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -827,3 +827,111 @@
   riscv_vector::emit_vlmax_masked_fp_insn (icode, riscv_vector::RVV_UNOP_M, 
operands);
   DONE;
 })
+
+;; Combine convert(FP->INT) + vcond_mask
+(de

Re: [PATCH] RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS testcases

2023-08-24 Thread 钟居哲

Ping.

Middle-end patch:
[PATCH V2] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple 
fold (gnu.org)
has been approved and supported.

This patch has been pending for 8 days.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-08-16 21:20
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS testcases
This patch depends on the middle-end patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html
 
We already had COND_LEN_FNMA/COND_LEN_FMS/COND_FNMS patterns.
 
Remove TARGET_PREFERRED_ELSE_VALUE since it forbids the 
COND_LEN_FMS/COND_LEN_FNMS STMT fold.
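
For orientation, here is a minimal sketch of conditional loops with the FMS and FNMS shapes, assuming the usual GCC naming (FMS is a * b - c, FNMS is -(a * b) - c). The function names are hypothetical and the loops are not taken from the testcases in this patch.

```cpp
// Conditional fused multiply-subtract shape: r = a * b - c where cond is set.
void cond_fms(double* r, const double* a, const double* b,
              const double* c, const int* cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      r[i] = a[i] * b[i] - c[i];
}

// Conditional negated multiply-subtract shape: r = -(a * b) - c.
void cond_fnms(double* r, const double* a, const double* b,
               const double* c, const int* cond, int n)
{
  for (int i = 0; i < n; ++i)
    if (cond[i])
      r[i] = -(a[i] * b[i]) - c[i];
}
```

Elements whose condition is false keep their previous value in r, which is the "else" value the fold has to preserve.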
 
gcc/ChangeLog:
 
* config/riscv/riscv.cc (riscv_preferred_else_value): Remove it since 
it forbids COND_LEN_FMS/COND_LEN_FNMS STMT fold.
(TARGET_PREFERRED_ELSE_VALUE): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adapt test.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-9.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-9.c: New test.
 
---
gcc/config/riscv/riscv.cc | 21 ---
.../rvv/autovec/binop/vadd-rv32gcv-nofm.c |  7 ++-
.../rvv/autovec/binop/vadd-rv64gcv-nofm.c |  7 ++-
.../riscv/rvv/autovec/cond/cond_fadd-1.c  |  3 +--
.../riscv/rvv/autovec/cond/cond_fadd-2.c  |  3 +--
.../riscv/rvv/autovec/cond/cond_fadd-3.c  |  3 +--
.../riscv/rvv/autovec/cond/cond_fadd-4.c  |  3 +--
.../riscv/rvv/autovec/ternop/ternop_nofm-1.c  |  4 +++-
.../riscv/rvv/autovec/ternop/ternop_nofm-10.c |  9 
.../riscv/rvv/autovec/ternop/ternop_nofm-11.c |  9 
.../riscv/rvv/autovec/ternop/ternop_nofm-12.c |  6 ++
.../riscv/rvv/autovec/ternop/ternop_nofm-3.c  |  5 ++---
.../riscv/rvv/autovec/ternop/ternop_nofm-4.c  |  9 
.../riscv/rvv/autovec/ternop/ternop_nofm-5.c  |  9 
.../riscv/rvv/autovec/ternop/ternop_nofm-6.c  |  6 ++
.../riscv/rvv/autovec/ternop/ternop_nofm-7.c  |  9 
.../riscv/rvv/autovec/ternop/ternop_nofm-8.c  |  9 
.../riscv/rvv/autovec/ternop/ternop_nofm-9.c  |  6 ++
.../rvv/autovec/ternop/ternop_nofm_run-10.c   |  4 
.../rvv/autovec/ternop/ternop_nofm_run-11.c   |  4 
.../rvv/autovec/ternop/ternop_nofm_run-12.c   |  4 
.../rvv/autovec/ternop/ternop_nofm_run-4.c|  4 
.../rvv/autovec/ternop/ternop_nofm_run-5.c|  4 
.../rvv/autovec/ternop/ternop_nofm_run-6.c|  4 
.../rvv/autovec/ternop/ternop_nofm_run-7.c|  4 
.../rvv/autovec/ternop/ternop_nofm_run-8.c|  4 
.../rvv/autovec/ternop/ternop_nofm_run-9.c|  4 
27 files changed, 121 insertions(+), 43 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c
create mode 100644 
gcc/testsuite/gcc.target/r

RE: [PATCH V2] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-24 Thread Li, Pan2 via Gcc-patches

Committed, thanks Richard.

Pan

-----Original Message-----
From: Gcc-patches  On Behalf 
Of Richard Sandiford via Gcc-patches
Sent: Thursday, August 24, 2023 6:34 PM
To: Juzhe-Zhong 
Cc: gcc-patches@gcc.gnu.org; rguent...@suse.de
Subject: Re: [PATCH V2] gimple_fold: Support 
COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

Juzhe-Zhong  writes:
> Hi, Richard and Richi.
>
> Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math.
> It's supported in tree-ssa-math-opts.cc. However, GCC failed to support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
>
> Consider this following case:
> #define TEST_TYPE(TYPE)                                                      \
>   __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,         \
>                                               TYPE *__restrict a,           \
>                                               TYPE *__restrict b, int n)    \
>   {                                                                          \
>     for (int i = 0; i < n; i++)                                             \
>       dst[i] -= a[i] * b[i];                                                \
>   }
>
> #define TEST_ALL()                                                           \
>   TEST_TYPE (float)                                                          \
>
> TEST_ALL ()
>
> Gimple IR for RVV:
>
> ...
> _39 = -vect__8.14_26;
> vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, 
> vect__4.8_34, vect__4.8_34, _46, 0);
> ...
>
> This is because this following piece of codes in tree-ssa-math-opts.cc:
>
>   if (len)
>   fma_stmt
> = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
>   addop, else_value, len, bias);
>   else if (cond)
>   fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
>  op2, addop, else_value);
>   else
>   fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
>   gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
>   gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
>  use_stmt));
>   gsi_replace (&gsi, fma_stmt, true);
>   /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
>regardless of where the negation occurs.  */
>   gimple *orig_stmt = gsi_stmt (gsi);
>   if (fold_stmt (&gsi, follow_all_ssa_edges))
>   {
> if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
>   gcc_unreachable ();
> update_stmt (gsi_stmt (gsi));
>   }
>
> 'fold_stmt' failed to fold NEGATE_EXPR + COND_LEN_FMA -> COND_LEN_FNMA.
>
> This patch supports folding the STMT into:
>
> vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, 
> vect__4.8_34, { 0.0, ... }, _46, 0);
>
> Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.
>
> Extend maximum num ops:
> -  static const unsigned int MAX_NUM_OPS = 5;
> +  static const unsigned int MAX_NUM_OPS = 7;
>
> Bootstrap and Regtest on X86 passed.
> Tested on aarch64 Qemu.
>
> Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.
>
>
> gcc/ChangeLog:
>
> * genmatch.cc (decision_tree::gen): Support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
> * gimple-match-exports.cc (gimple_simplify): Ditto.
> (gimple_resimplify6): New function.
> (gimple_resimplify7): New function.
> (gimple_match_op::resimplify): Support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
> (convert_conditional_op): Ditto.
> (build_call_internal): Ditto.
> (try_conditional_simplification): Ditto.
> (gimple_extract): Ditto.
> * gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
> * internal-fn.cc (CASE): Ditto.

OK, thanks.

Richard

>
> ---
>  gcc/genmatch.cc |   2 +-
>  gcc/gimple-match-exports.cc | 123 ++--
>  gcc/gimple-match.h  |  16 -
>  gcc/internal-fn.cc  |   7 +-
>  4 files changed, 138 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f46d2e1520d..a1925a747a7 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -4052,7 +4052,7 @@ decision_tree::gen (vec  &files, bool gimple)
>  }
>fprintf (stderr, "removed %u duplicate tails\n", rcnt);
>  
> -  for (unsigned n = 1; n <= 5; ++n)
> +  for (unsigned n = 1; n <= 7; ++n)
>  {
>bool has_kids_p = false;
>  
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index 7aeb4ddb152..b36027b0bad 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -60,6 +60,12 @@ extern bool gimple_simplify (gimple_match_op *, gimple_seq 
> *, tree (*)(tree),
>

Re: [PATCH V2] gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold

2023-08-24 Thread Richard Sandiford via Gcc-patches

Juzhe-Zhong  writes:
> Hi, Richard and Richi.
>
> Currently, GCC support COND_LEN_FMA for floating-point **NO** -ffast-math.
> It's supported in tree-ssa-math-opts.cc. However, GCC failed to support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.
>
> Consider this following case:
> #define TEST_TYPE(TYPE)                                                      \
>   __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst,         \
>                                               TYPE *__restrict a,           \
>                                               TYPE *__restrict b, int n)    \
>   {                                                                          \
>     for (int i = 0; i < n; i++)                                             \
>       dst[i] -= a[i] * b[i];                                                \
>   }
>
> #define TEST_ALL()                                                           \
>   TEST_TYPE (float)                                                          \
>
> TEST_ALL ()
>
> Gimple IR for RVV:
>
> ...
> _39 = -vect__8.14_26;
> vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, 
> vect__4.8_34, vect__4.8_34, _46, 0);
> ...
>
> This is because this following piece of codes in tree-ssa-math-opts.cc:
>
>   if (len)
>   fma_stmt
> = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
>   addop, else_value, len, bias);
>   else if (cond)
>   fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
>  op2, addop, else_value);
>   else
>   fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
>   gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
>   gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
>  use_stmt));
>   gsi_replace (&gsi, fma_stmt, true);
>   /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
>regardless of where the negation occurs.  */
>   gimple *orig_stmt = gsi_stmt (gsi);
>   if (fold_stmt (&gsi, follow_all_ssa_edges))
>   {
> if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
>   gcc_unreachable ();
> update_stmt (gsi_stmt (gsi));
>   }
>
> 'fold_stmt' failed to fold NEGATE_EXPR + COND_LEN_FMA -> COND_LEN_FNMA.
>
> This patch supports folding the STMT into:
>
> vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, 
> vect__4.8_34, { 0.0, ... }, _46, 0);
>
> Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments.
>
> Extend maximum num ops:
> -  static const unsigned int MAX_NUM_OPS = 5;
> +  static const unsigned int MAX_NUM_OPS = 7;
>
> Bootstrap and Regtest on X86 passed.
> Tested on aarch64 Qemu.
>
> Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.
>
>
> gcc/ChangeLog:
>
> * genmatch.cc (decision_tree::gen): Support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
> * gimple-match-exports.cc (gimple_simplify): Ditto.
> (gimple_resimplify6): New function.
> (gimple_resimplify7): New function.
> (gimple_match_op::resimplify): Support 
> COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
> (convert_conditional_op): Ditto.
> (build_call_internal): Ditto.
> (try_conditional_simplification): Ditto.
> (gimple_extract): Ditto.
> * gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
> * internal-fn.cc (CASE): Ditto.

OK, thanks.

Richard

>
> ---
>  gcc/genmatch.cc |   2 +-
>  gcc/gimple-match-exports.cc | 123 ++--
>  gcc/gimple-match.h  |  16 -
>  gcc/internal-fn.cc  |   7 +-
>  4 files changed, 138 insertions(+), 10 deletions(-)
>
> diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
> index f46d2e1520d..a1925a747a7 100644
> --- a/gcc/genmatch.cc
> +++ b/gcc/genmatch.cc
> @@ -4052,7 +4052,7 @@ decision_tree::gen (vec  &files, bool gimple)
>  }
>fprintf (stderr, "removed %u duplicate tails\n", rcnt);
>  
> -  for (unsigned n = 1; n <= 5; ++n)
> +  for (unsigned n = 1; n <= 7; ++n)
>  {
>bool has_kids_p = false;
>  
> diff --git a/gcc/gimple-match-exports.cc b/gcc/gimple-match-exports.cc
> index 7aeb4ddb152..b36027b0bad 100644
> --- a/gcc/gimple-match-exports.cc
> +++ b/gcc/gimple-match-exports.cc
> @@ -60,6 +60,12 @@ extern bool gimple_simplify (gimple_match_op *, gimple_seq 
> *, tree (*)(tree),
>code_helper, tree, tree, tree, tree, tree);
>  extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
>code_helper, tree, tree, tree, tree, tree, tree);
> +extern bool gimple_simplify (gimple_match_op *, gimple_seq *, tree (*)(tree),
> +  code_helper, tre
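
For readers following the fold described in the quoted message, here is a minimal scalar sketch in C of what the two internal functions compute per lane. It is only a model: the function names `model_cond_len_fma` and `model_cond_len_fnma` are hypothetical, and the lane rule (lanes at or beyond `len + bias`, or with a false condition, take the else value) is an assumption based on my reading of the cond_len_* documentation, not something stated in this thread.

```c
#include <stddef.h>

/* Scalar model of a length-controlled conditional FMA (hypothetical
   helper, not GCC code).  Active lanes (index < len + bias and cond set)
   compute a * b + c; every other lane takes the else value.  */
void
model_cond_len_fma (float *dst, const int *cond, const float *a,
                    const float *b, const float *c, const float *els,
                    size_t nunits, long len, long bias)
{
  for (size_t i = 0; i < nunits; i++)
    dst[i] = ((long) i < len + bias && cond[i]) ? a[i] * b[i] + c[i]
                                                : els[i];
}

/* Same model for FNMA: active lanes compute -(a * b) + c.  */
void
model_cond_len_fnma (float *dst, const int *cond, const float *a,
                     const float *b, const float *c, const float *els,
                     size_t nunits, long len, long bias)
{
  for (size_t i = 0; i < nunits; i++)
    dst[i] = ((long) i < len + bias && cond[i]) ? -(a[i] * b[i]) + c[i]
                                                : els[i];
}
```

Feeding the FMA model a negated multiplicand, as in `_39 = -vect__8.14_26`, produces the same lanes as calling the FNMA model on the un-negated input. That equivalence is what lets `fold_stmt` absorb the NEGATE_EXPR once seven-operand calls can be matched.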

Re: [PATCH] RISC-V: Add conditional sign/zero extension and truncation autovec patterns

2023-08-24 Thread Lehua Ding


On 2023/8/24 18:20, Robin Dapp wrote:
>> Yes, it's better to call it one_quad.
>
> I'd suggest to go with quarter as before or quarter_width_op
> or something.

OK for quarter.

>>> Is this necessary for recognizing a different pattern?
>>
>> Are you saying that the testcases xxx-1 and xxx-2 are duplicated? If
>> so, I have no problem removing it and just keeping xxx-1 testcase
>> since it is still possible to cover my code.
>
> I was just curious why the NEW_TYPE bi = b[i] was necessary instead
> of using b[i] directly.

Oh, I think there is no difference between the two because the type of b 
is NEW_TYPE. So it's okay to remove this statement.


--
Best,
Lehua

Re: [PATCH V2] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-24 Thread Robin Dapp via Gcc-patches

LGTM.

Regards
 Robin

Re: [PATCH] RISC-V: Add conditional sign/zero extension and truncation autovec patterns

2023-08-24 Thread Robin Dapp via Gcc-patches



> Yes, it's better to call it one_quad.

I'd suggest to go with quarter as before or quarter_width_op
or something.

>> Is this necessary for recognizing a different pattern?
> 
> Are you saying that the testcases xxx-1 and xxx-2 are duplicated? If
> so, I have no problem removing it and just keeping xxx-1 testcase
> since it is still possible to cover my code.

I was just curious why the NEW_TYPE bi = b[i] was necessary instead
of using b[i] directly.

Regards
 Robin

Re: [PATCH] RISC-V: Add conditional sign/zero extension and truncation autovec patterns

2023-08-24 Thread Lehua Ding


Hi Robin,

On 2023/8/24 17:59, Robin Dapp wrote:
> Hi Lehua,
>
> thanks, just tiny non-functional nits.
>
>> -  rtx ops[] = {operands[0], quarter};
>> -  icode = code_for_pred_trunc (mode);
>> -  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
>> +  rtx half = gen_reg_rtx (mode);
>
> Not really a half anymore now? :)

Yes, it's better to call it one_quad.

>> +#include 
>> +
>> +#define DEF_LOOP(OLD_TYPE, NEW_TYPE)                                      \
>> +  void __attribute__ ((noipa))                                            \
>> +  test_##OLD_TYPE##_2_##NEW_TYPE (NEW_TYPE *__restrict r,                 \
>> +                                  OLD_TYPE *__restrict a,                 \
>> +                                  NEW_TYPE *__restrict b,                 \
>> +                                  OLD_TYPE *__restrict pred, int n)       \
>> +  {                                                                       \
>> +    for (int i = 0; i < n; ++i)                                           \
>> +      {                                                                   \
>> +        NEW_TYPE bi = b[i];                                               \
>
> Is this necessary for recognizing a different pattern?

Are you saying that the testcases xxx-1 and xxx-2 are duplicated? If so, 
I have no problem removing it and just keeping xxx-1 testcase since it 
is still possible to cover my code.

>> +/* wider-width Integer Type => Integer Type */
>
> Isn't it the other way around or am I just confused?

Yes, I changed it to the following which might be better understood:
  /* INT -> wider-INT */

>> +/* narrower-width Integer Type => Integer Type */
>> +#define TEST_ALL_X2X_NARROWER(T)                                          \
>> +  T (uint16_t, uint8_t)                                                   \
>
> Same here.
>
> Regards
>   Robin



--
Best,
Lehua

[PATCH V2] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-24 Thread Juzhe-Zhong

Consider the following case:
int __attribute__ ((noinline, noclone))
condition_reduction (int *a, int min_v)
{
  int last = 66; /* High start value.  */

  for (int i = 0; i < 4; i++)
if (a[i] < min_v)
  last = i;

  return last;
}

--param=riscv-autovec-preference=fixed-vlmax --param=riscv-autovec-lmul=m8

condition_reduction:
        vsetvli a4,zero,e32,m8,ta,ma
        li      a5,32
        vmv.v.x v8,a1
        vl8re32.v       v0,0(a0)
        vid.v   v16
        vmslt.vv        v0,v0,v8
        vsetvli zero,a5,e8,m2,ta,ma
        vcpop.m a5,v0
        beq     a5,zero,.L2
        addi    a5,a5,-1
        vsetvli a4,zero,e32,m8,ta,ma
        vcompress.vm    v8,v16,v0
        vslidedown.vx   v8,v8,a5
        vmv.x.s a0,v8
        ret
.L2:
        li      a0,66
        ret

--param=riscv-autovec-preference=scalable

condition_reduction:
        csrr    a6,vlenb
        mv      a2,a0
        li      a3,32
        li      a0,66
        srli    a6,a6,2
        vsetvli a4,zero,e32,m1,ta,ma
        vmv.v.x v4,a1
        vid.v   v1
.L4:
        vsetvli a5,a3,e8,mf4,tu,mu
        vsetvli zero,a5,e32,m1,ta,ma    ---> redundant vsetvl
        vle32.v v0,0(a2)
        vsetvli a4,zero,e32,m1,ta,ma
        slli    a1,a5,2
        vmv.v.x v2,a6
        vmslt.vv        v0,v0,v4
        sub     a3,a3,a5
        vmv1r.v v3,v1
        vadd.vv v1,v1,v2
        vsetvli zero,a5,e8,mf4,ta,ma
        vcpop.m a5,v0
        beq     a5,zero,.L3
        addi    a5,a5,-1
        vsetvli a4,zero,e32,m1,ta,ma
        vcompress.vm    v2,v3,v0
        vslidedown.vx   v2,v2,a5
        vmv.x.s a0,v2
.L3:
        sext.w  a0,a0
        add     a2,a2,a1
        bne     a3,zero,.L4
        ret

There is a redundant vsetvli instruction in the VLA-vectorized code, which is 
a VSETVL pass issue.

The vsetvl issue is not addressed in this patch but will be fixed soon.

gcc/ChangeLog:

* config/riscv/autovec.md (len_fold_extract_last_): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(expand_fold_extract_last): New function.
* config/riscv/riscv-v.cc (emit_nonvlmax_slide_insn): Ditto.
(emit_cpop_insn): Ditto.
(emit_nonvlmax_compress_insn): Ditto.
(expand_fold_extract_last): Ditto.
* config/riscv/vector.md: Fix vcpop.m ratio demand.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/reduc/extract_last-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last-9.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-13.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-14.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-3.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-6.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-7.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-8.c: New test.
* gcc.target/riscv/rvv/autovec/reduc/extract_last_run-9.c: New test.

---
 gcc/config/riscv/autovec.md   |  24 
 gcc/config/riscv/riscv-protos.h   |   2 +
 gcc/config/riscv/riscv-v.cc   | 125 +-
 gcc/config/riscv/vector.md|   2 +-
 .../riscv/rvv/autovec/reduc/extract_last-1.c  |  20 +++
 .../riscv/rvv/autovec/reduc/extract_last-10.c |   6 +
 .../riscv/rvv/autovec/reduc/extract_last-11.c |  24 
 .../riscv/rvv/autovec/reduc/extract_last-12.c |   6 +
 .../riscv/rvv/autovec/reduc/extract_last-13.c |   7 +
 .../riscv/rvv/
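
As a reading aid for the assembly in the message above, the following C sketch models the vcpop.m / vcompress.vm / vslidedown.vx / vmv.x.s sequence with plain scalar loops. It is not the patch's implementation: all `model_*` names and the `MODEL_MAX_LANES` bound are hypothetical and exist only for illustration.

```c
#include <stddef.h>

#define MODEL_MAX_LANES 64   /* hypothetical bound for the scratch buffer */

/* Count the active mask lanes (models vcpop.m).  */
size_t
model_cpop (const int *mask, size_t len)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    count += mask[i] != 0;
  return count;
}

/* Pack the active elements of SRC to the front of DST
   (models vcompress.vm).  */
void
model_compress (int *dst, const int *src, const int *mask, size_t len)
{
  size_t out = 0;
  for (size_t i = 0; i < len; i++)
    if (mask[i])
      dst[out++] = src[i];
}

/* Return the last active element of VEC, or ELSE_VAL if no lane is
   active.  Lengths above MODEL_MAX_LANES are clamped in this sketch.  */
int
model_fold_extract_last (int else_val, const int *mask, const int *vec,
                         size_t len)
{
  int packed[MODEL_MAX_LANES];
  if (len > MODEL_MAX_LANES)
    len = MODEL_MAX_LANES;
  size_t active = model_cpop (mask, len);
  if (active == 0)                  /* beq a5,zero,...  */
    return else_val;
  model_compress (packed, vec, mask, len);
  return packed[active - 1];        /* vslidedown.vx by active-1, vmv.x.s */
}

/* The condition_reduction loop expressed through the model.  */
int
model_condition_reduction (const int *a, int min_v)
{
  int mask[4], idx[4];
  for (int i = 0; i < 4; i++)
    {
      idx[i] = i;                   /* vid.v */
      mask[i] = a[i] < min_v;       /* vmslt.vv */
    }
  return model_fold_extract_last (66, mask, idx, 4);
}
```

For `a = {5, 1, 7, 2}` and `min_v = 3`, lanes 1 and 3 are active, so the model returns index 3; when no element is below `min_v` it returns the start value 66, matching the `.L2` path in the fixed-vlmax code.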

Re: [PATCH] RISC-V: Add conditional sign/zero extension and truncation autovec patterns

2023-08-24 Thread Robin Dapp via Gcc-patches

Hi Lehua,

thanks, just tiny non-functional nits.

> -  rtx ops[] = {operands[0], quarter};
> -  icode = code_for_pred_trunc (mode);
> -  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
> +  rtx half = gen_reg_rtx (mode);

Not really a half anymore now? :)

> +#include 
> +
> +#define DEF_LOOP(OLD_TYPE, NEW_TYPE)                                       \
> +  void __attribute__ ((noipa))                                             \
> +  test_##OLD_TYPE##_2_##NEW_TYPE (NEW_TYPE *__restrict r,                  \
> +                                  OLD_TYPE *__restrict a,                  \
> +                                  NEW_TYPE *__restrict b,                  \
> +                                  OLD_TYPE *__restrict pred, int n)        \
> +  {                                                                        \
> +    for (int i = 0; i < n; ++i)                                            \
> +      {                                                                    \
> +        NEW_TYPE bi = b[i];                                                \

Is this necessary for recognizing a different pattern?

> +/* wider-width Integer Type => Integer Type */

Isn't it the other way around or am I just confused?

> +/* narrower-width Integer Type => Integer Type */
> +#define TEST_ALL_X2X_NARROWER(T)                                           \
> +  T (uint16_t, uint8_t)                                                    \

Same here.

Regards
 Robin

Re: [PATCH] RISC-V: Add conditional unary neg/abs/not autovec patterns

2023-08-24 Thread Richard Sandiford via Gcc-patches

Jeff Law  writes:
> On 8/22/23 02:08, juzhe.zh...@rivai.ai wrote:
>> Yes, I agree long-term we want every-thing be optimized as early as 
>> possible.
>> 
>> However, IMHO, it's impossible we can support every conditional patterns 
>> in the middle-end (match.pd).
>> It's a really big number.
>> 
>> For example, for sign_extend conversion, we have vsext.vf2 (vector SI -> 
>> vector DI),... vsext.vf4 (vector HI -> vector DI), vsext.vf8 (vector QI 
>> -> vector DI)..
>> Not only the conversion, every auto-vectorization patterns can have 
>> conditional format.
>> For example, abs,..rotate, sqrt, floor, ceil,etc.
>> I bet it could be over 100+ conditional optabs/internal FNs. It's huge 
>> number.
>> I don't see necessity that we should support them in middle-end 
>> (match.pd) since we known RTL back-end combine PASS can do the good job 
>> here.
>> 
>> Besides, LLVM doesn't such many conditional pattern. LLVM just has "add" 
>> and "select" separate IR then do the combine in the back-end:
>> https://godbolt.org/z/rYcMMG1eT 
>> 
>> You can see LLVM didn't do the op + select optimization in generic IR, 
>> they do the optimization in combine PASS.
>> 
>> So I prefer this patch solution and apply such solution for the future 
>> more support : sign extend, zero extend, float extend, abs, sqrt, ceil, 
>> floor, etc.
> It's certainly got the potential to get out of hand.  And it's not just 
> the vectorizer operations.  I know of an architecture that can execute 
> most of its ALU and loads/stores conditionally (not predication, but 
> actual conditional ops) like target = (x COND y) ? a << b : a
>
> I'd tend to lean towards synthesizing these conditional ops around a 
> conditional move/select primitive in gimple through the RTL expanders. 
> That would in turn set things up so that if the target had various 
> conditional operations like conditional shift it could be trivially 
> discovered by the combiner.

FWIW, one of the original motivations behind the COND_* internal
functions was to represent the fact that the operation is suppressed
(rather than being performed and discarded) when the predicate is false.
This allows if-conversion for FP operations even in strict FP modes,
since inactive lanes are guaranteed not to generate an exception.

I think it makes sense to add COND_* functions for anything that can
reasonably be done on FP types, and that could generate an FP exception.
E.g. sqrt was one of the examples mentioned, and I think COND_SQRT is
something that we should have.

I agree it's less clear-cut for purely integer stuff, or for FP operations
like neg and abs that are pure bit manipulation.  But perhaps there's a
question of how many operations are only defined for integers, and
whether the number is high enough for them to be treated differently.

I wouldn't have expected an explosion of operations to be a significant
issue, since (a) the underlying infrastructure is pretty mechanical and
(b) any operation that a target supports is going to need an .md pattern
whatever happens.

Thanks,
Richard
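
To make the "suppressed rather than performed and discarded" distinction concrete, here is a small C model. It is a sketch under loud assumptions: the exception flag is a plain variable rather than the real `<fenv.h>` state, division is used only because it is easy to model (the thread itself mentions sqrt), and every name (`model_div`, `model_select_after_op`, `model_cond_op`) is hypothetical.

```c
#include <stddef.h>

/* Stand-in for a sticky FP exception flag (assumption: a plain variable
   keeps the sketch self-contained; real code would use <fenv.h>).  */
int model_exception_flag;

/* Hypothetical checked division: records an "exception" on a zero
   divisor.  Real IEEE hardware would return an infinity or NaN; the
   model returns 0 because only the flag matters here.  */
float
model_div (float num, float den)
{
  if (den == 0.0f)
    {
      model_exception_flag = 1;
      return 0.0f;
    }
  return num / den;
}

/* "Perform and discard": the operation runs on every lane and a select
   picks the result afterwards, so inactive lanes can still raise.  */
void
model_select_after_op (float *dst, const int *mask, const float *a,
                       const float *b, const float *els, size_t n)
{
  for (size_t i = 0; i < n; i++)
    {
      float t = model_div (a[i], b[i]);
      dst[i] = mask[i] ? t : els[i];
    }
}

/* "Suppressed": the operation is never executed for inactive lanes,
   which is the guarantee described for the COND_* functions.  */
void
model_cond_op (float *dst, const int *mask, const float *a,
               const float *b, const float *els, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? model_div (a[i], b[i]) : els[i];
}
```

Both functions produce the same values, but only the first sets the flag when an inactive lane has a zero divisor. In this model, that side effect is the reason a plain operation followed by a select is not an acceptable replacement in strict FP modes.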

Re: [PATCH 03/11] aarch64: Use br instead of ret for eh_return

2023-08-24 Thread Richard Sandiford via Gcc-patches

Richard Sandiford  writes:
> Rather than hiding this in target code, perhaps we should add a
> target-independent concept of an "eh_return taken" flag, say
> EH_RETURN_TAKEN_RTX.
>
> We could define it so that, on targets that define EH_RETURN_TAKEN_RTX,
> a register EH_RETURN_STACKADJ_RTX and a register EH_RETURN_HANDLER_RTX
> are only meaningful when the flag is true.  E.g. we could have:
>
> #ifdef EH_RETURN_HANDLER_RTX

Gah, I meant #ifdef EH_RETURN_TAKEN_RTX here

>   for (rtx tmp : { EH_RETURN_STACKADJ_RTX, EH_RETURN_HANDLER_RTX })
> if (tmp && REG_P (tmp))
>   emit_clobber (tmp);
> #endif
>
> in the "normal return" part of expand_eh_return.  (If some other target
> wants a flag with different semantics, it'd be up to them to add it.)
>
> That should avoid most of the bad code-quality effects, since the
> specialness of x4-x6 will be confined to the code immediately before
> the pre-epilogue exit edges.
>
> Thanks,
> Richard

Re: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-24 Thread 钟居哲

>> The use_real_merge just appeared odd to me here because there is
>> nothing to merge.  But in the end it's just to omit the vundef operand
>> so good for now.  There is an increasing number of opportunities to
>> refactor in riscv-v.cc, though ;)

I think we can rename use_real_merge to use_dummy_merge?
When it is true, we add the undef merge operand:

if (!m_use_real_merge_p)
  add_vundef_operand ();

change it into:

if (m_use_dummy_merge_p)
  add_vundef_operand ();

Then we can avoid the confusion.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-24 17:13
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization
Hi Juzhe,
 
> vcpop.m a5,v0
> beq a5,zero,.L3
> addi a5,a5,-1
> vsetvli a4,zero,e32,m1,ta,ma
> vcompress.vm v2,v3,v0
> vslidedown.vx v2,v2,a5
> vmv.x.s a0,v2
> .L3:
> sext.w a0,a0
 
Mhm, where is this sext coming from?  Thought I had this covered with
the autovec-opt pattern but apparently not.  I'll take that, nothing
related to this patch.
 
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -213,7 +213,7 @@ public:
>{
>  /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
> the vsetvli to obtain the value of vlmax.  */
> - poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
> + poly_uint64 nunits = GET_MODE_NUNITS (m_mask_mode);
 
Why is that necessary?  Just for the popcount I presume?
Can't we rather have a new case for a scalar destination?  I find
the code a bit misleading now as we check m_dest_mode and then not
use it.
 
>  
> +/* Emit vcpop.m instruction.  */
> +
> +static void
> +emit_cpop_insn (unsigned icode, rtx *ops, rtx len)
> +{
> +  machine_mode dest_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = GET_MODE (ops[1]);
> +  insn_expander e (RVV_CPOP,
> +   /* HAS_DEST_P */ true,
> +   /* FULLY_UNMASKED_P */ true,
> +   /* USE_REAL_MERGE_P */ true,
> +   /* HAS_AVL_P */ true,
> +   /* VLMAX_P */ len ? false : true,
> +   dest_mode, mask_mode);
> +
> +  e.set_vl (len);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
 
The use_real_merge just appeared odd to me here because there is
nothing to merge.  But in the end it's just to omit the vundef operand
so good for now.  There is an increasing number of opportunities to
refactor in riscv-v.cc, though ;)
 
The rest looks good to me.  Note that my machine crashed when
compiling the extract_last-14.c because it used up all my RAM.
The vsetvl "refactor" phase 3 patch helped, though.
We'd need to have this patch depend on the other one then.
 
The rest looks good to me.  At first I was a bit wary about the
branching zero check after popcount but as we're outside of a loop
anyway, that's fine.  Might want to use a conditional select in the
future but actually not that important. 
 
Regards
Robin

Re: [PATCH] tree-optimization/111115 - SLP of masked stores

2023-08-24 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> The following adds the capability to do SLP on .MASK_STORE, I do not
> plan to add interleaving support.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

LGTM, thanks.

Richard

> Thanks,
> Richard.
>
>   PR tree-optimization/111115
> gcc/
>   * tree-vectorizer.h (vect_slp_child_index_for_operand): New.
>   * tree-vect-data-refs.cc (can_group_stmts_p): Also group
>   .MASK_STORE.
>   * tree-vect-slp.cc (arg3_arg2_map): New.
>   (vect_get_operand_map): Handle IFN_MASK_STORE.
>   (vect_slp_child_index_for_operand): New function.
>   (vect_build_slp_tree_1): Handle statements with no LHS,
>   masked store ifns.
>   (vect_remove_slp_scalar_calls): Likewise.
>   * tree-vect-stmts.c (vect_check_store_rhs): Lookup the
>   SLP child corresponding to the ifn value index.
>   (vectorizable_store): Likewise for the mask index.  Support
>   masked stores.
>   (vectorizable_load): Lookup the SLP child corresponding to the
>   ifn mask index.
>
> gcc/testsuite/
>   * lib/target-supports.exp (check_effective_target_vect_masked_store):
>   Supported with check_avx_available.
>   * gcc.dg/vect/slp-mask-store-1.c: New testcase.
> ---
>  gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c | 39 +
>  gcc/testsuite/lib/target-supports.exp|  3 +-
>  gcc/tree-vect-data-refs.cc   |  3 +-
>  gcc/tree-vect-slp.cc | 46 +---
>  gcc/tree-vect-stmts.cc   | 23 +-
>  gcc/tree-vectorizer.h|  1 +
>  6 files changed, 94 insertions(+), 21 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c 
> b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
> new file mode 100644
> index 000..50b7066778e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
> @@ -0,0 +1,39 @@
> +/* { dg-do run } */
> +/* { dg-additional-options "-mavx2" { target avx2 } } */
> +
> +#include "tree-vect.h"
> +
> +void __attribute__((noipa))
> +foo (unsigned * __restrict x, int * __restrict flag)
> +{
> +  for (int i = 0; i < 32; ++i)
> +{
> +  if (flag[2*i+0])
> +x[2*i+0] = x[2*i+0] + 3;
> +  if (flag[2*i+1])
> +x[2*i+1] = x[2*i+1] + 177;
> +}
> +}
> +
> +unsigned x[16];
> +int flag[32] = { 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0,
> + 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> +unsigned res[16] = { 3, 177, 0, 0, 0, 177, 3, 0, 3, 177, 0, 0, 0, 177, 3, 0 
> };
> +
> +int
> +main ()
> +{
> +  check_vect ();
> +
> +  foo (x, flag);
> +
> +  if (__builtin_memcmp (x, res, sizeof (x)) != 0)
> +abort ();
> +  for (int i = 0; i < 32; ++i)
> +if (flag[i] != 0 && flag[i] != 1)
> +  abort ();
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target { 
> vect_masked_store && vect_masked_load } } } } */
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index d4623ee6b45..d353cc0aaf0 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -8400,7 +8400,8 @@ proc check_effective_target_vect_masked_load { } {
>  # Return 1 if the target supports vector masked stores.
>  
>  proc check_effective_target_vect_masked_store { } {
> -return [expr { [check_effective_target_aarch64_sve]
> +return [expr { [check_avx_available]
> +|| [check_effective_target_aarch64_sve]
>  || [istarget amdgcn*-*-*] }]
>  }
>  
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index 3e9a284666c..a2caf6cb1c7 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -3048,8 +3048,7 @@ can_group_stmts_p (stmt_vec_info stmt1_info, 
> stmt_vec_info stmt2_info,
>like those created by build_mask_conversion.  */
>tree mask1 = gimple_call_arg (call1, 2);
>tree mask2 = gimple_call_arg (call2, 2);
> -  if (!operand_equal_p (mask1, mask2, 0)
> -  && (ifn == IFN_MASK_STORE || !allow_slp_p))
> +  if (!operand_equal_p (mask1, mask2, 0) && !allow_slp_p)
>   {
> mask1 = strip_conversion (mask1);
> if (!mask1)
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index b5f9333fc22..cc799b6ebcd 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -503,6 +503,7 @@ static const int cond_expr_maps[3][5] = {
>  static const int arg1_map[] = { 1, 1 };
>  static const int arg2_map[] = { 1, 2 };
>  static const int arg1_arg4_map[] = { 2, 1, 4 };
> +static const int arg3_arg2_map[] = { 2, 3, 2 };
>  static const int op1_op0_map[] = { 2, 1, 0 };
>  
>  /* For most SLP statements, there is a one-to-one mapping between
> @@ -543,6 +544,9 @@ vect_get_operand_map (const gimple *stmt, unsigned char 
> swap = 0)
> case IF

[PATCH] tree-optimization/111125 - properly cost BB reduction remain stmt handling

2023-08-24 Thread Richard Biener via Gcc-patches

We assume that all root stmts which compose the total reduction chain
are vectorized but fail to account for the cost of adding back the
scalar defs we are not vectorizing.  The following rectifies this,
fixing the gcc.dg/tree-ssa/slsr-11.c FAIL on aarch64.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/111125
* tree-vect-slp.cc (vectorizable_bb_reduc_epilogue): Account
for the remain_defs processing.
---
 gcc/tree-vect-slp.cc | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index cc799b6ebcd..ace0ff3ef60 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6433,6 +6433,11 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
vectype, 0, vect_body);
   record_stmt_cost (cost_vec, 1, vec_to_scalar, instance->root_stmts[0],
vectype, 0, vect_body);
+
+  /* Since we replace all stmts of a possibly longer scalar reduction
+ chain account for the extra scalar stmts for that.  */
+  record_stmt_cost (cost_vec, instance->remain_defs.length (), scalar_stmt,
+   instance->root_stmts[0], 0, vect_body);
   return true;
 }
 
-- 
2.35.3
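
Editorial sketch of the costing change above, not GCC code: the patch adds one scalar_stmt entry per element of remain_defs on top of the existing epilogue costs. The model below assumes a single vector_stmt entry before the vec_to_scalar entry, and the names CostKind, CostEntry, unit_cost, bb_reduc_epilogue_cost and total_cost are hypothetical. The unit costs are made up; real values come from the target's cost hooks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cost kinds; they only mirror the names used in the patch.
enum class CostKind { VectorStmt, VecToScalar, ScalarStmt };

struct CostEntry {
  std::size_t count;  // how many statements of this kind are recorded
  CostKind kind;
};

// Made-up unit costs for illustration only.
int unit_cost(CostKind kind) {
  switch (kind) {
    case CostKind::VectorStmt:  return 1;
    case CostKind::VecToScalar: return 2;
    case CostKind::ScalarStmt:  return 1;
  }
  return 0;
}

// Model of the epilogue costing: one vector statement, one
// vector-to-scalar extract, plus one scalar statement for every
// scalar def that is added back after the vector reduction.
std::vector<CostEntry> bb_reduc_epilogue_cost(std::size_t remain_defs) {
  std::vector<CostEntry> cost_vec;
  cost_vec.push_back({1, CostKind::VectorStmt});
  cost_vec.push_back({1, CostKind::VecToScalar});
  cost_vec.push_back({remain_defs, CostKind::ScalarStmt});  // new in the patch
  return cost_vec;
}

// Sum the recorded entries into a single number.
int total_cost(const std::vector<CostEntry>& cost_vec) {
  int total = 0;
  for (const CostEntry& e : cost_vec)
    total += static_cast<int>(e.count) * unit_cost(e.kind);
  return total;
}
```

With these made-up unit costs, a reduction with three remaining scalar defs costs three units more than one with none, which is the extra cost the vectorizer previously ignored.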

[PATCH v1] RISC-V: Support rounding mode for VFNMSAC/VFNMSUB autovec

2023-08-24 Thread Pan Li via Gcc-patches

From: Pan Li 

Combining intrinsics and autovec can produce a sequence like the one below.

vfadd RTZ   <- intrinsic static rounding
vfnmsub <- autovec/autovec-opt

The autovec-generated vfnmsub should use DYN mode, so the frm
must be restored before the vfnmsub insn. This patch fixes
the issue by:

* Add the frm operand to the autovec/autovec-opt pattern.
* Set the frm_mode attr to DYN.

Thus, the frm flow when combining autovec and intrinsics should be:

+
| frrm  a5
| ...
| fsrmi 4
| vfadd   <- intrinsic static rounding.
| ...
| fsrm  a5
| vfnmsub <- autovec/autovec-opt
| ...
+
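
Editorial sketch of the flow above in portable C++, as an analogy rather than RISC-V code: the dynamic rounding mode is saved, a static mode is set for one operation, and the saved mode is restored before any later operation that relies on the dynamic mode. The function names round_dynamic and round_static_rtz are hypothetical; std::fegetround and std::fesetround stand in for frrm and fsrm/fsrmi.

```cpp
#include <cfenv>
#include <cmath>

// Stand-in for the auto-vectorized operation: it rounds using whatever
// dynamic rounding mode is current.  The volatile accesses keep the
// compiler from folding the call or moving it across mode changes.
double round_dynamic(double x) {
  volatile double in = x;
  volatile double out = std::nearbyint(in);
  return out;
}

// Stand-in for the intrinsic with static rounding: force
// round-toward-zero for one operation, then restore the caller's mode.
double round_static_rtz(double x) {
  const int saved = std::fegetround();  // like: frrm a5
  std::fesetround(FE_TOWARDZERO);       // like: fsrmi (static mode)
  const double r = round_dynamic(x);    // operation under the static mode
  std::fesetround(saved);               // like: fsrm a5
  return r;
}
```

If the restore step were skipped, a later call to round_dynamic would silently round toward zero, which is the kind of bug the patch guards against by making the frm dependency explicit in the patterns.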

Signed-off-by: Pan Li 

gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add FRM_REGNUM to vfnmsac/vfnmsub
* config/riscv/autovec.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c: New test.
---
 gcc/config/riscv/autovec-opt.md   | 34 ---
 gcc/config/riscv/autovec.md   | 30 ---
 .../rvv/base/float-point-frm-autovec-3.c  | 88 +++
 3 files changed, 126 insertions(+), 26 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/float-point-frm-autovec-3.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 732a51edacd..54ca6df721c 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -523,13 +523,15 @@ (define_insn_and_split "*single_widen_fma"
 ;; vect__13.182_33 = .FNMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
 (define_insn_and_split "*double_widen_fnma"
   [(set (match_operand:VWEXTF 0 "register_operand")
-   (fma:VWEXTF
- (neg:VWEXTF
+   (unspec:VWEXTF
+ [(fma:VWEXTF
+   (neg:VWEXTF
+ (float_extend:VWEXTF
+   (match_operand: 2 "register_operand")))
(float_extend:VWEXTF
- (match_operand: 2 "register_operand")))
- (float_extend:VWEXTF
-   (match_operand: 3 "register_operand"))
- (match_operand:VWEXTF 1 "register_operand")))]
+ (match_operand: 3 "register_operand"))
+   (match_operand:VWEXTF 1 "register_operand"))
+  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -540,17 +542,20 @@ (define_insn_and_split "*double_widen_fnma"
 DONE;
   }
   [(set_attr "type" "vfwmuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
 
 ;; This helps to match ext + fnma.
 (define_insn_and_split "*single_widen_fnma"
   [(set (match_operand:VWEXTF 0 "register_operand")
-   (fma:VWEXTF
- (neg:VWEXTF
-   (float_extend:VWEXTF
- (match_operand: 2 "register_operand")))
- (match_operand:VWEXTF 3 "register_operand")
- (match_operand:VWEXTF 1 "register_operand")))]
+   (unspec:VWEXTF
+ [(fma:VWEXTF
+   (neg:VWEXTF
+ (float_extend:VWEXTF
+   (match_operand: 2 "register_operand")))
+   (match_operand:VWEXTF 3 "register_operand")
+   (match_operand:VWEXTF 1 "register_operand"))
+  (reg:SI FRM_REGNUM)] UNSPEC_VFFMA))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -567,7 +572,8 @@ (define_insn_and_split "*single_widen_fnma"
 DONE;
   }
   [(set_attr "type" "vfwmuladd")
-   (set_attr "mode" "")])
+   (set_attr "mode" "")
+   (set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
 
 ;; -
 ;;  [FP] VFWMSAC
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0c1c546817a..28396c6175d 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -1174,24 +1174,29 @@ (define_insn_and_split "*fma"
 (define_expand "fnma4"
   [(parallel
 [(set (match_operand:VF 0 "register_operand")
- (fma:VF
-   (neg:VF
- (match_operand:VF 1 "register_operand"))
-   (match_operand:VF 2 "register_operand")
-   (match_operand:VF 3 "register_operand")))
+ (unspec:VF
+   [(fma:VF
+ (neg:VF
+   (match_operand:VF 1 "register_operand"))
+ (match_operand:VF 2 "register_operand")
+ (match_operand:VF 3 "register_operand"))
+(reg:SI FRM_REGNUM)] UNSPEC_VFFMA))
  (clobber (match_dup 4))])]
   "TARGET_VECTOR"
   {
 operands[4] = gen_reg_rtx (Pmode);
-  })
+  }
+  [(set (attr "frm_mode") (symbol_ref "riscv_vector::FRM_DYN"))])
 
 (define_insn_and_split "*fnma"
   [(set (match_operand:VF 0 "register_operand" "=vr, vr, ?&vr")
-   (fma:VF
- (neg:VF
-   (match_operand:VF 1 "register_operand" " %0, vr,   vr"))
- (match_operand:VF 2 "register_operand"   " vr, vr,   vr")
- (match_operand:VF 3 "register_operand"   " vr,  0,   vr")))
+   (unspec:VF
+ [(fma:VF
+

Re: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-24 Thread 钟居哲


>> Why is that necessary?  Just for the popcount I presume?
>> Can't we rather have a new case for a scalar destination?  I find
>> the code a bit misleading now as we check m_dest_mode and then not
>> use it.

I am gonna fix it in V2.

>> The rest looks good to me.  Note that my machine crashed when
>> compiling the extract_last-14.c because it used up all my RAM.
>> The vsetvl "refactor" phase 3 patch helped, though.
>> We'd need to have this patch depend on the other one then.

Yes. The refactor patch fixed potential bugs. I will commit it tomorrow
if Kito has no further comments.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-08-24 17:13
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization
Hi Juzhe,
 
> vcpop.m a5,v0
> beq a5,zero,.L3
> addi a5,a5,-1
> vsetvli a4,zero,e32,m1,ta,ma
> vcompress.vm v2,v3,v0
> vslidedown.vx v2,v2,a5
> vmv.x.s a0,v2
> .L3:
> sext.w a0,a0
 
Mhm, where is this sext coming from?  Thought I had this covered with
the autovec-opt pattern but apparently not.  I'll take that, nothing
related to this patch.
 
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -213,7 +213,7 @@ public:
>{
>  /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
> the vsetvli to obtain the value of vlmax.  */
> - poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
> + poly_uint64 nunits = GET_MODE_NUNITS (m_mask_mode);
 
Why is that necessary?  Just for the popcount I presume?
Can't we rather have a new case for a scalar destination?  I find
the code a bit misleading now as we check m_dest_mode and then not
use it.
 
>  
> +/* Emit vcpop.m instruction.  */
> +
> +static void
> +emit_cpop_insn (unsigned icode, rtx *ops, rtx len)
> +{
> +  machine_mode dest_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = GET_MODE (ops[1]);
> +  insn_expander e (RVV_CPOP,
> +   /* HAS_DEST_P */ true,
> +   /* FULLY_UNMASKED_P */ true,
> +   /* USE_REAL_MERGE_P */ true,
> +   /* HAS_AVL_P */ true,
> +   /* VLMAX_P */ len ? false : true,
> +   dest_mode, mask_mode);
> +
> +  e.set_vl (len);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
 
The use_real_merge just appeared odd to me here because there is
nothing to merge.  But in the end it's just to omit the vundef operand
so good for now.  There is an increasing number of opportunities to
refactor in riscv-v.cc, though ;)
 
The rest looks good to me.  Note that my machine crashed when
compiling the extract_last-14.c because it used up all my RAM.
The vsetvl "refactor" phase 3 patch helped, though.
We'd need to have this patch depend on the other one then.
 
The rest looks good to me.  At first I was a bit wary about the
branching zero check after popcount but as we're outside of a loop
anyway, that's fine.  Might want to use a conditional select in the
future but actually not that important. 
 
Regards
Robin

[PATCH] aarch64: Account for different Advanced SIMD fusing options

2023-08-24 Thread Richard Sandiford via Gcc-patches

The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean
that either side of a subtraction can start an accumulator chain.
However, Advanced SIMD doesn't have an equivalent instruction.
This means that, for Advanced SIMD, a subtraction can only be
fused if the second operand is a multiplication.

Also, if both sides of a subtraction are multiplications,
and if the second operand is used multiple times, such as:

 c * d - a * b
 e * f - a * b

then the first rather than second multiplication operand will tend
to be fused.  On Advanced SIMD, this leads to:

 tmp1 = a * b
 tmp2 = -tmp1
  ... = tmp2 + c * d   // FMLA
  ... = tmp2 + e * f   // FMLA

where one of the FMLAs also requires a MOV.

This patch tries to account for this in the vector cost model.
It improves roms performance by 2-3% on Neoverse V1.  It's also
needed to avoid a regression in fotonik for Neoverse N2 and
Neoverse V2 with the patch for PR110625.
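
Editorial sketch of the decision described above, as a standalone model rather than the GCC implementation: the struct AddSub and both function names are hypothetical, and the real code inspects gimple statements instead of booleans.

```cpp
// Hypothetical, simplified description of a vector addition or subtraction.
struct AddSub {
  bool is_minus;        // true for "op1 - op2", false for "op1 + op2"
  bool op1_is_mul;      // first operand is the result of a multiplication
  bool op2_is_mul;      // second operand is the result of a multiplication
  bool op2_single_use;  // second operand has exactly one use
};

// Targets with negated multiply-add forms (scalar FNMADD/FNMSUB,
// SVE FNMLA/FNMLS): either operand can start the accumulator chain.
bool fuses_with_negated_forms(const AddSub& s) {
  return s.op1_is_mul || s.op2_is_mul;
}

// Advanced SIMD: a subtraction fuses only if the second operand is a
// multiplication.  If both operands are multiplications and the second
// one is shared, assume the first one gets fused instead, so this
// statement is not counted as a fused multiply-subtract.
bool fuses_on_advsimd(const AddSub& s) {
  if (s.is_minus)
    return s.op2_is_mul && (s.op2_single_use || !s.op1_is_mul);
  return s.op1_is_mul || s.op2_is_mul;
}
```

For the example in the message, `c * d - a * b` with `a * b` also used by `e * f - a * b`, the model reports no fusion on Advanced SIMD, while `c - a * b` with a single use of `a * b` still counts as fused.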

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
* config/aarch64/aarch64.cc: Include ssa.h.
(aarch64_multiply_add_p): Require the second operand of an
Advanced SIMD subtraction to be a multiplication.  Assume that
such an operation won't be fused if the second operand is used
multiple times and if the first operand is also a multiplication.

gcc/testsuite/
* gcc.target/aarch64/neoverse_v1_2.c: New test.
* gcc.target/aarch64/neoverse_v1_3.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc | 24 ++-
 .../gcc.target/aarch64/neoverse_v1_2.c| 15 
 .../gcc.target/aarch64/neoverse_v1_3.c| 14 +++
 3 files changed, 47 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/neoverse_v1_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/neoverse_v1_3.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 034628148ef..37d414021ca 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -84,6 +84,7 @@
 #include "aarch64-feature-deps.h"
 #include "config/arm/aarch-common.h"
 #include "config/arm/aarch-common-protos.h"
+#include "ssa.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -16411,20 +16412,20 @@ aarch64_multiply_add_p (vec_info *vinfo, 
stmt_vec_info stmt_info,
   if (code != PLUS_EXPR && code != MINUS_EXPR)
 return false;
 
-  for (int i = 1; i < 3; ++i)
+  auto is_mul_result = [&](int i)
 {
   tree rhs = gimple_op (assign, i);
   /* ??? Should we try to check for a single use as well?  */
   if (TREE_CODE (rhs) != SSA_NAME)
-   continue;
+   return false;
 
   stmt_vec_info def_stmt_info = vinfo->lookup_def (rhs);
   if (!def_stmt_info
  || STMT_VINFO_DEF_TYPE (def_stmt_info) != vect_internal_def)
-   continue;
+   return false;
   gassign *rhs_assign = dyn_cast <gassign *> (def_stmt_info->stmt);
   if (!rhs_assign || gimple_assign_rhs_code (rhs_assign) != MULT_EXPR)
-   continue;
+   return false;
 
   if (vec_flags & VEC_ADVSIMD)
{
@@ -16444,8 +16445,19 @@ aarch64_multiply_add_p (vec_info *vinfo, stmt_vec_info 
stmt_info,
}
 
   return true;
-}
-  return false;
+};
+
+  if (code == MINUS_EXPR && (vec_flags & VEC_ADVSIMD))
+/* Advanced SIMD doesn't have FNMADD/FNMSUB/FNMLA/FNMLS, so the
+   multiplication must be on the second operand (to form an FMLS).
+   But if both operands are multiplications and the second operand
+   is used more than once, we'll instead negate the second operand
+   and use it as an accumulator for the first operand.  */
+return (is_mul_result (2)
+   && (has_single_use (gimple_assign_rhs2 (assign))
+   || !is_mul_result (1)));
+
+  return is_mul_result (1) || is_mul_result (2);
 }
 
 /* Return true if STMT_INFO is the second part of a two-statement boolean AND
diff --git a/gcc/testsuite/gcc.target/aarch64/neoverse_v1_2.c 
b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_2.c
new file mode 100644
index 000..45d7e81c78e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_2.c
@@ -0,0 +1,15 @@
+/* { dg-options "-O2 -mcpu=neoverse-v1 --param aarch64-autovec-preference=1 
-fdump-tree-vect-details" } */
+
+void
+f (float x[restrict][100], float y[restrict][100])
+{
+  for (int i = 0; i < 100; ++i)
+{
+  x[0][i] = y[0][i] * y[1][i] - y[3][i] * y[4][i];
+  x[1][i] = y[1][i] * y[2][i] - y[3][i] * y[4][i];
+}
+}
+
+/* { dg-final { scan-tree-dump {_[0-9]+ - _[0-9]+ 1 times vector_stmt costs 2 
} "vect" } } */
+/* { dg-final { scan-tree-dump-not {vector_stmt costs 0 } "vect" } } */
+/* { dg-final { scan-tree-dump {_[0-9]+ - _[0-9]+ 1 times scalar_stmt costs 0 
} "vect" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/neoverse_v1_3.c 
b/gcc/testsuite/gcc.target/aarch64/neoverse_v1_3.c
new file mode 100644
index 000..de31fc13b28
--- /dev/null
+++ b/g

Re: [PATCH] RISC-V: Support LEN_FOLD_EXTRACT_LAST auto-vectorization

2023-08-24 Thread Robin Dapp via Gcc-patches

Hi Juzhe,

>   vcpop.m a5,v0
>   beq a5,zero,.L3
>   addia5,a5,-1
>   vsetvli a4,zero,e32,m1,ta,ma
>   vcompress.vmv2,v3,v0
>   vslidedown.vx   v2,v2,a5
>   vmv.x.s a0,v2
> .L3:
>   sext.w  a0,a0

Mhm, where is this sext coming from?  Thought I had this covered with
the autovec-opt pattern but apparently not.  I'll take that, nothing
related to this patch.

> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -213,7 +213,7 @@ public:
> {
>   /* Optimize VLS-VLMAX code gen, we can use vsetivli instead of
>  the vsetvli to obtain the value of vlmax.  */
> - poly_uint64 nunits = GET_MODE_NUNITS (m_dest_mode);
> + poly_uint64 nunits = GET_MODE_NUNITS (m_mask_mode);

Why is that necessary?  Just for the popcount I presume?
Can't we rather have a new case for a scalar destination?  I find
the code a bit misleading now as we check m_dest_mode and then not
use it.

>  
> +/* Emit vcpop.m instruction.  */
> +
> +static void
> +emit_cpop_insn (unsigned icode, rtx *ops, rtx len)
> +{
> +  machine_mode dest_mode = GET_MODE (ops[0]);
> +  machine_mode mask_mode = GET_MODE (ops[1]);
> +  insn_expander e (RVV_CPOP,
> +   /* HAS_DEST_P */ true,
> +   /* FULLY_UNMASKED_P */ true,
> +   /* USE_REAL_MERGE_P */ true,
> +   /* HAS_AVL_P */ true,
> +   /* VLMAX_P */ len ? false : true,
> +   dest_mode, mask_mode);
> +
> +  e.set_vl (len);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}

The use_real_merge just appeared odd to me here because there is
nothing to merge.  But in the end it's just to omit the vundef operand
so good for now.  There is an increasing number of opportunities to
refactor in riscv-v.cc, though ;)

The rest looks good to me.  Note that my machine crashed when
compiling the extract_last-14.c because it used up all my RAM.
The vsetvl "refactor" phase 3 patch helped, though.
We'd need to have this patch depend on the other one then.

The rest looks good to me.  At first I was a bit wary about the
branching zero check after popcount but as we're outside of a loop
anyway, that's fine.  Might want to use a conditional select in the
future but actually not that important. 
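
Editorial sketch of what the quoted instruction sequence computes, written as scalar C++: count the active mask bits, return the incoming value if there are none, otherwise pack the active elements and read the last one. The function name fold_extract_last and the use of std::vector are assumptions for illustration; the sketch also assumes a0 holds the fallback value on entry.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the last element of DATA whose MASK bit is set, or FALLBACK
// if no mask bit is set.  Elements beyond the shorter of the two
// vectors are ignored.
int32_t fold_extract_last(int32_t fallback,
                          const std::vector<bool>& mask,
                          const std::vector<int32_t>& data) {
  const std::size_t n = mask.size() < data.size() ? mask.size() : data.size();

  // vcpop.m: count the active mask bits.
  std::size_t count = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (mask[i])
      ++count;

  // beq a5,zero,.L3: nothing selected, keep the incoming value.
  if (count == 0)
    return fallback;

  // vcompress.vm: pack the active elements to the front.
  std::vector<int32_t> packed;
  for (std::size_t i = 0; i < n; ++i)
    if (mask[i])
      packed.push_back(data[i]);

  // vslidedown.vx by count - 1, then vmv.x.s reads element 0.
  return packed[count - 1];
}
```

The early return corresponds to the branching zero check discussed above; a branch-free variant would select between the fallback and the extracted element instead.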

Regards
 Robin
