Re: Re: [PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-16 Thread juzhe.zh...@rivai.ai
Hi, Richard.

>> slp_op and mask_vectype are only initialised when mask_index >= 0.
>> Shouldn't this code be under mask_index >= 0 too?

>> Also, when do we encounter mismatched mask_vectypes?  Presumably the SLP
>> node has a known vectype by this point.  I think a comment would be useful.

I've addressed the comment, and I don't think we can encounter mismatched mask_vectypes.

So I changed the code in V4 as follows:
+  if (mask_index >= 0 && slp_node)
+    {
+      bool match_p
+	= vect_maybe_update_slp_op_vectype (slp_op, mask_vectype);
+      gcc_assert (match_p);
+    }

https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633209.html 

We now assert that the mask_vectype always matches.

Tested on RISC-V; bootstrap and regtest passed on x86.

Could you confirm it?


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-10-17 05:34
To: Juzhe-Zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]
Juzhe-Zhong  writes:
> This patch fixes this following FAILs in RISC-V regression:
>
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
>
> The root cause of these FAILs is that GCC failed to SLP MASK_LEN_GATHER_LOAD.
>
> There are two situations in which scalar code is recognized as
> MASK_LEN_GATHER_LOAD:
>
> 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale,
>    zero, conditional mask).
>
>    For this situation we just leverage the current MASK_GATHER_LOAD
>    handling, which can already SLP MASK_LEN_GATHER_LOAD.
>
> 2. unconditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale,
>    zero, -1)
>
>    The current SLP check fails on the dummy -1 mask, so we relax the
>    check in tree-vect-slp.cc and allow it to be materialized.
> 
> Consider this following case:
>
> void __attribute__((noipa))
> f (int *restrict y, int *restrict x, int *restrict indices, int n)
> {
>   for (int i = 0; i < n; ++i)
> {
>   y[i * 2] = x[indices[i * 2]] + 1;
>   y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> }
> }
>
> https://godbolt.org/z/WG3M3n7Mo
>
> GCC is unable to SLP and falls back to VEC_LOAD_LANES/VEC_STORE_LANES:
>
> f:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e8,mf4,ta,ma
> vsetvli zero,a5,e32,m1,ta,ma
> vlseg2e32.v v6,(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2   v2,v6
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v  v1,(a1),v2
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2   v2,v7
> vsetvli zero,zero,e32,m1,ta,ma
> vadd.vi v4,v1,1
> vsetvli zero,zero,e64,m2,ta,ma
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v  v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> slli a6,a5,3
> vadd.vi v5,v2,2
> sub a3,a3,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vsseg2e32.v v4,(a0)
> add a2,a2,a6
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
>
> After this patch:
>
> f:
> ble a3,zero,.L5
> li a5,1
> csrr t1,vlenb
> slli a5,a5,33
> srli a7,t1,2
> addi a5,a5,1
> slli a3,a3,1
> neg t3,a7
> vsetvli a4,zero,e64,m1,ta,ma
> vmv.v.x v4,a5
> .L3:
> minu a5,a3,a7
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v1,0(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2 v2,v1
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> mv a6,a3
> vadd.vv v2,v2,v4
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v2,0(a0)
> add a2,a2,t1
> add a0,a0,t1
> add a3,a3,t3
> bgtu a6,a7,.L3
> .L5:
> ret
>
> Note that I found we were missing a conditional mask gather_load SLP
> test, so this patch appends one.
 
Yeah, we're missing a target-independent test.  I'm afraid I used
aarch64-specific tests for a lot of this stuff, since (a) I wanted
to check the quality of the asm output and (b) it's very hard to write
gcc.dg/vect tests that don't fail on some target or other.  Thanks for
picking this up.
 
>
> Tested on RISC-V; bootstrap and regression testing passed on x86.
>
> OK for trunk?
>
> gcc/ChangeLog:
>
> * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
> (vect_get_and_check_slp_defs): Ditto.
> (vect_build_slp_tree_1): Ditto.
> (vect_build_slp_tree_2): Ditto.
> * tree-vect-stmts.cc (vectorizable_load): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-gather-6.c: New test.
>
> ---
>  gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
>  gcc/tree-vect-slp.cc  | 22 ++
>  gcc/tree-vect-stmts.cc| 10 +-
>  3 files changed, 42 insertion

[PATCH V4] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-16 Thread Juzhe-Zhong
This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC failed to SLP MASK_LEN_GATHER_LOAD.

There are two situations in which scalar code is recognized as
MASK_LEN_GATHER_LOAD:

1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale,
   zero, conditional mask).

   For this situation we just leverage the current MASK_GATHER_LOAD
   handling, which can already SLP MASK_LEN_GATHER_LOAD.

2. unconditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale,
   zero, -1)

   The current SLP check fails on the dummy -1 mask, so we relax the
   check in tree-vect-slp.cc and allow it to be materialized.
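
For illustration, the two scalar shapes look roughly like this in gimple
(operand names illustrative; trailing len/bias operands omitted as in the
description above):

  patt_1 = .MASK_LEN_GATHER_LOAD (base_2, offset_3, 4, 0, cond_4);  /* 1. */
  patt_5 = .MASK_LEN_GATHER_LOAD (base_6, offset_7, 4, 0, -1);      /* 2. */

For case 1 the mask operand goes through the same SLP discovery as for
MASK_GATHER_LOAD; for case 2 the relaxed check lets SLP treat the -1 dummy
mask as an operand that can be materialized as a vector constant.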

Consider this following case:

void __attribute__((noipa))
f (int *restrict y, int *restrict x, int *restrict indices, int n)
{
  for (int i = 0; i < n; ++i)
{
  y[i * 2] = x[indices[i * 2]] + 1;
  y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
}
}

https://godbolt.org/z/WG3M3n7Mo

GCC is unable to SLP and falls back to VEC_LOAD_LANES/VEC_STORE_LANES:

f:
ble a3,zero,.L5
.L3:
vsetvli a5,a3,e8,mf4,ta,ma
vsetvli zero,a5,e32,m1,ta,ma
vlseg2e32.v v6,(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v6
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v1,(a1),v2
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v7
vsetvli zero,zero,e32,m1,ta,ma
vadd.vi v4,v1,1
vsetvli zero,zero,e64,m2,ta,ma
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
slli a6,a5,3
vadd.vi v5,v2,2
sub a3,a3,a5
vsetvli zero,a5,e32,m1,ta,ma
vsseg2e32.v v4,(a0)
add a2,a2,a6
add a0,a0,a6
bne a3,zero,.L3
.L5:
ret

After this patch:

f:
ble a3,zero,.L5
li  a5,1
csrr t1,vlenb
slli a5,a5,33
srli a7,t1,2
addi a5,a5,1
slli a3,a3,1
neg t3,a7
vsetvli a4,zero,e64,m1,ta,ma
vmv.v.x v4,a5
.L3:
minu a5,a3,a7
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v1,0(a2)
vsetvli a4,zero,e64,m2,ta,ma
vsext.vf2   v2,v1
vsll.vi v2,v2,2
vsetvli zero,a5,e32,m1,ta,ma
vluxei64.v  v2,(a1),v2
vsetvli a4,zero,e32,m1,ta,ma
mv  a6,a3
vadd.vv v2,v2,v4
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v2,0(a0)
add a2,a2,t1
add a0,a0,t1
add a3,a3,t3
bgtu a6,a7,.L3
.L5:
ret

Note that I found we were missing a conditional mask gather_load SLP test,
so this patch appends one.

Tested on RISC-V; bootstrap and regression testing passed on x86.

OK for trunk?

gcc/ChangeLog:

* tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
(vect_get_and_check_slp_defs): Ditto.
(vect_build_slp_tree_1): Ditto.
(vect_build_slp_tree_2): Ditto.
* tree-vect-stmts.cc (vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/vect-gather-6.c: New test.

---
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
 gcc/tree-vect-slp.cc  | 22 ++
 gcc/tree-vect-stmts.cc|  9 -
 3 files changed, 41 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c

diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      if (cond[i * 2])
+	y[i * 2] = x[indices[i * 2]] + 1;
+      if (cond[i * 2 + 1])
+	y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index af8f5031bd2..b379278446b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -550,6 +550,7 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
return arg1_map;
 
 	case IFN_MASK_GATHER_LOAD:
+	case IFN_MASK_LEN_GATHER_LOAD:
 	  return arg1_arg4_map;
 
  case IFN_MASK_STORE:
@@ -717,8 +718,7 @@ vect_get_and_check_slp

[PATCH] middle-end/111818 - failed DECL_NOT_GIMPLE_REG_P setting of volatile

2023-10-16 Thread Richard Biener
The following addresses a missed DECL_NOT_GIMPLE_REG_P setting of
a volatile declared parameter which causes inlining to substitute
a constant parameter into a context where its address is required.

The main issue is in update_address_taken which clears
DECL_NOT_GIMPLE_REG_P from the parameter but fails to rewrite it
because is_gimple_reg returns false for volatiles.  The following
changes maybe_optimize_var to make the 1:1 correspondence between
clearing DECL_NOT_GIMPLE_REG_P of a register typed decl and
actually rewriting it to SSA.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR middle-end/111818
* tree-ssa.cc (maybe_optimize_var): When clearing
DECL_NOT_GIMPLE_REG_P always rewrite into SSA.

* gcc.dg/torture/pr111818.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111818.c | 11 +++
 gcc/tree-ssa.cc | 17 +++--
 2 files changed, 22 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111818.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111818.c b/gcc/testsuite/gcc.dg/torture/pr111818.c
new file mode 100644
index 000..a7a9d71
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111818.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+
+static void foo(const volatile unsigned int x, void *p)
+{
+  __builtin_memcpy(p, (void *)&x, sizeof x);
+}
+
+void bar(void *number)
+{
+  foo(0, number);
+}
diff --git a/gcc/tree-ssa.cc b/gcc/tree-ssa.cc
index ebba02b8449..2f3210fcf61 100644
--- a/gcc/tree-ssa.cc
+++ b/gcc/tree-ssa.cc
@@ -1788,15 +1788,20 @@ maybe_optimize_var (tree var, bitmap addresses_taken, bitmap not_reg_needs,
 	  maybe_reg = true;
 	  DECL_NOT_GIMPLE_REG_P (var) = 0;
 	}
-      if (maybe_reg && is_gimple_reg (var))
+      if (maybe_reg)
 	{
-	  if (dump_file)
+	  if (is_gimple_reg (var))
 	    {
-	      fprintf (dump_file, "Now a gimple register: ");
-	      print_generic_expr (dump_file, var);
-	      fprintf (dump_file, "\n");
+	      if (dump_file)
+		{
+		  fprintf (dump_file, "Now a gimple register: ");
+		  print_generic_expr (dump_file, var);
+		  fprintf (dump_file, "\n");
+		}
+	      bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
 	    }
-	  bitmap_set_bit (suitable_for_renaming, DECL_UID (var));
+	  else
+	    DECL_NOT_GIMPLE_REG_P (var) = 1;
 	}
     }
 }
-- 
2.35.3


Re: [PATCH] genemit: Split insn-emit.cc into ten files.

2023-10-16 Thread Sam James


Robin Dapp  writes:

> Hi,
>
> the attached v2 includes Tamar's suggestion of keeping the current
> stdout behavior.  When no output files are passed (via -O) the output
> is written to stdout as before.
>
> Tamar also mentioned off-list that, similar to match.pd, it might make
> sense to balance the partitions in a better way than a fixed number
> of patterns threshold.  That's a good idea but I'd rather do that
> separately as the current approach already helps considerably.
>
> Attached v2 was bootstrapped and regtested on power10; aarch64 and
> x86 are still running.
> Stefan also tested v1 on s390 where the partitioning does not help
> but also doesn't slow anything down.  insn-emit.cc isn't very large
> to begin with on s390.
>
> Regards
>  Robin
>
> From 34d05113a4e3c7e83a4731020307e26c1144af69 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Thu, 12 Oct 2023 11:23:26 +0200
> Subject: [PATCH v2] genemit: Split insn-emit.cc into several partitions.
>
> On riscv, insn-emit.cc has grown to over 1.2 million lines of code and
> compiling it takes considerable time.
> Therefore, this patch adjusts genemit to create several partitions
> (insn-emit-1.cc to insn-emit-n.cc).  In order to do so it first counts
> the number of available patterns, calculates the number of patterns per
> file and starts a new file whenever that number is reached.
>
> Similar to match.pd, a configure option --with-emitinsn-partitions=num
> is introduced that makes the number of partitions configurable.
>

Natively, things seem fine, but for cross, I get failures on a few
targets (hppa2.0-unknown-linux-gnu, hppa64-unknown-linux-gnu).

With ./configure --host=x86_64-pc-linux-gnu
--target=hppa2.0-unknown-linux-gnu --build=x86_64-pc-linux-gnu && make
-j$(nproc), I get a bunch of stuff like:

mv: cannot stat 'tmp-emit-9.cc': No such file or directory
echo timestamp > s-insn-emit-8
mv: cannot stat 'tmp-emit-10.cc': No such file or directory
make[2]: *** [Makefile:2598: s-insn-emit-9] Error 1
make[2]: *** Waiting for unfinished jobs
make[2]: *** [Makefile:2598: s-insn-emit-10] Error 1


[PATCH] Support 32/64-bit vectorization for _Float16 fma related operations.

2023-10-16 Thread liuhongt
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ready push to trunk.
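
For illustration, the kind of loop this enables to be vectorized with
64-bit partial vectors is roughly the following (a minimal sketch assuming
-mavx512fp16 -mavx512vl; not one of the new testcases):

void
fma_v4hf (_Float16 *restrict r, const _Float16 *restrict a,
	  const _Float16 *restrict b, const _Float16 *restrict c)
{
  /* A 4 x _Float16 fma loop that the new V4HF expanders allow to be
     vectorized (given fp contraction).  */
  for (int i = 0; i < 4; i++)
    r[i] = a[i] * b[i] + c[i];
}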

gcc/ChangeLog:

* config/i386/mmx.md (fma<mode>4): New expander.
(fms<mode>4): Ditto.
(fnma<mode>4): Ditto.
(fnms<mode>4): Ditto.
(vec_fmaddsubv4hf4): Ditto.
(vec_fmsubaddv4hf4): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/part-vect-fmaddsubhf-1.c: New test.
* gcc.target/i386/part-vect-fmahf-1.c: New test.
---
 gcc/config/i386/mmx.md| 152 +-
 .../gcc.target/i386/part-vect-fmaddsubhf-1.c  |  22 +++
 .../gcc.target/i386/part-vect-fmahf-1.c   |  58 +++
 3 files changed, 231 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-fmaddsubhf-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/part-vect-fmahf-1.c

diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 82ca49c207b..491a0a51272 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2365,7 +2365,157 @@ (define_expand "signbit<mode>2"
 
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;;
-;; Parallel single-precision floating point conversion operations
+;; Parallel half-precision FMA multiply/accumulate instructions.
+;;
+;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
+(define_expand "fma4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+   (fma:VHF_32_64
+ (match_operand:VHF_32_64 1 "nonimmediate_operand")
+ (match_operand:VHF_32_64 2 "nonimmediate_operand")
+ (match_operand:VHF_32_64 3 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov__to_sse (op3, operands[3]));
+  emit_insn (gen_mov__to_sse (op2, operands[2]));
+  emit_insn (gen_mov__to_sse (op1, operands[1]));
+
+  emit_insn (gen_fmav8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fms4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+   (fma:VHF_32_64
+ (match_operand:VHF_32_64   1 "nonimmediate_operand")
+ (match_operand:VHF_32_64   2 "nonimmediate_operand")
+ (neg:VHF_32_64
+   (match_operand:VHF_32_64 3 "nonimmediate_operand"]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov__to_sse (op3, operands[3]));
+  emit_insn (gen_mov__to_sse (op2, operands[2]));
+  emit_insn (gen_mov__to_sse (op1, operands[1]));
+
+  emit_insn (gen_fmsv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fnma4"
+  [(set (match_operand:VHF_32_64 0 "register_operand")
+   (fma:VHF_32_64
+ (neg:VHF_32_64
+   (match_operand:VHF_32_64 1 "nonimmediate_operand"))
+ (match_operand:VHF_32_64   2 "nonimmediate_operand")
+ (match_operand:VHF_32_64   3 "nonimmediate_operand")))]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov__to_sse (op3, operands[3]));
+  emit_insn (gen_mov__to_sse (op2, operands[2]));
+  emit_insn (gen_mov__to_sse (op1, operands[1]));
+
+  emit_insn (gen_fnmav8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "fnms4"
+  [(set (match_operand:VHF_32_64 0 "register_operand" "=v,v,x")
+   (fma:VHF_32_64
+ (neg:VHF_32_64
+   (match_operand:VHF_32_64 1 "nonimmediate_operand"))
+ (match_operand:VHF_32_64   2 "nonimmediate_operand")
+ (neg:VHF_32_64
+   (match_operand:VHF_32_64 3 "nonimmediate_operand"]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL && ix86_partial_vec_fp_math"
+{
+  rtx op3 = gen_reg_rtx (V8HFmode);
+  rtx op2 = gen_reg_rtx (V8HFmode);
+  rtx op1 = gen_reg_rtx (V8HFmode);
+  rtx op0 = gen_reg_rtx (V8HFmode);
+
+  emit_insn (gen_mov__to_sse (op3, operands[3]));
+  emit_insn (gen_mov__to_sse (op2, operands[2]));
+  emit_insn (gen_mov__to_sse (op1, operands[1]));
+
+  emit_insn (gen_fnmsv8hf4 (op0, op1, op2, op3));
+
+  emit_move_insn (operands[0], lowpart_subreg (mode, op0, V8HFmode));
+  DONE;
+})
+
+(define_expand "vec_fmaddsubv4hf4"
+  [(match_operand:V4HF 0 "register_operand")
+   (match_operand:V4HF 1 "nonimmediate_operand")
+   (match_operand:V4HF 2 "nonimmediate_operand")
+   (match_operand:V4HF 3 "nonimmediate_operand")]
+  "TARGET_AVX512FP16 && TARGET_AVX512VL
+   && TARGET_MMX_WITH_SSE
+   && ix86_partial_vec_fp

[PING^3][PATCH v2] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-10-16 Thread Surya Kumari Jangala
Ping

On 03/10/23 3:53 pm, Surya Kumari Jangala wrote:
> Ping
> 
> On 20/09/23 7:31 am, Surya Kumari Jangala wrote:
>> Ping
>>
>> On 10/09/23 10:58 pm, Surya Kumari Jangala wrote:
>>> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
>>>
>>> In the routine rs6000_analyze_swaps(), special handling of swappable
>>> instructions is done even if the webs that contain the swappable
>>> instructions are not optimized, i.e., the webs do not contain any
>>> permuting load/store instructions along with the associated register
>>> swap instructions. Doing special handling in such webs will result in
>>> the extracted lane being adjusted unnecessarily for vec_extract.
>>>
>>> Another issue is that existing code treats non-permuting loads/stores
>>> as special swappables. Non-permuting loads/stores (that have not yet
>>> been split into a permuting load/store and a swap) are handled by
>>> converting them into a permuting load/store (which effectively removes
>>> the swap). As a result, if special swappables are handled only in webs
>>> containing permuting loads/stores, then non-optimal code is generated
>>> for non-permuting loads/stores.
>>>
>>> Hence, in this patch, all webs containing either permuting loads/
>>> stores or non-permuting loads/stores are marked as requiring special
>>> handling of swappables. Swaps associated with permuting loads/stores
>>> are marked for removal, and non-permuting loads/stores are converted to
>>> permuting loads/stores. Then the special swappables in the webs are
>>> fixed up.
>>>
>>> Another issue with always handling swappable instructions is that it is
>>> incorrect to do so in webs where loads/stores on quad word aligned
>>> addresses are changed to lvx/stvx. Similarly, in webs where
>>> swap(load(vector constant)) instructions are replaced with
>>> load(swapped vector constant), the swappable instructions should not be
>>> modified.
>>>
>>> 2023-09-10  Surya Kumari Jangala  
>>>
>>> gcc/
>>> PR rtl-optimization/106770
>>> * config/rs6000/rs6000-p8swap.cc (non_permuting_mem_insn): New
>>> function.
>>> (handle_non_permuting_mem_insn): New function.
>>> (rs6000_analyze_swaps): Handle swappable instructions only in
>>> certain webs.
>>> (web_requires_special_handling): New instance variable.
>>> (handle_special_swappables): Remove handling of non-permuting
>>> load/store instructions.
>>>
>>> gcc/testsuite/
>>> PR rtl-optimization/106770
>>> * gcc.target/powerpc/pr106770.c: New test.
>>> ---
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc b/gcc/config/rs6000/rs6000-p8swap.cc
>>> index 0388b9bd736..3a695aa1318 100644
>>> --- a/gcc/config/rs6000/rs6000-p8swap.cc
>>> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
>>> @@ -179,6 +179,13 @@ class swap_web_entry : public web_entry_base
>>>    unsigned int special_handling : 4;
>>>    /* Set if the web represented by this entry cannot be optimized.  */
>>>    unsigned int web_not_optimizable : 1;
>>> +  /* Set if the swappable insns in the web represented by this entry
>>> +     have to be fixed.  Swappable insns have to be fixed in:
>>> +       - webs containing permuting loads/stores and the swap insns
>>> +	 in such webs have been marked for removal
>>> +       - webs where non-permuting loads/stores have been converted
>>> +	 to permuting loads/stores  */
>>> +  unsigned int web_requires_special_handling : 1;
>>>    /* Set if this insn should be deleted.  */
>>>    unsigned int will_delete : 1;
>>>  };
>>> @@ -1468,14 +1475,6 @@ handle_special_swappables (swap_web_entry *insn_entry, unsigned i)
>>>        if (dump_file)
>>> 	 fprintf (dump_file, "Adjusting subreg in insn %d\n", i);
>>>        break;
>>> -    case SH_NOSWAP_LD:
>>> -      /* Convert a non-permuting load to a permuting one.  */
>>> -      permute_load (insn);
>>> -      break;
>>> -    case SH_NOSWAP_ST:
>>> -      /* Convert a non-permuting store to a permuting one.  */
>>> -      permute_store (insn);
>>> -      break;
>>>      case SH_EXTRACT:
>>>        /* Change the lane on an extract operation.  */
>>>        adjust_extract (insn);
>>> @@ -2401,6 +2400,25 @@ recombine_lvx_stvx_patterns (function *fun)
>>>free (to_delete);
>>>  }
>>>  
>>> +/* Return true if insn is a non-permuting load/store.  */
>>> +static bool
>>> +non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>>> +{
>>> +  return (insn_entry[i].special_handling == SH_NOSWAP_LD ||
>>> +	  insn_entry[i].special_handling == SH_NOSWAP_ST);
>>> +}
>>> +
>>> +/* Convert a non-permuting load/store insn to a permuting one.  */
>>> +static void
>>> +handle_non_permuting_mem_insn (swap_web_entry *insn_entry, unsigned int i)
>>> +{
>>> +  rtx_insn *insn = insn_entry[i].insn;
>>> +  if (insn_entry[i].special_handling == SH_NOSWAP_LD)
>>> +    permute_load (insn);
>>> +  else if (insn_entry[i].special_handling == SH_NOSWAP_ST)
>>> +    permute_store (insn);
>>> +}
>>> +
>>>  /* Main entry point for this pass.  */

Re: [RFC] expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg [target/111466]

2023-10-16 Thread Jeff Law




On 9/28/23 15:43, Vineet Gupta wrote:

RISC-V suffers from extraneous sign extensions, despite the ABI
guarantee that 32-bit quantities are sign-extended into 64-bit registers,
meaning incoming SI function args need not be explicitly sign-extended
(nor do SI return values, as most ALU insns implicitly sign-extend too).
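
A minimal illustration of the ABI point (assuming rv64; not the PR
testcase):

int
add1 (int x)
{
  /* Under the LP64 ABI the incoming 'x' already arrives sign-extended
     to 64 bits, and 32-bit ALU insns like addw/addiw sign-extend their
     results, so an explicit sext.w here would be redundant.  */
  return x + 1;
}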

Existing REE doesn't seem to handle this well and there are various ideas
floating around to smarten REE about it.

RISC-V also seems to correctly implement middle-end hook PROMOTE_MODE
etc.

Another approach would be to prevent EXPAND from generating the
sign_extend in the first place which this patch tries to do.

The hunk being removed was introduced way back in 1994 as
5069803972 ("expand_expr, case CONVERT_EXPR .. clear the promotion flag")

This survived full testsuite run for RISC-V rv64gc with surprisingly no
fallouts: test results before/after are exactly same.

|                                | # of unexpected cases / # of unique unexpected cases |
|                                |   gcc    |  g++  | gfortran |
| rv64imafdc_zba_zbb_zbs_zicond/ | 264 / 87 | 5 / 2 |  72 / 12 |
| lp64d/medlow                   |          |       |          |

Granted for something so old to have survived, there must be a valid
reason. Unfortunately the original change didn't have additional
commentary or a test case. That is not to say it can't/won't possibly
break things on other arches/ABIs, hence the RFC for someone to scream
that this is just bonkers, don't do this :-)

I've explicitly CC'ed Jakub and Roger who have last touched subreg
promoted notes in expr.cc for insight and/or screaming ;-)

Thanks to Robin for narrowing this down in an amazing debugging session
@ GNU Cauldron.

```
foo2:
sext.w  a6,a1 <-- this goes away
beq a1,zero,.L4
li  a5,0
li  a0,0
.L3:
addwa4,a2,a5
addwa5,a3,a5
addwa0,a4,a0
bltua5,a6,.L3
ret
.L4:
li  a0,0
ret
```

Signed-off-by: Vineet Gupta 
Co-developed-by: Robin Dapp 
---
  gcc/expr.cc   |  7 ---
  gcc/testsuite/gcc.target/riscv/pr111466.c | 15 +++
  2 files changed, 15 insertions(+), 7 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/pr111466.c
I created a ChangeLog and pushed this after a final bootstrap/comparison 
run on x86_64.


As I've noted before, this has been running across the various targets 
in my tester for quite a while with no issues.  Additionally Robin and 
myself have dug into various paths through expr_expr_real_2 and we're 
reasonably confident this is safe (about as much as we can be given the 
lack of information about the original patch).


My strong suspicion is that Michael M. made this code obsolete when he 
last revamped the gimple/ssa->RTL expansion path.


Thanks for your patience Vineet.  It's been a long road.


Jivan is diving into Joern's work.  It shows significant promise, though 
we are seeing some very weird behavior on perlbench.


jeff




Re: [PATCH 2/2] RISC-V: Add assert of the number of vmerge in autovec cond testcases

2023-10-16 Thread Lehua Ding

Hi Jeff,

Can you replace riscv_vector with riscv_v?  That way this will still 
work after Joern commits his change to standardize on the riscv_v target 
selector.


OK with that change, no need to wait for a review on V2, just go ahead 
and blast it in.


No problem, I'll tweak it later and submit it. Thanks.

--
Best,
Lehua (RiVAI)
lehua.d...@rivai.ai



Re: [PATCH] RISC-V: Refactor and cleanup vsetvl pass

2023-10-16 Thread Lehua Ding

OK, I'll split it as Juzhe suggested. Thanks.

On 2023/10/17 6:24, 钟居哲 wrote:

Yeah.  The refactoring and renaming make the patch diff messy and not
easy to read.

So I suggest splitting this patch into the following sub-patches (each
sub-patch need not be compilable on its own):


1. Refactor and clean up the data structure layout
(the difference should be removing the original
avl_info/vl_vtype/vector_insn_info and adding the new vsetvl_info).


2. Refactor the compatible/fusion/available rules: the difference should
be mostly in riscv-vector.def.


3. Refactor and simplify the local analysis (phases 1 and 2) into a
single phase, since two rounds of local analysis (backward and forward)
are redundant.


4. Introduce a new LCM helper function (compute_reaching_defintion).

5. Split earliest fusion into 2 phases, which makes the code easier to
maintain and read.


6. Remove all post optimizations

7. Adapt and robustify testcases.


juzhe.zh...@rivai.ai

*From:* Kito Cheng
*Date:* 2023-10-16 23:38
*To:* Lehua Ding
*CC:* GCC Patches; 钟居哲; Robin Dapp; Palmer Dabbelt; Jeff Law
*Subject:* Re: [PATCH] RISC-V: Refactor and cleanup vsetvl pass
It's impossible to review, plz split into multiple small patches if
possible...

Lehua Ding <lehua.d...@rivai.ai> wrote on Mon, 2023-10-16 at 07:54:


This patch refactors and cleans up the vsetvl pass in order to make the
code easier to modify and understand. This patch does several things:

1. Introduce a virtual CFG for vsetvl infos; Phases 1, 2 and 3 only
   maintain and modify this virtual CFG, and Phase 4 performs insertion,
   modification and deletion of vsetvl insns based on it. A basic block
   in the virtual CFG is called vsetvl_block_info and the vsetvl
   information inside it is called vsetvl_info.
2. Combine Phases 1 and 2 into a single Phase 1 and unify the demand
   system; this phase only fuses local vsetvl info in the forward
   direction.
3. Refactor Phase 3: change the logic for deciding whether to lift a
   vsetvl info to a predecessor basic block to a more unified method:
   there must be a vsetvl info among the reaching vsetvl definitions
   that is compatible with it.
4. Place all modifications of the RTL in Phases 4 and 5. Phase 4 is
   responsible for inserting, modifying and deleting vsetvl
   instructions based on the fully optimized vsetvl infos. Phase 5
   removes the avl operand from RVV instructions and removes the unused
   dest operand register from the vsetvl insns.

These modifications resulted in some testcases needing to be updated.
The reasons for updating are summarized below:

1. more optimized
   vlmax_back_prop-25.c/vlmax_back_prop-26.c/vlmax_conflict-3.c/
   vsetvl-13.c/vsetvl-23.c/
   avl_single-21.c/avl_single-23.c/avl_single-67.c/avl_single-68.c/
   avl_single-71.c/avl_single-89.c/avl_single-93.c/avl_single-95.c/
   avl_single-96.c
2. less unnecessary fusion
   avl_single-46.c/imm_bb_prop-1.c/pr109743-2.c/pr109773-1.c
3. local fuse direction (backward -> forward)
   scalar_move-1.c/
4. add some bugfix testcases
   pr111037-3.c/pr111037-4.c
   avl_single-89.c

         PR target/111037
         PR target/111234
         PR target/111725

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (bitmap_union_of_preds_with_entry):
	New helper function.
	(debug): Removed.
	(compute_reaching_defintion): Compute reaching definition data.
	(enum vsetvl_type): Unchanged.
	(vlmax_avl_p): Unchanged.
	(enum emit_type): Unchanged.
	(vlmul_to_str): Unchanged.
	(vlmax_avl_insn_p): Removed.
	(policy_to_str): Unchanged.
	(loop_basic_block_p): Removed.
	(valid_sew_p): Removed.
	(vsetvl_insn_p): Unchanged.
	(vsetvl_vtype_change_only_p): Removed.
	(after_or_same_p): Removed.
	(before_p): Removed.
	(anticipatable_occurrence_p): Removed.
	(available_occurrence_p): Removed.
	(insn_should_be_added_p): Unchanged.
	(get_all_sets): Unchanged.
	(get_same_bb_set): Unchanged.
	(gen_vsetvl_pat): Unchanged.

Re: [PATCH-2, rs6000] Enable vector mode for memory equality compare [PR111449]

2023-10-16 Thread Kewen.Lin
Hi,

on 2023/10/10 16:18, HAO CHEN GUI wrote:
> Hi David,
> 
>   Thanks for your review comments.
> 
> 在 2023/10/9 23:42, David Edelsohn 写道:
>>  #define MOVE_MAX (! TARGET_POWERPC64 ? 4 : 8)
>>  #define MAX_MOVE_MAX 8
>> +#define MOVE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>> +#define COMPARE_MAX_PIECES (!TARGET_POWERPC64 ? 4 : 16)
>>
>>
>> How are the definitions of MOVE_MAX_PIECES and COMPARE_MAX_PIECES 
>> determined?  The email does not provide any explanation for the 
>> implementation.  The rest of the patch is related to vector support, but 
>> vector support is not dependent on TARGET_POWERPC64.
> 
> By default, MOVE_MAX_PIECES and COMPARE_MAX_PIECES are set to the same
> value as MOVE_MAX. Move and compare instructions are required by
> compare_by_pieces, and those macros are set to 16 bytes when the vector
> mode (V16QImode) is supported. The problem is that rs6000 doesn't
> support TImode for "-m32". We discussed it in issue 1307. TImode will
> be used for moves when MOVE_MAX_PIECES is set to 16, but TImode isn't
> supported with "-m32", which might cause an ICE.

I think David raised a good question; it sounds to me that the current
handling simply assumes that if MOVE_MAX_PIECES is set to 16, the
required operations for this optimization on TImode are always available,
but unfortunately on rs6000 that assumption doesn't hold, so could we
teach the generic code instead?
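
For reference, the kind of memory equality compare this is about is
roughly the following (an illustrative sketch, not from the PR):

/* With COMPARE_MAX_PIECES == 16, compare_by_pieces can expand this
   with a single V16QImode load/compare per operand instead of a
   libcall.  */
int
eq16 (const char *a, const char *b)
{
  return __builtin_memcmp (a, b, 16) == 0;
}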

BR,
Kewen


Re: [PATCH v2] rs6000: Change bitwise xor to an equality operator [PR106907]

2023-10-16 Thread Kewen.Lin
Hi,

on 2023/10/11 19:50, jeevitha wrote:
> Hi All,
> 
> The following patch has been bootstrapped and regtested on powerpc64le-linux.
> 
> PR106907 has a few warnings spotted by cppcheck. These warnings are
> about operator precedence needing clarification. Instead of using xor,
> the check has been changed to an equality comparison, which achieves the
> same result. Additionally, comment indentation has been fixed.
> 

Ok for trunk, thanks!

BR,
Kewen

> 2023-10-11  Jeevitha Palanisamy  
> 
> gcc/
>   PR target/106907
>   * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Change
>   bitwise xor to an equality and fix comment indentation.
> 
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 2828f01413c..00191f8656b 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -23624,10 +23624,10 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
> && GET_MODE (XEXP (op0, 0)) != V8HImode)))
>   continue;
>  
> -  /* For little-endian, the two input operands must be swapped
> - (or swapped back) to ensure proper right-to-left numbering
> - from 0 to 2N-1.  */
> -   if (swapped ^ !BYTES_BIG_ENDIAN
> +   /* For little-endian, the two input operands must be swapped
> +  (or swapped back) to ensure proper right-to-left numbering
> +  from 0 to 2N-1.  */
> +   if (swapped == BYTES_BIG_ENDIAN
> && icode != CODE_FOR_vsx_xxpermdi_v16qi)
>   std::swap (op0, op1);
> if (imode != V16QImode)
> 
>


Re: [pushed] [PATCH v2] LoongArch: Delete macro definition ASM_OUTPUT_ALIGN_WITH_NOP.

2023-10-16 Thread chenglulu

Pushed to r14-4674.

On 2023/10/12 at 15:00, Lulu Cheng wrote:

There are two reasons for removing this macro definition:
1. The assembler's default is to use the nop instruction for filling.
2. For the assembly directive .align [abs-expr[, abs-expr[, abs-expr]]],
the third expression is the maximum number of bytes that should be
skipped by this alignment directive. Specifying it can therefore defeat
the requested alignment and hurt performance.
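
For example, with LOG == 4 the macro removed below made the compiler emit

	.align	4,54525952,4

i.e. align to 16 bytes, padding with the nop encoding 54525952
(andi $r0,$r0,0), but skip at most 4 bytes, so the requested alignment
may silently not happen when more padding would be needed.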

This modification relies on binutils commit
1fb3cdd87ec61715a5684925fb6d6a6cf53bb97c.
(Since the assembler adds nops based on the .align information when
relaxing, a conditional branch could go out of range during assembly.
That binutils commit solves this problem.)

gcc/ChangeLog:

* config/loongarch/loongarch.h (ASM_OUTPUT_ALIGN_WITH_NOP):
Delete.

Co-authored-by: Chenghua Xu 
---
v1 -> v2:
Modify description information

---
  gcc/config/loongarch/loongarch.h | 5 -
  1 file changed, 5 deletions(-)

diff --git a/gcc/config/loongarch/loongarch.h b/gcc/config/loongarch/loongarch.h
index d357e32e414..f700b3cb939 100644
--- a/gcc/config/loongarch/loongarch.h
+++ b/gcc/config/loongarch/loongarch.h
@@ -1061,11 +1061,6 @@ typedef struct {
  
  #define ASM_OUTPUT_ALIGN(STREAM, LOG) fprintf (STREAM, "\t.align\t%d\n", (LOG))
  
-/* "nop" instruction 54525952 (andi $r0,$r0,0) is

-   used for padding.  */
-#define ASM_OUTPUT_ALIGN_WITH_NOP(STREAM, LOG) \
-  fprintf (STREAM, "\t.align\t%d,54525952,4\n", (LOG))
-
  /* This is how to output an assembler line to advance the location
 counter by SIZE bytes.  */
  





Re: [pushed][PATCH v1] LoongArch: Fix vec_initv32qiv16qi template to avoid ICE.

2023-10-16 Thread chenglulu

Pushed to r14-4675.

On 2023/10/11 at 16:41, Chenghui Pan wrote:

The following test code triggers an unrecognized-insn ICE on the
LoongArch target with "-O3 -mlasx":

void
foo (unsigned char *dst, unsigned char *src)
{
  for (int y = 0; y < 16; y++)
    {
      for (int x = 0; x < 16; x++)
	dst[x] = src[x] + 1;
      dst += 32;
      src += 32;
    }
}

ICE info:
./test.c: In function ‘foo’:
./test.c:8:1: error: unrecognizable insn:
 8 | }
   | ^
(insn 15 14 16 4 (set (reg:V32QI 185 [ vect__24.7 ])
 (vec_concat:V32QI (reg:V16QI 186)
 (const_vector:V16QI [
 (const_int 0 [0]) repeated x16
 ]))) "./test.c":4:19 -1
  (nil))
during RTL pass: vregs
./test.c:8:1: internal compiler error: in extract_insn, at recog.cc:2791
0x12028023b _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
 /home/panchenghui/upstream/gcc/gcc/rtl-error.cc:108
0x12028026f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
 /home/panchenghui/upstream/gcc/gcc/rtl-error.cc:116
0x120a03c5b extract_insn(rtx_insn*)
 /home/panchenghui/upstream/gcc/gcc/recog.cc:2791
0x12067ff73 instantiate_virtual_regs_in_insn
 /home/panchenghui/upstream/gcc/gcc/function.cc:1610
0x12067ff73 instantiate_virtual_regs
 /home/panchenghui/upstream/gcc/gcc/function.cc:1983
0x12067ff73 execute
 /home/panchenghui/upstream/gcc/gcc/function.cc:2030

This RTL is generated inside the loongarch_expand_vector_group_init
function (related to the vec_initv32qiv16qi template). The original
implementation doesn't ensure that all vec_concat arguments are
registers. This patch adds force_reg () to the vec_concat argument
generation.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_expand_vector_group_init):
Fix the implementation related to the vec_initv32qiv16qi template to
avoid an ICE.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vector/lasx/lasx-vec-init-1.c: New test.
---
  gcc/config/loongarch/loongarch.cc  |  3 ++-
  .../loongarch/vector/lasx/lasx-vec-init-1.c| 14 ++
  2 files changed, 16 insertions(+), 1 deletion(-)
  create mode 100644 gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vec-init-1.c

diff --git a/gcc/config/loongarch/loongarch.cc b/gcc/config/loongarch/loongarch.cc
index 3420e002efc..14dd0db1674 100644
--- a/gcc/config/loongarch/loongarch.cc
+++ b/gcc/config/loongarch/loongarch.cc
@@ -10206,7 +10206,8 @@ loongarch_gen_const_int_vector_shuffle (machine_mode mode, int val)
  void
  loongarch_expand_vector_group_init (rtx target, rtx vals)
  {
-  rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
+  rtx ops[2] = { force_reg (E_V16QImode, XVECEXP (vals, 0, 0)),
+		 force_reg (E_V16QImode, XVECEXP (vals, 0, 1)) };
emit_insn (gen_rtx_SET (target, gen_rtx_VEC_CONCAT (E_V32QImode, ops[0],
  ops[1])));
  }
diff --git a/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vec-init-1.c b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vec-init-1.c
new file mode 100644
index 000..28be329822e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/vector/lasx/lasx-vec-init-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+void
+foo (unsigned char *dst, unsigned char *src)
+{
+  for (int y = 0; y < 16; y++)
+    {
+      for (int x = 0; x < 16; x++)
+	dst[x] = src[x] + 1;
+      dst += 32;
+      src += 32;
+    }
+}





Re: [PATCH] Add files to discourage submissions of PRs to the GitHub mirror.

2023-10-16 Thread Eric Gallager
On Mon, Oct 16, 2023 at 7:58 PM Andrew Pinski  wrote:
>
>
>
> On Mon, Oct 16, 2023, 16:39 Eric Gallager  wrote:
>>
>> Currently there is an unofficial mirror of GCC on GitHub that people
>> sometimes submit pull requests to:
>> https://github.com/gcc-mirror/gcc
>> However, this is not the proper way to contribute to GCC, so that means
>> that someone (usually Jonathan Wakely) has to go through the PRs and
>> manually tell people that they're sending their PRs to the wrong place.
>> One thing that would help mitigate this problem would be files in a
>> special .github directory that GitHub would automatically open when
>> contributors attempt to open a PR, that would then tell them the proper
>> way to contribute instead. This patch attempts to add two such files.
>> They are written in Markdown, which I'm realizing might require some
>> special handling in this repository, since the ".md" extension is also
>> used for GCC's "Machine Description" files here, but I'm not quite sure
>> how to go about handling that. Also note that I adapted these files from
>> equivalent files in the git repository for Git itself:
>> https://github.com/git/git/blob/master/.github/CONTRIBUTING.md
>> https://github.com/git/git/blob/master/.github/PULL_REQUEST_TEMPLATE.md
>> What do people think?
>
>
>
> I think this is a great idea. Is there a similar one for opening issues too?
>

One for issues isn't necessary, because the GitHub mirror has never
had issues enabled in the first place, so people already can't open
issues there.

> Thanks,
> Andrew
>
>
>> ChangeLog:
>>
>> * .github/CONTRIBUTING.md: New file.
>> * .github/PULL_REQUEST_TEMPLATE.md: New file.
>> ---
>>  .github/CONTRIBUTING.md  | 18 ++
>>  .github/PULL_REQUEST_TEMPLATE.md |  5 +
>>  2 files changed, 23 insertions(+)
>>  create mode 100644 .github/CONTRIBUTING.md
>>  create mode 100644 .github/PULL_REQUEST_TEMPLATE.md
>>
>> diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
>> new file mode 100644
>> index ..4f7b3abca5f4
>> --- /dev/null
>> +++ b/.github/CONTRIBUTING.md
>> @@ -0,0 +1,18 @@
>> +## Contributing to GCC
>> +
>> +Thanks for taking the time to contribute to GCC! Please be advised that if 
>> you are
>> +viewing this on `github.com`, that the mirror there is unofficial and 
>> unmonitored.
>> +The GCC community does not use `github.com` for their contributions. 
>> Instead, we use
>> +a mailing list (`gcc-patches@gcc.gnu.org`) for code submissions, code
>> +reviews, and bug reports.
>> +
>> +Perhaps one day it will be possible to use 
>> [GitGitGadget](https://gitgitgadget.github.io/) to
>> +conveniently send Pull Requests commits to GCC's mailing list, the way that 
>> the Git project currently allows it to be used to send PRs to their mailing 
>> list, but until that day arrives, please send your patches to the mailing 
>> list manually.
>> +
>> +Please read ["Contributing to GCC"](https://gcc.gnu.org/contribute.html) on 
>> the main GCC website
>> +to learn how the GCC project is managed, and how you can work with it.
>> +In addition, we highly recommend you to read [our guidelines for read-write 
>> Git access](https://gcc.gnu.org/gitwrite.html).
>> +
>> +Or, you can follow the ["Contributing to GCC in 10 easy 
>> steps"](https://gcc.gnu.org/wiki/GettingStarted#Basics:_Contributing_to_GCC_in_10_easy_steps)
>>  section of the ["Getting Started" 
>> page](https://gcc.gnu.org/wiki/GettingStarted) on [the 
>> wiki](https://gcc.gnu.org/wiki) for another example of the contribution 
>> process.
>> +
>> +Your friendly GCC community!
>> diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
>> new file mode 100644
>> index ..6417392c8cf3
>> --- /dev/null
>> +++ b/.github/PULL_REQUEST_TEMPLATE.md
>> @@ -0,0 +1,5 @@
>> +Thanks for taking the time to contribute to GCC! Please be advised that if 
>> you are
>> +viewing this on `github.com`, that the mirror there is unofficial and 
>> unmonitored.
>> +The GCC community does not use `github.com` for their contributions. 
>> Instead, we use
>> +a mailing list (`gcc-patches@gcc.gnu.org`) for code submissions, code 
>> reviews, and
>> +bug reports. Please send patches there instead.


[PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-16 Thread juzhe.zh...@rivai.ai
Hi, Richard.

>> Does IFN_COND_LEN make conceptual sense on RVV?  If so, would defining
>> it solve some of these problems?
Yes, IFN_COND_LEN makes sense for RVV. We have the vmerge instruction,
which depends on VL/AVL.

I must say my internal RVV GCC has IFN_LEN_VCOND_MASK, which simplifies

COND_LEN_ADD (mask, a, 0, b, len, bias) into
LEN_VCOND_MASK (mask, a, b, len, bias).

I think upstream GCC could consider this approach.

Thanks.


juzhe.zh...@rivai.ai


[PATCH v3] c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]

2023-10-16 Thread Marek Polacek
On Sat, Oct 14, 2023 at 01:13:22AM -0400, Jason Merrill wrote:
> On 10/13/23 14:53, Marek Polacek wrote:
> > On Thu, Oct 12, 2023 at 09:41:43PM -0400, Jason Merrill wrote:
> > > On 10/12/23 17:04, Marek Polacek wrote:
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > -- >8 --
> > > > My recent patch introducing cp_fold_immediate_r caused exponential
> > > > compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
> > > > case recursively walks the arms of a COND_EXPR, but after processing
> > > > both arms it doesn't end the walk; it proceeds to walk the
> > > > sub-expressions of the outermost COND_EXPR, triggering again walking
> > > > the arms of the nested COND_EXPR, and so on.  This patch brings the
> > > > compile time down to about 0m0.033s.
> > > > 
> > > > I've added some debug prints to make sure that the rest of cp_fold_r
> > > > is still performed as before.
> > > > 
> > > >   PR c++/111660
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > 	* cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Return
> > > > 	integer_zero_node instead of break;.
> > > >   (cp_fold_immediate): Return true if cp_fold_immediate_r 
> > > > returned
> > > >   error_mark_node.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > >   * g++.dg/cpp0x/hog1.C: New test.
> > > > ---
> > > >gcc/cp/cp-gimplify.cc |  9 ++--
> > > >gcc/testsuite/g++.dg/cpp0x/hog1.C | 77 
> > > > +++
> > > >2 files changed, 82 insertions(+), 4 deletions(-)
> > > >create mode 100644 gcc/testsuite/g++.dg/cpp0x/hog1.C
> > > > 
> > > > diff --git a/gcc/cp/cp-gimplify.cc b/gcc/cp/cp-gimplify.cc
> > > > index bdf6e5f98ff..ca622ca169a 100644
> > > > --- a/gcc/cp/cp-gimplify.cc
> > > > +++ b/gcc/cp/cp-gimplify.cc
> > > > @@ -1063,16 +1063,16 @@ cp_fold_immediate_r (tree *stmt_p, int *walk_subtrees, void *data_)
> > > > 	break;
> > > >       if (TREE_OPERAND (stmt, 1)
> > > > 	  && cp_walk_tree (&TREE_OPERAND (stmt, 1), cp_fold_immediate_r, data,
> > > > -			   nullptr))
> > > > +			   nullptr) == error_mark_node)
> > > > 	return error_mark_node;
> > > >       if (TREE_OPERAND (stmt, 2)
> > > > 	  && cp_walk_tree (&TREE_OPERAND (stmt, 2), cp_fold_immediate_r, data,
> > > > -			   nullptr))
> > > > +			   nullptr) == error_mark_node)
> > > > 	return error_mark_node;
> > > >       /* We're done here.  Don't clear *walk_subtrees here though: we're called
> > > > 	 from cp_fold_r and we must let it recurse on the expression with
> > > > 	 cp_fold.  */
> > > > -  break;
> > > > +  return integer_zero_node;
> > > 
> > > I'm concerned this will end up missing something like
> > > 
> > > 1 ? 1 : ((1 ? 1 : 1), immediate())
> > > 
> > > as the integer_zero_node from the inner ?: will prevent walk_tree from
> > > looking any farther.
> > 
> > You are right.  The line above works as expected, but
> > 
> >1 ? 1 : ((1 ? 1 : id (42)), id (i));
> > 
> > shows the problem (when the expression isn't used as an initializer).
> > 
> > > Maybe we want to handle COND_EXPR in cp_fold_r instead of here?
> > 
> > I hope this version is better.
> > 
> > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > 
> > -- >8 --
> > My recent patch introducing cp_fold_immediate_r caused exponential
> > compile time with nested COND_EXPRs.  The problem is that the COND_EXPR
> > case recursively walks the arms of a COND_EXPR, but after processing
> > both arms it doesn't end the walk; it proceeds to walk the
> > sub-expressions of the outermost COND_EXPR, triggering again walking
> > the arms of the nested COND_EXPR, and so on.  This patch brings the
> > compile time down to about 0m0.033s.
> 
> Is this number still accurate for this version?

It is.  I ran time(1) a few more times and the results were 0m0.033s - 0m0.035s.
That said, ... 

> This change seems algorithmically better than the current code, but still
> problematic: if we have nested COND_EXPR A/B/C/D/E, it looks like we will
> end up cp_fold_immediate_r walking the arms of E five times, once for each
> COND_EXPR.

...this is accurate.  I should have addressed the redundant folding in v2
even though the compilation is pretty much immediate.
 
> What I was thinking by handling COND_EXPR in cp_fold_r was to cp_fold_r walk
> its subtrees (or cp_fold_immediate_r if it's clear from op0 that the branch
> isn't taken) so we can clear *walk_subtrees and we don't fold_imm walk a
> node more than once.

I agree I should do better here.  How's this, then?  I've added
debug_generic_expr to cp_fold_immediate_r to see if it gets the same
expr multiple times and it doesn't seem to.
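
For reference, a minimal sketch of the pathological shape (simplified;
the committed hog1.C nests much deeper):

int id (int i);

int
f (int i)
{
  /* Before the fix, each extra nesting level roughly doubled the number
     of times cp_fold_immediate_r walked the innermost arms.  */
  return 1 ? 1 : (1 ? 1 : (1 ? 1 : id (i)));
}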

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

-- >8 --
My recent patch introducing cp_

Re: [PATCH] Fortran: out of bounds access with nested implied-do IO [PR111837]

2023-10-16 Thread Jerry D

On 10/16/23 12:11 PM, Harald Anlauf wrote:

Dear All,

the attached patch fixes a dependency check in frontend optimization
for nested implied-do IO.  The problem appeared only for >= 3 loops,
as the check considered dependencies to be only of band form instead
of triangular form.

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this fixes a regression since 8-release, I plan to backport
to all active branches.

Thanks,
Harald



OK for Mainline and backport

Thanks Harald

Jerry


[committed] d: Forbid taking the address of an intrinsic with no implementation

2023-10-16 Thread Iain Buclaw
Hi,

This code fails to link:

import core.math;
real function(real) fn = &sin;

However, when called directly, the D intrinsic `sin()' is expanded by
the front-end into the GCC built-in `__builtin_sin()'.  This has been
fixed to now also expand the function when a reference is taken.

As there are D intrinsics and GCC built-ins that don't have a fallback
implementation, an error is now raised when taking the address is not
possible.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, and
committed to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* d-tree.h (intrinsic_code): Update define for DEF_D_INTRINSIC.
(maybe_reject_intrinsic): New prototype.
* expr.cc (ExprVisitor::visit (SymOffExp *)): Call
maybe_reject_intrinsic.
* intrinsics.cc (intrinsic_decl): Add fallback field.
(intrinsic_decls): Update define for DEF_D_INTRINSIC.
(maybe_reject_intrinsic): New function.
* intrinsics.def (DEF_D_LIB_BUILTIN): Update.
(DEF_CTFE_BUILTIN): Update.
(INTRINSIC_BSF): Declare as library builtin.
(INTRINSIC_BSR): Likewise.
(INTRINSIC_BT): Likewise.
(INTRINSIC_BSF64): Likewise.
(INTRINSIC_BSR64): Likewise.
(INTRINSIC_BT64): Likewise.
(INTRINSIC_POPCNT32): Likewise.
(INTRINSIC_POPCNT64): Likewise.
(INTRINSIC_ROL): Likewise.
(INTRINSIC_ROL_TIARG): Likewise.
(INTRINSIC_ROR): Likewise.
(INTRINSIC_ROR_TIARG): Likewise.
(INTRINSIC_ADDS): Likewise.
(INTRINSIC_ADDSL): Likewise.
(INTRINSIC_ADDU): Likewise.
(INTRINSIC_ADDUL): Likewise.
(INTRINSIC_SUBS): Likewise.
(INTRINSIC_SUBSL): Likewise.
(INTRINSIC_SUBU): Likewise.
(INTRINSIC_SUBUL): Likewise.
(INTRINSIC_MULS): Likewise.
(INTRINSIC_MULSL): Likewise.
(INTRINSIC_MULU): Likewise.
(INTRINSIC_MULUI): Likewise.
(INTRINSIC_MULUL): Likewise.
(INTRINSIC_NEGS): Likewise.
(INTRINSIC_NEGSL): Likewise.
(INTRINSIC_TOPRECF): Likewise.
(INTRINSIC_TOPREC): Likewise.
(INTRINSIC_TOPRECL): Likewise.

gcc/testsuite/ChangeLog:

* gdc.dg/builtins_reject.d: New test.
* gdc.dg/intrinsics_reject.d: New test.
---
 gcc/d/d-tree.h   |   3 +-
 gcc/d/expr.cc|   3 +
 gcc/d/intrinsics.cc  |  47 -
 gcc/d/intrinsics.def | 128 ---
 gcc/testsuite/gdc.dg/builtins_reject.d   |  17 +++
 gcc/testsuite/gdc.dg/intrinsics_reject.d |  87 +++
 6 files changed, 222 insertions(+), 63 deletions(-)
 create mode 100644 gcc/testsuite/gdc.dg/builtins_reject.d
 create mode 100644 gcc/testsuite/gdc.dg/intrinsics_reject.d

diff --git a/gcc/d/d-tree.h b/gcc/d/d-tree.h
index b64a6fb46f9..66c2f2465c8 100644
--- a/gcc/d/d-tree.h
+++ b/gcc/d/d-tree.h
@@ -94,7 +94,7 @@ enum level_kind
 
 enum intrinsic_code
 {
-#define DEF_D_INTRINSIC(CODE, B, N, M, D, C) CODE,
+#define DEF_D_INTRINSIC(CODE, B, N, M, D, C, F) CODE,
 
 #include "intrinsics.def"
 
@@ -668,6 +668,7 @@ extern tree build_import_decl (Dsymbol *);
 /* In intrinsics.cc.  */
 extern void maybe_set_intrinsic (FuncDeclaration *);
 extern tree maybe_expand_intrinsic (tree);
+extern tree maybe_reject_intrinsic (tree);
 
 /* In modules.cc.  */
 extern void build_module_tree (Module *);
diff --git a/gcc/d/expr.cc b/gcc/d/expr.cc
index cc4aa03dfb9..52243e61899 100644
--- a/gcc/d/expr.cc
+++ b/gcc/d/expr.cc
@@ -2050,6 +2050,9 @@ public:
 tree result = get_decl_tree (e->var);
 TREE_USED (result) = 1;
 
+if (e->var->isFuncDeclaration ())
+  result = maybe_reject_intrinsic (result);
+
 if (declaration_reference_p (e->var))
   gcc_assert (POINTER_TYPE_P (TREE_TYPE (result)));
 else
diff --git a/gcc/d/intrinsics.cc b/gcc/d/intrinsics.cc
index 583d5a9dea6..1b03e9edbdb 100644
--- a/gcc/d/intrinsics.cc
+++ b/gcc/d/intrinsics.cc
@@ -60,12 +60,15 @@ struct intrinsic_decl
 
   /* True if the intrinsic is only handled in CTFE.  */
   bool ctfeonly;
+
+  /* True if the intrinsic has a library implementation.  */
+  bool fallback;
 };
 
 static const intrinsic_decl intrinsic_decls[] =
 {
-#define DEF_D_INTRINSIC(CODE, BUILTIN, NAME, MODULE, DECO, CTFE) \
-{ CODE, BUILTIN, NAME, MODULE, DECO, CTFE },
+#define DEF_D_INTRINSIC(CODE, BUILTIN, NAME, MODULE, DECO, CTFE, FALLBACK) \
+{ CODE, BUILTIN, NAME, MODULE, DECO, CTFE, FALLBACK },
 
 #include "intrinsics.def"
 
@@ -1436,3 +1439,43 @@ maybe_expand_intrinsic (tree callexp)
   gcc_unreachable ();
 }
 }
+
+/* If FNDECL is an intrinsic, return the FUNCTION_DECL that has a library
+   fallback implementation of it, otherwise raise an error.  */
+
+tree
+maybe_reject_intrinsic (tree fndecl)
+{
+  gcc_assert (TREE_CODE (fndecl) == FUNCTION_DECL);
+
+  intrinsic_code intrinsic = DECL_INTRINSIC_CODE (fndecl);
+
+  if (intrinsic == INTRINSIC_NONE)
+{

Re: [PATCH] Add files to discourage submissions of PRs to the GitHub mirror.

2023-10-16 Thread Andrew Pinski
On Mon, Oct 16, 2023, 16:39 Eric Gallager  wrote:

> Currently there is an unofficial mirror of GCC on GitHub that people
> sometimes submit pull requests to:
> https://github.com/gcc-mirror/gcc
> However, this is not the proper way to contribute to GCC, so that means
> that someone (usually Jonathan Wakely) has to go through the PRs and
> manually tell people that they're sending their PRs to the wrong place.
> One thing that would help mitigate this problem would be files in a
> special .github directory that GitHub would automatically open when
> contributors attempt to open a PR, that would then tell them the proper
> way to contribute instead. This patch attempts to add two such files.
> They are written in Markdown, which I'm realizing might require some
> special handling in this repository, since the ".md" extension is also
> used for GCC's "Machine Description" files here, but I'm not quite sure
> how to go about handling that. Also note that I adapted these files from
> equivalent files in the git repository for Git itself:
> https://github.com/git/git/blob/master/.github/CONTRIBUTING.md
> https://github.com/git/git/blob/master/.github/PULL_REQUEST_TEMPLATE.md
> What do people think?
>


I think this is a great idea. Is there a similar one for opening issues too?

Thanks,
Andrew


ChangeLog:
>
> * .github/CONTRIBUTING.md: New file.
> * .github/PULL_REQUEST_TEMPLATE.md: New file.
> ---
>  .github/CONTRIBUTING.md  | 18 ++
>  .github/PULL_REQUEST_TEMPLATE.md |  5 +
>  2 files changed, 23 insertions(+)
>  create mode 100644 .github/CONTRIBUTING.md
>  create mode 100644 .github/PULL_REQUEST_TEMPLATE.md
>
> diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
> new file mode 100644
> index ..4f7b3abca5f4
> --- /dev/null
> +++ b/.github/CONTRIBUTING.md
> @@ -0,0 +1,18 @@
> +## Contributing to GCC
> +
> +Thanks for taking the time to contribute to GCC! Please be advised that if you are
> +viewing this on `github.com`, the mirror there is unofficial and unmonitored.
> +The GCC community does not use `github.com` for their contributions. Instead, we use
> +a mailing list (`gcc-patches@gcc.gnu.org`) for code submissions, code
> +reviews, and bug reports.
> +
> +Perhaps one day it will be possible to use [GitGitGadget](https://gitgitgadget.github.io/) to
> +conveniently send Pull Request commits to GCC's mailing list, the way that the Git project currently allows it to be used to send PRs to their mailing list, but until that day arrives, please send your patches to the mailing list manually.
> +
> +Please read ["Contributing to GCC"](https://gcc.gnu.org/contribute.html) on the main GCC website
> +to learn how the GCC project is managed, and how you can work with it.
> +In addition, we highly recommend you read [our guidelines for read-write Git access](https://gcc.gnu.org/gitwrite.html).
> +
> +Or, you can follow the ["Contributing to GCC in 10 easy steps"](https://gcc.gnu.org/wiki/GettingStarted#Basics:_Contributing_to_GCC_in_10_easy_steps) section of the ["Getting Started" page](https://gcc.gnu.org/wiki/GettingStarted) on [the wiki](https://gcc.gnu.org/wiki) for another example of the contribution process.
> +
> +Your friendly GCC community!
> diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
> new file mode 100644
> index ..6417392c8cf3
> --- /dev/null
> +++ b/.github/PULL_REQUEST_TEMPLATE.md
> @@ -0,0 +1,5 @@
> +Thanks for taking the time to contribute to GCC! Please be advised that if you are
> +viewing this on `github.com`, the mirror there is unofficial and unmonitored.
> +The GCC community does not use `github.com` for their contributions. Instead, we use
> +a mailing list (`gcc-patches@gcc.gnu.org`) for code submissions, code reviews, and
> +bug reports. Please send patches there instead.
>


[PATCH] Add files to discourage submissions of PRs to the GitHub mirror.

2023-10-16 Thread Eric Gallager
Currently there is an unofficial mirror of GCC on GitHub that people
sometimes submit pull requests to:
https://github.com/gcc-mirror/gcc
However, this is not the proper way to contribute to GCC, so that means
that someone (usually Jonathan Wakely) has to go through the PRs and
manually tell people that they're sending their PRs to the wrong place.
One thing that would help mitigate this problem would be files in a
special .github directory that GitHub would automatically open when
contributors attempt to open a PR, that would then tell them the proper
way to contribute instead. This patch attempts to add two such files.
They are written in Markdown, which I'm realizing might require some
special handling in this repository, since the ".md" extension is also
used for GCC's "Machine Description" files here, but I'm not quite sure
how to go about handling that. Also note that I adapted these files from
equivalent files in the git repository for Git itself:
https://github.com/git/git/blob/master/.github/CONTRIBUTING.md
https://github.com/git/git/blob/master/.github/PULL_REQUEST_TEMPLATE.md
What do people think?

ChangeLog:

* .github/CONTRIBUTING.md: New file.
* .github/PULL_REQUEST_TEMPLATE.md: New file.
---
 .github/CONTRIBUTING.md  | 18 ++
 .github/PULL_REQUEST_TEMPLATE.md |  5 +
 2 files changed, 23 insertions(+)
 create mode 100644 .github/CONTRIBUTING.md
 create mode 100644 .github/PULL_REQUEST_TEMPLATE.md

diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
new file mode 100644
index ..4f7b3abca5f4
--- /dev/null
+++ b/.github/CONTRIBUTING.md
@@ -0,0 +1,18 @@
+## Contributing to GCC
+
+Thanks for taking the time to contribute to GCC! Please be advised that if you are
+viewing this on `github.com`, the mirror there is unofficial and unmonitored.
+The GCC community does not use `github.com` for their contributions. Instead, we use
+a mailing list (`gcc-patches@gcc.gnu.org`) for code submissions, code
+reviews, and bug reports.
+
+Perhaps one day it will be possible to use [GitGitGadget](https://gitgitgadget.github.io/) to
+conveniently send Pull Request commits to GCC's mailing list, the way that the Git project currently allows it to be used to send PRs to their mailing list, but until that day arrives, please send your patches to the mailing list manually.
+
+Please read ["Contributing to GCC"](https://gcc.gnu.org/contribute.html) on the main GCC website
+to learn how the GCC project is managed, and how you can work with it.
+In addition, we highly recommend you read [our guidelines for read-write Git access](https://gcc.gnu.org/gitwrite.html).
+
+Or, you can follow the ["Contributing to GCC in 10 easy steps"](https://gcc.gnu.org/wiki/GettingStarted#Basics:_Contributing_to_GCC_in_10_easy_steps) section of the ["Getting Started" page](https://gcc.gnu.org/wiki/GettingStarted) on [the wiki](https://gcc.gnu.org/wiki) for another example of the contribution process.
+
+Your friendly GCC community!
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
new file mode 100644
index ..6417392c8cf3
--- /dev/null
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,5 @@
+Thanks for taking the time to contribute to GCC! Please be advised that if you are
+viewing this on `github.com`, the mirror there is unofficial and unmonitored.
+The GCC community does not use `github.com` for their contributions. Instead, we use
+a mailing list (`gcc-patches@gcc.gnu.org`) for code submissions, code reviews, and
+bug reports. Please send patches there instead.


Re: Re: [PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread 钟居哲
>> Doesn't this match several cases more than before i.e set the range
>> start to zero fairly often?  I mean if it works fine with me and
>> the code is easier to read.
Yes.

>> Please split off the search for the non-contiguous load/stores into
>> a separate function still.  With that change it's OK from my side.
Address comment and Committed. Thanks Robin.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-17 04:22
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic 
LMUL model for non-adjacent load/store
> +   if (live_range && flow_bb_inside_loop_p (loop, e->src))
> + {
 
Doesn't this match several cases more than before i.e set the range
start to zero fairly often?  I mean if it works fine with me and
the code is easier to read.
 
Please split off the search for the non-contiguous load/stores into
a separate function still.  With that change it's OK from my side.
 
Regards
Robin
 


[PATCH V4] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong
Consider this following case:
int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
{
  sum1 += x[2*i] - a;
  sum1 += x[2*i+1] * b;
  sum2 += x[2*i] - b;
  sum2 += x[2*i+1] * a;
}
  return sum1 + sum2;
}

Before this patch:

bar:
ble a5,zero,.L5
csrr t0,vlenb
csrr a6,vlenb
slli t1,t0,3
vsetvli a5,zero,e32,m4,ta,ma
sub sp,sp,t1
vid.v v20
vmv.v.x v12,a1
vand.vi v4,v20,1
vmv.v.x v16,a2
vmseq.vi v4,v4,1
slli t3,a6,2
vsetvli zero,a5,e32,m4,ta,ma
vmv1r.v v0,v4
viota.m v8,v4
add a7,t3,sp
vsetvli a5,zero,e32,m4,ta,mu
vand.vi v28,v20,-2
vadd.vi v4,v28,1
vs4r.v v20,0(a7)        <- spill
vrgather.vv v24,v12,v8
vrgather.vv v20,v16,v8
vrgather.vv v24,v16,v8,v0.t
vrgather.vv v20,v12,v8,v0.t
vs4r.v v4,0(sp)         <- spill
slli a3,a3,1
addi t4,a6,-1
neg t1,a6
vmv4r.v v0,v20
vmv.v.i v4,0
j .L4
.L13:
vsetvli a5,zero,e32,m4,ta,ma
.L4:
mv a7,a3
mv a4,a3
bleu a3,a6,.L3
csrr a4,vlenb
.L3:
vmv.v.x v8,t4
vl4re32.v v12,0(sp)     <- spill
vand.vv v20,v28,v8
vand.vv v8,v12,v8
vsetvli zero,a4,e32,m4,ta,ma
vle32.v v16,0(a0)
vsetvli a5,zero,e32,m4,ta,ma
add a3,a3,t1
vrgather.vv v12,v16,v20
add a0,a0,t3
vrgather.vv v20,v16,v8
vsub.vv v12,v12,v0
vsetvli zero,a4,e32,m4,tu,ma
vadd.vv v4,v4,v12
vmacc.vv v4,v24,v20
bgtu a7,a6,.L13
csrr a1,vlenb
slli a1,a1,2
add a1,a1,sp
li a4,-1
csrr t0,vlenb
vsetvli a5,zero,e32,m4,ta,ma
vl4re32.v v12,0(a1)     <- spill
vmv.v.i v8,0
vmul.vx v0,v12,a4
li a2,0
slli t1,t0,3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmseq.vv v0,v0,v8
vand.vi v12,v12,1
vmerge.vvm v16,v8,v4,v0
vmseq.vv v12,v12,v8
vmv.s.x v1,a2
vmv1r.v v0,v12
vredsum.vs v16,v16,v1
vmerge.vvm v8,v8,v4,v0
vmv.x.s a0,v16
vredsum.vs v8,v8,v1
vmv.x.s a5,v8
add sp,sp,t1
addw a0,a0,a5
jr ra
.L5:
li a0,0
ret

We can see there are multiple horrible register spills.
The root cause of this issue is that for a scalar IR load:

_5 = *_4;

we didn't check whether it is a contiguous load/store or a gather/scatter
load/store,

since it will be translated into either:

   1. MASK_LEN_GATHER_LOAD (..., perm indices).
   2. Contiguous load/store + VEC_PERM (..., perm indices)

It's obvious that in either situation we end up consuming one extra
vector register group (for the permute indices)
that we didn't count before.

So in this case we pick LMUL = 4, which is an incorrect choice for the dynamic
LMUL cost model.

The key of this patch is:

  if ((type == load_vec_info_type || type == store_vec_info_type)
  && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
{
   ...
}

Add one more register consumption if it is not an adjacent load/store.
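
For illustration, a minimal C example of the distinction this check relies on; the classifications in the comments are assumptions about how adjacent_dr_p would treat these references, not verified vectorizer output:

/* Hypothetical illustration of adjacent vs. non-adjacent data references.  */
int
sum_adjacent (int *x, int n)
{
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += x[i];        /* consecutive accesses: adjacent, plain unit-stride load */
  return s;
}

int
sum_strided (int *x, int n)
{
  int s = 0;
  for (int i = 0; i < n; ++i)
    s += x[2 * i];    /* stride-2 accesses: lowered to a gather or to a
                         contiguous load + VEC_PERM, so one extra vector
                         register group is live for the permute indices */
  return s;
}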

After this patch, it picks LMUL = 2, which is optimal:

bar:
ble a3,zero,.L4
csrr a6,vlenb
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v6,a2
srli a2,a6,1
vmv.v.x v4,a1
vid.v v12
slli a3,a3,1
vand.vi v0,v12,1
addi t1,a2,-1
vmseq.vi v0,v0,1
slli a6,a6,1
vsetvli zero,a5,e32,m2,ta,ma
neg a7,a2
viota.m v2,v0
vsetvli a5,zero,e32,m2,ta,mu
vrgather.vv v16,v4,v2
vrgather.vv v14,v6,v2
vrgather.vv v16,v6,v2,v0.t
vrgather.vv v14,v4,v2,v0.t
vand.vi v18,v12,-2
vmv.v.i v2,0
vadd.vi v20,v18,1
.L3:
minu a4,a3,a2
vsetvli zero,a4,e32,m2,ta,ma
vle32.v v8,0(a0)
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v4,t1
vand.vv v10,v18,v4
vrgather.vv v6,v8,v10
vsub.vv v6,v6,v14
vsetvli zero,a4,e32,m2,tu,ma
vadd.vv v2,v2,v6
vsetvli a1,zero,e32,m2,ta,ma
vand.vv v4,v20,v4
vrgather.vv v6,v8,v4
vsetvli zero,a4,e32,m2,tu,ma
mv a4,a3
add a0,a0,a6
add a3,a3,a7
vmacc.vv v2,v16,v6
bgtu a4,a2,.L3
vsetvli a1,zero,e32,m2,ta,ma
vand.vi v0,v12,1
vmv.v.i v4,0
li  a3,

[committed] Fix minor problem in stack probing

2023-10-16 Thread Jeff Law


probe_stack_range has an assert to capture the possibility that
expand_binop might not construct its result in the provided target
(TEST_ADDR in this case).


We triggered that internally a little while ago.  I'm pretty sure it was 
in the testsuite, so no new testcase.  The fix is easy, copy the result 
into the proper target when needed.


Bootstrapped and regression tested on x86.  Pushed to the trunk.


Jeff

commit b626751a4e87b090531c648631df33ac20c4fab8
Author: Jeff Law 
Date:   Mon Oct 16 17:14:38 2023 -0600

Fix minor problem in stack probing

probe_stack_range has an assert to capture the possibility that
expand_binop might not construct its result in the provided target.

We triggered that internally a little while ago.  I'm pretty sure it was in 
the
testsuite, so no new testcase.  The fix is easy, copy the result into the
proper target when needed.

Bootstrapped and regression tested on x86.

gcc/
* explow.cc (probe_stack_range): Handle case when expand_binop
does not construct its result in the expected location.

diff --git a/gcc/explow.cc b/gcc/explow.cc
index 6424c0802f0..0c03ac350bb 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -1818,7 +1818,10 @@ probe_stack_range (HOST_WIDE_INT first, rtx size)
   gen_int_mode (PROBE_INTERVAL, Pmode), test_addr,
   1, OPTAB_WIDEN);
 
-  gcc_assert (temp == test_addr);
+  /* There is no guarantee that expand_binop constructs its result
+     in TEST_ADDR.  So copy into TEST_ADDR if necessary.  */
+  if (temp != test_addr)
+    emit_move_insn (test_addr, temp);
 
   /* Probe at TEST_ADDR.  */
   emit_stack_probe (test_addr);
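
The underlying contract is worth spelling out.  A hedged sketch of the
caller-side idiom inside GCC's RTL expanders (not standalone code;
sub_optab stands in for the actual stack-grow direction, which is
target-dependent):

/* expand_binop takes TARGET only as a hint and may return a different
   rtx, so a caller that needs the value in a specific pseudo must copy
   it back itself.  */
rtx temp = expand_binop (Pmode, sub_optab, test_addr,
			 gen_int_mode (PROBE_INTERVAL, Pmode), test_addr,
			 1, OPTAB_WIDEN);
if (temp != test_addr)
  emit_move_insn (test_addr, temp);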


[pushed] diagnostics: special-case -fdiagnostics-text-art-charset=ascii for LANG=C

2023-10-16 Thread David Malcolm
In the LWN discussion of the "ASCII" art in GCC 14
  https://lwn.net/Articles/946733/#Comments
there was some concern about the use of non-ASCII characters in the
output.

Currently -fdiagnostics-text-art-charset defaults to "emoji".
To better handle older terminals by default, this patch special-cases
LANG=C to use -fdiagnostics-text-art-charset=ascii.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-4669-g04013e4464020b.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): When LANG=C, update
default for -fdiagnostics-text-art-charset from emoji to ascii.
* doc/invoke.texi (fdiagnostics-text-art-charset): Document the above.
---
 gcc/diagnostic.cc   | 13 +++--
 gcc/doc/invoke.texi |  3 ++-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 03637459c56..6e46371b3b4 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -226,8 +226,17 @@ diagnostic_initialize (diagnostic_context *context, int n_opts)
   context->includes_seen = NULL;
   context->m_client_data_hooks = NULL;
   context->m_diagrams.m_theme = NULL;
-  diagnostics_text_art_charset_init (context,
-				     DIAGNOSTICS_TEXT_ART_CHARSET_DEFAULT);
+
+  enum diagnostic_text_art_charset text_art_charset
+    = DIAGNOSTICS_TEXT_ART_CHARSET_DEFAULT;
+  if (const char *lang = getenv ("LANG"))
+    {
+      /* For LANG=C, don't assume the terminal supports anything
+	 other than ASCII.  */
+      if (!strcmp (lang, "C"))
+	text_art_charset = DIAGNOSTICS_TEXT_ART_CHARSET_ASCII;
+    }
+  diagnostics_text_art_charset_init (context, text_art_charset);
 }
 
 /* Maybe initialize the color support. We require clients to do this
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 7c5f81d9783..ef9d1fb8fe6 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -5681,7 +5681,8 @@ value further adds the possibility of emoji in the output (such as emitting
 U+26A0 WARNING SIGN followed by U+FE0F VARIATION SELECTOR-16 to select the
 emoji variant of the character).
 
-The default is @samp{emoji}.
+The default is @samp{emoji}, except when the environment variable @env{LANG}
+is set to @samp{C}, in which case the default is @samp{ascii}.
 
 @opindex fdiagnostics-format
 @item -fdiagnostics-format=@var{FORMAT}
-- 
2.26.3
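
A minimal standalone sketch of the new default selection, assuming (as the
strcmp above implies) that only an exact LANG value of "C", not "C.UTF-8"
or "POSIX", triggers the ASCII default:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main (void)
{
  /* Mirrors the logic added to diagnostic_initialize above; the string
     "emoji" stands in for DIAGNOSTICS_TEXT_ART_CHARSET_DEFAULT.  */
  const char *charset = "emoji";
  const char *lang = getenv ("LANG");
  if (lang && strcmp (lang, "C") == 0)
    charset = "ascii";
  printf ("text-art charset default: %s\n", charset);
  return 0;
}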



[pushed] diagnostics: fix missing initialization of context->extra_output_kind

2023-10-16 Thread David Malcolm
Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
Pushed to trunk as r14-4668-gf8644b678285cf.

gcc/ChangeLog:
* diagnostic.cc (diagnostic_initialize): Ensure
context->extra_output_kind is initialized.
---
 gcc/diagnostic.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/diagnostic.cc b/gcc/diagnostic.cc
index 9c15f0d8c50..03637459c56 100644
--- a/gcc/diagnostic.cc
+++ b/gcc/diagnostic.cc
@@ -204,7 +204,7 @@ diagnostic_initialize (diagnostic_context *context, int n_opts)
   context->m_source_printing.min_margin_width = 0;
   context->m_source_printing.show_ruler_p = false;
   context->report_bug = false;
-
+  context->extra_output_kind = EXTRA_DIAGNOSTIC_OUTPUT_none;
   if (const char *var = getenv ("GCC_EXTRA_DIAGNOSTIC_OUTPUT"))
 {
   if (!strcmp (var, "fixits-v1"))
-- 
2.26.3



Re: [PATCH] gimple-match: Do not try UNCOND optimization with COND_LEN.

2023-10-16 Thread Richard Sandiford
Robin Dapp  writes:
>> Why are the contents of this if statement wrong for COND_LEN?
>> If the "else" value doesn't matter, then the masked form can use
>> the "then" value for all elements.  I would have expected the same
>> thing to be true of COND_LEN.
>
> Right, that one was overly pessimistic.  Removed.
>
>> But isn't the test whether res_op->code itself is an internal_function?
>> In other words, shouldn't it just be:
>> 
>>   if (internal_fn_p (res_op->code)
>>&& internal_fn_len_index (as_internal_fn (res_op->code)) != -1)
>>  return true;
>> 
>> maybe_resimplify_conditional_op should already have converted to an
>> internal function where possible, and if combined_fn (res_op->code)
>> does any extra conversion on the fly, that conversion won't be reflected
>> in res_op.
>
> I went through some of our test cases and believe most of the problems
> are due to situations like the following:
>
> In vect-cond-arith-2.c we have (on riscv)
>   vect_neg_xi_14.4_23 = -vect_xi_13.3_22;
>   vect_res_2.5_24 = .COND_LEN_ADD ({ -1, ... }, vect_res_1.0_17, 
> vect_neg_xi_14.4_23, vect_res_1.0_17, _29, 0);
>
> On aarch64 this is a situation that matches the VEC_COND_EXPR
> simplification that I disabled with this patch.  We valueized
> to _26 = vect_res_1.0_17 - vect_xi_13.3_22 and then create
> vect_res_2.5_24 = VEC_COND_EXPR ;
> This is later re-assembled into a COND_SUB.
>
> As we have two masks or COND_LEN we cannot use a VEC_COND_EXPR to
> achieve the same thing.  Would it be possible to create a COND_OP
> directly instead, though?  I tried the following (not very polished
> obviously):
>
> -  new_op.set_op (VEC_COND_EXPR, res_op->type,
> -res_op->cond.cond, res_op->ops[0],
> -res_op->cond.else_value);
> -  *res_op = new_op;
> -  return gimple_resimplify3 (seq, res_op, valueize);
> +  if (!res_op->cond.len)
> +   {
> + new_op.set_op (VEC_COND_EXPR, res_op->type,
> +res_op->cond.cond, res_op->ops[0],
> +res_op->cond.else_value);
> + *res_op = new_op;
> + return gimple_resimplify3 (seq, res_op, valueize);
> +   }
> +  else if (seq && *seq && is_gimple_assign (*seq))
> +   {
> + new_op.code = gimple_assign_rhs_code (*seq);
> + new_op.type = res_op->type;
> + new_op.num_ops = gimple_num_ops (*seq) - 1;
> + new_op.ops[0] = gimple_assign_rhs1 (*seq);
> + if (new_op.num_ops > 1)
> +   new_op.ops[1] = gimple_assign_rhs2 (*seq);
> + if (new_op.num_ops > 2)
> +   new_op.ops[2] = gimple_assign_rhs2 (*seq);
> +
> + new_op.cond = res_op->cond;
> +
> + gimple_match_op bla2;
> + if (convert_conditional_op (&new_op, &bla2))
> +   {
> + *res_op = bla2;
> + // SEQ should now be dead.
> + return true;
> +   }
> +   }
>
> This would make the other hunk (check whether it was a LEN
> and try to recreate it) redundant I hope.
>
> I don't know enough about valueization, whether it's always
> safe to do that and other implications.  On riscv this seems
> to work, though and the other backends never go through the LEN
> path.  If, however, this is a feasible direction it could also
> be done for the non-LEN targets?

I don't know much about valueisation either :)  But it does feel
like we're working around the lack of a LEN form of COND_EXPR.
In other words, it seems odd that we can do:

  IFN_COND_LEN_ADD (mask, a, 0, b, len, bias)

but we can't do:

  IFN_COND_LEN (mask, a, b, len, bias)

There seems to be no way of applying a length without also finding an
operation to perform.

Does IFN_COND_LEN make conceptual sense on RVV?  If so, would defining
it solve some of these problems?

I suppose in the worst case, IFN_COND_LEN is equivalent to IFN_COND_LEN_IOR
with a zero input (and extended to floats).  So if the target can do
IFN_COND_LEN_IOR, it could implement IFN_COND_LEN using the same instruction.

Thanks,
Richard
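
For concreteness, a scalar sketch of the IFN_COND_LEN semantics discussed
above; the argument order and the exact treatment of len and bias are
assumptions, not a defined GCC interface:

/* Element-wise: pick a[i] where the element is both within the length
   and selected by the mask, otherwise keep b[i].  */
void
cond_len (int *res, const int *mask, const int *a, const int *b,
	  int len, int bias, int n)
{
  for (int i = 0; i < n; ++i)
    res[i] = (i < len + bias && mask[i]) ? a[i] : b[i];
}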



Re: PATCH-1v3, expand] Enable vector mode for compare_by_pieces [PR111449]

2023-10-16 Thread Richard Sandiford
Thanks for the update.  The comments below are mostly asking for
cosmetic changes.

HAO CHEN GUI  writes:
> Hi,
>   Vector mode instructions are efficient for compare on some targets.
> This patch enables vector mode for compare_by_pieces. Currently,
> vector mode is enabled for compare, set and clear. Helper function
> "qi_vector_p" decides if vector mode is enabled for certain by pieces
> operation. optabs_checking checks if optabs are available for the
> mode and certain by pieces operations. Both of them are called in
> fixed_size_mode finding functions. A member is added to class
> op_by_pieces_d in order to record the type of by pieces operations.
>
>   The test case is in the second patch which is rs6000 specific.
>
>   Compared to last version, the main change is to create two helper
> functions and call them in mode finding function.
>
>   Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
> regressions.
>
> Thanks
> Gui Haochen
>
> ChangeLog
> Expand: Enable vector mode for pieces compares
>
> Vector mode compare instructions are efficient for equality compare on
> rs6000. This patch refactors the codes of pieces operation to enable
> vector mode for compare.
>
> gcc/
>   PR target/111449
>   * expr.cc (qi_vector_p): New function to indicate if vector mode
>   is enabled for certain by pieces operations.
>   (optabs_checking): New function to check if optabs are available
>   for certain by pieces operations.
>   (widest_fixed_size_mode_for_size): Replace the second argument
>   with the type of by pieces operations.  Call qi_vector_p to check
>   if vector mode is enabled.  Call optabs_checking to check if optabs
>   are available for the candidate vector mode.
>   (by_pieces_ninsns): Pass the type of by pieces operation to
>   widest_fixed_size_mode_for_size.
>   (class op_by_pieces_d): Add a protected member m_op to record the
>   type of by pieces operations.  Declare member function
>   fixed_size_mode widest_fixed_size_mode_for_size.
>   (op_by_pieces_d::op_by_pieces_d): Change last argument to the type
>   of by pieces operations, initialize m_op with it.  Call non-member
>   function widest_fixed_size_mode_for_size.
>   (op_by_pieces_d::get_usable_mode): Call member function
>   widest_fixed_size_mode_for_size.
>   (op_by_pieces_d::smallest_fixed_size_mode_for_size): Call
>   qi_vector_p to check if vector mode is enabled.  Call
>   optabs_checking to check if optabs are available for the candidate
>   vector mode.
>   (op_by_pieces_d::run): Call member function
>   widest_fixed_size_mode_for_size.
>   (op_by_pieces_d::widest_fixed_size_mode_for_size): Implement.
>   (move_by_pieces_d::move_by_pieces_d): Set m_op to MOVE_BY_PIECES.
>   (store_by_pieces_d::store_by_pieces_d): Set m_op with the op.
>   (can_store_by_pieces): Pass the type of by pieces operations to
>   widest_fixed_size_mode_for_size.
>   (clear_by_pieces): Initialize class store_by_pieces_d with
>   CLEAR_BY_PIECES.
>   (compare_by_pieces_d::compare_by_pieces_d): Set m_op to
>   COMPARE_BY_PIECES.
>
> patch.diff
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index d87346dc07f..8ec3f5465a9 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -988,18 +988,43 @@ alignment_for_piecewise_move (unsigned int max_pieces, unsigned int align)
>return align;
>  }
>
> -/* Return the widest QI vector, if QI_MODE is true, or integer mode
> -   that is narrower than SIZE bytes.  */
> +/* Return true if vector mode is enabled for the op.  */

Maybe:

/* Return true if we know how to implement OP using vectors of bytes.  */

"enabled" would normally imply target support.

> +static bool
> +qi_vector_p (by_pieces_operation op)

And maybe call the function "can_use_qi_vectors"

> +{
> +  return (op == COMPARE_BY_PIECES
> +   || op == SET_BY_PIECES
> +   || op == CLEAR_BY_PIECES);
> +}
> +
> +/* Return true if optabs are available for the mode and by pieces
> +   operations.  */

Maybe:

/* Return true if the target supports operation OP using vector mode MODE.  */

> +static bool
> +optabs_checking (fixed_size_mode mode, by_pieces_operation op)

And maybe call the function "qi_vector_mode_supported_p".

> +{
> +  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
> +  && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
> +return true;
> +  else if (op == COMPARE_BY_PIECES
> +&& optab_handler (mov_optab, mode) != CODE_FOR_nothing
> +&& can_compare_p (EQ, mode, ccp_jump))
> +return true;
> +
> +  return false;
> +}
> +
> +/* Return the widest QI vector, if vector mode is enabled for the op,
> +   or integer mode that is narrower than SIZE bytes.  */

Maybe:

/* Return the widest mode that can be used to perform part of
   an operation OP on SIZE bytes.  Try to use QI vector modes
   where possible.  */
>
>  static fixed_size_mode
> -widest_fix
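
For reference, a consolidated sketch of the two helpers with the suggested
names and comments applied; the bodies are copied from the hunks quoted
above, so only the naming is new:

/* Return true if we know how to implement OP using vectors of bytes.  */
static bool
can_use_qi_vectors (by_pieces_operation op)
{
  return (op == COMPARE_BY_PIECES
	  || op == SET_BY_PIECES
	  || op == CLEAR_BY_PIECES);
}

/* Return true if the target supports operation OP using vector mode MODE.  */
static bool
qi_vector_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
{
  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
      && optab_handler (vec_duplicate_optab, mode) != CODE_FOR_nothing)
    return true;
  if (op == COMPARE_BY_PIECES
      && optab_handler (mov_optab, mode) != CODE_FOR_nothing
      && can_compare_p (EQ, mode, ccp_jump))
    return true;
  return false;
}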

Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Mon, Oct 16, 2023 at 9:58 PM Fangrui Song  wrote:
>
> On Mon, Oct 16, 2023 at 12:10 PM Uros Bizjak  wrote:
> >
> > On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song  wrote:
> > >
> > > On 2023-10-16, Uros Bizjak wrote:
> > > >On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song  wrote:
> > > >>
> > > >> When using -mcmodel=medium, large data objects larger than the
> > > >> -mlarge-data-threshold threshold are placed into large data sections
> > > >> (.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 
> > > >> place
> > > >> .l* sections into separate output sections.  If small and medium code
> > > >> model object files are mixed, the .l* sections won't exert relocation
> > > >> overflow pressure on sections in object files built with 
> > > >> -mcmodel=small.
> > > >>
> > > >> However, when using -mcmodel=large, -mlarge-data-threshold doesn't
> > > >> apply.  This means that the .rodata/.data/.bss sections may exert
> > > >> relocation overflow pressure on sections in -mcmodel=small object 
> > > >> files.
> > > >>
> > > >> This patch allows -mcmodel=large to generate .l* sections and drops an
> > > >> unneeded documentation restriction that the value must be the same.
> > > >>
> > > >> Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> > > >> ("Large data sections for the large code model")
> > > >>
> > > >> Signed-off-by: Fangrui Song 
> > > >>
> > > >> ---
> > > >> Changes from v1 
> > > >> (https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
> > > >> * Clarify commit message. Add link to 
> > > >> https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> > > >>
> > > >> Changes from v2
> > > >> * Drop an unneeded limitation in the documentation.
> > > >>
> > > >> Changes from v3
> > > >> * Change scan-assembler directives to use \. to match literal .
> > > >> ---
> > > >>  gcc/config/i386/i386.cc| 15 +--
> > > >>  gcc/config/i386/i386.opt   |  2 +-
> > > >>  gcc/doc/invoke.texi|  6 +++---
> > > >>  gcc/testsuite/gcc.target/i386/large-data.c | 13 +
> > > >>  4 files changed, 26 insertions(+), 10 deletions(-)
> > > >>  create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c
> > > >>
> > > >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > > >> index eabc70011ea..37e810cc741 100644
> > > >> --- a/gcc/config/i386/i386.cc
> > > >> +++ b/gcc/config/i386/i386.cc
> > > >> @@ -647,7 +647,8 @@ ix86_can_inline_p (tree caller, tree callee)
> > > >>  static bool
> > > >>  ix86_in_large_data_p (tree exp)
> > > >>  {
> > > >> -  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
> > > >> +  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
> > > >> +  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
> > > >
> > > >Please split multi-line expression before the operator, not after it,
> > > >as instructed in GNU Coding Standards [1] ...
> > > >
> > > >[1] https://www.gnu.org/prep/standards/html_node/Formatting.html
> > > >
> > > >>  return false;
> > > >>
> > > >>if (exp == NULL_TREE)
> > > >> @@ -858,8 +859,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
> > > >> const char *name, unsigned HOST_WIDE_INT size,
> > > >> unsigned align)
> > > >>  {
> > > >> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> > > >> -  && size > (unsigned int)ix86_section_threshold)
> > > >> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> > > >> +  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> > > >> + size > (unsigned int)ix86_section_threshold)
> > > >
> > > >... also here ...
> > > >
> > > >>  {
> > > >>switch_to_section (get_named_section (decl, ".lbss", 0));
> > > >>fputs (LARGECOMM_SECTION_ASM_OP, file);
> > > >> @@ -879,9 +881,10 @@ void
> > > >>  x86_output_aligned_bss (FILE *file, tree decl, const char *name,
> > > >> unsigned HOST_WIDE_INT size, unsigned align)
> > > >>  {
> > > >> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> > > >> -  && size > (unsigned int)ix86_section_threshold)
> > > >> -switch_to_section (get_named_section (decl, ".lbss", 0));
> > > >> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> > > >> +   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> > > >> +  size > (unsigned int)ix86_section_threshold)
> > > >
> > > >... and here.
> > > >
> > > >OK with these formatting changes.
> > > >
> > > >Thanks,
> > > >Uros.
> > >
> > > Thank you for the review!
> > > Posted PATCH v5 
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633153.html
> > > with the formatting.
> > >
> > > I don't have write access to the gcc repository:)
> >
> > Please provide the ChangeLog entry (see [1,2]) and I'll commit the
> > patch for you.
> >
> > [1] https://gcc.gnu.org/contribute.html
> > [2] https://gcc.gnu.org/c
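
For clarity, the first hunk formatted as requested, breaking before the
operator per the GNU Coding Standards (a sketch of how the final version
should read):

  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC
      && ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
    return false;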

Re: [PATCH 1/2] Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node

2023-10-16 Thread Marek Polacek
On Sat, Oct 14, 2023 at 06:16:47PM -0700, Andrew Pinski wrote:
> This is a simple error recovery issue when c_safe_arg_type_equiv_p
> was added in r8-5312-gc65e18d3331aa999. The issue is that after
> an error, an argument type (of a function type) might turn
> into an error mark node and c_safe_arg_type_equiv_p was not ready
> for that. So this just adds a check for error operands on its
> arguments before getting the main variant.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Please don't include this line in the commit message.
 
>   PR c/101285
> 
> gcc/c/ChangeLog:
> 
>   * c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error
>   operands early.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/pr101285-1.c: New test.
> ---
>  gcc/c/c-typeck.cc |  3 +++
>  gcc/testsuite/gcc.dg/pr101285-1.c | 10 ++
>  2 files changed, 13 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr101285-1.c
> 
> diff --git a/gcc/c/c-typeck.cc b/gcc/c/c-typeck.cc
> index e55e887da14..6e044b4afbc 100644
> --- a/gcc/c/c-typeck.cc
> +++ b/gcc/c/c-typeck.cc
> @@ -5960,6 +5960,9 @@ handle_warn_cast_qual (location_t loc, tree type, tree 
> otype)
>  static bool
>  c_safe_arg_type_equiv_p (tree t1, tree t2)
>  {
> +  if (error_operand_p (t1) || error_operand_p (t2))
> +    return true;

I thought it would be more natural to return false but that would result in:
cast between incompatible function types from 'void (*)(int *)' to 'void 
(*)()'
but we don't want that so pretending the cast is safe is probably better.

>   t1 = TYPE_MAIN_VARIANT (t1);
>   t2 = TYPE_MAIN_VARIANT (t2);
>  
> diff --git a/gcc/testsuite/gcc.dg/pr101285-1.c 
> b/gcc/testsuite/gcc.dg/pr101285-1.c
> new file mode 100644
> index 000..831e35f7662
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr101285-1.c
> @@ -0,0 +1,10 @@

Let's put

/* PR c/101285 */

here.

> +/* { dg-do compile } */
> +/* { dg-options "-W -Wall" } */
> +const int b;
> +typedef void (*ft1)(int[b++]); /* { dg-error "read-only variable" } */
> +void bar(int * z);
> +void baz()
> +{
> +(ft1) bar; /* { dg-warning "statement with no effect" } */
> +}
> +

Extra newline.

Thanks,
Marek
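
Putting both nits together, the testcase would read as follows (a sketch;
the directives themselves are unchanged from the patch):

/* PR c/101285 */
/* { dg-do compile } */
/* { dg-options "-W -Wall" } */
const int b;
typedef void (*ft1)(int[b++]); /* { dg-error "read-only variable" } */
void bar(int * z);
void baz()
{
    (ft1) bar; /* { dg-warning "statement with no effect" } */
}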



Re: [PATCH 1/2] Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node

2023-10-16 Thread Joseph Myers
On Sat, 14 Oct 2023, Andrew Pinski wrote:

> This is a simple error recovery issue when c_safe_arg_type_equiv_p
> was added in r8-5312-gc65e18d3331aa999. The issue is that after
> an error, an argument type (of a function type) might turn
> into an error mark node and c_safe_arg_type_equiv_p was not ready
> for that. So this just adds a check for error operands on its
> arguments before getting the main variant.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 2/2] [c] Fix PR 101364: ICE after error due to diagnose_arglist_conflict not checking for error

2023-10-16 Thread Joseph Myers
On Sat, 14 Oct 2023, Andrew Pinski wrote:

> When checking to see if a function declaration has a conflict due to
> promotions, there is no test to see if the type was an error mark before
> calling c_type_promotes_to.  c_type_promotes_to is not ready for error_mark
> and causes an ICE.
> 
> This adds a check for error before the call of c_type_promotes_to.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH V3] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

2023-10-16 Thread Richard Sandiford
Juzhe-Zhong  writes:
> This patch fixes this following FAILs in RISC-V regression:
>
> FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
> FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"
>
> The root cause of these FAILs is that GCC SLP failed on MASK_LEN_GATHER_LOAD.
>
> We have 2 following situations of scalar recognized MASK_LEN_GATHER_LOAD:
>
> 1. conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, zero,
> conditional mask).
>
>    In this situation we just need to leverage the current MASK_GATHER_LOAD, which
> can achieve SLP of MASK_LEN_GATHER_LOAD.
>
> 2. un-conditional gather load: MASK_LEN_GATHER_LOAD (base, offset, scale, 
> zero, -1)
>
>    The current SLP check will fail on the dummy mask -1, so we relax the check in
> tree-vect-slp.cc and allow it to be materialized.
> 
> Consider this following case:
>
> void __attribute__((noipa))
> f (int *restrict y, int *restrict x, int *restrict indices, int n)
> {
>   for (int i = 0; i < n; ++i)
> {
>   y[i * 2] = x[indices[i * 2]] + 1;
>   y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> }
> }
>
> https://godbolt.org/z/WG3M3n7Mo
>
> GCC unable to SLP using VEC_LOAD_LANES/VEC_STORE_LANES:
>
> f:
> ble a3,zero,.L5
> .L3:
> vsetvli a5,a3,e8,mf4,ta,ma
> vsetvli zero,a5,e32,m1,ta,ma
> vlseg2e32.v v6,(a2)
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2   v2,v6
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v  v1,(a1),v2
> vsetvli a4,zero,e64,m2,ta,ma
> vsext.vf2   v2,v7
> vsetvli zero,zero,e32,m1,ta,ma
> vadd.vi v4,v1,1
> vsetvli zero,zero,e64,m2,ta,ma
> vsll.vi v2,v2,2
> vsetvli zero,a5,e32,m1,ta,ma
> vluxei64.v  v2,(a1),v2
> vsetvli a4,zero,e32,m1,ta,ma
> slli a6,a5,3
> vadd.vi v5,v2,2
> sub a3,a3,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vsseg2e32.v v4,(a0)
> add a2,a2,a6
> add a0,a0,a6
> bne a3,zero,.L3
> .L5:
> ret
>
> After this patch:
>
> f:
>   ble a3,zero,.L5
>   li  a5,1
>   csrr t1,vlenb
>   slli a5,a5,33
>   srli a7,t1,2
>   addi a5,a5,1
>   slli a3,a3,1
>   neg t3,a7
>   vsetvli a4,zero,e64,m1,ta,ma
>   vmv.v.x v4,a5
> .L3:
>   minu a5,a3,a7
>   vsetvli zero,a5,e32,m1,ta,ma
>   vle32.v v1,0(a2)
>   vsetvli a4,zero,e64,m2,ta,ma
>   vsext.vf2   v2,v1
>   vsll.vi v2,v2,2
>   vsetvli zero,a5,e32,m1,ta,ma
>   vluxei64.v  v2,(a1),v2
>   vsetvli a4,zero,e32,m1,ta,ma
>   mv  a6,a3
>   vadd.vv v2,v2,v4
>   vsetvli zero,a5,e32,m1,ta,ma
>   vse32.v v2,0(a0)
>   add a2,a2,t1
>   add a0,a0,t1
>   add a3,a3,t3
>   bgtu a6,a7,.L3
> .L5:
>   ret
>
> Note that I found we are missing a conditional mask gather_load SLP test,
> so this patch appends one.

Yeah, we're missing a target-independent test.  I'm afraid I used
aarch64-specific tests for a lot of this stuff, since (a) I wanted
to check the quality of the asm output and (b) it's very hard to write
gcc.dg/vect tests that don't fail on some target or other.  Thanks for
picking this up.

>
> Tested on RISC-V and Bootstrap && Regression on X86 passed.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
>   * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
>   (vect_get_and_check_slp_defs): Ditto.
>   (vect_build_slp_tree_1): Ditto.
>   (vect_build_slp_tree_2): Ditto.
>   * tree-vect-stmts.cc (vectorizable_load): Ditto.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/vect/vect-gather-6.c: New test.
>
> ---
>  gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++
>  gcc/tree-vect-slp.cc  | 22 ++
>  gcc/tree-vect-stmts.cc| 10 +-
>  3 files changed, 42 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c
>
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> new file mode 100644
> index 000..ff55f321854
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +
> +void
> +f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
> +{
> +  for (int i = 0; i < n; ++i)
> +{
> +  if (cond[i * 2])
> + y[i * 2] = x[indices[i * 2]] + 1;
> +  if (cond[i * 2 + 1])
> + y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
> +}
> +}
> +
> +/* { dg-final { scan-tr

Re: [C PATCH] error for function with external and internal linkage [PR111708]

2023-10-16 Thread Joseph Myers
On Sat, 14 Oct 2023, Martin Uecker wrote:

> + if (!initialized
> + && storage_class != csc_static
> + && storage_class != csc_auto
> + && current_scope != file_scope)

I think it would be better to use TREE_PUBLIC (decl) in place of 
storage_class != csc_static && storage_class != csc_auto.  OK with that 
change.

-- 
Joseph S. Myers
jos...@codesourcery.com
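
A sketch of the condition with that suggestion applied (surrounding context
assumed unchanged from the patch; TREE_PUBLIC subsumes the two
storage-class tests):

	  if (!initialized
	      && TREE_PUBLIC (decl)
	      && current_scope != file_scope)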


Re: [PATCH v20 01/40] c++: Sort built-in traits alphabetically

2023-10-16 Thread Ken Matsui
On Mon, Oct 16, 2023 at 2:12 PM Patrick Palka  wrote:
>
> On Mon, 16 Oct 2023, Ken Matsui wrote:
>
> > On Mon, Oct 16, 2023 at 8:17 AM Patrick Palka  wrote:
> > >
> > > On Sun, 15 Oct 2023, Ken Matsui wrote:
> > >
> > > > This patch sorts built-in traits alphabetically for better code
> > > > readability.
> > >
> > > Hmm, I'm not sure if we still want/need this change with this current
> > > approach.  IIUC gperf would sort the trait names when generating the
> > > hash table code, and so we wanted a more consistent mapping from the
> > > cp-trait.def file to the generated code.  But with this current
> > > non-gperf approach I'm inclined to leave the existing ordering alone
> > > for sake of simplicity, and I kind of like that in cp-trait.def we
> > > currently group all expression-yielding traits together and all
> > > type-yielding traits together; that seems like a more natural layout
> > > than plain alphabetical sorting.
> > >
> >
> > I see. But this patch is crucial for me to keep all my existing
> > patches almost conflict-free against rebase, including drop, add, and
> > edit like you suggested to split integral-related patches. Without
> > this patch and alphabetical order, I will need to put a new trait in a
> > random place not close to surrounding commits, as Git relates close
> > lines when it finds conflicts. When I merged all my patches into one
> > patch series, I needed to fix conflicts for all my patches almost
> > every time I rebased. Both thinking of the random place and fixing the
> > conflicts of all patches would definitely not be desirable. Would you
> > think we should drop this patch?
>
> Fair enough, I'm all for keeping this patch and alphabetizing then :)
>

Thank you!

> >
> > > >
> > > > gcc/cp/ChangeLog:
> > > >
> > > >   * constraint.cc (diagnose_trait_expr): Sort built-in traits
> > > >   alphabetically.
> > > >   * cp-trait.def: Likewise.
> > > >   * semantics.cc (trait_expr_value): Likewise.
> > > >   (finish_trait_expr): Likewise.
> > > >   (finish_trait_type): Likewise.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * g++.dg/ext/has-builtin-1.C: Sort built-in traits alphabetically.
> > > >
> > > > Signed-off-by: Ken Matsui 
> > > > ---
> > > >  gcc/cp/constraint.cc | 68 -
> > > >  gcc/cp/cp-trait.def  | 10 +--
> > > >  gcc/cp/semantics.cc  | 94 
> > > >  gcc/testsuite/g++.dg/ext/has-builtin-1.C | 70 +-
> > > >  4 files changed, 121 insertions(+), 121 deletions(-)
> > > >
> > > > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > > > index c9e4e7043cd..722fc334e6f 100644
> > > > --- a/gcc/cp/constraint.cc
> > > > +++ b/gcc/cp/constraint.cc
> > > > @@ -3702,18 +3702,36 @@ diagnose_trait_expr (tree expr, tree args)
> > > >  case CPTK_HAS_TRIVIAL_DESTRUCTOR:
> > > >inform (loc, "  %qT is not trivially destructible", t1);
> > > >break;
> > > > +case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
> > > > +  inform (loc, "  %qT does not have unique object 
> > > > representations", t1);
> > > > +  break;
> > > >  case CPTK_HAS_VIRTUAL_DESTRUCTOR:
> > > >inform (loc, "  %qT does not have a virtual destructor", t1);
> > > >break;
> > > >  case CPTK_IS_ABSTRACT:
> > > >inform (loc, "  %qT is not an abstract class", t1);
> > > >break;
> > > > +case CPTK_IS_AGGREGATE:
> > > > +  inform (loc, "  %qT is not an aggregate", t1);
> > > > +  break;
> > > > +case CPTK_IS_ASSIGNABLE:
> > > > +  inform (loc, "  %qT is not assignable from %qT", t1, t2);
> > > > +  break;
> > > >  case CPTK_IS_BASE_OF:
> > > >inform (loc, "  %qT is not a base of %qT", t1, t2);
> > > >break;
> > > >  case CPTK_IS_CLASS:
> > > >inform (loc, "  %qT is not a class", t1);
> > > >break;
> > > > +case CPTK_IS_CONSTRUCTIBLE:
> > > > +  if (!t2)
> > > > +inform (loc, "  %qT is not default constructible", t1);
> > > > +  else
> > > > +inform (loc, "  %qT is not constructible from %qE", t1, t2);
> > > > +  break;
> > > > +case CPTK_IS_CONVERTIBLE:
> > > > +  inform (loc, "  %qT is not convertible from %qE", t2, t1);
> > > > +  break;
> > > >  case CPTK_IS_EMPTY:
> > > >inform (loc, "  %qT is not an empty class", t1);
> > > >break;
> > > > @@ -3729,6 +3747,18 @@ diagnose_trait_expr (tree expr, tree args)
> > > >  case CPTK_IS_LITERAL_TYPE:
> > > >inform (loc, "  %qT is not a literal type", t1);
> > > >break;
> > > > +case CPTK_IS_NOTHROW_ASSIGNABLE:
> > > > +  inform (loc, "  %qT is not nothrow assignable from %qT", t1, t2);
> > > > +  break;
> > > > +case CPTK_IS_NOTHROW_CONSTRUCTIBLE:
> > > > +  if (!t2)
> > > > + inform (loc, "  %qT is not nothrow default constructible", t1);
> > > > +  else
> > > > + inform (loc, "  %qT is

Re: [PATCH v20 02/40] c-family, c++: Look up built-in traits via identifier node

2023-10-16 Thread Ken Matsui
On Mon, Oct 16, 2023 at 2:06 PM Patrick Palka  wrote:
>
> On Mon, 16 Oct 2023, Ken Matsui wrote:
>
> > On Mon, Oct 16, 2023 at 7:55 AM Patrick Palka  wrote:
> > >
> > > On Sun, 15 Oct 2023, Ken Matsui wrote:
> > >
> > > > Since RID_MAX soon reaches 255 and all built-in traits are used 
> > > > approximately
> > > > once in a C++ translation unit, this patch removes all RID values for 
> > > > built-in
> > > > traits and uses the identifier node to look up the specific trait.  
> > > > Rather
> > > > than holding traits as keywords, we set all trait identifiers as 
> > > > cik_trait,
> > > > which is a new cp_identifier_kind.  As cik_reserved_for_udlit was 
> > > > unused and
> > > > cp_identifier_kind is 3 bits, we replaced the unused field with the new
> > > > cik_trait.  Also, a later patch handles a subsequent token to the 
> > > > built-in
> > > > identifier so that we accept the use of non-function-like built-in trait
> > > > identifiers.
> > >
> > > Thanks, this looks great!  Some review comments below.
> > >
> >
> > Thank you so much for your review :)
> >
> > > >
> > > > gcc/c-family/ChangeLog:
> > > >
> > > >   * c-common.cc (c_common_reswords): Remove all mappings of
> > > >   built-in traits.
> > > >   * c-common.h (enum rid): Remove all RID values for built-in 
> > > > traits.
> > > >
> > > > gcc/cp/ChangeLog:
> > > >
> > > >   * cp-objcp-common.cc (names_builtin_p): Remove all RID value
> > > >   cases for built-in traits.  Check for built-in traits via
> > > >   the new cik_trait kind.
> > > >   * cp-tree.h (enum cp_trait_kind): Set its underlying type to
> > > >   addr_space_t.
> > > >   (struct cp_trait): New struct to hold trait information.
> > > >   (cp_traits): New array to hold a mapping to all traits.
> > > >   (cik_reserved_for_udlit): Rename to ...
> > > >   (cik_trait): ... this.
> > > >   (IDENTIFIER_ANY_OP_P): Exclude cik_trait.
> > > >   (IDENTIFIER_TRAIT_P): New macro to detect cik_trait.
> > > >   * lex.cc (init_cp_traits): New function to set cik_trait for all
> > > >   built-in trait identifiers.
> > >
> > > We should mention setting IDENTIFIER_CP_INDEX as well.
> > >
> >
> > Thank you!
> >
> > > >   (cxx_init): Call init_cp_traits function.
> > > >   * parser.cc (cp_traits): Define its values, declared in cp-tree.h.
> > > >   (cp_lexer_lookup_trait): New function to look up a
> > > >   built-in trait by IDENTIFIER_CP_INDEX.
> > > >   (cp_lexer_lookup_trait_expr): Likewise, look up an
> > > >   expression-yielding built-in trait.
> > > >   (cp_lexer_lookup_trait_type): Likewise, look up a type-yielding
> > > >   built-in trait.
> > > >   (cp_keyword_starts_decl_specifier_p): Remove all RID value cases
> > > >   for built-in traits.
> > > >   (cp_lexer_next_token_is_decl_specifier_keyword): Handle
> > > >   type-yielding built-in traits.
> > > >   (cp_parser_primary_expression): Remove all RID value cases for
> > > >   built-in traits.  Handle expression-yielding built-in traits.
> > > >   (cp_parser_trait): Handle cp_trait instead of enum rid.
> > > >   (cp_parser_simple_type_specifier): Remove all RID value cases
> > > >   for built-in traits.  Handle type-yielding built-in traits.
> > > >
> > > > Co-authored-by: Patrick Palka 
> > > > Signed-off-by: Ken Matsui 
> > > > ---
> > > >  gcc/c-family/c-common.cc  |   7 --
> > > >  gcc/c-family/c-common.h   |   5 --
> > > >  gcc/cp/cp-objcp-common.cc |   8 +--
> > > >  gcc/cp/cp-tree.h  |  31 ++---
> > > >  gcc/cp/lex.cc |  21 ++
> > > >  gcc/cp/parser.cc  | 141 --
> > > >  6 files changed, 139 insertions(+), 74 deletions(-)
> > > >
> > > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > > > index f044db5b797..21fd333ef57 100644
> > > > --- a/gcc/c-family/c-common.cc
> > > > +++ b/gcc/c-family/c-common.cc
> > > > @@ -508,13 +508,6 @@ const struct c_common_resword c_common_reswords[] =
> > > >{ "wchar_t",   RID_WCHAR,  D_CXXONLY },
> > > >{ "while", RID_WHILE,  0 },
> > > >
> > > > -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> > > > -  { NAME,RID_##CODE, D_CXXONLY },
> > > > -#include "cp/cp-trait.def"
> > > > -#undef DEFTRAIT
> > > > -  /* An alias for __is_same.  */
> > > > -  { "__is_same_as",  RID_IS_SAME,D_CXXONLY },
> > > > -
> > > >/* C++ transactional memory.  */
> > > >{ "synchronized",  RID_SYNCHRONIZED, D_CXX_OBJC | D_TRANSMEM },
> > > >{ "atomic_noexcept",   RID_ATOMIC_NOEXCEPT, D_CXXONLY | 
> > > > D_TRANSMEM },
> > > > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > > > index 1fdba7ef3ea..051a442e0f4 100644
> > > > --- a/gcc/c-family/c-common.h
> > > > +++ b/gcc/c-family/c-common.h
> > > > @@ -168,11 +168,6 @@ enum rid
> > > >RID_BUILTIN_LAUNDER,
> > > >RID_BUILTIN_BIT_CAST,
> > > >
> > > > -#defi

Re: [PATCH v20 01/40] c++: Sort built-in traits alphabetically

2023-10-16 Thread Patrick Palka
On Mon, 16 Oct 2023, Ken Matsui wrote:

> On Mon, Oct 16, 2023 at 8:17 AM Patrick Palka  wrote:
> >
> > On Sun, 15 Oct 2023, Ken Matsui wrote:
> >
> > > This patch sorts built-in traits alphabetically for better code
> > > readability.
> >
> > Hmm, I'm not sure if we still want/need this change with this current
> > approach.  IIUC gperf would sort the trait names when generating the
> > hash table code, and so we wanted a more consistent mapping from the
> > cp-trait.def file to the generated code.  But with this current
> > non-gperf approach I'm inclined to leave the existing ordering alone
> > for sake of simplicity, and I kind of like that in cp-trait.def we
> > currently group all expression-yielding traits together and all
> > type-yielding traits together; that seems like a more natural layout
> > than plain alphabetical sorting.
> >
> 
> I see. But this patch is crucial for me to keep all my existing
> patches almost conflict-free against rebase, including drop, add, and
> edit like you suggested to split integral-related patches. Without
> this patch and alphabetical order, I will need to put a new trait in a
> random place not close to surrounding commits, as Git relates close
> lines when it finds conflicts. When I merged all my patches into one
> patch series, I needed to fix conflicts for all my patches almost
> every time I rebased. Both thinking of the random place and fixing the
> conflicts of all patches would definitely not be desirable. Would you
> think we should drop this patch?

Fair enough, I'm all for keeping this patch and alphabetizing then :)

> 
> > >
> > > gcc/cp/ChangeLog:
> > >
> > >   * constraint.cc (diagnose_trait_expr): Sort built-in traits
> > >   alphabetically.
> > >   * cp-trait.def: Likewise.
> > >   * semantics.cc (trait_expr_value): Likewise.
> > >   (finish_trait_expr): Likewise.
> > >   (finish_trait_type): Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * g++.dg/ext/has-builtin-1.C: Sort built-in traits alphabetically.
> > >
> > > Signed-off-by: Ken Matsui 
> > > ---
> > >  gcc/cp/constraint.cc | 68 -
> > >  gcc/cp/cp-trait.def  | 10 +--
> > >  gcc/cp/semantics.cc  | 94 
> > >  gcc/testsuite/g++.dg/ext/has-builtin-1.C | 70 +-
> > >  4 files changed, 121 insertions(+), 121 deletions(-)
> > >
> > > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > > index c9e4e7043cd..722fc334e6f 100644
> > > --- a/gcc/cp/constraint.cc
> > > +++ b/gcc/cp/constraint.cc
> > > @@ -3702,18 +3702,36 @@ diagnose_trait_expr (tree expr, tree args)
> > >  case CPTK_HAS_TRIVIAL_DESTRUCTOR:
> > >inform (loc, "  %qT is not trivially destructible", t1);
> > >break;
> > > +case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
> > > +  inform (loc, "  %qT does not have unique object representations", 
> > > t1);
> > > +  break;
> > >  case CPTK_HAS_VIRTUAL_DESTRUCTOR:
> > >inform (loc, "  %qT does not have a virtual destructor", t1);
> > >break;
> > >  case CPTK_IS_ABSTRACT:
> > >inform (loc, "  %qT is not an abstract class", t1);
> > >break;
> > > +case CPTK_IS_AGGREGATE:
> > > +  inform (loc, "  %qT is not an aggregate", t1);
> > > +  break;
> > > +case CPTK_IS_ASSIGNABLE:
> > > +  inform (loc, "  %qT is not assignable from %qT", t1, t2);
> > > +  break;
> > >  case CPTK_IS_BASE_OF:
> > >inform (loc, "  %qT is not a base of %qT", t1, t2);
> > >break;
> > >  case CPTK_IS_CLASS:
> > >inform (loc, "  %qT is not a class", t1);
> > >break;
> > > +case CPTK_IS_CONSTRUCTIBLE:
> > > +  if (!t2)
> > > +inform (loc, "  %qT is not default constructible", t1);
> > > +  else
> > > +inform (loc, "  %qT is not constructible from %qE", t1, t2);
> > > +  break;
> > > +case CPTK_IS_CONVERTIBLE:
> > > +  inform (loc, "  %qT is not convertible from %qE", t2, t1);
> > > +  break;
> > >  case CPTK_IS_EMPTY:
> > >inform (loc, "  %qT is not an empty class", t1);
> > >break;
> > > @@ -3729,6 +3747,18 @@ diagnose_trait_expr (tree expr, tree args)
> > >  case CPTK_IS_LITERAL_TYPE:
> > >inform (loc, "  %qT is not a literal type", t1);
> > >break;
> > > +case CPTK_IS_NOTHROW_ASSIGNABLE:
> > > +  inform (loc, "  %qT is not nothrow assignable from %qT", t1, t2);
> > > +  break;
> > > +case CPTK_IS_NOTHROW_CONSTRUCTIBLE:
> > > +  if (!t2)
> > > + inform (loc, "  %qT is not nothrow default constructible", t1);
> > > +  else
> > > + inform (loc, "  %qT is not nothrow constructible from %qE", t1, t2);
> > > +  break;
> > > +case CPTK_IS_NOTHROW_CONVERTIBLE:
> > > +   inform (loc, "  %qT is not nothrow convertible from %qE", t2, t1);
> > > +  break;
> > >  case CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF:
> > >

Re: PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding

2023-10-16 Thread Richard Sandiford
Prathamesh Kulkarni  writes:
> On Wed, 11 Oct 2023 at 16:57, Prathamesh Kulkarni
>  wrote:
>>
>> On Wed, 11 Oct 2023 at 16:42, Prathamesh Kulkarni
>>  wrote:
>> >
>> > On Mon, 9 Oct 2023 at 17:05, Richard Sandiford
>> >  wrote:
>> > >
>> > > Prathamesh Kulkarni  writes:
>> > > > Hi,
>> > > > The attached patch attempts to fix PR111648.
>> > > > As mentioned in the PR, the issue is that when a1 is a multiple of the vector
>> > > > length, we end up creating the following encoding in the result: { base_elem,
>> > > > arg[0], arg[1], ... } (assuming S = 1),
>> > > > where arg is chosen input vector, which is incorrect, since the
>> > > > encoding originally in arg would be: { arg[0], arg[1], arg[2], ... }
>> > > >
>> > > > For the test-case mentioned in PR, vectorizer pass creates
>> > > > VEC_PERM_EXPR where:
>> > > > arg0: { -16, -9, -10, -11 }
>> > > > arg1: { -12, -5, -6, -7 }
>> > > > sel = { 3, 4, 5, 6 }
>> > > >
>> > > > arg0, arg1 and sel are encoded with npatterns = 1 and 
>> > > > nelts_per_pattern = 3.
>> > > > Since a1 = 4 and arg_len = 4, it ended up creating the result with
>> > > > following encoding:
>> > > > res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, 
>> > > > nelts_per_pattern = 3
>> > > >   = { -11, -12, -5 }
>> > > >
>> > > > So for res[3], it used S = (-5) - (-12) = 7
>> > > > And hence computed it as -5 + 7 = 2.
>> > > > instead of selecting arg1[2], ie, -6.
>> > > >
>> > > > The patch tweaks valid_mask_for_fold_vec_perm_cst_p to punt if a1 is a
>> > > > multiple of the vector length, so that a1 ... ae select elements only
>> > > > from the stepped part of the pattern in the input vector, and to return
>> > > > false for this case.
>> > > >
>> > > > Since the vectors are VLS, fold_vec_perm_cst then sets:
>> > > > res_npatterns = res_nelts
>> > > > res_nelts_per_pattern  = 1
>> > > > which seems to fix the issue by encoding all the elements.
>> > > >
>> > > > The patch resulted in Case 4 and Case 5 failing from test_nunits_min_2 
>> > > > because
>> > > > they used sel = { 0, 0, 1, ... } and {len, 0, 1, ... } respectively,
>> > > > which used a1 = 0, and thus selected arg1[0].
>> > > >
>> > > > I removed Case 4 because it was already covered in test_nunits_min_4,
>> > > > and moved Case 5 to test_nunits_min_4, with sel = { len, 1, 2, ... }
>> > > > and added a new Case 9 to test for this issue.
>> > > >
>> > > > Passes bootstrap+test on aarch64-linux-gnu with and without SVE,
>> > > > and on x86_64-linux-gnu.
>> > > > Does the patch look OK ?
>> > > >
>> > > > Thanks,
>> > > > Prathamesh
>> > > >
>> > > > [PR111648] Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.
>> > > >
>> > > > gcc/ChangeLog:
>> > > >   PR tree-optimization/111648
>> > > >   * fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): Punt if a1
>> > > >   is a multiple of vector length.
>> > > >   (test_nunits_min_2): Remove Case 4 and move Case 5 to ...
>> > > >   (test_nunits_min_4): ... here and rename case numbers. Also add
>> > > >   Case 9.
>> > > >
>> > > > gcc/testsuite/ChangeLog:
>> > > >   PR tree-optimization/111648
>> > > >   * gcc.dg/vect/pr111648.c: New test.
>> > > >
>> > > >
>> > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
>> > > > index 4f8561509ff..c5f421d6b76 100644
>> > > > --- a/gcc/fold-const.cc
>> > > > +++ b/gcc/fold-const.cc
>> > > > @@ -10682,8 +10682,8 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> > > > return false;
>> > > >   }
>> > > >
>> > > > -  /* Ensure that the stepped sequence always selects from the same
>> > > > -  input pattern.  */
>> > > > +  /* Ensure that the stepped sequence always selects from the 
>> > > > stepped
>> > > > +  part of same input pattern.  */
>> > > >unsigned arg_npatterns
>> > > >   = ((q1 & 1) == 0) ? VECTOR_CST_NPATTERNS (arg0)
>> > > > : VECTOR_CST_NPATTERNS (arg1);
>> > > > @@ -10694,6 +10694,20 @@ valid_mask_for_fold_vec_perm_cst_p (tree arg0, tree arg1,
>> > > >   *reason = "step is not multiple of npatterns";
>> > > > return false;
>> > > >   }
>> > > > +
>> > > > +  /* If a1 is a multiple of len, it will select base element of 
>> > > > input
>> > > > +  vector resulting in following encoding:
>> > > > +  { base_elem, arg[0], arg[1], ... } where arg is the chosen input
>> > > > +  vector. This encoding is not originally present in arg, since 
>> > > > it's
>> > > > +  defined as:
>> > > > +  { arg[0], arg[1], arg[2], ... }.  */
>> > > > +
>> > > > +  if (multiple_p (a1, arg_len))
>> > > > + {
>> > > > +   if (reason)
>> > > > + *reason = "selecting base element of input vector";
>> > > > +   return false;
>> > > > + }
>> > >
>> > > That wouldn't catch (for example) cases where a1 == arg_len + 1 and the
>> > > second argument has 2 stepped patterns.
>> > Ah right, thanks for pointing out. In the attached patch I extended the 
>> > check
>> > so tha

Re: [PATCH v20 02/40] c-family, c++: Look up built-in traits via identifier node

2023-10-16 Thread Patrick Palka
On Mon, 16 Oct 2023, Ken Matsui wrote:

> On Mon, Oct 16, 2023 at 7:55 AM Patrick Palka  wrote:
> >
> > On Sun, 15 Oct 2023, Ken Matsui wrote:
> >
> > > Since RID_MAX soon reaches 255 and all built-in traits are used 
> > > approximately
> > > once in a C++ translation unit, this patch removes all RID values for 
> > > built-in
> > > traits and uses the identifier node to look up the specific trait.  Rather
> > > than holding traits as keywords, we set all trait identifiers as 
> > > cik_trait,
> > > which is a new cp_identifier_kind.  As cik_reserved_for_udlit was unused 
> > > and
> > > cp_identifier_kind is 3 bits, we replaced the unused field with the new
> > > cik_trait.  Also, the later patch handles a subsequent token to the 
> > > built-in
> > > identifier so that we accept the use of non-function-like built-in trait
> > > identifiers.
> >
> > Thanks, this looks great!  Some review comments below.
> >
> 
> Thank you so much for your review :)
> 
> > >
> > > gcc/c-family/ChangeLog:
> > >
> > >   * c-common.cc (c_common_reswords): Remove all mappings of
> > >   built-in traits.
> > >   * c-common.h (enum rid): Remove all RID values for built-in traits.
> > >
> > > gcc/cp/ChangeLog:
> > >
> > >   * cp-objcp-common.cc (names_builtin_p): Remove all RID value
> > >   cases for built-in traits.  Check for built-in traits via
> > >   the new cik_trait kind.
> > >   * cp-tree.h (enum cp_trait_kind): Set its underlying type to
> > >   addr_space_t.
> > >   (struct cp_trait): New struct to hold trait information.
> > >   (cp_traits): New array to hold a mapping to all traits.
> > >   (cik_reserved_for_udlit): Rename to ...
> > >   (cik_trait): ... this.
> > >   (IDENTIFIER_ANY_OP_P): Exclude cik_trait.
> > >   (IDENTIFIER_TRAIT_P): New macro to detect cik_trait.
> > >   * lex.cc (init_cp_traits): New function to set cik_trait for all
> > >   built-in trait identifiers.
> >
> > We should mention setting IDENTIFIER_CP_INDEX as well.
> >
> 
> Thank you!
> 
> > >   (cxx_init): Call init_cp_traits function.
> > >   * parser.cc (cp_traits): Define its values, declared in cp-tree.h.
> > >   (cp_lexer_lookup_trait): New function to look up a
> > >   built-in trait by IDENTIFIER_CP_INDEX.
> > >   (cp_lexer_lookup_trait_expr): Likewise, look up an
> > >   expression-yielding built-in trait.
> > >   (cp_lexer_lookup_trait_type): Likewise, look up a type-yielding
> > >   built-in trait.
> > >   (cp_keyword_starts_decl_specifier_p): Remove all RID value cases
> > >   for built-in traits.
> > >   (cp_lexer_next_token_is_decl_specifier_keyword): Handle
> > >   type-yielding built-in traits.
> > >   (cp_parser_primary_expression): Remove all RID value cases for
> > >   built-in traits.  Handle expression-yielding built-in traits.
> > >   (cp_parser_trait): Handle cp_trait instead of enum rid.
> > >   (cp_parser_simple_type_specifier): Remove all RID value cases
> > >   for built-in traits.  Handle type-yielding built-in traits.
> > >
> > > Co-authored-by: Patrick Palka 
> > > Signed-off-by: Ken Matsui 
> > > ---
> > >  gcc/c-family/c-common.cc  |   7 --
> > >  gcc/c-family/c-common.h   |   5 --
> > >  gcc/cp/cp-objcp-common.cc |   8 +--
> > >  gcc/cp/cp-tree.h  |  31 ++---
> > >  gcc/cp/lex.cc |  21 ++
> > >  gcc/cp/parser.cc  | 141 --
> > >  6 files changed, 139 insertions(+), 74 deletions(-)
> > >
> > > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > > index f044db5b797..21fd333ef57 100644
> > > --- a/gcc/c-family/c-common.cc
> > > +++ b/gcc/c-family/c-common.cc
> > > @@ -508,13 +508,6 @@ const struct c_common_resword c_common_reswords[] =
> > >{ "wchar_t",   RID_WCHAR,  D_CXXONLY },
> > >{ "while", RID_WHILE,  0 },
> > >
> > > -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> > > -  { NAME,RID_##CODE, D_CXXONLY },
> > > -#include "cp/cp-trait.def"
> > > -#undef DEFTRAIT
> > > -  /* An alias for __is_same.  */
> > > -  { "__is_same_as",  RID_IS_SAME,D_CXXONLY },
> > > -
> > >/* C++ transactional memory.  */
> > >{ "synchronized",  RID_SYNCHRONIZED, D_CXX_OBJC | D_TRANSMEM },
> > >{ "atomic_noexcept",   RID_ATOMIC_NOEXCEPT, D_CXXONLY | D_TRANSMEM 
> > > },
> > > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > > index 1fdba7ef3ea..051a442e0f4 100644
> > > --- a/gcc/c-family/c-common.h
> > > +++ b/gcc/c-family/c-common.h
> > > @@ -168,11 +168,6 @@ enum rid
> > >RID_BUILTIN_LAUNDER,
> > >RID_BUILTIN_BIT_CAST,
> > >
> > > -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> > > -  RID_##CODE,
> > > -#include "cp/cp-trait.def"
> > > -#undef DEFTRAIT
> > > -
> > >/* C++11 */
> > >RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, 
> > > RID_STATIC_ASSERT,
> > >
> > > diff --git a/gcc/cp/cp-objcp-common.cc b/gc

Re: [PATCH v20 26/40] libstdc++: Optimize is_object trait performance

2023-10-16 Thread Ken Matsui
On Mon, Oct 16, 2023 at 11:04 AM Patrick Palka  wrote:
>
> On Sun, 15 Oct 2023, Ken Matsui wrote:
>
> > This patch optimizes the performance of the is_object trait by dispatching 
> > to
> > the new __is_function and __is_reference built-in traits.
> >
> > libstdc++-v3/ChangeLog:
> >   * include/std/type_traits (is_object): Use __is_function and
> >   __is_reference built-in traits.
> >   (is_object_v): Likewise.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  libstdc++-v3/include/std/type_traits | 18 ++
> >  1 file changed, 18 insertions(+)
> >
> > diff --git a/libstdc++-v3/include/std/type_traits 
> > b/libstdc++-v3/include/std/type_traits
> > index bd57488824b..674d398c075 100644
> > --- a/libstdc++-v3/include/std/type_traits
> > +++ b/libstdc++-v3/include/std/type_traits
> > @@ -725,11 +725,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  { };
> >
> >/// is_object
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function) \
> > + && _GLIBCXX_USE_BUILTIN_TRAIT(__is_reference)
> > +  template<typename _Tp>
> > +    struct is_object
> > +    : public __bool_constant<!(__is_function(_Tp) || __is_reference(_Tp)
> > +                               || is_void<_Tp>::value)>
> > +    { };
>
> Since is_object is one of the more commonly used traits, we should
> probably just define a built-in for it.  (Either way we'd have to
> repeat the logic twice, either once in the frontend and once in
> the library, or twice in the library (is_object and is_object_v),
> so might as well do the more efficient approach).
>

Sure, I'll implement it :) Thank you for your review!

> > +#else
> >   template<typename _Tp>
> >     struct is_object
> >     : public __not_<__or_<is_function<_Tp>, is_reference<_Tp>,
> >                           is_void<_Tp>>>::type
> >     { };
> > +#endif
> >
> >   template<typename _Tp>
> >     struct is_member_pointer;
> > @@ -3305,8 +3314,17 @@ template <typename _Tp>
> >    inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value;
> >  template <typename _Tp>
> >    inline constexpr bool is_fundamental_v = is_fundamental<_Tp>::value;
> > +
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function) \
> > + && _GLIBCXX_USE_BUILTIN_TRAIT(__is_reference)
> > +template <typename _Tp>
> > +  inline constexpr bool is_object_v
> > += !(__is_function(_Tp) || __is_reference(_Tp) || is_void<_Tp>::value);
> > +#else
> >  template <typename _Tp>
> >    inline constexpr bool is_object_v = is_object<_Tp>::value;
> > +#endif
> > +
> >  template <typename _Tp>
> >    inline constexpr bool is_scalar_v = is_scalar<_Tp>::value;
> >  template <typename _Tp>
> > --
> > 2.42.0
> >
> >
>
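For context, a built-in along the lines Patrick suggests would let the library
spell both the class template and the variable template directly off one
expression instead of repeating the logic.  A self-contained sketch of that
shape (the __is_object built-in does not exist at this point in the series,
so a stand-in variable template is used; names are illustrative only):

#include <type_traits>

/* Stand-in for the proposed built-in (hypothetical): with a real
   __is_object built-in this expression would live in the compiler
   instead of being spelled once per entity below.  */
template<typename _Tp>
  inline constexpr bool is_object_impl_v
    = !(std::is_function<_Tp>::value || std::is_reference<_Tp>::value
        || std::is_void<_Tp>::value);

template<typename _Tp>
  struct my_is_object
  : public std::bool_constant<is_object_impl_v<_Tp>>
  { };

template<typename _Tp>
  inline constexpr bool my_is_object_v = is_object_impl_v<_Tp>;

static_assert (my_is_object_v<int>, "");
static_assert (!my_is_object_v<int&>, "");
static_assert (!my_is_object_v<void>, "");

int main () { }
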


Re: [PATCH v20 31/40] c++: Implement __is_arithmetic built-in trait

2023-10-16 Thread Ken Matsui
On Mon, Oct 16, 2023 at 10:16 AM Patrick Palka  wrote:
>
> On Sun, 15 Oct 2023, Ken Matsui wrote:
>
> > This patch implements built-in trait for std::is_arithmetic.
> >
> > gcc/cp/ChangeLog:
> >
> >   * cp-trait.def: Define __is_arithmetic.
> >   * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_ARITHMETIC.
> >   * semantics.cc (trait_expr_value): Likewise.
> >   (finish_trait_expr): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/ext/has-builtin-1.C: Test existence of __is_arithmetic.
> >   * g++.dg/ext/is_arithmetic.C: New test.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  gcc/cp/constraint.cc |  3 +++
> >  gcc/cp/cp-trait.def  |  1 +
> >  gcc/cp/semantics.cc  |  4 +++
> >  gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
> >  gcc/testsuite/g++.dg/ext/is_arithmetic.C | 33 
> >  5 files changed, 44 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.dg/ext/is_arithmetic.C
> >
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index c9d627fa782..3a7f968eae8 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -3714,6 +3714,9 @@ diagnose_trait_expr (tree expr, tree args)
> >  case CPTK_IS_AGGREGATE:
> >inform (loc, "  %qT is not an aggregate", t1);
> >break;
> > +case CPTK_IS_ARITHMETIC:
> > +  inform (loc, "  %qT is not an arithmetic type", t1);
> > +  break;
> >  case CPTK_IS_ARRAY:
> >inform (loc, "  %qT is not an array", t1);
> >break;
> > diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> > index c60724e869e..b2be7b7bbd7 100644
> > --- a/gcc/cp/cp-trait.def
> > +++ b/gcc/cp/cp-trait.def
> > @@ -59,6 +59,7 @@ DEFTRAIT_EXPR (HAS_UNIQUE_OBJ_REPRESENTATIONS, "__has_unique_object_representations", 1)
> >  DEFTRAIT_EXPR (HAS_VIRTUAL_DESTRUCTOR, "__has_virtual_destructor", 1)
> >  DEFTRAIT_EXPR (IS_ABSTRACT, "__is_abstract", 1)
> >  DEFTRAIT_EXPR (IS_AGGREGATE, "__is_aggregate", 1)
> > +DEFTRAIT_EXPR (IS_ARITHMETIC, "__is_arithmetic", 1)
> >  DEFTRAIT_EXPR (IS_ARRAY, "__is_array", 1)
> >  DEFTRAIT_EXPR (IS_ASSIGNABLE, "__is_assignable", 2)
> >  DEFTRAIT_EXPR (IS_BASE_OF, "__is_base_of", 2)
> > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> > index 83ed674b9d4..deab0134509 100644
> > --- a/gcc/cp/semantics.cc
> > +++ b/gcc/cp/semantics.cc
> > @@ -12143,6 +12143,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, 
> > tree type2)
> >  case CPTK_IS_AGGREGATE:
> >return CP_AGGREGATE_TYPE_P (type1);
> >
> > +case CPTK_IS_ARITHMETIC:
> > +  return ARITHMETIC_TYPE_P (type1);
> > +
>
> For built-ins corresponding to is_arithmetic and other standard traits
> defined in terms of it (e.g.  is_scalar, is_unsigned, is_signed,
> is_fundamental) we need to make sure we preserve their behavior for
> __int128, which IIUC is currently recognized as an integral type
> (according to std::is_integral) only in GNU mode.
>
> This'll probably be subtle to get right, so if you don't mind let's
> split out the work for those built-in traits into a separate patch
> series in order to ease review of the main patch series.
>

I agree. I am postponing work on integral-related traits, so isolating
non-ready-for-review patches makes a lot of sense.

> >  case CPTK_IS_ARRAY:
> >return type_code1 == ARRAY_TYPE;
> >
> > @@ -12406,6 +12409,7 @@ finish_trait_expr (location_t loc, cp_trait_kind 
> > kind, tree type1, tree type2)
> >   return error_mark_node;
> >break;
> >
> > +case CPTK_IS_ARITHMETIC:
> >  case CPTK_IS_ARRAY:
> >  case CPTK_IS_BOUNDED_ARRAY:
> >  case CPTK_IS_CLASS:
> > diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
> > b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> > index efce04fd09d..4bc85f4babb 100644
> > --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> > +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> > @@ -56,6 +56,9 @@
> >  #if !__has_builtin (__is_aggregate)
> >  # error "__has_builtin (__is_aggregate) failed"
> >  #endif
> > +#if !__has_builtin (__is_arithmetic)
> > +# error "__has_builtin (__is_arithmetic) failed"
> > +#endif
> >  #if !__has_builtin (__is_array)
> >  # error "__has_builtin (__is_array) failed"
> >  #endif
> > diff --git a/gcc/testsuite/g++.dg/ext/is_arithmetic.C 
> > b/gcc/testsuite/g++.dg/ext/is_arithmetic.C
> > new file mode 100644
> > index 000..fd35831f646
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/ext/is_arithmetic.C
> > @@ -0,0 +1,33 @@
> > +// { dg-do compile { target c++11 } }
> > +
> > +#include 
> > +
> > +using namespace __gnu_test;
> > +
> > +#define SA(X) static_assert((X),#X)
> > +#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)\
> > +  SA(TRAIT(TYPE) == EXPECT); \
> > +  SA(TRAIT(const TYPE) == EXPECT);   \
> > +  SA(TRAIT(volatile TYPE) == EXPECT);\
> > +  SA(TRAIT(const volatile TYPE) == E
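To illustrate the __int128 subtlety Patrick raises above: libstdc++ currently
reports __int128 as integral only in GNU mode, and is_arithmetic is defined in
terms of is_integral, so a built-in implementation has to track that
mode-dependence rather than hard-code a type list.  A minimal probe of the
invariant (hypothetical test, not part of the patch):

#include <type_traits>

int main ()
{
#ifdef __SIZEOF_INT128__
  /* In -std=gnu++NN modes libstdc++ treats __int128 as integral; in
     strict -std=c++NN modes it does not.  Either way, is_arithmetic must
     agree with is_integral for this type, so a built-in __is_arithmetic
     must reproduce the same mode-dependence.  */
  static_assert (std::is_arithmetic<__int128>::value
                 == std::is_integral<__int128>::value, "");
#endif
  return 0;
}
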

Re: [PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Robin Dapp
> +   if (live_range && flow_bb_inside_loop_p (loop, e->src))
> + {

Doesn't this match several cases more than before, i.e., set the range
start to zero fairly often?  I mean, if it works, fine with me, and
the code is easier to read.

Please split off the search for the non-contiguous load/stores into
a separate function still.  With that change it's OK from my side.

Regards
 Robin


Re: [PATCH v20 30/40] libstdc++: Optimize is_pointer trait performance

2023-10-16 Thread Ken Matsui
On Mon, Oct 16, 2023 at 9:36 AM Patrick Palka  wrote:
>
> On Sun, 15 Oct 2023, Ken Matsui wrote:
>
> > This patch optimizes the performance of the is_pointer trait by dispatching 
> > to
> > the new __is_pointer built-in trait.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/bits/cpp_type_traits.h (__is_pointer): Use __is_pointer
> >   built-in trait.
> >   * include/std/type_traits (is_pointer): Likewise. Optimize its
> >   implementation.
> >   (is_pointer_v): Likewise.
> >
> > Co-authored-by: Jonathan Wakely 
> > Signed-off-by: Ken Matsui 
> > ---
> >  libstdc++-v3/include/bits/cpp_type_traits.h |  8 
> >  libstdc++-v3/include/std/type_traits| 44 +
> >  2 files changed, 44 insertions(+), 8 deletions(-)
> >
> > diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
> > b/libstdc++-v3/include/bits/cpp_type_traits.h
> > index 4312f32a4e0..cd5ce45951f 100644
> > --- a/libstdc++-v3/include/bits/cpp_type_traits.h
> > +++ b/libstdc++-v3/include/bits/cpp_type_traits.h
> > @@ -363,6 +363,13 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
> >//
> >// Pointer types
> >//
> > +#if __has_builtin(__is_pointer)
>
> Why not _GLIBCXX_USE_BUILTIN_TRAIT?  LGTM besides this.
>

Oops, thank you for pointing this out :)

> > +  template<typename _Tp>
> > +struct __is_pointer : __truth_type<__is_pointer(_Tp)>
> > +{
> > +  enum { __value = __is_pointer(_Tp) };
> > +};
>
> Nice :D
>

Yeees! But I thought this might be very confusing for someone who does
not know the new built-in behavior.

> > +#else
> >   template<typename _Tp>
> >  struct __is_pointer
> >  {
> > @@ -376,6 +383,7 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
> >enum { __value = 1 };
> >typedef __true_type __type;
> >  };
> > +#endif
> >
> >//
> >// An arithmetic type is an integer type or a floating point type
> > diff --git a/libstdc++-v3/include/std/type_traits 
> > b/libstdc++-v3/include/std/type_traits
> > index 9c56d15c0b7..3acd843f2f2 100644
> > --- a/libstdc++-v3/include/std/type_traits
> > +++ b/libstdc++-v3/include/std/type_traits
> > @@ -542,19 +542,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
> >  : public true_type { };
> >  #endif
> >
> > -  template<typename _Tp>
> > -struct __is_pointer_helper
> > +  /// is_pointer
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > +  template<typename _Tp>
> > +struct is_pointer
> > +: public __bool_constant<__is_pointer(_Tp)>
> > +{ };
> > +#else
> > +  template<typename _Tp>
> > +struct is_pointer
> >  : public false_type { };
> >
> >template<typename _Tp>
> > -struct __is_pointer_helper<_Tp*>
> > +struct is_pointer<_Tp*>
> >  : public true_type { };
> >
> > -  /// is_pointer
> >template<typename _Tp>
> > -struct is_pointer
> > -: public __is_pointer_helper<__remove_cv_t<_Tp>>::type
> > -{ };
> > +struct is_pointer<_Tp* const>
> > +: public true_type { };
> > +
> > +  template<typename _Tp>
> > +struct is_pointer<_Tp* volatile>
> > +: public true_type { };
> > +
> > +  template<typename _Tp>
> > +struct is_pointer<_Tp* const volatile>
> > +: public true_type { };
> > +#endif
> >
> >/// is_lvalue_reference
> >template
> > @@ -3254,8 +3268,22 @@ template <typename _Tp, size_t _Num>
> >inline constexpr bool is_array_v<_Tp[_Num]> = true;
> >  #endif
> >
> > +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v = __is_pointer(_Tp);
> > +#else
> >  template <typename _Tp>
> > -  inline constexpr bool is_pointer_v = is_pointer<_Tp>::value;
> > +  inline constexpr bool is_pointer_v = false;
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v<_Tp*> = true;
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v<_Tp* const> = true;
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> > +template <typename _Tp>
> > +  inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> > +#endif
> > +
> >  template <typename _Tp>
> >inline constexpr bool is_lvalue_reference_v = false;
> >  template <typename _Tp>
> > --
> > 2.42.0
> >
> >
>
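On the naming overlap Ken mentions: the class template in cpp_type_traits.h
and the built-in share the spelling __is_pointer, and (per patch 02/40 in this
series) the identifier is treated as the built-in trait only when the next
token is '(' -- so __is_pointer<_Tp> still names the template while
__is_pointer(_Tp) reaches the built-in.  A guarded sketch of using the
built-in the same way (assumes a compiler that provides it; the probe type
name is illustrative):

#if defined(__has_builtin)
#if __has_builtin(__is_pointer)
/* Mirrors the shape of the library's __is_pointer class template; the
   parenthesized use in the initializer resolves to the built-in trait.  */
template<typename _Tp>
  struct ptr_probe
  {
    enum { value = __is_pointer(_Tp) };
  };

static_assert (ptr_probe<int *>::value, "");
static_assert (!ptr_probe<int>::value, "");
#endif
#endif

int main () { }
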


Re: [PATCH v20 01/40] c++: Sort built-in traits alphabetically

2023-10-16 Thread Ken Matsui
On Mon, Oct 16, 2023 at 8:17 AM Patrick Palka  wrote:
>
> On Sun, 15 Oct 2023, Ken Matsui wrote:
>
> > This patch sorts built-in traits alphabetically for better code
> > readability.
>
> Hmm, I'm not sure if we still want/need this change with this current
> approach.  IIUC gperf would sort the trait names when generating the
> hash table code, and so we wanted a more consistent mapping from the
> cp-trait.def file to the generated code.  But with this current
> non-gperf approach I'm inclined to leave the existing ordering alone
> for sake of simplicity, and I kind of like that in cp-trait.def we
> currently group all expression-yielding traits together and all
> type-yielding traits together; that seems like a more natural layout
> than plain alphabetical sorting.
>

I see. But this patch is crucial for me to keep all my existing
patches almost conflict-free against rebase, including drop, add, and
edit like you suggested to split integral-related patches. Without
this patch and alphabetical order, I will need to put a new trait in a
random place not close to surrounding commits, as Git relates close
lines when it finds conflicts. When I merged all my patches into one
patch series, I needed to fix conflicts for all my patches almost
every time I rebased. Both thinking of the random place and fixing the
conflicts of all patches would definitely not be desirable. Would you
think we should drop this patch?

> >
> > gcc/cp/ChangeLog:
> >
> >   * constraint.cc (diagnose_trait_expr): Sort built-in traits
> >   alphabetically.
> >   * cp-trait.def: Likewise.
> >   * semantics.cc (trait_expr_value): Likewise.
> >   (finish_trait_expr): Likewise.
> >   (finish_trait_type): Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * g++.dg/ext/has-builtin-1.C: Sort built-in traits alphabetically.
> >
> > Signed-off-by: Ken Matsui 
> > ---
> >  gcc/cp/constraint.cc | 68 -
> >  gcc/cp/cp-trait.def  | 10 +--
> >  gcc/cp/semantics.cc  | 94 
> >  gcc/testsuite/g++.dg/ext/has-builtin-1.C | 70 +-
> >  4 files changed, 121 insertions(+), 121 deletions(-)
> >
> > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> > index c9e4e7043cd..722fc334e6f 100644
> > --- a/gcc/cp/constraint.cc
> > +++ b/gcc/cp/constraint.cc
> > @@ -3702,18 +3702,36 @@ diagnose_trait_expr (tree expr, tree args)
> >  case CPTK_HAS_TRIVIAL_DESTRUCTOR:
> >inform (loc, "  %qT is not trivially destructible", t1);
> >break;
> > +case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
> > +  inform (loc, "  %qT does not have unique object representations", 
> > t1);
> > +  break;
> >  case CPTK_HAS_VIRTUAL_DESTRUCTOR:
> >inform (loc, "  %qT does not have a virtual destructor", t1);
> >break;
> >  case CPTK_IS_ABSTRACT:
> >inform (loc, "  %qT is not an abstract class", t1);
> >break;
> > +case CPTK_IS_AGGREGATE:
> > +  inform (loc, "  %qT is not an aggregate", t1);
> > +  break;
> > +case CPTK_IS_ASSIGNABLE:
> > +  inform (loc, "  %qT is not assignable from %qT", t1, t2);
> > +  break;
> >  case CPTK_IS_BASE_OF:
> >inform (loc, "  %qT is not a base of %qT", t1, t2);
> >break;
> >  case CPTK_IS_CLASS:
> >inform (loc, "  %qT is not a class", t1);
> >break;
> > +case CPTK_IS_CONSTRUCTIBLE:
> > +  if (!t2)
> > +inform (loc, "  %qT is not default constructible", t1);
> > +  else
> > +inform (loc, "  %qT is not constructible from %qE", t1, t2);
> > +  break;
> > +case CPTK_IS_CONVERTIBLE:
> > +  inform (loc, "  %qT is not convertible from %qE", t2, t1);
> > +  break;
> >  case CPTK_IS_EMPTY:
> >inform (loc, "  %qT is not an empty class", t1);
> >break;
> > @@ -3729,6 +3747,18 @@ diagnose_trait_expr (tree expr, tree args)
> >  case CPTK_IS_LITERAL_TYPE:
> >inform (loc, "  %qT is not a literal type", t1);
> >break;
> > +case CPTK_IS_NOTHROW_ASSIGNABLE:
> > +  inform (loc, "  %qT is not nothrow assignable from %qT", t1, t2);
> > +  break;
> > +case CPTK_IS_NOTHROW_CONSTRUCTIBLE:
> > +  if (!t2)
> > + inform (loc, "  %qT is not nothrow default constructible", t1);
> > +  else
> > + inform (loc, "  %qT is not nothrow constructible from %qE", t1, t2);
> > +  break;
> > +case CPTK_IS_NOTHROW_CONVERTIBLE:
> > +   inform (loc, "  %qT is not nothrow convertible from %qE", t2, t1);
> > +  break;
> >  case CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF:
> >inform (loc, "  %qT is not pointer-interconvertible base of %qT",
> > t1, t2);
> > @@ -3748,50 +3778,20 @@ diagnose_trait_expr (tree expr, tree args)
> >  case CPTK_IS_TRIVIAL:
> >inform (loc, "  %qT is not a trivial type", t1);
> >break;
> > -case CPTK_IS_UNION:
> > -  inform (loc, "  %qT is 

[committed] RISC-V: NFC: Move scalar block move expansion code into riscv-string.cc

2023-10-16 Thread Jeff Law


This just moves a few functions out of riscv.cc into riscv-string.cc in 
an attempt to keep riscv.cc manageable.  This was originally Christoph's 
code and I'm just pushing it on his behalf.


Full disclosure: I built rv64gc after changing to verify everything 
still builds.  Given it was just lifting code from one place to another, 
I didn't run the testsuite.


Jeff
commit 328745607c5d403a1c7b6bc2ecaa1574ee42122f
Author: Christoph Müllner 
Date:   Mon Oct 16 13:57:43 2023 -0600

RISC-V: NFC: Move scalar block move expansion code into riscv-string.cc

This just moves a few functions out of riscv.cc into riscv-string.cc in an
attempt to keep riscv.cc manageable.  This was originally Christoph's code 
and
I'm just pushing it on his behalf.

Full disclosure: I built rv64gc after changing to verify everything still
builds.  Given it was just lifting code from one place to another, I didn't 
run
the testsuite.

gcc/
* config/riscv/riscv-protos.h (emit_block_move): Remove redundant
prototype.  Improve comment.
* config/riscv/riscv.cc (riscv_block_move_straight): Move from 
riscv.cc
into riscv-string.cc.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.
* config/riscv/riscv-string.cc (riscv_block_move_straight): Add 
moved
function.
(riscv_adjust_block_mem, riscv_block_move_loop): Likewise.
(riscv_expand_block_move): Likewise.

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 49bdcdf2f93..6190faab501 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -117,7 +117,6 @@ extern rtx riscv_emit_binary (enum rtx_code code, rtx dest, 
rtx x, rtx y);
 extern bool riscv_expand_conditional_move (rtx, rtx, rtx, rtx);
 extern rtx riscv_legitimize_call_address (rtx);
 extern void riscv_set_return_address (rtx, rtx);
-extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern rtx riscv_return_addr (int, rtx);
 extern poly_int64 riscv_initial_elimination_offset (int, int);
 extern void riscv_expand_prologue (void);
@@ -125,7 +124,6 @@ extern void riscv_expand_epilogue (int);
 extern bool riscv_epilogue_uses (unsigned int);
 extern bool riscv_can_use_return_insn (void);
 extern rtx riscv_function_value (const_tree, const_tree, enum machine_mode);
-extern bool riscv_expand_block_move (rtx, rtx, rtx);
 extern bool riscv_store_data_bypass_p (rtx_insn *, rtx_insn *);
 extern rtx riscv_gen_gpr_save_insn (struct riscv_frame_info *);
 extern bool riscv_gpr_save_operation_p (rtx);
@@ -160,6 +158,9 @@ extern bool riscv_hard_regno_rename_ok (unsigned, unsigned);
 rtl_opt_pass * make_pass_shorten_memrefs (gcc::context *ctxt);
 rtl_opt_pass * make_pass_vsetvl (gcc::context *ctxt);
 
+/* Routines implemented in riscv-string.c.  */
+extern bool riscv_expand_block_move (rtx, rtx, rtx);
+
 /* Information about one CPU we know about.  */
 struct riscv_cpu_info {
   /* This CPU's canonical name.  */
diff --git a/gcc/config/riscv/riscv-string.cc b/gcc/config/riscv/riscv-string.cc
index 2bdff0374e8..0b4606aa7b2 100644
--- a/gcc/config/riscv/riscv-string.cc
+++ b/gcc/config/riscv/riscv-string.cc
@@ -592,3 +592,158 @@ riscv_expand_strlen (rtx result, rtx src, rtx search_char, rtx align)
 
   return false;
 }
+
+/* Emit straight-line code to move LENGTH bytes from SRC to DEST.
+   Assume that the areas do not overlap.  */
+
+static void
+riscv_block_move_straight (rtx dest, rtx src, unsigned HOST_WIDE_INT length)
+{
+  unsigned HOST_WIDE_INT offset, delta;
+  unsigned HOST_WIDE_INT bits;
+  int i;
+  enum machine_mode mode;
+  rtx *regs;
+
+  bits = MAX (BITS_PER_UNIT,
+	      MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))));
+
+  mode = mode_for_size (bits, MODE_INT, 0).require ();
+  delta = bits / BITS_PER_UNIT;
+
+  /* Allocate a buffer for the temporary registers.  */
+  regs = XALLOCAVEC (rtx, length / delta);
+
+  /* Load as many BITS-sized chunks as possible.  Use a normal load if
+ the source has enough alignment, otherwise use left/right pairs.  */
+  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+{
+  regs[i] = gen_reg_rtx (mode);
+  riscv_emit_move (regs[i], adjust_address (src, mode, offset));
+}
+
+  /* Copy the chunks to the destination.  */
+  for (offset = 0, i = 0; offset + delta <= length; offset += delta, i++)
+riscv_emit_move (adjust_address (dest, mode, offset), regs[i]);
+
+  /* Mop up any left-over bytes.  */
+  if (offset < length)
+{
+  src = adjust_address (src, BLKmode, offset);
+  dest = adjust_address (dest, BLKmode, offset);
+  move_by_pieces (dest, src, length - offset,
+ MIN (MEM_ALIGN (src), MEM_ALIGN (dest)), RETURN_BEGIN);
+}
+}
+
+/* Helper function for doing a loop-based block operation on memory
+   reference MEM.  Each itera

Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Fangrui Song
On Mon, Oct 16, 2023 at 12:10 PM Uros Bizjak  wrote:
>
> On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song  wrote:
> >
> > On 2023-10-16, Uros Bizjak wrote:
> > >On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song  wrote:
> > >>
> > >> When using -mcmodel=medium, large data objects larger than the
> > >> -mlarge-data-threshold threshold are placed into large data sections
> > >> (.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
> > >> .l* sections into separate output sections.  If small and medium code
> > >> model object files are mixed, the .l* sections won't exert relocation
> > >> overflow pressure on sections in object files built with -mcmodel=small.
> > >>
> > >> However, when using -mcmodel=large, -mlarge-data-threshold doesn't
> > >> apply.  This means that the .rodata/.data/.bss sections may exert
> > >> relocation overflow pressure on sections in -mcmodel=small object files.
> > >>
> > >> This patch allows -mcmodel=large to generate .l* sections and drops an
> > >> unneeded documentation restriction that the value must be the same.
> > >>
> > >> Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> > >> ("Large data sections for the large code model")
> > >>
> > >> Signed-off-by: Fangrui Song 
> > >>
> > >> ---
> > >> Changes from v1 
> > >> (https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
> > >> * Clarify commit message. Add link to 
> > >> https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> > >>
> > >> Changes from v2
> > >> * Drop an uneeded limitation in the documentation.
> > >>
> > >> Changes from v3
> > >> * Change scan-assembler directives to use \. to match literal .
> > >> ---
> > >>  gcc/config/i386/i386.cc| 15 +--
> > >>  gcc/config/i386/i386.opt   |  2 +-
> > >>  gcc/doc/invoke.texi|  6 +++---
> > >>  gcc/testsuite/gcc.target/i386/large-data.c | 13 +
> > >>  4 files changed, 26 insertions(+), 10 deletions(-)
> > >>  create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c
> > >>
> > >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> > >> index eabc70011ea..37e810cc741 100644
> > >> --- a/gcc/config/i386/i386.cc
> > >> +++ b/gcc/config/i386/i386.cc
> > >> @@ -647,7 +647,8 @@ ix86_can_inline_p (tree caller, tree callee)
> > >>  static bool
> > >>  ix86_in_large_data_p (tree exp)
> > >>  {
> > >> -  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
> > >> +  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
> > >> +  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
> > >
> > >Please split multi-line expression before the operator, not after it,
> > >as instructed in GNU Coding Standards [1] ...
> > >
> > >[1] https://www.gnu.org/prep/standards/html_node/Formatting.html
> > >
> > >>  return false;
> > >>
> > >>if (exp == NULL_TREE)
> > >> @@ -858,8 +859,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
> > >> const char *name, unsigned HOST_WIDE_INT size,
> > >> unsigned align)
> > >>  {
> > >> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> > >> -  && size > (unsigned int)ix86_section_threshold)
> > >> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> > >> +  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> > >> + size > (unsigned int)ix86_section_threshold)
> > >
> > >... also here ...
> > >
> > >>  {
> > >>switch_to_section (get_named_section (decl, ".lbss", 0));
> > >>fputs (LARGECOMM_SECTION_ASM_OP, file);
> > >> @@ -879,9 +881,10 @@ void
> > >>  x86_output_aligned_bss (FILE *file, tree decl, const char *name,
> > >> unsigned HOST_WIDE_INT size, unsigned align)
> > >>  {
> > >> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> > >> -  && size > (unsigned int)ix86_section_threshold)
> > >> -switch_to_section (get_named_section (decl, ".lbss", 0));
> > >> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> > >> +   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> > >> +  size > (unsigned int)ix86_section_threshold)
> > >
> > >... and here.
> > >
> > >OK with these formatting changes.
> > >
> > >Thanks,
> > >Uros.
> >
> > Thank you for the review!
> > Posted PATCH v5 
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633153.html
> > with the formatting.
> >
> > I don't have write access to the gcc repository:)
>
> Please provide the ChangeLog entry (see [1,2]) and I'll commit the
> patch for you.
>
> [1] https://gcc.gnu.org/contribute.html
> [2] https://gcc.gnu.org/codingconventions.html#ChangeLogs
>
> Thanks,
> Uros.

It's so kind of you! Attached the ChangeLog:

gcc/
* config/i386/i386.cc (ix86_can_inline_p):
Handle CM_LARGE and CM_LARGE_PIC.
(x86_elf_aligned_decl_common): Ditto.
(x86_output_aligned_bss): Ditto.
* config/i386/i386.opt: Update d

Re: [PATCH v20 02/40] c-family, c++: Look up built-in traits via identifier node

2023-10-16 Thread Ken Matsui
On Mon, Oct 16, 2023 at 7:55 AM Patrick Palka  wrote:
>
> On Sun, 15 Oct 2023, Ken Matsui wrote:
>
> > Since RID_MAX soon reaches 255 and all built-in traits are used 
> > approximately
> > once in a C++ translation unit, this patch removes all RID values for 
> > built-in
> > traits and uses the identifier node to look up the specific trait.  Rather
> > than holding traits as keywords, we set all trait identifiers as cik_trait,
> > which is a new cp_identifier_kind.  As cik_reserved_for_udlit was unused and
> > cp_identifier_kind is 3 bits, we replaced the unused field with the new
> > cik_trait.  Also, the later patch handles a subsequent token to the built-in
> > identifier so that we accept the use of non-function-like built-in trait
> > identifiers.
>
> Thanks, this looks great!  Some review comments below.
>

Thank you so much for your review :)
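A rough sketch of the lookup scheme described above (simplified shapes, not
the actual GCC code): each built-in trait identifier carries the cik_trait
kind plus an index into one shared table, so the parser can recover the trait
without spending an RID keyword per trait.  In GCC the kind bits live in the
identifier node and the index is stored via IDENTIFIER_CP_INDEX (per the
ChangeLog below); everything else here is an illustrative stand-in:

#include <cstdint>

enum cp_identifier_kind : std::uint8_t { cik_normal, cik_keyword, cik_trait };

struct cp_trait_info
{
  const char *name;
  int arity;
  bool type_yielding;   /* type-yielding vs expression-yielding trait */
};

/* One static table for all built-in traits (contents illustrative).  */
static const cp_trait_info cp_traits_table[] = {
  { "__is_same", 2, false },
  { "__remove_cv", 1, true },
};

struct identifier
{
  cp_identifier_kind kind;
  std::uint16_t cp_index;   /* index into cp_traits_table when kind == cik_trait */
};

/* What cp_lexer_lookup_trait boils down to in this model.  */
static const cp_trait_info *
lookup_trait (const identifier &id)
{
  return id.kind == cik_trait ? &cp_traits_table[id.cp_index] : nullptr;
}

int main ()
{
  identifier is_same{ cik_trait, 0 };
  return lookup_trait (is_same) ? 0 : 1;
}
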

> >
> > gcc/c-family/ChangeLog:
> >
> >   * c-common.cc (c_common_reswords): Remove all mappings of
> >   built-in traits.
> >   * c-common.h (enum rid): Remove all RID values for built-in traits.
> >
> > gcc/cp/ChangeLog:
> >
> >   * cp-objcp-common.cc (names_builtin_p): Remove all RID value
> >   cases for built-in traits.  Check for built-in traits via
> >   the new cik_trait kind.
> >   * cp-tree.h (enum cp_trait_kind): Set its underlying type to
> >   addr_space_t.
> >   (struct cp_trait): New struct to hold trait information.
> >   (cp_traits): New array to hold a mapping to all traits.
> >   (cik_reserved_for_udlit): Rename to ...
> >   (cik_trait): ... this.
> >   (IDENTIFIER_ANY_OP_P): Exclude cik_trait.
> >   (IDENTIFIER_TRAIT_P): New macro to detect cik_trait.
> >   * lex.cc (init_cp_traits): New function to set cik_trait for all
> >   built-in trait identifiers.
>
> We should mention setting IDENTIFIER_CP_INDEX as well.
>

Thank you!

> >   (cxx_init): Call init_cp_traits function.
> >   * parser.cc (cp_traits): Define its values, declared in cp-tree.h.
> >   (cp_lexer_lookup_trait): New function to look up a
> >   built-in trait by IDENTIFIER_CP_INDEX.
> >   (cp_lexer_lookup_trait_expr): Likewise, look up an
> >   expression-yielding built-in trait.
> >   (cp_lexer_lookup_trait_type): Likewise, look up a type-yielding
> >   built-in trait.
> >   (cp_keyword_starts_decl_specifier_p): Remove all RID value cases
> >   for built-in traits.
> >   (cp_lexer_next_token_is_decl_specifier_keyword): Handle
> >   type-yielding built-in traits.
> >   (cp_parser_primary_expression): Remove all RID value cases for
> >   built-in traits.  Handle expression-yielding built-in traits.
> >   (cp_parser_trait): Handle cp_trait instead of enum rid.
> >   (cp_parser_simple_type_specifier): Remove all RID value cases
> >   for built-in traits.  Handle type-yielding built-in traits.
> >
> > Co-authored-by: Patrick Palka 
> > Signed-off-by: Ken Matsui 
> > ---
> >  gcc/c-family/c-common.cc  |   7 --
> >  gcc/c-family/c-common.h   |   5 --
> >  gcc/cp/cp-objcp-common.cc |   8 +--
> >  gcc/cp/cp-tree.h  |  31 ++---
> >  gcc/cp/lex.cc |  21 ++
> >  gcc/cp/parser.cc  | 141 --
> >  6 files changed, 139 insertions(+), 74 deletions(-)
> >
> > diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> > index f044db5b797..21fd333ef57 100644
> > --- a/gcc/c-family/c-common.cc
> > +++ b/gcc/c-family/c-common.cc
> > @@ -508,13 +508,6 @@ const struct c_common_resword c_common_reswords[] =
> >{ "wchar_t",   RID_WCHAR,  D_CXXONLY },
> >{ "while", RID_WHILE,  0 },
> >
> > -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> > -  { NAME,RID_##CODE, D_CXXONLY },
> > -#include "cp/cp-trait.def"
> > -#undef DEFTRAIT
> > -  /* An alias for __is_same.  */
> > -  { "__is_same_as",  RID_IS_SAME,D_CXXONLY },
> > -
> >/* C++ transactional memory.  */
> >{ "synchronized",  RID_SYNCHRONIZED, D_CXX_OBJC | D_TRANSMEM },
> >{ "atomic_noexcept",   RID_ATOMIC_NOEXCEPT, D_CXXONLY | D_TRANSMEM },
> > diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> > index 1fdba7ef3ea..051a442e0f4 100644
> > --- a/gcc/c-family/c-common.h
> > +++ b/gcc/c-family/c-common.h
> > @@ -168,11 +168,6 @@ enum rid
> >RID_BUILTIN_LAUNDER,
> >RID_BUILTIN_BIT_CAST,
> >
> > -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> > -  RID_##CODE,
> > -#include "cp/cp-trait.def"
> > -#undef DEFTRAIT
> > -
> >/* C++11 */
> >RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, 
> > RID_STATIC_ASSERT,
> >
> > diff --git a/gcc/cp/cp-objcp-common.cc b/gcc/cp/cp-objcp-common.cc
> > index 93b027b80ce..b1adacfec07 100644
> > --- a/gcc/cp/cp-objcp-common.cc
> > +++ b/gcc/cp/cp-objcp-common.cc
> > @@ -421,6 +421,10 @@ names_builtin_p (const char *name)
> >   }
> >  }
> >
> > +  /* Check for built-in traits.  */
> > +  if (IDENTIFIER_TRAIT_P 

[COMMITTED] RISC-V/testsuite: add a default march (lacking zfa) to some fp tests

2023-10-16 Thread Vineet Gupta
A bunch of FP tests expecting specific FP asm output fail when built
with zfa because different insns are generated. And this happens
because those tests don't have an explicit -march and the default
used to configure gcc could end up including zfa, causing false failures.

Fix that by adding the -march explicitly which doesn't have zfa.

BTW it seems we have some duplication in tests for zfa and non-zfa and
it would have been better if they were consolidated, but oh well.

gcc/testsuite:
* gcc.target/riscv/fle-ieee.c: Updates dg-options with
explicit -march=rv64gc and -march=rv32gc.
* gcc.target/riscv/fle-snan.c: Ditto.
* gcc.target/riscv/fle.c: Ditto.
* gcc.target/riscv/flef-ieee.c: Ditto.
* gcc.target/riscv/flef.c: Ditto.
* gcc.target/riscv/flef-snan.c: Ditto.
* gcc.target/riscv/flt-ieee.c: Ditto.
* gcc.target/riscv/flt-snan.c: Ditto.
* gcc.target/riscv/fltf-ieee.c: Ditto.
* gcc.target/riscv/fltf-snan.c: Ditto.

Signed-off-by: Vineet Gupta 
---
 gcc/testsuite/gcc.target/riscv/fle-ieee.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fle-snan.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fle.c   | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef-ieee.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef-snan.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/flef.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/flt-ieee.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/flt-snan.c  | 3 ++-
 gcc/testsuite/gcc.target/riscv/fltf-ieee.c | 3 ++-
 gcc/testsuite/gcc.target/riscv/fltf-snan.c | 3 ++-
 10 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/fle-ieee.c 
b/gcc/testsuite/gcc.target/riscv/fle-ieee.c
index e55331f925d6..12d04514ca29 100644
--- a/gcc/testsuite/gcc.target/riscv/fle-ieee.c
+++ b/gcc/testsuite/gcc.target/riscv/fle-ieee.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/fle-snan.c 
b/gcc/testsuite/gcc.target/riscv/fle-snan.c
index f40bb2cbf662..146b7866e888 100644
--- a/gcc/testsuite/gcc.target/riscv/fle-snan.c
+++ b/gcc/testsuite/gcc.target/riscv/fle-snan.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fsignaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-ftrapping-math -fsignaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/fle.c 
b/gcc/testsuite/gcc.target/riscv/fle.c
index 97c8ab9ad864..2379e22d5062 100644
--- a/gcc/testsuite/gcc.target/riscv/fle.c
+++ b/gcc/testsuite/gcc.target/riscv/fle.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -fno-trapping-math -fno-signaling-nans" 
} */
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-fno-trapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32d -fno-finite-math-only 
-fno-trapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 fle (double x, double y)
diff --git a/gcc/testsuite/gcc.target/riscv/flef-ieee.c 
b/gcc/testsuite/gcc.target/riscv/flef-ieee.c
index f3e7e7d75d6c..b6ee6ed08a4d 100644
--- a/gcc/testsuite/gcc.target/riscv/flef-ieee.c
+++ b/gcc/testsuite/gcc.target/riscv/flef-ieee.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fno-signaling-nans" } 
*/
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32f -fno-finite-math-only 
-ftrapping-math -fno-signaling-nans" { target { rv32 } } } */
 
 long
 flef (float x, float y)
diff --git a/gcc/testsuite/gcc.target/riscv/flef-snan.c 
b/gcc/testsuite/gcc.target/riscv/flef-snan.c
index ef75b3523057..e8611e8c0215 100644
--- a/gcc/testsuite/gcc.target/riscv/flef-snan.c
+++ b/gcc/testsuite/gcc.target/riscv/flef-snan.c
@@ -1,6 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target hard_float } */
-/* { dg-options "-fno-finite-math-only -ftrapping-math -fsignaling-nans" } */
+/* { dg-options "-march=rv64gc -mabi=lp64d  -fno-finite-math-only 
-ftrapping-math -fsignaling-nans" { target { rv64 } } } */
+/* { dg-options "-march=rv32gc -mabi=ilp32f 

Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in make_compound_operation.

2023-10-16 Thread Jeff Law




On 10/15/23 03:49, Roger Sayle wrote:


Hi Jeff,
Thanks for the speedy review(s).


From: Jeff Law 
Sent: 15 October 2023 00:03
To: Roger Sayle ; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] PR 91865: Avoid ZERO_EXTEND of ZERO_EXTEND in
make_compound_operation.

On 10/14/23 16:14, Roger Sayle wrote:


This patch is my proposed solution to PR rtl-optimization/91865.
Normally RTX simplification canonicalizes a ZERO_EXTEND of a
ZERO_EXTEND to a single ZERO_EXTEND, but as shown in this PR it is
possible for combine's make_compound_operation to unintentionally
generate a non-canonical ZERO_EXTEND of a ZERO_EXTEND, which is
unlikely to be matched by the backend.

For the new test case:

const int table[2] = {1, 2};
int foo (char i) { return table[i]; }

compiling with -O2 -mlarge on msp430 we currently see:

Trying 2 -> 7:
  2: r25:HI=zero_extend(R12:QI)
REG_DEAD R12:QI
  7: r28:PSI=sign_extend(r25:HI)#0
REG_DEAD r25:HI
Failed to match this instruction:
(set (reg:PSI 28 [ iD.1772 ])
  (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]))))

which results in the following code:

foo:AND #0xff, R12
  RLAM.A #4, R12 { RRAM.A #4, R12
  RLAM.A  #1, R12
  MOVX.W  table(R12), R12
  RETA

With this patch, we now see:

Trying 2 -> 7:
  2: r25:HI=zero_extend(R12:QI)
REG_DEAD R12:QI
  7: r28:PSI=sign_extend(r25:HI)#0
REG_DEAD r25:HI
Successfully matched this instruction:
(set (reg:PSI 28 [ iD.1772 ])
  (zero_extend:PSI (reg:QI 12 R12 [ iD.1772 ]))) allowing
combination of insns 2 and 7 original costs 4 + 8 = 12 replacement
cost 8

foo:MOV.B   R12, R12
  RLAM.A  #1, R12
  MOVX.W  table(R12), R12
  RETA


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2023-10-14  Roger Sayle  

gcc/ChangeLog
  PR rtl-optimization/91865
  * combine.cc (make_compound_operation): Avoid creating a
  ZERO_EXTEND of a ZERO_EXTEND.

gcc/testsuite/ChangeLog
  PR rtl-optimization/91865
  * gcc.target/msp430/pr91865.c: New test case.

Neither an ACK or NAK at this point.

The bug report includes a patch from Segher which purports to fix this in 
simplify-
rtx.  Any thoughts on Segher's approach and whether or not it should be
considered?

The BZ also indicates that removal of 2 patterns from msp430.md would solve this
too (though it may cause regressions elsewhere?).  Any thoughts on that approach
as well?



Great questions.  I believe Segher's proposed patch (in comment #4) was an
msp430-specific proof-of-concept workaround rather than intended to be a fix.
Eliminating a ZERO_EXTEND simply by changing the mode of a hard register
is not a solution that'll work on many platforms (and therefore not really 
suitable
for target-independent middle-end code in the RTL optimizers).
Thanks.  I didn't really look at Segher's patch, so thanks for digging 
into it.  Certainly just flipping the mode of the hard register isn't 
correct.





The underlying issue, which is applicable to all targets, is that combine.cc's
make_compound_operation is expected to reverse the local transformations
made by expand_compound_operation.  Hence, if an RTL expression is
canonical going into expand_compound_operation, it is expected (hoped)
to be canonical (and equivalent) coming out of make_compound_operation.

In theory, correct.




Hence, rather than be a MSP430 specific issue, no target should expect (or
be expected to see) a ZERO_EXTEND of a ZERO_EXTEND, or a SIGN_EXTEND
of a ZERO_EXTEND in the RTL stream.  Much like a binary operator with two
CONST_INT operands, or a shift by zero, it's something the middle-end might
reasonably be expected to clean-up. [Yeah, I know... 😊]

Agreed.






(set (reg:PSI 28 [ iD.1772 ])
  (zero_extend:PSI (zero_extend:HI (reg:QI 12 R12 [ iD.1772 ]


As a rule of thumb, if the missed optimization bug report includes combine's
diagnostic "Failed to match this instruction:", things can be improved by adding
a pattern (often a define_insn_and_split) that matches the shown RTL.
Yes, but we also need to ponder if that's the right way to fix any given 
problem.  Sometimes we're going to be better off simplifying elsewhere 
in the pipeline.  I think we can agree this is one of the cases where 
matching the RTL in the backend is not the desired approach.


Occasionally things like those two patterns show up for various reasons. 
 Hopefully they can be removed :-)  Though the first looks awful close 
to something I did for the mn102 (not to be confused with the mn103) 
eons ago.  Partial modes aren't exactly handled well.




In this case the perhaps reasonable assumption is that the above would/should
(normally) match the backend's existing (zero_extend:PSI (reg:QI ...)) insn 
pattern.
Or that's my understanding of why this PR is classified as

Re: [PATCH] RISC-V/testsuite: add a default march (lacking zfa) to some fp tests

2023-10-16 Thread Jeff Law




On 10/15/23 12:16, Vineet Gupta wrote:

A bunch of FP tests expecting specific FP asm output fail when built
with zfa because different insns are generated. And this happens
because those tests don't have an explicit -march and the default
used to configure gcc could end up including zfa, causing false failures.

Fix that by adding the -march explicitly which doesn't have zfa.

BTW it seems we have some duplication in tests for zfa and non-zfa and
it would have been better if they were consolidated, but oh well.

gcc/testsuite:
* gcc.target/riscv/fle-ieee.c: Updates dg-options with
explicit -march=rv64gc and -march=rv43gc.
* gcc.target/riscv/fle-snan.c: Ditto.
* gcc.target/riscv/fle.c: Ditto.
* gcc.target/riscv/flef-ieee.c: Ditto.
* gcc.target/riscv/flef.c: Ditto.
* gcc.target/riscv/flef-snan.c: Ditto.
* gcc.target/riscv/flt-ieee.c: Ditto.
* gcc.target/riscv/flt-snan.c: Ditto.
* gcc.target/riscv/fltf-ieee.c: Ditto.
* gcc.target/riscv/fltf-snan.c: Ditto.

Signed-off-by: Vineet Gupta 

OK with the "rv43" -> "rv32" typo fixed in the ChangeLog.

Jeff


[PATCH] Fortran: out of bounds access with nested implied-do IO [PR111837]

2023-10-16 Thread Harald Anlauf
Dear All,

the attached patch fixes a dependency check in front-end optimization
for nested implied-do IO.  The problem appeared only for >= 3 loops,
as the check considered dependencies to be only of band form instead
of triangular form.
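
Schematically, the one-line change below turns a band-shaped sweep (only the
immediately preceding loop) into a triangular one (all preceding loops); an
illustrative C++ rendering of the loop shapes, not the actual gfc code:

struct iter_info { /* loop variable, start/end/step expressions */ };

static bool
var_in_bounds (const iter_info &, const iter_info &)
{
  return false;   /* placeholder for the var_in_expr checks */
}

/* iters holds the implied-do loop controls.  The variable of loop i may
   appear in the bounds of any of the loops j < i, so every such pair
   needs checking -- a triangular sweep, not just the adjacent pair.  */
static bool
nest_has_dependency (const iter_info *iters, int n)
{
  for (int i = 0; i < n; i++)
    for (int j = 0; j < i; j++)   /* was: for (int j = i - 1; j < i; j++) */
      if (var_in_bounds (iters[i], iters[j]))
	return true;
  return false;
}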

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

As this fixes a regression since 8-release, I plan to backport
to all active branches.

Thanks,
Harald

From 43ec8b856a67a1b70744e5c0d50ea7fa2dd9a8ee Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 16 Oct 2023 21:02:20 +0200
Subject: [PATCH] Fortran: out of bounds access with nested implied-do IO
 [PR111837]

gcc/fortran/ChangeLog:

	PR fortran/111837
	* frontend-passes.cc (traverse_io_block): Dependency check of loop
	nest shall be triangular, not banded.

gcc/testsuite/ChangeLog:

	PR fortran/111837
	* gfortran.dg/implied_do_io_8.f90: New test.
---
 gcc/fortran/frontend-passes.cc|  2 +-
 gcc/testsuite/gfortran.dg/implied_do_io_8.f90 | 18 ++
 2 files changed, 19 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/implied_do_io_8.f90

diff --git a/gcc/fortran/frontend-passes.cc b/gcc/fortran/frontend-passes.cc
index 136a292807d..536884b13f0 100644
--- a/gcc/fortran/frontend-passes.cc
+++ b/gcc/fortran/frontend-passes.cc
@@ -1326,7 +1326,7 @@ traverse_io_block (gfc_code *code, bool *has_reached, gfc_code *prev)
   if (iters[i])
 	{
 	  gfc_expr *var = iters[i]->var;
-	  for (int j = i - 1; j < i; j++)
+	  for (int j = 0; j < i; j++)
 	{
 	  if (iters[j]
 		  && (var_in_expr (var, iters[j]->start)
diff --git a/gcc/testsuite/gfortran.dg/implied_do_io_8.f90 b/gcc/testsuite/gfortran.dg/implied_do_io_8.f90
new file mode 100644
index 000..c66a0f6fde6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/implied_do_io_8.f90
@@ -0,0 +1,18 @@
+! { dg-do run }
+! { dg-additional-options "-fcheck=bounds" }
+! PR fortran/111837 - out of bounds access with front-end optimization
+
+program implied_do_bug
+  implicit none
+  integer :: i,j,k
+  real:: arr(1,1,1)
+  integer :: ni(1)
+  ni(1) = 1
+  arr = 1
+  write(*,*) (((arr(i,j,k), i=1,ni(k)), k=1,1), j=1,1)
+  write(*,*) (((arr(i,j,k), i=1,ni(k)), j=1,1), k=1,1)
+  write(*,*) (((arr(k,i,j), i=1,ni(k)), k=1,1), j=1,1)
+  write(*,*) (((arr(k,i,j), i=1,ni(k)), j=1,1), k=1,1)
+  write(*,*) (((arr(j,k,i), i=1,ni(k)), k=1,1), j=1,1)
+  write(*,*) (((arr(j,k,i), i=1,ni(k)), j=1,1), k=1,1)
+end
--
2.35.3



Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-16 Thread Jeff Law




On 10/16/23 12:01, Manolis Tsamis wrote:

This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

   addi t4,sp,16
   add  t2,a6,t4
   shl  t3,t2,1
   ld   a2,0(t3)
   addi a2,1
   sd   a2,8(t2)

into the following (one instruction less):

   add  t2,a6,sp
   shl  t3,t2,1
   ld   a2,32(t3)
   addi a2,1
   sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Thanks, I've pushed this to the trunk.

jeff




Re: [PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Mon, Oct 16, 2023 at 8:24 PM Fangrui Song  wrote:
>
> On 2023-10-16, Uros Bizjak wrote:
> >On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song  wrote:
> >>
> >> When using -mcmodel=medium, large data objects larger than the
> >> -mlarge-data-threshold threshold are placed into large data sections
> >> (.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
> >> .l* sections into separate output sections.  If small and medium code
> >> model object files are mixed, the .l* sections won't exert relocation
> >> overflow pressure on sections in object files built with -mcmodel=small.
> >>
> >> However, when using -mcmodel=large, -mlarge-data-threshold doesn't
> >> apply.  This means that the .rodata/.data/.bss sections may exert
> >> relocation overflow pressure on sections in -mcmodel=small object files.
> >>
> >> This patch allows -mcmodel=large to generate .l* sections and drops an
> >> unneeded documentation restriction that the value must be the same.
> >>
> >> Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> >> ("Large data sections for the large code model")
> >>
> >> Signed-off-by: Fangrui Song 
> >>
> >> ---
> >> Changes from v1 
> >> (https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
> >> * Clarify commit message. Add link to 
> >> https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> >>
> >> Changes from v2
> >> * Drop an uneeded limitation in the documentation.
> >>
> >> Changes from v3
> >> * Change scan-assembler directives to use \. to match literal .
> >> ---
> >>  gcc/config/i386/i386.cc| 15 +--
> >>  gcc/config/i386/i386.opt   |  2 +-
> >>  gcc/doc/invoke.texi|  6 +++---
> >>  gcc/testsuite/gcc.target/i386/large-data.c | 13 +
> >>  4 files changed, 26 insertions(+), 10 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c
> >>
> >> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> >> index eabc70011ea..37e810cc741 100644
> >> --- a/gcc/config/i386/i386.cc
> >> +++ b/gcc/config/i386/i386.cc
> >> @@ -647,7 +647,8 @@ ix86_can_inline_p (tree caller, tree callee)
> >>  static bool
> >>  ix86_in_large_data_p (tree exp)
> >>  {
> >> -  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
> >> +  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
> >> +  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
> >
> >Please split multi-line expression before the operator, not after it,
> >as instructed in GNU Coding Standards [1] ...
> >
> >[1] https://www.gnu.org/prep/standards/html_node/Formatting.html
> >
> >>  return false;
> >>
> >>if (exp == NULL_TREE)
> >> @@ -858,8 +859,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
> >> const char *name, unsigned HOST_WIDE_INT size,
> >> unsigned align)
> >>  {
> >> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> >> -  && size > (unsigned int)ix86_section_threshold)
> >> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> >> +  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> >> + size > (unsigned int)ix86_section_threshold)
> >
> >... also here ...
> >
> >>  {
> >>switch_to_section (get_named_section (decl, ".lbss", 0));
> >>fputs (LARGECOMM_SECTION_ASM_OP, file);
> >> @@ -879,9 +881,10 @@ void
> >>  x86_output_aligned_bss (FILE *file, tree decl, const char *name,
> >> unsigned HOST_WIDE_INT size, unsigned align)
> >>  {
> >> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> >> -  && size > (unsigned int)ix86_section_threshold)
> >> -switch_to_section (get_named_section (decl, ".lbss", 0));
> >> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> >> +   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> >> +  size > (unsigned int)ix86_section_threshold)
> >
> >... and here.
> >
> >OK with these formatting changes.
> >
> >Thanks,
> >Uros.
>
> Thank you for the review!
> Posted PATCH v5 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633153.html
> with the formatting.
>
> I don't have write access to the gcc repository:)

Please provide the ChangeLog entry (see [1,2]) and I'll commit the
patch for you.

[1] https://gcc.gnu.org/contribute.html
[2] https://gcc.gnu.org/codingconventions.html#ChangeLogs

Thanks,
Uros.


Re: [patch] fortran/intrinsic.texi: Add 'passed by value' to signal handler

2023-10-16 Thread Steve Kargl
On Mon, Oct 16, 2023 at 08:31:20PM +0200, Harald Anlauf wrote:
> 
> Am 16.10.23 um 19:11 schrieb Tobias Burnus:
> > Yesterday, someone was confused because the signal handler did not work.
> > 
> > It turned out that the created Fortran procedure used as handler used
> > pass by reference - and 'signal' passed it by value.
> > 
> > This patch adds the 'passed by value' to the wording:
> > 
> > "@var{HANDLER} to be executed with a single integer argument passed by
> > value"
> > 
> > OK for mainline?
> 
> I think the patch qualifies as obvious.
> 
> While at it, you might consider removing the comment a few lines below
> the place you are changing,
> 
> @c TODO: What should the interface of the handler be?  Does it take
> arguments?
> 
> and enhance the given example by e.g.:
> 
> subroutine handler_print (signal_number)
>   integer, value :: signal_number
>   print *, "In handler_print: received signal number", signal_number
> end subroutine handler_print
> 

Good suggestion, Harald.  I was composing a similar email
when I saw yours pop into my inbox.

-- 
Steve


Re: [patch] fortran/intrinsic.texi: Add 'passed by value' to signal handler

2023-10-16 Thread Harald Anlauf

Hi Tobias,

Am 16.10.23 um 19:11 schrieb Tobias Burnus:

Yesterday, someone was confused because the signal handler did not work.

It turned out that the created Fortran procedure used as handler used
pass by reference - and 'signal' passed it by value.

This patch adds the 'passed by value' to the wording:

"@var{HANDLER} to be executed with a single integer argument passed by
value"

OK for mainline?


I think the patch qualifies as obvious.

While at it, you might consider removing the comment a few lines below
the place you are changing,

@c TODO: What should the interface of the handler be?  Does it take
arguments?

and enhance the given example by e.g.:

subroutine handler_print (signal_number)
  integer, value :: signal_number
  print *, "In handler_print: received signal number", signal_number
end subroutine handler_print

Thanks,
Harald


Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201,
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer:
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München;
Registergericht München, HRB 106955




[PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Fangrui Song

On 2023-10-16, Uros Bizjak wrote:

On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song  wrote:


When using -mcmodel=medium, data objects larger than the
-mlarge-data-threshold value are placed into large data sections
(.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
.l* sections into separate output sections.  If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply.  This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections and drops an
unneeded documentation restriction that the value must be the same.

Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")

Signed-off-by: Fangrui Song 

---
Changes from v1 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
* Clarify commit message. Add link to 
https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU

Changes from v2
* Drop an unneeded limitation in the documentation.

Changes from v3
* Change scan-assembler directives to use \. to match literal .
---
 gcc/config/i386/i386.cc| 15 +--
 gcc/config/i386/i386.opt   |  2 +-
 gcc/doc/invoke.texi|  6 +++---
 gcc/testsuite/gcc.target/i386/large-data.c | 13 +
 4 files changed, 26 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index eabc70011ea..37e810cc741 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -647,7 +647,8 @@ ix86_can_inline_p (tree caller, tree callee)
 static bool
 ix86_in_large_data_p (tree exp)
 {
-  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
+  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
+  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)


Please split multi-line expression before the operator, not after it,
as instructed in GNU Coding Standards [1] ...

[1] https://www.gnu.org/prep/standards/html_node/Formatting.html


 return false;

   if (exp == NULL_TREE)
@@ -858,8 +859,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
const char *name, unsigned HOST_WIDE_INT size,
unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-  && size > (unsigned int)ix86_section_threshold)
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+ size > (unsigned int)ix86_section_threshold)


... also here ...


 {
   switch_to_section (get_named_section (decl, ".lbss", 0));
   fputs (LARGECOMM_SECTION_ASM_OP, file);
@@ -879,9 +881,10 @@ void
 x86_output_aligned_bss (FILE *file, tree decl, const char *name,
unsigned HOST_WIDE_INT size, unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
-  && size > (unsigned int)ix86_section_threshold)
-switch_to_section (get_named_section (decl, ".lbss", 0));
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
+   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
+  size > (unsigned int)ix86_section_threshold)


... and here.

OK with these formatting changes.

Thanks,
Uros.


Thank you for the review!
Posted PATCH v5 
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633153.html
with the formatting.

I don't have write access to the gcc repository:)

(Hmmm... in emacs, C-c . gnu RET C-M-\  doesn't fix the && || formatting 
errors.)


+switch_to_section(get_named_section(decl, ".lbss", 0));
   else
 switch_to_section (bss_section);
   ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index 1cc8563477a..52fad492353 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -282,7 +282,7 @@ Branches are this expensive (arbitrary units).

 mlarge-data-threshold=
 Target RejectNegative Joined UInteger Var(ix86_section_threshold) 
Init(DEFAULT_LARGE_SECTION_THRESHOLD)
--mlarge-data-threshold=Data greater than given threshold will 
go into .ldata section in x86-64 medium model.
+-mlarge-data-threshold=Data greater than given threshold will 
go into a large data section in x86-64 medium and large code models.

 mcmodel=
 Target RejectNegative Joined Enum(cmodel) Var(ix86_cmodel) Init(CM_32)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 104766f446d..bf6fe3e1a20 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -33207,9 +33207,9 @@ the cache line size.  @samp{compat} is the default.

 @opindex mlar

[PATCH v5] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Fangrui Song
When using -mcmodel=medium, data objects larger than the
-mlarge-data-threshold value are placed into large data sections
(.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
.l* sections into separate output sections.  If small and medium code
model object files are mixed, the .l* sections won't exert relocation
overflow pressure on sections in object files built with -mcmodel=small.

However, when using -mcmodel=large, -mlarge-data-threshold doesn't
apply.  This means that the .rodata/.data/.bss sections may exert
relocation overflow pressure on sections in -mcmodel=small object files.

This patch allows -mcmodel=large to generate .l* sections and drops an
unneeded documentation restriction that the value must be the same.

Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
("Large data sections for the large code model")

Signed-off-by: Fangrui Song 

---
Changes from v1 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
* Clarify commit message. Add link to 
https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU

Changes from v2
* Drop an unneeded limitation in the documentation.

Changes from v3
* Change scan-assembler directives to use \. to match literal .

Changes from v4 
(https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633145.html)
* "When you split an expression into multiple lines, split it before an 
operator, not after one."
---
 gcc/config/i386/i386.cc|  9 ++---
 gcc/config/i386/i386.opt   |  2 +-
 gcc/doc/invoke.texi|  6 +++---
 gcc/testsuite/gcc.target/i386/large-data.c | 13 +
 4 files changed, 23 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8251b67e2d6..641e7680335 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -663,7 +663,8 @@ ix86_can_inline_p (tree caller, tree callee)
 static bool
 ix86_in_large_data_p (tree exp)
 {
-  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
+  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC
+  && ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)
 return false;
 
   if (exp == NULL_TREE)
@@ -874,7 +875,8 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
const char *name, unsigned HOST_WIDE_INT size,
unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC
+   || ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC)
   && size > (unsigned int)ix86_section_threshold)
 {
   switch_to_section (get_named_section (decl, ".lbss", 0));
@@ -895,7 +897,8 @@ void
 x86_output_aligned_bss (FILE *file, tree decl, const char *name,
unsigned HOST_WIDE_INT size, unsigned align)
 {
-  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
+  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC
+   || ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC)
   && size > (unsigned int)ix86_section_threshold)
 switch_to_section (get_named_section (decl, ".lbss", 0));
   else
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index b8382c48099..0c3b8f4b621 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -282,7 +282,7 @@ Branches are this expensive (arbitrary units).
 
 mlarge-data-threshold=
 Target RejectNegative Joined UInteger Var(ix86_section_threshold) 
Init(DEFAULT_LARGE_SECTION_THRESHOLD)
--mlarge-data-threshold=Data greater than given threshold will 
go into .ldata section in x86-64 medium model.
+-mlarge-data-threshold=Data greater than given threshold will 
go into a large data section in x86-64 medium and large code models.
 
 mcmodel=
 Target RejectNegative Joined Enum(cmodel) Var(ix86_cmodel) Init(CM_32)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index eb714d18511..50745a3a195 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -33390,9 +33390,9 @@ the cache line size.  @samp{compat} is the default.
 
 @opindex mlarge-data-threshold
 @item -mlarge-data-threshold=@var{threshold}
-When @option{-mcmodel=medium} is specified, data objects larger than
-@var{threshold} are placed in the large data section.  This value must be the
-same across all objects linked into the binary, and defaults to 65535.
+When @option{-mcmodel=medium} or @option{-mcmodel=large} is specified, data
+objects larger than @var{threshold} are placed in large data sections.  The
+default is 65535.
 
 @opindex mrtd
 @item -mrtd
diff --git a/gcc/testsuite/gcc.target/i386/large-data.c 
b/gcc/testsuite/gcc.target/i386/large-data.c
new file mode 100644
index 000..bdd4acd30b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/large-data.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effect

Re: [PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-16 Thread Manolis Tsamis
This version has been bootstrapped and tested to cause no regressions
on both x86 and AArch64.

Manolis

On Mon, Oct 16, 2023 at 9:01 PM Manolis Tsamis  wrote:
>
> This is a new RTL pass that tries to optimize memory offset calculations
> by moving them from add immediate instructions to the memory loads/stores.
> For example it can transform this:
>
>   addi t4,sp,16
>   add  t2,a6,t4
>   shl  t3,t2,1
>   ld   a2,0(t3)
>   addi a2,1
>   sd   a2,8(t2)
>
> into the following (one instruction less):
>
>   add  t2,a6,sp
>   shl  t3,t2,1
>   ld   a2,32(t3)
>   addi a2,1
>   sd   a2,24(t2)
>
> Although there are places where this is done already, this pass is more
> powerful and can handle the more difficult cases that are currently not
> optimized. Also, it runs late enough and can optimize away unnecessary
> stack pointer calculations.
>
> gcc/ChangeLog:
>
> * Makefile.in: Add fold-mem-offsets.o.
> * passes.def: Schedule a new pass.
> * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> * common.opt: New options.
> * doc/invoke.texi: Document new option.
> * fold-mem-offsets.cc: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> * gcc.target/riscv/fold-mem-offsets-3.c: New test.
>
> Signed-off-by: Manolis Tsamis 
> ---
>
> Changes in v7:
> - Replace recognise with recognize.
> - Add INSN_CODE(insn) = -1 before calling insn_invalid_p.
> - Adjust scan regex in i386/pr52146.c.
>
> Changes in v6:
> - Fix formatting issues.
> - Compute maximum validity iterations based on
>   flag_expensive_optimizations.
> - Generate rtx plus only if offset is not zero.
>
> Changes in v5:
> - Introduce new helper function fold_offsets_1.
> - Fix bug because constants could be partially propagated
>   through instructions that weren't understood.
> - Introduce helper class fold_mem_info that stores f-m-o
>   info for an instruction.
> - Calculate fold_offsets only once with do_fold_info_calculation.
> - Fix correctness issue by introducing compute_validity_closure.
> - Propagate in more cases for PLUS/MINUS with constant.
>
> Changes in v4:
> - Add DF_EQ_NOTES flag to avoid incorrect state in notes.
> - Remove fold_mem_offsets_driver and enum fold_mem_phase.
> - Call recog when patching offsets in do_commit_offset.
> - Restore INSN_CODE after modifying insn in do_check_validity.
>
> Changes in v3:
> - Added propagation for more codes:
>   sub, neg, mul.
> - Added folding / elimination for sub and
>   const int moves.
> - For the validity check of the generated addresses
>   also test memory_address_addr_space_p.
> - Replaced GEN_INT with gen_int_mode.
> - Replaced some bitmap_head with auto_bitmap.
> - Refactor each phase into own function for readability.
> - Add dump details.
> - Replace rtx iteration with reg_mentioned_p.
> - Return early for codes that we can't propagate through.
>
> Changes in v2:
> - Made the pass target-independent instead of RISC-V specific.
> - Fixed a number of bugs.
> - Add code to handle more ADD patterns as found
>   in other targets (x86, aarch64).
> - Improved naming and comments.
> - Fixed bitmap memory leak.
>
>  gcc/Makefile.in   |   1 +
>  gcc/common.opt|   4 +
>  gcc/doc/invoke.texi   |   8 +
>  gcc/fold-mem-offsets.cc   | 901 ++
>  gcc/passes.def|   1 +
>  gcc/testsuite/gcc.target/i386/pr52146.c   |   2 +-
>  .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
>  .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
>  .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
>  gcc/tree-pass.h   |   1 +
>  10 files changed, 974 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/fold-mem-offsets.cc
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 9cc16268abf..747f749538d 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1443,6 +1443,7 @@ OBJS = \
> fixed-value.o \
> fold-const.o \
> fold-const-call.o \
> +   fold-mem-offsets.o \
> function.o \
> function-abi.o \
> function-tests.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index f137a1f81ac..b103b8d28ed 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1252,6 +1252,10 @@ fcprop-registers
>  Common

Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-16 Thread Manolis Tsamis
On Tue, Oct 10, 2023 at 5:15 PM Jeff Law  wrote:
>
>
>
> On 10/10/23 05:59, Manolis Tsamis wrote:
>
> >> It's a code quality issue as long as we don't transform the code into
> >> movl $0, -18874240, at which point it would become a correctness issue.
> >>
> > Ok, thanks for pointing that out as I thought that movl $0, -18874240
> > and movl $0, -18874240(eax) with eax == 0 were the same.
> It's a quirk of x32 related to sign extension of the address.  Certainly
> not obvious!  But it's better than the PA where:
>
> (mem (plus (reg A) (reg B)))
>
> Is semantically different than
>
> (mem (plus (reg B) (reg A)))
>
>
> Due to the implicit segment register selection that happens on the high
> bits of the base register rather than the high bits of the effective
> address.
>
I see, thanks for the additional info

>
> >
> >>
> >>>
> >>> With regards to the "recognize that the base register is 0", that
> >>> would be nice but how would we recognise that? f-m-o can only
> >>> calculate the folded offset, but that is not enough to prove whether
> >>> the base register is zero.
> >> It's a chain of insns that produce an address and use it in the memory
> >> reference.  We essentially changed the first insn in the chain from movl
> >> -18874240, %eax into movl 0, %eax.  So we'd have to somehow note that
> >> the base register in the memory reference has the value zero in the
> >> chain of instructions.  That may or may not be reasonable to do.
> >>
> > Ok, do you believe it would be worthwhile to add a REG_EQ note or
> > something similar? I assume some REG_EQ notes will be added from the
> > following cprop-hardreg pass too?
> I think it really depends on the degree of difficulty and whether or not
> we think there are other cases like this one lurking.
>
> Given this scenario can only happen with an absolute address, I would
> probably explore if there's a way to trivially detect this case, but I
> wouldn't spend a lot of time on it.
>
> And I wasn't necessarily thinking of a note in the RTL, just a bit of
> state within f-m-o when updating an arithmetic chain so that we could
> determine that the final instruction was a memory reference to an
> absolute address.
>
Ok, although I didn't include this in v7, I'll do some exploration for
a possible future change.
>
>
>
> >
> >> You could use the costing model to cost the entire sequence
> >> before/after.  There's an interface to walk a sequence and return a
> >> cost.  In the case of f-m-o the insns are part of the larger chain, so
> >> we'd need a different API.
> >>
> >> The other option would be to declare this is known, but not important
> >> issue.  I would think that it's going to be rare to have absolute
> >> addresses and x32 isn't really used much.  The combination of the two
> >> should be exceedingly rare.  Hence my willingness to use
> >> -fno-fold-mem-offsets in the test.
> >>
> > Yes, I now better understand why you suggested adding
> > -fno-fold-mem-offsets to the testcase. Or we could also make the test
> > not fail in this case where the memory access uses a base register,
> > as there's no correctness issue.
> :)
>
>
> >
> > Then, going back to the code quality regression, wouldn't things be
> > much better if we would emit xor eax, eax instead of mov eax, 0? In
> > that case xor eax, eax should be 2 bytes instead of 5 for mov eax, 0
> > and hence the code size difference should be trivial. Then we can keep
> > f-m-o as is, including this testcase.
> The selection of xor vs mov imm is handled elsewhere.  On some uarchs
> the former is preferred, on others the latter.
>
> If you think about xor %eax,%eax as an example.  Unless you special case
> that scenario in your cpu front-end, the xor will have a data dependency
> on any prior set of %eax which inhibits ILP.
>
>
> >
> > Is there a way to emit a taret-specific optimized register zero insn?
> > If so I'll use that and adjust the testcase as needed, and I think
> > we're done with this one.
> I can live with just adjusting the testcase.  Let's go with that.
>
Done! Sent as v7 together with the recognised -> recognized change and
INSN_CODE(insn) = -1 for the bug you reported.

Thanks,
Manolis
> jeff


Re: [PATCH v20 26/40] libstdc++: Optimize is_object trait performance

2023-10-16 Thread Patrick Palka
On Sun, 15 Oct 2023, Ken Matsui wrote:

> This patch optimizes the performance of the is_object trait by dispatching to
> the new __is_function and __is_reference built-in traits.
> 
> libstdc++-v3/ChangeLog:
>   * include/std/type_traits (is_object): Use __is_function and
>   __is_reference built-in traits.
>   (is_object_v): Likewise.
> 
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/std/type_traits | 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index bd57488824b..674d398c075 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -725,11 +725,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  { };
>  
>/// is_object
> +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function) \
> + && _GLIBCXX_USE_BUILTIN_TRAIT(__is_reference)
> +  template<typename _Tp>
> +    struct is_object
> +    : public __bool_constant<!(__is_function(_Tp) || __is_reference(_Tp)
> +                               || is_void<_Tp>::value)>
> +    { };

Since is_object is one of the more commonly used traits, we should
probably just define a built-in for it.  (Either way we'd have to
repeat the logic twice, either once in the frontend and once in
the library, or twice in the library (is_object and is_object_v),
so might as well do the more efficient approach).
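As a sketch (mine, and purely hypothetical since no __is_object built-in
exists yet), the library side could then collapse to:

  #if _GLIBCXX_USE_BUILTIN_TRAIT(__is_object)
    template<typename _Tp>
      struct is_object
      : public __bool_constant<__is_object(_Tp)>
      { };

    template<typename _Tp>
      inline constexpr bool is_object_v = __is_object(_Tp);
  #endif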

> +#else
>template
>  struct is_object
>     : public __not_<__or_<is_function<_Tp>, is_reference<_Tp>,
>                           is_void<_Tp>>>::type
>  { };
> +#endif
>  
>template
>  struct is_member_pointer;
> @@ -3305,8 +3314,17 @@ template 
>inline constexpr bool is_arithmetic_v = is_arithmetic<_Tp>::value;
>  template 
>inline constexpr bool is_fundamental_v = is_fundamental<_Tp>::value;
> +
> +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_function) \
> + && _GLIBCXX_USE_BUILTIN_TRAIT(__is_reference)
> +template <typename _Tp>
> +  inline constexpr bool is_object_v
> += !(__is_function(_Tp) || __is_reference(_Tp) || is_void<_Tp>::value);
> +#else
>  template 
>inline constexpr bool is_object_v = is_object<_Tp>::value;
> +#endif
> +
>  template 
>inline constexpr bool is_scalar_v = is_scalar<_Tp>::value;
>  template 
> -- 
> 2.42.0
> 
> 



[PATCH v7] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-16 Thread Manolis Tsamis
This is a new RTL pass that tries to optimize memory offset calculations
by moving them from add immediate instructions to the memory loads/stores.
For example it can transform this:

  addi t4,sp,16
  add  t2,a6,t4
  shl  t3,t2,1
  ld   a2,0(t3)
  addi a2,1
  sd   a2,8(t2)

into the following (one instruction less):

  add  t2,a6,sp
  shl  t3,t2,1
  ld   a2,32(t3)
  addi a2,1
  sd   a2,24(t2)

Although there are places where this is done already, this pass is more
powerful and can handle the more difficult cases that are currently not
optimized. Also, it runs late enough and can optimize away unnecessary
stack pointer calculations.
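
As a rough source-level sketch (mine, not from the patch; the exact
registers and offsets depend on the target and register allocation),
code of this shape produces such foldable add-immediate chains:

  long f (long i, const long *p)
  {
    long buf[8];                 // frame slot, e.g. at sp+16
    i &= 7;                      // keep the index in range
    buf[i] = p[i] + 1;           // the sp+16 addi can fold into the store
    return buf[i];               // ... and into the reload
  }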

gcc/ChangeLog:

* Makefile.in: Add fold-mem-offsets.o.
* passes.def: Schedule a new pass.
* tree-pass.h (make_pass_fold_mem_offsets): Declare.
* common.opt: New options.
* doc/invoke.texi: Document new option.
* fold-mem-offsets.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/fold-mem-offsets-1.c: New test.
* gcc.target/riscv/fold-mem-offsets-2.c: New test.
* gcc.target/riscv/fold-mem-offsets-3.c: New test.

Signed-off-by: Manolis Tsamis 
---

Changes in v7:
- Replace recognise with recognize.
- Add INSN_CODE(insn) = -1 before calling insn_invalid_p.
- Adjust scan regex in i386/pr52146.c.

Changes in v6:
- Fix formatting issues.
- Compute maximum validity iterations based on
  flag_expensive_optimizations.
- Generate rtx plus only if offset is not zero.

Changes in v5:
- Introduce new helper function fold_offsets_1.
- Fix bug because constants could be partially propagated
  through instructions that weren't understood.
- Introduce helper class fold_mem_info that stores f-m-o
  info for an instruction.
- Calculate fold_offsets only once with do_fold_info_calculation.
- Fix correctness issue by introducing compute_validity_closure.
- Propagate in more cases for PLUS/MINUS with constant.

Changes in v4:
- Add DF_EQ_NOTES flag to avoid incorrect state in notes.
- Remove fold_mem_offsets_driver and enum fold_mem_phase.
- Call recog when patching offsets in do_commit_offset.
- Restore INSN_CODE after modifying insn in do_check_validity.

Changes in v3:
- Added propagation for more codes:
  sub, neg, mul.
- Added folding / elimination for sub and
  const int moves.
- For the validity check of the generated addresses
  also test memory_address_addr_space_p.
- Replaced GEN_INT with gen_int_mode.
- Replaced some bitmap_head with auto_bitmap.
- Refactor each phase into own function for readability.
- Add dump details.
- Replace rtx iteration with reg_mentioned_p.
- Return early for codes that we can't propagate through.

Changes in v2:
- Made the pass target-independent instead of RISC-V specific.
- Fixed a number of bugs.
- Add code to handle more ADD patterns as found
  in other targets (x86, aarch64).
- Improved naming and comments.
- Fixed bitmap memory leak.

 gcc/Makefile.in   |   1 +
 gcc/common.opt|   4 +
 gcc/doc/invoke.texi   |   8 +
 gcc/fold-mem-offsets.cc   | 901 ++
 gcc/passes.def|   1 +
 gcc/testsuite/gcc.target/i386/pr52146.c   |   2 +-
 .../gcc.target/riscv/fold-mem-offsets-1.c |  16 +
 .../gcc.target/riscv/fold-mem-offsets-2.c |  24 +
 .../gcc.target/riscv/fold-mem-offsets-3.c |  17 +
 gcc/tree-pass.h   |   1 +
 10 files changed, 974 insertions(+), 1 deletion(-)
 create mode 100644 gcc/fold-mem-offsets.cc
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/fold-mem-offsets-3.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9cc16268abf..747f749538d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1443,6 +1443,7 @@ OBJS = \
fixed-value.o \
fold-const.o \
fold-const-call.o \
+   fold-mem-offsets.o \
function.o \
function-abi.o \
function-tests.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index f137a1f81ac..b103b8d28ed 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1252,6 +1252,10 @@ fcprop-registers
 Common Var(flag_cprop_registers) Optimization
 Perform a register copy-propagation optimization pass.
 
+ffold-mem-offsets
+Target Bool Var(flag_fold_mem_offsets) Init(1)
+Fold instructions calculating memory offsets to the memory access instruction 
if possible.
+
 fcrossjumping
 Common Var(flag_crossjumping) Optimization
 Perform cross-jumping optimization.
diff --git a/gcc/doc/invoke.texi b/gcc/

[committed] d: Merge upstream dmd, druntime 4c18eed967, phobos d945686a4.

2023-10-16 Thread Iain Buclaw
Hi,

This patch merges the D front-end and run-time library with upstream dmd
4c18eed967, and standard library with phobos d945686a4.

Synchronizing with the upstream development branch as of 2023-10-16.

D front-end changes:

- Import latest fixes to mainline.

D runtime changes:

- Import latest fixes to mainline.

Phobos changes:

- Import latest fixes to mainline.

Bootstrapped and regression tested on x86_64-linux-gnu/-m32, committed
to mainline.

Regards,
Iain.

---
gcc/d/ChangeLog:

* dmd/MERGE: Merge upstream dmd 4c18eed967.
* d-diagnostic.cc (verrorReport): Update for new front-end interface.
(verrorReportSupplemental): Likewise.
* d-lang.cc (d_init_options): Likewise.
(d_handle_option): Likewise.
(d_post_options): Likewise.
(d_parse_file): Likewise.
* decl.cc (get_symbol_decl): Likewise.

libphobos/ChangeLog:

* libdruntime/MERGE: Merge upstream druntime 4c18eed967.
* src/MERGE: Merge upstream phobos d945686a4.
---
 gcc/d/d-diagnostic.cc |   4 +-
 gcc/d/d-lang.cc   |  86 +-
 gcc/d/decl.cc |   4 +-
 gcc/d/dmd/MERGE   |   2 +-
 gcc/d/dmd/access.d|   3 +-
 gcc/d/dmd/aggregate.d |  11 +-
 gcc/d/dmd/aggregate.h |   1 +
 gcc/d/dmd/arrayop.d   |  11 +-
 gcc/d/dmd/attrib.d|   7 +-
 gcc/d/dmd/blockexit.d |  19 +-
 gcc/d/dmd/canthrow.d  |  43 +-
 gcc/d/dmd/clone.d |   2 +-
 gcc/d/dmd/compiler.d  |   1 -
 gcc/d/dmd/cond.d  |   4 +
 gcc/d/dmd/constfold.d |  18 +-
 gcc/d/dmd/cparse.d|   5 +-
 gcc/d/dmd/cppmangle.d |  10 +-
 gcc/d/dmd/ctfe.h  |   1 -
 gcc/d/dmd/ctfeexpr.d  |   8 +-
 gcc/d/dmd/dcast.d |  53 +-
 gcc/d/dmd/dclass.d|  58 +-
 gcc/d/dmd/declaration.d   |  16 +-
 gcc/d/dmd/denum.d |   5 +-
 gcc/d/dmd/dimport.d   |   2 +-
 gcc/d/dmd/dinterpret.d| 296 +++
 gcc/d/dmd/dmangle.d   |  20 +-
 gcc/d/dmd/dmodule.d   |  44 +-
 gcc/d/dmd/doc.d   |   2 +-
 gcc/d/dmd/dstruct.d   |   2 +-
 gcc/d/dmd/dsymbol.d   |  87 +-
 gcc/d/dmd/dsymbol.h   |   4 -
 gcc/d/dmd/dsymbolsem.d| 306 +++
 gcc/d/dmd/dtemplate.d |  69 +-
 gcc/d/dmd/dtoh.d  |  20 +
 gcc/d/dmd/dversion.d  |  13 +-
 gcc/d/dmd/expression.d| 336 +++-
 gcc/d/dmd/expression.h|   6 +-
 gcc/d/dmd/expressionsem.d | 439 +-
 gcc/d/dmd/func.d  |  36 +-
 gcc/d/dmd/globals.d   |  57 +-
 gcc/d/dmd/globals.h   |  48 +-
 gcc/d/dmd/hdrgen.d| 760 ++
 gcc/d/dmd/iasm.d  |   1 +
 gcc/d/dmd/id.d|   2 +
 gcc/d/dmd/importc.d   |   5 +-
 gcc/d/dmd/init.d  |   8 -
 gcc/d/dmd/init.h  |   2 -
 gcc/d/dmd/initsem.d   |  31 +-
 gcc/d/dmd/json.d  |   4 +-
 gcc/d/dmd/lexer.d |  75 +-
 gcc/d/dmd/mtype.d |   6 +-
 gcc/d/dmd/mustuse.d   |   3 +-
 gcc/d/dmd/nogc.d  |   4 +-
 gcc/d/dmd/nspace.d|   3 +-
 gcc/d/dmd/ob.d|  20 +-
 gcc/d/dmd/objc.d  |  32 +-
 gcc/d/dmd/opover.d|  32 +-
 gcc/d/dmd/optimize.d  |  53 +-
 gcc/d/dmd/parse.d |  15 +-
 gcc/d/dmd/root/filename.d |   7 +-
 gcc/d/dmd/root/rootobject.d   |   6 +-
 gcc/d/dmd/semantic2.d |  34 +-
 gcc/d/dmd/semantic3.d |  48 +-
 gcc/d/dmd/sideeffect.d|   9 +-
 gcc/d/dmd/statement.d | 167 +---
 gcc/d/dmd/statement.h |   8 +-
 gcc/d/dmd/statementsem.d  | 192 -
 gcc/d/dmd/staticcond.d|   3 +-
 gcc/d/dmd/traits.d| 104 +--
 gcc/d/dmd/typesem.d   |  42 +-

Re: [ARC PATCH] Split asl dst, 1, src into bset dst, 0, src to implement 1<

2023-10-16 Thread Jeff Law




On 10/15/23 02:12, Roger Sayle wrote:

This patch adds a pre-reload splitter to arc.md, to use the bset (set
specific bit instruction) to implement 1

Re: [PATCH v20 31/40] c++: Implement __is_arithmetic built-in trait

2023-10-16 Thread Patrick Palka
On Sun, 15 Oct 2023, Ken Matsui wrote:

> This patch implements built-in trait for std::is_arithmetic.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-trait.def: Define __is_arithmetic.
>   * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_ARITHMETIC.
>   * semantics.cc (trait_expr_value): Likewise.
>   (finish_trait_expr): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/has-builtin-1.C: Test existence of __is_arithmetic.
>   * g++.dg/ext/is_arithmetic.C: New test.
> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/constraint.cc |  3 +++
>  gcc/cp/cp-trait.def  |  1 +
>  gcc/cp/semantics.cc  |  4 +++
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
>  gcc/testsuite/g++.dg/ext/is_arithmetic.C | 33 
>  5 files changed, 44 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_arithmetic.C
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index c9d627fa782..3a7f968eae8 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3714,6 +3714,9 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_AGGREGATE:
>inform (loc, "  %qT is not an aggregate", t1);
>break;
> +case CPTK_IS_ARITHMETIC:
> +  inform (loc, "  %qT is not an arithmetic type", t1);
> +  break;
>  case CPTK_IS_ARRAY:
>inform (loc, "  %qT is not an array", t1);
>break;
> diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> index c60724e869e..b2be7b7bbd7 100644
> --- a/gcc/cp/cp-trait.def
> +++ b/gcc/cp/cp-trait.def
> @@ -59,6 +59,7 @@ DEFTRAIT_EXPR (HAS_UNIQUE_OBJ_REPRESENTATIONS, 
> "__has_unique_object_representati
>  DEFTRAIT_EXPR (HAS_VIRTUAL_DESTRUCTOR, "__has_virtual_destructor", 1)
>  DEFTRAIT_EXPR (IS_ABSTRACT, "__is_abstract", 1)
>  DEFTRAIT_EXPR (IS_AGGREGATE, "__is_aggregate", 1)
> +DEFTRAIT_EXPR (IS_ARITHMETIC, "__is_arithmetic", 1)
>  DEFTRAIT_EXPR (IS_ARRAY, "__is_array", 1)
>  DEFTRAIT_EXPR (IS_ASSIGNABLE, "__is_assignable", 2)
>  DEFTRAIT_EXPR (IS_BASE_OF, "__is_base_of", 2)
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 83ed674b9d4..deab0134509 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -12143,6 +12143,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, 
> tree type2)
>  case CPTK_IS_AGGREGATE:
>return CP_AGGREGATE_TYPE_P (type1);
>  
> +case CPTK_IS_ARITHMETIC:
> +  return ARITHMETIC_TYPE_P (type1);
> +

For built-ins corresponding to is_arithmetic and other standard traits
defined in terms of it (e.g.  is_scalar, is_unsigned, is_signed,
is_fundamental) we need to make sure we preserve their behavior for
__int128, which IIUC is currently recognized as an integral type
(according to std::is_integral) only in GNU mode.

This'll probably be subtle to get right, so if you don't mind let's
split out the work for those built-in traits into a separate patch
series in order to ease review of the main patch series.
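
To make the __int128 wrinkle concrete (my illustration, not part of the
patch, and assuming current libstdc++ behavior):

  #include <type_traits>
  #ifdef __STRICT_ANSI__
  // g++ -std=c++17: the library does not treat __int128 as integral
  static_assert (!std::is_integral<__int128>::value, "");
  #else
  // g++ -std=gnu++17: __int128 is integral, hence also arithmetic
  static_assert (std::is_integral<__int128>::value, "");
  #endif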

>  case CPTK_IS_ARRAY:
>return type_code1 == ARRAY_TYPE;
>  
> @@ -12406,6 +12409,7 @@ finish_trait_expr (location_t loc, cp_trait_kind 
> kind, tree type1, tree type2)
>   return error_mark_node;
>break;
>  
> +case CPTK_IS_ARITHMETIC:
>  case CPTK_IS_ARRAY:
>  case CPTK_IS_BOUNDED_ARRAY:
>  case CPTK_IS_CLASS:
> diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
> b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> index efce04fd09d..4bc85f4babb 100644
> --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> @@ -56,6 +56,9 @@
>  #if !__has_builtin (__is_aggregate)
>  # error "__has_builtin (__is_aggregate) failed"
>  #endif
> +#if !__has_builtin (__is_arithmetic)
> +# error "__has_builtin (__is_arithmetic) failed"
> +#endif
>  #if !__has_builtin (__is_array)
>  # error "__has_builtin (__is_array) failed"
>  #endif
> diff --git a/gcc/testsuite/g++.dg/ext/is_arithmetic.C 
> b/gcc/testsuite/g++.dg/ext/is_arithmetic.C
> new file mode 100644
> index 000..fd35831f646
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/is_arithmetic.C
> @@ -0,0 +1,33 @@
> +// { dg-do compile { target c++11 } }
> +
> +#include 
> +
> +using namespace __gnu_test;
> +
> +#define SA(X) static_assert((X),#X)
> +#define SA_TEST_CATEGORY(TRAIT, TYPE, EXPECT)\
> +  SA(TRAIT(TYPE) == EXPECT); \
> +  SA(TRAIT(const TYPE) == EXPECT);   \
> +  SA(TRAIT(volatile TYPE) == EXPECT);\
> +  SA(TRAIT(const volatile TYPE) == EXPECT)
> +
> +SA_TEST_CATEGORY(__is_arithmetic, void, false);
> +
> +SA_TEST_CATEGORY(__is_arithmetic, char, true);
> +SA_TEST_CATEGORY(__is_arithmetic, signed char, true);
> +SA_TEST_CATEGORY(__is_arithmetic, unsigned char, true);
> +SA_TEST_CATEGORY(__is_arithmetic, wchar_t, true);
> +SA_TEST_CATEGORY(__is_arithmetic, short, true);
> +SA_TEST_CATEGORY(__is_arithmetic, unsigned short, true);
> +SA_TEST_CA

Re: [PATCH v4] i386: Allow -mlarge-data-threshold with -mcmodel=large

2023-10-16 Thread Uros Bizjak
On Tue, Aug 1, 2023 at 9:51 PM Fangrui Song  wrote:
>
> When using -mcmodel=medium, data objects larger than the
> -mlarge-data-threshold value are placed into large data sections
> (.lrodata, .ldata, .lbss and some variants).  GNU ld and ld.lld 17 place
> .l* sections into separate output sections.  If small and medium code
> model object files are mixed, the .l* sections won't exert relocation
> overflow pressure on sections in object files built with -mcmodel=small.
>
> However, when using -mcmodel=large, -mlarge-data-threshold doesn't
> apply.  This means that the .rodata/.data/.bss sections may exert
> relocation overflow pressure on sections in -mcmodel=small object files.
>
> This patch allows -mcmodel=large to generate .l* sections and drops an
> unneeded documentation restriction that the value must be the same.
>
> Link: https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
> ("Large data sections for the large code model")
>
> Signed-off-by: Fangrui Song 
>
> ---
> Changes from v1 
> (https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616947.html):
> * Clarify commit message. Add link to 
> https://groups.google.com/g/x86-64-abi/c/jnQdJeabxiU
>
> Changes from v2
> * Drop an unneeded limitation in the documentation.
>
> Changes from v3
> * Change scan-assembler directives to use \. to match literal .
> ---
>  gcc/config/i386/i386.cc| 15 +--
>  gcc/config/i386/i386.opt   |  2 +-
>  gcc/doc/invoke.texi|  6 +++---
>  gcc/testsuite/gcc.target/i386/large-data.c | 13 +
>  4 files changed, 26 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/large-data.c
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index eabc70011ea..37e810cc741 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -647,7 +647,8 @@ ix86_can_inline_p (tree caller, tree callee)
>  static bool
>  ix86_in_large_data_p (tree exp)
>  {
> -  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC)
> +  if (ix86_cmodel != CM_MEDIUM && ix86_cmodel != CM_MEDIUM_PIC &&
> +  ix86_cmodel != CM_LARGE && ix86_cmodel != CM_LARGE_PIC)

Please split multi-line expression before the operator, not after it,
as instructed in GNU Coding Standards [1] ...

[1] https://www.gnu.org/prep/standards/html_node/Formatting.html

>  return false;
>
>if (exp == NULL_TREE)
> @@ -858,8 +859,9 @@ x86_elf_aligned_decl_common (FILE *file, tree decl,
> const char *name, unsigned HOST_WIDE_INT size,
> unsigned align)
>  {
> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> -  && size > (unsigned int)ix86_section_threshold)
> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> +  ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> + size > (unsigned int)ix86_section_threshold)

... also here ...

>  {
>switch_to_section (get_named_section (decl, ".lbss", 0));
>fputs (LARGECOMM_SECTION_ASM_OP, file);
> @@ -879,9 +881,10 @@ void
>  x86_output_aligned_bss (FILE *file, tree decl, const char *name,
> unsigned HOST_WIDE_INT size, unsigned align)
>  {
> -  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC)
> -  && size > (unsigned int)ix86_section_threshold)
> -switch_to_section (get_named_section (decl, ".lbss", 0));
> +  if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC ||
> +   ix86_cmodel == CM_LARGE || ix86_cmodel == CM_LARGE_PIC) &&
> +  size > (unsigned int)ix86_section_threshold)

... and here.

OK with these formatting changes.

Thanks,
Uros.

> +switch_to_section(get_named_section(decl, ".lbss", 0));
>else
>  switch_to_section (bss_section);
>ASM_OUTPUT_ALIGN (file, floor_log2 (align / BITS_PER_UNIT));
> diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
> index 1cc8563477a..52fad492353 100644
> --- a/gcc/config/i386/i386.opt
> +++ b/gcc/config/i386/i386.opt
> @@ -282,7 +282,7 @@ Branches are this expensive (arbitrary units).
>
>  mlarge-data-threshold=
>  Target RejectNegative Joined UInteger Var(ix86_section_threshold) 
> Init(DEFAULT_LARGE_SECTION_THRESHOLD)
> --mlarge-data-threshold=Data greater than given threshold 
> will go into .ldata section in x86-64 medium model.
> +-mlarge-data-threshold=Data greater than given threshold 
> will go into a large data section in x86-64 medium and large code models.
>
>  mcmodel=
>  Target RejectNegative Joined Enum(cmodel) Var(ix86_cmodel) Init(CM_32)
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 104766f446d..bf6fe3e1a20 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -33207,9 +33207,9 @@ the cache line size.  @samp{compat} is the default.
>
>  @opindex mlarge-data-threshold
>  @item -mlarge-data-threshold=@var{threshold}
> -When @option{-mcmodel=medium} is specified, data ob

[patch] fortran/intrinsic.texi: Add 'passed by value' to signal handler

2023-10-16 Thread Tobias Burnus

Yesterday, someone was confused because the signal handler did not work.

It turned out that the Fortran procedure created as the handler used
pass by reference - while 'signal' passed it by value.

This patch adds the 'passed by value' to the wording:

"@var{HANDLER} to be executed with a single integer argument passed by
value"

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
fortran/intrinsic.texi: Add 'passed by value' to signal handler

gcc/fortran/ChangeLog:

	* intrinsic.texi (signal): Mention that the argument
	passed to the signal handler procedure is passed by value.

diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi
index 6c7ad03a02c..3620209e00a 100644
--- a/gcc/fortran/intrinsic.texi
+++ b/gcc/fortran/intrinsic.texi
@@ -13168,10 +13168,10 @@ end program test_sign
 @table @asis
 @item @emph{Description}:
 @code{SIGNAL(NUMBER, HANDLER [, STATUS])} causes external subroutine
-@var{HANDLER} to be executed with a single integer argument when signal
-@var{NUMBER} occurs.  If @var{HANDLER} is an integer, it can be used to
-turn off handling of signal @var{NUMBER} or revert to its default
-action.  See @code{signal(2)}.
+@var{HANDLER} to be executed with a single integer argument passed by
+value when signal @var{NUMBER} occurs.  If @var{HANDLER} is an integer,
+it can be used to turn off handling of signal @var{NUMBER} or revert to
+its default action.  See @code{signal(2)}.
 
 If @code{SIGNAL} is called as a subroutine and the @var{STATUS} argument
 is supplied, it is set to the value returned by @code{signal(2)}.


Re: [PATCH v20 30/40] libstdc++: Optimize is_pointer trait performance

2023-10-16 Thread Patrick Palka
On Sun, 15 Oct 2023, Ken Matsui wrote:

> This patch optimizes the performance of the is_pointer trait by dispatching to
> the new __is_pointer built-in trait.
> 
> libstdc++-v3/ChangeLog:
> 
>   * include/bits/cpp_type_traits.h (__is_pointer): Use __is_pointer
>   built-in trait.
>   * include/std/type_traits (is_pointer): Likewise. Optimize its
>   implementation.
>   (is_pointer_v): Likewise.
> 
> Co-authored-by: Jonathan Wakely 
> Signed-off-by: Ken Matsui 
> ---
>  libstdc++-v3/include/bits/cpp_type_traits.h |  8 
>  libstdc++-v3/include/std/type_traits| 44 +
>  2 files changed, 44 insertions(+), 8 deletions(-)
> 
> diff --git a/libstdc++-v3/include/bits/cpp_type_traits.h 
> b/libstdc++-v3/include/bits/cpp_type_traits.h
> index 4312f32a4e0..cd5ce45951f 100644
> --- a/libstdc++-v3/include/bits/cpp_type_traits.h
> +++ b/libstdc++-v3/include/bits/cpp_type_traits.h
> @@ -363,6 +363,13 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
>//
>// Pointer types
>//
> +#if __has_builtin(__is_pointer)

Why not _GLIBCXX_USE_BUILTIN_TRAIT?  LGTM besides this.

> +  template<typename _Tp>
> +    struct __is_pointer : __truth_type<__is_pointer(_Tp)>
> +{
> +  enum { __value = __is_pointer(_Tp) };
> +};

Nice :D

> +#else
>template
>  struct __is_pointer
>  {
> @@ -376,6 +383,7 @@ __INT_N(__GLIBCXX_TYPE_INT_N_3)
>enum { __value = 1 };
>typedef __true_type __type;
>  };
> +#endif
>  
>//
>// An arithmetic type is an integer type or a floating point type
> diff --git a/libstdc++-v3/include/std/type_traits 
> b/libstdc++-v3/include/std/type_traits
> index 9c56d15c0b7..3acd843f2f2 100644
> --- a/libstdc++-v3/include/std/type_traits
> +++ b/libstdc++-v3/include/std/type_traits
> @@ -542,19 +542,33 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>  : public true_type { };
>  #endif
>  
> -  template
> -struct __is_pointer_helper
> +  /// is_pointer
> +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> +  template
> +struct is_pointer
> +: public __bool_constant<__is_pointer(_Tp)>
> +{ };
> +#else
> +  template
> +struct is_pointer
>  : public false_type { };
>  
>template
> -struct __is_pointer_helper<_Tp*>
> +struct is_pointer<_Tp*>
>  : public true_type { };
>  
> -  /// is_pointer
>template
> -struct is_pointer
> -: public __is_pointer_helper<__remove_cv_t<_Tp>>::type
> -{ };
> +struct is_pointer<_Tp* const>
> +: public true_type { };
> +
> +  template
> +struct is_pointer<_Tp* volatile>
> +: public true_type { };
> +
> +  template
> +struct is_pointer<_Tp* const volatile>
> +: public true_type { };
> +#endif
>  
>/// is_lvalue_reference
>template
> @@ -3254,8 +3268,22 @@ template 
>inline constexpr bool is_array_v<_Tp[_Num]> = true;
>  #endif
>  
> +#if _GLIBCXX_USE_BUILTIN_TRAIT(__is_pointer)
> +template 
> +  inline constexpr bool is_pointer_v = __is_pointer(_Tp);
> +#else
>  template 
> -  inline constexpr bool is_pointer_v = is_pointer<_Tp>::value;
> +  inline constexpr bool is_pointer_v = false;
> +template 
> +  inline constexpr bool is_pointer_v<_Tp*> = true;
> +template 
> +  inline constexpr bool is_pointer_v<_Tp* const> = true;
> +template 
> +  inline constexpr bool is_pointer_v<_Tp* volatile> = true;
> +template 
> +  inline constexpr bool is_pointer_v<_Tp* const volatile> = true;
> +#endif
> +
>  template 
>inline constexpr bool is_lvalue_reference_v = false;
>  template 
> -- 
> 2.42.0
> 
> 



Re: [PATCH v20 01/40] c++: Sort built-in traits alphabetically

2023-10-16 Thread Patrick Palka
On Sun, 15 Oct 2023, Ken Matsui wrote:

> This patch sorts built-in traits alphabetically for better code
> readability.

Hmm, I'm not sure if we still want/need this change with this current
approach.  IIUC gperf would sort the trait names when generating the
hash table code, and so we wanted a more consistent mapping from the
cp-trait.def file to the generated code.  But with this current
non-gperf approach I'm inclined to leave the existing ordering alone
for sake of simplicity, and I kind of like that in cp-trait.def we
currently group all expression-yielding traits together and all
type-yielding traits together; that seems like a more natural layout
than plain alphabetical sorting.

> 
> gcc/cp/ChangeLog:
> 
>   * constraint.cc (diagnose_trait_expr): Sort built-in traits
>   alphabetically.
>   * cp-trait.def: Likewise.
>   * semantics.cc (trait_expr_value): Likewise.
>   (finish_trait_expr): Likewise.
>   (finish_trait_type): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/ext/has-builtin-1.C: Sort built-in traits alphabetically.
> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/constraint.cc | 68 -
>  gcc/cp/cp-trait.def  | 10 +--
>  gcc/cp/semantics.cc  | 94 
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C | 70 +-
>  4 files changed, 121 insertions(+), 121 deletions(-)
> 
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index c9e4e7043cd..722fc334e6f 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3702,18 +3702,36 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_HAS_TRIVIAL_DESTRUCTOR:
>inform (loc, "  %qT is not trivially destructible", t1);
>break;
> +case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS:
> +  inform (loc, "  %qT does not have unique object representations", t1);
> +  break;
>  case CPTK_HAS_VIRTUAL_DESTRUCTOR:
>inform (loc, "  %qT does not have a virtual destructor", t1);
>break;
>  case CPTK_IS_ABSTRACT:
>inform (loc, "  %qT is not an abstract class", t1);
>break;
> +case CPTK_IS_AGGREGATE:
> +  inform (loc, "  %qT is not an aggregate", t1);
> +  break;
> +case CPTK_IS_ASSIGNABLE:
> +  inform (loc, "  %qT is not assignable from %qT", t1, t2);
> +  break;
>  case CPTK_IS_BASE_OF:
>inform (loc, "  %qT is not a base of %qT", t1, t2);
>break;
>  case CPTK_IS_CLASS:
>inform (loc, "  %qT is not a class", t1);
>break;
> +case CPTK_IS_CONSTRUCTIBLE:
> +  if (!t2)
> +inform (loc, "  %qT is not default constructible", t1);
> +  else
> +inform (loc, "  %qT is not constructible from %qE", t1, t2);
> +  break;
> +case CPTK_IS_CONVERTIBLE:
> +  inform (loc, "  %qT is not convertible from %qE", t2, t1);
> +  break;
>  case CPTK_IS_EMPTY:
>inform (loc, "  %qT is not an empty class", t1);
>break;
> @@ -3729,6 +3747,18 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_LITERAL_TYPE:
>inform (loc, "  %qT is not a literal type", t1);
>break;
> +case CPTK_IS_NOTHROW_ASSIGNABLE:
> +  inform (loc, "  %qT is not nothrow assignable from %qT", t1, t2);
> +  break;
> +case CPTK_IS_NOTHROW_CONSTRUCTIBLE:
> +  if (!t2)
> + inform (loc, "  %qT is not nothrow default constructible", t1);
> +  else
> + inform (loc, "  %qT is not nothrow constructible from %qE", t1, t2);
> +  break;
> +case CPTK_IS_NOTHROW_CONVERTIBLE:
> +   inform (loc, "  %qT is not nothrow convertible from %qE", t2, t1);
> +  break;
>  case CPTK_IS_POINTER_INTERCONVERTIBLE_BASE_OF:
>inform (loc, "  %qT is not pointer-interconvertible base of %qT",
> t1, t2);
> @@ -3748,50 +3778,20 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_TRIVIAL:
>inform (loc, "  %qT is not a trivial type", t1);
>break;
> -case CPTK_IS_UNION:
> -  inform (loc, "  %qT is not a union", t1);
> -  break;
> -case CPTK_IS_AGGREGATE:
> -  inform (loc, "  %qT is not an aggregate", t1);
> -  break;
> -case CPTK_IS_TRIVIALLY_COPYABLE:
> -  inform (loc, "  %qT is not trivially copyable", t1);
> -  break;
> -case CPTK_IS_ASSIGNABLE:
> -  inform (loc, "  %qT is not assignable from %qT", t1, t2);
> -  break;
>  case CPTK_IS_TRIVIALLY_ASSIGNABLE:
>inform (loc, "  %qT is not trivially assignable from %qT", t1, t2);
>break;
> -case CPTK_IS_NOTHROW_ASSIGNABLE:
> -  inform (loc, "  %qT is not nothrow assignable from %qT", t1, t2);
> -  break;
> -case CPTK_IS_CONSTRUCTIBLE:
> -  if (!t2)
> - inform (loc, "  %qT is not default constructible", t1);
> -  else
> - inform (loc, "  %qT is not constructible from %qE", t1, t2);
> -  break;
>  case CPTK_IS_TRIVIALLY_CONSTRUCTIBLE:
>if

[pushed] c++: improve fold-expr location

2023-10-16 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

I want to distinguish between constraint && and fold-expressions there of
written by the user and those implied by template parameter
type-constraints; to that end, let's improve our EXPR_LOCATION for an
explicit fold-expression.

The fold3.C change is needed because this moves the caret from the end of
the expression to the operator, which means the location of the error refers
to the macro invocation rather than the macro definition; both locations are
still printed, but which one is an error and which a note changes.
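
For reference (my example, not from the patch), given a fold-expression
like:

  template <typename... Ts>
  bool all_true (Ts... ts)
  { return (... && ts); }    // caret now lands on "&&", not at the ")"

the expression's location now covers the whole fold with the caret on
the operator, which is what shifts the error/note pairing in fold3.C.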

gcc/cp/ChangeLog:

* parser.cc (cp_parser_fold_expression): Track location range.
* semantics.cc (finish_unary_fold_expr)
(finish_left_unary_fold_expr, finish_right_unary_fold_expr)
(finish_binary_fold_expr): Add location parm.
* constraint.cc (finish_shorthand_constraint): Pass it.
* pt.cc (convert_generic_types_to_packs): Likewise.
* cp-tree.h: Adjust.

gcc/testsuite/ChangeLog:

* g++.dg/concepts/diagnostic3.C: Add expected column.
* g++.dg/cpp1z/fold3.C: Adjust diagnostic lines.
---
 gcc/cp/cp-tree.h|  6 +-
 gcc/cp/constraint.cc|  3 +-
 gcc/cp/parser.cc| 16 --
 gcc/cp/pt.cc|  4 +-
 gcc/cp/semantics.cc | 25 +
 gcc/testsuite/g++.dg/concepts/diagnostic3.C |  4 +-
 gcc/testsuite/g++.dg/cpp1z/fold3.C  | 62 ++---
 7 files changed, 66 insertions(+), 54 deletions(-)

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 6e34952da99..efcd2de54e5 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8209,9 +8209,9 @@ extern void maybe_warn_about_useless_cast   
(location_t, tree, tree,
 tsubst_flags_t);
 extern tree cp_perform_integral_promotions  (tree, tsubst_flags_t);
 
-extern tree finish_left_unary_fold_expr  (tree, int);
-extern tree finish_right_unary_fold_expr (tree, int);
-extern tree finish_binary_fold_expr  (tree, tree, int);
+extern tree finish_left_unary_fold_expr  (location_t, tree, int);
+extern tree finish_right_unary_fold_expr (location_t, tree, int);
+extern tree finish_binary_fold_expr  (location_t, tree, tree, int);
 extern tree treat_lvalue_as_rvalue_p(tree, bool);
 extern bool decl_in_std_namespace_p (tree);
 extern void maybe_warn_pessimizing_move (tree, tree, bool);
diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index c9e4e7043cd..64b64e17857 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1607,7 +1607,8 @@ finish_shorthand_constraint (tree decl, tree constr)
 
   /* Make the check a fold-expression if needed.  */
   if (apply_to_each_p && declared_pack_p)
-check = finish_left_unary_fold_expr (check, TRUTH_ANDIF_EXPR);
+check = finish_left_unary_fold_expr (DECL_SOURCE_LOCATION (decl),
+check, TRUTH_ANDIF_EXPR);
 
   return check;
 }
diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index f3abae716fe..59b9852895e 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -5533,6 +5533,8 @@ static cp_expr
 cp_parser_fold_expression (cp_parser *parser, tree expr1)
 {
   cp_id_kind pidk;
+  location_t loc = cp_lexer_peek_token (parser->lexer)->location;
+  const cp_token *token = nullptr;
 
   // Left fold.
   if (cp_lexer_next_token_is (parser->lexer, CPP_ELLIPSIS))
@@ -5540,6 +5542,7 @@ cp_parser_fold_expression (cp_parser *parser, tree expr1)
   if (expr1)
return error_mark_node;
   cp_lexer_consume_token (parser->lexer);
+  token = cp_lexer_peek_token (parser->lexer);
   int op = cp_parser_fold_operator (parser);
   if (op == ERROR_MARK)
 {
@@ -5551,10 +5554,11 @@ cp_parser_fold_expression (cp_parser *parser, tree 
expr1)
 false, &pidk);
   if (expr == error_mark_node)
 return error_mark_node;
-  return finish_left_unary_fold_expr (expr, op);
+  loc = make_location (token->location, loc, parser->lexer);
+  return finish_left_unary_fold_expr (loc, expr, op);
 }
 
-  const cp_token* token = cp_lexer_peek_token (parser->lexer);
+  token = cp_lexer_peek_token (parser->lexer);
   int op = cp_parser_fold_operator (parser);
   if (op == ERROR_MARK)
 {
@@ -5585,7 +5589,10 @@ cp_parser_fold_expression (cp_parser *parser, tree expr1)
 
   // Right fold.
   if (cp_lexer_next_token_is (parser->lexer, CPP_CLOSE_PAREN))
-return finish_right_unary_fold_expr (expr1, op);
+{
+  loc = make_location (token->location, loc, parser->lexer);
+  return finish_right_unary_fold_expr (loc, expr1, op);
+}
 
   if (cp_lexer_next_token_is_not (parser->lexer, token->type))
 {
@@ -5598,7 +5605,8 @@ cp_parser_fold_expression (cp_parser *parser, tree expr1)
   tree expr2 = cp_parser_cast_e

Re: [PATCH] aarch64: enable mixed-types for aarch64 simdclones

2023-10-16 Thread Andre Vieira (lists)

Hey,

Just a minor update to the patch: I had missed the libgomp testsuite, so
I had to make some adjustments there too.


gcc/ChangeLog:

* config/aarch64/aarch64.cc (lane_size): New function.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Determine 
simdlen according to NDS rule
and reject combination of simdlen and types that lead to 
vectors larger than 128bits.


gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add aarch64 targets to vect_simd_clones.
* c-c++-common/gomp/declare-variant-14.c: Adapt test for aarch64.
* c-c++-common/gomp/pr60823-1.c: Likewise.
* c-c++-common/gomp/pr60823-2.c: Likewise.
* c-c++-common/gomp/pr60823-3.c: Likewise.
* g++.dg/gomp/attrs-10.C: Likewise.
* g++.dg/gomp/declare-simd-1.C: Likewise.
* g++.dg/gomp/declare-simd-3.C: Likewise.
* g++.dg/gomp/declare-simd-4.C: Likewise.
* g++.dg/gomp/declare-simd-7.C: Likewise.
* g++.dg/gomp/declare-simd-8.C: Likewise.
* g++.dg/gomp/pr88182.C: Likewise.
* gcc.dg/declare-simd.c: Likewise.
* gcc.dg/gomp/declare-simd-1.c: Likewise.
* gcc.dg/gomp/declare-simd-3.c: Likewise.
* gcc.dg/gomp/pr87887-1.c: Likewise.
* gcc.dg/gomp/pr87895-1.c: Likewise.
* gcc.dg/gomp/pr89246-1.c: Likewise.
* gcc.dg/gomp/pr99542.c: Likewise.
* gcc.dg/gomp/simd-clones-2.c: Likewise.
* gcc.dg/vect/vect-simd-clone-1.c: Likewise.
* gcc.dg/vect/vect-simd-clone-2.c: Likewise.
* gcc.dg/vect/vect-simd-clone-4.c: Likewise.
* gcc.dg/vect/vect-simd-clone-5.c: Likewise.
* gcc.dg/vect/vect-simd-clone-8.c: Likewise.
* gfortran.dg/gomp/declare-simd-2.f90: Likewise.
* gfortran.dg/gomp/declare-simd-coarray-lib.f90: Likewise.
* gfortran.dg/gomp/declare-variant-14.f90: Likewise.
* gfortran.dg/gomp/pr79154-1.f90: Likewise.
* gfortran.dg/gomp/pr83977.f90: Likewise.

libgomp/testsuite/ChangeLog:

* libgomp.c/declare-variant-1.c: Adapt test for aarch64.
* libgomp.fortran/declare-simd-1.f90: Likewise.diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
9fbfc548a891f5d11940c6fd3c49a14bfbdec886..37507f091c2a6154fa944c3a9fad6a655ab5d5a1
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -27414,33 +27414,62 @@ supported_simd_type (tree t)
   return false;
 }
 
-/* Return true for types that currently are supported as SIMD return
-   or argument types.  */
+/* Determine the lane size for the clone argument/return type.  This follows
+   the LS(P) rule in the VFABIA64.  */
 
-static bool
-currently_supported_simd_type (tree t, tree b)
+static unsigned
+lane_size (cgraph_simd_clone_arg_type clone_arg_type, tree type)
 {
-  if (COMPLEX_FLOAT_TYPE_P (t))
-return false;
+  gcc_assert (clone_arg_type != SIMD_CLONE_ARG_TYPE_MASK);
 
-  if (TYPE_SIZE (t) != TYPE_SIZE (b))
-return false;
+  /* For non map-to-vector types that are pointers we use the element type it
+ points to.  */
+  if (POINTER_TYPE_P (type))
+switch (clone_arg_type)
+  {
+  default:
+   break;
+  case SIMD_CLONE_ARG_TYPE_UNIFORM:
+  case SIMD_CLONE_ARG_TYPE_LINEAR_CONSTANT_STEP:
+  case SIMD_CLONE_ARG_TYPE_LINEAR_VARIABLE_STEP:
+   type = TREE_TYPE (type);
+   break;
+  }
 
-  return supported_simd_type (t);
+  /* For types (or types pointers of non map-to-vector types point to) that are
+ integers or floating point, we use their size if they are 1, 2, 4 or 8.
+   */
+  if (INTEGRAL_TYPE_P (type)
+  || SCALAR_FLOAT_TYPE_P (type))
+  switch (TYPE_PRECISION (type) / BITS_PER_UNIT)
+   {
+   default:
+ break;
+   case 1:
+   case 2:
+   case 4:
+   case 8:
+ return TYPE_PRECISION (type);
+   }
+  /* For any other we use the size of uintptr_t.  For map-to-vector types that
+ are pointers, using the size of uintptr_t is the same as using the size of
+ their type, seeing all pointers are the same size as uintptr_t.  */
+  return POINTER_SIZE;
 }
 
+
 /* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
 
 static int
 aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
struct cgraph_simd_clone *clonei,
-   tree base_type, int num,
-   bool explicit_p)
+   tree base_type ATTRIBUTE_UNUSED,
+   int num, bool explicit_p)
 {
   tree t, ret_type;
-  unsigned int elt_bits, count;
+  unsigned int nds_elt_bits;
+  int count;
   unsigned HOST_WIDE_INT const_simdlen;
-  poly_uint64 vec_bits;
 
   if (!TARGET_SIMD)
 return 0;
@@ -27460,80 +27489,135 @@ aarch64_simd_clone_compute_vecsize_and_simdlen 
(struct cgraph_node *node,
 }
 
   ret_typ

Re: [PATCH v3] libcpp: add function to check XID properties

2023-10-16 Thread Arthur Cohen

Ping?

Best,

Arthur

On 9/8/23 16:59, Arthur Cohen wrote:

From: Raiki Tamura 

Fixed to include the enum's name which I had forgotten to commit.

Thanks



This commit adds a new function intended for checking the XID properties
of a possibly unicode character, as well as the accompanying enum
describing the possible properties.
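
As a usage sketch (illustrative, not part of the patch; `c' is assumed to be
a cppchar_t already decoded from the input):

  unsigned int props = cpp_check_xid_property (c);
  if (props & CPP_XID_START)
    ;  /* c may start an identifier (and may also continue one).  */
  else if (props & CPP_XID_CONTINUE)
    ;  /* c is valid only after the first character.  */
  else
    ;  /* c is not valid in an identifier at all.  */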

libcpp/ChangeLog:

* charset.cc (cpp_check_xid_property): New.
* include/cpplib.h
(cpp_check_xid_property): New.
(enum cpp_xid_property): New.

Signed-off-by: Raiki Tamura 
---
  libcpp/charset.cc   | 36 
  libcpp/include/cpplib.h |  7 +++
  2 files changed, 43 insertions(+)

diff --git a/libcpp/charset.cc b/libcpp/charset.cc
index 7b625c9956a..a92ba75539e 100644
--- a/libcpp/charset.cc
+++ b/libcpp/charset.cc
@@ -1256,6 +1256,42 @@ _cpp_uname2c_uax44_lm2 (const char *name, size_t len, 
char *canon_name)
return result;
  }
  
+/* Returns flags representing the XID properties of the given codepoint.  */
+unsigned int
+cpp_check_xid_property (cppchar_t c)
+{
+  // fast path for ASCII
+  if (c < 0x80)
+  {
+if (('A' <= c && c <= 'Z') || ('a' <= c && c <= 'z'))
+  return CPP_XID_START | CPP_XID_CONTINUE;
+if (('0' <= c && c <= '9') || c == '_')
+  return CPP_XID_CONTINUE;
+  }
+
+  if (c > UCS_LIMIT)
+return 0;
+
+  int mn, mx, md;
+  mn = 0;
+  mx = ARRAY_SIZE (ucnranges) - 1;
+  while (mx != mn)
+{
+  md = (mn + mx) / 2;
+  if (c <= ucnranges[md].end)
+  mx = md;
+  else
+  mn = md + 1;
+}
+
+  unsigned short flags = ucnranges[mn].flags;
+
+  if (flags & CXX23)
+return CPP_XID_START | CPP_XID_CONTINUE;
+  if (flags & NXX23)
+return CPP_XID_CONTINUE;
+  return 0;
+}
  
  /* Returns 1 if C is valid in an identifier, 2 if C is valid except at
 the start of an identifier, and 0 if C is not valid in an
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index fcdaf082b09..583e3071e90 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -1606,4 +1606,11 @@ bool cpp_valid_utf8_p (const char *data, size_t 
num_bytes);
  bool cpp_is_combining_char (cppchar_t c);
  bool cpp_is_printable_char (cppchar_t c);
  
+enum cpp_xid_property {
+  CPP_XID_START = 1,
+  CPP_XID_CONTINUE = 2
+};
+
+unsigned int cpp_check_xid_property (cppchar_t c);
+
  #endif /* ! LIBCPP_CPPLIB_H */


Re: [PATCH v20 03/40] c++: Accept the use of built-in trait identifiers

2023-10-16 Thread Patrick Palka
On Sun, 15 Oct 2023, Ken Matsui wrote:

> This patch accepts the use of built-in trait identifiers when they are
> actually not used as traits.  Specifically, we check if the subsequent token
> is '(' for ordinary built-in traits or is '<' only for the special
> __type_pack_element built-in trait.  If those identifiers are used
> differently, the parser treats them as normal identifiers.  This allows
> us to accept code like: struct __is_pointer {};.
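
For illustration, both parses then coexist (a sketch, not taken from the
patch's testsuite; __is_pointer is one of the traits handled by this series):

  struct __is_pointer {};          // no '(' follows: ordinary identifier
  bool b = __is_pointer (int *);   // '(' follows: built-in trait invocation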

LGTM, thanks

> 
> gcc/cp/ChangeLog:
> 
>   * parser.cc (cp_lexer_lookup_trait): Rename to ...
>   (cp_lexer_peek_trait): ... this.  Handle a subsequent token for
>   the corresponding built-in trait.
>   (cp_lexer_lookup_trait_expr): Rename to ...
>   (cp_lexer_peek_trait_expr): ... this.
>   (cp_lexer_lookup_trait_type): Rename to ...
>   (cp_lexer_peek_trait_type): ... this.
>   (cp_lexer_next_token_is_decl_specifier_keyword): Call
>   cp_lexer_peek_trait_type.
>   (cp_parser_simple_type_specifier): Likewise.
>   (cp_parser_primary_expression): Call cp_lexer_peek_trait_expr.
> 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/parser.cc | 48 ++--
>  1 file changed, 30 insertions(+), 18 deletions(-)
> 
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index eba5272be03..0f2a1baee6a 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -246,12 +246,12 @@ static void cp_lexer_start_debugging
>(cp_lexer *) ATTRIBUTE_UNUSED;
>  static void cp_lexer_stop_debugging
>(cp_lexer *) ATTRIBUTE_UNUSED;
> -static const cp_trait *cp_lexer_lookup_trait
> -  (const cp_token *);
> -static const cp_trait *cp_lexer_lookup_trait_expr
> -  (const cp_token *);
> -static const cp_trait *cp_lexer_lookup_trait_type
> -  (const cp_token *);
> +static const cp_trait *cp_lexer_peek_trait
> +  (cp_lexer *lexer, const cp_token *);
> +static const cp_trait *cp_lexer_peek_trait_expr
> +  (cp_lexer *lexer, const cp_token *);
> +static const cp_trait *cp_lexer_peek_trait_type
> +  (cp_lexer *lexer, const cp_token *);
>  
>  static cp_token_cache *cp_token_cache_new
>(cp_token *, cp_token *);
> @@ -1195,19 +1195,31 @@ cp_keyword_starts_decl_specifier_p (enum rid keyword)
>  }
>  }
>  
> -/* Look ups the corresponding built-in trait if a given token is
> +/* Peeks the corresponding built-in trait if a given token is
> a built-in trait.  Otherwise, returns nullptr.  */
>  
>  static const cp_trait *
> -cp_lexer_lookup_trait (const cp_token *token)
> +cp_lexer_peek_trait (cp_lexer *lexer, const cp_token *token1)
>  {
> -  tree id = token->u.value;
> +  tree id = token1->u.value;
>  
> -  if (token->type == CPP_NAME
> +  if (token1->type == CPP_NAME
>&& TREE_CODE (id) == IDENTIFIER_NODE
>&& IDENTIFIER_TRAIT_P (id))
> -return &cp_traits[IDENTIFIER_CP_INDEX (id)];
> +{
> +  const cp_trait &trait = cp_traits[IDENTIFIER_CP_INDEX (id)];
> +  const bool is_pack_element = (trait.kind == CPTK_TYPE_PACK_ELEMENT);
>  
> +  /* Check if the subsequent token is a `<' token to
> + __type_pack_element or is a `(' token to everything else.  */
> +  const cp_token *token2 = cp_lexer_peek_nth_token (lexer, 2);
> +  if (is_pack_element && token2->type != CPP_LESS)
> + return nullptr;
> +  if (!is_pack_element && token2->type != CPP_OPEN_PAREN)
> + return nullptr;
> +
> +  return &trait;
> +}
>return nullptr;
>  }
>  
> @@ -1215,9 +1227,9 @@ cp_lexer_lookup_trait (const cp_token *token)
> built-in trait.  */
>  
>  static const cp_trait *
> -cp_lexer_lookup_trait_expr (const cp_token *token)
> +cp_lexer_peek_trait_expr (cp_lexer *lexer, const cp_token *token1)
>  {
> -  const cp_trait *trait = cp_lexer_lookup_trait (token);
> +  const cp_trait *trait = cp_lexer_peek_trait (lexer, token1);
>if (trait && !trait->type)
>  return trait;
>  
> @@ -1228,9 +1240,9 @@ cp_lexer_lookup_trait_expr (const cp_token *token)
> built-in trait.  */
>  
>  static const cp_trait *
> -cp_lexer_lookup_trait_type (const cp_token *token)
> +cp_lexer_peek_trait_type (cp_lexer *lexer, const cp_token *token1)
>  {
> -  const cp_trait *trait = cp_lexer_lookup_trait (token);
> +  const cp_trait *trait = cp_lexer_peek_trait (lexer, token1);
>if (trait && trait->type)
>  return trait;
>  
> @@ -1245,7 +1257,7 @@ cp_lexer_next_token_is_decl_specifier_keyword (cp_lexer 
> *lexer)
>cp_token *token;
>  
>token = cp_lexer_peek_token (lexer);
> -  if (cp_lexer_lookup_trait_type (token))
> +  if (cp_lexer_peek_trait_type (lexer, token))
>  return true;
>return cp_keyword_starts_decl_specifier_p (token->keyword);
>  }
> @@ -6117,7 +6129,7 @@ cp_parser_primary_expression (cp_parser *parser,
>keyword.  */
>  case CPP_NAME:
>{
> - const cp_trait* trait = cp_lexer_lookup_trait_expr (token);
> + const cp_trait* trait = cp_lexer_peek_trait_expr (parser->lexer, token);
>   if (trait)
> return c

Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-10-16 Thread Wilco Dijkstra
Hi Ramana,

> I remember this to be the previous discussions and common understanding.
>
> https://gcc.gnu.org/legacy-ml/gcc/2016-06/msg00017.html
>
> and here
> 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-02/msg00168.html
>
> Can you point any discussion recently that shows this has changed and
> point me at that discussion if any anywhere ? I can't find it in my
> searches . Perhaps you've had the discussion some place to show it has
> changed.

Here are some more recent discussions about atomics, eg. this has good
arguments from developers wanting lock-free atomics:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878

We also had some discussion how we could handle the read-only corner
case by either giving a warning/error on const pointers to atomics or
ensuring _Atomic variables are writeable:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109553

My conclusion from that is that nobody cared enough to fix this for x86
in all these years, so it's not seen as an important issue.

We've had several internal design discussions to figure out how to fix the ABI
issues. The conclusion was that this is the only possible solution that makes
GCC and LLVM compatible without breaking backwards compatibility. It also
allows use of newer atomic instructions (which people want inlined).

Cheers,
Wilco

Re: [PATCH v20 02/40] c-family, c++: Look up built-in traits via identifier node

2023-10-16 Thread Patrick Palka
On Sun, 15 Oct 2023, Ken Matsui wrote:

> Since RID_MAX soon reaches 255 and all built-in traits are used approximately
> once in a C++ translation unit, this patch removes all RID values for built-in
> traits and uses the identifier node to look up the specific trait.  Rather
> than holding traits as keywords, we set all trait identifiers as cik_trait,
> which is a new cp_identifier_kind.  As cik_reserved_for_udlit was unused and
> cp_identifier_kind is 3 bits, we replaced the unused field with the new
> cik_trait.  Also, the later patch handles a subsequent token to the built-in
> identifier so that we accept the use of non-function-like built-in trait
> identifiers.
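
The resulting lookup is then a simple table index (sketch using the names
added by the patch; surrounding parser code omitted):

  const cp_trait *trait = nullptr;
  if (IDENTIFIER_TRAIT_P (id))  /* identifier kind is cik_trait  */
    trait = &cp_traits[IDENTIFIER_CP_INDEX (id)];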

Thanks, this looks great!  Some review comments below.

> 
> gcc/c-family/ChangeLog:
> 
>   * c-common.cc (c_common_reswords): Remove all mappings of
>   built-in traits.
>   * c-common.h (enum rid): Remove all RID values for built-in traits.
> 
> gcc/cp/ChangeLog:
> 
>   * cp-objcp-common.cc (names_builtin_p): Remove all RID value
>   cases for built-in traits.  Check for built-in traits via
>   the new cik_trait kind.
>   * cp-tree.h (enum cp_trait_kind): Set its underlying type to
>   addr_space_t.
>   (struct cp_trait): New struct to hold trait information.
>   (cp_traits): New array to hold a mapping to all traits.
>   (cik_reserved_for_udlit): Rename to ...
>   (cik_trait): ... this.
>   (IDENTIFIER_ANY_OP_P): Exclude cik_trait.
>   (IDENTIFIER_TRAIT_P): New macro to detect cik_trait.
>   * lex.cc (init_cp_traits): New function to set cik_trait for all
>   built-in trait identifiers.

We should mention setting IDENTIFIER_CP_INDEX as well.

>   (cxx_init): Call init_cp_traits function.
>   * parser.cc (cp_traits): Define its values, declared in cp-tree.h.
>   (cp_lexer_lookup_trait): New function to look up a
>   built-in trait by IDENTIFIER_CP_INDEX.
>   (cp_lexer_lookup_trait_expr): Likewise, look up an
>   expression-yielding built-in trait.
>   (cp_lexer_lookup_trait_type): Likewise, look up a type-yielding
>   built-in trait.
>   (cp_keyword_starts_decl_specifier_p): Remove all RID value cases
>   for built-in traits.
>   (cp_lexer_next_token_is_decl_specifier_keyword): Handle
>   type-yielding built-in traits.
>   (cp_parser_primary_expression): Remove all RID value cases for
>   built-in traits.  Handle expression-yielding built-in traits.
>   (cp_parser_trait): Handle cp_trait instead of enum rid.
>   (cp_parser_simple_type_specifier): Remove all RID value cases
>   for built-in traits.  Handle type-yielding built-in traits.
> 
> Co-authored-by: Patrick Palka 
> Signed-off-by: Ken Matsui 
> ---
>  gcc/c-family/c-common.cc  |   7 --
>  gcc/c-family/c-common.h   |   5 --
>  gcc/cp/cp-objcp-common.cc |   8 +--
>  gcc/cp/cp-tree.h  |  31 ++---
>  gcc/cp/lex.cc |  21 ++
>  gcc/cp/parser.cc  | 141 --
>  6 files changed, 139 insertions(+), 74 deletions(-)
> 
> diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
> index f044db5b797..21fd333ef57 100644
> --- a/gcc/c-family/c-common.cc
> +++ b/gcc/c-family/c-common.cc
> @@ -508,13 +508,6 @@ const struct c_common_resword c_common_reswords[] =
>{ "wchar_t",   RID_WCHAR,  D_CXXONLY },
>{ "while", RID_WHILE,  0 },
>  
> -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> -  { NAME,RID_##CODE, D_CXXONLY },
> -#include "cp/cp-trait.def"
> -#undef DEFTRAIT
> -  /* An alias for __is_same.  */
> -  { "__is_same_as",  RID_IS_SAME,D_CXXONLY },
> -
>/* C++ transactional memory.  */
>{ "synchronized",  RID_SYNCHRONIZED, D_CXX_OBJC | D_TRANSMEM },
>{ "atomic_noexcept",   RID_ATOMIC_NOEXCEPT, D_CXXONLY | D_TRANSMEM },
> diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
> index 1fdba7ef3ea..051a442e0f4 100644
> --- a/gcc/c-family/c-common.h
> +++ b/gcc/c-family/c-common.h
> @@ -168,11 +168,6 @@ enum rid
>RID_BUILTIN_LAUNDER,
>RID_BUILTIN_BIT_CAST,
>  
> -#define DEFTRAIT(TCC, CODE, NAME, ARITY) \
> -  RID_##CODE,
> -#include "cp/cp-trait.def"
> -#undef DEFTRAIT
> -
>/* C++11 */
>RID_CONSTEXPR, RID_DECLTYPE, RID_NOEXCEPT, RID_NULLPTR, RID_STATIC_ASSERT,
>  
> diff --git a/gcc/cp/cp-objcp-common.cc b/gcc/cp/cp-objcp-common.cc
> index 93b027b80ce..b1adacfec07 100644
> --- a/gcc/cp/cp-objcp-common.cc
> +++ b/gcc/cp/cp-objcp-common.cc
> @@ -421,6 +421,10 @@ names_builtin_p (const char *name)
>   }
>  }
>  
> +  /* Check for built-in traits.  */
> +  if (IDENTIFIER_TRAIT_P (id))
> +return true;
> +
>/* Also detect common reserved C++ words that aren't strictly built-in
>   functions.  */
>switch (C_RID_CODE (id))
> @@ -434,10 +438,6 @@ names_builtin_p (const char *name)
>  case RID_BUILTIN_ASSOC_BARRIER:
>  case RID_BUILTIN_BIT_CAST:
>  case RID_OFFSETOF:
> -#define D

Re: [PATCH v9] tree-ssa-sink: Improve code sinking pass

2023-10-16 Thread rep . dot . nop
On 12 October 2023 14:35:36 CEST, Ajit Agarwal  wrote:
>This patch improves code sinking pass to sink statements before call to reduce
>register pressure.
>Review comments are incorporated.

Typo: "block block"

And spurious whitespace changes.



Re: Re: [PATCH V2] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread 钟居哲
>> So we're inserting a dummy vect_perm element (that's live from the start?).
>> Would it make sense to instead increase the number of needed registers for
>> a load/store and handle this similarly to compute_nregs_for_mode?
>> Maybe also do it directly in compute_local_live_ranges and extend live_range
>> by an nregs?

Yeah, we're inserting a dummy vect_perm element which has a live range from 0
to the max_point of the bb.
I tried doing it in 'compute_nregs_for_mode' and 'compute_local_live_ranges',
but we don't know the maximum point of the bb yet at that stage.

Addressed the other comments in V3:
[PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for 
non-adjacent load/store (gnu.org)



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-16 20:33
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Fix unexpected big LMUL choosing in dynamic 
LMUL model for non-adjacent load/store
Hi Juzhe,
 
> +/* Get STORE value.  */
> +static tree
> +get_store_value (gimple *stmt)
> +{
> +  if (is_gimple_call (stmt) && gimple_call_internal_p (stmt))
> +{
> +  if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
> + return gimple_call_arg (stmt, 3);
> +  else
> + gcc_unreachable ();
> +}
> +  else
> +return gimple_assign_rhs1 (stmt);
> +}
 
This was something I was about to mention in the review of v1
that I already started.  This is better now.
 
> +   basic_block bb = e->src;
 
Rename to pred or so?  And then keep the original bb.
 
>if (!live_range)
> - continue;
> + {
> +   if (single_succ_p (e->src))
> + {
> +   /*
> +  [local count: 850510900]:
> + goto ; [100.00%]
> +
> + Handle this case, we should extend live range of bb 3.
> +   */
 
/* If the predecessor is an extended basic block extend it and look for
   DEF's definition again.  */
 
> +   bb = single_succ (e->src);
> +   if (!program_points_per_bb.get (bb))
> + continue;
> +   live_ranges = live_ranges_per_bb.get (bb);
> +   max_point
> + = (*program_points_per_bb.get (bb)).length () - 1;
> +   live_range = live_ranges->get (def);
> +   if (!live_range)
> + continue;
> + }
> +   else
> + continue;
> + }
 
We're approaching a complexity where reverse postorder would have
helped ;)  Maybe split out the live range into a separate function
get_live_range_for_bb ()?
 
> +  for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
> + {
> +   if (!(is_gimple_assign (gsi_stmt (si))
> + || is_gimple_call (gsi_stmt (si
> + continue;
> +   stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
> +   enum stmt_vec_info_type type
> + = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
> +   if ((type == load_vec_info_type || type == store_vec_info_type)
> +   && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
> + {
> +   /* For non-adjacent load/store STMT, we will potentially
> + convert it into:
> +
> +1. MASK_LEN_GATHER_LOAD (..., perm indice).
> +2. Continguous load/store + VEC_PERM (..., perm indice)
> +
> + We will be likely using one more vector variable.  */
> +   unsigned int max_point
> + = (*program_points_per_bb.get (bbs[i])).length () - 1;
> +   auto *live_ranges = live_ranges_per_bb.get (bbs[i]);
> +   bool existed_p = false;
> +   tree var = type == load_vec_info_type
> +? gimple_get_lhs (gsi_stmt (si))
> +: get_store_value (gsi_stmt (si));
> +   tree sel_type = build_nonstandard_integer_type (
> + TYPE_PRECISION (TREE_TYPE (var)), 1);
> +   tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +  get_identifier ("vect_perm"), sel_type);
> +   pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
> +   gcc_assert (!existed_p);
> +   live_range = pair (0, max_point);
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> + "Add perm indice %T, start = 0, end = %d\n",
> + sel, max_point);
> + }
> + }
>  }
>  }
 
So we're inserting a dummy vect_perm element (that's live from the start?).
Would it make sense to instead increase the number of needed registers for
a load/store and handle this similarly to compute_nregs_for_mode?
Maybe also do it directly in compute_local_live_ranges and extend live_range
by an nregs?
 
Regards
Robin
 
 


[PATCH V3] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong
Consider the following case:
int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
{
  sum1 += x[2*i] - a;
  sum1 += x[2*i+1] * b;
  sum2 += x[2*i] - b;
  sum2 += x[2*i+1] * a;
}
  return sum1 + sum2;
}

Before this patch:

bar:
ble a3,zero,.L5
csrrt0,vlenb
csrra6,vlenb
sllit1,t0,3
vsetvli a5,zero,e32,m4,ta,ma
sub sp,sp,t1
vid.v   v20
vmv.v.x v12,a1
vand.vi v4,v20,1
vmv.v.x v16,a2
vmseq.viv4,v4,1
sllit3,a6,2
vsetvli zero,a5,e32,m4,ta,ma
vmv1r.v v0,v4
viota.m v8,v4
add a7,t3,sp
vsetvli a5,zero,e32,m4,ta,mu
vand.vi v28,v20,-2
vadd.vi v4,v28,1
vs4r.v  v20,0(a7)-  spill
vrgather.vv v24,v12,v8
vrgather.vv v20,v16,v8
vrgather.vv v24,v16,v8,v0.t
vrgather.vv v20,v12,v8,v0.t
vs4r.v  v4,0(sp)  - spill
sllia3,a3,1
addit4,a6,-1
neg t1,a6
vmv4r.v v0,v20
vmv.v.i v4,0
j   .L4
.L13:
vsetvli a5,zero,e32,m4,ta,ma
.L4:
mv  a7,a3
mv  a4,a3
bleua3,a6,.L3
csrra4,vlenb
.L3:
vmv.v.x v8,t4
vl4re32.v   v12,0(sp) spill
vand.vv v20,v28,v8
vand.vv v8,v12,v8
vsetvli zero,a4,e32,m4,ta,ma
vle32.v v16,0(a0)
vsetvli a5,zero,e32,m4,ta,ma
add a3,a3,t1
vrgather.vv v12,v16,v20
add a0,a0,t3
vrgather.vv v20,v16,v8
vsub.vv v12,v12,v0
vsetvli zero,a4,e32,m4,tu,ma
vadd.vv v4,v4,v12
vmacc.vvv4,v24,v20
bgtua7,a6,.L13
csrra1,vlenb
sllia1,a1,2
add a1,a1,sp
li  a4,-1
csrrt0,vlenb
vsetvli a5,zero,e32,m4,ta,ma
vl4re32.v   v12,0(a1)    spill
vmv.v.i v8,0
vmul.vx v0,v12,a4
li  a2,0
sllit1,t0,3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmseq.vvv0,v0,v8
vand.vi v12,v12,1
vmerge.vvm  v16,v8,v4,v0
vmseq.vvv12,v12,v8
vmv.s.x v1,a2
vmv1r.v v0,v12
vredsum.vs  v16,v16,v1
vmerge.vvm  v8,v8,v4,v0
vmv.x.s a0,v16
vredsum.vs  v8,v8,v1
vmv.x.s a5,v8
add sp,sp,t1
addwa0,a0,a5
jr  ra
.L5:
li  a0,0
ret

We can see there are multiple horrible register spillings.
The root cause of this issue is that for a scalar IR load:

_5 = *_4;

we didn't check whether it is a contiguous load/store or a gather/scatter
load/store.

Since it will be translated into either:

   1. MASK_LEN_GATHER_LOAD (..., perm indices).
   2. Contiguous load/store + VEC_PERM (..., perm indices)

It's obvious that in either situation we will end up consuming one extra
vector register group (the perm indices) that we didn't count before.

So in this case we picked LMUL = 4, which is an incorrect choice for the
dynamic LMUL cost model.

The key of this patch is:

  if ((type == load_vec_info_type || type == store_vec_info_type)
  && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
{
   ...
}

Add one more register consumption if it is not an adjacent load/store.
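
A simplified sketch of that accounting (condensed from the patch hunk quoted
later in this thread; sel_type, existed_p and max_point are set up by the
surrounding code):

  /* For each non-adjacent load/store, record a dummy "vect_perm" variable
     that is live across the whole bb, so its register group is counted.  */
  tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
                         get_identifier ("vect_perm"), sel_type);
  pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
  live_range = pair (0, max_point);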

After this patch, it picks LMUL = 2, which is optimal:

bar:
ble a3,zero,.L4
csrra6,vlenb
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v6,a2
srlia2,a6,1
vmv.v.x v4,a1
vid.v   v12
sllia3,a3,1
vand.vi v0,v12,1
addit1,a2,-1
vmseq.viv0,v0,1
sllia6,a6,1
vsetvli zero,a5,e32,m2,ta,ma
neg a7,a2
viota.m v2,v0
vsetvli a5,zero,e32,m2,ta,mu
vrgather.vv v16,v4,v2
vrgather.vv v14,v6,v2
vrgather.vv v16,v6,v2,v0.t
vrgather.vv v14,v4,v2,v0.t
vand.vi v18,v12,-2
vmv.v.i v2,0
vadd.vi v20,v18,1
.L3:
minua4,a3,a2
vsetvli zero,a4,e32,m2,ta,ma
vle32.v v8,0(a0)
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v4,t1
vand.vv v10,v18,v4
vrgather.vv v6,v8,v10
vsub.vv v6,v6,v14
vsetvli zero,a4,e32,m2,tu,ma
vadd.vv v2,v2,v6
vsetvli a1,zero,e32,m2,ta,ma
vand.vv v4,v20,v4
vrgather.vv v6,v8,v4
vsetvli zero,a4,e32,m2,tu,ma
mv  a4,a3
add a0,a0,a6
add a3,a3,a7
vmacc.vvv2,v16,v6
bgtua4,a2,.L3
vsetvli a1,zero,e32,m2,ta,ma
vand.vi v0,v12,1
vmv.v.i v4,0
li  a3,

[PATCH] Fix PR ada/111813 (Inconsistent limit in Ada.Calendar.Formatting)

2023-10-16 Thread Simon Wright
The description of the second Value function (returning Duration) (ARM
9.6.1(87)) doesn't place any limitation on the Elapsed_Time parameter's
value, beyond
"Constraint_Error is raised if the string is not formatted as described for 
Image, or 
the function cannot interpret the given string as a Duration value".

It would seem reasonable that Value and Image should be consistent, in that any 
string produced by Image should be accepted by Value. Since Image must produce
a two-digit representation of the Hours, there's an implication that its 
Elapsed_Time parameter should be less than 100.0 hours (the ARM merely says
that in that case the result is implementation-defined).

The current implementation of Value raises Constraint_Error if the Elapsed_Time
parameter is greater than or equal to 24 hours.

This patch removes the restriction, so that the Elapsed_Time parameter need
only be less than 100.0 hours.
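
For illustration (mirroring the new test below): with
Limit = 99 * 3600.0 + 59 * 60.0 + 59.0, Image (Limit) yields "99:59:59",
and after this patch Value ("99:59:59") returns Limit instead of raising
Constraint_Error.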

gcc/ada/ChangeLog:

  2023-10-15 Simon Wright 

  PR ada/111813

  * gcc/ada/libgnat/a-calfor.adb (Value (2)): Allow values of parameter
  Elapsed_Time greater than or equal to 24 hours, by doing the
  hour calculations in Natural rather than Hour_Number (0 .. 23).
  Calculate the result directly rather than by using Seconds_Of
  (whose Hour parameter is of type Hour_Number).

  If an exception occurs of type Constraint_Error, re-raise it
  rather than raising a new CE.

gcc/testsuite/ChangeLog:

  2023-10-15 Simon Wright 

  PR ada/111813

  * gcc/testsuite/gnat.dg/calendar_format_value.adb: New test.

---
 gcc/ada/libgnat/a-calfor.adb  | 11 +---
 .../gnat.dg/calendar_format_value.adb | 26 +++
 2 files changed, 34 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gnat.dg/calendar_format_value.adb

diff --git a/gcc/ada/libgnat/a-calfor.adb b/gcc/ada/libgnat/a-calfor.adb
index 18f4e7388df..493728b490e 100644
--- a/gcc/ada/libgnat/a-calfor.adb
+++ b/gcc/ada/libgnat/a-calfor.adb
@@ -777,7 +777,7 @@ package body Ada.Calendar.Formatting is
 
function Value (Elapsed_Time : String) return Duration is
   D  : String (1 .. 11);
-  Hour   : Hour_Number;
+  Hour   : Natural;
   Minute : Minute_Number;
   Second : Second_Number;
   Sub_Second : Second_Duration := 0.0;
@@ -817,7 +817,7 @@ package body Ada.Calendar.Formatting is
 
   --  Value extraction
 
-  Hour   := Hour_Number   (Hour_Number'Value   (D (1 .. 2)));
+  Hour   := Natural   (Natural'Value   (D (1 .. 2)));
   Minute := Minute_Number (Minute_Number'Value (D (4 .. 5)));
   Second := Second_Number (Second_Number'Value (D (7 .. 8)));
 
@@ -837,9 +837,14 @@ package body Ada.Calendar.Formatting is
  raise Constraint_Error;
   end if;
 
-  return Seconds_Of (Hour, Minute, Second, Sub_Second);
+  return Duration (Hour * 3600)
++ Duration (Minute * 60)
++ Duration (Second)
++ Sub_Second;
 
exception
+  --  CE is mandated, but preserve trace if CE already.
+  when Constraint_Error => raise;
   when others => raise Constraint_Error;
end Value;
 
diff --git a/gcc/testsuite/gnat.dg/calendar_format_value.adb 
b/gcc/testsuite/gnat.dg/calendar_format_value.adb
new file mode 100644
index 000..e98e496fd3b
--- /dev/null
+++ b/gcc/testsuite/gnat.dg/calendar_format_value.adb
@@ -0,0 +1,26 @@
+-- { dg-do run }
+-- { dg-options "-O2" }
+
+with Ada.Calendar.Formatting;
+
+procedure Calendar_Format_Value is
+   Limit : constant Duration
+ := 99 * 3600.0 + 59 * 60.0 + 59.0;
+begin
+   declare
+  Image : constant String := Ada.Calendar.Formatting .Image (Limit);
+  Image_Error : exception;
+   begin
+  if Image /= "99:59:59" then
+ raise Image_Error with "image: " & Image;
+  end if;
+  declare
+ Value : constant Duration := Ada.Calendar.Formatting.Value (Image);
+ Value_Error : exception;
+  begin
+ if Value /= Limit then
+raise Value_Error with "duration: " & Value'Image;
+ end if;
+  end;
+   end;
+end Calendar_Format_Value;
-- 
2.39.3 (Apple Git-145)



Re: [ARC PATCH] Split asl dst, 1, src into bset dst, 0, src to implement 1<<x

2023-10-16 Thread Jeff Law




On 10/15/23 02:14, Roger Sayle wrote:

I’ve done it again. ENOPATCH.

*From:*Roger Sayle 
*Sent:* 15 October 2023 09:13
*To:* 'gcc-patches@gcc.gnu.org' 
*Cc:* 'Claudiu Zissulescu' 
*Subject:* [ARC PATCH] Split asl dst,1,src into bset dst,0,src to 
implement 1<<x

This patch adds a pre-reload splitter to arc.md, to use the bset (set
specific bit instruction) to implement 1<<x on ARC processors that don't
have a barrel shifter.

Re: [PATCH] AArch64: Fix __sync_val_compare_and_swap [PR111404]

2023-10-16 Thread Wilco Dijkstra
ping
 

__sync_val_compare_and_swap may be used on 128-bit types and either calls the
outline atomic code or uses an inline loop.  On AArch64 LDXP is only atomic if
the value is stored successfully using STXP, but the current implementations
do not perform the store if the comparison fails.  In this case the value
returned is not read atomically.
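
For illustration, user code of this shape is affected (a sketch; the exact
emitted sequences are in the patch below):

  /* With a 128-bit type, the old expansion skipped the store-exclusive on a
     compare failure, so the returned old value was not a guaranteed-atomic
     128-bit read.  */
  __int128 cas (__int128 *p, __int128 expected, __int128 desired)
  {
    return __sync_val_compare_and_swap (p, expected, desired);
  }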

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
    PR target/111404
    * config/aarch64/aarch64.cc (aarch64_split_compare_and_swap):
    For 128-bit store the loaded value and loop if needed.

libgcc/ChangeLog/
    PR target/111404
    * config/aarch64/lse.S (__aarch64_cas16_acq_rel): Execute STLXP using
    either new value or loaded value.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
5e8d0a0c91bc7719de2a8c5627b354cf905a4db0..c44c0b979d0cc3755c61dcf566cfddedccebf1ea
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -23413,11 +23413,11 @@ aarch64_split_compare_and_swap (rtx operands[])
   mem = operands[1];
   oldval = operands[2];
   newval = operands[3];
-  is_weak = (operands[4] != const0_rtx);
   model_rtx = operands[5];
   scratch = operands[7];
   mode = GET_MODE (mem);
   model = memmodel_from_int (INTVAL (model_rtx));
+  is_weak = operands[4] != const0_rtx && mode != TImode;
 
   /* When OLDVAL is zero and we want the strong version we can emit a tighter
 loop:
@@ -23478,6 +23478,33 @@ aarch64_split_compare_and_swap (rtx operands[])
   else
 aarch64_gen_compare_reg (NE, scratch, const0_rtx);
 
+  /* 128-bit LDAXP is not atomic unless STLXP succeeds.  So for a mismatch,
+ store the returned value and loop if the STLXP fails.  */
+  if (mode == TImode)
+    {
+  rtx_code_label *label3 = gen_label_rtx ();
+  emit_jump_insn (gen_rtx_SET (pc_rtx, gen_rtx_LABEL_REF (Pmode, label3)));
+  emit_barrier ();
+
+  emit_label (label2);
+  aarch64_emit_store_exclusive (mode, scratch, mem, rval, model_rtx);
+
+  if (aarch64_track_speculation)
+   {
+ /* Emit an explicit compare instruction, so that we can correctly
+    track the condition codes.  */
+ rtx cc_reg = aarch64_gen_compare_reg (NE, scratch, const0_rtx);
+ x = gen_rtx_NE (GET_MODE (cc_reg), cc_reg, const0_rtx);
+   }
+  else
+   x = gen_rtx_NE (VOIDmode, scratch, const0_rtx);
+  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+   gen_rtx_LABEL_REF (Pmode, label1), pc_rtx);
+  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+
+  label2 = label3;
+    }
+
   emit_label (label2);
 
   /* If we used a CBNZ in the exchange loop emit an explicit compare with RVAL
diff --git a/libgcc/config/aarch64/lse.S b/libgcc/config/aarch64/lse.S
index 
dde3a28e07b13669533dfc5e8fac0a9a6ac33dbd..ba05047ff02b6fc5752235bffa924fc4a2f48c04
 100644
--- a/libgcc/config/aarch64/lse.S
+++ b/libgcc/config/aarch64/lse.S
@@ -160,6 +160,8 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
If not, see
 #define tmp0    16
 #define tmp1    17
 #define tmp2    15
+#define tmp3   14
+#define tmp4   13
 
 #define BTI_C   hint    34
 
@@ -233,10 +235,11 @@ STARTFN   NAME(cas)
 0:  LDXP    x0, x1, [x4]
 cmp x0, x(tmp0)
 ccmp    x1, x(tmp1), #0, eq
-   bne 1f
-   STXP    w(tmp2), x2, x3, [x4]
-   cbnz    w(tmp2), 0b
-1: BARRIER
+   csel    x(tmp2), x2, x0, eq
+   csel    x(tmp3), x3, x1, eq
+   STXP    w(tmp4), x(tmp2), x(tmp3), [x4]
+   cbnz    w(tmp4), 0b
+   BARRIER
 ret
 
 #endif


Re: [PATCH] libatomic: Improve ifunc selection on AArch64

2023-10-16 Thread Wilco Dijkstra
 

ping


From: Wilco Dijkstra
Sent: 04 August 2023 16:05
To: GCC Patches ; Richard Sandiford 

Cc: Kyrylo Tkachov 
Subject: [PATCH] libatomic: Improve ifunc selection on AArch64 
 

Add support for ifunc selection based on the CPUID register.  Neoverse N1
supports atomic 128-bit load/store, so use the FEAT_USCAT ifunc like newer
Neoverse cores.

Passes regress, OK for commit?

libatomic/
    config/linux/aarch64/host-config.h (ifunc1): Use CPUID in ifunc
    selection.

---

diff --git a/libatomic/config/linux/aarch64/host-config.h 
b/libatomic/config/linux/aarch64/host-config.h
index 
851c78c01cd643318aaa52929ce4550266238b79..e5dc33c030a4bab927874fa6c69425db463fdc4b
 100644
--- a/libatomic/config/linux/aarch64/host-config.h
+++ b/libatomic/config/linux/aarch64/host-config.h
@@ -26,7 +26,7 @@
 
 #ifdef HWCAP_USCAT
 # if N == 16
-#  define IFUNC_COND_1 (hwcap & HWCAP_USCAT)
+#  define IFUNC_COND_1 ifunc1 (hwcap)
 # else
 #  define IFUNC_COND_1  (hwcap & HWCAP_ATOMICS)
 # endif
@@ -50,4 +50,28 @@
 #undef MAYBE_HAVE_ATOMIC_EXCHANGE_16
 #define MAYBE_HAVE_ATOMIC_EXCHANGE_16   1
 
+#ifdef HWCAP_USCAT
+
+#define MIDR_IMPLEMENTOR(midr) (((midr) >> 24) & 255)
+#define MIDR_PARTNUM(midr) (((midr) >> 4) & 0xfff)
+
+static inline bool
+ifunc1 (unsigned long hwcap)
+{
+  if (hwcap & HWCAP_USCAT)
+    return true;
+  if (!(hwcap & HWCAP_CPUID))
+    return false;
+
+  unsigned long midr;
+  asm volatile ("mrs %0, midr_el1" : "=r" (midr));
+
+  /* Neoverse N1 supports atomic 128-bit load/store.  */
+  if (MIDR_IMPLEMENTOR (midr) == 'A' && MIDR_PARTNUM(midr) == 0xd0c)
+    return true;
+
+  return false;
+}
+#endif
+
 #include_next 

Re: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 [PR110061]

2023-10-16 Thread Wilco Dijkstra
 

ping

From: Wilco Dijkstra
Sent: 02 June 2023 18:28
To: GCC Patches 
Cc: Richard Sandiford ; Kyrylo Tkachov 

Subject: [PATCH] libatomic: Enable lock-free 128-bit atomics on AArch64 
[PR110061] 
 

Enable lock-free 128-bit atomics on AArch64.  This is backwards compatible with
existing binaries, gives better performance than locking atomics and is what
most users expect.

Note 128-bit atomic loads use a load/store exclusive loop if LSE2 is not
supported.  This results in an implicit store which is invisible to software
as long as the given address is writeable (which will be true when using
atomics in actual code).

A simple test on an old Cortex-A72 showed 2.7x speedup of 128-bit atomics.
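
As a sketch of what this enables at the user level (illustrative, not part
of the patch):

  #include <atomic>

  struct alignas (16) pair_t { long a, b; };
  std::atomic<pair_t> p;

  pair_t load_pair (void)
  {
    /* Dispatches into libatomic's 128-bit load, which this patch makes
       lock-free on baseline ARMv8.0.  */
    return p.load ();
  }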

Passes regress, OK for commit?

libatomic/
    PR target/110061
    config/linux/aarch64/atomic_16.S: Implement lock-free ARMv8.0 atomics.
    config/linux/aarch64/host-config.h: Use atomic_16.S for baseline v8.0.
    State we have lock-free atomics.

---

diff --git a/libatomic/config/linux/aarch64/atomic_16.S 
b/libatomic/config/linux/aarch64/atomic_16.S
index 
05439ce394b9653c9bcb582761ff7aaa7c8f9643..0485c284117edf54f41959d2fab9341a9567b1cf
 100644
--- a/libatomic/config/linux/aarch64/atomic_16.S
+++ b/libatomic/config/linux/aarch64/atomic_16.S
@@ -22,6 +22,21 @@
    .  */
 
 
+/* AArch64 128-bit lock-free atomic implementation.
+
+   128-bit atomics are now lock-free for all AArch64 architecture versions.
+   This is backwards compatible with existing binaries and gives better
+   performance than locking atomics.
+
+   128-bit atomic loads use an exclusive loop if LSE2 is not supported.
+   This results in an implicit store which is invisible to software as long
+   as the given address is writeable.  Since all other atomics have explicit
+   writes, this will be true when using atomics in actual code.
+
+   The libat__16 entry points are ARMv8.0.
+   The libat__16_i1 entry points are used when LSE2 is available.  */
+
+
 .arch   armv8-a+lse
 
 #define ENTRY(name) \
@@ -37,6 +52,10 @@ name:    \
 .cfi_endproc;   \
 .size name, .-name;
 
+#define ALIAS(alias,name)  \
+   .global alias;  \
+   .set alias, name;
+
 #define res0 x0
 #define res1 x1
 #define in0  x2
@@ -70,6 +89,24 @@ name:    \
 #define SEQ_CST 5
 
 
+ENTRY (libat_load_16)
+   mov x5, x0
+   cbnz    w1, 2f
+
+   /* RELAXED.  */
+1: ldxp    res0, res1, [x5]
+   stxp    w4, res0, res1, [x5]
+   cbnz    w4, 1b
+   ret
+
+   /* ACQUIRE/CONSUME/SEQ_CST.  */
+2: ldaxp   res0, res1, [x5]
+   stxp    w4, res0, res1, [x5]
+   cbnz    w4, 2b
+   ret
+END (libat_load_16)
+
+
 ENTRY (libat_load_16_i1)
 cbnz    w1, 1f
 
@@ -93,6 +130,23 @@ ENTRY (libat_load_16_i1)
 END (libat_load_16_i1)
 
 
+ENTRY (libat_store_16)
+   cbnz    w4, 2f
+
+   /* RELAXED.  */
+1: ldxp    xzr, tmp0, [x0]
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 1b
+   ret
+
+   /* RELEASE/SEQ_CST.  */
+2: ldxp    xzr, tmp0, [x0]
+   stlxp   w4, in0, in1, [x0]
+   cbnz    w4, 2b
+   ret
+END (libat_store_16)
+
+
 ENTRY (libat_store_16_i1)
 cbnz    w4, 1f
 
@@ -101,14 +155,14 @@ ENTRY (libat_store_16_i1)
 ret
 
 /* RELEASE/SEQ_CST.  */
-1: ldaxp   xzr, tmp0, [x0]
+1: ldxp    xzr, tmp0, [x0]
 stlxp   w4, in0, in1, [x0]
 cbnz    w4, 1b
 ret
 END (libat_store_16_i1)
 
 
-ENTRY (libat_exchange_16_i1)
+ENTRY (libat_exchange_16)
 mov x5, x0
 cbnz    w4, 2f
 
@@ -126,22 +180,55 @@ ENTRY (libat_exchange_16_i1)
 stxp    w4, in0, in1, [x5]
 cbnz    w4, 3b
 ret
-4:
-   cmp w4, RELEASE
-   b.ne    6f
 
-   /* RELEASE.  */
-5: ldxp    res0, res1, [x5]
+   /* RELEASE/ACQ_REL/SEQ_CST.  */
+4: ldaxp   res0, res1, [x5]
 stlxp   w4, in0, in1, [x5]
-   cbnz    w4, 5b
+   cbnz    w4, 4b
 ret
+END (libat_exchange_16)
 
-   /* ACQ_REL/SEQ_CST.  */
-6: ldaxp   res0, res1, [x5]
-   stlxp   w4, in0, in1, [x5]
-   cbnz    w4, 6b
+
+ENTRY (libat_compare_exchange_16)
+   ldp exp0, exp1, [x1]
+   cbz w4, 3f
+   cmp w4, RELEASE
+   b.hs    4f
+
+   /* ACQUIRE/CONSUME.  */
+1: ldaxp   tmp0, tmp1, [x0]
+   cmp tmp0, exp0
+   ccmp    tmp1, exp1, 0, eq
+   bne 2f
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 1b
+   mov x0, 1
 ret
-END (libat_exchange_16_i1)
+
+2: stp tmp0, tmp1, [x1]
+   mov x0, 0
+   ret
+
+   /* RELAXED.  */
+3: ldxp    tmp0, tmp1, [x0]
+   cmp tmp0, exp0
+   ccmp    tmp1, exp1, 0, eq
+   bne 2b
+   stxp    w4, in0, in1, [x0]
+   cbnz    w4, 3b
+   mov x0, 1
+   ret
+
+   /* RELEASE/ACQ_REL/SEQ_CST.  */
+4: ldaxp   tmp0, tmp1,

Re: [PATCH v2] AArch64: Fix strict-align cpymem/setmem [PR103100]

2023-10-16 Thread Wilco Dijkstra
ping
 
v2: Use UINTVAL, rename max_mops_size.

The cpymemdi/setmemdi implementation doesn't fully support strict alignment.
Block the expansion if the alignment is less than 16 with STRICT_ALIGNMENT.
Clean up the condition for when to use MOPS.
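
Illustrative trigger (a sketch, assuming -mstrict-align): the pointers below
only have byte alignment, less than the required 16, so the inline
unaligned-access sequence must be avoided in favour of MOPS or a library
call:

  void
  copy64 (char *d, const char *s)
  {
    __builtin_memcpy (d, s, 64);
  }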
    
Passes regress/bootstrap, OK for commit?
    
gcc/ChangeLog/
    PR target/103100
    * config/aarch64/aarch64.md (cpymemdi): Remove pattern condition.
    (setmemdi): Likewise.
    * config/aarch64/aarch64.cc (aarch64_expand_cpymem): Support
    strict-align.  Cleanup condition for using MOPS.
    (aarch64_expand_setmem): Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
dd6874d13a75f20d10a244578afc355b25c73da2..8a12894d6b80de1031d6e7d02dca680c57bce136
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25261,27 +25261,23 @@ aarch64_expand_cpymem (rtx *operands)
   int mode_bits;
   rtx dst = operands[0];
   rtx src = operands[1];
+  unsigned align = UINTVAL (operands[3]);
   rtx base;
   machine_mode cur_mode = BLKmode;
+  bool size_p = optimize_function_for_size_p (cfun);
 
-  /* Variable-sized memcpy can go through the MOPS expansion if available.  */
-  if (!CONST_INT_P (operands[2]))
+  /* Variable-sized or strict-align copies may use the MOPS expansion.  */
+  if (!CONST_INT_P (operands[2]) || (STRICT_ALIGNMENT && align < 16))
 return aarch64_expand_cpymem_mops (operands);
 
-  unsigned HOST_WIDE_INT size = INTVAL (operands[2]);
-
-  /* Try to inline up to 256 bytes or use the MOPS threshold if available.  */
-  unsigned HOST_WIDE_INT max_copy_size
-    = TARGET_MOPS ? aarch64_mops_memcpy_size_threshold : 256;
+  unsigned HOST_WIDE_INT size = UINTVAL (operands[2]);
 
-  bool size_p = optimize_function_for_size_p (cfun);
+  /* Try to inline up to 256 bytes.  */
+  unsigned max_copy_size = 256;
+  unsigned mops_threshold = aarch64_mops_memcpy_size_threshold;
 
-  /* Large constant-sized cpymem should go through MOPS when possible.
- It should be a win even for size optimization in the general case.
- For speed optimization the choice between MOPS and the SIMD sequence
- depends on the size of the copy, rather than number of instructions,
- alignment etc.  */
-  if (size > max_copy_size)
+  /* Large copies use MOPS when available or a library call.  */
+  if (size > max_copy_size || (TARGET_MOPS && size > mops_threshold))
 return aarch64_expand_cpymem_mops (operands);
 
   int copy_bits = 256;
@@ -25445,12 +25441,13 @@ aarch64_expand_setmem (rtx *operands)
   unsigned HOST_WIDE_INT len;
   rtx dst = operands[0];
   rtx val = operands[2], src;
+  unsigned align = UINTVAL (operands[3]);
   rtx base;
   machine_mode cur_mode = BLKmode, next_mode;
 
-  /* If we don't have SIMD registers or the size is variable use the MOPS
- inlined sequence if possible.  */
-  if (!CONST_INT_P (operands[1]) || !TARGET_SIMD)
+  /* Variable-sized or strict-align memset may use the MOPS expansion.  */
+  if (!CONST_INT_P (operands[1]) || !TARGET_SIMD
+  || (STRICT_ALIGNMENT && align < 16))
 return aarch64_expand_setmem_mops (operands);
 
   bool size_p = optimize_function_for_size_p (cfun);
@@ -25458,10 +25455,13 @@ aarch64_expand_setmem (rtx *operands)
   /* Default the maximum to 256-bytes when considering only libcall vs
  SIMD broadcast sequence.  */
   unsigned max_set_size = 256;
+  unsigned mops_threshold = aarch64_mops_memset_size_threshold;
 
-  len = INTVAL (operands[1]);
-  if (len > max_set_size && !TARGET_MOPS)
-    return false;
+  len = UINTVAL (operands[1]);
+
+  /* Large memset uses MOPS when available or a library call.  */
+  if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
+    return aarch64_expand_setmem_mops (operands);
 
   int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
   /* The MOPS sequence takes:
@@ -25474,12 +25474,6 @@ aarch64_expand_setmem (rtx *operands)
  the arguments + 1 for the call.  */
   unsigned libcall_cost = 4;
 
-  /* Upper bound check.  For large constant-sized setmem use the MOPS sequence
- when available.  */
-  if (TARGET_MOPS
-  && len >= (unsigned HOST_WIDE_INT) aarch64_mops_memset_size_threshold)
-    return aarch64_expand_setmem_mops (operands);
-
   /* Attempt a sequence with a vector broadcast followed by stores.
  Count the number of operations involved to see if it's worth it
  against the alternatives.  A simple counter simd_ops on the
@@ -25521,10 +25515,8 @@ aarch64_expand_setmem (rtx *operands)
   simd_ops++;
   n -= mode_bits;
 
-  /* Do certain trailing copies as overlapping if it's going to be
-    cheaper.  i.e. less instructions to do so.  For instance doing a 15
-    byte copy it's more efficient to do two overlapping 8 byte copies than
-    8 + 4 + 2 + 1.  Only do this when -mstrict-align is not supplied.  */
+  /* Emit trailing writes using overlapping unaligned accesses
+   (when !STRICT_ALIGNMEN

Re: [PATCH V2] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Robin Dapp
Hi Juzhe,

> +/* Get STORE value.  */
> +static tree
> +get_store_value (gimple *stmt)
> +{
> +  if (is_gimple_call (stmt) && gimple_call_internal_p (stmt))
> +{
> +  if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
> + return gimple_call_arg (stmt, 3);
> +  else
> + gcc_unreachable ();
> +}
> +  else
> +return gimple_assign_rhs1 (stmt);
> +}

This was something I was about to mention in the review of v1
that I already started.  This is better now.

> +   basic_block bb = e->src;

Rename to pred or so?  And then keep the original bb.

> if (!live_range)
> - continue;
> + {
> +   if (single_succ_p (e->src))
> + {
> +   /*
> +   [local count: 850510900]:
> +  goto ; [100.00%]
> +
> +  Handle this case, we should extend live range of bb 3.
> +   */

/* If the predecessor is an extended basic block extend it and look for
   DEF's definition again.  */

> +   bb = single_succ (e->src);
> +   if (!program_points_per_bb.get (bb))
> + continue;
> +   live_ranges = live_ranges_per_bb.get (bb);
> +   max_point
> + = (*program_points_per_bb.get (bb)).length () - 1;
> +   live_range = live_ranges->get (def);
> +   if (!live_range)
> + continue;
> + }
> +   else
> + continue;
> + }

We're approaching a complexity where reverse postorder would have
helped ;)  Maybe split out the live range into a separate function
get_live_range_for_bb ()?

> +  for (si = gsi_start_bb (bbs[i]); !gsi_end_p (si); gsi_next (&si))
> + {
> +   if (!(is_gimple_assign (gsi_stmt (si))
> + || is_gimple_call (gsi_stmt (si
> + continue;
> +   stmt_vec_info stmt_info = vinfo->lookup_stmt (gsi_stmt (si));
> +   enum stmt_vec_info_type type
> + = STMT_VINFO_TYPE (vect_stmt_to_vectorize (stmt_info));
> +   if ((type == load_vec_info_type || type == store_vec_info_type)
> +   && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
> + {
> +   /* For non-adjacent load/store STMT, we will potentially
> +  convert it into:
> +
> +1. MASK_LEN_GATHER_LOAD (..., perm indice).
> +2. Continguous load/store + VEC_PERM (..., perm indice)
> +
> + We will be likely using one more vector variable.  */
> +   unsigned int max_point
> + = (*program_points_per_bb.get (bbs[i])).length () - 1;
> +   auto *live_ranges = live_ranges_per_bb.get (bbs[i]);
> +   bool existed_p = false;
> +   tree var = type == load_vec_info_type
> +? gimple_get_lhs (gsi_stmt (si))
> +: get_store_value (gsi_stmt (si));
> +   tree sel_type = build_nonstandard_integer_type (
> + TYPE_PRECISION (TREE_TYPE (var)), 1);
> +   tree sel = build_decl (UNKNOWN_LOCATION, VAR_DECL,
> +  get_identifier ("vect_perm"), sel_type);
> +   pair &live_range = live_ranges->get_or_insert (sel, &existed_p);
> +   gcc_assert (!existed_p);
> +   live_range = pair (0, max_point);
> +   if (dump_enabled_p ())
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "Add perm indice %T, start = 0, end = %d\n",
> +  sel, max_point);
> + }
> + }
>  }
>  }

So we're inserting a dummy vect_perm element (that's live from the start?).
Would it make sense to instead increase the number of needed registers for
a load/store and handle this similarly to compute_nregs_for_mode?
Maybe also do it directly in compute_local_live_ranges and extend live_range
by an nregs?

Regards
 Robin



[PATCH v2] AArch64: Add inline memmove expansion

2023-10-16 Thread Wilco Dijkstra
v2: further cleanups, improved comments

Add support for inline memmove expansions.  The generated code is identical
to that for memcpy, except that all loads are emitted before stores rather
than being interleaved.  The maximum size is 256 bytes, which requires at
most 16
registers.
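
Why loads-first matters (illustrative): with overlapping operands an
interleaved sequence could store over bytes it has not yet loaded:

  void
  shift_up (char *p)
  {
    __builtin_memmove (p + 8, p, 32);   /* src and dst overlap by 24 bytes */
  }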

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog/
* config/aarch64/aarch64.opt (aarch64_mops_memmove_size_threshold):
Change default.
* config/aarch64/aarch64.md (cpymemdi): Add a parameter.
(movmemdi): Call aarch64_expand_cpymem.
* config/aarch64/aarch64.cc (aarch64_copy_one_block): Rename function,
simplify, support storing generated loads/stores. 
(aarch64_expand_cpymem): Support expansion of memmove.
* config/aarch64/aarch64-protos.h (aarch64_expand_cpymem): Add bool arg.

gcc/testsuite/ChangeLog/
* gcc.target/aarch64/memmove.c: Add new test.

---

diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 
60a55f4bc1956786ea687fc7cad7ec9e4a84e1f0..0d39622bd2826a3fde54d67b5c5da9ee9286cbbd
 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -769,7 +769,7 @@ bool aarch64_emit_approx_sqrt (rtx, rtx, bool);
 tree aarch64_vector_load_decl (tree);
 void aarch64_expand_call (rtx, rtx, rtx, bool);
 bool aarch64_expand_cpymem_mops (rtx *, bool);
-bool aarch64_expand_cpymem (rtx *);
+bool aarch64_expand_cpymem (rtx *, bool);
 bool aarch64_expand_setmem (rtx *);
 bool aarch64_float_const_zero_rtx_p (rtx);
 bool aarch64_float_const_rtx_p (rtx);
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
2fa5d09de85d385c1165e399bcc97681ef170916..e19e2d1de2e5b30eca672df05d9dcc1bc106ecc8
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25238,52 +25238,37 @@ aarch64_progress_pointer (rtx pointer)
   return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
 }
 
-/* Copy one MODE sized block from SRC to DST, then progress SRC and DST by
-   MODE bytes.  */
+/* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
 
 static void
-aarch64_copy_one_block_and_progress_pointers (rtx *src, rtx *dst,
- machine_mode mode)
+aarch64_copy_one_block (rtx *load, rtx *store, rtx src, rtx dst,
+   int offset, machine_mode mode)
 {
-  /* Handle 256-bit memcpy separately.  We do this by making 2 adjacent memory
- address copies using V4SImode so that we can use Q registers.  */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
+  /* Emit explict load/store pair instructions for 32-byte copies.  */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
 {
   mode = V4SImode;
+  rtx src1 = adjust_address (src, mode, offset);
+  rtx src2 = adjust_address (src, mode, offset + 16);
+  rtx dst1 = adjust_address (dst, mode, offset);
+  rtx dst2 = adjust_address (dst, mode, offset + 16);
   rtx reg1 = gen_reg_rtx (mode);
   rtx reg2 = gen_reg_rtx (mode);
-  /* "Cast" the pointers to the correct mode.  */
-  *src = adjust_address (*src, mode, 0);
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memcpy.  */
-  emit_insn (aarch64_gen_load_pair (mode, reg1, *src, reg2,
-   aarch64_progress_pointer (*src)));
-  emit_insn (aarch64_gen_store_pair (mode, *dst, reg1,
-aarch64_progress_pointer (*dst), 
reg2));
-  /* Move the pointers forward.  */
-  *src = aarch64_move_pointer (*src, 32);
-  *dst = aarch64_move_pointer (*dst, 32);
+  *load = aarch64_gen_load_pair (mode, reg1, src1, reg2, src2);
+  *store = aarch64_gen_store_pair (mode, dst1, reg1, dst2, reg2);
   return;
 }
 
   rtx reg = gen_reg_rtx (mode);
-
-  /* "Cast" the pointers to the correct mode.  */
-  *src = adjust_address (*src, mode, 0);
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memcpy.  */
-  emit_move_insn (reg, *src);
-  emit_move_insn (*dst, reg);
-  /* Move the pointers forward.  */
-  *src = aarch64_progress_pointer (*src);
-  *dst = aarch64_progress_pointer (*dst);
+  *load = gen_move_insn (reg, adjust_address (src, mode, offset));
+  *store = gen_move_insn (adjust_address (dst, mode, offset), reg);
 }
 
 /* Expand a cpymem/movmem using the MOPS extension.  OPERANDS are taken
from the cpymem/movmem pattern.  IS_MEMMOVE is true if this is a memmove
rather than memcpy.  Return true iff we succeeded.  */
 bool
-aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove = false)
+aarch64_expand_cpymem_mops (rtx *operands, bool is_memmove)
 {
   if (!TARGET_MOPS)
 return false;
@@ -25302,51 +25287,48 @@ aarch64_expand_cpymem_mops (rtx *operands, bool 
is_memmove = false)
   return true;
 }
 
-/* Expand cpymem, as if from a __builtin_memcpy.  Return true if
-   we succeed, otherwise return false, indicating that a libcall to
-

Re: [ARC PATCH] Split asl dst, 1, src into bset dst, 0, src to implement 1<<x

2023-10-16 Thread Claudiu Zissulescu Ianculescu
Hi Roger,

Indeed, I was missing the patch file.

Approved.

Thank you for your contribution,
 Claudiu

On Sun, Oct 15, 2023 at 11:14 AM Roger Sayle  wrote:
>
> I’ve done it again. ENOPATCH.
>
>
>
> From: Roger Sayle 
> Sent: 15 October 2023 09:13
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Claudiu Zissulescu' 
> Subject: [ARC PATCH] Split asl dst,1,src into bset dst,0,src to implement 
> 1<<x
>
>
>
>
> This patch adds a pre-reload splitter to arc.md, to use the bset (set
>
> specific bit instruction) to implement 1<<x
> on ARC processors that don't have a barrel shifter.
>
>
>
> Currently,
>
>
>
> int foo(int x) {
>
>   return 1 << x;
>
> }
>
>
>
> when compiled with -O2 -mcpu=em is compiled as a loop:
>
>
>
> foo:mov_s   r2,1;3
>
> and.f lp_count,r0, 0x1f
>
> lpnz2f
>
> add r2,r2,r2
>
> nop
>
> 2:  # end single insn loop
>
> j_s.d   [blink]
>
> mov_s   r0,r2   ;4
>
>
>
> with this patch we instead generate a single instruction:
>
>
>
> foo:bsetr0,0,r0
>
> j_s [blink]
>
>
>
>
>
> Finger-crossed this passes Claudiu's nightly testing.  This patch
>
> has been minimally tested by building a cross-compiler cc1 to
>
> arc-linux hosted on x86_64-pc-linux-gnu with no additional failures
>
> seen with make -k check.  Ok for mainline?  Thanks in advance.
>
>
>
>
>
> 2023-10-15  Roger Sayle  
>
>
>
> gcc/ChangeLog
>
> * config/arc/arc.md (*ashlsi3_1): New pre-reload splitter to
>
> use bset dst,0,src to implement 1<<x.
>
>
>
>
> Cheers,
>
> Roger
>
> --
>
>


Re: [PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread juzhe.zh...@rivai.ai
V2: https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633120.html
with some bug fixes.



juzhe.zh...@rivai.ai
 
From: Juzhe-Zhong
Date: 2023-10-16 11:57
To: gcc-patches
CC: kito.cheng; kito.cheng; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model 
for non-adjacent load/store
Consider the following case:
int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
{
  sum1 += x[2*i] - a;
  sum1 += x[2*i+1] * b;
  sum2 += x[2*i] - b;
  sum2 += x[2*i+1] * a;
}
  return sum1 + sum2;
}
 
Before this patch:
 
bar:
ble a3,zero,.L5
csrrt0,vlenb
csrra6,vlenb
sllit1,t0,3
vsetvli a5,zero,e32,m4,ta,ma
sub sp,sp,t1
vid.v   v20
vmv.v.x v12,a1
vand.vi v4,v20,1
vmv.v.x v16,a2
vmseq.viv4,v4,1
sllit3,a6,2
vsetvli zero,a5,e32,m4,ta,ma
vmv1r.v v0,v4
viota.m v8,v4
add a7,t3,sp
vsetvli a5,zero,e32,m4,ta,mu
vand.vi v28,v20,-2
vadd.vi v4,v28,1
vs4r.v  v20,0(a7)-  spill
vrgather.vv v24,v12,v8
vrgather.vv v20,v16,v8
vrgather.vv v24,v16,v8,v0.t
vrgather.vv v20,v12,v8,v0.t
vs4r.v  v4,0(sp)  - spill
sllia3,a3,1
addit4,a6,-1
neg t1,a6
vmv4r.v v0,v20
vmv.v.i v4,0
j   .L4
.L13:
vsetvli a5,zero,e32,m4,ta,ma
.L4:
mv  a7,a3
mv  a4,a3
bleua3,a6,.L3
csrra4,vlenb
.L3:
vmv.v.x v8,t4
vl4re32.v   v12,0(sp) spill
vand.vv v20,v28,v8
vand.vv v8,v12,v8
vsetvli zero,a4,e32,m4,ta,ma
vle32.v v16,0(a0)
vsetvli a5,zero,e32,m4,ta,ma
add a3,a3,t1
vrgather.vv v12,v16,v20
add a0,a0,t3
vrgather.vv v20,v16,v8
vsub.vv v12,v12,v0
vsetvli zero,a4,e32,m4,tu,ma
vadd.vv v4,v4,v12
vmacc.vvv4,v24,v20
bgtua7,a6,.L13
csrra1,vlenb
sllia1,a1,2
add a1,a1,sp
li  a4,-1
csrrt0,vlenb
vsetvli a5,zero,e32,m4,ta,ma
vl4re32.v   v12,0(a1)    spill
vmv.v.i v8,0
vmul.vx v0,v12,a4
li  a2,0
sllit1,t0,3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmseq.vvv0,v0,v8
vand.vi v12,v12,1
vmerge.vvm  v16,v8,v4,v0
vmseq.vvv12,v12,v8
vmv.s.x v1,a2
vmv1r.v v0,v12
vredsum.vs  v16,v16,v1
vmerge.vvm  v8,v8,v4,v0
vmv.x.s a0,v16
vredsum.vs  v8,v8,v1
vmv.x.s a5,v8
add sp,sp,t1
addwa0,a0,a5
jr  ra
.L5:
li  a0,0
ret
 
We can see there are multiple horrible register spillings.
The root cause of this issue is that for a scalar IR load:

_5 = *_4;

we didn't check whether it is a contiguous load/store or a gather/scatter
load/store.
 
Since it will be translated into either:

   1. MASK_LEN_GATHER_LOAD (..., perm indices).
   2. Contiguous load/store + VEC_PERM (..., perm indices)

It's obvious that in either situation we will end up consuming one extra
vector register group (the perm indices) that we didn't count before.
 
So in this case we picked LMUL = 4, which is an incorrect choice for the
dynamic LMUL cost model.
 
The key of this patch is:
 
  if ((type == load_vec_info_type || type == store_vec_info_type)
  && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
{
   ...
}
 
Add one more register consumption if it is not an adjacent load/store.
 
After this patch, it picks LMUL = 2, which is optimal:
 
bar:
ble a3,zero,.L4
csrr a6,vlenb
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v6,a2
srli a2,a6,1
vmv.v.x v4,a1
vid.v v12
slli a3,a3,1
vand.vi v0,v12,1
addi t1,a2,-1
vmseq.vi v0,v0,1
slli a6,a6,1
vsetvli zero,a5,e32,m2,ta,ma
neg a7,a2
viota.m v2,v0
vsetvli a5,zero,e32,m2,ta,mu
vrgather.vv v16,v4,v2
vrgather.vv v14,v6,v2
vrgather.vv v16,v6,v2,v0.t
vrgather.vv v14,v4,v2,v0.t
vand.vi v18,v12,-2
vmv.v.i v2,0
vadd.vi v20,v18,1
.L3:
minu a4,a3,a2
vsetvli zero,a4,e32,m2,ta,ma
vle32.v v8,0(a0)
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v4,t1
vand.vv v10,v18,v4
vrgather.vv v6,v8,v10
vsub.vv v6,v6,v14
vsetvli zero,a4,e32,m2,tu,ma
vadd.vv v2,v2,v6
vsetvli a1,zero,e32,m2,ta,ma
vand.vv v4,v20,v4
vrgather.vv v6,v8,v4
vsetvli zero,a4,e32,m2,tu,ma
mv a4,a3
add a0,a0,a6
add a3,a3,a7
vmacc.vv v2,v16,v6
bgtu a4,a2,.L3
vsetvli a1,zero,e32,m2,ta,ma
vand.vi v0,v12,1
vmv.v.i v4,0
li a3,-1
vmseq.vv v0,v0,v4
vmv.s.x v1,zero
vmerge.vvm v6,v4,v2,v0
vredsum.vs v6,v6,v1
vmul.vx v0,v12,a3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmv.x.s a4,v6
vmseq.

[PATCH V2] RISC-V: Fix unexpected big LMUL choosing in dynamic LMUL model for non-adjacent load/store

2023-10-16 Thread Juzhe-Zhong
Consider the following case:
int
bar (int *x, int a, int b, int n)
{
  x = __builtin_assume_aligned (x, __BIGGEST_ALIGNMENT__);
  int sum1 = 0;
  int sum2 = 0;
  for (int i = 0; i < n; ++i)
{
  sum1 += x[2*i] - a;
  sum1 += x[2*i+1] * b;
  sum2 += x[2*i] - b;
  sum2 += x[2*i+1] * a;
}
  return sum1 + sum2;
}

Before this patch:

bar:
ble a3,zero,.L5
csrr t0,vlenb
csrr a6,vlenb
slli t1,t0,3
vsetvli a5,zero,e32,m4,ta,ma
sub sp,sp,t1
vid.v v20
vmv.v.x v12,a1
vand.vi v4,v20,1
vmv.v.x v16,a2
vmseq.vi v4,v4,1
slli t3,a6,2
vsetvli zero,a5,e32,m4,ta,ma
vmv1r.v v0,v4
viota.m v8,v4
add a7,t3,sp
vsetvli a5,zero,e32,m4,ta,mu
vand.vi v28,v20,-2
vadd.vi v4,v28,1
vs4r.v v20,0(a7)        - spill
vrgather.vv v24,v12,v8
vrgather.vv v20,v16,v8
vrgather.vv v24,v16,v8,v0.t
vrgather.vv v20,v12,v8,v0.t
vs4r.v v4,0(sp)         - spill
slli a3,a3,1
addi t4,a6,-1
neg t1,a6
vmv4r.v v0,v20
vmv.v.i v4,0
j .L4
.L13:
vsetvli a5,zero,e32,m4,ta,ma
.L4:
mv a7,a3
mv a4,a3
bleu a3,a6,.L3
csrr a4,vlenb
.L3:
vmv.v.x v8,t4
vl4re32.v v12,0(sp)     - spill
vand.vv v20,v28,v8
vand.vv v8,v12,v8
vsetvli zero,a4,e32,m4,ta,ma
vle32.v v16,0(a0)
vsetvli a5,zero,e32,m4,ta,ma
add a3,a3,t1
vrgather.vv v12,v16,v20
add a0,a0,t3
vrgather.vv v20,v16,v8
vsub.vv v12,v12,v0
vsetvli zero,a4,e32,m4,tu,ma
vadd.vv v4,v4,v12
vmacc.vv v4,v24,v20
bgtu a7,a6,.L13
csrr a1,vlenb
slli a1,a1,2
add a1,a1,sp
li a4,-1
csrr t0,vlenb
vsetvli a5,zero,e32,m4,ta,ma
vl4re32.v v12,0(a1)     - spill
vmv.v.i v8,0
vmul.vx v0,v12,a4
li a2,0
slli t1,t0,3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmseq.vv v0,v0,v8
vand.vi v12,v12,1
vmerge.vvm v16,v8,v4,v0
vmseq.vv v12,v12,v8
vmv.s.x v1,a2
vmv1r.v v0,v12
vredsum.vs v16,v16,v1
vmerge.vvm v8,v8,v4,v0
vmv.x.s a0,v16
vredsum.vs v8,v8,v1
vmv.x.s a5,v8
add sp,sp,t1
addw a0,a0,a5
jr ra
.L5:
li a0,0
ret

We can see there are multiple horrible register spillings.
The root cause of this issue is that for a scalar IR load:

_5 = *_4;

we didn't check whether it is a contiguous load/store or a
gather/scatter load/store.

Since it will be translated into either:

   1. MASK_LEN_GATHER_LOAD (..., perm indices).
   2. Contiguous load/store + VEC_PERM (..., perm indices)

it's obvious that in either situation we end up consuming one extra
vector register group (the perm indices) that we didn't count before.

So for this case we pick LMUL = 4, which is an incorrect choice for the
dynamic LMUL cost model.

The key of this patch is:

  if ((type == load_vec_info_type || type == store_vec_info_type)
      && !adjacent_dr_p (STMT_VINFO_DATA_REF (stmt_info)))
    {
      ...
    }

That is, add one more register consumption when the load/store is not
an adjacent (contiguous) one.
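
To make the effect concrete, here is a minimal toy model of the choice
(illustration only: the live-value count and the 32-register budget are
made-up numbers, not the real vectorizer costing):

/* Toy dynamic-LMUL chooser: pick the largest LMUL whose estimated
   register-group usage fits the 32 RVV registers.  */
#include <stdio.h>
#include <stdbool.h>

static int
groups_needed (int live_values, int lmul, bool adjacent)
{
  int groups = live_values * lmul;
  if (!adjacent)
    groups += lmul;  /* the perm indices occupy a register group too */
  return groups;
}

int
main (void)
{
  for (int lmul = 8; lmul >= 1; lmul /= 2)
    if (groups_needed (8, lmul, false) <= 32)
      {
        printf ("chosen LMUL = %d\n", lmul);  /* prints 2 */
        break;
      }
  return 0;
}

With the "groups += lmul" line removed, the same loop settles on
LMUL = 4, which is exactly the mis-costing described above.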

After this patch, it picks LMUL = 2, which is optimal:

bar:
ble a3,zero,.L4
csrr a6,vlenb
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v6,a2
srli a2,a6,1
vmv.v.x v4,a1
vid.v v12
slli a3,a3,1
vand.vi v0,v12,1
addi t1,a2,-1
vmseq.vi v0,v0,1
slli a6,a6,1
vsetvli zero,a5,e32,m2,ta,ma
neg a7,a2
viota.m v2,v0
vsetvli a5,zero,e32,m2,ta,mu
vrgather.vv v16,v4,v2
vrgather.vv v14,v6,v2
vrgather.vv v16,v6,v2,v0.t
vrgather.vv v14,v4,v2,v0.t
vand.vi v18,v12,-2
vmv.v.i v2,0
vadd.vi v20,v18,1
.L3:
minu a4,a3,a2
vsetvli zero,a4,e32,m2,ta,ma
vle32.v v8,0(a0)
vsetvli a5,zero,e32,m2,ta,ma
vmv.v.x v4,t1
vand.vv v10,v18,v4
vrgather.vv v6,v8,v10
vsub.vv v6,v6,v14
vsetvli zero,a4,e32,m2,tu,ma
vadd.vv v2,v2,v6
vsetvli a1,zero,e32,m2,ta,ma
vand.vv v4,v20,v4
vrgather.vv v6,v8,v4
vsetvli zero,a4,e32,m2,tu,ma
mv a4,a3
add a0,a0,a6
add a3,a3,a7
vmacc.vv v2,v16,v6
bgtu a4,a2,.L3
vsetvli a1,zero,e32,m2,ta,ma
vand.vi v0,v12,1
vmv.v.i v4,0
li a3,-1
vmseq.vv v0,v0,v4
vmv.s.x v1,zero
vmerge.vvm v6,v4,v2,v0
vredsum.vs v6,v6,v1
vmul.vx v0,v12,a3
vadd.vi v0,v0,-1
vand.vi v0,v0,1
vmv.x.s a4,v6
vmseq.

Re: [PATCH] s390: Fix expander popcountv8hi2_vx

2023-10-16 Thread Andreas Krebbel
On 10/16/23 13:20, Stefan Schulze Frielinghaus wrote:
> The normal form of a CONST_INT which represents an integer of a mode
> with fewer bits than in HOST_WIDE_INT is sign extended.  This even holds
> for unsigned integers.
> 
> This fixes an ICE during cse1 where we bail out at rtl.h:2297 since
> INTVAL (x.first) == sext_hwi (INTVAL (x.first), precision) does not hold.
> 
> gcc/ChangeLog:
> 
>   * config/s390/vector.md (popcountv8hi2_vx): Sign extend each
>   unsigned vector element.

Ok. Thanks!

Bye,

Andreas



[PATCH] s390: Fix expander popcountv8hi2_vx

2023-10-16 Thread Stefan Schulze Frielinghaus
The normal form of a CONST_INT which represents an integer of a mode
with fewer bits than in HOST_WIDE_INT is sign extended.  This even holds
for unsigned integers.

This fixes an ICE during cse1 where we bail out at rtl.h:2297 since
INTVAL (x.first) == sext_hwi (INTVAL (x.first), precision) does not hold.
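
To illustrate the normal form, here is a small self-contained sketch (a
portable re-implementation for demonstration only, not GCC's actual
sext_hwi) showing why an all-ones V16QI byte element must be written as
(const_int -1) rather than (const_int 255):

#include <assert.h>

/* Sign-extend the low PREC bits of SRC (0 < PREC < 64).  */
static long long
sext_hwi (long long src, unsigned prec)
{
  unsigned long long sign_bit = 1ULL << (prec - 1);
  unsigned long long mask = (sign_bit << 1) - 1;
  unsigned long long v = (unsigned long long) src & mask;
  return (long long) ((v ^ sign_bit) - sign_bit);
}

int
main (void)
{
  assert (sext_hwi (255, 8) == -1);  /* 255 is not in normal form... */
  assert (sext_hwi (-1, 8) == -1);   /* ...-1 is its canonical spelling */
  assert (sext_hwi (127, 8) == 127); /* values below 2^(prec-1) unchanged */
  return 0;
}

This is exactly the invariant the rtl.h check quoted above enforces.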

gcc/ChangeLog:

* config/s390/vector.md (popcountv8hi2_vx): Sign extend each
unsigned vector element.
---
 gcc/config/s390/vector.md | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/s390/vector.md b/gcc/config/s390/vector.md
index f0e9ed3d263..7d1eb36e844 100644
--- a/gcc/config/s390/vector.md
+++ b/gcc/config/s390/vector.md
@@ -1154,14 +1154,14 @@
(plus:V16QI (match_dup 2) (match_dup 3)))
; Generate mask for the odd numbered byte elements
(set (match_dup 3)
-   (const_vector:V16QI [(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)
-(const_int 0) (const_int 255)]))
+   (const_vector:V16QI [(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)
+(const_int 0) (const_int -1)]))
; Zero out the even indexed bytes
(set (match_operand:V8HI 0 "register_operand" "=v")
(and:V8HI (subreg:V8HI (match_dup 2) 0)
-- 
2.41.0



[PATCH] tree-optimization/111807 - ICE in verify_sra_access_forest

2023-10-16 Thread Richard Biener
The following addresses build_reconstructed_reference failing to
build references with a different offset than the model's and thus
the caller conditional being off.  This manifests when attempting
to build a ref with offset 160 from the model BIT_FIELD_REF
onto the same base l_4827 with the model's offset being 288.  This
cannot work for any kind of ref I can think of, not just with
BIT_FIELD_REFs.

Bootstrapped and tested on x86_64-unknown-linux-gnu, will push
later.

Martin - do you remember which case was supposed to be allowed
with offset < model->offset?

Thanks,
Richard.

PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Only call
build_reconstructed_reference when the offsets are the same.

* gcc.dg/torture/pr111807.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr111807.c | 12 
 gcc/tree-sra.cc |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr111807.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr111807.c 
b/gcc/testsuite/gcc.dg/torture/pr111807.c
new file mode 100644
index 000..09fbdcfb667
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr111807.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+
+static struct A {
+  int x : 4;
+} a;
+static int b;
+int main()
+{
+  struct A t[2];
+  t[0] = b ? t[1] : a;
+  return (b ? t[1].x : 0) && 1;
+}
diff --git a/gcc/tree-sra.cc b/gcc/tree-sra.cc
index 24d0c20da6a..f8dff8b27d7 100644
--- a/gcc/tree-sra.cc
+++ b/gcc/tree-sra.cc
@@ -1751,7 +1751,7 @@ build_ref_for_model (location_t loc, tree base, 
HOST_WIDE_INT offset,
  && !TREE_THIS_VOLATILE (base)
  && (TYPE_ADDR_SPACE (TREE_TYPE (base))
  == TYPE_ADDR_SPACE (TREE_TYPE (model->expr)))
- && offset <= model->offset
+ && offset == model->offset
  /* build_reconstructed_reference can still fail if we have already
 massaged BASE because of another type incompatibility.  */
  && (res = build_reconstructed_reference (loc, base, model)))
-- 
2.35.3


Re: [PATCH] genemit: Split insn-emit.cc into ten files.

2023-10-16 Thread Sam James


Robin Dapp  writes:

> Hi,
>
> the attached v2 includes Tamar's suggestion of keeping the current
> stdout behavior.  When no output files are passed (via -O) the output
> is written to stdout as before.
>
> Tamar also mentioned off-list that, similar to match.pd, it might make
> sense to balance the partitions in a better way than a fixed number
> of patterns threshold.  That's a good idea but I'd rather do that
> separately as the current approach already helps considerably.
>
> Attached v2 was bootstrapped and regtested on power10, aarch64 and
> x86 are still running.
> Stefan also tested v1 on s390 where the partitioning does not help
> but also doesn't slow anything down.  insn-emit.cc isn't very large
> to begin with on s390.

I tested v1 on x86/arm64, I'll do at least the same for v2. (I also
backported it to 13 for my own purposes locally and everything seemed
fine there.)

I didn't notice any change on my x86 machine but it's already
quite powerful and I didn't have a chance to try on anything weaker,
so I wasn't too surprised.

>
> Regards
>  Robin
>
> From 34d05113a4e3c7e83a4731020307e26c1144af69 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Thu, 12 Oct 2023 11:23:26 +0200
> Subject: [PATCH v2] genemit: Split insn-emit.cc into several partitions.
>
> On riscv insn-emit.cc has grown to over 1.2 million lines of code and
> compiling it takes considerable time.
> Therefore, this patch adjusts genemit to create several partitions
> (insn-emit-1.cc to insn-emit-n.cc).  In order to do so it first counts
> the number of available patterns, calculates the number of patterns per
> file and starts a new file whenever that number is reached.
>
> Similar to match.pd, a configure option --with-insnemit-partitions=num
> is introduced that makes the number of partitions configurable.
>
> gcc/ChangeLog:
>
>   PR bootstrap/84402
>   PR target/111600
>
>   * Makefile.in: Handle split insn-emit.cc.
>   * configure: Regenerate.
>   * configure.ac: Add --with-insnemit-partitions.
>   * genemit.cc (output_peephole2_scratches): Print to file instead
>   of stdout.
>   (print_code): Ditto.
>   (gen_rtx_scratch): Ditto.
>   (gen_exp): Ditto.
>   (gen_emit_seq): Ditto.
>   (emit_c_code): Ditto.
>   (gen_insn): Ditto.
>   (gen_expand): Ditto.
>   (gen_split): Ditto.
>   (output_add_clobbers): Ditto.
>   (output_added_clobbers_hard_reg_p): Ditto.
>   (print_overload_arguments): Ditto.
>   (print_overload_test): Ditto.
>   (handle_overloaded_code_for): Ditto.
>   (handle_overloaded_gen): Ditto.
>   (print_header): New function.
>   (handle_arg): New function.
>   (main): Split output into 10 files.
>   * gensupport.cc (count_patterns): New function.
>   * gensupport.h (count_patterns): Define.
>   * read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
>   * read-md.h (class md_reader): Change definition.
> ---
>  gcc/Makefile.in   |  38 +++-
>  gcc/configure |  24 ++-
>  gcc/configure.ac  |  13 ++
>  gcc/genemit.cc| 536 +-
>  gcc/gensupport.cc |  36 
>  gcc/gensupport.h  |   1 +
>  gcc/read-md.cc|   4 +-
>  gcc/read-md.h |   2 +-
>  8 files changed, 399 insertions(+), 255 deletions(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 9cc16268abf..ca0a616f768 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -236,6 +236,13 @@ GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-match-%.o, 
> $(MATCH_SPLITS_SEQ))
>  GENERIC_MATCH_PD_SEQ_SRC = $(patsubst %, generic-match-%.cc, 
> $(MATCH_SPLITS_SEQ))
>  GENERIC_MATCH_PD_SEQ_O = $(patsubst %, generic-match-%.o, 
> $(MATCH_SPLITS_SEQ))
>  
> +# The number of splits to be made for the insn-emit files.
> +NUM_INSNEMIT_SPLITS = @DEFAULT_INSNEMIT_PARTITIONS@
> +INSNEMIT_SPLITS_SEQ = $(wordlist 1,$(NUM_INSNEMIT_SPLITS),$(one_to_))
> +INSNEMIT_SEQ_SRC = $(patsubst %, insn-emit-%.cc, $(INSNEMIT_SPLITS_SEQ))
> +INSNEMIT_SEQ_TMP = $(patsubst %, tmp-emit-%.cc, $(INSNEMIT_SPLITS_SEQ))
> +INSNEMIT_SEQ_O = $(patsubst %, insn-emit-%.o, $(INSNEMIT_SPLITS_SEQ))
> +
>  # These files are to have specific diagnostics suppressed, or are not to
>  # be subject to -Werror:
>  # flex output may yield harmless "no previous prototype" warnings
> @@ -1354,7 +1361,7 @@ OBJS = \
>   insn-attrtab.o \
>   insn-automata.o \
>   insn-dfatab.o \
> - insn-emit.o \
> + $(INSNEMIT_SEQ_O) \
>   insn-extract.o \
>   insn-latencytab.o \
>   insn-modes.o \
> @@ -1852,7 +1859,8 @@ TREECHECKING = @TREECHECKING@
>  FULL_DRIVER_NAME=$(target_noncanonical)-gcc-$(version)$(exeext)
>  
>  MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
> - insn-output.cc insn-recog.cc insn-emit.cc insn-extract.cc insn-peep.cc \
> + insn-output.cc insn-recog.cc $(INSNEMIT_SEQ_SRC) \
> + insn-extract.cc insn-peep.cc \
>   insn-attr.h insn-attr-common.h insn-attrtab.cc insn-dfatab.cc \
>   insn-latencyt

Re: [PATCH] genemit: Split insn-emit.cc into ten files.

2023-10-16 Thread Robin Dapp
Hi,

the attached v2 includes Tamar's suggestion of keeping the current
stdout behavior.  When no output files are passed (via -O) the output
is written to stdout as before.

Tamar also mentioned off-list that, similar to match.pd, it might make
sense to balance the partitions in a better way than a fixed number
of patterns threshold.  That's a good idea but I'd rather do that
separately as the current approach already helps considerably.

Attached v2 was bootstrapped and regtested on power10, aarch64 and
x86 are still running.
Stefan also tested v1 on s390 where the partitioning does not help
but also doesn't slow anything down.  insn-emit.cc isn't very large
to begin with on s390.

Regards
 Robin

From 34d05113a4e3c7e83a4731020307e26c1144af69 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Thu, 12 Oct 2023 11:23:26 +0200
Subject: [PATCH v2] genemit: Split insn-emit.cc into several partitions.

On riscv insn-emit.cc has grown to over 1.2 million lines of code and
compiling it takes considerable time.
Therefore, this patch adjusts genemit to create several partitions
(insn-emit-1.cc to insn-emit-n.cc).  In order to do so it first counts
the number of available patterns, calculates the number of patterns per
file and starts a new file whenever that number is reached.

Similar to match.pd, a configure option --with-insnemit-partitions=num
is introduced that makes the number of partitions configurable.
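
As a rough sketch of the scheme in isolation (illustration only; the
constants here are made up, while genemit itself derives the total from
count_patterns () and the partition count from the configure option):

#include <stdio.h>

#define TOTAL_PATTERNS 25  /* stand-in for count_patterns () */
#define PARTITIONS 4       /* stand-in for the configured partition count */

int
main (void)
{
  int per_file = (TOTAL_PATTERNS + PARTITIONS - 1) / PARTITIONS;
  FILE *out = NULL;
  int emitted = 0, file_no = 0;
  char name[32];

  for (int i = 0; i < TOTAL_PATTERNS; i++)
    {
      /* Start a new partition whenever the threshold is reached.  */
      if (out == NULL || emitted == per_file)
        {
          if (out)
            fclose (out);
          snprintf (name, sizeof name, "insn-emit-%d.cc", ++file_no);
          out = fopen (name, "w");
          if (out == NULL)
            return 1;
          emitted = 0;
        }
      fprintf (out, "/* pattern %d */\n", i);  /* stand-in for gen_insn etc. */
      emitted++;
    }
  if (out)
    fclose (out);
  return 0;
}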

gcc/ChangeLog:

PR bootstrap/84402
PR target/111600

* Makefile.in: Handle split insn-emit.cc.
* configure: Regenerate.
* configure.ac: Add --with-insnemit-partitions.
* genemit.cc (output_peephole2_scratches): Print to file instead
of stdout.
(print_code): Ditto.
(gen_rtx_scratch): Ditto.
(gen_exp): Ditto.
(gen_emit_seq): Ditto.
(emit_c_code): Ditto.
(gen_insn): Ditto.
(gen_expand): Ditto.
(gen_split): Ditto.
(output_add_clobbers): Ditto.
(output_added_clobbers_hard_reg_p): Ditto.
(print_overload_arguments): Ditto.
(print_overload_test): Ditto.
(handle_overloaded_code_for): Ditto.
(handle_overloaded_gen): Ditto.
(print_header): New function.
(handle_arg): New function.
(main): Split output into 10 files.
* gensupport.cc (count_patterns): New function.
* gensupport.h (count_patterns): Define.
* read-md.cc (md_reader::print_md_ptr_loc): Add file argument.
* read-md.h (class md_reader): Change definition.
---
 gcc/Makefile.in   |  38 +++-
 gcc/configure |  24 ++-
 gcc/configure.ac  |  13 ++
 gcc/genemit.cc| 536 +-
 gcc/gensupport.cc |  36 
 gcc/gensupport.h  |   1 +
 gcc/read-md.cc|   4 +-
 gcc/read-md.h |   2 +-
 8 files changed, 399 insertions(+), 255 deletions(-)

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 9cc16268abf..ca0a616f768 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -236,6 +236,13 @@ GIMPLE_MATCH_PD_SEQ_O = $(patsubst %, gimple-match-%.o, 
$(MATCH_SPLITS_SEQ))
 GENERIC_MATCH_PD_SEQ_SRC = $(patsubst %, generic-match-%.cc, 
$(MATCH_SPLITS_SEQ))
 GENERIC_MATCH_PD_SEQ_O = $(patsubst %, generic-match-%.o, $(MATCH_SPLITS_SEQ))
 
+# The number of splits to be made for the insn-emit files.
+NUM_INSNEMIT_SPLITS = @DEFAULT_INSNEMIT_PARTITIONS@
+INSNEMIT_SPLITS_SEQ = $(wordlist 1,$(NUM_INSNEMIT_SPLITS),$(one_to_))
+INSNEMIT_SEQ_SRC = $(patsubst %, insn-emit-%.cc, $(INSNEMIT_SPLITS_SEQ))
+INSNEMIT_SEQ_TMP = $(patsubst %, tmp-emit-%.cc, $(INSNEMIT_SPLITS_SEQ))
+INSNEMIT_SEQ_O = $(patsubst %, insn-emit-%.o, $(INSNEMIT_SPLITS_SEQ))
+
 # These files are to have specific diagnostics suppressed, or are not to
 # be subject to -Werror:
 # flex output may yield harmless "no previous prototype" warnings
@@ -1354,7 +1361,7 @@ OBJS = \
insn-attrtab.o \
insn-automata.o \
insn-dfatab.o \
-   insn-emit.o \
+   $(INSNEMIT_SEQ_O) \
insn-extract.o \
insn-latencytab.o \
insn-modes.o \
@@ -1852,7 +1859,8 @@ TREECHECKING = @TREECHECKING@
 FULL_DRIVER_NAME=$(target_noncanonical)-gcc-$(version)$(exeext)
 
 MOSTLYCLEANFILES = insn-flags.h insn-config.h insn-codes.h \
- insn-output.cc insn-recog.cc insn-emit.cc insn-extract.cc insn-peep.cc \
+ insn-output.cc insn-recog.cc $(INSNEMIT_SEQ_SRC) \
+ insn-extract.cc insn-peep.cc \
  insn-attr.h insn-attr-common.h insn-attrtab.cc insn-dfatab.cc \
  insn-latencytab.cc insn-opinit.cc insn-opinit.h insn-preds.cc 
insn-constants.h \
  tm-preds.h tm-constrs.h checksum-options $(GIMPLE_MATCH_PD_SEQ_SRC) \
@@ -2481,11 +2489,11 @@ $(common_out_object_file): $(common_out_file)
 # and compile them.
 
 .PRECIOUS: insn-config.h insn-flags.h insn-codes.h insn-constants.h \
-  insn-emit.cc insn-recog.cc insn-extract.cc insn-output.cc insn-peep.cc \
-  insn-attr.h insn-attr-common.h insn-attrtab.cc insn-dfatab.cc \
-  insn-latencytab.cc insn-preds.cc 

Re: [PATCH] [PR31531] MATCH: Improve ~a < ~b and ~a < CST, allow a nop cast inbetween ~ and a/b

2023-10-16 Thread Richard Biener
On Mon, Oct 16, 2023 at 4:34 AM Andrew Pinski  wrote:
>
> Currently we are able to simplify `~a CMP ~b` to `b CMP a` but we should
> allow a nop conversion in between the `~` and the `a`, which can show up.
> A similar thing should be done for `~a CMP CST`.
>
> I had originally submitted the `~a CMP CST` case as
> https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585088.html;
> I noticed we should do the same thing for the `~a CMP ~b` case and combined
> it with that one here.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> PR tree-optimization/31531
>
> gcc/ChangeLog:
>
> * match.pd (~X op ~Y): Allow for an optional nop convert.
> (~X op C): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr31531-1.c: New test.
> * gcc.dg/tree-ssa/pr31531-2.c: New test.
> ---
>  gcc/match.pd  | 10 ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c | 19 +
>  gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c | 34 +++
>  3 files changed, 59 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 51e5065d086..e76ec1ec034 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5944,18 +5944,20 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* Fold ~X op ~Y as Y op X.  */
>  (for cmp (simple_comparison)
>   (simplify
> -  (cmp (bit_not@2 @0) (bit_not@3 @1))
> +  (cmp (nop_convert1?@4 (bit_not@2 @0)) (nop_convert2? (bit_not@3 @1)))
>(if (single_use (@2) && single_use (@3))
> -   (cmp @1 @0
> +   (with { tree otype = TREE_TYPE (@4); }
> +(cmp (convert:otype @1) (convert:otype @0))
>
>  /* Fold ~X op C as X op' ~C, where op' is the swapped comparison.  */
>  (for cmp (simple_comparison)
>   scmp (swapped_simple_comparison)
>   (simplify
> -  (cmp (bit_not@2 @0) CONSTANT_CLASS_P@1)
> +  (cmp (nop_convert? (bit_not@2 @0)) CONSTANT_CLASS_P@1)
>(if (single_use (@2)
> && (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST))
> -   (scmp @0 (bit_not @1)
> +   (with { tree otype = TREE_TYPE (@1); }
> +(scmp (convert:otype @0) (bit_not @1))
>
>  (for cmp (simple_comparison)
>   (simplify
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c
> new file mode 100644
> index 000..c27299151eb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/31531 */
> +
> +int f(int a)
> +{
> +  int b = ~a;
> +  return b<0;
> +}
> +
> +
> +int f1(unsigned a)
> +{
> +  int b = ~a;
> +  return b<0;
> +}
> +/* We should convert the above two functions from b <0 to ((int)a) >= 0. */
> +/* { dg-final { scan-tree-dump-times ">= 0" 2 "optimized"} } */
> +/* { dg-final { scan-tree-dump-times "~" 0 "optimized"} } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c
> new file mode 100644
> index 000..865ea292215
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-2.c
> @@ -0,0 +1,34 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* PR tree-optimization/31531 */
> +
> +int f0(unsigned x, unsigned t)
> +{
> +x = ~x;
> +t = ~t;
> +int xx = x;
> +int tt = t;
> +return tt < xx;
> +}
> +
> +int f1(unsigned x, int t)
> +{
> +x = ~x;
> +t = ~t;
> +int xx = x;
> +int tt = t;
> +return tt < xx;
> +}
> +
> +int f2(int x, unsigned t)
> +{
> +x = ~x;
> +t = ~t;
> +int xx = x;
> +int tt = t;
> +return tt < xx;
> +}
> +
> +
> +/* We should be able to remove all ~ from the above functions. */
> +/* { dg-final { scan-tree-dump-times "~" 0 "optimized"} } */
> --
> 2.39.3
>


Re: [PATCH] Improve factor_out_conditional_operation for conversions and constants

2023-10-16 Thread Richard Biener
On Mon, Oct 16, 2023 at 2:02 AM Andrew Pinski  wrote:
>
> In the case of a NOP conversion (precisions of the 2 types are equal),
> factoring out the conversion can be done even if int_fits_type_p returns
> false and even when the conversion is defined by a statement inside the
> conditional. Since it is a NOP conversion there is no zero/sign extending
> happening which is why it is ok to be done here.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>
> gcc/ChangeLog:
>
> PR tree-optimization/104376
> PR tree-optimization/101541
> * tree-ssa-phiopt.cc (factor_out_conditional_operation):
> Allow nop conversions even if it is defined by a statement
> inside the conditional.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/101541
> * gcc.dg/tree-ssa/phi-opt-38.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c | 44 ++
>  gcc/tree-ssa-phiopt.cc |  8 +++-
>  2 files changed, 50 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
> new file mode 100644
> index 000..ca04d1619e6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-38.c
> @@ -0,0 +1,44 @@
> +/* { dg-options "-O2 -fdump-tree-phiopt" } */
> +
> +unsigned f0(int A)
> +{
> +// A == 0? A : -A    same as -A
> +  if (A == 0)  return A;
> +  return -A;
> +}
> +
> +unsigned f1(int A)
> +{
> +// A != 0? A : -A    same as A
> +  if (A != 0)  return A;
> +  return -A;
> +}
> +unsigned f2(int A)
> +{
> +// A >= 0? A : -A    same as abs (A)
> +  if (A >= 0)  return A;
> +  return -A;
> +}
> +unsigned f3(int A)
> +{
> +// A > 0?  A : -A    same as abs (A)
> +  if (A > 0)  return A;
> +  return -A;
> +}
> +unsigned f4(int A)
> +{
> +// A <= 0? A : -A    same as -abs (A)
> +  if (A <= 0)  return A;
> +  return -A;
> +}
> +unsigned f5(int A)
> +{
> +// A < 0?  A : -A    same as -abs (A)
> +  if (A < 0)  return A;
> +  return -A;
> +}
> +
> +/* f4 and f5 are not allowed to be optimized in early phi-opt. */
> +/* { dg-final { scan-tree-dump-times "if" 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-not "if" "phiopt2" } } */
> +
> diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
> index 312a6f9082b..0ab8fad5898 100644
> --- a/gcc/tree-ssa-phiopt.cc
> +++ b/gcc/tree-ssa-phiopt.cc
> @@ -310,7 +310,9 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
> *phi,
> return NULL;
>/* If arg1 is an INTEGER_CST, fold it to new type.  */
>if (INTEGRAL_TYPE_P (TREE_TYPE (new_arg0))
> - && int_fits_type_p (arg1, TREE_TYPE (new_arg0)))
> + && (int_fits_type_p (arg1, TREE_TYPE (new_arg0))
> + || TYPE_PRECISION (TREE_TYPE (new_arg0))
> + == TYPE_PRECISION (TREE_TYPE (arg1

can you add parens for auto-indent?

> {
>   if (gimple_assign_cast_p (arg0_def_stmt))
> {
> @@ -323,7 +325,9 @@ factor_out_conditional_operation (edge e0, edge e1, gphi 
> *phi,
>  its basic block, because then it is possible this
>  could enable further optimizations (minmax replacement
>  etc.).  See PR71016.  */

Doesn't the comment also apply for equal precision?

> - if (new_arg0 != gimple_cond_lhs (cond_stmt)
> + if (TYPE_PRECISION (TREE_TYPE (new_arg0))
> +   != TYPE_PRECISION (TREE_TYPE (arg1))
> + && new_arg0 != gimple_cond_lhs (cond_stmt)
>   && new_arg0 != gimple_cond_rhs (cond_stmt)
>   && gimple_bb (arg0_def_stmt) == e0->src)
> {

When we later fold_convert () I think you want to drop TREE_OVERFLOW
which we eventually add.

Otherwise OK I think.

Richard.

> --
> 2.34.1
>


[PATCH RFA] PR target/111815: VAX: Only accept the index scaler as the RHS operand to ASHIFT

2023-10-16 Thread Maciej W. Rozycki
As from commit 9df1ba9a35b8 ("libbacktrace: support zstd decompression") 
GCC for the `vax-netbsdelf' target fails to complete building, with an 
ICE:

during RTL pass: final
.../libbacktrace/elf.c: In function 'elf_zstd_decompress':
.../libbacktrace/elf.c:5006:1: internal compiler error: in 
print_operand_address, at config/vax/vax.cc:514
 5006 | }
  | ^
0x1113df97 print_operand_address(_IO_FILE*, rtx_def*)
.../gcc/config/vax/vax.cc:514
0x10c2489b default_print_operand_address(_IO_FILE*, machine_mode, rtx_def*)
.../gcc/targhooks.cc:373
0x106ddd0b output_address(machine_mode, rtx_def*)
.../gcc/final.cc:3648
0x106ddd0b output_asm_insn(char const*, rtx_def**)
.../gcc/final.cc:3505
0x106e2143 output_asm_insn(char const*, rtx_def**)
.../gcc/final.cc:3421
0x106e2143 final_scan_insn_1
.../gcc/final.cc:2841
0x106e28e3 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
.../gcc/final.cc:2887
0x106e2bf7 final_1
.../gcc/final.cc:1979
0x106e3c67 rest_of_handle_final
.../gcc/final.cc:4240
0x106e3c67 execute
.../gcc/final.cc:4318
Please submit a full bug report, with preprocessed source (by using 
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

This is due to combine producing an invalid address RTX:

(plus:SI (ashift:SI (const_int 1 [0x1])
(reg:QI 3 %r3 [1232]))
(reg/v:SI 10 %r10 [orig:736 weight_mask ] [736]))

where the expression is ((1 << R3) + R10), which does not match a valid 
machine addressing mode.  Consequently `print_operand_address' chokes.

This can be reduced to the testcase included, where it triggers the same 
ICE in `p'.  Preincrements are required so that their results land in 
registers and consequently an indexed addressing mode is tried; otherwise 
doing operations piecemeal on stack-based function arguments as direct 
input operands turns out more profitable in terms of RTX costs and the 
ICE is avoided.

The ultimate cause has been commit c605a8bf9270 ("VAX: Accept ASHIFT in 
address expressions"), where a shift of an immediate value by a register 
has been mistakenly allowed as an index expression, as if the shift 
operation were commutative like multiplication is.  So with ASHIFT the 
scaler in an index expression has to be the right-hand operand, and the 
backend has to enforce that, whereas with MULT the scaler can be either 
operand.
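
A tiny standalone illustration of the asymmetry (not part of the patch
itself):

#include <assert.h>

int
main (void)
{
  int r = 3;

  /* MULT commutes: (mult (reg) (const_int 4)) and
     (mult (const_int 4) (reg)) denote the same scaling.  */
  assert (r * 4 == 4 * r);

  /* ASHIFT does not: r << 2 scales r by 4, matching
     (ashift (reg) (const_int 2)), but 1 << r is a different value
     entirely, so (ashift (const_int 1) (reg)) is no index term.  */
  assert ((r << 2) == r * 4);
  assert ((1 << r) == 8);
  assert ((r << 1) == 6);

  return 0;
}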

Fix this by only accepting the index scaler as the RHS operand to 
ASHIFT.

gcc/
PR target/111815
* config/vax/vax.cc (index_term_p): Only accept the index scaler 
as the RHS operand to ASHIFT.

gcc/testsuite/
PR target/111815
* gcc.dg/torture/pr111815.c: New test.
---
Hi,

 The testcase is generic enough I thought it wouldn't hurt to place it in 
a generic part of the testsuite, where it has been verified to pass with 
the `powerpc64le-linux-gnu', `riscv64-linux-gnu', and `vax-netbsdelf' 
targets.  I'm fine to move it to the VAX part of the testsuite though if 
there's disagreement as to my choice.  Otherwise OK to apply for this 
part?

 NB as a serious backend issue I mean to backport this change to active 
release branches as well.

  Maciej
---
 gcc/config/vax/vax.cc   |9 ++---
 gcc/testsuite/gcc.dg/torture/pr111815.c |   26 ++
 2 files changed, 32 insertions(+), 3 deletions(-)

gcc-vax-index-ashift-noncommutative.diff
Index: gcc/gcc/config/vax/vax.cc
===
--- gcc.orig/gcc/config/vax/vax.cc
+++ gcc/gcc/config/vax/vax.cc
@@ -1831,7 +1831,9 @@ nonindexed_address_p (rtx x, bool strict
 }
 
 /* True if PROD is either a reg times size of mode MODE and MODE is less
-   than or equal 8 bytes, or just a reg if MODE is one byte.  */
+   than or equal 8 bytes, or just a reg if MODE is one byte.  For a MULT
+   RTX we accept its operands in either order, however ASHIFT is not
+   commutative, so in that case reg has to be the left operand.  */
 
 static bool
 index_term_p (rtx prod, machine_mode mode, bool strict)
@@ -1850,8 +1852,9 @@ index_term_p (rtx prod, machine_mode mod
   xfoo0 = XEXP (prod, 0);
   xfoo1 = XEXP (prod, 1);
 
-  if (CONST_INT_P (xfoo0)
-  && GET_MODE_SIZE (mode) == (log_p ? 1 << INTVAL (xfoo0) : INTVAL (xfoo0))
+  if (!log_p
+  && CONST_INT_P (xfoo0)
+  && GET_MODE_SIZE (mode) == INTVAL (xfoo0)
   && INDEX_REGISTER_P (xfoo1, strict))
 return true;
 
Index: gcc/gcc/testsuite/gcc.dg/torture/pr111815.c
===
--- /dev/null
+++ gcc/gcc/testsuite/gcc.dg/torture/pr111815.c
@@ -0,0 +1,26 @@
+/* { dg-do run } */
+
+char x[] = {
+   0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
+  16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31,
+};
+
+__attribute__ ((noinline)) char *
+p (char *a, int o, int i)
+{
+  r

Re: Re: [PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.

2023-10-16 Thread juzhe.zh...@rivai.ai
Thanks Robin.

Committed.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-10-16 17:12
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller 
than VLS mode elements.
Hi Juzhe,
 
this LGTM.  I was first concerned whether we would want to
stop e.g. at LMUL = 1 and only continue with a specific flag but
actually this should be done via the costs.  If an implementation
wants to penalize or incentivize some behavior it can always
adjust the costs which should be sufficient.
 
Regards
Robin
 


[Patch] nvptx: Use fatal_error when -march= is missing not an assert [PR111093]

2023-10-16 Thread Tobias Burnus

While mkoffload ensures that there is always a -march=, nvptx's
cc1 can also be run directly.

In my case, I wanted to know which target-specific #define are
available; hence, I did run:
  accel/nvptx-none/cc1 -E -dM < /dev/null
which gave an ICE. After some debugging, the reasons was
clear (missing -march=) but somehow a (fatal) error would have been
nicer than an ICE + debugging.

OK for mainline?

Tobias
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
nvptx: Use fatal_error when -march= is missing not an assert [PR111093]

gcc/ChangeLog:

	PR target/111093
	* config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error
	instead of an assert ICE when no -march= has been specified.

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index edef39fb5e1..634c31673be 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -335,8 +335,9 @@ nvptx_option_override (void)
   init_machine_status = nvptx_init_machine_status;
 
   /* Via nvptx 'OPTION_DEFAULT_SPECS', '-misa' always appears on the command
- line.  */
-  gcc_checking_assert (OPTION_SET_P (ptx_isa_option));
+ line; but handle the case that the compiler is not run via the driver.  */
+  if (!OPTION_SET_P (ptx_isa_option))
+fatal_error (UNKNOWN_LOCATION, "%<-march=%> must be specified");
 
   handle_ptx_version_option ();
 


Re: [PATCH] Do not prepend target triple to -fuse-ld=lld,mold.

2023-10-16 Thread Richard Biener
On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:

> 
> 
> > On Oct 16, 2023, at 17:55, Richard Biener  wrote:
> > 
> > On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> > 
> >> 
> >> 
> >>> On Oct 16, 2023, at 17:39, Richard Biener  wrote:
> >>> 
> >>> On Mon, 16 Oct 2023, Tatsuyuki Ishi wrote:
> >>> 
>  lld and mold are platform-agnostic and not prefixed with target triple.
>  Prepending the target triple makes it less likely to find the intended
>  linker executable.
>  
>  A potential breaking change is that we no longer try to search for
>  triple-prefixed lld/mold binaries anymore. However, since there doesn't
>  seem to be support to build LLVM or mold with triple-prefixed executable
>  names, it seems better to just not bother with that case.
>  
>   PR driver/111605
>  
>  gcc/Changelog:
>  
>   * collect2.cc (main): Do not prepend target triple to
>   -fuse-ld=lld,mold.
>  ---
>  gcc/collect2.cc | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>  
>  diff --git a/gcc/collect2.cc b/gcc/collect2.cc
>  index 63b9a0c233a..c943f9f577c 100644
>  --- a/gcc/collect2.cc
>  +++ b/gcc/collect2.cc
>  @@ -865,12 +865,15 @@ main (int argc, char **argv)
>   int i;
>  
>    for (i = 0; i < USE_LD_MAX; i++)
> -    full_ld_suffixes[i]
>  #ifdef CROSS_DIRECTORY_STRUCTURE
> -      = concat (target_machine, "-", ld_suffixes[i], NULL);
> -#else
> -      = ld_suffixes[i];
> -#endif
> +    /* lld and mold are platform-agnostic and not prefixed with target
> +       triple.  */
> +    if (!(i == USE_LLD_LD || i == USE_MOLD_LD))
> +      full_ld_suffixes[i] = concat (target_machine, "-", ld_suffixes[i],
> +                                    NULL);
> +    else
> +#endif
> +      full_ld_suffixes[i] = ld_suffixes[i];
>  
>   p = argv[0] + strlen (argv[0]);
>   while (p != argv[0] && !IS_DIR_SEPARATOR (p[-1]))
> >>> 
> >>> Since we later do
> >>> 
> >>> /* Search the compiler directories for `ld'.  We have protection against
> >>>recursive calls in find_a_file.  */
> >>> if (ld_file_name == 0)
> >>>   ld_file_name = find_a_file (&cpath, ld_suffixes[selected_linker], 
> >>> X_OK);
> >>> /* Search the ordinary system bin directories
> >>>for `ld' (if native linking) or `TARGET-ld' (if cross).  */
> >>> if (ld_file_name == 0)
> >>>   ld_file_name = find_a_file (&path, full_ld_suffixes[selected_linker], 
> >>> X_OK);
> >>> 
> >>> I wonder how having full_ld_suffixes[LLD|MOLD] == ld_suffixes[LLD|MOLD]
> >>> fixes anything?
> >> 
> >> Per the linked PR, the intended use case for this is when one wants to use 
> >> their system lld/mold with a separately packaged cross toolchain, without 
> >> requiring them to symlink their system lld/mold into the cross toolchain 
> >> bin directory.
> >> 
> >> (Note that the first search is against COMPILER_PATH while the latter is 
> >> against PATH).
> > 
> > Ah.  So what about instead adding here
> > 
> >   /* Search the ordinary system bin directories for mold/lld even in
> >  a cross configuration.  */
> >   if (ld_file_name == 0
> >   && selected_linker == ...)
> > ld_file_name = find_a_file (&path, ld_suffixes[selected_linker], X_OK);
> > 
> > instead?  That would keep things working in case the user has a
> > xyz-arch-mold in the system dir but uses GNU ld on the host
> > otherwise, lacking a 'mold' binary there?
> > 
> > That is, we'd only add, not change what we search for.
> 
> I considered that, but as described in the commit message, it doesn't seem 
> anyone has created stuff named xyz-arch-lld or xyz-arch-mold. Closest is 
> Gentoo's symlink mentioned in this thread, but that's xyz-arch-ld -> ld.lld/mold.
> As such, this feels like a quirk, not something we need to keep compatibility 
> for.

I don't have a good idea whether this is the case or not unfortunately
so if it's my call I would err on the safe side.

We seem to recognize mold and lld only since GCC 12 which both are
still maintained so I think we might want to do the change on all
those branches?

If you feel confident there's indeed no such installs then let's go
with your original patch.

Thus, OK for trunk and the affected branches after a while of no
reported issues.

Thanks,
Richard.

> The proposed change seems simple enough though, so if you consider this 
> a compatibility issue I can go for that way as well.

> Tatsuyuki.
> 
> > 
> > Thanks,
> > Richard.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] RISC-V: Use VLS modes if the NITERS is known and smaller than VLS mode elements.

2023-10-16 Thread Robin Dapp
Hi Juzhe,

this LGTM.  I was first concerned whether we would want to
stop e.g. at LMUL = 1 and only continue with a specific flag but
actually this should be done via the costs.  If an implementation
wants to penalize or incentivize some behavior it can always
adjust the costs which should be sufficient.

Regards
 Robin


Re: [PATCH] MATCH: Improve `A CMP 0 ? A : -A` set of patterns to use bitwise_equal_p.

2023-10-16 Thread Richard Biener
On Mon, Oct 16, 2023 at 12:00 AM Andrew Pinski  wrote:
>
> This improves the `A CMP 0 ? A : -A` set of match patterns to use
> bitwise_equal_p which allows an nop cast between signed and unsigned.
> This allows catching a few extra cases which were not being caught before.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK.

> gcc/ChangeLog:
>
> PR tree-optimization/101541
> * match.pd (A CMP 0 ? A : -A): Improve
> using bitwise_equal_p.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/101541
> * gcc.dg/tree-ssa/phi-opt-36.c: New test.
> * gcc.dg/tree-ssa/phi-opt-37.c: New test.
> ---
>  gcc/match.pd   | 49 -
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c | 51 ++
>  gcc/testsuite/gcc.dg/tree-ssa/phi-opt-37.c | 24 ++
>  3 files changed, 104 insertions(+), 20 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/phi-opt-37.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 45624f3dcb4..142e2dfbeb1 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5668,42 +5668,51 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   /* A == 0 ? A : -A    same as -A */
>   (for cmp (eq uneq)
>(simplify
> -   (cnd (cmp @0 zerop) @0 (negate@1 @0))
> -(if (!HONOR_SIGNED_ZEROS (type))
> +   (cnd (cmp @0 zerop) @2 (negate@1 @2))
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& bitwise_equal_p (@0, @2))
>   @1))
>(simplify
> -   (cnd (cmp @0 zerop) zerop (negate@1 @0))
> -(if (!HONOR_SIGNED_ZEROS (type))
> +   (cnd (cmp @0 zerop) zerop (negate@1 @2))
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& bitwise_equal_p (@0, @2))
>   @1))
>   )
>   /* A != 0 ? A : -A    same as A */
>   (for cmp (ne ltgt)
>(simplify
> -   (cnd (cmp @0 zerop) @0 (negate @0))
> -(if (!HONOR_SIGNED_ZEROS (type))
> - @0))
> +   (cnd (cmp @0 zerop) @1 (negate @1))
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& bitwise_equal_p (@0, @1))
> + @1))
>(simplify
> -   (cnd (cmp @0 zerop) @0 integer_zerop)
> -(if (!HONOR_SIGNED_ZEROS (type))
> - @0))
> +   (cnd (cmp @0 zerop) @1 integer_zerop)
> +(if (!HONOR_SIGNED_ZEROS (type)
> +&& bitwise_equal_p (@0, @1))
> + @1))
>   )
>   /* A >=/> 0 ? A : -A    same as abs (A) */
>   (for cmp (ge gt)
>(simplify
> -   (cnd (cmp @0 zerop) @0 (negate @0))
> -(if (!HONOR_SIGNED_ZEROS (type)
> -&& !TYPE_UNSIGNED (type))
> - (abs @0
> +   (cnd (cmp @0 zerop) @1 (negate @1))
> +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> +&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> +&& bitwise_equal_p (@0, @1))
> + (if (TYPE_UNSIGNED (type))
> +  (absu:type @0)
> +  (abs @0)
>   /* A <=/< 0 ? A : -A    same as -abs (A) */
>   (for cmp (le lt)
>(simplify
> -   (cnd (cmp @0 zerop) @0 (negate @0))
> -(if (!HONOR_SIGNED_ZEROS (type)
> -&& !TYPE_UNSIGNED (type))
> - (if (ANY_INTEGRAL_TYPE_P (type)
> - && !TYPE_OVERFLOW_WRAPS (type))
> +   (cnd (cmp @0 zerop) @1 (negate @1))
> +(if (!HONOR_SIGNED_ZEROS (TREE_TYPE(@0))
> +&& !TYPE_UNSIGNED (TREE_TYPE(@0))
> +&& bitwise_equal_p (@0, @1))
> + (if ((ANY_INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (@0)))
> + || TYPE_UNSIGNED (type))
>(with {
> -   tree utype = unsigned_type_for (type);
> +   tree utype = unsigned_type_for (TREE_TYPE(@0));
> }
> (convert (negate (absu:utype @0
> (negate (abs @0)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c
> new file mode 100644
> index 000..4baf9f82a22
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/phi-opt-36.c
> @@ -0,0 +1,51 @@
> +/* { dg-options "-O2 -fdump-tree-phiopt" } */
> +
> +unsigned f0(int A)
> +{
> +  unsigned t = A;
> +// A == 0? A : -A    same as -A
> +  if (A == 0)  return t;
> +  return -t;
> +}
> +
> +unsigned f1(int A)
> +{
> +  unsigned t = A;
> +// A != 0? A : -A    same as A
> +  if (A != 0)  return t;
> +  return -t;
> +}
> +unsigned f2(int A)
> +{
> +  unsigned t = A;
> +// A >= 0? A : -A    same as abs (A)
> +  if (A >= 0)  return t;
> +  return -t;
> +}
> +unsigned f3(int A)
> +{
> +  unsigned t = A;
> +// A > 0?  A : -A    same as abs (A)
> +  if (A > 0)  return t;
> +  return -t;
> +}
> +unsigned f4(int A)
> +{
> +  unsigned t = A;
> +// A <= 0? A : -A    same as -abs (A)
> +  if (A <= 0)  return t;
> +  return -t;
> +}
> +unsigned f5(int A)
> +{
> +  unsigned t = A;
> +// A < 0?  A : -A    same as -abs (A)
> +  if (A < 0)  return t;
> +  return -t;
> +}
> +
> +/* f4 and f5 are not allowed to be optimized in early phi-opt. */
> +/* { dg-final { scan-tree-dump-times "if " 2 "phiopt1" } } */
> +/* { dg-final { scan-tree-dump-not "if " "phiopt2" } } */
> +
> 
