Re: [Patch] gcc.c-torture/compile/103818.c: enable for llp64 too

2023-02-27 Thread Jonathan Yong via Gcc-patches

On 2/28/23 02:10, Hans-Peter Nilsson wrote:

On Sun, 26 Feb 2023, Jonathan Yong via Gcc-patches wrote:


Patch OK for the master branch? I did not see any obvious reason to exclude
LLP64 specifically.


I see "lp64 || lp64" in that patch (which should preferably have
been sent inline, as it's harder to quote an attached patch,
QED).  Sending the wrong version?  Don't forget to test it.

brgds, H-P


Corrected; the previous patch was manually applied from a corrupted patch file.
From 52b1209193260a624f90c3ca759a83b975c2e8e0 Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Sun, 26 Feb 2023 06:34:04 +
Subject: [PATCH 4/7] gcc.c-torture/compile/103818.c: enable for llp64 too

gcc/testsuite/ChangeLog:

	* gcc.c-torture/compile/103818.c: Enable test for llp64.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 gcc/testsuite/gcc.c-torture/compile/103818.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.c-torture/compile/103818.c b/gcc/testsuite/gcc.c-torture/compile/103818.c
index e6cbe7860cf..57f56b6c09d 100644
--- a/gcc/testsuite/gcc.c-torture/compile/103818.c
+++ b/gcc/testsuite/gcc.c-torture/compile/103818.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target lp64 } } */
+/* { dg-do compile { target { lp64 || llp64 } } } */
 struct A { int b[1]; };
 
 void
-- 
2.39.2



Ping: [PATCH v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

2023-02-27 Thread Xionghu Luo via Gcc-patches

Hi Segher, Ping this for stage 4...


On 2023/2/10 10:59, Xionghu Luo via Gcc-patches wrote:

Resend this patch...

v4: Update per comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb.  Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern into BE and LE versions with the same RTL but
different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
as it is register-based and endian-independent.  So both BE and LE need to
generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
   (subreg:V4SI (reg:V16QI 139) 0)
   (subreg:V4SI (reg:V16QI 140) 0))
   [const_int 0 4 1 5]))

Then combine pass could do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The resulting ASM is also better, since the nested vec_select is
simplified to a simple scalar load.

Regression tested with no failures on Power8{LE,BE}{32,64} and
Power{9,10}LE{32,64} Linux.

gcc/ChangeLog:

PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
(altivec_vmrghb_direct_be): New pattern for BE.
(altivec_vmrghb_direct_le): New pattern for LE.
(altivec_vmrghh_direct): Remove.
(altivec_vmrghh_direct_be): New pattern for BE.
(altivec_vmrghh_direct_le): New pattern for LE.
(altivec_vmrghw_direct_): Remove.
(altivec_vmrghw_direct__be): New pattern for BE.
(altivec_vmrghw_direct__le): New pattern for LE.
(altivec_vmrglb_direct): Remove.
(altivec_vmrglb_direct_be): New pattern for BE.
(altivec_vmrglb_direct_le): New pattern for LE.
(altivec_vmrglh_direct): Remove.
(altivec_vmrglh_direct_be): New pattern for BE.
(altivec_vmrglh_direct_le): New pattern for LE.
(altivec_vmrglw_direct_): Remove.
(altivec_vmrglw_direct__be): New pattern for BE.
(altivec_vmrglw_direct__le): New pattern for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo 
---
 gcc/config/rs6000/altivec.md                | 222 ++--
 gcc/config/rs6000/rs6000.cc                 |  24 +--
 gcc/config/rs6000/vsx.md                    |  28 +--
 gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++
  4 files changed, 307 insertions(+), 85 deletions(-)
  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 30606b8ab21..4bfeecec224 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
 (use (match_operand:V16QI 2 "register_operand"))]
"TARGET_ALTIVEC"
  {
-  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
-   : gen_altivec_vmrglb_direct;
-  if (!BYTES_BIG_ENDIAN)
-std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  if (BYTES_BIG_ENDIAN)
+emit_insn (
+  gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+  else
+emit_insn (
+  gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
DONE;
  })
  
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
[(set (match_operand:V16QI 0 "register_operand" "=v")
(vec_select:V16QI
  (vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
 (const_int 5) (const_int 21)
 (const_int 6) (const_int 22)
 (const_int 7) (const_int 23)])))]
-  "TARGET_ALTIVEC"
+  "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+  "vmrghb %0,%1,%2"
+  [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+  [(set (match_operand:V16QI 0 "register_operand" "=v")
+   (vec_select:V16QI
+ (vec_concat:V32QI
+   (match_operand:V16QI 2 "register_operand" "v")
+   (match_operand:V16QI 1 "register_operand" "v"))
+ (parallel [(const_int  8) (const_int 24)
+(const_int  9) (const_int 25)
+(const_int 10) (const_int 26)
+(const_int 11) (const_int 27)
+(const_int 12) (const_int 28)
+(const_int 13) 

Re: [PATCH] RISC-V: Fix wrong partial subreg check for bsetidisi

2023-02-27 Thread Sinan Lin via Gcc-patches
I encountered a miscompilation case with zbs, where a bseti without sign
extension emitted from the bsetidisi pattern leads to wrong output.

Take pr68648.c as an example: -march=rv64gc_zba_zbb_zbs -O3 did not generate
sext.w in int bar (void) and led to a wrong value in a0.  It seems that the
partial subreg check is wrongly applied to the immediate operand.

int foo (void):
	li      a0,123
	ret
int bar (void):
	addi    sp,sp,-16
	sd      ra,8(sp)
	call    foo             # a0 123
	li      a5,248639488    # a0 123, a5 0xed1f000
	addi    a5,a5,11        # a0 123, a5 0xed1f00b
	slli    a5,a5,14        # a0 123, a5 0x3b47c02c000
	addi    a5,a5,-8        # a0 123, a5 0x3b47c02bff8
	ld      ra,8(sp)
	or      a0,a0,a5        # a0 0x3b47c02bffb, a5 0x3b47c02bff8
	bseti   a5,zero,32      # a0 0x3b47c02bffb, a5 0x1
	addi    a5,a5,-1        # a0 0x3b47c02bffb, a5 0x0
	xor     a0,a0,a5        # a0 0x3b483fd4004, a5 0x0
	bseti   a0,a0,0         # a0 0x3b483fd4005, a5 0x0
	addi    sp,sp,16        # sext.w a0,a0 is missing
	jr      ra
main:
	addi    sp,sp,-16
	sd      ra,8(sp)
	call    bar
	li      a5,-2080555008
	addi    a5,a5,5
	bne     a0,a5,.L8       # a0 0x3b483fd4005, a5 0x83fd4005
	ld      ra,8(sp)
	li      a0,0
	addi    sp,sp,16
	jr      ra
.L8:
	call    abort

Lin Sinan wrote on Tuesday, 28 Feb 2023 at 13:00:

> From: Lin Sinan 
>
> The partial subreg check should be for the subreg operand (operand 1) instead
> of the immediate operand (operand 2).  This change also fixes pr68648.c with
> zbs.
>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md: Fix wrong index in the check.
>
> ---
>  gcc/config/riscv/bitmanip.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index 14d18edbe62..58a86bd929f 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -442,7 +442,7 @@
> (ior:DI (sign_extend:DI (match_operand:SI 1 "register_operand"
> "r"))
> (match_operand 2 "single_bit_mask_operand" "i")))]
>"TARGET_ZBS && TARGET_64BIT
> -   && !partial_subreg_p (operands[2])"
> +   && !partial_subreg_p (operands[1])"
>"bseti\t%0,%1,%S2"
>[(set_attr "type" "bitmanip")])
>
> --
> 2.34.1
>
>


[PATCH] RISC-V: Fix wrong partial subreg check for bsetidisi

2023-02-27 Thread Lin Sinan via Gcc-patches
From: Lin Sinan 

The partial subreg check should be for the subreg operand (operand 1) instead
of the immediate operand (operand 2).  This change also fixes pr68648.c with
zbs.

gcc/ChangeLog:

* config/riscv/bitmanip.md: Fix wrong index in the check.

---
 gcc/config/riscv/bitmanip.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 14d18edbe62..58a86bd929f 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -442,7 +442,7 @@
(ior:DI (sign_extend:DI (match_operand:SI 1 "register_operand" "r"))
(match_operand 2 "single_bit_mask_operand" "i")))]
   "TARGET_ZBS && TARGET_64BIT
-   && !partial_subreg_p (operands[2])"
+   && !partial_subreg_p (operands[1])"
   "bseti\t%0,%1,%S2"
   [(set_attr "type" "bitmanip")])
 
-- 
2.34.1



[PATCH] RISC-V: Allow const0_rtx operand in max/min

2023-02-27 Thread Sinan via Gcc-patches
From 73e743348a49a7fffcf2e328b8179e8dbbc3b2b4 Mon Sep 17 00:00:00 2001
From: Lin Sinan 
Date: Tue, 28 Feb 2023 00:44:55 +0800
Subject: [PATCH] RISC-V: Allow const0_rtx operand in max/min
Optimize cases that use max[u]/min[u] against a zero constant.
E.g., for int f(int x) { return x >= 0 ? x : 0; }
the current asm output with rv64gc_zba_zbb
 li rtmp,0
 max a0,a0,rtmp
could be optimized into
 max a0,a0,zero
gcc/ChangeLog:
 * config/riscv/bitmanip.md: Allow a zero constant in the max/min
 pattern.
gcc/testsuite/ChangeLog:
 * gcc.target/riscv/zbb-min-max-03.c: New test.
---
 gcc/config/riscv/bitmanip.md | 4 ++--
 gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c | 10 ++
 2 files changed, 12 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 58a86bd929f..f771835369c 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -363,9 +363,9 @@
 (define_insn "3"
 [(set (match_operand:X 0 "register_operand" "=r")
 (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
- (match_operand:X 2 "register_operand" "r")))]
+ (match_operand:X 2 "reg_or_0_operand" "rJ")))]
 "TARGET_ZBB"
- "\t%0,%1,%2"
+ "\t%0,%1,%z2"
 [(set_attr "type" "bitmanip")])
 ;; Optimize the common case of a SImode min/max against a constant
diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c b/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
new file mode 100644
index 000..947300d599d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-03.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64d" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+int f(int x) {
+ return x >= 0 ? x : 0;
+}
+
+/* { dg-final { scan-assembler-times "max\t" 1 } } */
+/* { dg-final { scan-assembler-not "li\t" } } */
-- 
2.34.1


[PATCH] MIPS: Add buildtime option to set msa default

2023-02-27 Thread Junxian Zhu
From: Junxian Zhu 

Add a build-time option to decide whether the compiler will be built with
the `-mmsa` option enabled by default.
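For illustration, a configure invocation using the new switch might look like this (the target triplet, prefix, and any other flags here are hypothetical, not taken from the patch):

```shell
# Configure a MIPS cross compiler that defaults to -mmsa.
../gcc/configure --target=mips64el-linux-gnu --prefix=/opt/mips --with-msa

# The default remains -mno-msa; it can also be requested explicitly.
../gcc/configure --target=mips64el-linux-gnu --prefix=/opt/mips --without-msa
```

Either way, users can still override the configured default on the command line with -mmsa or -mno-msa.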

gcc/ChangeLog:
* config.gcc: Add --with-{no-}msa build option.
* config/mips/mips.h: Likewise.
* doc/install.texi: Likewise.

Signed-off-by: Junxian Zhu 
---
 gcc/config.gcc | 19 +--
 gcc/config/mips/mips.h |  3 ++-
 gcc/doc/install.texi   |  8 
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c070e6ecd2e..da3a6d3ba1f 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4709,7 +4709,7 @@ case "${target}" in
;;
 
mips*-*-*)
-	supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 madd4 compact-branches"
+	supported_defaults="abi arch arch_32 arch_64 float fpu nan fp_32 odd_spreg_32 tune tune_32 tune_64 divide llsc mips-plt synci lxc1-sxc1 madd4 compact-branches msa"
 
case ${with_float} in
"" | soft | hard)
@@ -4871,6 +4871,21 @@ case "${target}" in
exit 1
;;
esac
+
+   case ${with_msa} in
+   yes)
+   with_msa=msa
+   ;;
+   no)
+   with_msa=no-msa
+   ;;
+   "")
+   ;;
+   *)
+   echo "Unknown msa type used in --with-msa" 1>&2
+   exit 1
+   ;;
+   esac
;;
 
loongarch*-*-*)
@@ -5815,7 +5830,7 @@ case ${target} in
 esac
 
 t=
-all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls lxc1-sxc1 madd4 isa_spec compact-branches"
+all_defaults="abi cpu cpu_32 cpu_64 arch arch_32 arch_64 tune tune_32 tune_64 schedule float mode fpu nan fp_32 odd_spreg_32 divide llsc mips-plt synci tls lxc1-sxc1 madd4 isa_spec compact-branches msa"
 for option in $all_defaults
 do
eval "val=\$with_"`echo $option | sed s/-/_/g`
diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h
index fbb4372864f..13bc193b752 100644
--- a/gcc/config/mips/mips.h
+++ b/gcc/config/mips/mips.h
@@ -916,7 +916,8 @@ struct mips_cpu_info {
   {"synci", "%{!msynci:%{!mno-synci:-m%(VALUE)}}" },   \
   {"lxc1-sxc1", "%{!mlxc1-sxc1:%{!mno-lxc1-sxc1:-m%(VALUE)}}" }, \
   {"madd4", "%{!mmadd4:%{!mno-madd4:-m%(VALUE)}}" }, \
-  {"compact-branches", "%{!mcompact-branches=*:-mcompact-branches=%(VALUE)}" } \
+  {"compact-branches", "%{!mcompact-branches=*:-mcompact-branches=%(VALUE)}" }, \
+  {"msa", "%{!mmsa:%{!mno-msa:-m%(VALUE)}}" } \
 
 /* A spec that infers the:
-mnan=2008 setting from a -mips argument,
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 8ef5c1414da..718f48fbaeb 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1653,6 +1653,14 @@ unfused is normally expected).  Disabling these 
instructions is the
 only way to ensure compatible code is generated; this will incur
 a performance penalty.
 
+@item --with-msa
+On MIPS targets, make @option{-mmsa} the default when no
+@option{-mno-msa} option is passed.
+
+@item --without-msa
+On MIPS targets, make @option{-mno-msa} the default when no
+@option{-mmsa} option is passed. This is the default.
+
 @item --with-mips-plt
 On MIPS targets, make use of copy relocations and PLTs.
 These features are extensions to the traditional
-- 
2.39.2


Re: [Patch] gcc.dg/overflow-warn-9.c: exclude from LLP64

2023-02-27 Thread Hans-Peter Nilsson


On Mon, 27 Feb 2023, Jonathan Yong via Gcc-patches wrote:

> This test is for LP64 only; exclude LLP64 too.
> Patch OK?

I may be confused, but you're not making use of the "llp64"
effective target; you're instead excluding/including lp64 /
ilp32 in sets that do not obviously mean "exclude LLP64".

To wit, how is "! ilp32" -> "lp64" and "ilp32" -> "! lp64"
expressing "! llp64"?
brgds, H-P


[PATCHv2, rs6000] Merge two vector shift when their sources are the same

2023-02-27 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch merges two "vsldoi" insns when their sources are the
same.  In particular, the pair is simplified to a single move if the
total shift is a multiple of 16 bytes.

  Bootstrapped and tested on powerpc64-linux BE and LE with no
regressions.

Thanks
Gui Haochen


ChangeLog
2023-02-28  Haochen Gui 

gcc/
* config/rs6000/altivec.md (*altivec_vsldoi_dup_): New
insn_and_split to merge two vsldoi when the sources are the same.

gcc/testsuite/
* gcc.target/powerpc/vsldoi_merge.c: New.



patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 84660073f32..fae8ec2b2e8 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2529,6 +2529,35 @@ (define_insn "altivec_vsldoi_"
   "vsldoi %0,%1,%2,%3"
   [(set_attr "type" "vecperm")])

+(define_insn_and_split "*altivec_vsldoi_dup_"
+  [(set (match_operand:VM 0 "register_operand" "=v")
+   (unspec:VM [(unspec:VM [(match_operand:VM 1 "register_operand" "v")
+   (match_dup 1)
+   (match_operand:QI 2 "immediate_operand" "i")]
+  UNSPEC_VSLDOI)
+   (unspec:VM [(match_dup 1)
+   (match_dup 1)
+   (match_dup 2)]
+  UNSPEC_VSLDOI)
+   (match_operand:QI 3 "immediate_operand" "i")]
+  UNSPEC_VSLDOI))]
+  "TARGET_ALTIVEC"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  unsigned int shift1 = UINTVAL (operands[2]);
+  unsigned int shift2 = UINTVAL (operands[3]);
+
+  unsigned int shift = (shift1 + shift2) % 16;
+  if (shift)
+emit_insn (gen_altivec_vsldoi_ (operands[0], operands[1],
+ operands[1], GEN_INT (shift)));
+  else
+emit_move_insn (operands[0], operands[1]);
+  DONE;
+})
+
 (define_insn "altivec_vupkhs"
   [(set (match_operand:VP 0 "register_operand" "=v")
(unspec:VP [(match_operand: 1 "register_operand" "v")]
diff --git a/gcc/testsuite/gcc.target/powerpc/vsldoi_merge.c b/gcc/testsuite/gcc.target/powerpc/vsldoi_merge.c
new file mode 100644
index 000..eebd7b4d382
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vsldoi_merge.c
@@ -0,0 +1,59 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-options "-O2 -mvsx -save-temps" } */
+
+#include "altivec.h"
+
+#ifdef DEBUG
+#include 
+#endif
+
+void abort (void);
+
+__attribute__ ((noipa)) vector signed int
+test1 (vector signed int a)
+{
+  a = vec_sld (a, a, 2);
+  a = vec_sld (a, a, 6);
+  return a;
+}
+
+__attribute__ ((noipa)) vector signed int
+test2 (vector signed int a)
+{
+  a = vec_sld (a, a, 14);
+  a = vec_sld (a, a, 2);
+  return a;
+}
+
+int main (void)
+{
+  vector signed int a = {1,2,3,4};
+  vector signed int result_a;
+  int i;
+
+  result_a = test1 (a);
+  vector signed int expect_a = {3,4,1,2};
+
+  for (i = 0; i< 4; i++)
+if (result_a[i] != expect_a[i])
+#ifdef DEBUG
+  printf("ERROR: test1 result[%d] = %d, not expected[%d] = %d\n",
+  i, result_a[i], i, expect_a[i]);
+#else
+  abort ();
+#endif
+
+  result_a = test2 (a);
+
+  for (i = 0; i< 4; i++)
+if (result_a[i] != a[i])
+#ifdef DEBUG
+  printf("ERROR: test2 result[%d] = %d, not expected[%d] = %d\n",
+  i, result_a[i], i, a[i]);
+#else
+  abort ();
+#endif
+}
+
+/* { dg-final { scan-assembler-times {\mvsldoi\M} 1 } } */


RE: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-02-27 Thread Li, Pan2 via Gcc-patches
Hi Richard Sandiford,

After some investigation, I am not sure it is possible to make it general
without any changes to exact_div.  We can add one method like the below to get
the unit poly for all possible N.

template
inline POLY_CONST_RESULT (N, Ca, Ca)
normalize_to_unit (const poly_int_pod )
{
  typedef POLY_CONST_COEFF (Ca, Ca) C;

  poly_int normalized = a;

  if (normalized.is_constant())
normalized.coeffs[0] = 1;
  else
for (unsigned int i = 0; i < N; i++)
  POLY_SET_COEFF (C, normalized, i, 1);

  return normalized;
}

And then adjust genmodes.cc like below to consume the unit poly.

  printf ("poly_uint16 unit_poly = "
 "normalize_to_unit (mode_precision[E_%smode]);\n", m->name);
  printf ("if (known_lt (mode_precision[E_%smode], "
 "unit_poly * BITS_PER_UNIT))\n", m->name);
  printf ("  mode_size[E_%smode] = unit_poly;\n", m->name);

I am not sure it is a good idea to introduce the above normalize code into
exact_div, given that the comment on exact_div indicates: "/* Return A / B,
given that A is known to be a multiple of B.  */".

Could you please share your opinion on this from an expert's
perspective?  Thank you!

Pan

From: 盼 李
Sent: Monday, February 27, 2023 11:13 PM
To: Richard Sandiford; incarnation.p.lee--- via Gcc-patches
Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; Li, Pan2
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

Never mind, I hope you had a good holiday.

Thanks for pointing this out; the if part cannot take care of poly_int with
N > 2.  As I understand it, we need to make it general for all N of poly_int.

Thus I would like to double-check with you about how to make it general.  I
suppose there will be a new function can_div_away_from_zero_p to replace the
if (known_lt (,)) part in genmodes.cc, and exact_div will be left unchanged
(considering the word "exact", I suppose we should not touch it), right?
Then we still need one poly_int with all 1 for N as the return value if
can_div_away_from_zero_p is true.

Thanks again for your suggestion, and have a nice day!

Pan

From: Richard Sandiford <richard.sandif...@arm.com>
Sent: Monday, February 27, 2023 22:24
To: incarnation.p.lee--- via Gcc-patches <gcc-patches@gcc.gnu.org>
Cc: incarnation.p@outlook.com; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; pan2...@intel.com
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

Sorry for the slow reply, been away for a couple of weeks.

"incarnation.p.lee--- via Gcc-patches" <gcc-patches@gcc.gnu.org> writes:
> From: Pan Li <pan2...@intel.com>
>
>Fix the bug of the rvv bool mode precision with the adjustment.
>The bit size of vbool*_t will be adjusted to
>[1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa.  The
>adjusted mode precision of vbool*_t will help underlying passes to
>make the right decision for both correctness and optimization.
>
>Given below sample code:
>void test_1(int8_t * restrict in, int8_t * restrict out)
>{
>  vbool8_t v2 = *(vbool8_t*)in;
>  vbool16_t v5 = *(vbool16_t*)in;
>  *(vbool16_t*)(out + 200) = v5;
>  *(vbool8_t*)(out + 100) = v2;
>}
>
>Before the precision adjustment:
>addi    a4,a1,100
>vsetvli a5,zero,e8,m1,ta,ma
>addi    a1,a1,200
>vlm.v   v24,0(a0)
>vsm.v   v24,0(a4)
>// Need one vsetvli and vlm.v for correctness here.
>vsm.v   v24,0(a1)
>
>After the precision adjustment:
>csrr    t0,vlenb
>slli    t1,t0,1
>csrr    a3,vlenb
>sub     sp,sp,t1
>slli    a4,a3,1
>add     a4,a4,sp
>sub     a3,a4,a3
>vsetvli a5,zero,e8,m1,ta,ma
>addi    a2,a1,200
>vlm.v   v24,0(a0)
>vsm.v   v24,0(a3)
>addi    a1,a1,100
>vsetvli a4,zero,e8,mf2,ta,ma
>csrr    t0,vlenb
>vlm.v   v25,0(a3)
>vsm.v   v25,0(a2)
>slli    t1,t0,1
>vsetvli a5,zero,e8,m1,ta,ma
>vsm.v   v24,0(a1)
>add     sp,sp,t1
>jr      ra
>
>However, there may be some optimization opportunities after
>the mode precision adjustment.  They can be taken care of in
>the RISC-V backend in separate underlying PR(s).
>
>PR 108185
>PR 108654
>
> gcc/ChangeLog:
>
>* config/riscv/riscv-modes.def (ADJUST_PRECISION):
>* config/riscv/riscv.cc (riscv_v_adjust_precision):
>* 

Re: [patch, libgfortran] Initialize some variables to get rid of nuisance warnings.

2023-02-27 Thread Jerry D via Gcc-patches

Pushed, thanks for feedback

On 2/26/23 11:54 PM, Tobias Burnus wrote:

Just side remarks, the 0 init in the patch is fine.

On 27.02.23 03:53, Jerry D via Gcc-patches wrote:


regarding PACK: since this is a bogus warning as the compiler does
not realize that dim >= 1, wouldn't a

gcc_assert (dim >= 1);


Note: gcc_assert only exists in the compiler itself; in libgfortran, we
use GFC_ASSERT or directly 'assert'.

You could also use 'if (dim < 1) __builtin_unreachable();' – or since
GCC 13:

__attribute__((assume (dim >= 1)));

Tobias

PS: In Fortran, '-fopenmp-simd' plus '!$omp assume holds(dim>=0) ...
!$omp end assume' (or !$omp ... + block/end block) can be used to denote
such assumptions. '-fopenmp-simd' enables only those bits of OpenMP that
do not require any library support (no libgomp, no pthreads), contrary
to '-fopenmp'.

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 
80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: 
Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; 
Registergericht München, HRB 106955




Re: [Patch] gcc.c-torture/compile/103818.c: enable for llp64 too

2023-02-27 Thread Hans-Peter Nilsson
On Sun, 26 Feb 2023, Jonathan Yong via Gcc-patches wrote:

> Patch OK for the master branch? I did not see any obvious reason to exclude
> LLP64 specifically.

I see "lp64 || lp64" in that patch (which should preferably have 
been sent inline, as it's harder to quote an attached patch, 
QED).  Sending the wrong version?  Don't forget to test it.

brgds, H-P


Ping: [PATCH] testsuite: Tweak gcc.dg/attr-aligned.c for CRIS

2023-02-27 Thread Hans-Peter Nilsson via Gcc-patches
Ping...

> From: Hans-Peter Nilsson 
> Date: Thu, 16 Feb 2023 21:05:29 +0100

> Asking for the lines outside the "#if __CRIS__" part.
> Ok to commit?
> 
> -- >8 --
> tm.texi says for BIGGEST_ALIGNMENT (from which
> __BIGGEST_ALIGNMENT__ is derived): "Biggest alignment that
> any data type can require on this machine, in bits."
> 
> That is, using that value might be too strict for alignment
> of *functions* and CRIS requires at least 16-bit alignment
> for functions.  But, one purpose of the test is to test that
> alignment can be set to a large but valid value, so pick
> 512, which has some use as a historically required alignment
> for certain I/O descriptors.
> 
>   * gcc.dg/attr-aligned.c: Adjust comment for ALIGN_MAX_STATIC.
>   (ALIGN_MAX_STATIC): Set to 512 for CRIS.
> ---
>  gcc/testsuite/gcc.dg/attr-aligned.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/attr-aligned.c b/gcc/testsuite/gcc.dg/attr-aligned.c
> index 887bdd0f3799..4f0c885dc812 100644
> --- a/gcc/testsuite/gcc.dg/attr-aligned.c
> +++ b/gcc/testsuite/gcc.dg/attr-aligned.c
> @@ -18,6 +18,10 @@
>  # else
>  #   define ALIGN_MAX_STATIC  ALIGN_MAX_HARD
>  # endif
> +#elif __CRIS__
> +/* __BIGGEST_ALIGNMENT__ doesn't cover functions (16 bits for CRIS). */
> +#  define ALIGN_MAX_STATIC  512
> +#  define ALIGN_TOO_BIG_OFILE   (ALIGN_MAX_HARD << 1)
>  #elif pdp11
>  #  define ALIGN_MAX_STATIC  2
>  /* Work around a pdp11 ICE (see PR target/87821).  */
> @@ -29,7 +33,9 @@
>  /* Is this processor- or operating-system specific?  */
>  #  define ALIGN_MAX_STATIC  ALIGN_MAX_HARD
>  #else
> -   /* Guaranteed to be accepted regardless of the target.  */
> +   /* Guaranteed to be accepted regardless of the target for objects.
> +  This might not be true for alignment of functions though, so
> +  may need to be set to a target-specific value above.  */
>  #  define ALIGN_MAX_STATIC  __BIGGEST_ALIGNMENT__
> /* Guaranteed to be rejected regardless of the target.  */
>  #  define ALIGN_TOO_BIG_OFILE   (ALIGN_MAX_HARD << 1)
> -- 
> 2.30.2
> 


[COMMITTED] testsuite: No xfail infoleak-vfio_iommu_type1.c bogus for default_packed

2023-02-27 Thread Hans-Peter Nilsson via Gcc-patches
Committed as obvious after sanity-checking cris-elf and
native x86_64-linux.
-- >8 --
There are no messages about padding for targets that don't
pad, i.e. default_packed.  Noticed for cris-elf, verified
for pru-elf at gcc-testresults@.

testsuite:
* gcc.dg/plugin/infoleak-vfio_iommu_type1.c: Don't xfail bogus
message for "default_packed" targets.
---
 gcc/testsuite/gcc.dg/plugin/infoleak-vfio_iommu_type1.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/infoleak-vfio_iommu_type1.c b/gcc/testsuite/gcc.dg/plugin/infoleak-vfio_iommu_type1.c
index 51ad5db2bab2..af320b6b17ee 100644
--- a/gcc/testsuite/gcc.dg/plugin/infoleak-vfio_iommu_type1.c
+++ b/gcc/testsuite/gcc.dg/plugin/infoleak-vfio_iommu_type1.c
@@ -37,8 +37,8 @@ int vfio_iommu_type1_get_info(unsigned long arg)
 info.cap_offset = 0;
   }
 
-  /* The padding bytes (20-23) are uninitialized, but can't be written
- back, since minsz is either 16 or 20.  */
-  return copy_to_user((void *)arg, , minsz) ? -14 : 0; /* { dg-bogus "exposure" "" { xfail *-*-* } } */
+  /* The padding bytes (20-23, but applicable just for targets with padding) are
+ uninitialized, but can't be written back, since minsz is either 16 or 20.  */
+  return copy_to_user((void *)arg, , minsz) ? -14 : 0; /* { dg-bogus "exposure" "" { xfail { ! default_packed } } } */
   // TODO: false +ve due to not handling minsz being either 16 or 20
 }
-- 
2.30.2



[COMMITTED] testsuite: Shorten multiline pattern message to the same for fail and pass

2023-02-27 Thread Hans-Peter Nilsson via Gcc-patches
As recommended by testsuite maintainer: Regression analysis
works only if the string is the same.

testsuite:
* lib/multiline.exp (handle-multiline-outputs): Shorten
message to the same for fail and pass.
---
 gcc/testsuite/lib/multiline.exp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/lib/multiline.exp b/gcc/testsuite/lib/multiline.exp
index 5eccf2bbebc1..cfd928f6e28a 100644
--- a/gcc/testsuite/lib/multiline.exp
+++ b/gcc/testsuite/lib/multiline.exp
@@ -169,9 +169,9 @@ proc handle-multiline-outputs { text } {
# Use "regsub" to attempt to prune the pattern from $text
if {[regsub -line $rexp $text "" text]} {
# The multiline pattern was pruned.
-   ${maybe_x}pass "$title was found"
+   ${maybe_x}pass "$title"
} else {
-   ${maybe_x}fail "$title not found"
+   ${maybe_x}fail "$title"
}
 
set index [expr $index + 1]
-- 
2.30.2



[COMMITTED] testsuite: Remove xfail gcc.dg/tree-ssa/pr91091-2.c RHS ! natural_alignment_32

2023-02-27 Thread Hans-Peter Nilsson via Gcc-patches
Committed as obvious.
-- >8 --
Reacting to a long-standing XPASS for CRIS.  This one is
slightly brown-paper-bag level; it was never the xfailed scan
removed here that failed, and I didn't notice the XPASS when
reporting success on the commit as a whole.  It's not logical to
re-read what was just written even with overlap issues, and I'm
sure that edit was originally a copy-pasto.  I checked
historical m68k-linux and pru-elf test results too, to verify
that I got that part right.

PR testsuite/91419
* gcc.dg/tree-ssa/pr91091-2.c:15 Remove xfail for RHS.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr91091-2.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr91091-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr91091-2.c
index ecc50d355a7c..792541504903 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr91091-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr91091-2.c
@@ -12,4 +12,4 @@ void swap(struct s* p, struct t* q)
 
 /* The second statement is redundant.  */
 /* { dg-final { scan-tree-dump-times "x = " 1 "fre1" { xfail { ! natural_alignment_32 } } } } */
-/* { dg-final { scan-tree-dump-times " = \[^;\]*x;" 1 "fre1" { xfail { ! natural_alignment_32 } } } } */
+/* { dg-final { scan-tree-dump-times " = \[^;\]*x;" 1 "fre1" } } */
-- 
2.30.2



[COMMITTED] testsuite: Add CRIS to targets not xfailing gcc.dg/attr-alloc_size-11.c:50, 51

2023-02-27 Thread Hans-Peter Nilsson via Gcc-patches
Reacting to a long-standing XPASS for CRIS.  Maybe better to do
as https://gcc.gnu.org/PR79356#c11 suggests: xfail it for
x86 only ...except I see m68k also does not xpass.

testsuite:
PR testsuite/79356
* gcc.dg/attr-alloc_size-11.c: Add CRIS to the list
of targets excluding xfail on lines 50 and 51.
---
 gcc/testsuite/gcc.dg/attr-alloc_size-11.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
index 8332b39930c3..a2efe1289151 100644
--- a/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
+++ b/gcc/testsuite/gcc.dg/attr-alloc_size-11.c
@@ -47,8 +47,8 @@ typedef __SIZE_TYPE__ size_t;
 
 /* The following tests fail because of missing range information.  The xfail
exclusions are PR79356.  */
-TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for signed char" { xfail { ! { aarch64*-*-* arm*-*-* avr-*-* alpha*-*-* ia64-*-* mips*-*-* or1k*-*-* pdp11*-*-* powerpc*-*-* sparc*-*-* s390*-*-* visium-*-* msp430-*-* nvptx*-*-*} } } } */
-TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for short" { xfail { ! { aarch64*-*-* arm*-*-* alpha*-*-* avr-*-* ia64-*-* mips*-*-* or1k*-*-* pdp11*-*-* powerpc*-*-* sparc*-*-* s390x-*-* visium-*-* msp430-*-* nvptx*-*-* } } } } */
+TEST (signed char, SCHAR_MIN + 2, ALLOC_MAX);   /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for signed char" { xfail { ! { aarch64*-*-* arm*-*-* avr-*-* alpha*-*-* cris-*-* ia64-*-* mips*-*-* or1k*-*-* pdp11*-*-* powerpc*-*-* sparc*-*-* s390*-*-* visium-*-* msp430-*-* nvptx*-*-*} } } } */
+TEST (short, SHRT_MIN + 2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" "missing range info for short" { xfail { ! { aarch64*-*-* arm*-*-* alpha*-*-* avr-*-* cris-*-* ia64-*-* mips*-*-* or1k*-*-* pdp11*-*-* powerpc*-*-* sparc*-*-* s390x-*-* visium-*-* msp430-*-* nvptx*-*-* } } } } */
 TEST (int, INT_MIN + 2, ALLOC_MAX);/* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
 TEST (int, -3, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
 TEST (int, -2, ALLOC_MAX); /* { dg-warning "argument 1 range \\\[13, \[0-9\]+\\\] exceeds maximum object size 12" } */
-- 
2.30.2



Re: [PATCH] c++: unevaluated array new-expr size constantness [PR108219]

2023-02-27 Thread Jason Merrill via Gcc-patches

On 2/22/23 14:45, Patrick Palka wrote:

Here we're mishandling the unevaluated array new-expressions due to a
supposed non-constant array size ever since r12-5253-g4df7f8c79835d569
made us no longer perform constant evaluation of non-manifestly-constant
expressions within unevaluated contexts.  This shouldn't make a
difference here since the array sizes are constant literals, except
they're actually NON_LVALUE_EXPR location wrappers wrapping INTEGER_CST,
wrappers which used to get stripped as part of constant evaluation and
now no longer do.  Moreover it means build_vec_init can't constant fold
the 'maxindex' passed from build_new_1 (since it uses maybe_constant_value
with mce_unknown).
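For illustration, a self-contained sketch of the kind of unevaluated array new-expression at issue (a hypothetical reduction, not the PR testcase):

```cpp
#include <cassert>

// Sketch: the operand of decltype is an unevaluated context, yet the
// front end must still treat the literal array bound 2 as a constant
// (internally it may be an INTEGER_CST behind a location wrapper).
template <class T>
auto make_two() -> decltype(new T[2]) { return new T[2]; }
```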


Hmm, now that you mention it I think the

  if (manifestly_const_eval != mce_unknown)

change in maybe_constant_value isn't quite right, we don't want to force 
evaluation in unevaluated mce_false context either.



This patch fixes the first issue by making maybe_constant_value and
fold_non_dependent_expr_template shortcut handling location wrappers
around constant nodes, and the second issue by using fold_build2_loc
instead of cp_build_binary_op when computing the maxindex to pass to
build_vec_init.


Maybe in unevaluated mce_unknown/false context maybe_constant_value 
should call fold?



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk/12?

PR c++/108219
PR c++/108218

gcc/cp/ChangeLog:

* constexpr.cc (maybe_constant_value): Extend the constant node
shortcut to look through location wrappers too.
(fold_non_dependent_expr_template): Mirror the constant node
shortcut from maybe_constant_value.
* init.cc (build_new_1): Use fold_build2_loc instead
of cp_build_binary_op to build a MINUS_EXPR representing the
maximum index.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/new6.C: New test.
* g++.dg/cpp2a/concepts-new1.C: New test.
---
  gcc/cp/constexpr.cc|  8 ++--
  gcc/cp/init.cc | 18 --
  gcc/testsuite/g++.dg/cpp0x/new6.C  |  9 +
  gcc/testsuite/g++.dg/cpp2a/concepts-new1.C | 13 +
  4 files changed, 36 insertions(+), 12 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/new6.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-new1.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index aa2c14355f8..d38c4c80415 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -8538,9 +8538,9 @@ maybe_constant_value (tree t, tree decl /* = NULL_TREE */,
t = mark_non_constant (t);
return t;
  }
-  else if (CONSTANT_CLASS_P (t))
+  else if (CONSTANT_CLASS_OR_WRAPPER_P (t))
  /* No caching or evaluation needed.  */
-return t;
+return tree_strip_any_location_wrapper (t);
  
if (manifestly_const_eval != mce_unknown)

  return cxx_eval_outermost_constant_expr (t, true, true,
@@ -8631,6 +8631,10 @@ fold_non_dependent_expr_template (tree t, tsubst_flags_t 
complain,
  return t;
}
  
+  if (CONSTANT_CLASS_OR_WRAPPER_P (t))

+   /* No evaluation needed.  */
+   return tree_strip_any_location_wrapper (t);
+
if (cp_unevaluated_operand && !manifestly_const_eval)
return t;
  
diff --git a/gcc/cp/init.cc b/gcc/cp/init.cc

index 705a5b3bdb6..574d2e2586c 100644
--- a/gcc/cp/init.cc
+++ b/gcc/cp/init.cc
@@ -3653,16 +3653,14 @@ build_new_1 (vec **placement, tree type, 
tree nelts,
  error ("parenthesized initializer in array new");
  return error_mark_node;
  }
- init_expr
-   = build_vec_init (data_addr,
- cp_build_binary_op (input_location,
- MINUS_EXPR, outer_nelts,
- integer_one_node,
- complain),
- vecinit,
- explicit_value_init_p,
- /*from_array=*/0,
-  complain);
+ tree maxindex = fold_build2_loc (input_location, MINUS_EXPR,
+  TREE_TYPE (outer_nelts),
+  outer_nelts,
+  build_one_cst (TREE_TYPE
+ (outer_nelts)));
+ init_expr = build_vec_init (data_addr, maxindex, vecinit,
+ explicit_value_init_p, /*from_array=*/0,
+ complain);
}
else
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/new6.C 
b/gcc/testsuite/g++.dg/cpp0x/new6.C
new file mode 100644
index 000..17a669b42d0
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/new6.C
@@ -0,0 +1,9 @@
+// PR c++/108218
+// { dg-do compile { target c++11 } }
+

Re: [PATCH] c++: Add target hook for emit_support_tinfos [PR108883]

2023-02-27 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 27, 2023 at 06:26:04PM -0500, Jason Merrill wrote:
> > The following patch instead adds a target hook which allows the backend
> > to temporarily tweak registered types such that emit_support_tinfos
> > emits whatever is needed.
> 
> Why handle these types differently from the DFP handling at the end of
> emit_support_tinfos?

One thing is that the fallback_* nodes look like a waste to me,
the tinfo decls are mangled right away, and the fallback_* nodes need to be
walked by GC, so I think we could get away even for the decimal tinfos
to just do:
  dfloat32_type_node = make_node (REAL_TYPE);
  emit_support_tinfo_1 (dfloat32_type_node);
  dfloat32_type_node = NULL_TREE;
etc. and drop the fallback stuff.
If we wanted to do fallback_* even for the _Float*/decltype(0.0bf16)
nodes, which are at least sometimes mangled in target hooks it would
make stuff harder because fallback_* is C++ FE private.

And then there is a question whether we want to emit rtti for
_Float{16,32,64,128}, _Float{32,64,128}x and decltype(0.0bf16) regardless
of whether the target supports them at all or not.
Emitting them always would have an advantage, if say bfloat16_t support
isn't added for aarch64 for GCC 13 (it is still pending review), we wouldn't
need to deal with symbol versioning for it in GCC 14 or later.
On the other side, on some arches some types are very unlikely to be
supported.  And e.g. _Float128x isn't supported on any arch right now.

Though, if we can get rid of the fallback_* stuff and we wanted to emit
all _Float{16,32,64,128}, _Float{32,64,128}x and decltype(0.0bf16) tinfos
on all arches (or say for now all but _Float128x), we could do it simply
by splitting the fundamentals array in emit_support_tinfos into
one without fallback and one with fallback, put say
_type_node, _type_node, _type_node,
_type_node, _type_node, _type_node,
_type_node, _type_node, _type_node,
_type_node, _type_node, 0
into the latter and simply handle the NULL case with a temporary fallback,
like:
  tree fallback = NULL_TREE;
  for (ix = 0; fundamentals_with_fallback[ix]; ix++)
if (*fundamentals_with_fallback[ix])
  emit_support_tinfo_1 (*fundamentals_with_fallback[ix]);
else
  {
if (fallback == NULL_TREE)
  fallback = make_node (REAL_TYPE);
*fundamentals_with_fallback[ix] = fallback;
emit_support_tinfo_1 (fallback);
*fundamentals_with_fallback[ix] = NULL_TREE;
  }

Jakub



Re: [PATCH] c++: variable template and targ deduction [PR108550]

2023-02-27 Thread Marek Polacek via Gcc-patches
On Mon, Feb 27, 2023 at 06:21:13PM -0500, Jason Merrill wrote:
> On 2/23/23 10:54, Marek Polacek wrote:
> > On Thu, Feb 23, 2023 at 10:17:22AM -0500, Patrick Palka wrote:
> > > On Wed, 22 Feb 2023, Marek Polacek wrote:
> > > 
> > > > In this test, we get a bogus error because we failed to deduce the auto 
> > > > in
> > > > constexpr auto is_pointer_v = is_pointer::value;
> > > > to bool.  Then ensure_literal_type_for_constexpr_object thinks the 
> > > > object
> > > > isn't literal and an error is reported.
> > > > 
> > > > This is another case of the interaction between tf_partial and 'auto',
> > > > where the auto was not reduced so the deduction failed.  In more detail:
> > > > we have
> > > > 
> > > >Wrap1()
> > > > 
> > > > in the code and we need to perform OR -> fn_type_unification.  The targ
> > > > list is incomplete, so we do
> > > >tsubst_flags_t ecomplain = complain | tf_partial | 
> > > > tf_fndecl_type;
> > > >fntype = tsubst (TREE_TYPE (fn), explicit_targs, ecomplain, 
> > > > NULL_TREE);
> > > > where TREE_TYPE (fn) is struct integral_constant  (void).  Then
> > > > we substitute the return type, which results in tsubsting 
> > > > is_pointer_v.
> > > > is_pointer_v is a variable template with a placeholder type:
> > > > 
> > > >template 
> > > >constexpr auto is_pointer_v = is_pointer::value;
> > > > 
> > > > so we find ourselves in lookup_and_finish_template_variable.  
> > > > tf_partial is
> > > > still set, so finish_template_variable -> instantiate_template -> tsubst
> > > > won't reduce the level of auto.  But then we do mark_used which 
> > > > eventually
> > > > calls do_auto_deduction which clears tf_partial, because we want to 
> > > > replace
> > > > the auto now.  But we hadn't reduced auto's level so this fails.  And
> > > > since we're not in an immediate context, we emit a hard error.
> > > > 
> > > > I suppose that when we reach lookup_and_finish_template_variable it's
> > > > probably time to clear tf_partial.  (I added an assert and our testsuite
> > > > doesn't have a test whereby we get to 
> > > > lookup_and_finish_template_variable
> > > > while tf_partial is still active.)
> > > > 
> > > > Does this make *any* sense to you?
> > > > 
> > > > Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?
> > > > 
> > > > PR c++/108550
> > > > 
> > > > gcc/cp/ChangeLog:
> > > > 
> > > > * pt.cc (lookup_and_finish_template_variable): Clear tf_partial.
> > > > 
> > > > gcc/testsuite/ChangeLog:
> > > > 
> > > > * g++.dg/cpp1y/var-templ70.C: New test.
> > > > * g++.dg/cpp1y/var-templ71.C: New test.
> > > > ---
> > > >   gcc/cp/pt.cc |  6 ++
> > > >   gcc/testsuite/g++.dg/cpp1y/var-templ70.C | 25 +++
> > > >   gcc/testsuite/g++.dg/cpp1y/var-templ71.C | 26 
> > > >   3 files changed, 57 insertions(+)
> > > >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ70.C
> > > >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ71.C
> > > > 
> > > > diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> > > > index 1a071e95004..f636bac5413 100644
> > > > --- a/gcc/cp/pt.cc
> > > > +++ b/gcc/cp/pt.cc
> > > > @@ -10355,6 +10355,12 @@ lookup_and_finish_template_variable (tree 
> > > > templ, tree targs,
> > > > if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) == 1
> > > > && !any_dependent_template_arguments_p (targs))
> > > >   {
> > > > +  /* We may be called while doing a partial substitution, but the
> > > > +type of the variable template may be auto, in which case we
> > > > +will call do_auto_deduction in mark_used (which clears 
> > > > tf_partial)
> > > > +and the auto must be properly reduced at that time for the
> > > > +deduction to work.  */
> > > > +  complain &= ~tf_partial;
> > > 
> > > LGTM.  I can't think of a reason to keep tf_partial set at this point
> > > since we know there's only a single level of template arguments left to
> > > substitute and the args we have are non-dependent, so the substitution
> > > is in no way partial.
> > 
> > Right, once we get to finish_template_variable I suppose we're really
> > committed to the var tmpl.  So the fix should be safe, albeit it feels
> > a bit ad hoc.  FWIW, I'd tested adding alias templates to the mix and
> > it still worked.
> > > Though I wonder if ideally we should clear tf_partial more generally
> > > from instantiate_template?  Modulo diagnostics, the result of
> > > instantiate_template shouldn't depend on complain, in particular because
> > > we cache the result in the specializations table which is keyed off of
> > > only the given tmpl and args.  Not sure if that'd be a suitable change
> > > for stage4/backports though...
> > 
> > This makes sense; I guess anytime you have the full set of targs and are
> > going to instantiate, tf_partial should be cleared otherwise it can
> > wreak havoc.  I'd be nervous about 

Ping: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long double is IEEE 128-bit

2023-02-27 Thread Michael Meissner via Gcc-patches
This is the most important patch to look at:

| Date: Wed, 14 Dec 2022 15:29:02 -0500
| From: Michael Meissner 
| Subject: [PATCH, V3] PR 107299, GCC does not build on PowerPC when long 
double is IEEE 128-bit
| Message-ID: 

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com


Re: [PATCH] c++: Add target hook for emit_support_tinfos [PR108883]

2023-02-27 Thread Jason Merrill via Gcc-patches

On 2/23/23 05:23, Jakub Jelinek wrote:

Hi!

_Float16 and decltype(0.0bf16) types are on x86 supported only with
-msse2.  On x86_64 that is the default, but on ia32 it is not.
We should still emit fundamental type tinfo for those types in
libsupc++.a/libstdc++.*, regardless of whether libsupc++/libstdc++
is compiled with -msse2 or not, as user programs can be compiled
with different ISA flags from libsupc++/libstdc++ and if they
are compiled with -msse2 and use std::float16_t or std::bfloat16_t
and need RTTI for it, it should work out of the box.  Furthermore,
libstdc++ ABI on ia32 shouldn't depend on whether the library
is compiled with -mno-sse or -msse2.

Unfortunately, just hacking up libsupc++ Makefile/configure so that
a single source is compiled with -msse2 isn't appropriate, because
that TU emits also code and the code should be able to run on CPUs
which libstdc++ supports.  We could add [[gnu::attribute ("no-sse2")]]
there perhaps conditionally, but it all gets quite ugly.

The following patch instead adds a target hook which allows the backend
to temporarily tweak registered types such that emit_support_tinfos
emits whatever is needed.


Why handle these types differently from the DFP handling at the end of 
emit_support_tinfos?



Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2023-02-23  Jakub Jelinek  

PR target/108883
* hooks.h (hook_void_bool): Declare.
* hooks.cc (hook_void_bool): New function.
* target.def (emit_support_tinfos): New target hook.
* doc/tm.texi.in (emit_support_tinfos): Document it.
* doc/tm.texi: Regenerated.
* config/i386/i386.cc (ix86_emit_support_tinfos): New function.
(TARGET_EMIT_SUPPORT_TINFOS): Redefine.

* rtti.cc (emit_support_tinfos): Call targetm.emit_support_tinfos
before and after emit_support_tinfo_1 calls.

--- gcc/hooks.h.jj  2023-01-02 09:32:29.418183667 +0100
+++ gcc/hooks.h 2023-02-22 12:34:32.144973558 +0100
@@ -77,6 +77,7 @@ extern bool hook_bool_wint_wint_uint_boo
unsigned int, bool);
  
  extern void hook_void_void (void);

+extern void hook_void_bool (bool);
  extern void hook_void_constcharptr (const char *);
  extern void hook_void_rtx_insn_int (rtx_insn *, int);
  extern void hook_void_FILEptr_constcharptr (FILE *, const char *);
--- gcc/hooks.cc.jj 2023-01-02 09:32:49.675890970 +0100
+++ gcc/hooks.cc2023-02-22 12:36:46.241035355 +0100
@@ -271,6 +271,11 @@ hook_hwi_void_0 (void)
  }
  
  void

+hook_void_bool (bool)
+{
+}
+
+void
  hook_void_tree (tree)
  {
  }
--- gcc/target.def.jj   2023-01-04 10:45:50.117881714 +0100
+++ gcc/target.def  2023-02-22 12:33:39.715731356 +0100
@@ -2606,6 +2606,20 @@ types.",
   const char *, (const_tree type),
   hook_constcharptr_const_tree_null)
  
+/* Temporarily add conditional target specific types for the purpose of

+   emitting C++ fundamental type tinfos.  */
+DEFHOOK
+(emit_support_tinfos,
+ "If your target defines any fundamental types which depend on ISA flags,\n\
+they might need C++ tinfo symbols in libsupc++/libstdc++ regardless of\n\
+ISA flags the library is compiled with.\n\
+The hook is called with @var{add} @code{true} at the start of C++ FE\n\
+@code{emit_support_tinfos} and with @var{add} @code{false} at the end of it\n\
+and can temporarily create fundamental types that are not supposed to be\n\
+otherwise created due to the ISA options.",
+ void, (bool add),
+ hook_void_bool)
+
  /* Make any adjustments to libfunc names needed for this target.  */
  DEFHOOK
  (init_libfuncs,
--- gcc/doc/tm.texi.in.jj   2023-01-16 11:52:16.124733460 +0100
+++ gcc/doc/tm.texi.in  2023-02-22 12:46:37.951482849 +0100
@@ -1286,6 +1286,8 @@ pattern needs to support both a 32- and
  
  @hook TARGET_MANGLE_TYPE
  
+@hook TARGET_EMIT_SUPPORT_TINFOS

+
  @node Type Layout
  @section Layout of Source Language Data Types
  
--- gcc/doc/tm.texi.jj	2023-01-16 11:52:16.123733475 +0100

+++ gcc/doc/tm.texi 2023-02-22 12:46:41.741428081 +0100
@@ -1525,6 +1525,16 @@ appropriate for a target that does not d
  types.
  @end deftypefn
  
+@deftypefn {Target Hook} void TARGET_EMIT_SUPPORT_TINFOS (bool @var{add})

+If your target defines any fundamental types which depend on ISA flags,
+they might need C++ tinfo symbols in libsupc++/libstdc++ regardless of
+ISA flags the library is compiled with.
+The hook is called with @var{add} @code{true} at the start of C++ FE
+@code{emit_support_tinfos} and with @var{add} @code{false} at the end of it
+and can temporarily create fundamental types that are not supposed to be
+otherwise created due to the ISA options.
+@end deftypefn
+
  @node Type Layout
  @section Layout of Source Language Data Types
  
--- gcc/config/i386/i386.cc.jj	2023-02-16 10:13:23.701210667 +0100

+++ gcc/config/i386/i386.cc 2023-02-22 12:59:55.110960505 +0100
@@ -22775,6 +22775,30 @@ ix86_mangle_type (const_tree type)
  }
  }
  
+/* 

Re: [PATCH] c++: Fix up -fcontracts option description [PR108890]

2023-02-27 Thread Jason Merrill via Gcc-patches

On 2/23/23 05:26, Jakub Jelinek wrote:

Hi!

This translation PR mentioned the description is a little bit weird.

Ok for trunk?


OK.


2023-02-23  Jakub Jelinek  

PR translation/108890
* c.opt (fcontracts): Fix description.

--- gcc/c-family/c.opt.jj   2023-02-01 10:19:42.637146215 +0100
+++ gcc/c-family/c.opt  2023-02-23 10:25:06.084085718 +0100
@@ -1689,7 +1689,7 @@ C++ ObjC++ Joined RejectNegative Host_Wi
  
  fcontracts

  C++ ObjC++ Var(flag_contracts) Init(0)
-Enable certain features present drafts of C++ Contracts.
+Enable certain features present in drafts of C++ Contracts.
  
  Enum

  Name(on_off) Type(int) UnknownError(argument %qs must be either %<on%> or 
%<off%>)

Jakub





Re: [PATCH] c++: variable template and targ deduction [PR108550]

2023-02-27 Thread Jason Merrill via Gcc-patches

On 2/23/23 10:54, Marek Polacek wrote:

On Thu, Feb 23, 2023 at 10:17:22AM -0500, Patrick Palka wrote:

On Wed, 22 Feb 2023, Marek Polacek wrote:


In this test, we get a bogus error because we failed to deduce the auto in
constexpr auto is_pointer_v = is_pointer::value;
to bool.  Then ensure_literal_type_for_constexpr_object thinks the object
isn't literal and an error is reported.

This is another case of the interaction between tf_partial and 'auto',
where the auto was not reduced so the deduction failed.  In more detail:
we have

   Wrap1()

in the code and we need to perform OR -> fn_type_unification.  The targ
list is incomplete, so we do
   tsubst_flags_t ecomplain = complain | tf_partial | tf_fndecl_type;
   fntype = tsubst (TREE_TYPE (fn), explicit_targs, ecomplain, NULL_TREE);
where TREE_TYPE (fn) is struct integral_constant  (void).  Then
we substitute the return type, which results in tsubsting is_pointer_v.
is_pointer_v is a variable template with a placeholder type:

   template 
   constexpr auto is_pointer_v = is_pointer::value;

so we find ourselves in lookup_and_finish_template_variable.  tf_partial is
still set, so finish_template_variable -> instantiate_template -> tsubst
won't reduce the level of auto.  But then we do mark_used which eventually
calls do_auto_deduction which clears tf_partial, because we want to replace
the auto now.  But we hadn't reduced auto's level so this fails.  And
since we're not in an immediate context, we emit a hard error.
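As a refresher, a variable template with a placeholder type of the kind involved here looks like this (a generic sketch, not the PR testcase):

```cpp
#include <type_traits>

// A variable template whose declared type is a placeholder: the
// 'auto' is only deduced (to bool here) when the template is
// instantiated with concrete arguments.
template <class T>
constexpr auto is_pointer_v = std::is_pointer<T>::value;
```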

I suppose that when we reach lookup_and_finish_template_variable it's
probably time to clear tf_partial.  (I added an assert and our testsuite
doesn't have a test whereby we get to lookup_and_finish_template_variable
while tf_partial is still active.)

Does this make *any* sense to you?

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk?

PR c++/108550

gcc/cp/ChangeLog:

* pt.cc (lookup_and_finish_template_variable): Clear tf_partial.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ70.C: New test.
* g++.dg/cpp1y/var-templ71.C: New test.
---
  gcc/cp/pt.cc |  6 ++
  gcc/testsuite/g++.dg/cpp1y/var-templ70.C | 25 +++
  gcc/testsuite/g++.dg/cpp1y/var-templ71.C | 26 
  3 files changed, 57 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ70.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ71.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 1a071e95004..f636bac5413 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10355,6 +10355,12 @@ lookup_and_finish_template_variable (tree templ, tree 
targs,
if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) == 1
&& !any_dependent_template_arguments_p (targs))
  {
+  /* We may be called while doing a partial substitution, but the
+type of the variable template may be auto, in which case we
+will call do_auto_deduction in mark_used (which clears tf_partial)
+and the auto must be properly reduced at that time for the
+deduction to work.  */
+  complain &= ~tf_partial;


LGTM.  I can't think of a reason to keep tf_partial set at this point
since we know there's only a single level of template arguments left to
substitute and the args we have are non-dependent, so the substitution
is in no way partial.


Right, once we get to finish_template_variable I suppose we're really
committed to the var tmpl.  So the fix should be safe, albeit it feels
a bit ad hoc.  FWIW, I'd tested adding alias templates to the mix and
it still worked.
  

Though I wonder if ideally we should clear tf_partial more generally
from instantiate_template?  Modulo diagnostics, the result of
instantiate_template shouldn't depend on complain, in particular because
we cache the result in the specializations table which is keyed off of
only the given tmpl and args.  Not sure if that'd be a suitable change
for stage4/backports though...


This makes sense; I guess anytime you have the full set of targs and are
going to instantiate, tf_partial should be cleared otherwise it can
wreak havoc.  I'd be nervous about applying such a patch now but maybe
we could experiment with that in GCC 14.


This sounds right to me; it would never make sense to have tf_partial 
set in instantiate_template, so we should be able to clear it there.  We 
might even do


 complain = complain & tf_warning_or_error

since none of the other flags are relevant at that level either.

But this patch is OK for GCC 13.


Thanks for taking a look!
  

var = finish_template_variable (var, complain);
mark_used (var);
  }
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ70.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ70.C
new file mode 100644
index 000..1d35c38c7cc
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ70.C
@@ -0,0 +1,25 @@
+// PR c++/108550
+// { dg-do compile { target c++14 } }
+
+template
+struct is_pointer
+{
+  static 

Re: [PATCH] c++: non-dependent variable template-id [PR108848]

2023-02-27 Thread Jason Merrill via Gcc-patches

On 2/23/23 16:52, Patrick Palka wrote:

Here we're incorrectly treating the non-dependent variable template-id
tag as dependent ever since r226642 gave variable TEMPLATE_ID_EXPR
an empty type which causes the call to finish_template_variable from
finish_id_expression_1 to be dead code at template parse time.  Thus
we're led into treating tag.var as a dependent name at parse
time.
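A generic sketch of a non-dependent variable template-id used inside a template (hypothetical names, not the PR testcase):

```cpp
// is_int_v<int> names a variable template with non-dependent
// arguments, so it can be instantiated even at template parse time;
// treating it as dependent delays errors and constant evaluation.
template <class T>
constexpr bool is_int_v = false;
template <>
constexpr bool is_int_v<int> = true;

template <class T>
constexpr bool probe()
{
  return is_int_v<int>;  // non-dependent despite appearing in a template
}
```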

This patch fixes this by making finish_id_expression_1 instantiate a
variable template-id as long as the template and args are non-dependent
according to the dependence test in lookup_and_finish_template_variable
and not according to type_dependent_expression_p.  This patch also moves
said dependence test into finish_template_variable.

bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/108848

gcc/cp/ChangeLog:

* pt.cc (finish_template_variable): Move dependence check
to here from ...
(lookup_and_finish_template_variable): ... here.
* semantics.cc (finish_id_expression_1): Call
finish_template_variable sooner and regardless of
type_dependent_expression_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/noexcept1.C: Don't expect bogus "different
exception specifier" error.  Expect a separate "not usable
in a constant expression" error.
* g++.dg/cpp1y/var-templ70.C: New test.
* g++.dg/cpp1y/var-templ71.C: New test.
---
  gcc/cp/pt.cc | 18 ++
  gcc/cp/semantics.cc  | 14 +-
  gcc/testsuite/g++.dg/cpp1y/noexcept1.C   |  4 ++--
  gcc/testsuite/g++.dg/cpp1y/var-templ70.C | 20 
  gcc/testsuite/g++.dg/cpp1y/var-templ71.C | 10 ++
  5 files changed, 47 insertions(+), 19 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ70.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ71.C

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index b1ac7d4beb4..11c0baa0119 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -10317,7 +10317,8 @@ lookup_template_variable (tree templ, tree arglist)
return build2 (TEMPLATE_ID_EXPR, NULL_TREE, templ, arglist);
  }
  
-/* Instantiate a variable declaration from a TEMPLATE_ID_EXPR for use. */

+/* Instantiate a variable declaration from a TEMPLATE_ID_EXPR if it's
+   not dependent.  */
  
  tree

  finish_template_variable (tree var, tsubst_flags_t complain)
@@ -10325,6 +10326,12 @@ finish_template_variable (tree var, tsubst_flags_t 
complain)
tree templ = TREE_OPERAND (var, 0);
tree arglist = TREE_OPERAND (var, 1);
  
+  /* If the template or arguments are dependent, then we

+ can't instantiate yet.  */
+  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) != 1
+  || any_dependent_template_arguments_p (arglist))
+return var;
+
tree parms = DECL_TEMPLATE_PARMS (templ);
arglist = coerce_template_parms (parms, arglist, templ, complain);
if (arglist == error_mark_node)
@@ -10352,13 +10359,8 @@ lookup_and_finish_template_variable (tree templ, tree 
targs,
 tsubst_flags_t complain)
  {
tree var = lookup_template_variable (templ, targs);
-  if (TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (templ)) == 1
-  && !any_dependent_template_arguments_p (targs))
-{
-  var = finish_template_variable (var, complain);
-  mark_used (var);
-}
-
+  var = finish_template_variable (var, complain);
+  mark_used (var);
return convert_from_reference (var);
  }
  
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc

index c2df0b69b30..a18364dda4f 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -4208,6 +4208,11 @@ finish_id_expression_1 (tree id_expression,
  }
else
  {


We could use a comment here about why this needs to come before checking 
type_dep*.  OK with that change.



+  if (TREE_CODE (decl) == TEMPLATE_ID_EXPR
+ && variable_template_p (TREE_OPERAND (decl, 0))
+ && !concept_check_p (decl))
+   decl = finish_template_variable (decl);
+
bool dependent_p = type_dependent_expression_p (decl);
  
/* If the declaration was explicitly qualified indicate

@@ -4275,15 +4280,6 @@ finish_id_expression_1 (tree id_expression,
/* Replace an evaluated use of the thread_local variable with
   a call to its wrapper.  */
decl = wrap;
-  else if (TREE_CODE (decl) == TEMPLATE_ID_EXPR
-  && !dependent_p
-  && variable_template_p (TREE_OPERAND (decl, 0))
-  && !concept_check_p (decl))
-   {
- decl = finish_template_variable (decl);
- mark_used (decl);
- decl = convert_from_reference (decl);
-   }
else if (concept_check_p (decl))
{
  /* Nothing more to do. All of the analysis for concept checks
diff --git a/gcc/testsuite/g++.dg/cpp1y/noexcept1.C 
b/gcc/testsuite/g++.dg/cpp1y/noexcept1.C
index 86e46c96148..caa4a056a2e 100644
--- 

Re: [PATCH] c++: ICE with constexpr variable template [PR107938]

2023-02-27 Thread Jason Merrill via Gcc-patches

On 2/23/23 18:51, Marek Polacek wrote:

Since r11-557, cp_finish_decl can call check_initializer even in
a template for a constexpr initializer.  That ultimately leads to
convert_for_assignment and check_address_or_pointer_of_packed_member,
where we crash, because it doesn't expect that the CALL_EXPR is
a function object.  Q has a constexpr operator(), but since we're
in a template, q(0) is a CALL_EXPR whose CALL_EXPR_FN is just
a VAR_DECL; it hasn't been converted to Q::operator(, 0) yet.
I propose to robustify check_address_or_pointer_of_packed_member.

var-templ74.C has an XFAIL, subject to 107939.

I noticed that our -Waddress-of-packed-member tests weren't testing
member functions, added thus.  (I was tempted to check
FUNCTION_POINTER_TYPE_P but that doesn't include METHOD_TYPE.)

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/12?


OK.


PR c++/107938

gcc/c-family/ChangeLog:

* c-warn.cc (check_address_or_pointer_of_packed_member): Check
POINTER_TYPE_P.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ73.C: New test.
* g++.dg/cpp1y/var-templ74.C: New test.
* g++.dg/warn/Waddress-of-packed-member3.C: New test.
---
  gcc/c-family/c-warn.cc|  4 
  gcc/testsuite/g++.dg/cpp1y/var-templ73.C  | 12 ++
  gcc/testsuite/g++.dg/cpp1y/var-templ74.C  | 19 +++
  .../g++.dg/warn/Waddress-of-packed-member3.C  | 23 +++
  4 files changed, 58 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ73.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/var-templ74.C
  create mode 100644 gcc/testsuite/g++.dg/warn/Waddress-of-packed-member3.C

diff --git a/gcc/c-family/c-warn.cc b/gcc/c-family/c-warn.cc
index a6fb95b1e80..29ae1ea1dc8 100644
--- a/gcc/c-family/c-warn.cc
+++ b/gcc/c-family/c-warn.cc
@@ -3000,6 +3000,10 @@ check_address_or_pointer_of_packed_member (tree type, 
tree rhs)
  if (rhs == NULL_TREE)
return NULL_TREE;
  rhs = TREE_TYPE (rhs);/* Pointer type.  */
+ /* We could be called while processing a template and RHS could be
+a functor.  In that case it's a class, not a pointer.  */
+ if (!POINTER_TYPE_P (rhs))
+   return NULL_TREE;
  rhs = TREE_TYPE (rhs);/* Function type.  */
  rhstype = TREE_TYPE (rhs);
  if (!rhstype || !POINTER_TYPE_P (rhstype))
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ73.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ73.C
new file mode 100644
index 000..b76babcfa81
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ73.C
@@ -0,0 +1,12 @@
+// PR c++/107938
+// { dg-do compile { target c++14 } }
+
+struct Q {
+  int n;
+  constexpr const Q* operator()(int) const { return this; }
+};
+
+constexpr Q q{};
+
+template
+constexpr const Q* p = q(0);
diff --git a/gcc/testsuite/g++.dg/cpp1y/var-templ74.C 
b/gcc/testsuite/g++.dg/cpp1y/var-templ74.C
new file mode 100644
index 000..4e2e800a6eb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/var-templ74.C
@@ -0,0 +1,19 @@
+// PR c++/107938
+// { dg-do compile { target c++14 } }
+
+struct Q {
+  int n;
+  constexpr const Q* operator()(int) const { return this; }
+};
+
+extern const Q q;
+
+template
+constexpr const Q* p = q(0); // { dg-bogus "not usable" "PR107939" { xfail 
*-*-* } }
+
+void
+g ()
+{
+  constexpr const Q* p2 = q(0);
+  constexpr auto x = p<0>;
+}
diff --git a/gcc/testsuite/g++.dg/warn/Waddress-of-packed-member3.C 
b/gcc/testsuite/g++.dg/warn/Waddress-of-packed-member3.C
new file mode 100644
index 000..aeffb969c01
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Waddress-of-packed-member3.C
@@ -0,0 +1,23 @@
+// { dg-do compile { target { ! default_packed } } }
+// Test that -Waddress-of-packed-member works with member functions.
+
+struct S {
+  char c;
+} __attribute__((packed));
+
+struct X {
+  S* memfn ();
+  static S* smemfn ();
+} x;
+
+S *foo ();
+
+S**
+f ()
+{
+  S **s;
+  s = reinterpret_cast<S **> (foo ()); // { dg-warning "converting a packed" }
+  s = reinterpret_cast<S **> (x.memfn ()); // { dg-warning "converting a packed" }
+  s = reinterpret_cast<S **> (X::smemfn ()); // { dg-warning "converting a packed" }
+  return s;
+}

base-commit: f33d7a88d069d169bbe76da8e5c52de17f68ca05




Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Segher Boessenkool
On Mon, Feb 27, 2023 at 04:03:56PM -0600, Pat Haugen wrote:
> On 2/27/23 2:53 PM, Segher Boessenkool wrote:
> >"Slightly".  It takes 12 cycles for the two in parallel (64-bit, p9),
> >but 17 cycles for the "cheaper" sequence (divd+mulld+subf, 12+5+2).  It
> >is all worse if the units are busy of course, or if there are other
> >problems.
> >
> >>but if you throw in another
> >>independent div or mod in the insn stream then doing the peephole should
> >>be a clear win since that 3rd insn can execute in parallel with the
> >>initial divide as opposed to waiting for the one of the first div/mod to
> >>clear the exclusive stage of the pipe.
> >
> >That is the SMT4 case, the one we do not optimise for.  SMT2 and ST can
> >do four in parallel.  This means you can start a div or mod every 2nd
> >cycle on average, so it is very unlikely you will ever be limited by
> >this on real code.
> 
> Power9/Power10 only have 2 fixed-point divide units, and are able to 
> issue 2 divides every 9/11 cycles (they aren't fully pipelined), with 
> latencies of 12-24/12-25. Not saying that changes the "best case" 
> scenario, just pointing out a lot of variables in play.

The p9 UM says in no uncertain terms there are four integer dividers
(four fixed-point execution pipelines, all four capable of divides).
Is that wrong then?

Let's do actual tests on actual hardware :-)


Segher


Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Segher Boessenkool
Hi!

On Mon, Feb 27, 2023 at 08:11:09PM +0100, Jakub Jelinek wrote:
> (insn 52 48 53 2 (set (reg:CC 66 cc)
> (compare:CC (reg:SI 130)
> (const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi}
>  (expr_list:REG_DEAD (reg:SI 130)
> (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffc0])
> (const_int 0 [0]))
> (nil
> (insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ])
> (if_then_else:DI (ge (reg:CC 66 cc)
> (const_int 0 [0]))
> (reg:DI 132)
> (const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn}
>  (expr_list:REG_DEAD (reg:DI 132)
> (nil)))
> (insn 57 53 59 2 (set (reg:DI 151 [ _6 ])
> (if_then_else:DI (ge (reg:CC 66 cc)
> (const_int 0 [0]))
> (const_int 0 [0])
> (reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn}
>  (expr_list:REG_DEAD (reg:CC 66 cc)
> (nil)))
> ...
> (insn 71 68 72 2 (set (reg:CC 66 cc)
> (compare:CC (reg:SI 137)
> (const_int 0 [0]))) "pr108803.c":12:42 437 {cmpsi}
>  (expr_list:REG_DEAD (reg:SI 137)
> (expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffc0])
> (const_int 0 [0]))
> (nil
> (insn 72 71 76 2 (set (reg:DI 153 [ _8 ])
> (if_then_else:DI (ge (reg:CC 66 cc)
> (const_int 0 [0]))
> (reg:DI 139)
> (reg:DI 153 [ _8 ]))) "pr108803.c":12:42 490 {*cmovdi_insn}
>  (expr_list:REG_DEAD (reg:DI 139)
> (nil)))
> (insn 76 72 77 2 (set (reg:DI 154 [ _8+8 ])
> (if_then_else:DI (ge (reg:CC 66 cc)
> (const_int 0 [0]))
> (reg:DI 138)
> (reg:DI 127))) "pr108803.c":12:42 490 {*cmovdi_insn}
>  (expr_list:REG_DEAD (reg:DI 138)
> (expr_list:REG_DEAD (reg:DI 127)
> (expr_list:REG_DEAD (reg:CC 66 cc)
> (nil)
> (insn 77 76 78 2 (set (reg:DI 159 [ b ])
> (ior:DI (reg:DI 151 [ _6 ])
> (reg:DI 126))) "pr108803.c":12:12 537 {iordi3}
>  (expr_list:REG_DEAD (reg:DI 126)
> (expr_list:REG_DEAD (reg:DI 151 [ _6 ])
> (nil
> (insn 78 77 80 2 (set (reg:DI 160 [ b+8 ])
> (reg:DI 152 [ _6+8 ])) "pr108803.c":12:12 65 {*movdi_aarch64}
>  (expr_list:REG_DEAD (reg:DI 152 [ _6+8 ])
> (nil)))

Both CCs are used twice, each time inside an if_then_else, a situation
that does not happen frequently at all, and that combine is apparently
not prepared for.  It is the same (hard!) register in all cases as
well.

> but as you can see, because cc reg has been REG_DEAD before on insn 57
> rather than on insn 53, nothing really removed REG_DEAD note from there
> and just adds it on insn 78 (note, besides this REG_DEAD issue the
> IL is otherwise still sane, the previous cc setter 71 and its previous
> uses 72 and 76 in between the move have been optimized away already in
> an earlier successful combination).
> And things go wild with the next successful combination:

Yup.


Segher


Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Segher Boessenkool
On Mon, Feb 27, 2023 at 09:54:06PM +0100, Jakub Jelinek wrote:
> Even if the target-independent code doesn't know what the target dependent
> code will do, I don't see how it could emit it safely.

I always understood RTL to not have anything like C "undefined
behavior", but be closer in general (exceptions are integer division by
zero etc.) to C's "unspecified value".  This is quite close to what on
Power we call "boundedly undefined":
  The results of executing a given instruction are said to be boundedly
  undefined if they could have been achieved by executing an arbitrary
  finite sequence of instructions (none of which yields boundedly
  undefined results) in the state the processor was in before
  executing the given instruction. Boundedly undefined results may
  include the presentation of inconsistent state to the system error
  handler as described in Section 1.8.1 of Book II. Boundedly undefined
  results for a given instruction may vary between implementations, and
  between different executions on the same implementation.
So no trapping, everything writes all bits in all registers always, etc.

C undefined behaviour makes the *whole program* undefined, always: it
can have effects reaching unlimitedly far into the future, but also
unboundedly far into the past (well, not further back than program
start, there is that :-) ).  RTL unspecified stuff is way more limited
than that.  It also has a very different goal: C UB is to allow
compilers to optimise more programs and to optimise them better, while
for RTL we simply do not specify the operation result in some cases.
This does not mean such cases cannot happen!

And of course there are TARGET_SHIFT_COUNT_TRUNCATED and
TARGET_SHIFT_TRUNCATION_MASK, which allow to make more inputs result in
known (to the compiler) outputs.
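A scalar C sketch (illustrative only, not GCC internals) of the truncating behaviour these controls describe, assuming a 64-bit operand width:

```c
#include <assert.h>
#include <stdint.h>

/* On a target where SHIFT_COUNT_TRUNCATED holds, the hardware masks
   the shift count to the operand width, so a count that is out of
   range in C terms still yields a defined, target-known result.  */
static uint64_t
shl_truncated (uint64_t x, unsigned int n)
{
  return x << (n & 63);   /* count truncated modulo the operand width */
}
```

With this model an expander can predict what an out-of-range count produces, which is exactly the extra knowledge those macros/hooks give the compiler.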


Segher


RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-27 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, February 27, 2023 9:33 PM
> To: Tamar Christina via Gcc-patches 
> Cc: Tamar Christina ; nd ;
> rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
> by using new optabs [PR108583]
> 
> Tamar Christina via Gcc-patches  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Monday, February 27, 2023 12:12 PM
> >> To: Tamar Christina 
> >> Cc: Tamar Christina via Gcc-patches ; nd
> >> ; rguent...@suse.de; j...@ventanamicro.com
> >> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of
> >> div-bitmask by using new optabs [PR108583]
> >>
> >> Tamar Christina  writes:
> >> > Hi,
> >> >
> >> >> > I avoided open coding it with add and shift because it creates a
> >> >> > 4 instructions (and shifts which are typically slow) dependency
> >> >> > chain instead of a load and multiply.  This change, unless the
> >> >> > target is known to optimize it further is unlikely to be
> >> >> > beneficial.  And by the time we get to costing the only
> >> >> > alternative is to undo the existing pattern and
> >> >> so you lose the general shift optimization.
> >> >> >
> >> >> > So it seemed unwise to open code as shifts, given the codegen
> >> >> > out of the vectorizer would be degenerate for most targets or
> >> >> > one needs the more complicated route of costing during pattern
> matching already.
> >> >>
> >> >> Hmm, OK.  That seems like a cost-model thing though, rather than
> >> >> something that should be exposed through optabs.  And I imagine
> >> >> the open-coded version would still be better than nothing on
> >> >> targets without
> >> highpart multiply.
> >> >>
> >> >> So how about replacing the hook with one that simply asks whether
> >> >> division through highpart multiplication is preferred over the
> >> >> add/shift
> >> sequence?
> >> >> (Unfortunately it's not going to be possible to work that out from
> >> >> existing
> >> >> information.)
> >> >
> >> > So this doesn't work for SVE.  For SVE the multiplication widening
> >> > pass introduces FMAs at gimple level.  So in the cases where the
> >> > operation is fed from a widening multiplication we end up generating
> FMA.
> >> If that was it I could have matched FMA.
> >> >
> >> > But it also pushes the multiplication in the second operand because
> >> > it no longer has a mul to share the results with.
> >> >
> >> > In any case, the gimple code is transformed into
> >> >
> >> > vect__3.8_122 = .MASK_LOAD (_29, 8B, loop_mask_121);
> >> > vect_patt_57.9_123 = (vector([8,8]) unsigned short) vect__3.8_122;
> >> > vect_patt_64.11_127 = .FMA (vect_patt_57.9_123, vect_cst__124, {
> >> > 257, ... });
> >> > vect_patt_65.12_128 = vect_patt_64.11_127 >> 8;
> >> > vect_patt_66.13_129 = .FMA (vect_patt_57.9_123, vect_cst__124,
> >> > vect_patt_65.12_128);
> >> > vect_patt_62.14_130 = vect_patt_66.13_129 >> 8;
> >> > vect_patt_68.15_131 = (vector([8,8]) unsigned charD.21)
> >> > vect_patt_62.14_130;
> >> >
> >> > This transformation is much worse than the original code, it
> >> > extended the dependency chain with another expensive instruction. I
> >> > can try to correct this in RTL by matching FMA and shift and
> >> > splitting into MUL +
> >> ADDHNB and hope CSE takes care of the extra mul.
> >> >
> >> > But this seems like a hack, and it's basically undoing the earlier
> >> > transformation.  It seems to me that the open coding is a bad idea.
> >>
> >> Could you post the patch that gives this result?  I'll have a poke around.
> >
> > Sure, I'll post the new series, it needs all of them.
> 
> Thanks.  Which testcase did you use to get the above?
> 

#include <stdint.h>

#define N 16
#define TYPE uint8_t

void fun3(TYPE* restrict pixel, TYPE level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
pixel[i] = (pixel[i] * level) / 0xff;
}
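For reference, the add/shift open-coding under discussion can be sketched in scalar C. This is one common exact formulation of division by 0xff for products of two 8-bit values (an illustration only; not necessarily the precise sequence the vectorizer emits):

```c
#include <assert.h>
#include <stdint.h>

/* Exact v / 255 for v up to 255 * 255, using two adds and two shifts
   instead of a highpart multiply.  */
static uint8_t
div255 (uint16_t v)
{
  return (uint8_t) ((v + (v >> 8) + 1) >> 8);
}
```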

> But since SVE does have highpart multiply, and since the assumption for SVE is
> that MULH+shift is better than ADD*3+shift*2, shouldn't SVE just be one of
> the targets for which the hook that "asks whether division through highpart
> multiplication is preferred over the add/shift sequence" returns true?
> 

Yes (it's also two adds not 3), but it's not correct for SVE2, which has 
addhnb, in which case 2x addhnb is
much faster than MULH+shift.  And the problem is that widening_mul will not
allow add+shift to reach the backend because the ADD+shift were open coded.

They are now subjected to further optimization.

To summarize:

Other targets: false
SVE: false
SVE2: true
NEON: true

SVE2 borked because MUL+ADD+SHIFT -> FMA+SHIFT.

If you're saying you don't want the optimization for SVE2, then sure, happy to 
turn it off.

But  UMULH+LSR == 6 cycles on Neoverse-N2 and throughput of 1.
2x ADDHNB = 4 cycles and throughput of 2.

Tamar.

> 
> Richard


Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Pat Haugen via Gcc-patches

On 2/27/23 2:53 PM, Segher Boessenkool wrote:

Hi!

On Mon, Feb 27, 2023 at 02:12:23PM -0600, Pat Haugen wrote:

On 2/27/23 11:08 AM, Segher Boessenkool wrote:

On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:

The define_insns for the modulo operation currently force the target
register
to a distinct reg in preparation for a possible future peephole combining
div/mod. But this can lead to cases of a needless copy being inserted.
Fixed
with the following patch.


Have you verified those peepholes still match?


Yes, I verified the peepholes still match and transform the sequence.


Please add the testcases for that then?  Or do we have tests for it
already :-)


I don't see one, but can add one.
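A minimal testcase sketch for the peephole could look like the following (illustrative; a real test would need the appropriate dg directives and scan-assembler patterns for the rs6000 port):

```c
#include <assert.h>

/* A quotient and remainder of the same operands: the peephole can
   share the divide and compute the remainder as a - (a / b) * b
   via multiply and subtract, instead of a separate mod instruction.  */
static void
divmod (long a, long b, long *q, long *r)
{
  *q = a / b;
  *r = a % b;
}
```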


Do those peepholes actually improve performance?  On new CPUs?  The code
here says
;; On machines with modulo support, do a combined div/mod the old fashioned
;; method, since the multiply/subtract is faster than doing the mod
instruction
;; after a divide.
but that really should not be true: we can do the div and mod in
parallel (except in SMT4 perhaps, which we never schedule for anyway),
so that should always be strictly faster.


Since the modulo insns were introduced in Power9, we're just talking
Power9/Power10. On paper, I would agree that separate div/mod could be
slightly faster to get the mod result,


"Slightly".  It takes 12 cycles for the two in parallel (64-bit, p9),
but 17 cycles for the "cheaper" sequence (divd+mulld+subf, 12+5+2).  It
is all worse if the units are busy of course, or if there are other
problems.


but if you throw in another
independent div or mod in the insn stream then doing the peephole should
be a clear win since that 3rd insn can execute in parallel with the
initial divide as opposed to waiting for the one of the first div/mod to
clear the exclusive stage of the pipe.


That is the SMT4 case, the one we do not optimise for.  SMT2 and ST can
do four in parallel.  This means you can start a div or mod every 2nd
cycle on average, so it is very unlikely you will ever be limited by
this on real code.


Power9/Power10 only have 2 fixed-point divide units, and are able to 
issue 2 divides every 9/11 cycles (they aren't fully pipelined), with 
latencies of 12-24/12-25. Not saying that changes the "best case" 
scenario, just pointing out a lot of variables in play.


-Pat




Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-27 Thread Richard Sandiford via Gcc-patches
Tamar Christina via Gcc-patches  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Monday, February 27, 2023 12:12 PM
>> To: Tamar Christina 
>> Cc: Tamar Christina via Gcc-patches ; nd
>> ; rguent...@suse.de; j...@ventanamicro.com
>> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
>> by using new optabs [PR108583]
>> 
>> Tamar Christina  writes:
>> > Hi,
>> >
>> >> > I avoided open coding it with add and shift because it creates a 4
>> >> > instructions (and shifts which are typically slow) dependency chain
>> >> > instead of a load and multiply.  This change, unless the target is
>> >> > known to optimize it further is unlikely to be beneficial.  And by
>> >> > the time we get to costing the only alternative is to undo the
>> >> > existing pattern and
>> >> so you lose the general shift optimization.
>> >> >
>> >> > So it seemed unwise to open code as shifts, given the codegen out
>> >> > of the vectorizer would be degenerate for most targets or one needs
>> >> > the more complicated route of costing during pattern matching already.
>> >>
>> >> Hmm, OK.  That seems like a cost-model thing though, rather than
>> >> something that should be exposed through optabs.  And I imagine the
>> >> open-coded version would still be better than nothing on targets without
>> highpart multiply.
>> >>
>> >> So how about replacing the hook with one that simply asks whether
>> >> division through highpart multiplication is preferred over the add/shift
>> sequence?
>> >> (Unfortunately it's not going to be possible to work that out from
>> >> existing
>> >> information.)
>> >
>> > So this doesn't work for SVE.  For SVE the multiplication widening
>> > pass introduces FMAs at gimple level.  So in the cases where the
>> > operation is fed from a widening multiplication we end up generating FMA.
>> If that was it I could have matched FMA.
>> >
>> > But it also pushes the multiplication in the second operand because it
>> > no longer has a mul to share the results with.
>> >
>> > In any case, the gimple code is transformed into
>> >
>> > vect__3.8_122 = .MASK_LOAD (_29, 8B, loop_mask_121);
>> > vect_patt_57.9_123 = (vector([8,8]) unsigned short) vect__3.8_122;
>> > vect_patt_64.11_127 = .FMA (vect_patt_57.9_123, vect_cst__124, { 257,
>> > ... });
>> > vect_patt_65.12_128 = vect_patt_64.11_127 >> 8;
>> > vect_patt_66.13_129 = .FMA (vect_patt_57.9_123, vect_cst__124,
>> > vect_patt_65.12_128);
>> > vect_patt_62.14_130 = vect_patt_66.13_129 >> 8;
>> > vect_patt_68.15_131 = (vector([8,8]) unsigned charD.21)
>> > vect_patt_62.14_130;
>> >
>> > This transformation is much worse than the original code, it extended
>> > the dependency chain with another expensive instruction. I can try to
>> > correct this in RTL by matching FMA and shift and splitting into MUL +
>> ADDHNB and hope CSE takes care of the extra mul.
>> >
>> > But this seems like a hack, and it's basically undoing the earlier
>> > transformation.  It seems to me that the open coding is a bad idea.
>> 
>> Could you post the patch that gives this result?  I'll have a poke around.
>
> Sure, I'll post the new series, it needs all of them.

Thanks.  Which testcase did you use to get the above?

But since SVE does have highpart multiply, and since the assumption for
SVE is that MULH+shift is better than ADD*3+shift*2, shouldn't SVE just
be one of the targets for which the hook that "asks whether division
through highpart multiplication is preferred over the add/shift
sequence" returns true?

For extra conservativeness, we could make the hook default to true
and explicitly return false for Advanced SIMD and for SVE2.
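As a sketch of the idea (the hook name and signature here are invented for illustration and are not an existing GCC interface):

```c
#include <assert.h>

/* Hypothetical default for the proposed hook: prefer implementing the
   division via highpart multiply unless the port knows better.  */
static int
prefer_div_via_highpart_mult_default (void)
{
  return 1;
}

/* An Advanced SIMD or SVE2 port would override this to return 0,
   since the narrowing-add (addhn/addhnb) sequence is cheaper there.  */
```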

Richard


Re: [PATCH] Fortran: fix corner case of IBITS intrinsic [PR108937]

2023-02-27 Thread Steve Kargl via Gcc-patches
On Mon, Feb 27, 2023 at 09:54:38PM +0100, Harald Anlauf via Fortran wrote:
> 
> as found by the reporter, the result of the intrinsic IBITS
> differed from other compilers (e.g. Intel, NAG) for the corner
> case that the LEN argument was equal to BIT_SIZE(I), which is
> explicitly allowed by the standard.
> 
> We actually had an inconsistency for this case between
> code generated by the frontend and compile-time simplified
> expressions.
> 
> The reporter noticed that this is related to a restriction in
> gcc that requires that shift widths shall be smaller than the
> bit sizes, and we already special case this for ISHFT.
> It makes sense to use the same special casing for IBITS.
> 
> Attached patch fixes this and regtests on x86_64-pc-linux-gnu.
> 
> OK for mainline?

Yes.  Good catch on comparison with simplification,
which I failed to consider last night.

> This issue has been there for ages.  Shall this be backported
> or left in release branches as is?

As always, backporting is up to you and your bandwidth.
Bringing the run-time result and simplification into
agreement suggests that a backport is a good thing.

-- 
Steve


Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 27, 2023 at 09:01:15PM +, Richard Sandiford via Gcc-patches 
wrote:
> Jakub Jelinek  writes:
> > On Mon, Feb 27, 2023 at 08:43:27PM +, Richard Sandiford wrote:
> >> My argument was that !SHIFT_COUNT_TRUNCATED and
> >> C?Z_DEFINED_VALUE_AT_ZERO==0 mean that the behaviour is undefined
> >> only in the sense that target-independent code doesn't know what
> >> the behaviour is.  !SHIFT_COUNT_TRUNCATED doesn't mean that
> >> target-independent code can assume that out-of-range shift values
> >> invoke program UB (and therefore target-independent code can optimise
> >> shifts on the principle that all shifts are in-range).  Similarly
> >> CTZ_DEFINED_VALUE_AT_ZERO==0 doesn't mean the corresponding thing for CTZ.
> >
> > Even if the target-independent code doesn't know what the target dependent
> > code will do, I don't see how it could emit it safely.
> > In that case, I could understand some particular backend emitting such code
> > because it knows what to assume, and then generic code like simplify-rtx
> > simply trying to punt and not mess up with things which it doesn't know how
> > they'll behave.
> > But in this case it isn't aarch64 backend that emits this, but it is
> > the generic expansion code.  It even guarantees one of the two shifts
> > will be out of bounds...  And target didn't tell it that out of bound
> > shifts say won't trap or do something similar.
> 
> But we haven't had to handle trapping shifts because no supported target
> behaves like that.  And yeah, even if you disagree with that being a
> reasonable assumption...
> 
> > Sure, there could be a target hook or macro that would tell the expander
> > that the backend out of bound shift will just produce unspecified result
> > but not other effects.
> 
> ...adding an explicit control, or putting the expansion in aarch64 code,
> would allow us to generate the same RTL as we do now, and so hit the
> same bug.

Ok, but then we should document it (that in RTL an out-of-bounds shift
can only produce an unspecified result value but cannot have other
side-effects).

Which would turn my patch from a correctness fix into optimization
territory (especially when PR108840 will be fixed), because for
architectures which have these shifts with masking patterns it emits fewer
instructions.  Though, because for targets which don't do that it emits
more code, we'd need some target hook that could tell the expander
that is the case.  And of course make it a GCC14 material rather than GCC13.

Jakub



[PATCH] i386: Do not constrain fmod and remainder patterns with flag_finite_math_only [PR108922]

2023-02-27 Thread Uros Bizjak via Gcc-patches
According to Intel ISA manual, fprem and fprem1 return NaN when invalid
arithmetic exception is generated. This is documented in Table 8-10 of the
ISA manual and makes these two instructions fully IEEE compatible.

The reverted patch was based on the data from table 3-30 and 3-31 of the
Intel ISA manual, where results in case of st(0) being infinity or
st(1) being 0 are not specified.

2023-02-27  Uroš Bizjak  

gcc/ChangeLog:

PR target/108922
Revert:
* config/i386/i386.md (fmodxf3): Enable for flag_finite_math_only only.
(fmod<mode>3): Ditto.
(fpremxf4_i387): Ditto.
(remainderxf3): Ditto.
(remainder<mode>3): Ditto.
(fprem1xf4_i387): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 8ebb12be2c9..ed689b044c3 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -19527,8 +19527,7 @@
(set (reg:CCFP FPSR_REG)
(unspec:CCFP [(match_dup 2) (match_dup 3)]
 UNSPEC_C2_FLAG))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_finite_math_only"
+  "TARGET_USE_FANCY_MATH_387"
   "fprem"
   [(set_attr "type" "fpspc")
(set_attr "znver1_decode" "vector")
@@ -19538,8 +19537,7 @@
   [(use (match_operand:XF 0 "register_operand"))
(use (match_operand:XF 1 "general_operand"))
(use (match_operand:XF 2 "general_operand"))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_finite_math_only"
+  "TARGET_USE_FANCY_MATH_387"
 {
   rtx_code_label *label = gen_label_rtx ();
 
@@ -19562,8 +19560,7 @@
   [(use (match_operand:MODEF 0 "register_operand"))
(use (match_operand:MODEF 1 "general_operand"))
(use (match_operand:MODEF 2 "general_operand"))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_finite_math_only"
+  "TARGET_USE_FANCY_MATH_387"
 {
   rtx (*gen_truncxf) (rtx, rtx);
 
@@ -19602,8 +19599,7 @@
(set (reg:CCFP FPSR_REG)
(unspec:CCFP [(match_dup 2) (match_dup 3)]
 UNSPEC_C2_FLAG))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_finite_math_only"
+  "TARGET_USE_FANCY_MATH_387"
   "fprem1"
   [(set_attr "type" "fpspc")
(set_attr "znver1_decode" "vector")
@@ -19613,8 +19609,7 @@
   [(use (match_operand:XF 0 "register_operand"))
(use (match_operand:XF 1 "general_operand"))
(use (match_operand:XF 2 "general_operand"))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_finite_math_only"
+  "TARGET_USE_FANCY_MATH_387"
 {
   rtx_code_label *label = gen_label_rtx ();
 
@@ -19637,8 +19632,7 @@
   [(use (match_operand:MODEF 0 "register_operand"))
(use (match_operand:MODEF 1 "general_operand"))
(use (match_operand:MODEF 2 "general_operand"))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_finite_math_only"
+  "TARGET_USE_FANCY_MATH_387"
 {
   rtx (*gen_truncxf) (rtx, rtx);
 


Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Mon, Feb 27, 2023 at 08:43:27PM +, Richard Sandiford wrote:
>> My argument was that !SHIFT_COUNT_TRUNCATED and
>> C?Z_DEFINED_VALUE_AT_ZERO==0 mean that the behaviour is undefined
>> only in the sense that target-independent code doesn't know what
>> the behaviour is.  !SHIFT_COUNT_TRUNCATED doesn't mean that
>> target-independent code can assume that out-of-range shift values
>> invoke program UB (and therefore target-independent code can optimise
>> shifts on the principle that all shifts are in-range).  Similarly
>> CTZ_DEFINED_VALUE_AT_ZERO==0 doesn't mean the corresponding thing for CTZ.
>
> Even if the target-independent code doesn't know what the target dependent
> code will do, I don't see how it could emit it safely.
> In that case, I could understand some particular backend emitting such code
> because it knows what to assume, and then generic code like simplify-rtx
> simply trying to punt and not mess up with things which it doesn't know how
> they'll behave.
> But in this case it isn't aarch64 backend that emits this, but it is
> the generic expansion code.  It even guarantees one of the two shifts
> will be out of bounds...  And target didn't tell it that out of bound
> shifts say won't trap or do something similar.

But we haven't had to handle trapping shifts because no supported target
behaves like that.  And yeah, even if you disagree with that being a
reasonable assumption...

> Sure, there could be a target hook or macro that would tell the expander
> that the backend out of bound shift will just produce unspecified result
> but not other effects.

...adding an explicit control, or putting the expansion in aarch64 code,
would allow us to generate the same RTL as we do now, and so hit the
same bug.

Richard


Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Segher Boessenkool
Hi!

On Mon, Feb 27, 2023 at 02:12:23PM -0600, Pat Haugen wrote:
> On 2/27/23 11:08 AM, Segher Boessenkool wrote:
> >On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:
> >>The define_insns for the modulo operation currently force the target
> >>register
> >>to a distinct reg in preparation for a possible future peephole combining
> >>div/mod. But this can lead to cases of a needless copy being inserted. 
> >>Fixed
> >>with the following patch.
> >
> >Have you verified those peepholes still match?
> 
> Yes, I verified the peepholes still match and transform the sequence.

Please add the testcases for that then?  Or do we have tests for it
already :-)

> >Do those peepholes actually improve performance?  On new CPUs?  The code
> >here says
> >;; On machines with modulo support, do a combined div/mod the old fashioned
> >;; method, since the multiply/subtract is faster than doing the mod 
> >instruction
> >;; after a divide.
> >but that really should not be true: we can do the div and mod in
> >parallel (except in SMT4 perhaps, which we never schedule for anyway),
> >so that should always be strictly faster.
> >
> Since the modulo insns were introduced in Power9, we're just talking 
> Power9/Power10. On paper, I would agree that separate div/mod could be 
> slightly faster to get the mod result,

"Slightly".  It takes 12 cycles for the two in parallel (64-bit, p9),
but 17 cycles for the "cheaper" sequence (divd+mulld+subf, 12+5+2).  It
is all worse if the units are busy of course, or if there are other
problems.

> but if you throw in another 
> independent div or mod in the insn stream then doing the peephole should 
> be a clear win since that 3rd insn can execute in parallel with the 
> initial divide as opposed to waiting for the one of the first div/mod to 
> clear the exclusive stage of the pipe.

That is the SMT4 case, the one we do not optimise for.  SMT2 and ST can
do four in parallel.  This means you can start a div or mod every 2nd
cycle on average, so it is very unlikely you will ever be limited by
this on real code.


Segher


[PATCH] Fortran: fix corner case of IBITS intrinsic [PR108937]

2023-02-27 Thread Harald Anlauf via Gcc-patches
Dear all,

as found by the reporter, the result of the intrinsic IBITS
differed from other compilers (e.g. Intel, NAG) for the corner
case that the LEN argument was equal to BIT_SIZE(I), which is
explicitly allowed by the standard.

We actually had an inconsistency for this case between
code generated by the frontend and compile-time simplified
expressions.

The reporter noticed that this is related to a restriction in
gcc that requires that shift widths shall be smaller than the
bit sizes, and we already special case this for ISHFT.
It makes sense to use the same special casing for IBITS.
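In C terms the corner case looks like this (a sketch mirroring the ibits_1 reference function in the attached testcase; for LEN equal to the bit size the mask shift would be out of range, so the all-ones mask must be special-cased):

```c
#include <assert.h>
#include <stdint.h>

/* IBITS(I, POS, LEN) = (I >> POS) & ~((~0) << LEN), except that for
   len == 32 the shift width would equal the bit size, which gcc's
   shift patterns do not allow, so the all-ones mask is special-cased,
   the same fix the patch applies.  */
static uint32_t
ibits32 (uint32_t i, unsigned int pos, unsigned int len)
{
  uint32_t mask = len >= 32 ? 0xffffffffu : ~(~0u << len);
  return (i >> pos) & mask;
}
```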

Attached patch fixes this and regtests on x86_64-pc-linux-gnu.

OK for mainline?

This issue has been there for ages.  Shall this be backported
or left in release branches as is?

Thanks,
Harald

From 6844c5ecb271e091a8c913903a79eac932cf5f76 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Mon, 27 Feb 2023 21:37:11 +0100
Subject: [PATCH] Fortran: fix corner case of IBITS intrinsic [PR108937]

gcc/fortran/ChangeLog:

	PR fortran/108937
	* trans-intrinsic.cc (gfc_conv_intrinsic_ibits): Handle corner case
	LEN argument of IBITS equal to BITSIZE(I).

gcc/testsuite/ChangeLog:

	PR fortran/108937
	* gfortran.dg/ibits_2.f90: New test.
---
 gcc/fortran/trans-intrinsic.cc| 10 +
 gcc/testsuite/gfortran.dg/ibits_2.f90 | 32 +++
 2 files changed, 42 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/ibits_2.f90

diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 21eeb12ca89..3cce9c0166e 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -6638,6 +6638,7 @@ gfc_conv_intrinsic_ibits (gfc_se * se, gfc_expr * expr)
   tree type;
   tree tmp;
   tree mask;
+  tree num_bits, cond;

   gfc_conv_intrinsic_function_args (se, expr, args, 3);
   type = TREE_TYPE (args[0]);
@@ -6678,8 +6679,17 @@ gfc_conv_intrinsic_ibits (gfc_se * se, gfc_expr * expr)
 			   "in intrinsic IBITS", tmp1, tmp2, nbits);
 }

+  /* The Fortran standard allows (shift width) LEN <= BIT_SIZE(I), whereas
+ gcc requires a shift width < BIT_SIZE(I), so we have to catch this
+ special case.  See also gfc_conv_intrinsic_ishft ().  */
+  num_bits = build_int_cst (TREE_TYPE (args[2]), TYPE_PRECISION (type));
+
   mask = build_int_cst (type, -1);
   mask = fold_build2_loc (input_location, LSHIFT_EXPR, type, mask, args[2]);
+  cond = fold_build2_loc (input_location, GE_EXPR, logical_type_node, args[2],
+			  num_bits);
+  mask = fold_build3_loc (input_location, COND_EXPR, type, cond,
+			  build_int_cst (type, 0), mask);
   mask = fold_build1_loc (input_location, BIT_NOT_EXPR, type, mask);

   tmp = fold_build2_loc (input_location, RSHIFT_EXPR, type, args[0], args[1]);
diff --git a/gcc/testsuite/gfortran.dg/ibits_2.f90 b/gcc/testsuite/gfortran.dg/ibits_2.f90
new file mode 100644
index 000..2af5542d764
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/ibits_2.f90
@@ -0,0 +1,32 @@
+! { dg-do run }
+! { dg-additional-options "-fcheck=bits" }
+! PR fortran/108937 - Intrinsic IBITS(I,POS,LEN) fails when LEN equals
+! to BIT_SIZE(I)
+! Contributed by saitofuy...@jamstec.go.jp
+
+program test_bits
+  implicit none
+  integer, parameter :: KT = kind (1)
+  integer, parameter :: lbits = bit_size (0_KT)
+  integer(kind=KT) :: x, y0, y1
+  integer(kind=KT) :: p, l
+
+  x = -1
+  p = 0
+  do l = 0, lbits
+ y0 = ibits  (x, p, l)
+ y1 = ibits_1(x, p, l)
+ if (y0 /= y1) then
+print *, l, y0, y1
+stop 1+l
+ end if
+  end do
+contains
+  elemental integer(kind=KT) function ibits_1(I, POS, LEN) result(n)
+!! IBITS(I, POS, LEN) = (I >> POS) & ~((~0) << LEN)
+implicit none
+integer(kind=KT),intent(in) :: I
+integer, intent(in) :: POS, LEN
+n = IAND (ISHFT(I, - POS), NOT(ISHFT(-1_KT, LEN)))
+  end function ibits_1
+end program test_bits
--
2.35.3
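As a side note, the guarded mask the patch builds can be sketched in plain C (a hypothetical helper, not the GCC implementation): shifting a 32-bit value by 32 is undefined in C just as the full-width shift is at GIMPLE level, so the corner case is selected explicitly rather than computed.

```c
#include <stdint.h>

/* Hypothetical sketch of IBITS(i, pos, len): extract len bits of i
   starting at bit pos, for pos + len <= 32.  The naive mask
   ~(~0u << len) is undefined when len == 32, which is exactly the
   corner case the patch above guards with a COND_EXPR.  */
static uint32_t
ibits32 (uint32_t i, unsigned int pos, unsigned int len)
{
  if (len == 0)
    return 0;
  /* Select the all-ones mask instead of shifting by the full width.  */
  uint32_t mask = (len >= 32) ? ~(uint32_t) 0 : ~(~(uint32_t) 0 << len);
  return (i >> pos) & mask;
}
```

With this guard, ibits32 (x, 0, 32) returns x unchanged, matching the Fortran requirement that LEN may equal BIT_SIZE(I).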



Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 27, 2023 at 08:43:27PM +, Richard Sandiford wrote:
> My argument was that !SHIFT_COUNT_TRUNCATED and
> C?Z_DEFINED_VALUE_AT_ZERO==0 mean that the behaviour is undefined
> only in the sense that target-independent code doesn't know what
> the behaviour is.  !SHIFT_COUNT_TRUNCATED doesn't mean that
> target-independent code can assume that out-of-range shift values
> invoke program UB (and therefore target-independent code can optimise
> shifts on the principle that all shifts are in-range).  Similarly
> CTZ_DEFINED_VALUE_AT_ZERO==0 doesn't mean the corresponding thing for CTZ.

Even if the target-independent code doesn't know what the target-dependent
code will do, I don't see how it could emit it safely.
In that case, I could understand some particular backend emitting such code
because it knows what to assume, and then generic code like simplify-rtx
simply trying to punt and not mess up with things which it doesn't know how
they'll behave.
But in this case it isn't the aarch64 backend that emits this; it is
the generic expansion code.  It even guarantees one of the two shifts
will be out of bounds...  And the target didn't tell it that out-of-bounds
shifts, say, won't trap or do something similar.
Sure, there could be a target hook or macro that would tell the expander
that the backend's out-of-bounds shift will just produce an unspecified
result but have no other side effects.

Jakub



Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Mon, Feb 27, 2023 at 07:51:21PM +, Richard Sandiford wrote:
>> I think RTL and gimple are different in that respect.
>> SHIFT_COUNT_TRUNCATED's effect on shifts is IMO a bit like
>> CTZ_DEFINED_VALUE_AT_ZERO's effect on CTZ: it enumerates common
>> target-specific behaviour, but doesn't turn invalid/should-not-be-evaluated
>> values into valid values.  Not defining SHIFT_COUNT_TRUNCATED is like
>> defining CTZ_DEFINED_VALUE_AT_ZERO to 0.
>> 
>> The docs say:
>> 
>>   Note that regardless of this macro the ``definedness'' of @code{clz}
>>   and @code{ctz} at zero do @emph{not} extend to the builtin functions
>>   visible to the user.  Thus one may be free to adjust the value at will
>>   to match the target expansion of these operations without fear of
>>   breaking the API@.
>> 
>> So for CTZ this really is an RTL thing, which can leak into gimple
>> through ifns.  I'd argue that the same is true for SHIFT_COUNT_TRUNCATED
>> and conditional shifts like COND_SHL: normal gimple shifts aren't guaranteed
>> to honour SHIFT_COUNT_TRUNCATED, but COND_SHL should be.
>
> I understand that if SHIFT_COUNT_TRUNCATED 1 is defined, then a formerly
> out-of-bounds shift is well defined on RTL.  After all, for
> SHIFT_COUNT_TRUNCATED the generic code removes shift count masking as
> redundant, so code without UB in the source could otherwise appear to have
> UB on RTL.
> The question is what happens with SHIFT_COUNT_TRUNCATED 0 or
> C?Z_DEFINED_VALUE_AT_ZERO 0, if encountering the RTL with invalid operand(s)
> is undefined behavior, or simply undefined value but no other side-effects.
> There are many RTL expressions which, on invalid values, invoke truly
> undefined behavior: they can crash the program, etc.  The question is if
> out of bounds shifts are like that too or not.  Ditto for CLZ/CTZ.

My argument was that !SHIFT_COUNT_TRUNCATED and
C?Z_DEFINED_VALUE_AT_ZERO==0 mean that the behaviour is undefined
only in the sense that target-independent code doesn't know what
the behaviour is.  !SHIFT_COUNT_TRUNCATED doesn't mean that
target-independent code can assume that out-of-range shift values
invoke program UB (and therefore target-independent code can optimise
shifts on the principle that all shifts are in-range).  Similarly
CTZ_DEFINED_VALUE_AT_ZERO==0 doesn't mean the corresponding thing for CTZ.

If !SHIFT_COUNT_TRUNCATED meant that all out-of-range shifts are UB then:

wide_int wop1 = pop1;
if (SHIFT_COUNT_TRUNCATED)
  wop1 = wi::umod_trunc (wop1, GET_MODE_PRECISION (int_mode));
else if (wi::geu_p (wop1, GET_MODE_PRECISION (int_mode)))
  return NULL_RTX;

in simplify_const_binary_operation wouldn't be necessary.  We could
just fold constant shifts in the SHIFT_COUNT_TRUNCATED way for all
values, like wide_int_binop folds all nonnegative shifts on trees.

As I say, arm_emit_coreregs_64bit_shift relies on being able to create
RTL shifts whose counts might be out-of-range (to a certain degree),
because the arm port knows how arm shifts behave.  Treating the
out-of-range shifts as UB would break the DI shift expansions.

Thanks,
Richard
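For concreteness, the doubleword expansion being debated can be sketched in plain C (a hypothetical 128-bit left shift built from two 64-bit words, not the expander itself). C has to branch, because evaluating the dead arm's out-of-range shift would itself be undefined; the HAVE_conditional_move expansion instead evaluates both arms and selects between them, which is where the out-of-range count comes from.

```c
#include <stdint.h>

/* Hypothetical shl128: shift the 128-bit value hi:lo left by n,
   0 <= n <= 127, using only 64-bit shifts with in-range counts.  */
static void
shl128 (uint64_t *hi, uint64_t *lo, unsigned int n)
{
  if (n >= 64)
    {
      /* Superword case: count n - 64 is in [0, 63].  */
      *hi = *lo << (n - 64);
      *lo = 0;
    }
  else if (n > 0)
    {
      /* Subword case: counts n and 64 - n are in [1, 63].  */
      *hi = (*hi << n) | (*lo >> (64 - n));
      *lo <<= n;
    }
}
```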


Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Pat Haugen via Gcc-patches

On 2/27/23 11:08 AM, Segher Boessenkool wrote:

Hi!

On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:

The define_insns for the modulo operation currently force the target register
to a distinct reg in preparation for a possible future peephole combining
div/mod. But this can lead to cases of a needless copy being inserted. Fixed
with the following patch.


Have you verified those peepholes still match?


Yes, I verified the peepholes still match and transform the sequence.



Do those peepholes actually improve performance?  On new CPUs?  The code
here says
;; On machines with modulo support, do a combined div/mod the old fashioned
;; method, since the multiply/subtract is faster than doing the mod instruction
;; after a divide.
but that really should not be true: we can do the div and mod in
parallel (except in SMT4 perhaps, which we never schedule for anyway),
so that should always be strictly faster.

Since the modulo insns were introduced in Power9, we're just talking 
Power9/Power10. On paper, I would agree that separate div/mod could be 
slightly faster to get the mod result, but if you throw in another 
independent div or mod in the insn stream then doing the peephole should 
be a clear win since that 3rd insn can execute in parallel with the 
initial divide as opposed to waiting for the one of the first div/mod to 
clear the exclusive stage of the pipe.



--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-*  } } } */


All files in gcc.target/powerpc/ test for this already.  Just leave off
the target clause here?


+/* { dg-require-effective-target powerpc_p9modulo_ok } */


Leave out this line, because ...


+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */


... the -mcpu= forces it to true always.


Will update.

-Pat




+/* Verify r3 is used as source and target, no copy inserted. */



+/* { dg-final { scan-assembler-not {\mmr\M} } } */


That is probably good enough, yeah, since the test results in only a
handful of insns.


Segher




Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 27, 2023 at 07:51:21PM +, Richard Sandiford wrote:
> I think RTL and gimple are different in that respect.
> SHIFT_COUNT_TRUNCATED's effect on shifts is IMO a bit like
> CTZ_DEFINED_VALUE_AT_ZERO's effect on CTZ: it enumerates common
> target-specific behaviour, but doesn't turn invalid/should-not-be-evaluated
> values into valid values.  Not defining SHIFT_COUNT_TRUNCATED is like
> defining CTZ_DEFINED_VALUE_AT_ZERO to 0.
> 
> The docs say:
> 
>   Note that regardless of this macro the ``definedness'' of @code{clz}
>   and @code{ctz} at zero do @emph{not} extend to the builtin functions
>   visible to the user.  Thus one may be free to adjust the value at will
>   to match the target expansion of these operations without fear of
>   breaking the API@.
> 
> So for CTZ this really is an RTL thing, which can leak into gimple
> through ifns.  I'd argue that the same is true for SHIFT_COUNT_TRUNCATED
> and conditional shifts like COND_SHL: normal gimple shifts aren't guaranteed
> to honour SHIFT_COUNT_TRUNCATED, but COND_SHL should be.

I understand that if SHIFT_COUNT_TRUNCATED 1 is defined, then a formerly
out-of-bounds shift is well defined on RTL.  After all, for
SHIFT_COUNT_TRUNCATED the generic code removes shift count masking as
redundant, so code without UB in the source could otherwise appear to have
UB on RTL.
The question is what happens with SHIFT_COUNT_TRUNCATED 0 or
C?Z_DEFINED_VALUE_AT_ZERO 0, if encountering the RTL with invalid operand(s)
is undefined behavior, or simply undefined value but no other side-effects.
There are many RTL expressions which, on invalid values, invoke truly
undefined behavior: they can crash the program, etc.  The question is if
out of bounds shifts are like that too or not.  Ditto for CLZ/CTZ.

Jakub



Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> On Mon, Feb 27, 2023 at 03:34:11PM +, Richard Sandiford wrote:
>> > The following testcase is miscompiled on aarch64.  The problem is that
>> > aarch64 with TARGET_SIMD is !SHIFT_COUNT_TRUNCATED target with
>> > targetm.shift_truncation_mask (DImode) == 0 which has HAVE_conditional_move
>> > true.  If a doubleword shift (in this case TImode) doesn't have its own
>> > expander (as is the case of e.g. x86) and is handled in generic code,
>> > there are 3 possible expansions.  One is when the shift count is constant,
>> > the code computes in that case superword_op1 as op1 - BITS_PER_WORD,
>> > and chooses at compile time which of expand_superword_shift or
>> > expand_subword_shift to call, which ensures that whatever is used
>> > actually has its shift count (superword_op1 or op1) in [0, BITS_PER_WORD - 1]
>> > range.  If !HAVE_conditional_move or that expansion fails, the function
>> > handles non-constant op1 similarly (there are some special cases for
>> > shift_mask >= BITS_PER_WORD - 1 but let's talk here just about
>> > shift_mask < BITS_PER_WORD - 1), except that the selection is done at
>> > runtime, with branches around the stuff.  While superword_op1 can be
>> > [-BITS_PER_WORD, BITS_PER_WORD - 1] and op1 [0, 2 * BITS_PER_WORD - 1],
>> > the runtime selection ensures that the instructions executed at runtime
>> > have their corresponding shift count again in the right range of
>> > [0, BITS_PER_WORD - 1].
>> > The problematic is the HAVE_conditional_move case, which emits both
>> > sequences into the actually executed code, so necessarily one of them
>> > has an out of range shift count and then using 2 conditional moves
>> > picks up a result.
>> > Now, in the testcase because -Og doesn't perform VRP/EVRP the shift
>> > count isn't optimized to constant during GIMPLE passes, but is determined
>> > to be constant during/after expansion into RTL.  The shift count is
>> > actually const0_rtx later, so superword_op1 is (const_int -64) and we end
>> > up with miscompilation during combine because of that.
>> 
>> I haven't worked through the testcase yet, but how does that happen?
>> Having shifts with out-of-range shift counts shouldn't be a problem
>> in itself, provided that the conditional move selects the one with
>> the in-range shift.
>
> It depends on whether we (hw or the compiler) treat out-of-range shifts in RTL
> as undefined behavior or just some kind of constrained unspecified behavior
> (destination register can be anything, but no traps/exceptions/program
> termination etc.).  Of course SHIFT_COUNT_TRUNCATED makes it well defined,
> but that is not the case here.
> I've always thought we really treat it as undefined behavior; you shouldn't
> reach this spot at runtime; we certainly treat it that way in GIMPLE.
> If that is the case, then invoking UB and then having a conditional move
> not to select its result is not well defined.
> And the patch as posted fixes that problem while not really resulting in
> worse code e.g. on aarch64.

I think RTL and gimple are different in that respect.
SHIFT_COUNT_TRUNCATED's effect on shifts is IMO a bit like
CTZ_DEFINED_VALUE_AT_ZERO's effect on CTZ: it enumerates common
target-specific behaviour, but doesn't turn invalid/should-not-be-evaluated
values into valid values.  Not defining SHIFT_COUNT_TRUNCATED is like
defining CTZ_DEFINED_VALUE_AT_ZERO to 0.

The docs say:

  Note that regardless of this macro the ``definedness'' of @code{clz}
  and @code{ctz} at zero do @emph{not} extend to the builtin functions
  visible to the user.  Thus one may be free to adjust the value at will
  to match the target expansion of these operations without fear of
  breaking the API@.

So for CTZ this really is an RTL thing, which can leak into gimple
through ifns.  I'd argue that the same is true for SHIFT_COUNT_TRUNCATED
and conditional shifts like COND_SHL: normal gimple shifts aren't guaranteed
to honour SHIFT_COUNT_TRUNCATED, but COND_SHL should be.
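The user-visible side of that contract can be illustrated with a small guard (a hypothetical wrapper; __builtin_ctz is the GCC/Clang builtin): whatever CTZ_DEFINED_VALUE_AT_ZERO says about the RTL pattern, __builtin_ctz (0) remains undefined for user code, so portable callers must test for zero themselves.

```c
/* Hypothetical wrapper: ctz with a defined value at zero, regardless
   of what the target's CTZ_DEFINED_VALUE_AT_ZERO happens to be.  */
static int
ctz32_safe (unsigned int x)
{
  return x ? __builtin_ctz (x) : 32;
}
```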

The arm part relies on being able to generate shifts that take
advantage of port-specific knowledge, see arm_emit_coreregs_64bit_shift.

Haven't had time to work through the combine thing yet, but yeah,
it does look suspiciously like something is going wrong in the cc
handling or death note updates.

Thanks,
Richard

> Anyway, back to the particular testcase rather than the general idea,
> it goes wrong during combine, CCing Segher.  I've added debug_bb_n after
> every successful combine attempt,
> --- gcc/combine.cc.jj 2023-01-02 09:32:43.720977011 +0100
> +++ gcc/combine.cc2023-02-27 19:27:28.151289055 +0100
> @@ -4755,6 +4755,16 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret))
>  ret = added_notes_insn;
>  
> +{
> +static int cnt = 0;
> +char buf[64];
> +sprintf (buf, "/tmp/combine.%d", cnt++);
> +FILE *f = fopen (buf, "a");
> +fprintf (f, "%d\n", combine_successes);
> +basic_block bb = 

Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Jakub Jelinek via Gcc-patches
On Mon, Feb 27, 2023 at 03:34:11PM +, Richard Sandiford wrote:
> > The following testcase is miscompiled on aarch64.  The problem is that
> > aarch64 with TARGET_SIMD is !SHIFT_COUNT_TRUNCATED target with
> > targetm.shift_truncation_mask (DImode) == 0 which has HAVE_conditional_move
> > true.  If a doubleword shift (in this case TImode) doesn't have its own
> > expander (as is the case of e.g. x86) and is handled in generic code,
> > there are 3 possible expansions.  One is when the shift count is constant,
> > the code computes in that case superword_op1 as op1 - BITS_PER_WORD,
> > and chooses at compile time which of expand_superword_shift or
> > expand_subword_shift to call, which ensures that whatever is used
> > actually has its shift count (superword_op1 or op1) in [0, BITS_PER_WORD - 1]
> > range.  If !HAVE_conditional_move or that expansion fails, the function
> > handles non-constant op1 similarly (there are some special cases for
> > shift_mask >= BITS_PER_WORD - 1 but let's talk here just about
> > shift_mask < BITS_PER_WORD - 1), except that the selection is done at
> > runtime, with branches around the stuff.  While superword_op1 can be
> > [-BITS_PER_WORD, BITS_PER_WORD - 1] and op1 [0, 2 * BITS_PER_WORD - 1],
> > the runtime selection ensures that the instructions executed at runtime
> > have their corresponding shift count again in the right range of
> > [0, BITS_PER_WORD - 1].
> > The problematic is the HAVE_conditional_move case, which emits both
> > sequences into the actually executed code, so necessarily one of them
> > has an out of range shift count and then using 2 conditional moves
> > picks up a result.
> > Now, in the testcase because -Og doesn't perform VRP/EVRP the shift
> > count isn't optimized to constant during GIMPLE passes, but is determined
> > to be constant during/after expansion into RTL.  The shift count is
> > actually const0_rtx later, so superword_op1 is (const_int -64) and we end
> > up with miscompilation during combine because of that.
> 
> I haven't worked through the testcase yet, but how does that happen?
> Having shifts with out-of-range shift counts shouldn't be a problem
> in itself, provided that the conditional move selects the one with
> the in-range shift.

It depends on whether we (hw or the compiler) treat out-of-range shifts in RTL
as undefined behavior or just some kind of constrained unspecified behavior
(destination register can be anything, but no traps/exceptions/program
termination etc.).  Of course SHIFT_COUNT_TRUNCATED makes it well defined,
but that is not the case here.
I've always thought we really treat it as undefined behavior; you shouldn't
reach this spot at runtime; we certainly treat it that way in GIMPLE.
If that is the case, then invoking UB and then having a conditional move
not to select its result is not well defined.
And the patch as posted fixes that problem while not really resulting in
worse code e.g. on aarch64.

Anyway, back to the particular testcase rather than the general idea,
it goes wrong during combine, CCing Segher.  I've added debug_bb_n after
every successful combine attempt,
--- gcc/combine.cc.jj   2023-01-02 09:32:43.720977011 +0100
+++ gcc/combine.cc  2023-02-27 19:27:28.151289055 +0100
@@ -4755,6 +4755,16 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
   if (added_notes_insn && DF_INSN_LUID (added_notes_insn) < DF_INSN_LUID (ret))
 ret = added_notes_insn;
 
+{
+static int cnt = 0;
+char buf[64];
+sprintf (buf, "/tmp/combine.%d", cnt++);
+FILE *f = fopen (buf, "a");
+fprintf (f, "%d\n", combine_successes);
+basic_block bb = BASIC_BLOCK_FOR_FN (cfun, 2);
+dump_bb (f, bb, 0, dump_flags_t ());
+fclose (f);
+}
   return ret;
 }
 
and I see
(insn 52 48 53 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 130)
(const_int 0 [0]))) "pr108803.c":12:25 437 {cmpsi}
 (expr_list:REG_DEAD (reg:SI 130)
(expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffc0])
(const_int 0 [0]))
(nil
(insn 53 52 57 2 (set (reg:DI 152 [ _6+8 ])
(if_then_else:DI (ge (reg:CC 66 cc)
(const_int 0 [0]))
(reg:DI 132)
(const_int 0 [0]))) "pr108803.c":12:25 490 {*cmovdi_insn}
 (expr_list:REG_DEAD (reg:DI 132)
(nil)))
(insn 57 53 59 2 (set (reg:DI 151 [ _6 ])
(if_then_else:DI (ge (reg:CC 66 cc)
(const_int 0 [0]))
(const_int 0 [0])
(reg:DI 126))) "pr108803.c":12:25 490 {*cmovdi_insn}
 (expr_list:REG_DEAD (reg:CC 66 cc)
(nil)))
...
(insn 71 68 72 2 (set (reg:CC 66 cc)
(compare:CC (reg:SI 137)
(const_int 0 [0]))) "pr108803.c":12:42 437 {cmpsi}
 (expr_list:REG_DEAD (reg:SI 137)
(expr_list:REG_EQUAL (compare:CC (const_int -64 [0xffc0])
(const_int 0 [0]))
(nil
(insn 72 71 76 2 (set (reg:DI 153 [ _8 ])
(if_then_else:DI (ge (reg:CC 66 cc)
(const_int 0 [0]))
  

Re: [PATCH] testsuite: Don't include multiline regexps in the the pass/fail log

2023-02-27 Thread Mike Stump via Gcc-patches
On Feb 27, 2023, at 9:59 AM, Hans-Peter Nilsson  wrote:
> 
>> From: Mike Stump 
>> Date: Mon, 27 Feb 2023 09:41:18 -0800
> 
>>> diff --git a/gcc/testsuite/lib/multiline.exp b/gcc/testsuite/lib/multiline.exp
>>> index 84ba9216642e..5eccf2bbebc1 100644
>>> --- a/gcc/testsuite/lib/multiline.exp
>>> +++ b/gcc/testsuite/lib/multiline.exp
>> 
>>> -   ${maybe_x}pass "$title was found: \"$escaped_regex\""
>>> +   ${maybe_x}pass "$title was found"
>>> } else {
>>> -   ${maybe_x}fail "$title not found: \"$escaped_regex\""
>>> +   ${maybe_x}fail "$title not found"
>> 
>> Side remark:
>> 
>> So, the string on pass and the string on fail are supposed
>> to be exactly the same.  Regression analysis works only if
>> the string is the same.  "regexp test", might be
>> suggestive enough and can be the same spelling for both.
> 
> Right.  Should I change it now?

Sure.  Not required, but would be nice.

> (Pro: see above.  Con: again meddling with regression-test history.)

Only new tests that don't have any polish do this sort of thing.  :-)  

> Like (editing on the fly here) as the "found" part seems
> redundant:
> 
>>> -   ${maybe_x}pass "$title was found"
>>> +   ${maybe_x}pass "$title"
>>> } else {
>>> -   ${maybe_x}fail "$title not found"
>>> +   ${maybe_x}fail "$title"

Yes, same difference.  Both strings should be identical.

Thanks.

Re: [PR100127] Test for coroutine header in clang-compatible tests

2023-02-27 Thread Mike Stump via Gcc-patches
On Feb 22, 2023, at 12:04 PM, Alexandre Oliva  wrote:
> 
> That would change what gets tested with clang, I suppose, but I hope
> that's for the better.  I wondered what to do at the #else above, and
> decided to spell it a little differently.  Retested on x86_64-linux-gnu
> (trunk) and arm-vxworks7r2 (gcc-12), ok to install?

Ok.


Re: [PATCH] testsuite: Don't include multiline regexps in the the pass/fail log

2023-02-27 Thread Hans-Peter Nilsson via Gcc-patches
> From: Mike Stump 
> Date: Mon, 27 Feb 2023 09:41:18 -0800

> > diff --git a/gcc/testsuite/lib/multiline.exp b/gcc/testsuite/lib/multiline.exp
> > index 84ba9216642e..5eccf2bbebc1 100644
> > --- a/gcc/testsuite/lib/multiline.exp
> > +++ b/gcc/testsuite/lib/multiline.exp
> 
> > -   ${maybe_x}pass "$title was found: \"$escaped_regex\""
> > +   ${maybe_x}pass "$title was found"
> > } else {
> > -   ${maybe_x}fail "$title not found: \"$escaped_regex\""
> > +   ${maybe_x}fail "$title not found"
> 
> Side remark:
> 
> So, the string on pass and the string on fail are supposed
> to be exactly the same.  Regression analysis works only if
> the string is the same.  "regexp test", might be
> suggestive enough and can be the same spelling for both.

Right.  Should I change it now?

(Pro: see above.  Con: again meddling with regression-test history.)

Like (editing on the fly here) as the "found" part seems
redundant:

> > -   ${maybe_x}pass "$title was found"
> > +   ${maybe_x}pass "$title"
> > } else {
> > -   ${maybe_x}fail "$title not found"
> > +   ${maybe_x}fail "$title"

brgds, H-P


Re: [PATCH] testsuite: Don't include multiline regexps in the the pass/fail log

2023-02-27 Thread Mike Stump via Gcc-patches
On Feb 24, 2023, at 9:54 AM, Hans-Peter Nilsson via Gcc-patches wrote:
> 
> Ok to commit?

Ok.  Thanks.

> diff --git a/gcc/testsuite/lib/multiline.exp b/gcc/testsuite/lib/multiline.exp
> index 84ba9216642e..5eccf2bbebc1 100644
> --- a/gcc/testsuite/lib/multiline.exp
> +++ b/gcc/testsuite/lib/multiline.exp

> - ${maybe_x}pass "$title was found: \"$escaped_regex\""
> + ${maybe_x}pass "$title was found"
>   } else {
> - ${maybe_x}fail "$title not found: \"$escaped_regex\""
> + ${maybe_x}fail "$title not found"

Side remark:

So, the string on pass and the string on fail are supposed to be exactly the 
same.  Regression analysis works only if the string is the same.  "regexp 
test", might be suggestive enough and can be the same spelling for both.



Re: [PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Segher Boessenkool
Hi!

On Mon, Feb 27, 2023 at 09:11:37AM -0600, Pat Haugen wrote:
> The define_insns for the modulo operation currently force the target register
> to a distinct reg in preparation for a possible future peephole combining
> div/mod. But this can lead to cases of a needless copy being inserted. Fixed
> with the following patch.

Have you verified those peepholes still match?

Do those peepholes actually improve performance?  On new CPUs?  The code
here says
;; On machines with modulo support, do a combined div/mod the old fashioned
;; method, since the multiply/subtract is faster than doing the mod instruction
;; after a divide.
but that really should not be true: we can do the div and mod in
parallel (except in SMT4 perhaps, which we never schedule for anyway),
so that should always be strictly faster.
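For reference, the "old fashioned" combined div/mod that comment describes is just one hardware divide plus a multiply/subtract, e.g. (hypothetical C sketch):

```c
/* One divide feeds both results; the remainder is recovered with a
   multiply and subtract instead of a second mod-type instruction.  */
static void
divmod (long a, long b, long *quot, long *rem)
{
  *quot = a / b;
  *rem = a - *quot * b;   /* equal to a % b for b != 0 */
}
```

The question raised here is whether this is still a win on Power9/Power10, where independent div and mod instructions can execute in parallel.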

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile { target { powerpc*-*-*  } } } */

All files in gcc.target/powerpc/ test for this already.  Just leave off
the target clause here?

> +/* { dg-require-effective-target powerpc_p9modulo_ok } */

Leave out this line, because ...

> +/* { dg-options "-mdejagnu-cpu=power9 -O2" } */

... the -mcpu= forces it to true always.

> +/* Verify r3 is used as source and target, no copy inserted. */

> +/* { dg-final { scan-assembler-not {\mmr\M} } } */

That is probably good enough, yeah, since the test results in only a
handful of insns.


Segher


Re: [Patch] gcc.dg/memchr-3.c: fix for LLP64

2023-02-27 Thread Richard Sandiford via Gcc-patches
Jonathan Yong via Gcc-patches  writes:
> Attached patch OK?
>
>  gcc.dg/memchr-3.c: fix for LLP64
>
>  gcc/testsuite/ChangeLog:
>
>  PR middle-end/97956
>  * gcc.dg/memchr-3.c (memchr): fix long to size_t in
>  prototype.
>
> From 194eb3d43964276beeaea14ebee4b241799cd966 Mon Sep 17 00:00:00 2001
> From: Jonathan Yong <10wa...@gmail.com>
> Date: Mon, 27 Feb 2023 10:02:32 +
> Subject: [PATCH] gcc.dg/memchr-3.c: fix for LLP64
>
>   gcc/testsuite/ChangeLog:
>
>   PR middle-end/97956
>   * gcc.dg/memchr-3.c (memchr): fix long to size_t in
>   prototype.

It looks like the current type signature could have been a deliberate
part of the test.  I think we should just skip it for LLP64 instead.
Preapproved if you agree.

Thanks,
Richard

>
> Signed-off-by: Jonathan Yong <10wa...@gmail.com>
> ---
>  gcc/testsuite/gcc.dg/memchr-3.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/memchr-3.c b/gcc/testsuite/gcc.dg/memchr-3.c
> index c38d9cf3349..c1f4e9e10dc 100644
> --- a/gcc/testsuite/gcc.dg/memchr-3.c
> +++ b/gcc/testsuite/gcc.dg/memchr-3.c
> @@ -5,8 +5,9 @@
>  
>  typedef __INT8_TYPE__  int8_t;
>  typedef __INT32_TYPE__ int32_t;
> +typedef __SIZE_TYPE__  size_t;
>  
> -extern void* memchr (const void*, int, long);
> +extern void* memchr (const void*, int, size_t);
>  
>  struct SX
>  {
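The data-model distinction behind this exchange can be made explicit with a tiny classifier (hypothetical, for illustration only): under LP64, long and size_t are both 64 bits, so the test's long prototype happens to match the real memchr; under LLP64 (64-bit Windows), long stays 32 bits while size_t is 64 bits, so the declaration conflicts.

```c
#include <stddef.h>

enum data_model { ILP32, LP64, LLP64 };

/* Classify a target by its sizeof (long) and sizeof (void *).  */
static enum data_model
classify (size_t sizeof_long, size_t sizeof_ptr)
{
  if (sizeof_ptr != 8)
    return ILP32;
  return sizeof_long == 8 ? LP64 : LLP64;
}
```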


Re: [Patch] gcc.dg/overflow-warn-9.c: exclude from LLP64

2023-02-27 Thread Richard Sandiford via Gcc-patches
Jonathan Yong via Gcc-patches  writes:
> This test is for LP64 only, exclude LLP64 too.
> Patch OK?

OK, thanks.

Richard

> From fbc83ae10df1a0e10c302fb0fee13092eb65818e Mon Sep 17 00:00:00 2001
> From: Jonathan Yong <10wa...@gmail.com>
> Date: Mon, 27 Feb 2023 09:49:31 +
> Subject: [PATCH] gcc.dg/overflow-warn-9.c: exclude from LLP64
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.dg/overflow-warn-9.c: Exclude from LLP64.
>
> Signed-off-by: Jonathan Yong <10wa...@gmail.com>
> ---
>  gcc/testsuite/gcc.dg/overflow-warn-9.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/overflow-warn-9.c b/gcc/testsuite/gcc.dg/overflow-warn-9.c
> index 57c0f17bc91..012892dd343 100644
> --- a/gcc/testsuite/gcc.dg/overflow-warn-9.c
> +++ b/gcc/testsuite/gcc.dg/overflow-warn-9.c
> @@ -59,7 +59,7 @@ const struct Types t1 = {
>.ui = UINT_MAX + 1L,  /* { dg-warning "signed conversion from .long int. to .unsigned int. changes value from .4294967296. to .0." "lp64" { target lp64 } } */
>.ui = UINT_MAX + 1LU, /* { dg-warning "conversion from .long unsigned int. to .unsigned int. changes value from .4294967296. to .0." "lp64" { target lp64 } } */
>  
> -  .sl = LONG_MAX + 1LU, /* { dg-warning "signed conversion from .long unsigned int. to .long int. changes value from .9223372036854775808. to .-9223372036854775808." "not-ilp32" { target { ! ilp32 } } } */
> -  /* { dg-warning "signed conversion from .long unsigned int. to .long int. changes value from .2147483648. to .-2147483648." "ilp32" { target ilp32 } .-1 } */
> +  .sl = LONG_MAX + 1LU, /* { dg-warning "signed conversion from .long unsigned int. to .long int. changes value from .9223372036854775808. to .-9223372036854775808." "lp64" { target lp64 } } */
> +  /* { dg-warning "signed conversion from .long unsigned int. to .long int. changes value from .2147483648. to .-2147483648." "not-lp64" { target { ! lp64 } } .-1 } */
>.ul = ULONG_MAX + 1LU /* there should be some warning here */
>  };


Re: [PATCH] Fix RTL simplifications of FFS, POPCOUNT and PARITY.

2023-02-27 Thread Segher Boessenkool
Hi!

On Sun, Feb 26, 2023 at 01:10:41PM -, Roger Sayle wrote:
> This patch  teaches simplify-rtx.cc to err on the side of caution, by never
> creating (new) FFS, POPCOUNT or PARITY rtx with mismatched modes,
> matching the documentation.

> * simplify-rtx.cc (simplify_unary_operation_1) :
> Avoid generating FFS with mismatched operand and result modes,

Please have at least one word after a colon, so that it doesn't look
like something is missing.  Changelog lines are 80 positions long :-)

The patch is okay for trunk.  Thank you!


Segher


Re: [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-02-27 Thread Segher Boessenkool
Hi!

On Wed, Jan 04, 2023 at 01:58:19PM +0530, Surya Kumari Jangala wrote:
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/106770
>   * rs6000-p8swap.cc (rs6000_analyze_swaps): .

Please add an entry?  Or multiple ones, actually.  Describe all changes.

> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,

s/ie/i.e./

> + register swaps of permuting loads/stores have been removed.  */

If it really means only exactly this, then the name isn't so good.

> +  unsigned int web_is_optimized : 1;

And if it is as general as the name suggests, then the comment is no
good.  Which is it?  :-)

>/* For each load and store in an optimizable web (which implies
>   the loads and stores are permuting), find the associated
>   register swaps and mark them for removal.  Due to various
> - optimizations we may mark the same swap more than once.  Also
> - perform special handling for swappable insns that require it.  */
> + optimizations we may mark the same swap more than once. Fix up
> + the non-permuting loads and stores by converting them into
> + permuting ones.  */

Two spaces after a full stop is correct.  Please put that back.

Is it a good idea to convert from/to swapping load/stores in this pass at
all?  Doesn't that belong elsewhere?  Like, in combine, where we
already should do this.  Why does that not work?

> - if (!root_entry->web_not_optimizable)
> + if (!root_entry->web_not_optimizable) {

Blocks start on a new line, indented.

> mark_swaps_for_removal (insn_entry, i);
> +  root_entry->web_is_optimized = true;

Indent using tabs where possible.

> +swap_web_entry* root_entry
> +  = (swap_web_entry*)((_entry[i])->unionfind_root ());

Space before *, in all cases. Space before the second (.  There are too
many brackets here, too.

> +  /* Perform special handling for swappable insns that require it. 

No trailing spaces.

> + Note that special handling should be done only for those 
> + swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (root_entry->web_is_optimized)
> handle_special_swappables (insn_entry, i);
>}

Why this change?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */

Is -O3 required?  Use -O2 if you can.  And no trailing spaces please.

> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */

Those two remaining are superfluous, so comment that please.


Segher


Re: [PATCH] Fixup possible VR_ANTI_RANGE value_range issues

2023-02-27 Thread Aldy Hernandez via Gcc-patches




On 2/27/23 14:58, Richard Biener wrote:

After fixing PR107561 the following avoids looking at VR_ANTI_RANGE
ranges where it doesn't seem obvious the code does the correct
thing here (lower_bound and upper_bound do not work as expected).


I do realize there's some confusion here, and some of it is my fault. 
This has become obvious in my upcoming work removing all of legacy.


What's going on is that ultimately min/max are broken with (non-legacy) 
iranges.  Or at the very least inconsistent between legacy and 
non-legacy.  These are left over from the legacy world, and have been 
marked DEPRECATED for a few releases, but the middle end warnings 
continued to use them and even added new uses after they were obsoleted.


min/max have different meanings depending on kind(), which is also 
deprecated, btw.  They are the underlying min/max fields from the legacy 
implementation, and thus somewhat leak the implementation details. 
Unfortunately, they are being called from non-legacy code which is 
ignoring the kind() field.


In retrospect I should've converted everything away from min/max/kind 
years ago, or at the very least converted min/max to work with 
non-legacy more consistently.


For the record:

  enum value_range_kind kind () const;  // DEPRECATED
  tree min () const;// DEPRECATED
  tree max () const;// DEPRECATED
  bool symbolic_p () const; // DEPRECATED
  bool constant_p () const; // DEPRECATED
  void normalize_symbolics ();  // DEPRECATED
  void normalize_addresses ();  // DEPRECATED
  bool may_contain_p (tree) const;  // DEPRECATED
  bool legacy_verbose_union_ (const class irange *);// DEPRECATED
  bool legacy_verbose_intersect (const irange *);   // DEPRECATED

In my local branch I tried converting all the middle-end legacy API uses 
to the new API, but a bunch of tests started failing, and lots more 
false positives started showing up in correct code.  I suspect that's 
part of the reason legacy continued to be used in these passes :-/.  As 
you point out in the PR, the tests seem designed to test the current (at 
times broken) implementation.


That being said, the 5 fixes in your patch are not wrong to begin with, 
because all uses guard lower_bound() and upper_bound() which work 
correctly.  They return the lowest and uppermost possible bound for the 
range (ignoring the underlying implementation).  So, the lower bound of 
a signed non-zero is -INF because ~[0,0] is really [-INF,-1][1,+INF]. 
In the min/max legacy code, min() of signed non-zero (~[0,0]) is 0.  The 
new implementation has no concept of anti-ranges, and we don't leak any 
of that out.


Any uses of min/max without looking at kind() are definitely broken. 
OTOH uses of lower/upper_bound are fine and should work with both legacy 
and non-legacy.


Unrelated, but one place where I haven't been able to convince myself 
that the use is correct is bounds_of_var_in_loop:



/* Check if init + nit * step overflows.  Though we checked
 scev {init, step}_loop doesn't wrap, it is not enough
 because the loop may exit immediately.  Overflow could
 happen in the plus expression in this case.  */
  if ((dir == EV_DIR_DECREASES
   && compare_values (maxvr.min (), initvr.min ()) != -1)
  || (dir == EV_DIR_GROWS
  && compare_values (maxvr.max (), initvr.max ()) != 1))


Ughh...this is all slated to go away, and I have patches removing all of 
legacy and the old API.


Does this help?  Do you still think lower and upper bound are not 
working as expected?


Aldy



Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

* gimple-ssa-sprintf.cc (get_int_range): Avoid VR_ANTI_RANGE
by using range_int_cst_p.
(format_integer): Likewise.
(handle_printf_call): Guard against VR_ANTI_RANGE.
* graphite-sese-to-poly.cc (add_param_constraints): Likewise.
* tree-ssa-strlen.cc (set_strlen_range): Likewise.
---
  gcc/gimple-ssa-sprintf.cc| 6 +++---
  gcc/graphite-sese-to-poly.cc | 2 +-
  gcc/tree-ssa-strlen.cc   | 2 +-
  3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
index 18975708d2c..61974072f62 100644
--- a/gcc/gimple-ssa-sprintf.cc
+++ b/gcc/gimple-ssa-sprintf.cc
@@ -1082,7 +1082,7 @@ get_int_range (tree arg, gimple *stmt,
  value_range vr;
  query->range_of_expr (vr, arg, stmt);
  
-	  if (!vr.undefined_p () && !vr.varying_p ())

+ if (range_int_cst_p ())
{
  HOST_WIDE_INT type_min
= (TYPE_UNSIGNED (argtype)
@@ -1391,7 +1391,7 @@ format_integer (const directive , tree arg, pointer_query 
_qry)
value_range vr;

Re: [PATCH] simplify-rtx: Fix VOIDmode operand handling in simplify_subreg [PR108805]

2023-02-27 Thread Richard Sandiford via Gcc-patches
Uros Bizjak  writes:
> On Fri, Feb 17, 2023 at 8:38 AM Richard Biener  wrote:
>>
>> On Thu, 16 Feb 2023, Uros Bizjak wrote:
>>
>> > simplify_subreg can return VOIDmode const_int operand and will
>> > cause ICE in simplify_gen_subreg when this operand is passed to it.
>> >
>> > The patch prevents VOIDmode temporary from entering simplify_gen_subreg.
>> > We can't process const_int operand any further, since outermode
>> > is not an integer mode here.
>>
>> But if it's a CONST_INT then we know it's of int_outermode, no? That is,
>> doesn't simplify_subreg (mode, ...) always return something in 'mode'
>> and thus we can always pass just 'mode' as third argument to the
>> following simplify_gen_subreg call?
>
> You are right. I am testing the attached patch that works too.

Thanks for this, it's the correct fix.  But as noted in
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610920.html,
the final 0 is also wrong for big-endian.  Andre?

Richard

>
> Uros.
>
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index 0a1dd88b0a8..3955929bb70 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -7665,7 +7665,7 @@ simplify_context::simplify_subreg (machine_mode 
> outermode, rtx op,
>  {
>rtx tem = simplify_subreg (int_outermode, op, innermode, byte);
>if (tem)
> - return simplify_gen_subreg (outermode, tem, GET_MODE (tem), 0);
> + return simplify_gen_subreg (outermode, tem, int_outermode, 0);
>  }
>  
>/* If OP is a vector comparison and the subreg is not changing the


Re: [ada] fix unknown type name 'cpu_set_t'

2023-02-27 Thread Andreas Schwab via Gcc-patches
On Feb 27 2023, 宋冬生 via Gcc-patches wrote:

> diff --git a/gcc/ada/adaint.h b/gcc/ada/adaint.h
> index 987432c93..fa8ddaf13 100644
> --- a/gcc/ada/adaint.h
> +++ b/gcc/ada/adaint.h
> @@ -319,6 +319,9 @@ extern void   *__gnat_lwp_self   
> (void);
>  
>  /* Routines for interface to required CPU set primitives */
>  
> +#ifndef _GNU_SOURCE
> +#define _GNU_SOURCE
> +#endif
>  #include 
>  
>  extern cpu_set_t *__gnat_cpu_alloc (size_t);

Feature test macros must always be defined before any system header is
included.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [gcc r13-6315] MIPS: Add pattern for clo

2023-02-27 Thread Maciej W. Rozycki
On Fri, 24 Feb 2023, YunQiang Su via Gcc-cvs wrote:

> https://gcc.gnu.org/g:19aa3900bca808b49417a7aef295b5f1a583c298
> 
> commit r13-6315-g19aa3900bca808b49417a7aef295b5f1a583c298
> Author: Junxian Zhu 
> Date:   Fri Feb 17 16:35:56 2023 +0800
> 
> MIPS: Add pattern for clo

 We are in Stage 4, regression and documentation fixes only, no new 
features.  This should have waited for general development to reopen with 
Stage 1.

 Also formatting issues...

> diff --git a/gcc/testsuite/gcc.target/mips/clo.c 
> b/gcc/testsuite/gcc.target/mips/clo.c
> new file mode 100644
> index 000..91f29a1322a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/clo.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "(HAS_CLZ)" } */
> +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
> +
> +NOMIPS16 unsigned int foo(unsigned int x)
> +{
> +  return  __builtin_clz (~x);

... here, ...

> diff --git a/gcc/testsuite/gcc.target/mips/clz.c 
> b/gcc/testsuite/gcc.target/mips/clz.c
> new file mode 100644
> index 000..74e6edb90aa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/mips/clz.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "(HAS_CLZ)" } */
> +/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
> +
> +NOMIPS16 unsigned int foo(unsigned int x)
> +{
> +  return  __builtin_clz (x);

... here, ...

> diff --git a/gcc/testsuite/gcc.target/mips/mips.exp 
> b/gcc/testsuite/gcc.target/mips/mips.exp
> index 025fbe78359..ac3ab129541 100644
> --- a/gcc/testsuite/gcc.target/mips/mips.exp
> +++ b/gcc/testsuite/gcc.target/mips/mips.exp
> @@ -252,6 +252,7 @@ set mips_option_groups {
>  warnings "-w"
>  dump "-fdump-.*"
>  ins "HAS_INS"
> + clz "HAS_CLZ"
>  dmul "NOT_HAS_DMUL"
>  ldc "HAS_LDC"
>  movn "HAS_MOVN"
> @@ -1198,11 +1199,13 @@ proc mips-dg-options { args } {
>   #
>  #   - paired-single instructions(*)
>  #   - odd numbered single precision registers
> + #   - clz clo instructions
>  #

... and here.

  Maciej


Re: [PATCH] optabs: Fix up expand_doubleword_shift_condmove for shift_mask == 0 [PR108803]

2023-02-27 Thread Richard Sandiford via Gcc-patches
Jakub Jelinek  writes:
> Hi!
>
> The following testcase is miscompiled on aarch64.  The problem is that
> aarch64 with TARGET_SIMD is !SHIFT_COUNT_TRUNCATED target with
> targetm.shift_truncation_mask (DImode) == 0 which has HAVE_conditional_move
> true.  If a doubleword shift (in this case TImode) doesn't have its own
> expander (as is the case of e.g. x86) and is handled in generic code,
> there are 3 possible expansions.  One is when the shift count is constant,
> the code computes in that case superword_op1 as op1 - BITS_PER_WORD,
> and chooses at compile time which of expand_superword_shift or
> expand_subword_shift to call, which ensures that whatever is used
> actually has its shift count (superword_op1 or op1) in [0, BITS_PER_WORD - 1]
> range.  If !HAVE_conditional_move or that expansion fails, the function
> handles non-constant op1 similarly (there are some special cases for
> shift_mask >= BITS_PER_WORD - 1 but let's talk here just about
> shift_mask < BITS_PER_WORD - 1), except that the selection is done at
> runtime, with branches around the stuff.  While superword_op1 can be
> [-BITS_PER_WORD, BITS_PER_WORD - 1] and op1 [0, 2 * BITS_PER_WORD - 1],
> the runtime selection ensures that the instructions executed at runtime
> have their corresponding shift count again in the right range of
> [0, BITS_PER_WORD - 1].
> The problematic is the HAVE_conditional_move case, which emits both
> sequences into the actually executed code, so necessarily one of them
> has an out of range shift count and then using 2 conditional moves
> picks up a result.
> Now, in the testcase because -Og doesn't perform VRP/EVRP the shift
> count isn't optimized to constant during GIMPLE passes, but is determined
> to be constant during/after expansion into RTL.  The shift count is
> actually const0_rtx later, so superword_op1 is (const_int -64) and we end
> up with miscompilation during combine because of that.

I haven't worked through the testcase yet, but how does that happen?
Having shifts with out-of-range shift counts shouldn't be a problem
in itself, provided that the conditional move selects the one with
the in-range shift.

Thanks,
Richard

> I'm afraid on targetm.shift_truncation_mask (DImode) == 0 targets we have
> to mask the shift counts when the doubleshift count is in range but
> one of subword_op1/superword_op1 is not, which is what the following
> patch does and what fixes the wrong-code.  Now, targets like x86 or aarch64,
> while they are !SHIFT_COUNT_TRUNCATED, have actually patterns to catch
> shift with masked counter, so the ANDs can be optimized out.  On the other
> side, when we know the result will be masked this way we can use the
> simpler ~op1 instead of (BITS_PER_WORD - 1) - op1 in expand_subword_shift.
> So, say on
> __attribute__((noipa)) __int128
> foo (__int128 a, unsigned k)
> {
>   return a << k;
> }
> on aarch64 at -Og the difference is:
>  foo:
> subs w5, w2, #64
> -   lsl x6, x0, x5
> +   lsl x4, x0, x2
> lsr x3, x0, 1
> -   mov w4, 63
> -   sub w4, w4, w2
> -   lsr x3, x3, x4
> +   mvn w6, w2
> +   and w2, w2, 63
> +   lsr x3, x3, x6
> lsl x1, x1, x2
> orr x1, x3, x1
> lsl x0, x0, x2
> csel x0, xzr, x0, pl
> -   csel x1, x6, x1, pl
> +   csel x1, x4, x1, pl
> ret
> We could do even better and optimize the and w2, w2, 63 instruction out,
> but it is a matter of costs and so IMHO should be handled incrementally.
> For that case consider say:
> void corge (int, int, int);
>
> void
> qux (int x, int y, int z, int n)
> {
>   n &= 31;
>   corge (x << n, y << n, z >> n);
> }
> with -O2 -fno-tree-vectorize, on x86_64 one gets
> sarl %cl, %edx
> sall %cl, %esi
> sall %cl, %edi
> jmp corge
> but on aarch64
> and w3, w3, 31
> lsl w0, w0, w3
> lsl w1, w1, w3
> asr w2, w2, w3
> b   corge
> The reason it works on x86 is that its rtx_costs hook recognizes
> that the AND in shift+mask patterns is for free.
> Trying 9 -> 11:
> 9: {r85:SI=r96:SI&0x1f;clobber flags:CC;}
>   REG_UNUSED flags:CC
>11: {r91:SI=r87:SI<   REG_DEAD r87:SI
>   REG_UNUSED flags:CC
> Failed to match this instruction:
> ...
> Failed to match this instruction:
> ...
> Successfully matched this instruction:
> (set (reg/v:SI 85 [ n ])
> (and:SI (reg:SI 96)
> (const_int 31 [0x1f])))
> Successfully matched this instruction:
> (set (reg:SI 91)
> (ashift:SI (reg/v:SI 87 [ y ])
> (subreg:QI (and:SI (reg:SI 96)
> (const_int 31 [0x1f])) 0)))
> allowing combination of insns 9 and 11
> original costs 4 + 4 = 8
> replacement costs 4 + 4 = 8
> Compare that to the aarch64 case:
> Trying 9 -> 11:
> 9: r95:SI=r106:SI&0x1f
>   REG_DEAD r106:SI
>11: r101:SI=r104:SI<   REG_DEAD r104:SI
> Failed to match this 

[ada] fix unknown type name 'cpu_set_t'

2023-02-27 Thread 宋冬生 via Gcc-patches
Hi,

When building ada with musl, I encountered the following error:


make[7]: Entering directory '/opt/gcc-build/gcc/build/gcc/ada/rts'
/opt/gcc-build/gcc/build/./gcc/xgcc -B/opt/gcc-build/gcc/build/./gcc/ 
-B/opt/gcc-13/aarch64-linux-musl/usr/aarch64-linux-musl/bin/ 
-B/opt/gcc-13/aarch64-linux-musl/usr/aarch64-linux-musl/lib/ -isystem 
/opt/gcc-13/aarch64-linux-musl/usr/aarch64-linux-musl/include -isystem 
/opt/gcc-13/aarch64-linux-musl/usr/aarch64-linux-musl/sys-include 
--sysroot=/opt/gcc-13/aarch64-linux-musl/usr/aarch64-linux-musl/sys-root   -c 
-DCROSS_DIRECTORY_STRUCTURE -DIN_GCC  -W -Wall -g -O2 -g -O2 -fexceptions 
-DIN_RTS -DHAVE_GETIPINFO   -fPIC -fno-lto   \
  -iquote . -iquote .. -iquote ../.. -iquote /opt/gcc-build/gcc/gcc/ada -iquote 
/opt/gcc-build/gcc/gcc -I/opt/gcc-build/gcc/include  -I./../.. adadecode.c -o 
adadecode.o
In file included from adadecode.c:37:
adaint.h:324:8: error: unknown type name 'cpu_set_t'
  324 | extern cpu_set_t *__gnat_cpu_alloc (size_t);
  |^


It can be seen from the man pages[1] that this error is caused by not defining 
`_GNU_SOURCE`, so I recommend the following fix:


diff --git a/gcc/ada/adaint.h b/gcc/ada/adaint.h
index 987432c93..fa8ddaf13 100644
--- a/gcc/ada/adaint.h
+++ b/gcc/ada/adaint.h
@@ -319,6 +319,9 @@ extern void   *__gnat_lwp_self 
(void);
 
 /* Routines for interface to required CPU set primitives */
 
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
 #include 
 
 extern cpu_set_t *__gnat_cpu_alloc (size_t);


[1] https://man7.org/linux/man-pages/man3/CPU_SET.3.html

Please help commit if appropriate.

-- 
Best regards,
Dongsheng Song


Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-02-27 Thread 盼 李 via Gcc-patches
Never mind, wish you a good holiday.

Thanks for pointing this out; the if part cannot take care of poly_int with N >
2. As I understand it, we need to make it general for all N of poly_int.

Thus I would like to confirm with you how to make it general. I suppose there
will be a new function can_div_away_from_zero_p to replace the if
(known_lt(,)) part in genmodes.cc, and leave exact_div unchanged (given the
word "exact", I suppose we should not touch it here), right? Then we still need
one poly_int with all 1 for N as the return if can_div_away_from_zero_p is true.

Thanks again for your professional suggestion, have a nice day!

Pan

From: Richard Sandiford 
Sent: Monday, February 27, 2023 22:24
To: incarnation.p.lee--- via Gcc-patches 
Cc: incarnation.p@outlook.com ; 
juzhe.zh...@rivai.ai ; kito.ch...@sifive.com 
; rguent...@suse.de ; 
pan2...@intel.com 
Subject: Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

Sorry for the slow reply, been away for a couple of weeks.

"incarnation.p.lee--- via Gcc-patches"  writes:
> From: Pan Li 
>
>Fix the bug of the rvv bool mode precision with the adjustment.
>The bits size of vbool*_t will be adjusted to
>[1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>adjusted mode precison of vbool*_t will help underlying pass to
>make the right decision for both the correctness and optimization.
>
>Given below sample code:
>void test_1(int8_t * restrict in, int8_t * restrict out)
>{
>  vbool8_t v2 = *(vbool8_t*)in;
>  vbool16_t v5 = *(vbool16_t*)in;
>  *(vbool16_t*)(out + 200) = v5;
>  *(vbool8_t*)(out + 100) = v2;
>}
>
>Before the precision adjustment:
>addi a4,a1,100
>vsetvli a5,zero,e8,m1,ta,ma
>addi a1,a1,200
>vlm.v   v24,0(a0)
>vsm.v   v24,0(a4)
>// Need one vsetvli and vlm.v for correctness here.
>vsm.v   v24,0(a1)
>
>After the precision adjustment:
>csrr t0,vlenb
>slli t1,t0,1
>csrr a3,vlenb
>sub sp,sp,t1
>slli a4,a3,1
>add a4,a4,sp
>sub a3,a4,a3
>vsetvli a5,zero,e8,m1,ta,ma
>addi a2,a1,200
>vlm.v   v24,0(a0)
>vsm.v   v24,0(a3)
>addi a1,a1,100
>vsetvli a4,zero,e8,mf2,ta,ma
>csrr t0,vlenb
>vlm.v   v25,0(a3)
>vsm.v   v25,0(a2)
>slli t1,t0,1
>vsetvli a5,zero,e8,m1,ta,ma
>vsm.v   v24,0(a1)
>add sp,sp,t1
>jr  ra
>
>However, there may be some optimization opportunities after
>the mode precision adjustment. They can be taken care of in
>the RISC-V backend in separate PR(s).
>
>PR 108185
>PR 108654
>
> gcc/ChangeLog:
>
>* config/riscv/riscv-modes.def (ADJUST_PRECISION):
>* config/riscv/riscv.cc (riscv_v_adjust_precision):
>* config/riscv/riscv.h (riscv_v_adjust_precision):
>* genmodes.cc (ADJUST_PRECISION):
>(emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
>* gcc.target/riscv/pr108185-1.c: New test.
>* gcc.target/riscv/pr108185-2.c: New test.
>* gcc.target/riscv/pr108185-3.c: New test.
>* gcc.target/riscv/pr108185-4.c: New test.
>* gcc.target/riscv/pr108185-5.c: New test.
>* gcc.target/riscv/pr108185-6.c: New test.
>* gcc.target/riscv/pr108185-7.c: New test.
>* gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-modes.def|  8 +++
>  gcc/config/riscv/riscv.cc   | 12 
>  gcc/config/riscv/riscv.h|  1 +
>  gcc/genmodes.cc | 25 ++-
>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
>  12 files changed, 598 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>  create mode 

Re: [PATCH] don't declare header-defined functions both static and inline, pt 2

2023-02-27 Thread Patrick Palka via Gcc-patches
On Thu, 16 Feb 2023, Patrick Palka wrote:

> This fixes some header-defined functions that are undesirably declared
> static and weren't caught by the "^static inline" pattern used in the
> previous patch.
> 
> gcc/ChangeLog:
> 
>   * hash-table.h (gt_pch_nx): Remove static.
>   * lra-int.h (lra_change_class): Likewise.
>   * recog.h (which_op_alt): Likewise.
>   * sel-sched-ir.h (sel_bb_empty_or_nop_p): Replace static with
>   inline.

I went ahead and pushed this since I reckon it's a fairly safe/obvious
follow-up to the main patch
(https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612130.html).

> ---
>  gcc/hash-table.h   | 2 +-
>  gcc/lra-int.h  | 2 +-
>  gcc/recog.h| 2 +-
>  gcc/sel-sched-ir.h | 2 +-
>  4 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/hash-table.h b/gcc/hash-table.h
> index 3f87ec06f37..c0c6e1cd83d 100644
> --- a/gcc/hash-table.h
> +++ b/gcc/hash-table.h
> @@ -1275,7 +1275,7 @@ hashtab_entry_note_pointers (void *obj, void *h, 
> gt_pointer_operator op,
>  }
>  
>  template
> -static void
> +void
>  gt_pch_nx (hash_table *h)
>  {
>h->check_complete_insertion ();
> diff --git a/gcc/lra-int.h b/gcc/lra-int.h
> index 73f8eb004b0..a400a0f85e2 100644
> --- a/gcc/lra-int.h
> +++ b/gcc/lra-int.h
> @@ -428,7 +428,7 @@ lra_get_regno_hard_regno (int regno)
>  
>  /* Change class of pseudo REGNO to NEW_CLASS.  Print info about it
> using TITLE.  Output a new line if NL_P.  */
> -static void inline
> +inline void
>  lra_change_class (int regno, enum reg_class new_class,
> const char *title, bool nl_p)
>  {
> diff --git a/gcc/recog.h b/gcc/recog.h
> index 764fa90afde..539a27c3edf 100644
> --- a/gcc/recog.h
> +++ b/gcc/recog.h
> @@ -382,7 +382,7 @@ extern const operand_alternative *recog_op_alt;
> on operand OP of the current instruction alternative (which_alternative).
> Only valid after calling preprocess_constraints and constrain_operands.  
> */
>  
> -inline static const operand_alternative *
> +inline const operand_alternative *
>  which_op_alt ()
>  {
>gcc_checking_assert (IN_RANGE (which_alternative, 0,
> diff --git a/gcc/sel-sched-ir.h b/gcc/sel-sched-ir.h
> index 7034a1ab06c..0e87134c6db 100644
> --- a/gcc/sel-sched-ir.h
> +++ b/gcc/sel-sched-ir.h
> @@ -1096,7 +1096,7 @@ get_loop_exit_edges_unique_dests (const class loop 
> *loop)
>return edges;
>  }
>  
> -static bool
> +inline bool
>  sel_bb_empty_or_nop_p (basic_block bb)
>  {
>insn_t first = sel_bb_head (bb), last;
> -- 
> 2.39.2.422.gc867e4fa18
> 
> 



[PATCH, rs6000] Tweak modulo define_insns to eliminate register copy

2023-02-27 Thread Pat Haugen via Gcc-patches

Don't force target of modulo into a distinct register.

The define_insns for the modulo operation currently force the target register
to a distinct reg in preparation for a possible future peephole combining
div/mod. But this can lead to cases of a needless copy being inserted. Fixed
with the following patch.

Bootstrapped and regression tested on powerpc64le.
Ok for master?

-Pat


2023-02-27  Pat Haugen  

gcc/
* config/rs6000/rs6000.md (*mod3, umod3): Add
non-earlyclobber alternative.

gcc/testsuite/
* gcc.target/powerpc/mod-no_copy.c: New.


diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 81bffb04ceb..44f7dd509cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3437,9 +3437,9 @@ (define_expand "mod3"
 ;; In order to enable using a peephole2 for combining div/mod to eliminate the
 ;; mod, prefer putting the result of mod into a different register
 (define_insn "*mod3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=")
-(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
-(match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
+(mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
+(match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
   "TARGET_MODULO"
   "mods %0,%1,%2"
   [(set_attr "type" "div")
@@ -3447,9 +3447,9 @@ (define_insn "*mod3"


 (define_insn "umod3"
-  [(set (match_operand:GPR 0 "gpc_reg_operand" "=")
-(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r")
- (match_operand:GPR 2 "gpc_reg_operand" "r")))]
+  [(set (match_operand:GPR 0 "gpc_reg_operand" "=,r")
+(umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
+ (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
   "TARGET_MODULO"
   "modu %0,%1,%2"
   [(set_attr "type" "div")
diff --git a/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
new file mode 100644
index 000..91e3003b3fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/mod-no_copy.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target { powerpc*-*-*  } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+
+/* Verify r3 is used as source and target, no copy inserted. */
+
+long foo (long a, long b)
+{
+  return (a % b);
+}
+
+unsigned long foo2 (unsigned long a, unsigned long b)
+{
+  return (a % b);
+}
+
+/* { dg-final { scan-assembler-not {\mmr\M} } } */


Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-27 Thread Richard Sandiford via Gcc-patches
FWIW, this patch looks good to me.  I'd argue it's a regression fix
of kinds, in that the current code was correct before variable VF and
became incorrect after variable VF.  It might be possible to trigger
the problem on SVE too, with a sufficiently convoluted test case.
(Haven't tried though.)

Richard Biener  writes:
> On Wed, Feb 22, 2023 at 12:03 AM Michael Collison  
> wrote:
>>
>> While working on autovectorizing for the RISCV port I encountered an
>> issue where vect_do_peeling assumes that the vectorization factor is a
>> compile-time constant. The vectorization is not a compile-time constant
>> on RISCV.
>>
>> Tested on RISCV and x86_64-linux-gnu. Okay?
>
> I wonder how you arrive at prologue peeling with a non-constant VF?

Not sure about the RVV case, but I think it makes sense in principle.
E.g. if some ISA takes the LOAD_LEN rather than fully-predicated
approach, it can't easily use the first iteration of the vector loop
to do peeling for alignment.  (At least, the IV steps would then
no longer match VF for all iterations.)  I guess it could use a
*different* vector loop, but we don't support that yet.

There are also some corner cases for which we still don't support
predicated loops and instead fall back on an unpredicated VLA loop
followed by a scalar epilogue.  Peeling for alignment would then
require a scalar prologue too.

> In any case it would probably be better to use constant_lower_bound (vf)
> here?  Also it looks wrong to apply this limit in case we are using
> a fully masked main vector loop.  But as said, the specific case of
> non-constant VF and prologue peeling probably wasn't supposed to happen,
> instead the prologue usually is applied via an offset to a fully masked loop?

Hmm, yeah, agree constant_lower_bound should work too.

Thanks,
Richard

> Richard?
>
> Thanks,
> Richard.
>
>> Michael
>>
>> gcc/
>>
>>  * tree-vect-loop-manip.cc (vect_do_peeling): Verify
>>  that vectorization factor is a compile-time constant.
>>
>> ---
>>   gcc/tree-vect-loop-manip.cc | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
>> index 6aa3d2ed0bf..1ad1961c788 100644
>> --- a/gcc/tree-vect-loop-manip.cc
>> +++ b/gcc/tree-vect-loop-manip.cc
>> @@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
>> niters, tree nitersm1,
>> niters = vect_build_loop_niters (loop_vinfo, _var_p);
>> /* It's guaranteed that vector loop bound before vectorization is at
>>least VF, so set range information for newly generated var. */
>> -  if (new_var_p)
>> +  if (new_var_p && vf.is_constant ())
>>   {
>> value_range vr (type,
>> wi::to_wide (build_int_cst (type, vf)),
>> --
>> 2.34.1
>>


[committed] libstdc++: Add Doxygen comment for string::resize_and_overwrite

2023-02-27 Thread Jonathan Wakely via Gcc-patches
Here's what I committed, including the fix for the typo Daniel spotted.

Pushed to trunk.

-- >8 --

This is a complicated API that should be clearly documented.

Also improve the comment on basic_ios::_M_setstate.

libstdc++-v3/ChangeLog:

* include/bits/basic_ios.h (basic_ios::_M_setstate): Add
caveat to comment.
* include/bits/basic_string.h (resize_and_overwrite): Add
doxygen comment.
---
 libstdc++-v3/include/bits/basic_ios.h|  4 ++--
 libstdc++-v3/include/bits/basic_string.h | 29 
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_ios.h 
b/libstdc++-v3/include/bits/basic_ios.h
index e0667b7d049..de5719c1d68 100644
--- a/libstdc++-v3/include/bits/basic_ios.h
+++ b/libstdc++-v3/include/bits/basic_ios.h
@@ -157,9 +157,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   setstate(iostate __state)
   { this->clear(this->rdstate() | __state); }
 
-  // Flip the internal state on for the proper state bits, then
+  // Flips the internal state on for the proper state bits, then
   // rethrows the propagated exception if bit also set in
-  // exceptions().
+  // exceptions(). Must only be called within a catch handler.
   void
   _M_setstate(iostate __state)
   {
diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index c81dc0d425a..1b8ebca7dad 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -1117,6 +1117,35 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
 
 #if __cplusplus > 202002L
 #define __cpp_lib_string_resize_and_overwrite 202110L
+  /** Resize the string and call a function to fill it.
+   *
+   * @param __n   The maximum size requested.
+   * @param __op  A callable object that writes characters to the string.
+   *
+   * This is a low-level function that is easy to misuse, be careful.
+   *
+   * Calling `str.resize_and_overwrite(n, op)` will reserve at least `n`
+   * characters in `str`, evaluate `n2 = std::move(op)(str.data(), n)`,
+   * and finally set the string length to `n2` (adding a null terminator
+   * at the end). The function object `op` is allowed to write to the
+   * extra capacity added by the initial reserve operation, which is not
+   * allowed if you just call `str.reserve(n)` yourself.
+   *
+   * This can be used to efficiently fill a `string` buffer without the
+   * overhead of zero-initializing characters that will be overwritten
+   * anyway.
+   *
+   * The callable `op` must not access the string directly (only through
+   * the pointer passed as its first argument), must not write more than
+   * `n` characters to the string, must return a value no greater than `n`,
+   * and must ensure that all characters up to the returned length are
+   * valid after it returns (i.e. there must be no uninitialized values
+   * left in the string after the call, because accessing them would
+   * have undefined behaviour). If `op` exits by throwing an exception
+   * the behaviour is undefined.
+   *
+   * @since C++23
+   */
   template
constexpr void
resize_and_overwrite(size_type __n, _Operation __op);
-- 
2.39.2



Fwd: [V2][PATCH] Fixing PR107411

2023-02-27 Thread Qing Zhao via Gcc-patches
Ping.

Qing

Begin forwarded message:

From: Qing Zhao mailto:qing.z...@oracle.com>>
Subject: [V2][PATCH] Fixing PR107411
Date: February 21, 2023 at 9:46:04 AM EST
To: ja...@redhat.com, 
rguent...@suse.de
Cc: gcc-patches@gcc.gnu.org, Qing Zhao 
mailto:qing.z...@oracle.com>>

This is the 2nd version of the patch.
compared to the first version, the major change is:

use sprintf to replace xasprintf per Jakub's suggestion.

bootstrapped and regression tested on both x86 and aarch64.

Okay for committing?

thanks.

Qing

===


This is a bug in tree-ssa-uninit.cc.
When doing the following:

 /* Ignore the call to .DEFERRED_INIT that define the original
var itself as the following case:
  temp = .DEFERRED_INIT (4, 2, "alt_reloc");
  alt_reloc = temp;
In order to avoid generating warning for the fake usage
at alt_reloc = temp.
 */

We need to compare the var name inside the .DEFERRED_INIT call
(the 3rd argument) and the name for the LHS variable. If they are the same,
we will NOT report the warning.

There is one issue when we get the name for the LHS variable. when the
variable doesn't have a DECL_NAME (it's not a user declared variable,
which is the case for this bug):

 _1 = .DEFERRED_INIT (4, 2, &"D.2389"[0]);
 D.2389 = _1;

The current checking just ignores this case, and still report the warning.

The fix is very simple, when getting the name for the LHS variable, we should
consider this case and come up with the name the same way as we construct the
3rd argument for the call to .DEFERRED_INIT (please refer to the routine
"gimple_add_init_for_auto_var")

PR middle-end/107411

gcc/ChangeLog:

PR middle-end/107411
* gimplify.cc (gimple_add_init_for_auto_var): Use sprintf 
to replace
xasprintf.
* tree-ssa-uninit.cc (warn_uninit): Handle the case 
when the
LHS variable of a .DEFERRED_INIT call doesn't have a DECL_NAME.

gcc/testsuite/ChangeLog:

PR middle-end/107411
* g++.dg/pr107411.C: New test.
---
gcc/gimplify.cc |  4 ++--
gcc/testsuite/g++.dg/pr107411.C | 10 ++
gcc/tree-ssa-uninit.cc  | 23 
---
3 files changed, 28 insertions(+), 9 deletions(-)
create mode 100644 gcc/testsuite/g++.dg/pr107411.C

diff --git a/gcc/gimplify.cc 
b/gcc/gimplify.cc
index 96845154a92..35d1ea22623 100644
--- a/gcc/gimplify.cc
+++ b/gcc/gimplify.cc
@@ -1775,9 +1775,9 @@ gimple_add_init_for_auto_var (tree decl,

  else
{
-  char *decl_name_anonymous = xasprintf ("D.%u", DECL_UID (decl));
+  char decl_name_anonymous[3 + (HOST_BITS_PER_INT + 2) / 3];
+  sprintf (decl_name_anonymous, "D.%u", DECL_UID (decl));
  decl_name = build_string_literal (decl_name_anonymous);
-  free (decl_name_anonymous);
}

  tree call = build_call_expr_internal_loc (loc, IFN_DEFERRED_INIT,
diff --git a/gcc/testsuite/g++.dg/pr107411.C b/gcc/testsuite/g++.dg/pr107411.C
new file mode 100644
index 000..7eefecae4f3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/pr107411.C
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-Werror=uninitialized -ftrivial-auto-var-init=zero"  } */
+int t();
+void f(int);
+
+void j()
+{
+  const int& e = t();
+  f(e);
+}
diff --git a/gcc/tree-ssa-uninit.cc 
b/gcc/tree-ssa-uninit.cc
index c555cf5cd50..9f720ae1f4f 100644
--- a/gcc/tree-ssa-uninit.cc
+++ b/gcc/tree-ssa-uninit.cc
@@ -224,8 +224,6 @@ warn_uninit (opt_code opt, tree t, tree var, gimple 
*context,
at alt_reloc = temp.
 */
 tree lhs_var = NULL_TREE;
-  tree lhs_var_name = NULL_TREE;
-  const char *lhs_var_name_str = NULL;

 /* Get the variable name from the 3rd argument of call.  */
 tree var_name = gimple_call_arg (var_def_stmt, 2);
@@ -239,11 +237,22 @@ warn_uninit (opt_code opt, tree t, tree var, gimple 
*context,
 else if (TREE_CODE (gimple_assign_lhs (context)) == SSA_NAME)
lhs_var = SSA_NAME_VAR (gimple_assign_lhs (context));
   }
-  if (lhs_var
-  && (lhs_var_name = DECL_NAME (lhs_var))
-  && (lhs_var_name_str = IDENTIFIER_POINTER (lhs_var_name))
-  && (strcmp (lhs_var_name_str, var_name_str) == 0))
-return;
+  if (lhs_var)
+{
+  /* Get the name string for the LHS_VAR.
+ Refer to routine gimple_add_init_for_auto_var.  */
+  if (DECL_NAME (lhs_var)
+  && (strcmp (IDENTIFIER_POINTER (DECL_NAME (lhs_var)),
+  var_name_str) == 0))
+ return;
+  else if (!DECL_NAME (lhs_var))
+ {
+  char lhs_var_name_str_buf[3 + (HOST_BITS_PER_INT + 2) / 3];
+  sprintf (lhs_var_name_str_buf, "D.%u", DECL_UID (lhs_var));
+  if (strcmp (lhs_var_name_str_buf, var_name_str) == 0)
+return;
+ }
+}
 gcc_assert (var_name_str && 

Re: [PATCH] constraint: fix relaxed memory and repeated constraint handling

2023-02-27 Thread Richard Sandiford via Gcc-patches
"Victor L. Do Nascimento"  writes:
> The function `constrain_operands' lacked the logic to consider relaxed
> memory constraints when "traditional" memory constraints were not
> satisfied, creating potential issues as observed during the reload
> compilation pass.
>
> In addition, it was observed that while `constrain_operands' chooses
> to disregard constraints when more than one alternative is provided,
> e.g. "m,r" using CONSTRAINT__UNKNOWN, it has no checks in place to
> determine whether the multiple constraints in a given string are in
> fact repetitions of the same constraint and should thus in fact be
> treated as a single constraint, as ought to be the case for something
> like "m,m".
>
> Both of these issues are dealt with here, thus ensuring that we get
> appropriate pattern matching.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?
>
> Victor
>
> gcc/
>   * lra-constraints.cc (constraint_unique): New.
>   (process_address_1): Apply constraint_unique test.
>   * recog.cc (constrain_operands): Allow relaxed memory
>   constraints.
> ---
>  gcc/lra-constraints.cc | 43 +++---
>  gcc/recog.cc   |  3 ++-
>  2 files changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
> index dbfaf0485..c9c1653c0 100644
> --- a/gcc/lra-constraints.cc
> +++ b/gcc/lra-constraints.cc
> @@ -3448,6 +3448,45 @@ skip_constraint_modifiers (const char *str)
>}
>  }
>  
> +/*  Takes a string of 0 or more comma-separated constraints and the
> +constraint_num corresponding to the first constraint.  When more
> +than one constraint present, evaluate whether they all correspond
> +to a single, repeated constraint (e.g. "r,r") or whether we have
> +more than one distinct constraints (e.g. "r,m").  */

Minor formatting nit: indentation should be to "/* " rather than "/*  ".

> +static bool
> +constraint_unique (const char *cstr, enum constraint_num ca)
> +{
> +   enum constraint_num cb;
> +   for (;;)
> + {
> +   /* Skip past current constraint and any whitespace which may
> +   precede the end-of-line or separator characters.  */
> +   cstr = skip_constraint_modifiers (cstr
> +  + CONSTRAINT_LEN (cstr[0], cstr));
> +   /* If end of string reached and no disagreement found, we have
> +   uniqueness.  */
> +   if (*cstr == '\0')
> +  return true;
> +   /* skip_constraint_modifiers does not handle commas, handle
> +   case manually.  */
> +   if (*cstr == ',')
> +  cstr++;
> +   /* Get next constraint.  */
> +   cstr =  skip_constraint_modifiers (cstr);
> +   cb = lookup_constraint ((*cstr == '\0' || *cstr == ',') ? "X" : cstr);
> +
> +   /* If mismatch found, break out of loop.  */
> +   if (cb != ca)
> +  return false;
> +
> +   /* If *cstr == '\0', we don't want to reach the
> +   skip_constraint_modifiers statement again as that will
> +   advance the pointer past the end of the string.  */
> +   if (*cstr == '\0')
> +  return true;
> + }
> +}

How about rearranging this a bit to something like:

  ca = CONSTRAINT__UNKNOWN;
  for (;;)
{
  cstr = skip_constraint_modifiers (cstr);
  if (*cstr == '\0' || *cstr == ',')
cb = CONSTRAINT_X;
  else
{
  cb = lookup_constraint (cstr);
  if (cb == CONSTRAINT__UNKNOWN)
return false;
  cstr += CONSTRAINT_LEN (cstr[0], cstr);
}
  if (ca == CONSTRAINT__UNKNOWN)
ca = cb;
  else if (ca != cb)
return false;
  if (*cstr == '\0')
return true;
  if (*cstr == ',')
cstr += 1;
}

That way we only do one lookup per loop iteration.  It also avoids
CONSTRAINT_LEN for the "empty, followed by comma" case.

If that works, the patch is OK with those changes once another
approved-for-GCC-13 patch needs it.  OK for GCC 14 otherwise.

Thanks,
Richard

> +
>  /* Major function to make reloads for an address in operand NOP or
> check its correctness (If CHECK_ONLY_P is true). The supported
> cases are:
> @@ -3507,9 +3546,7 @@ process_address_1 (int nop, bool check_only_p,
>   operand has one address constraint, probably all others constraints are
>   address ones.  */
>if (constraint[0] != '\0' && get_constraint_type (cn) != CT_ADDRESS
> -  && *skip_constraint_modifiers (constraint
> -  + CONSTRAINT_LEN (constraint[0],
> -constraint)) != '\0')
> +  && !constraint_unique (constraint, cn))
>  cn = CONSTRAINT__UNKNOWN;
>if (insn_extra_address_constraint (cn)
>/* When we find an asm operand with an address constraint that
> diff --git a/gcc/recog.cc b/gcc/recog.cc
> index 200cf4214..3ddeab59d 100644
> --- a/gcc/recog.cc
> +++ b/gcc/recog.cc
> @@ -3234,7 +3234,8 @@ constrain_operands (int 

Re: [PATCH] RISC-V: Bugfix for rvv bool mode precision adjustment

2023-02-27 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply, been away for a couple of weeks.

"incarnation.p.lee--- via Gcc-patches"  writes:
> From: Pan Li 
>
>   Fix the bug of the rvv bool mode precision with the adjustment.
>   The bits size of vbool*_t will be adjusted to
>   [1, 2, 4, 8, 16, 32, 64] according to the rvv spec 1.0 isa. The
>   adjusted mode precision of vbool*_t will help the underlying passes
>   make the right decision for both correctness and optimization.
>
>   Given below sample code:
>   void test_1(int8_t * restrict in, int8_t * restrict out)
>   {
> vbool8_t v2 = *(vbool8_t*)in;
> vbool16_t v5 = *(vbool16_t*)in;
> *(vbool16_t*)(out + 200) = v5;
> *(vbool8_t*)(out + 100) = v2;
>   }
>
>   Before the precision adjustment:
>   addi    a4,a1,100
>   vsetvli a5,zero,e8,m1,ta,ma
>   addi    a1,a1,200
>   vlm.v   v24,0(a0)
>   vsm.v   v24,0(a4)
>   // Need one vsetvli and vlm.v for correctness here.
>   vsm.v   v24,0(a1)
>
>   After the precision adjustment:
>   csrr    t0,vlenb
>   slli    t1,t0,1
>   csrr    a3,vlenb
>   sub     sp,sp,t1
>   slli    a4,a3,1
>   add     a4,a4,sp
>   sub     a3,a4,a3
>   vsetvli a5,zero,e8,m1,ta,ma
>   addi    a2,a1,200
>   vlm.v   v24,0(a0)
>   vsm.v   v24,0(a3)
>   addi    a1,a1,100
>   vsetvli a4,zero,e8,mf2,ta,ma
>   csrr    t0,vlenb
>   vlm.v   v25,0(a3)
>   vsm.v   v25,0(a2)
>   slli    t1,t0,1
>   vsetvli a5,zero,e8,m1,ta,ma
>   vsm.v   v24,0(a1)
>   add     sp,sp,t1
>   jr      ra
>
>   However, there may be some optimization opportunities after
>   the mode precision adjustment. They can be taken care of in
>   the RISC-V backend in separate PR(s).
>
>   PR 108185
>   PR 108654
>
> gcc/ChangeLog:
>
>   * config/riscv/riscv-modes.def (ADJUST_PRECISION):
>   * config/riscv/riscv.cc (riscv_v_adjust_precision):
>   * config/riscv/riscv.h (riscv_v_adjust_precision):
>   * genmodes.cc (ADJUST_PRECISION):
>   (emit_mode_adjustments):
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/riscv/pr108185-1.c: New test.
>   * gcc.target/riscv/pr108185-2.c: New test.
>   * gcc.target/riscv/pr108185-3.c: New test.
>   * gcc.target/riscv/pr108185-4.c: New test.
>   * gcc.target/riscv/pr108185-5.c: New test.
>   * gcc.target/riscv/pr108185-6.c: New test.
>   * gcc.target/riscv/pr108185-7.c: New test.
>   * gcc.target/riscv/pr108185-8.c: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/config/riscv/riscv-modes.def|  8 +++
>  gcc/config/riscv/riscv.cc   | 12 
>  gcc/config/riscv/riscv.h|  1 +
>  gcc/genmodes.cc | 25 ++-
>  gcc/testsuite/gcc.target/riscv/pr108185-1.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-2.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-3.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-4.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-5.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-6.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-7.c | 68 ++
>  gcc/testsuite/gcc.target/riscv/pr108185-8.c | 77 +
>  12 files changed, 598 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-3.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-4.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-5.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-6.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-7.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/pr108185-8.c
>
> diff --git a/gcc/config/riscv/riscv-modes.def 
> b/gcc/config/riscv/riscv-modes.def
> index d5305efa8a6..110bddce851 100644
> --- a/gcc/config/riscv/riscv-modes.def
> +++ b/gcc/config/riscv/riscv-modes.def
> @@ -72,6 +72,14 @@ ADJUST_BYTESIZE (VNx16BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
>  ADJUST_BYTESIZE (VNx32BI, riscv_vector_chunks * 
> riscv_bytes_per_vector_chunk);
>  ADJUST_BYTESIZE (VNx64BI, riscv_v_adjust_nunits (VNx64BImode, 8));
>  
> +ADJUST_PRECISION (VNx1BI, riscv_v_adjust_precision (VNx1BImode, 1));
> +ADJUST_PRECISION (VNx2BI, riscv_v_adjust_precision (VNx2BImode, 2));
> +ADJUST_PRECISION (VNx4BI, riscv_v_adjust_precision (VNx4BImode, 4));
> +ADJUST_PRECISION (VNx8BI, riscv_v_adjust_precision (VNx8BImode, 8));
> +ADJUST_PRECISION (VNx16BI, riscv_v_adjust_precision (VNx16BImode, 16));
> +ADJUST_PRECISION (VNx32BI, riscv_v_adjust_precision (VNx32BImode, 32));
> +ADJUST_PRECISION (VNx64BI, riscv_v_adjust_precision (VNx64BImode, 64));
> +
>  /*
> | Mode| 

Re: [PATCH 1/4]middle-end: Revert can_special_div_by_const changes [PR108583]

2023-02-27 Thread Richard Biener via Gcc-patches
On Mon, 27 Feb 2023, Tamar Christina wrote:

> Hi All,
> 
> This reverts the changes for the CAN_SPECIAL_DIV_BY_CONST hook.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?

OK (you don't need approval for such reversion).

Thanks,
Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR target/108583
>   * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove.
>   * doc/tm.texi.in: Likewise.
>   * explow.cc (round_push, align_dynamic_address): Revert previous patch.
>   * expmed.cc (expand_divmod): Likewise.
>   * expmed.h (expand_divmod): Likewise.
>   * expr.cc (force_operand, expand_expr_divmod): Likewise.
>   * optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise.
>   * target.def (can_special_div_by_const): Remove.
>   * target.h: Remove tree-core.h include
>   * targhooks.cc (default_can_special_div_by_const): Remove.
>   * targhooks.h (default_can_special_div_by_const): Remove.
>   * tree-vect-generic.cc (expand_vector_operation): Remove hook.
>   * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook.
>   * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
> index 
> c6c891972d1e58cd163b259ba96a599d62326865..50a8872a6695b18b9bed0d393bacf733833633db
>  100644
> --- a/gcc/doc/tm.texi
> +++ b/gcc/doc/tm.texi
> @@ -6137,20 +6137,6 @@ instruction pattern.  There is no need for the hook to 
> handle these two
>  implementation approaches itself.
>  @end deftypefn
>  
> -@deftypefn {Target Hook} bool TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST 
> (enum @var{tree_code}, tree @var{vectype}, wide_int @var{constant}, rtx 
> *@var{output}, rtx @var{in0}, rtx @var{in1})
> -This hook is used to test whether the target has a special method of
> -division of vectors of type @var{vectype} using the value @var{constant},
> -and producing a vector of type @var{vectype}.  The division
> -will then not be decomposed by the vectorizer and kept as a div.
> -
> -When the hook is being used to test whether the target supports a special
> -divide, @var{in0}, @var{in1}, and @var{output} are all null.  When the hook
> -is being used to emit a division, @var{in0} and @var{in1} are the source
> -vectors of type @var{vecttype} and @var{output} is the destination vector of
> -type @var{vectype}.
> -
> -Return true if the operation is possible, emitting instructions for it
> -if rtxes are provided and updating @var{output}.
>  @end deftypefn
>  
>  @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
> (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
> diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
> index 
> 613b2534149415f442163d599503efaf423b673b..3e07978a02f4e6077adae6cadc93ea4273295f1f
>  100644
> --- a/gcc/doc/tm.texi.in
> +++ b/gcc/doc/tm.texi.in
> @@ -4173,7 +4173,6 @@ address;  but often a machine-dependent strategy can 
> generate better code.
>  
>  @hook TARGET_VECTORIZE_VEC_PERM_CONST
>  
> -@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST
>  
>  @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
>  
> diff --git a/gcc/explow.cc b/gcc/explow.cc
> index 
> 83439b32abe1b9aa4b7983eb629804f97486acbd..be9195b33323ee5597fc212f0befa016eea4573c
>  100644
> --- a/gcc/explow.cc
> +++ b/gcc/explow.cc
> @@ -1037,7 +1037,7 @@ round_push (rtx size)
>   TRUNC_DIV_EXPR.  */
>size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
>  NULL_RTX, 1, OPTAB_LIB_WIDEN);
> -  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size, 
> align_rtx,
> +  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
>   NULL_RTX, 1);
>size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
>  
> @@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned 
> required_align)
>gen_int_mode (required_align / BITS_PER_UNIT - 1,
>  Pmode),
>NULL_RTX, 1, OPTAB_LIB_WIDEN);
> -  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, target,
> +  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
> gen_int_mode (required_align / BITS_PER_UNIT,
>   Pmode),
> NULL_RTX, 1);
> diff --git a/gcc/expmed.h b/gcc/expmed.h
> index 
> 0419e2dac85850889ce0bee59515e31a80c582de..4dfe635c22ee49f2dba4c53640941628068f3901
>  100644
> --- a/gcc/expmed.h
> +++ b/gcc/expmed.h
> @@ -710,9 +710,8 @@ extern rtx expand_shift (enum tree_code, machine_mode, 
> rtx, poly_int64, rtx,
>  extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
>  int);
>  #ifdef GCC_OPTABS_H
> -extern rtx expand_divmod (int, enum tree_code, machine_mode, tree, tree,
> -   rtx, rtx, rtx, int,
> - 

[PATCH] Fixup possible VR_ANTI_RANGE value_range issues

2023-02-27 Thread Richard Biener via Gcc-patches
After fixing PR107561 the following avoids looking at VR_ANTI_RANGE
ranges where it doesn't seem obvious the code does the correct
thing here (lower_bound and upper_bound do not work as expected).

Bootstrapped and tested on x86_64-unknown-linux-gnu.

OK?

Thanks,
Richard.

* gimple-ssa-sprintf.cc (get_int_range): Avoid VR_ANTI_RANGE
by using range_int_cst_p.
(format_integer): Likewise.
(handle_printf_call): Guard against VR_ANTI_RANGE.
* graphite-sese-to-poly.cc (add_param_constraints): Likewise.
* tree-ssa-strlen.cc (set_strlen_range): Likewise.
---
 gcc/gimple-ssa-sprintf.cc| 6 +++---
 gcc/graphite-sese-to-poly.cc | 2 +-
 gcc/tree-ssa-strlen.cc   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/gimple-ssa-sprintf.cc b/gcc/gimple-ssa-sprintf.cc
index 18975708d2c..61974072f62 100644
--- a/gcc/gimple-ssa-sprintf.cc
+++ b/gcc/gimple-ssa-sprintf.cc
@@ -1082,7 +1082,7 @@ get_int_range (tree arg, gimple *stmt,
  value_range vr;
  query->range_of_expr (vr, arg, stmt);
 
- if (!vr.undefined_p () && !vr.varying_p ())
+ if (range_int_cst_p (&vr))
{
  HOST_WIDE_INT type_min
= (TYPE_UNSIGNED (argtype)
@@ -1391,7 +1391,7 @@ format_integer (const directive &dir, tree arg, pointer_query &ptr_qry)
   value_range vr;
   ptr_qry.rvals->range_of_expr (vr, arg, dir.info->callstmt);
 
-  if (!vr.varying_p () && !vr.undefined_p ())
+  if (range_int_cst_p (&vr))
{
  argmin = wide_int_to_tree (TREE_TYPE (arg), vr.lower_bound ());
  argmax = wide_int_to_tree (TREE_TYPE (arg), vr.upper_bound ());
@@ -4623,7 +4623,7 @@ handle_printf_call (gimple_stmt_iterator *gsi, pointer_query &ptr_qry)
  value_range vr;
  ptr_qry.rvals->range_of_expr (vr, size, info.callstmt);
 
- if (!vr.undefined_p ())
+ if (!vr.undefined_p () && vr.kind () != VR_ANTI_RANGE)
{
  tree type = TREE_TYPE (size);
  tree tmin = wide_int_to_tree (type, vr.lower_bound ());
diff --git a/gcc/graphite-sese-to-poly.cc b/gcc/graphite-sese-to-poly.cc
index fbe7667380a..b89262640ac 100644
--- a/gcc/graphite-sese-to-poly.cc
+++ b/gcc/graphite-sese-to-poly.cc
@@ -426,7 +426,7 @@ add_param_constraints (scop_p scop, graphite_dim_t p, tree 
parameter)
 
   if (INTEGRAL_TYPE_P (type)
   && get_range_query (cfun)->range_of_expr (r, parameter)
-  && !r.undefined_p ())
+  && range_int_cst_p (&r))
 {
   min = r.lower_bound ();
   max = r.upper_bound ();
diff --git a/gcc/tree-ssa-strlen.cc b/gcc/tree-ssa-strlen.cc
index 7508c1768a5..e1230522564 100644
--- a/gcc/tree-ssa-strlen.cc
+++ b/gcc/tree-ssa-strlen.cc
@@ -1936,7 +1936,7 @@ set_strlen_range (tree lhs, wide_int min, wide_int max,
{
  value_range r;
  get_range_query (cfun)->range_of_expr (r, bound);
- if (!r.undefined_p ())
+ if (range_int_cst_p (&r))
{
  /* For a bound in a known range, adjust the range determined
 above as necessary.  For a bound in some anti-range or
-- 
2.35.3


[PATCH 4/4]AArch64 Update div-bitmask to implement new optab instead of target hook [PR108583]

2023-02-27 Thread Tamar Christina via Gcc-patches
Hi All,

This replaces the custom division hook with just an implementation through
add_highpart.  For NEON we implement the add highpart (Addition + extraction of
the upper highpart of the register in the same precision) as ADD + LSR.

This representation allows us to easily optimize the sequence using existing
sequences. This gets us a pretty decent sequence using SRA:

umull   v1.8h, v0.8b, v3.8b
umull2  v0.8h, v0.16b, v3.16b
add v5.8h, v1.8h, v2.8h
add v4.8h, v0.8h, v2.8h
usra    v1.8h, v5.8h, 8
usra    v0.8h, v4.8h, 8
uzp2    v1.16b, v1.16b, v0.16b

To get the most optimal sequence, however, we match (a + ((b + c) >> n)) where n
is half the precision of the mode of the operation into addhn + uaddw which is
a general good optimization on its own and gets us back to:

.L4:
ldr q0, [x3]
umull   v1.8h, v0.8b, v5.8b
umull2  v0.8h, v0.16b, v5.16b
addhn   v3.8b, v1.8h, v4.8h
addhn   v2.8b, v0.8h, v4.8h
uaddw   v1.8h, v1.8h, v3.8b
uaddw   v0.8h, v0.8h, v2.8b
uzp2    v1.16b, v1.16b, v0.16b
str q1, [x3], 16
cmp x3, x4
bne .L4

For SVE2 we optimize the initial sequence to the same ADD + LSR which gets us:

.L3:
ld1b    z0.h, p0/z, [x0, x3]
mul     z0.h, p1/m, z0.h, z2.h
add     z1.h, z0.h, z3.h
usra    z0.h, z1.h, #8
lsr     z0.h, z0.h, #8
st1b    z0.h, p0, [x0, x3]
inch    x3
whilelo p0.h, w3, w2
b.any   .L3
.L1:
ret

and to get the most optimal sequence I match (a + b) >> n (same constraint on n)
to addhnb which gets us to:

.L3:
ld1b    z0.h, p0/z, [x0, x3]
mul     z0.h, p1/m, z0.h, z2.h
addhnb  z1.b, z0.h, z3.h
addhnb  z0.b, z0.h, z1.h
st1b    z0.h, p0, [x0, x3]
inch    x3
whilelo p0.h, w3, w2
b.any   .L3

There are multiple RTL representations possible for these optimizations; I did
not represent them using a zero_extend because we seem very inconsistent in this
in the backend.  Since they are unspecs we won't match them from vector ops
anyway. I figured maintainers would prefer this, but my maintainer ouija board
is still out for repairs :)

There are no new tests, as new correctness tests were added to the mid-end and
the existing codegen tests for this already exist.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv3): Remove.
(*bitmask_shift_plus): New.
* config/aarch64/aarch64-sve2.md (*bitmask_shift_plus): New.
(@aarch64_bitmask_udiv3): Remove.
* config/aarch64/aarch64.cc
(aarch64_vectorize_can_special_div_by_constant,
TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Removed.
(TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT,
aarch64_vectorize_preferred_div_as_shifts_over_mult): New.

--- inline copy of patch -- 
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
7f212bf37cd2c120dceb7efa733c9fa76226f029..e1ecb88634f93d380ef534093ea6599dc7278108
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4867,60 +4867,27 @@ (define_expand "aarch64_hn2"
   }
 )
 
-;; div optimizations using narrowings
-;; we can do the division e.g. shorts by 255 faster by calculating it as
-;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
-;; double the precision of x.
-;;
-;; If we imagine a short as being composed of two blocks of bytes then
-;; adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to
-;; adding 1 to each sub component:
-;;
-;;  short value of 16-bits
-;; ┌──┬┐
-;; │  ││
-;; └──┴┘
-;;   8-bit part1 ▲  8-bit part2   ▲
-;;   ││
-;;   ││
-;;  +1   +1
-;;
-;; after the first addition, we have to shift right by 8, and narrow the
-;; results back to a byte.  Remember that the addition must be done in
-;; double the precision of the input.  Since 8 is half the size of a short
-;; we can use a narrowing halfing instruction in AArch64, addhn which also
-;; does the addition in a wider precision and narrows back to a byte.  The
-;; shift itself is implicit in the operation as it writes back only the top
-;; half of the result. i.e. bits 2*esize-1:esize.
-;;
-;; Since we have narrowed the result of the first part back to a byte, for
-;; the second addition we can use a widening addition, uaddw.
-;;
-;; For the final shift, since it's unsigned arithmetic we emit an ushr by 8.
-;;
-;; The shift is later optimized by combine to a uzp2 with movi #0.
-(define_expand "@aarch64_bitmask_udiv3"
-  [(match_operand:VQN 0 "register_operand")
-   

[PATCH 3/4]middle-end: Implement preferred_div_as_shifts_over_mult [PR108583]

2023-02-27 Thread Tamar Christina via Gcc-patches
Hi All,

As Richard S wanted, this now implements a hook
preferred_div_as_shifts_over_mult that indicates whether a target prefers that
the vectorizer decomposes division as shifts rather than multiplication when
possible.

In order to be able to use this we need to check whether the current precision
has enough bits to do the operation without any of the additions overflowing.

We use range information to determine this and only do the operation if we're
sure an overflow won't occur. This now uses ranger to do this range check.

This seems to work better than vect_get_range_info which uses range_query, but I
have not switched the interface of vect_get_range_info over in this PR fix.

As Andy said before, initializing a ranger instance is cheap but not free, and if
the intention is to call it often during a pass it should be instantiated at
pass startup and passed along to the places that need it.  This is a big
refactoring and doesn't seem right to do in this PR.  But we should in GCC 14.

Currently we only instantiate it after a long series of much cheaper checks.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* target.def (preferred_div_as_shifts_over_mult): New.
* doc/tm.texi.in: Document it.
* doc/tm.texi: Regenerate.
* targhooks.cc (default_preferred_div_as_shifts_over_mult): New.
* targhooks.h (default_preferred_div_as_shifts_over_mult): New.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Use it.

gcc/testsuite/ChangeLog:

PR target/108583
* gcc.dg/vect/vect-div-bitmask-4.c: New test.
* gcc.dg/vect/vect-div-bitmask-5.c: New test.

--- inline copy of patch -- 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 
50a8872a6695b18b9bed0d393bacf733833633db..c85196015e2e53047fcc65d32ef2d3203d2a6bab
 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6137,6 +6137,9 @@ instruction pattern.  There is no need for the hook to 
handle these two
 implementation approaches itself.
 @end deftypefn
 
+@deftypefn {Target Hook} bool 
TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT (void)
+When decomposing a division operation, if possible prefer to decompose the
+operation as shifts rather than multiplication by magic constants.
 @end deftypefn
 
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
(unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 
3e07978a02f4e6077adae6cadc93ea4273295f1f..0051017a7fd67691a343470f36ad4fc32c8e7e15
 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4173,6 +4173,7 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
+@hook TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT
 
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
diff --git a/gcc/target.def b/gcc/target.def
index 
e0a5c7adbd962f5d08ed08d1d81afa2c2baa64a5..8cc18b1f3c5de24c21faf891b9d4d0b6fd5b59d7
 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1868,6 +1868,15 @@ correct for most targets.",
  poly_uint64, (const_tree type),
  default_preferred_vector_alignment)
 
+/* Returns whether the target has a preference for decomposing divisions using
+   shifts rather than multiplies.  */
+DEFHOOK
+(preferred_div_as_shifts_over_mult,
+ "When decomposing a division operation, if possible prefer to decompose the\n\
+operation as shifts rather than multiplication by magic constants.",
+ bool, (void),
+ default_preferred_div_as_shifts_over_mult)
+
 /* Return true if vector alignment is reachable (by peeling N
iterations) for the given scalar type.  */
 DEFHOOK
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 
a6a4809ca91baa5d7fad2244549317a31390f0c2..dda011c59fbd5973ee648dfea195619cc41c71bc
 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -53,6 +53,8 @@ extern scalar_int_mode default_unwind_word_mode (void);
 extern unsigned HOST_WIDE_INT default_shift_truncation_mask
   (machine_mode);
 extern unsigned int default_min_divisions_for_recip_mul (machine_mode);
+extern bool
+default_preferred_div_as_shifts_over_mult (void);
 extern int default_mode_rep_extended (scalar_int_mode, scalar_int_mode);
 
 extern tree default_stack_protect_guard (void);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index 
211525720a620d6f533e2da91e03877337a931e7..6396f344eef09dd61f358938846a1c02a70b31d8
 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1483,6 +1483,15 @@ default_preferred_vector_alignment (const_tree type)
   return TYPE_ALIGN (type);
 }
 
+/* The default implementation of
+   TARGET_VECTORIZE_PREFERRED_DIV_AS_SHIFTS_OVER_MULT.  */
+
+bool
+default_preferred_div_as_shifts_over_mult (void)
+{
+  return false;
+}
+
 /* By default assume vectors of element TYPE require a multiple of the natural
alignment of TYPE.  TYPE is naturally aligned if IS_PACKED is 

[PATCH 2/4][ranger]: Add range-ops for widen addition and widen multiplication [PR108583]

2023-02-27 Thread Tamar Christina via Gcc-patches
Hi All,

This adds range-ops for widening addition and widening multiplication.

I couldn't figure out how to write a test for this.  It looks like there are
self-tests but no way to write standalone ones?  I did create testcases in
patch 3/4, which test the end result.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* gimple-range-op.h (gimple_range_op_handler): Add maybe_non_standard.
* gimple-range-op.cc (gimple_range_op_handler::gimple_range_op_handler):
Use it.
(gimple_range_op_handler::maybe_non_standard): New.
* range-op.cc (class operator_widen_plus_signed,
operator_widen_plus_signed::wi_fold, class operator_widen_plus_unsigned,
operator_widen_plus_unsigned::wi_fold, class operator_widen_mult_signed,
operator_widen_mult_signed::wi_fold, class operator_widen_mult_unsigned,
operator_widen_mult_unsigned::wi_fold,
ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New.
* range-op.h (ptr_op_widen_mult_signed, ptr_op_widen_mult_unsigned,
ptr_op_widen_plus_signed, ptr_op_widen_plus_unsigned): New

Co-Authored-By: Andrew MacLeod 

--- inline copy of patch -- 
diff --git a/gcc/gimple-range-op.h b/gcc/gimple-range-op.h
index 
743b858126e333ea9590c0f175aacb476260c048..1bf63c5ce6f5db924a1f5907ab4539e376281bd0
 100644
--- a/gcc/gimple-range-op.h
+++ b/gcc/gimple-range-op.h
@@ -41,6 +41,7 @@ public:
 relation_trio = TRIO_VARYING);
 private:
   void maybe_builtin_call ();
+  void maybe_non_standard ();
   gimple *m_stmt;
   tree m_op1, m_op2;
 };
diff --git a/gcc/gimple-range-op.cc b/gcc/gimple-range-op.cc
index 
d9dfdc56939bb62ade72726b15c3d5e87e4ddcd1..ad13c873c6303db5b68b74db1562c0db6763101f
 100644
--- a/gcc/gimple-range-op.cc
+++ b/gcc/gimple-range-op.cc
@@ -179,6 +179,8 @@ gimple_range_op_handler::gimple_range_op_handler (gimple *s)
   // statements.
  if (is_a <gcall *> (m_stmt))
 maybe_builtin_call ();
+  else
+maybe_non_standard ();
 }
 
 // Calculate what we can determine of the range of this unary
@@ -764,6 +766,44 @@ public:
   }
 } op_cfn_parity;
 
+// Set up a gimple_range_op_handler for any nonstandard function which can be
+// supported via range-ops.
+
+void
+gimple_range_op_handler::maybe_non_standard ()
+{
+  range_operator *signed_op = ptr_op_widen_mult_signed;
+  range_operator *unsigned_op = ptr_op_widen_mult_unsigned;
+  if (gimple_code (m_stmt) == GIMPLE_ASSIGN)
+switch (gimple_assign_rhs_code (m_stmt))
+  {
+   case WIDEN_PLUS_EXPR:
+   {
+ signed_op = ptr_op_widen_plus_signed;
+ unsigned_op = ptr_op_widen_plus_unsigned;
+   }
+   gcc_fallthrough ();
+   case WIDEN_MULT_EXPR:
+   {
+ m_valid = true;
+ m_op1 = gimple_assign_rhs1 (m_stmt);
+ m_op2 = gimple_assign_rhs2 (m_stmt);
+ bool signed1 = TYPE_SIGN (TREE_TYPE (m_op1)) == SIGNED;
+ bool signed2 = TYPE_SIGN (TREE_TYPE (m_op2)) == SIGNED;
+ if (signed2 && !signed1)
+   std::swap (m_op1, m_op2);
+
+ if (signed1 || signed2)
+   m_int = signed_op;
+ else
+   m_int = unsigned_op;
+ break;
+   }
+   default:
+ break;
+  }
+}
+
 // Set up a gimple_range_op_handler for any built in function which can be
 // supported via range-ops.
 
diff --git a/gcc/range-op.h b/gcc/range-op.h
index 
f00b747f08a1fa8404c63bfe5a931b4048008b03..b1eeac70df81f2bdf228af7adff5399e7ac5e5d6
 100644
--- a/gcc/range-op.h
+++ b/gcc/range-op.h
@@ -311,4 +311,8 @@ private:
 // This holds the range op table for floating point operations.
 extern floating_op_table *floating_tree_table;
 
+extern range_operator *ptr_op_widen_mult_signed;
+extern range_operator *ptr_op_widen_mult_unsigned;
+extern range_operator *ptr_op_widen_plus_signed;
+extern range_operator *ptr_op_widen_plus_unsigned;
 #endif // GCC_RANGE_OP_H
diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 
5c67bce6d3aab81ad3186b902e09d6a96878d9bb..718ccb6f074e1a2a9ef1b7a5d4e879898d4a7fc3
 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -1556,6 +1556,73 @@ operator_plus::op2_range (irange &r, tree type,
   return op1_range (r, type, lhs, op1, rel.swap_op1_op2 ());
 }
 
+class operator_widen_plus_signed : public range_operator
+{
+public:
+  virtual void wi_fold (irange &r, tree type,
+			const wide_int &lh_lb,
+			const wide_int &lh_ub,
+			const wide_int &rh_lb,
+			const wide_int &rh_ub) const;
+} op_widen_plus_signed;
+range_operator *ptr_op_widen_plus_signed = &op_widen_plus_signed;
+
+void
+operator_widen_plus_signed::wi_fold (irange &r, tree type,
+				     const wide_int &lh_lb,
+				     const wide_int &lh_ub,
+				     const wide_int &rh_lb,
+				     const wide_int &rh_ub)
[PATCH 1/4]middle-end: Revert can_special_div_by_const changes [PR108583]

2023-02-27 Thread Tamar Christina via Gcc-patches
Hi All,

This reverts the changes for the CAN_SPECIAL_DIV_BY_CONST hook.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/108583
* doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST): Remove.
* doc/tm.texi.in: Likewise.
* explow.cc (round_push, align_dynamic_address): Revert previous patch.
* expmed.cc (expand_divmod): Likewise.
* expmed.h (expand_divmod): Likewise.
* expr.cc (force_operand, expand_expr_divmod): Likewise.
* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod): Likewise.
* target.def (can_special_div_by_const): Remove.
* target.h: Remove tree-core.h include
* targhooks.cc (default_can_special_div_by_const): Remove.
* targhooks.h (default_can_special_div_by_const): Remove.
* tree-vect-generic.cc (expand_vector_operation): Remove hook.
* tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook.
* tree-vect-stmts.cc (vectorizable_operation): Remove hook.

--- inline copy of patch -- 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 
c6c891972d1e58cd163b259ba96a599d62326865..50a8872a6695b18b9bed0d393bacf733833633db
 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6137,20 +6137,6 @@ instruction pattern.  There is no need for the hook to 
handle these two
 implementation approaches itself.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum 
@var{tree_code}, tree @var{vectype}, wide_int @var{constant}, rtx 
*@var{output}, rtx @var{in0}, rtx @var{in1})
-This hook is used to test whether the target has a special method of
-division of vectors of type @var{vectype} using the value @var{constant},
-and producing a vector of type @var{vectype}.  The division
-will then not be decomposed by the vectorizer and kept as a div.
-
-When the hook is being used to test whether the target supports a special
-divide, @var{in0}, @var{in1}, and @var{output} are all null.  When the hook
-is being used to emit a division, @var{in0} and @var{in1} are the source
-vectors of type @var{vecttype} and @var{output} is the destination vector of
-type @var{vectype}.
-
-Return true if the operation is possible, emitting instructions for it
-if rtxes are provided and updating @var{output}.
 @end deftypefn
 
 @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION 
(unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 
613b2534149415f442163d599503efaf423b673b..3e07978a02f4e6077adae6cadc93ea4273295f1f
 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4173,7 +4173,6 @@ address;  but often a machine-dependent strategy can 
generate better code.
 
 @hook TARGET_VECTORIZE_VEC_PERM_CONST
 
-@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST
 
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
diff --git a/gcc/explow.cc b/gcc/explow.cc
index 
83439b32abe1b9aa4b7983eb629804f97486acbd..be9195b33323ee5597fc212f0befa016eea4573c
 100644
--- a/gcc/explow.cc
+++ b/gcc/explow.cc
@@ -1037,7 +1037,7 @@ round_push (rtx size)
  TRUNC_DIV_EXPR.  */
   size = expand_binop (Pmode, add_optab, size, alignm1_rtx,
   NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size, align_rtx,
+  size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx,
NULL_RTX, 1);
   size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1);
 
@@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned 
required_align)
 gen_int_mode (required_align / BITS_PER_UNIT - 1,
   Pmode),
 NULL_RTX, 1, OPTAB_LIB_WIDEN);
-  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, target,
+  target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target,
  gen_int_mode (required_align / BITS_PER_UNIT,
Pmode),
  NULL_RTX, 1);
diff --git a/gcc/expmed.h b/gcc/expmed.h
index 
0419e2dac85850889ce0bee59515e31a80c582de..4dfe635c22ee49f2dba4c53640941628068f3901
 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -710,9 +710,8 @@ extern rtx expand_shift (enum tree_code, machine_mode, rtx, 
poly_int64, rtx,
 extern rtx maybe_expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
   int);
 #ifdef GCC_OPTABS_H
-extern rtx expand_divmod (int, enum tree_code, machine_mode, tree, tree,
- rtx, rtx, rtx, int,
- enum optab_methods = OPTAB_LIB_WIDEN);
+extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
+ rtx, int, enum optab_methods = OPTAB_LIB_WIDEN);
 #endif
 #endif
 
diff --git a/gcc/expmed.cc b/gcc/expmed.cc
index 

Re: [PATCH] s390: Add LEN_LOAD/LEN_STORE support.

2023-02-27 Thread Andreas Krebbel via Gcc-patches
On 2/27/23 11:13, Robin Dapp wrote:
>> Do you really need a copy of the address register? Couldn't you just do a
>> src = adjust_address (operands[1], BLKmode, 0);
>> You create a paradoxical subreg of the QImode input but vll actually
>> uses the whole 32 bit value. Couldn't we end up with uninitialized
>> bytes being used as part of the length then? Do we need a zero-extend
>> here?
> 
> v2 attached with these problems addressed.
> 
> Testsuite and bootstrap as before.

Ok. Thanks!

Andreas




Re: [PATCH 2/2, GCC12] AArch64: Gate various crypto intrinsics availability based on features

2023-02-27 Thread Richard Sandiford via Gcc-patches
Tejas Belagod  writes:
> The 64-bit variant of PMULL{2} and AES instructions are available if FEAT_AES
> is implemented according to the Arm ARM [1].  Similarly FEAT_SHA1 and
> FEAT_SHA256 enable the use of SHA1 and SHA256 instruction variants.
> This patch fixes arm_neon.h to correctly reflect the feature availability 
> based
> on '+aes' and '+sha2' as opposed to the ambiguous catch-all '+crypto'.
>
> [1] Section D17.2.61, C7.2.215
>
> 2022-01-11  Tejas Belagod  
>
> gcc/ChangeLog:
>
>   * config/aarch64/arm_neon.h (vmull_p64, vmull_high_p64, vaeseq_u8,
>   vaesdq_u8, vaesmcq_u8, vaesimcq_u8): Gate under "nothing+aes".
>   (vsha1*_u32, vsha256*_u32): Gate under "nothing+sha2".
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/acle/pmull64.c: New.
>   * gcc.target/aarch64/aes-fuse-1.c: Replace '+crypto' with corresponding
>   feature flag based on the intrinsic.
>   * gcc.target/aarch64/aes-fuse-2.c: Likewise.
>   * gcc.target/aarch64/aes_1.c: Likewise.
>   * gcc.target/aarch64/aes_2.c: Likewise.
>   * gcc.target/aarch64/aes_xor_combine.c: Likewise.
>   * gcc.target/aarch64/sha1_1.c: Likewise.
>   * gcc.target/aarch64/sha256_1.c: Likewise.
>   * gcc.target/aarch64/target_attr_crypto_ice_1.c: Likewise.

OK to backport, thanks.

Richard

> ---
>  gcc/config/aarch64/arm_neon.h | 35 ++-
>  .../gcc.target/aarch64/acle/pmull64.c | 14 
>  gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c |  4 +--
>  gcc/testsuite/gcc.target/aarch64/aes-fuse-2.c |  4 +--
>  gcc/testsuite/gcc.target/aarch64/aes_1.c  |  2 +-
>  gcc/testsuite/gcc.target/aarch64/aes_2.c  |  4 ++-
>  .../gcc.target/aarch64/aes_xor_combine.c  |  2 +-
>  gcc/testsuite/gcc.target/aarch64/sha1_1.c |  2 +-
>  gcc/testsuite/gcc.target/aarch64/sha256_1.c   |  2 +-
>  .../aarch64/target_attr_crypto_ice_1.c|  2 +-
>  10 files changed, 44 insertions(+), 27 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
>
> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 85d03c58d2a..695aafd9a5e 100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -10243,7 +10243,7 @@ vqrdmlshs_laneq_s32 (int32_t __a, int32_t __b, 
> int32x4_t __c, const int __d)
>  #pragma GCC pop_options
>  
>  #pragma GCC push_options
> -#pragma GCC target ("+nothing+crypto")
> +#pragma GCC target ("+nothing+aes")
>  /* vaes  */
>  
>  __extension__ extern __inline uint8x16_t
> @@ -10273,6 +10273,22 @@ vaesimcq_u8 (uint8x16_t data)
>  {
>return __builtin_aarch64_crypto_aesimcv16qi_uu (data);
>  }
> +
> +__extension__ extern __inline poly128_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vmull_p64 (poly64_t __a, poly64_t __b)
> +{
> +  return
> +__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
> +}
> +
> +__extension__ extern __inline poly128_t
> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> +vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
> +{
> +  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
> +}
> +
>  #pragma GCC pop_options
>  
>  /* vcage  */
> @@ -23519,7 +23535,7 @@ vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int 
> __c)
>  }
>  
>  #pragma GCC push_options
> -#pragma GCC target ("+nothing+crypto")
> +#pragma GCC target ("+nothing+sha2")
>  
>  /* vsha1  */
>  
> @@ -23596,21 +23612,6 @@ vsha256su1q_u32 (uint32x4_t __tw0_3, uint32x4_t 
> __w8_11, uint32x4_t __w12_15)
>  __w12_15);
>  }
>  
> -__extension__ extern __inline poly128_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -vmull_p64 (poly64_t __a, poly64_t __b)
> -{
> -  return
> -__builtin_aarch64_crypto_pmulldi_ppp (__a, __b);
> -}
> -
> -__extension__ extern __inline poly128_t
> -__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -vmull_high_p64 (poly64x2_t __a, poly64x2_t __b)
> -{
> -  return __builtin_aarch64_crypto_pmullv2di_ppp (__a, __b);
> -}
> -
>  #pragma GCC pop_options
>  
>  /* vshl */
> diff --git a/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c 
> b/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
> new file mode 100644
> index 000..6a1e99e2d0d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/acle/pmull64.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-march=armv8.2-a" } */
> +
> +#pragma push_options
> +#pragma GCC target ("+aes")
> +
> +#include "arm_neon.h"
> +
> +int foo (poly64_t a, poly64_t b)
> +{
> +  return vgetq_lane_s32 (vreinterpretq_s32_p128 (vmull_p64 (a, b)), 0);
> +}
> +
> +/* { dg-final { scan-assembler "\tpmull\tv" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c 
> b/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
> index d7b4f89919d..1b4e10f78db 100644
> --- a/gcc/testsuite/gcc.target/aarch64/aes-fuse-1.c
> +++ 

Re: [PATCH 1/2, GCC12] AArch64: Update transitive closures of aes, sha2 and sha3 extensions

2023-02-27 Thread Richard Sandiford via Gcc-patches
Tejas Belagod  writes:
> Transitive closures of architectural extensions have to be manually maintained
> for AARCH64_OPT_EXTENSION list.  Currently aes, sha2 and sha3 extensions add
> AARCH64_FL_SIMD as their dependency - this does not automatically pull in the
> transitive dependence of AARCH64_FL_FP from AARCH64_FL_SIMD's definition.  As
> described, the transitive closure/dependence has to be maintained manually.
> This patch adds AARCH64_FL_FP to each of these crypto extensions' dependence
> set.  Automatic transitive closure maintenance is fixed on trunk in commit
> 11a113d501ff64fa4843e28d0a21b3f4e9d0d3de.
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-option-extensions.def (aes, sha2, sha3):
>   Update AARCH64_OPT_EXTENSION definition of architectural dependence for
>   definition of aes, sha2 and sha3 with AARCH64_FL_FP.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-option-extensions.def | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
> b/gcc/config/aarch64/aarch64-option-extensions.def
> index b4d0ac8b600..88cefc20022 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -118,19 +118,19 @@ AARCH64_OPT_EXTENSION("dotprod", AARCH64_FL_DOTPROD, 
> AARCH64_FL_SIMD, 0, \
>  
>  /* Enabling "aes" also enables "simd".
> Disabling "aes" disables "aes" and "sve2-aes'.  */
> -AARCH64_OPT_EXTENSION("aes", AARCH64_FL_AES, AARCH64_FL_SIMD, \
> -   AARCH64_FL_SVE2_AES, false, "aes")
> +AARCH64_OPT_EXTENSION("aes", AARCH64_FL_AES, AARCH64_FL_SIMD | \
> +   AARCH64_FL_FP, AARCH64_FL_SVE2_AES, false, "aes")
>  
>  /* Enabling "sha2" also enables "simd".
> Disabling "sha2" just disables "sha2".  */
> -AARCH64_OPT_EXTENSION("sha2", AARCH64_FL_SHA2, AARCH64_FL_SIMD, 0, false, \
> -   "sha1 sha2")
> +AARCH64_OPT_EXTENSION("sha2", AARCH64_FL_SHA2, AARCH64_FL_SIMD | \
> +   AARCH64_FL_FP, 0, false, "sha1 sha2")
>  
>  /* Enabling "sha3" enables "simd" and "sha2".
> Disabling "sha3" disables "sha3" and "sve2-sha3".  */
>  AARCH64_OPT_EXTENSION("sha3", AARCH64_FL_SHA3, AARCH64_FL_SIMD | \
> -   AARCH64_FL_SHA2, AARCH64_FL_SVE2_SHA3, false, \
> -   "sha3 sha512")
> +   AARCH64_FL_SHA2 | AARCH64_FL_FP, AARCH64_FL_SVE2_SHA3, \
> +   false, "sha3 sha512")
>  
>  /* Enabling "sm4" also enables "simd".
> Disabling "sm4" disables "sm4" and "sve2-sm4".  */


[Patch,v3] Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings

2023-02-27 Thread Tobias Burnus

And another re-diff for GCC 13/mainline, updating gcc/testsuite/

(The last change is related to the "[OG12,committed] Update dg-dump-scan
for ..." discussion + OG12 https://gcc.gnu.org/g:e4de87a2309 /
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612871.html )

On 23.02.23 17:42, Tobias Burnus wrote:

On 21.02.23 12:57, Tobias Burnus wrote:

This patch moves some generic code for Fortran out of gimplify.cc
to trans-openmp.cc and fixes several issues related to mapping.

Tested with nvptx offloading.
OK for mainline?

Tobias

Caveats:

Besides the issues shown in the commented-out code, there also remains an
issue with implicit mapping - at least for deferred-length strings,
but I wouldn't be surprised if - at least depending on the used
'defaultmap' value (e.g. 'alloc') - there are also issues with array
descriptors.

Note:

Regarding the declare target check for mapping: Without declare
target, my assumption is that the hidden length variable will
get implicitly mapped if needed. Independent of deferred-length
or not, there is probably an issue with 'defaultmap(none)' and
the hidden variable. - In any case, I prefer to defer all those
issues to later (by having them captured in one/several PR).


Tobias

PS: This patch is a follow up to
  [Patch] Fortran/OpenMP: Fix DT struct-component with 'alloc' and
array descr
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/604887.html
which fixed part of the problems. But as discussed on IRC, it did
treat 'alloc'
as special and missed some other map types. - In addition, this patch
has a
much extended test coverage and fixes some more issues found that way.

Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings

Previously, array descriptors might have been mapped as 'alloc'
instead of 'to' for 'alloc', not updating the array bounds. The
'alloc' could also appear for 'data exit', failing with a libgomp
assert. In some cases, either array descriptors or deferred-length
string's length variable was not mapped. And, finally, some offset
calculations with array-sections mappings went wrong.

The testcases contain some commented-out tests which require follow-up
work and for which PRs exist. Those mostly relate to deferred-length
strings, which have several issues beyond OpenMP support.

gcc/fortran/ChangeLog:

	* trans-decl.cc (gfc_get_symbol_decl): Add attributes
	such as 'declare target' also to hidden artificial
	variable for deferred-length character variables.
	* trans-openmp.cc (gfc_trans_omp_array_section,
	gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data):
	Improve mapping of array descriptors and deferred-length
	string variables.

gcc/ChangeLog:

	* gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran
	special case.

libgomp/ChangeLog:

	* testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment
	'target exit data'.
	* testsuite/libgomp.fortran/target-enter-data-4.f90: New test.
	* testsuite/libgomp.fortran/target-enter-data-5.f90: New test.
	* testsuite/libgomp.fortran/target-enter-data-6.f90: New test.
	* testsuite/libgomp.fortran/target-enter-data-7.f90: New test.

gcc/testsuite/
	* gfortran.dg/goacc/finalize-1.f: Update dg-tree; shows a fix
	for 'finalize' as a ptr is now 'delete' instead of 'release'.
	* gfortran.dg/gomp/pr78260-2.f90: Likewise as elem-size calc moved
	to if (allocated) block
	* gfortran.dg/gomp/target-exit-data.f90: Likewise as a var is now a
	replaced by a MEM< _25 > expression.

 gcc/fortran/trans-decl.cc  |   2 +
 gcc/fortran/trans-openmp.cc| 323 
 gcc/gimplify.cc|  43 +-
 gcc/testsuite/gfortran.dg/goacc/finalize-1.f   |   4 +-
 gcc/testsuite/gfortran.dg/gomp/pr78260-2.f90   |   6 +-
 .../gfortran.dg/gomp/target-exit-data.f90  |   4 +-
 .../libgomp.fortran/target-enter-data-3.f90|   2 +-
 .../libgomp.fortran/target-enter-data-4.f90| 540 +
 .../libgomp.fortran/target-enter-data-5.f90| 540 +
 .../libgomp.fortran/target-enter-data-6.f90| 392 +++
 .../libgomp.fortran/target-enter-data-7.f90|  78 +++
 11 files changed, 1792 insertions(+), 142 deletions(-)

diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index 474920966ec..c12eedfff1b 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -1824,6 +1824,8 @@ gfc_get_symbol_decl (gfc_symbol * sym)
   /* Add attributes to variables.  Functions are handled elsewhere.  */
   attributes = add_attributes_to_decl (sym->attr, NULL_TREE);
   decl_attributes (, attributes, 0);
+  if (sym->ts.deferred)
+decl_attributes (, attributes, 0);
 
   /* Symbols 

RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-27 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Monday, February 27, 2023 12:12 PM
> To: Tamar Christina 
> Cc: Tamar Christina via Gcc-patches ; nd
> ; rguent...@suse.de; j...@ventanamicro.com
> Subject: Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask
> by using new optabs [PR108583]
> 
> Tamar Christina  writes:
> > Hi,
> >
> >> > I avoided open coding it with add and shift because it creates a 4
> >> > instructions (and shifts which are typically slow) dependency chain
> >> > instead of a load and multiply.  This change, unless the target is
> >> > known to optimize it further is unlikely to be beneficial.  And by
> >> > the time we get to costing the only alternative is to undo the
> >> > existing pattern and
> >> so you lose the general shift optimization.
> >> >
> >> > So it seemed unwise to open code as shifts, given the codegen out
> >> > of the vectorizer would be degenerate for most targets or one needs
> >> > the more complicated route of costing during pattern matching already.
> >>
> >> Hmm, OK.  That seems like a cost-model thing though, rather than
> >> something that should be exposed through optabs.  And I imagine the
> >> open-coded version would still be better than nothing on targets without
> highpart multiply.
> >>
> >> So how about replacing the hook with one that simply asks whether
> >> division through highpart multiplication is preferred over the add/shift
> sequence?
> >> (Unfortunately it's not going to be possible to work that out from
> >> existing
> >> information.)
> >
> > So this doesn't work for SVE.  For SVE the multiplication widening
> > pass introduces FMAs at gimple level.  So in the cases where the
> > operation is fed from a widening multiplication we end up generating FMA.
> If that was it I could have matched FMA.
> >
> > But it also pushes the multiplication in the second operand because it
> > no longer has a mul to share the results with.
> >
> > In any case, the gimple code is transformed into
> >
> > vect__3.8_122 = .MASK_LOAD (_29, 8B, loop_mask_121);
> > vect_patt_57.9_123 = (vector([8,8]) unsigned short) vect__3.8_122;
> > vect_patt_64.11_127 = .FMA (vect_patt_57.9_123, vect_cst__124, { 257,
> > ... });
> > vect_patt_65.12_128 = vect_patt_64.11_127 >> 8;
> > vect_patt_66.13_129 = .FMA (vect_patt_57.9_123, vect_cst__124,
> > vect_patt_65.12_128);
> > vect_patt_62.14_130 = vect_patt_66.13_129 >> 8;
> > vect_patt_68.15_131 = (vector([8,8]) unsigned charD.21)
> > vect_patt_62.14_130;
> >
> > This transformation is much worse than the original code, it extended
> > the dependency chain with another expensive instruction. I can try to
> > correct this in RTL by matching FMA and shift and splitting into MUL +
> ADDHNB and hope CSE takes care of the extra mul.
> >
> > But this seems like a hack, and it's basically undoing the earlier
> > transformation.  It seems to me that the open coding is a bad idea.
> 
> Could you post the patch that gives this result?  I'll have a poke around.

Sure, I'll post the new series, it needs all of them.

Tamar.

> 
> Thanks,
> Richard
> 
> > Do you still want it Richard?
> >
> > Thanks,
> > Tamar
> >>
> >> Thanks,
> >> Richard
> >>
> >> >
> >> >>
> >> >> Some comments in addition to Richard's:
> >> >>
> >> >> Tamar Christina via Gcc-patches  writes:
> >> >> > Hi All,
> >> >> >
> >> >> > As discussed in the ticket, this replaces the approach for
> >> >> > optimizing the div by bitmask operation from a hook into optabs
> >> >> > implemented through add_highpart.
> >> >> >
> >> >> > In order to be able to use this we need to check whether the
> >> >> > current precision has enough bits to do the operation without
> >> >> > any of the additions
> >> >> overflowing.
> >> >> >
> >> >> > We use range information to determine this and only do the
> >> >> > operation if we're sure am overflow won't occur.
> >> >> >
> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and 
> >> >> no issues.
> >> >> >
> >> >> > Ok for master?
> >> >> >
> >> >> > Thanks,
> >> >> > Tamar
> >> >> >
> >> >> > gcc/ChangeLog:
> >> >> >
> >> >> >   PR target/108583
> >> >> >   * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST):
> >> >> Remove.
> >> >> >   * doc/tm.texi.in: Likewise.
> >> >> >   * explow.cc (round_push, align_dynamic_address): Revert
> >> >> > previous
> >> >> patch.
> >> >> >   * expmed.cc (expand_divmod): Likewise.
> >> >> >   * expmed.h (expand_divmod): Likewise.
> >> >> >   * expr.cc (force_operand, expand_expr_divmod): Likewise.
> >> >> >   * optabs.cc (expand_doubleword_mod,
> >> >> expand_doubleword_divmod): Likewise.
> >> >> >   * internal-fn.def (ADDH): New.
> >> >> >   * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
> >> >> >   * doc/md.texi: Document them.
> >> >> >   * doc/rtl.texi: Likewise.
> >> >> >   * target.def (can_special_div_by_const): Remove.
> >> >> >   * target.h: Remove tree-core.h include
> >> >> >   * 

Re: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-27 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi,
>
>> > I avoided open coding it with add and shift because it creates a 4
>> > instructions (and shifts which are typically slow) dependency chain
>> > instead of a load and multiply.  This change, unless the target is
>> > known to optimize it further is unlikely to be beneficial.  And by the
>> > time we get to costing the only alternative is to undo the existing 
>> > pattern and
>> so you lose the general shift optimization.
>> >
>> > So it seemed unwise to open code as shifts, given the codegen out of
>> > the vectorizer would be degenerate for most targets or one needs the
>> > more complicated route of costing during pattern matching already.
>> 
>> Hmm, OK.  That seems like a cost-model thing though, rather than something
>> that should be exposed through optabs.  And I imagine the open-coded
>> version would still be better than nothing on targets without highpart 
>> multiply.
>> 
>> So how about replacing the hook with one that simply asks whether division
>> through highpart multiplication is preferred over the add/shift sequence?
>> (Unfortunately it's not going to be possible to work that out from existing
>> information.)
>
> So this doesn't work for SVE.  For SVE the multiplication widening pass 
> introduces
> FMAs at gimple level.  So in the cases where the operation is fed from a 
> widening
> multiplication we end up generating FMA.  If that was it I could have matched 
> FMA.
>
> But it also pushes the multiplication in the second operand because it no 
> longer has
> a mul to share the results with.
>
> In any case, the gimple code is transformed into
>
> vect__3.8_122 = .MASK_LOAD (_29, 8B, loop_mask_121);
> vect_patt_57.9_123 = (vector([8,8]) unsigned short) vect__3.8_122;
> vect_patt_64.11_127 = .FMA (vect_patt_57.9_123, vect_cst__124, { 257, ... });
> vect_patt_65.12_128 = vect_patt_64.11_127 >> 8;
> vect_patt_66.13_129 = .FMA (vect_patt_57.9_123, vect_cst__124, 
> vect_patt_65.12_128);
> vect_patt_62.14_130 = vect_patt_66.13_129 >> 8;
> vect_patt_68.15_131 = (vector([8,8]) unsigned charD.21) vect_patt_62.14_130;
>
> This transformation is much worse than the original code, it extended the 
> dependency
> chain with another expensive instruction. I can try to correct this in RTL by 
> matching
> FMA and shift and splitting into MUL + ADDHNB and hope CSE takes care of the 
> extra mul.
>
> But this seems like a hack, and it's basically undoing the earlier 
> transformation.  It seems to
> me that the open coding is a bad idea.

Could you post the patch that gives this result?  I'll have a poke around.

Thanks,
Richard

> Do you still want it Richard?
>
> Thanks,
> Tamar
>> 
>> Thanks,
>> Richard
>> 
>> >
>> >>
>> >> Some comments in addition to Richard's:
>> >>
>> >> Tamar Christina via Gcc-patches  writes:
>> >> > Hi All,
>> >> >
>> >> > As discussed in the ticket, this replaces the approach for
>> >> > optimizing the div by bitmask operation from a hook into optabs
>> >> > implemented through add_highpart.
>> >> >
>> >> > In order to be able to use this we need to check whether the
>> >> > current precision has enough bits to do the operation without any
>> >> > of the additions
>> >> overflowing.
>> >> >
>> >> > We use range information to determine this and only do the
>> >> > operation if we're sure am overflow won't occur.
>> >> >
>> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and 
>> >> no issues.
>> >> >
>> >> > Ok for master?
>> >> >
>> >> > Thanks,
>> >> > Tamar
>> >> >
>> >> > gcc/ChangeLog:
>> >> >
>> >> > PR target/108583
>> >> > * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST):
>> >> Remove.
>> >> > * doc/tm.texi.in: Likewise.
>> >> > * explow.cc (round_push, align_dynamic_address): Revert previous
>> >> patch.
>> >> > * expmed.cc (expand_divmod): Likewise.
>> >> > * expmed.h (expand_divmod): Likewise.
>> >> > * expr.cc (force_operand, expand_expr_divmod): Likewise.
>> >> > * optabs.cc (expand_doubleword_mod,
>> >> expand_doubleword_divmod): Likewise.
>> >> > * internal-fn.def (ADDH): New.
>> >> > * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
>> >> > * doc/md.texi: Document them.
>> >> > * doc/rtl.texi: Likewise.
>> >> > * target.def (can_special_div_by_const): Remove.
>> >> > * target.h: Remove tree-core.h include
>> >> > * targhooks.cc (default_can_special_div_by_const): Remove.
>> >> > * targhooks.h (default_can_special_div_by_const): Remove.
>> >> > * tree-vect-generic.cc (expand_vector_operation): Remove hook.
>> >> > * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook
>> >> and
>> >> > implement new obtab recognition based on range.
>> >> > * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
>> >> >
>> >> > gcc/testsuite/ChangeLog:
>> >> >
>> >> > PR target/108583
>> >> > * 

Re: [PATCH] xtensa: Make use of CLAMPS instruction if configured

2023-02-27 Thread Max Filippov via Gcc-patches
On Sun, Feb 26, 2023 at 9:27 AM Takayuki 'January June' Suwa
 wrote:
>
> This patch introduces the use of CLAMPS instruction when the instruction
> is configured.
>
> /* example */
> int test(int a) {
>   if (a < -512)
> return -512;
>   if (a > 511)
> return 511;
>   return a;
> }
>
> ;; prereq: TARGET_CLAMPS
> test:
> clamps  a2, a2, 9
> ret.n
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa-protos.h (xtensa_match_CLAMPS_imms_p):
> New prototype.
> * config/xtensa/xtensa.cc (xtensa_match_CLAMPS_imms_p):
> New function.
> * config/xtensa/xtensa.h (TARGET_CLAMPS): New macro definition.
> * config/xtensa/xtensa.md (*xtensa_clamps): New insn pattern.
> ---
>  gcc/config/xtensa/xtensa-protos.h |  1 +
>  gcc/config/xtensa/xtensa.cc   | 13 +++
>  gcc/config/xtensa/xtensa.h|  1 +
>  gcc/config/xtensa/xtensa.md   | 37 +++
>  4 files changed, 52 insertions(+)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


[PATCH] gcc: xtensa: add XCHAL_HAVE_{CLAMPS, DEPBITS, EXCLUSIVE, XEA3} to dynconfig

2023-02-27 Thread Max Filippov via Gcc-patches
gcc/
* config/xtensa/xtensa-dynconfig.cc (xtensa_get_config_v2)
(xtensa_get_config_v3): New functions.

include/
* xtensa-dynconfig.h (xtensa_config_v3): New struct.
(xtensa_get_config_v3): New declaration.
(XCHAL_HAVE_CLAMPS, XCHAL_HAVE_DEPBITS, XCHAL_HAVE_EXCLUSIVE)
(XCHAL_HAVE_XEA3, XTENSA_CONFIG_V3_ENTRY_LIST): New definitions.
(XTENSA_CONFIG_INSTANCE_LIST): Add xtensa_config_v3 instance.
(XTENSA_CONFIG_ENTRY_LIST): Add XTENSA_CONFIG_V3_ENTRY_LIST.
---
 gcc/config/xtensa/xtensa-dynconfig.cc | 24 +
 include/xtensa-dynconfig.h| 50 ++-
 2 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/gcc/config/xtensa/xtensa-dynconfig.cc 
b/gcc/config/xtensa/xtensa-dynconfig.cc
index db8ff43c498b..9aea9f253c25 100644
--- a/gcc/config/xtensa/xtensa-dynconfig.cc
+++ b/gcc/config/xtensa/xtensa-dynconfig.cc
@@ -158,6 +158,30 @@ const struct xtensa_config_v1 *xtensa_get_config_v1 (void)
   return config;
 }
 
+const struct xtensa_config_v2 *xtensa_get_config_v2 (void)
+{
+  static const struct xtensa_config_v2 *config;
+  static struct xtensa_config_v2 def;
+
+  if (!config)
+    config = (const struct xtensa_config_v2 *) xtensa_load_config ("xtensa_config_v2",
+                                                                   &xtensa_config_v2,
+                                                                   &def);
+  return config;
+}
+
+const struct xtensa_config_v3 *xtensa_get_config_v3 (void)
+{
+  static const struct xtensa_config_v3 *config;
+  static struct xtensa_config_v3 def;
+
+  if (!config)
+    config = (const struct xtensa_config_v3 *) xtensa_load_config ("xtensa_config_v3",
+                                                                   &xtensa_config_v3,
+                                                                   &def);
+  return config;
+}
+
 const char * const *xtensa_get_config_strings (void)
 {
   static const char * const *config_strings;
diff --git a/include/xtensa-dynconfig.h b/include/xtensa-dynconfig.h
index bb72d6ab22d7..2cc15cc99112 100644
--- a/include/xtensa-dynconfig.h
+++ b/include/xtensa-dynconfig.h
@@ -104,6 +104,14 @@ struct xtensa_config_v2
   int xtensa_march_earliest;
 };
 
+struct xtensa_config_v3
+{
+  int xchal_have_clamps;
+  int xchal_have_depbits;
+  int xchal_have_exclusive;
+  int xchal_have_xea3;
+};
+
 typedef struct xtensa_isa_internal_struct xtensa_isa_internal;
 
 extern const void *xtensa_load_config (const char *name,
@@ -111,6 +119,7 @@ extern const void *xtensa_load_config (const char *name,
   const void *no_name_def);
 extern const struct xtensa_config_v1 *xtensa_get_config_v1 (void);
 extern const struct xtensa_config_v2 *xtensa_get_config_v2 (void);
+extern const struct xtensa_config_v3 *xtensa_get_config_v3 (void);
 
 #ifdef XTENSA_CONFIG_DEFINITION
 
@@ -182,6 +191,22 @@ extern const struct xtensa_config_v2 *xtensa_get_config_v2 
(void);
 #define XTENSA_MARCH_EARLIEST 0
 #endif
 
+#ifndef XCHAL_HAVE_CLAMPS
+#define XCHAL_HAVE_CLAMPS 0
+#endif
+
+#ifndef XCHAL_HAVE_DEPBITS
+#define XCHAL_HAVE_DEPBITS 0
+#endif
+
+#ifndef XCHAL_HAVE_EXCLUSIVE
+#define XCHAL_HAVE_EXCLUSIVE 0
+#endif
+
+#ifndef XCHAL_HAVE_XEA3
+#define XCHAL_HAVE_XEA3 0
+#endif
+
 #define XTENSA_CONFIG_ENTRY(a) a
 
 #define XTENSA_CONFIG_V1_ENTRY_LIST \
@@ -245,17 +270,27 @@ extern const struct xtensa_config_v2 
*xtensa_get_config_v2 (void);
 XTENSA_CONFIG_ENTRY(XTENSA_MARCH_LATEST), \
 XTENSA_CONFIG_ENTRY(XTENSA_MARCH_EARLIEST)
 
+#define XTENSA_CONFIG_V3_ENTRY_LIST \
+XTENSA_CONFIG_ENTRY(XCHAL_HAVE_CLAMPS), \
+XTENSA_CONFIG_ENTRY(XCHAL_HAVE_DEPBITS), \
+XTENSA_CONFIG_ENTRY(XCHAL_HAVE_EXCLUSIVE), \
+XTENSA_CONFIG_ENTRY(XCHAL_HAVE_XEA3)
+
 #define XTENSA_CONFIG_INSTANCE_LIST \
 const struct xtensa_config_v1 xtensa_config_v1 = { \
 XTENSA_CONFIG_V1_ENTRY_LIST, \
 }; \
 const struct xtensa_config_v2 xtensa_config_v2 = { \
 XTENSA_CONFIG_V2_ENTRY_LIST, \
+}; \
+const struct xtensa_config_v3 xtensa_config_v3 = { \
+XTENSA_CONFIG_V3_ENTRY_LIST, \
 }
 
 #define XTENSA_CONFIG_ENTRY_LIST \
 XTENSA_CONFIG_V1_ENTRY_LIST, \
-XTENSA_CONFIG_V2_ENTRY_LIST
+XTENSA_CONFIG_V2_ENTRY_LIST, \
+XTENSA_CONFIG_V3_ENTRY_LIST
 
 #else /* XTENSA_CONFIG_DEFINITION */
 
@@ -434,6 +469,19 @@ const struct xtensa_config_v2 xtensa_config_v2 = { \
 #undef XTENSA_MARCH_EARLIEST
 #define XTENSA_MARCH_EARLIEST  (xtensa_get_config_v2 
()->xtensa_march_earliest)
 
+
+#undef XCHAL_HAVE_CLAMPS
+#define XCHAL_HAVE_CLAMPS  (xtensa_get_config_v3 
()->xchal_have_clamps)
+
+#undef XCHAL_HAVE_DEPBITS
+#define XCHAL_HAVE_DEPBITS (xtensa_get_config_v3 
()->xchal_have_depbits)
+
+#undef XCHAL_HAVE_EXCLUSIVE
+#define XCHAL_HAVE_EXCLUSIVE   (xtensa_get_config_v3 
()->xchal_have_exclusive)
+
+#undef XCHAL_HAVE_XEA3
+#define XCHAL_HAVE_XEA3(xtensa_get_config_v3 
()->xchal_have_xea3)
+
 

[OG12,committed] Update dg-dump-scan for ... (was: [Patch] Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings)

2023-02-27 Thread Tobias Burnus

Hi Thomas,

On 25.02.23 10:11, Thomas Schwinge wrote:

Do the scan patterns need adjusting, or is something wrong?


The former. Regarding:

* gfortran.dg/goacc/finalize-1.f - for 'acc exit data':
!$ACC EXIT DATA FINALIZE DELETE (del_f_p(2:5))

  Here, 'map\\(to:del_f_p [pointer set]' changed to 'map(release:del_f_p'
  By itself, this is handled identically in libgomp. However, there is also
  a 'finalize' - which changes the original dump's 'release' to 'delete' in the
  gimple dump. 'delete' is handled differently in terms of the refcount.
  → I believe the patch actually fixed an OpenACC issue by also force-unmapping
  the descriptor and not only the pointer target.

* gfortran.dg/gomp/pr78260-2.f90
  Here, the '* 4' multiplication moved from the expression shown in the 'to'
  clause to the expression evaluation.
  Reasons: To work properly with deferred-length strings (first, to take the
  current value and not some saved expr and to avoid issues when the var is
  unallocated - especially with absent 'optional' variables.)

Side remark: On OG12 there are too many FAIL, compared to mainline. That does
not have anything to do with the items above - but it still makes working
with OG12 harder. I hope that OG13 will have fewer fails.

Tobias

PS: --word-diff for 'finalize-1.f':
original: [-map\\(to:del_f_p \\\[pointer set, len:-]{+map\\(release:del_f_p 
\\\[len:+}
gimple: [-map\\(to:del_f_p \\\[pointer set, len:-]{+map\\(delete:del_f_p 
\\\[len:+}

and for pr78260-2.f90, the "len: " changed (twice) as in
  [-D.\[0-9\]+ \\* 4\\\]\\)-]{+D.\[0-9\]+\\\]\\)+}
and there is now additionally the following to ensure the '* 4' is not lost:
! { dg-final { scan-tree-dump-times "D\\.\[0-9\]+ = D\\.\[0-9\]+ \\* 4;" 2 
"original" } }
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955
commit e4de87a2309bb6eecc9d4a391e4454b6e51289c3
Author: Tobias Burnus 
Date:   Mon Feb 27 12:47:54 2023 +0100

Update dg-dump-scan for "Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"

Follow-up to commit 55a18d4744258e3909568e425f9f473c49f9d13f
"Fortran/OpenMP: Fix mapping of array descriptors and deferred-length strings"
updating the dumps.
* For the goacc testcase, 'to' changed to 'release' and, due to 'finalize', then
  to 'delete', which can be regarded as a bugfix.
* For pr78260-2.f90, the calculation moved inside the 'if(...->data == NULL)'
  block to handle deferred-length string vars better, esp. when 'optional'.

gcc/testsuite/:
* gfortran.dg/goacc/finalize-1.f: Update scan-tree-dump-times for
mapping changes.
* gfortran.dg/gomp/pr78260-2.f90: Likewise.
---
 gcc/testsuite/ChangeLog.omp  | 6 ++
 gcc/testsuite/gfortran.dg/goacc/finalize-1.f | 4 ++--
 gcc/testsuite/gfortran.dg/gomp/pr78260-2.f90 | 6 --
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/ChangeLog.omp b/gcc/testsuite/ChangeLog.omp
index 98e41687633..d4217ccc71f 100644
--- a/gcc/testsuite/ChangeLog.omp
+++ b/gcc/testsuite/ChangeLog.omp
@@ -1,3 +1,9 @@
+2023-02-27  Tobias Burnus  
+
+	* gfortran.dg/goacc/finalize-1.f: Update scan-tree-dump-times for
+	mapping changes.
+	* gfortran.dg/gomp/pr78260-2.f90: Likewise.
+
 2023-02-23  Andrew Stubbs  
 
 	Backport from mainline:
diff --git a/gcc/testsuite/gfortran.dg/goacc/finalize-1.f b/gcc/testsuite/gfortran.dg/goacc/finalize-1.f
index 1e5bf0ba1e6..23f0ffc627e 100644
--- a/gcc/testsuite/gfortran.dg/goacc/finalize-1.f
+++ b/gcc/testsuite/gfortran.dg/goacc/finalize-1.f
@@ -20,8 +20,8 @@
 ! { dg-final { scan-tree-dump-times "(?n)#pragma omp target oacc_exit_data map\\(delete:del_f \\\[len: \[0-9\]+\\\]\\) finalize$" 1 "gimple" } }
 
 !$ACC EXIT DATA FINALIZE DELETE (del_f_p(2:5))
-! { dg-final { scan-tree-dump-times "(?n)#pragma acc exit data map\\(release:\\*\\(integer\\(kind=.\\)\\\[0:\\\] \\*\\) parm\\.0\\.data \\\[len: \[^\\\]\]+\\\]\\) map\\(to:del_f_p \\\[pointer set, len: \[0-9\]+\\\]\\) map\\(alloc:\\(integer\\(kind=1\\)\\\[0:\\\] \\* restrict\\) del_f_p\\.data \\\[pointer assign, bias: \\(.*int.*\\) parm\\.0\\.data - \\(.*int.*\\) del_f_p\\.data\\\]\\) finalize;$" 1 "original" } }
-! { dg-final { scan-tree-dump-times "(?n)#pragma omp target oacc_exit_data map\\(delete:MEM <\[^>\]+> \\\[\\(integer\\(kind=.\\)\\\[0:\\\] \\*\\)_\[0-9\]+\\\] \\\[len: \[^\\\]\]+\\\]\\) map\\(to:del_f_p \\\[pointer set, len: \[0-9\]+\\\]\\) map\\(alloc:del_f_p\\.data \\\[pointer assign, bias: \[^\\\]\]+\\\]\\) finalize$" 1 "gimple" } }
+! { dg-final { scan-tree-dump-times "(?n)#pragma acc exit data map\\(release:\\*\\(integer\\(kind=.\\)\\\[0:\\\] \\*\\) parm\\.0\\.data \\\[len: \[^\\\]\]+\\\]\\) map\\(release:del_f_p \\\[len: \[0-9\]+\\\]\\) 

[PING 3] [PATCH] swap: Fix incorrect lane extraction by vec_extract() [PR106770]

2023-02-27 Thread Surya Kumari Jangala via Gcc-patches
Hello,

Ping https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609374.html

Thanks,

Surya

On 04/01/23 1:58 pm, Surya Kumari Jangala via Gcc-patches wrote:
> swap: Fix incorrect lane extraction by vec_extract() [PR106770]
> 
> In the routine rs6000_analyze_swaps(), special handling of swappable
> instructions is done even if the webs that contain the swappable
> instructions are not optimized, i.e., the webs do not contain any
> permuting load/store instructions along with the associated register
> swap instructions. Doing special handling in such webs will result in
> the extracted lane being adjusted unnecessarily for vec_extract.
> 
> Modifying swappable instructions is also incorrect in webs where
> loads/stores on quad word aligned addresses are changed to lvx/stvx.
> Similarly, in webs where swap(load(vector constant)) instructions are
> replaced with load(swapped vector constant), the swappable
> instructions should not be modified.
> 
> 2023-01-04  Surya Kumari Jangala  
> 
> gcc/
>   PR rtl-optimization/106770
>   * rs6000-p8swap.cc (rs6000_analyze_swaps): .
> 
> gcc/testsuite/
>   PR rtl-optimization/106770
>   * gcc.target/powerpc/pr106770.c: New test.
> ---
> 
> diff --git a/gcc/config/rs6000/rs6000-p8swap.cc 
> b/gcc/config/rs6000/rs6000-p8swap.cc
> index 19fbbfb67dc..7ed39251df9 100644
> --- a/gcc/config/rs6000/rs6000-p8swap.cc
> +++ b/gcc/config/rs6000/rs6000-p8swap.cc
> @@ -179,6 +179,9 @@ class swap_web_entry : public web_entry_base
>unsigned int special_handling : 4;
>/* Set if the web represented by this entry cannot be optimized.  */
>unsigned int web_not_optimizable : 1;
> +  /* Set if the web represented by this entry has been optimized, ie,
> + register swaps of permuting loads/stores have been removed.  */
> +  unsigned int web_is_optimized : 1;
>/* Set if this insn should be deleted.  */
>unsigned int will_delete : 1;
>  };
> @@ -2627,22 +2630,43 @@ rs6000_analyze_swaps (function *fun)
>/* For each load and store in an optimizable web (which implies
>   the loads and stores are permuting), find the associated
>   register swaps and mark them for removal.  Due to various
> - optimizations we may mark the same swap more than once.  Also
> - perform special handling for swappable insns that require it.  */
> + optimizations we may mark the same swap more than once. Fix up
> + the non-permuting loads and stores by converting them into
> + permuting ones.  */
>for (i = 0; i < e; ++i)
>  if ((insn_entry[i].is_load || insn_entry[i].is_store)
>   && insn_entry[i].is_swap)
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (!root_entry->web_not_optimizable) {
> mark_swaps_for_removal (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
>}
> -else if (insn_entry[i].is_swappable && insn_entry[i].special_handling)
> +else if (insn_entry[i].is_swappable
> + && (insn_entry[i].special_handling == SH_NOSWAP_LD ||
> + insn_entry[i].special_handling == SH_NOSWAP_ST))
> +  {
> +swap_web_entry* root_entry
> +  = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> +if (!root_entry->web_not_optimizable) {
> +  handle_special_swappables (insn_entry, i);
> +  root_entry->web_is_optimized = true;
> +}
> +  }
> +
> +  /* Perform special handling for swappable insns that require it. 
> + Note that special handling should be done only for those 
> + swappable insns that are present in webs optimized above.  */
> +  for (i = 0; i < e; ++i)
> +if (insn_entry[i].is_swappable && insn_entry[i].special_handling &&
> +!(insn_entry[i].special_handling == SH_NOSWAP_LD || 
> +  insn_entry[i].special_handling == SH_NOSWAP_ST))
>{
>   swap_web_entry* root_entry
> = (swap_web_entry*)((&insn_entry[i])->unionfind_root ());
> - if (!root_entry->web_not_optimizable)
> + if (root_entry->web_is_optimized)
> handle_special_swappables (insn_entry, i);
>}
>  
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106770.c 
> b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> new file mode 100644
> index 000..84e9aead975
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106770.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-mdejagnu-cpu=power8 -O3 " } */
> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> +
> +/* Test case to resolve PR106770  */
> +
> +#include <altivec.h>
> +
> +int cmp2(double a, double b)
> +{
> +vector double va = vec_promote(a, 1);
> +vector double vb = vec_promote(b, 1);
> +vector long long vlt = (vector long long)vec_cmplt(va, vb);
> +vector long long vgt = (vector long long)vec_cmplt(vb, va);
> +

[PATCH] rs6000: Inline lrint and lrintf

2023-02-27 Thread Ajit Agarwal via Gcc-patches
Hello All:

Here is the patch to inline lrint and lrintf. Currently glibc doesn't
use __builtin_lrint, as it inlines lrint with the fctid/fctiw instruction.
With the changes below, such inlines are not required and the lrint
builtin can be used.

Bootstrapped and regtested on powerpc64-linux-gnu.

rs6000: Inline lrint,lrintf

For hard-float powerpc, GCC should support inline code generation
for the lrint or lrintf built-in functions, subject only to
-fno-math-errno (the condition -fno-math-errno is already checked
in builtins.c:expand_builtin_int_roundingfn_2, so the back end's
lrint insn patterns do not need to check that condition).

TARGET_FPRND has nothing to do with fctid and fctiw.
Remove TARGET_FPRND from the lrint<mode>di2 pattern.

2023-02-27  Ajit Kumar Agarwal  

gcc/ChangeLog:

* config/rs6000/rs6000.md (lrint<mode>di2): Remove TARGET_FPRND
condition from pattern.
---
 gcc/config/rs6000/rs6000.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 81bffb04ceb..65c851e11fb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -6654,7 +6654,7 @@ (define_insn "lrint<mode>di2"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d")
(unspec:DI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
   UNSPEC_FCTID))]
-  "TARGET_HARD_FLOAT && TARGET_FPRND"
+  "TARGET_HARD_FLOAT"
   "fctid %0,%1"
   [(set_attr "type" "fp")])
 
-- 
2.31.1




RE: [PATCH 1/2]middle-end: Fix wrong overmatching of div-bitmask by using new optabs [PR108583]

2023-02-27 Thread Tamar Christina via Gcc-patches
Hi,

> > I avoided open coding it with add and shift because it creates a 4
> > instructions (and shifts which are typically slow) dependency chain
> > instead of a load and multiply.  This change, unless the target is
> > known to optimize it further is unlikely to be beneficial.  And by the
> > time we get to costing the only alternative is to undo the existing pattern 
> > and
> so you lose the general shift optimization.
> >
> > So it seemed unwise to open code as shifts, given the codegen out of
> > the vectorizer would be degenerate for most targets or one needs the
> > more complicated route of costing during pattern matching already.
> 
> Hmm, OK.  That seems like a cost-model thing though, rather than something
> that should be exposed through optabs.  And I imagine the open-coded
> version would still be better than nothing on targets without highpart 
> multiply.
> 
> So how about replacing the hook with one that simply asks whether division
> through highpart multiplication is preferred over the add/shift sequence?
> (Unfortunately it's not going to be possible to work that out from existing
> information.)

So this doesn't work for SVE.  For SVE the multiplication widening pass 
introduces
FMAs at gimple level.  So in the cases where the operation is fed from a 
widening
multiplication we end up generating FMA.  If that was it I could have matched 
FMA.

But it also pushes the multiplication into the second operand because it no
longer has
a mul to share the results with.

In any case, the gimple code is transformed into

vect__3.8_122 = .MASK_LOAD (_29, 8B, loop_mask_121);
vect_patt_57.9_123 = (vector([8,8]) unsigned short) vect__3.8_122;
vect_patt_64.11_127 = .FMA (vect_patt_57.9_123, vect_cst__124, { 257, ... });
vect_patt_65.12_128 = vect_patt_64.11_127 >> 8;
vect_patt_66.13_129 = .FMA (vect_patt_57.9_123, vect_cst__124, 
vect_patt_65.12_128);
vect_patt_62.14_130 = vect_patt_66.13_129 >> 8;
vect_patt_68.15_131 = (vector([8,8]) unsigned charD.21) vect_patt_62.14_130;

This transformation is much worse than the original code; it extended the
dependency
chain with another expensive instruction. I can try to correct this in RTL by 
matching
FMA and shift and splitting into MUL + ADDHNB and hope CSE takes care of the 
extra mul.

But this seems like a hack, and it's basically undoing the earlier 
transformation.  It seems to
me that the open coding is a bad idea.

Do you still want it Richard?

Thanks,
Tamar
> 
> Thanks,
> Richard
> 
> >
> >>
> >> Some comments in addition to Richard's:
> >>
> >> Tamar Christina via Gcc-patches  writes:
> >> > Hi All,
> >> >
> >> > As discussed in the ticket, this replaces the approach for
> >> > optimizing the div by bitmask operation from a hook into optabs
> >> > implemented through add_highpart.
> >> >
> >> > In order to be able to use this we need to check whether the
> >> > current precision has enough bits to do the operation without any
> >> > of the additions
> >> overflowing.
> >> >
> >> > We use range information to determine this and only do the
> >> > operation if we're sure an overflow won't occur.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >> >
> >> > Ok for master?
> >> >
> >> > Thanks,
> >> > Tamar
> >> >
> >> > gcc/ChangeLog:
> >> >
> >> >  PR target/108583
> >> >  * doc/tm.texi (TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST):
> >> Remove.
> >> >  * doc/tm.texi.in: Likewise.
> >> >  * explow.cc (round_push, align_dynamic_address): Revert previous
> >> patch.
> >> >  * expmed.cc (expand_divmod): Likewise.
> >> >  * expmed.h (expand_divmod): Likewise.
> >> >  * expr.cc (force_operand, expand_expr_divmod): Likewise.
> >> >  * optabs.cc (expand_doubleword_mod,
> >> expand_doubleword_divmod): Likewise.
> >> >  * internal-fn.def (ADDH): New.
> >> >  * optabs.def (sadd_highpart_optab, uadd_highpart_optab): New.
> >> >  * doc/md.texi: Document them.
> >> >  * doc/rtl.texi: Likewise.
> >> >  * target.def (can_special_div_by_const): Remove.
> >> >  * target.h: Remove tree-core.h include
> >> >  * targhooks.cc (default_can_special_div_by_const): Remove.
> >> >  * targhooks.h (default_can_special_div_by_const): Remove.
> >> >  * tree-vect-generic.cc (expand_vector_operation): Remove hook.
> >> >  * tree-vect-patterns.cc (vect_recog_divmod_pattern): Remove hook
> >> and
> >> >  implement new optab recognition based on range.
> >> >  * tree-vect-stmts.cc (vectorizable_operation): Remove hook.
> >> >
> >> > gcc/testsuite/ChangeLog:
> >> >
> >> >  PR target/108583
> >> >  * gcc.dg/vect/vect-div-bitmask-4.c: New test.
> >> >  * gcc.dg/vect/vect-div-bitmask-5.c: New test.
> >> >
> >> > --- inline copy of patch --
> >> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> >> >
> >>
> 7235d34c4b30949febfa10d5a626ac9358281cfa..02004c4b0f4d88dffe980f
> 74080
> >> 3
> >> > 8595e21af35d 100644
> >> > --- a/gcc/doc/md.texi
> >> > +++ b/gcc/doc/md.texi
> >> > @@ -5668,6 +5668,18 @@ represented in RTL using a
> >> 

[PATCH] RISC-V: Add permutation C/C++ support

2023-02-27 Thread juzhe . zhong
From: Ju-Zhe Zhong 

gcc/ChangeLog:

* config/riscv/riscv-protos.h (enum vlen_enum): New enum.
(slide1_sew64_helper): New function.
* config/riscv/riscv-v.cc (compute_vlmax): Ditto.
(get_unknown_min_value): Ditto.
(force_vector_length_operand): Ditto.
(gen_no_side_effects_vsetvl_rtx): Ditto.
(get_vl_x2_rtx): Ditto.
(slide1_sew64_helper): Ditto.
* config/riscv/riscv-vector-builtins-bases.cc (class slideop): New 
class.
(class vrgather): Ditto.
(class vrgatherei16): Ditto.
(class vcompress): Ditto.
(BASE): Ditto.
* config/riscv/riscv-vector-builtins-bases.h: Ditto.
* config/riscv/riscv-vector-builtins-functions.def (vslideup): Ditto.
(vslidedown): Ditto.
(vslide1up): Ditto.
(vslide1down): Ditto.
(vfslide1up): Ditto.
(vfslide1down): Ditto.
(vrgather): Ditto.
(vrgatherei16): Ditto.
(vcompress): Ditto.
* config/riscv/riscv-vector-builtins-types.def (DEF_RVV_EI16_OPS): New 
macro.
(vint8mf8_t): Ditto.
(vint8mf4_t): Ditto.
(vint8mf2_t): Ditto.
(vint8m1_t): Ditto.
(vint8m2_t): Ditto.
(vint8m4_t): Ditto.
(vint16mf4_t): Ditto.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vint32mf2_t): Ditto.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vint64m1_t): Ditto.
(vint64m2_t): Ditto.
(vint64m4_t): Ditto.
(vint64m8_t): Ditto.
(vuint8mf8_t): Ditto.
(vuint8mf4_t): Ditto.
(vuint8mf2_t): Ditto.
(vuint8m1_t): Ditto.
(vuint8m2_t): Ditto.
(vuint8m4_t): Ditto.
(vuint16mf4_t): Ditto.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vuint32mf2_t): Ditto.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
(vuint64m1_t): Ditto.
(vuint64m2_t): Ditto.
(vuint64m4_t): Ditto.
(vuint64m8_t): Ditto.
(vfloat32mf2_t): Ditto.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vfloat64m1_t): Ditto.
(vfloat64m2_t): Ditto.
(vfloat64m4_t): Ditto.
(vfloat64m8_t): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_EI16_OPS): Ditto.
* config/riscv/riscv.md: Adjust RVV instruction types.
* config/riscv/vector-iterators.md (down): New iterator.
(=vd,vr): New attribute.
(UNSPEC_VSLIDE1UP): New unspec.
* config/riscv/vector.md (@pred_slide): New pattern.
(*pred_slide): Ditto.
(*pred_slide_extended): Ditto.
(@pred_gather): Ditto.
(@pred_gather_scalar): Ditto.
(@pred_gatherei16): Ditto.
(@pred_compress): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/binop_vx_constraint-167.c: New test.
* gcc.target/riscv/rvv/base/binop_vx_constraint-168.c: New test.
* gcc.target/riscv/rvv/base/binop_vx_constraint-169.c: New test.
* gcc.target/riscv/rvv/base/binop_vx_constraint-170.c: New test.
* gcc.target/riscv/rvv/base/binop_vx_constraint-171.c: New test.
* gcc.target/riscv/rvv/base/binop_vx_constraint-172.c: New test.
* gcc.target/riscv/rvv/base/binop_vx_constraint-173.c: New test.
* gcc.target/riscv/rvv/base/binop_vx_constraint-174.c: New test.

---
 gcc/config/riscv/riscv-protos.h   |  12 +
 gcc/config/riscv/riscv-v.cc   | 171 
 .../riscv/riscv-vector-builtins-bases.cc  |  73 +
 .../riscv/riscv-vector-builtins-bases.h   |   9 +
 .../riscv/riscv-vector-builtins-functions.def |  12 +-
 .../riscv/riscv-vector-builtins-types.def |  59 
 gcc/config/riscv/riscv-vector-builtins.cc |  88 +-
 gcc/config/riscv/riscv.md |  28 +-
 gcc/config/riscv/vector-iterators.md  |  77 ++
 gcc/config/riscv/vector.md| 254 --
 .../riscv/rvv/base/binop_vx_constraint-167.c  | 143 ++
 .../riscv/rvv/base/binop_vx_constraint-168.c  | 143 ++
 .../riscv/rvv/base/binop_vx_constraint-169.c  | 163 +++
 .../riscv/rvv/base/binop_vx_constraint-170.c  | 163 +++
 .../riscv/rvv/base/binop_vx_constraint-171.c  |  75 ++
 .../riscv/rvv/base/binop_vx_constraint-172.c  |  71 +
 .../riscv/rvv/base/binop_vx_constraint-173.c  |  75 ++
 .../riscv/rvv/base/binop_vx_constraint-174.c  |  71 +
 18 files changed, 1646 insertions(+), 41 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/base/binop_vx_constraint-167.c
 

[Patch] c-c++-common/Warray-bounds.c: fix excess warnings on LLP64

2023-02-27 Thread Jonathan Yong via Gcc-patches

Attached patch OK?

Excess errors on x86_64-w64-mingw32:

/home/user/p/gcc/src/gcc-git/gcc/testsuite/c-c++-common/Warray-bounds.c:50:3: 
warning: array subscript 4611686018427387902 is above array bounds of 'struct 
S16[]' [-Warray-bounds=]

/home/user/p/gcc/src/gcc-git/gcc/testsuite/c-c++-common/Warray-bounds.c:55:3: 
warning: array subscript 4611686018427387902 is above array bounds of 'struct 
S16[]' [-Warray-bounds=]

/home/user/p/gcc/src/gcc-git/gcc/testsuite/c-c++-common/Warray-bounds.c:90:3: 
warning: array subscript 658812288346769699 is above array bounds of 'struct 
S16[][7]' [-Warray-bounds=]

gcc/testsuite/ChangeLog:

* c-c++-common/Warray-bounds.c: Fix excess warnings on
LLP64.

From 559c2ee7cf208227e8d6c4cf7106815e45b10590 Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Mon, 27 Feb 2023 10:20:52 +
Subject: [PATCH] c-c++-common/Warray-bounds.c: fix excess warnings on LLP64

Excess errors on x86_64-w64-mingw32:
/home/user/p/gcc/src/gcc-git/gcc/testsuite/c-c++-common/Warray-bounds.c:50:3: warning: array subscript 4611686018427387902 is above array bounds of 'struct S16[]' [-Warray-bounds=]
/home/user/p/gcc/src/gcc-git/gcc/testsuite/c-c++-common/Warray-bounds.c:55:3: warning: array subscript 4611686018427387902 is above array bounds of 'struct S16[]' [-Warray-bounds=]
/home/user/p/gcc/src/gcc-git/gcc/testsuite/c-c++-common/Warray-bounds.c:90:3: warning: array subscript 658812288346769699 is above array bounds of 'struct S16[][7]' [-Warray-bounds=]

gcc/testsuite/ChangeLog:

	* c-c++-common/Warray-bounds.c: Fix excess warnings on
	LLP64.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 gcc/testsuite/c-c++-common/Warray-bounds.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/Warray-bounds.c b/gcc/testsuite/c-c++-common/Warray-bounds.c
index 815badc0241..ce5827d6e2c 100644
--- a/gcc/testsuite/c-c++-common/Warray-bounds.c
+++ b/gcc/testsuite/c-c++-common/Warray-bounds.c
@@ -47,12 +47,12 @@ void farr_s16 (void)
   T (ax[-1]); /* { dg-warning "array subscript -1 is below array bounds" } */
   T (ax[0]);
 
-  T (ax[DIFF_MAX / 2 - 1]);
+  T (ax[DIFF_MAX / 2 - 1]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "llp64" { target llp64 } } */
   T (ax[DIFF_MAX / 2]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
   T (ax[DIFF_MAX / 2 + (size_t)1]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
   T (ax[SIZE_MAX]);   /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
   T (ax[R (DIFF_MIN, -1)]);   /* { dg-warning "array subscript -1 is below array bounds" } */
-  T (ax[R (DIFF_MAX / 2 - 1, DIFF_MAX)]);
+  T (ax[R (DIFF_MAX / 2 - 1, DIFF_MAX)]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "llp64" { target llp64 } } */
   T (ax[R (DIFF_MAX / 2, DIFF_MAX)]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
 }
 
@@ -87,7 +87,7 @@ void farr_s16_7 (void)
   T (ax_7[R (-1, DIFF_MAX)][0]);
 
   T (ax_7[R ( 1, DIFF_MAX)][0]);
-  T (ax_7[R (DIFF_MAX / 14 - 1, DIFF_MAX)][0]);
+  T (ax_7[R (DIFF_MAX / 14 - 1, DIFF_MAX)][0]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" "llp64" { target llp64 } } */
 
   i = R (DIFF_MAX / 14, DIFF_MAX);
   T (ax_7[i][0]); /* { dg-warning "array subscript \[0-9\]+ is above array bounds" } */
@@ -199,7 +199,7 @@ void fb (struct B *p)
 
 void f_cststring (int i)
 {
-  T (""[DIFF_MIN]);   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" { xfail lp64 } } */
+  T (""[DIFF_MIN]);   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" { xfail { lp64 || llp64 } } } */
   T (""[DIFF_MIN + 1]);   /* { dg-warning "array subscript -\[0-9\]+ is below array bounds of .(const )?char *\\\[1]" "string" } */
   T (""[-1]); /* { dg-warning "array subscript -1 is below array bounds of .(const )?char *\\\[1]" "string" } */
   T (""[0]);
-- 
2.39.2



Re: [PING] Re: [PATCH 2/2] Corrected pr25521.c target matching.

2023-02-27 Thread Cupertino Miranda via Gcc-patches


Hi Jeff,

Please, please, give me some feedback on this one.
I just don't want to have to keep asking you for time on these small
pending patches that I also have to keep track of.

I realized you committed the other one. Thank you !

Best regards,
Cupertino


Cupertino Miranda writes:

> PING !
>
> Cupertino Miranda via Gcc-patches writes:
>
>> Hi Jeff,
>>
>> Can you please confirm if the patch is Ok?
>>
>> Thanks,
>> Cupertino
>>
>>> Cupertino Miranda via Gcc-patches writes:
>>>
 Thank you for the comments and suggestions.
 I have changed the patch.

 Unfortunately, in the case of the rx target I could not make
 scan-assembler-symbol-section match. I believe it is because the
 .section and .global entry order is reversed on this target.

 Patch is inlined below. Looking forward to your comments.

 Cupertino

 diff --git a/gcc/testsuite/gcc.dg/pr25521.c 
 b/gcc/testsuite/gcc.dg/pr25521.c
 index 63363a03b9f..82b4cd88ec0 100644
 --- a/gcc/testsuite/gcc.dg/pr25521.c
 +++ b/gcc/testsuite/gcc.dg/pr25521.c
 @@ -2,9 +2,10 @@
 sections.

 { dg-require-effective-target elf }
 -   { dg-do compile } */
 +   { dg-do compile }
 +   { dg-skip-if "" { ! const_volatile_readonly_section } } */

  const volatile int foo = 30;

 -
 -/* { dg-final { scan-assembler "\\.s\?rodata" } } */
 +/* { dg-final { scan-assembler {.section C,} { target { rx-*-* } } } } */
 +/* { dg-final { scan-assembler-symbol-section {^_?foo$} 
 {^\.(const|s?rodata)} { target { ! "rx-*-*" } } } } */
 diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
 index c0694af2338..91aafd89909 100644
 --- a/gcc/testsuite/lib/target-supports.exp
 +++ b/gcc/testsuite/lib/target-supports.exp
 @@ -12295,3 +12295,13 @@ proc check_is_prog_name_available { prog } {

  return 1
  }
 +
 +# Return 1 if the target selects a read-only section for const volatile variables.
 +proc check_effective_target_const_volatile_readonly_section { } {
 +
 +if { [istarget powerpc-*-*]
 +|| [check-flags { "" { powerpc64-*-* } { -m32 } }] } {
 +  return 0
 +}
 +  return 1
 +}


 Jeff Law writes:

> On 12/7/22 08:45, Cupertino Miranda wrote:
>>
>>> On 12/2/22 10:52, Cupertino Miranda via Gcc-patches wrote:
 This commit is a follow up of bugzilla #107181.
 The commit /a0aafbc/ changed the default implementation of the
 SELECT_SECTION hook in order to match clang/llvm behaviour w.r.t the
 placement of `const volatile' objects.
 However, the following targets use target-specific selection functions
 and they choke on the testcase pr25521.c:
*rx - target sets its const variables as '.section C,"a",@progbits'.
>>> That's presumably a constant section.  We should instead twiddle the 
>>> test to
>>> recognize that section.
>> Although @progbits is indeed a constant section, I believe it is
>> more interesting to detect if the `rx' starts selecting more
>> standard sections instead of the current @progbits.
>> That was the reason why I opted to XFAIL instead of PASSing it.
>> Can I keep it as such ?
> I'm not aware of any ongoing development for that port, so I would not let
> concerns about the rx port changing behavior dominate how we approach this
> problem.
>
> The rx port is using a different name for the section.  That's  valid 
> thing to
> do and to the extent we can, we should support that in the test rather 
> than
> (incorrectly IMHO) xfailing the test just becuase the name isn't what we
> expected.
>
> To avoid over-eagerly matching, I would probably search for "C,"  I 
> wouldn't do
> that for the const or rodata sections as they often have a suffix like 1, 
> 2, 4,
> 8 for different sized rodata sections.
>
> PPC32 is explicitly doing something different and placing those objects 
> into an
> RW section.  So for PPC32 it makes more sense to skip the test rather 
> than xfail
> it.
>
> Jeff


Re: [PATCH] s390: Add LEN_LOAD/LEN_STORE support.

2023-02-27 Thread Robin Dapp via Gcc-patches
> Do you really need a copy of the address register? Couldn't you just do a
> src = adjust_address (operands[1], BLKmode, 0);
> You create a paradoxical subreg of the QImode input but vll actually
> uses the whole 32 bit value. Couldn't we end up with uninitialized
> bytes being used as part of the length then? Do we need a zero-extend
> here?

v2 attached with these problems addressed.

Testsuite and bootstrap as before.

Regards
 Robin

From 27cc2fa49a0f3fbc2c629028b51e862346392636 Mon Sep 17 00:00:00 2001
From: Robin Dapp 
Date: Mon, 22 Aug 2022 11:05:39 +0200
Subject: [PATCH v2] s390: Add LEN_LOAD/LEN_STORE support.

This patch adds LEN_LOAD/LEN_STORE support for z13 and newer.
It defines a bias value of -1 and implements the LEN_LOAD and LEN_STORE
optabs.

Add vll/vstl testcases adapted from Power.

Also change expectations for SLP testcases with more than one rgroup.

gcc/ChangeLog:

	* config/s390/predicates.md (vll_bias_operand): Add -1 bias.
	* config/s390/s390.cc (s390_option_override_internal): Make
	partial vector usage the default from z13 on.
	* config/s390/vector.md (len_load_v16qi): Add.
	(len_store_v16qi): Add.

gcc/testsuite/ChangeLog:

	* gcc.target/s390/s390.exp: Add partial subdirectory.
	* gcc.target/s390/vector/vec-nopeel-2.c: Change test
	expectation.
	* lib/target-supports.exp: Add s390.
	* gcc.target/s390/vector/partial/s390-vec-length-1.h: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-2.h: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-3.h: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-7.h: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-epil-1.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-epil-2.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-epil-3.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-epil-7.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-epil-run-1.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-epil-run-2.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-epil-run-3.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-epil-run-7.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-full-1.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-full-2.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-full-3.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-full-7.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-full-run-1.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-full-run-2.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-full-run-3.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-full-run-7.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-run-1.h: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-run-2.h: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-run-3.h: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-run-7.h: New test.
	* gcc.target/s390/vector/partial/s390-vec-length-small.c: New test.
	* gcc.target/s390/vector/partial/s390-vec-length.h: New test.
---
 gcc/config/s390/predicates.md |  8 +
 gcc/config/s390/s390.cc   |  9 -
 gcc/config/s390/vector.md | 35 ++
 gcc/testsuite/gcc.target/s390/s390.exp|  3 ++
 .../s390/vector/partial/s390-vec-length-1.h   | 18 ++
 .../s390/vector/partial/s390-vec-length-2.h   | 18 ++
 .../s390/vector/partial/s390-vec-length-3.h   | 31 
 .../s390/vector/partial/s390-vec-length-7.h   | 17 +
 .../vector/partial/s390-vec-length-epil-1.c   | 13 +++
 .../vector/partial/s390-vec-length-epil-2.c   | 13 +++
 .../vector/partial/s390-vec-length-epil-3.c   | 16 +
 .../vector/partial/s390-vec-length-epil-7.c   | 11 ++
 .../partial/s390-vec-length-epil-run-1.c  |  7 
 .../partial/s390-vec-length-epil-run-2.c  |  7 
 .../partial/s390-vec-length-epil-run-3.c  |  7 
 .../partial/s390-vec-length-epil-run-7.c  |  7 
 .../vector/partial/s390-vec-length-full-1.c   | 12 +++
 .../vector/partial/s390-vec-length-full-2.c   | 12 +++
 .../vector/partial/s390-vec-length-full-3.c   | 13 +++
 .../vector/partial/s390-vec-length-full-7.c   | 14 
 .../partial/s390-vec-length-full-run-1.c  |  7 
 .../partial/s390-vec-length-full-run-2.c  |  7 
 .../partial/s390-vec-length-full-run-3.c  |  7 
 .../partial/s390-vec-length-full-run-7.c  |  7 
 .../vector/partial/s390-vec-length-run-1.h| 34 ++
 .../vector/partial/s390-vec-length-run-2.h| 36 +++
 .../vector/partial/s390-vec-length-run-3.h| 34 ++
 .../vector/partial/s390-vec-length-run-7.h| 16 +
 .../vector/partial/s390-vec-length-small.c| 15 
 .../s390/vector/partial/s390-vec-length.h | 14 
 

[Patch] gcc.dg/memchr-3.c: fix for LLP64

2023-02-27 Thread Jonathan Yong via Gcc-patches

Attached patch OK?

gcc.dg/memchr-3.c: fix for LLP64

gcc/testsuite/ChangeLog:

PR middle-end/97956
* gcc.dg/memchr-3.c (memchr): Use size_t instead of long in
prototype.
From 194eb3d43964276beeaea14ebee4b241799cd966 Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Mon, 27 Feb 2023 10:02:32 +
Subject: [PATCH] gcc.dg/memchr-3.c: fix for LLP64

	gcc/testsuite/ChangeLog:

	PR middle-end/97956
	* gcc.dg/memchr-3.c (memchr): Use size_t instead of long in
	prototype.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 gcc/testsuite/gcc.dg/memchr-3.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/memchr-3.c b/gcc/testsuite/gcc.dg/memchr-3.c
index c38d9cf3349..c1f4e9e10dc 100644
--- a/gcc/testsuite/gcc.dg/memchr-3.c
+++ b/gcc/testsuite/gcc.dg/memchr-3.c
@@ -5,8 +5,9 @@
 
 typedef __INT8_TYPE__  int8_t;
 typedef __INT32_TYPE__ int32_t;
+typedef __SIZE_TYPE__  size_t;
 
-extern void* memchr (const void*, int, long);
+extern void* memchr (const void*, int, size_t);
 
 struct SX
 {
-- 
2.39.2



RE: [PATCH][committed] aarch64: Fix typo in comment for aarch64_abs

2023-02-27 Thread Kyrylo Tkachov via Gcc-patches



> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+kyrylo.tkachov=arm@gcc.gnu.org> On Behalf Of Kyrylo
> Tkachov via Gcc-patches
> Sent: Monday, February 27, 2023 10:00 AM
> To: gcc-patches@gcc.gnu.org
> Subject: [PATCH][committed] aarch64: Fix typo in comment for
> aarch64_abs
> 
> Pushing as obvious.
> 
> gcc/ChangeLog:
> 
>   * config/aarch64/aarch64.md (aarch64_abs): Fix typo in
> comment.

This entry should be about aarch64-simd.md (the git hook caught it for me
and I've fixed it in the pushed version).


[PATCH][committed] aarch64: Fix typo in comment for aarch64_abs

2023-02-27 Thread Kyrylo Tkachov via Gcc-patches
Pushing as obvious.

gcc/ChangeLog:

* config/aarch64/aarch64.md (aarch64_abs): Fix typo in comment.


typo.patch


[Patch] gcc.dg/overflow-warn-9.c: exclude from LLP64

2023-02-27 Thread Jonathan Yong via Gcc-patches

This test is for LP64 only; exclude LLP64 too.
Patch OK?

From fbc83ae10df1a0e10c302fb0fee13092eb65818e Mon Sep 17 00:00:00 2001
From: Jonathan Yong <10wa...@gmail.com>
Date: Mon, 27 Feb 2023 09:49:31 +
Subject: [PATCH] gcc.dg/overflow-warn-9.c: exclude from LLP64

gcc/testsuite/ChangeLog:

	* gcc.dg/overflow-warn-9.c: Exclude from LLP64.

Signed-off-by: Jonathan Yong <10wa...@gmail.com>
---
 gcc/testsuite/gcc.dg/overflow-warn-9.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/overflow-warn-9.c b/gcc/testsuite/gcc.dg/overflow-warn-9.c
index 57c0f17bc91..012892dd343 100644
--- a/gcc/testsuite/gcc.dg/overflow-warn-9.c
+++ b/gcc/testsuite/gcc.dg/overflow-warn-9.c
@@ -59,7 +59,7 @@ const struct Types t1 = {
   .ui = UINT_MAX + 1L,  /* { dg-warning "signed conversion from .long int. to .unsigned int. changes value from .4294967296. to .0." "lp64" { target lp64 } } */
   .ui = UINT_MAX + 1LU, /* { dg-warning "conversion from .long unsigned int. to .unsigned int. changes value from .4294967296. to .0." "lp64" { target lp64 } } */
 
-  .sl = LONG_MAX + 1LU, /* { dg-warning "signed conversion from .long unsigned int. to .long int. changes value from .9223372036854775808. to .-9223372036854775808." "not-ilp32" { target { ! ilp32 } } } */
-  /* { dg-warning "signed conversion from .long unsigned int. to .long int. changes value from .2147483648. to .-2147483648." "ilp32" { target ilp32 } .-1 } */
+  .sl = LONG_MAX + 1LU, /* { dg-warning "signed conversion from .long unsigned int. to .long int. changes value from .9223372036854775808. to .-9223372036854775808." "lp64" { target lp64 } } */
+  /* { dg-warning "signed conversion from .long unsigned int. to .long int. changes value from .2147483648. to .-2147483648." "not-lp64" { target { ! lp64 } } .-1 } */
   .ul = ULONG_MAX + 1LU /* there should be some warning here */
 };
-- 
2.39.2



Re: [PATCH] diagnostics: fix crash with -fdiagnostics-format=json-file

2023-02-27 Thread Martin Liška

PING^4

On 2/17/23 15:52, Martin Liška wrote:

PING^3

On 2/1/23 14:13, Martin Liška wrote:

PING^2

On 1/24/23 14:34, Martin Liška wrote:

PING^1

On 1/10/23 16:10, Martin Liška wrote:

On 1/6/23 14:21, David Malcolm wrote:

On Fri, 2023-01-06 at 12:33 +0100, Martin Liška wrote:

Patch can bootstrap on x86_64-linux-gnu and survives regression
tests.


Thanks for the patch.

I noticed that you marked PR 108307 as a dup of this, which covers
-fdiagnostics-format=sarif-file (and a .S file as input).

The patch doesn't add any test coverage (for either of the diagnostic
formats).

If we try to emit a diagnostic and base_file_name is NULL, and the user
requested one of -fdiagnostics-format={json,sarif}-file, where do the
diagnostics go?  Where should they go?


Hey.

I've done a new version of the patch where I utilize x_main_input_basename.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin


[PATCH] RISC-V: Remove void_type_node of void_args for vsetvlmax intrinsic

2023-02-27 Thread juzhe . zhong
From: Ju-Zhe Zhong 

This patch fixes https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108927.
PR108927

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins.cc: Remove void_type_node.

---
 gcc/config/riscv/riscv-vector-builtins.cc | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index a430104f1e7..af11758e9b4 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -291,9 +291,8 @@ static const rvv_type_info oextu_ops[] = {
 static CONSTEXPR const rvv_arg_type_info rvv_arg_type_info_end
   = rvv_arg_type_info (NUM_BASE_TYPES);
 
-/* A list of args for size_t func (void) function.  */
-static CONSTEXPR const rvv_arg_type_info void_args[]
-  = {rvv_arg_type_info (RVV_BASE_void), rvv_arg_type_info_end};
+/* A list of args for size_t func () function.  */
+static CONSTEXPR const rvv_arg_type_info void_args[] = {rvv_arg_type_info_end};
 
 /* A list of args for size_t func () function.  */
 static CONSTEXPR const rvv_arg_type_info end_args[]
-- 
2.36.1