Re: [PATCH] s390: Make use of new copysign RTL

2023-10-06 Thread Andreas Krebbel
On 10/5/23 08:46, Stefan Schulze Frielinghaus wrote:
> gcc/ChangeLog:
> 
>   * config/s390/s390.md: Make use of new copysign RTL.

Ok. Thanks!

Andreas

> ---
>  gcc/config/s390/s390.md | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/gcc/config/s390/s390.md b/gcc/config/s390/s390.md
> index 9631b2a8c60..3f29ba21442 100644
> --- a/gcc/config/s390/s390.md
> +++ b/gcc/config/s390/s390.md
> @@ -124,7 +124,6 @@
>  
> ; Byte-wise Population Count
> UNSPEC_POPCNT
> -   UNSPEC_COPYSIGN
>  
> ; Load FP Integer
> UNSPEC_FPINT_FLOOR
> @@ -11918,9 +11917,8 @@
>  
>  (define_insn "copysign<mode>3"
>    [(set (match_operand:FP 0 "register_operand" "=f")
> -	(unspec:FP [(match_operand:FP 1 "register_operand" "<fT0>")
> -		    (match_operand:FP 2 "register_operand" "f")]
> -		    UNSPEC_COPYSIGN))]
> +	(copysign:FP (match_operand:FP 1 "register_operand" "<fT0>")
> +		     (match_operand:FP 2 "register_operand" "f")))]
>"TARGET_Z196"
>"cpsdr\t%0,%2,%1"
>[(set_attr "op_type"  "RRF")



RE: [PATCH v1] RISC-V: Bugfix for legitimize address PR/111634

2023-10-06 Thread Li, Pan2
Thanks Jeff, committed with a better ChangeLog as you suggested.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, October 7, 2023 12:53 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; Wang, Yanzhang ; 
kito.ch...@gmail.com
Subject: Re: [PATCH v1] RISC-V: Bugfix for legitimize address PR/111634



On 10/6/23 22:49, pan2...@intel.com wrote:
> From: Pan Li 
> 
> Given we have RTL as below.
> 
> (plus:DI (mult:DI (reg:DI 138 [ g.4_6 ])
>		    (const_int 8 [0x8]))
>	  (lo_sum:DI (reg:DI 167)
>		     (symbol_ref:DI ("f") [flags 0x86] <function_decl 0x7fa96ea1cc60 f>)
> ))
> 
> When handling the (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C))
> case, fp will be the lo_sum operand as above.  We assume that fp is a reg,
> but that is actually not the case here, which results in an ICE when
> building with --enable-checking=rtl.
> 
> This patch fixes it by adding a REG_P check to ensure the operand is a
> register.  The test case gcc/testsuite/gcc.dg/pr109417.c covers this fix
> when built with --enable-checking=rtl.
> 
>   PR target/111634
> 
> gcc/ChangeLog:
> 
>   * config/riscv/riscv.cc (riscv_legitimize_address): Bugfix.
OK, though the ChangeLog entry could be better.  Perhaps

* config/riscv/riscv.cc (riscv_legitimize_address): Ensure
object is a REG before extracting its register number.


Jeff


Re: [PATCH v1] RISC-V: Bugfix for legitimize address PR/111634

2023-10-06 Thread Jeff Law




On 10/6/23 22:49, pan2...@intel.com wrote:

From: Pan Li 

Given we have RTL as below.

(plus:DI (mult:DI (reg:DI 138 [ g.4_6 ])
		  (const_int 8 [0x8]))
	 (lo_sum:DI (reg:DI 167)
		    (symbol_ref:DI ("f") [flags 0x86] <function_decl 0x7fa96ea1cc60 f>)
))

When handling the (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C))
case, fp will be the lo_sum operand as above.  We assume that fp is a reg,
but that is actually not the case here, which results in an ICE when
building with --enable-checking=rtl.

This patch fixes it by adding a REG_P check to ensure the operand is a
register.  The test case gcc/testsuite/gcc.dg/pr109417.c covers this fix
when built with --enable-checking=rtl.

PR target/111634

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_address): Bugfix.

OK, though the ChangeLog entry could be better.  Perhaps

* config/riscv/riscv.cc (riscv_legitimize_address): Ensure
object is a REG before extracting its register number.


Jeff
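
For context, REGNO is only valid on REG rtxes; with --enable-checking=rtl the
rtx accessors verify the code of the rtx they are handed, so calling REGNO on
the (lo_sum ...) operand aborts inside the compiler.  A minimal sketch of the
failure mode and the guard (GCC-internal C, mirroring the patch below):

  rtx fp = XEXP (base, 1);   /* here: (lo_sum (reg 167) (symbol_ref "f")) */
  REGNO (fp);                /* RTL-checking abort: fp is not a REG */
  if (REG_P (fp)             /* the fix: guard before extracting the regno */
      && REGNO (fp) == VIRTUAL_STACK_VARS_REGNUM)
    { /* ... */ }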


[PATCH v1] RISC-V: Bugfix for legitimize address PR/111634

2023-10-06 Thread pan2 . li
From: Pan Li 

Given we have RTL as below.

(plus:DI (mult:DI (reg:DI 138 [ g.4_6 ])
		  (const_int 8 [0x8]))
	 (lo_sum:DI (reg:DI 167)
		    (symbol_ref:DI ("f") [flags 0x86] <function_decl 0x7fa96ea1cc60 f>)
))

When handling the (plus (plus (mult (a) (mem_shadd_constant)) (fp)) (C))
case, fp will be the lo_sum operand as above.  We assume that fp is a reg,
but that is actually not the case here, which results in an ICE when
building with --enable-checking=rtl.

This patch fixes it by adding a REG_P check to ensure the operand is a
register.  The test case gcc/testsuite/gcc.dg/pr109417.c covers this fix
when built with --enable-checking=rtl.

PR target/111634

gcc/ChangeLog:

* config/riscv/riscv.cc (riscv_legitimize_address): Bugfix.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index d5446b63dbf..2b839241f1a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2042,7 +2042,7 @@ riscv_legitimize_address (rtx x, rtx oldx 
ATTRIBUTE_UNUSED,
{
  rtx index = XEXP (base, 0);
  rtx fp = XEXP (base, 1);
- if (REGNO (fp) == VIRTUAL_STACK_VARS_REGNUM)
+ if (REG_P (fp) && REGNO (fp) == VIRTUAL_STACK_VARS_REGNUM)
{
 
  /* If we were given a MULT, we must fix the constant
-- 
2.34.1



Re: Re: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case

2023-10-06 Thread Li Xu
Committed, thanks juzhe.
--
Li Xu
>OK.
>
>
>
>juzhe.zh...@rivai.ai
>
>From: Li Xu
>Date: 2023-10-07 11:18
>To: gcc-patches
>CC: kito.cheng; palmer; juzhe.zhong; xuli
>Subject: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case
>From: xuli 
>
>gcc/testsuite/ChangeLog:
>
>    * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler 
>times.
>    * gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
>---
>.../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c   | 10 +++++-----
>.../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c   | 10 +++++-----
>2 files changed, 10 insertions(+), 10 deletions(-)
>
>diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c 
>b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
>index c566f8a4751..2ec9487a6c6 100644
>--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
>+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
>@@ -88,8 +88,8 @@ void f (void * restrict in, void * restrict out, int n, int 
>cond)
>   }
>}
>-/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>-/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>-/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>-/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>-/* { dg-final { scan-assembler-times {vsetvli} 10 { target { no-opts "-O0"  
>no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
>"-g" } } } } */
>+/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 10 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>+/* { dg-final { scan-assembler-not 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>+/* { dg-final { scan-assembler-not 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>+/* { dg-final { scan-assembler-not 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>+/* { dg-final { scan-assembler-times {vsetvli} 19 { target { no-opts "-O0"  
>no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
>"-g" } } } } */
>diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c 
>b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
>index d0e75258188..bcafce36895 100644
>--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
>+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
>@@ -80,8 +80,8 @@ void f (void * restrict in, void * restrict out, int n, int 
>cond)
>   }
>}
>-/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>-/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>-/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>-/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>-/* { dg-final { scan-assembler-times {vsetvli} 9 { target { no-opts "-O0"  
>no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
>"-g" } } } } */
>+/* { dg-final { scan-assembler-times 
>{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 9 { target { 
>no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
>"-funroll-loops" no-opts "-g" } } } } */
>+/* { dg-final { scan-assembler-not 

Re: [PATCH] LoongArch: Reimplement multilib build option handling.

2023-10-06 Thread Yang Yujie
On Wed, Oct 04, 2023 at 02:13:46PM +0200, Jan-Benedict Glaw wrote:
> Seems this breaks for me with
> 
> ../gcc/configure [...] --enable-werror-always --enable-languages=all 
> --disable-gcov --disable-shared --disable-threads 
> --target=loongarch64-linux-gnuf32 --without-headers
> make V=1 all-gcc
> 
> 
> See eg. 
> http://toolchain.lug-owl.de/laminar/jobs/gcc-loongarch64-linux-gnuf32/44 :
> 
> /var/lib/laminar/run/gcc-loongarch64-linux-gnuf32/44/local-toolchain-install/bin/g++
>  -c   -g -O2   -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions 
> -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing 
> -Wwrite-strings -Wcast-qual -Wmissing-format-attribute 
> -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long 
> -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common  
> -DHAVE_CONFIG_H  -DGENERATOR_FILE -I. -Ibuild -I../../gcc/gcc 
> -I../../gcc/gcc/build -I../../gcc/gcc/../include  
> -I../../gcc/gcc/../libcpp/include  \
>  -o build/genpreds.o ../../gcc/gcc/genpreds.cc
> In file included from ../../gcc/gcc/config/loongarch/loongarch.h:53,
>  from ./tm.h:50,
>  from ../../gcc/gcc/genpreds.cc:26:
> ../../gcc/gcc/config/loongarch/loongarch-driver.h:82:10: fatal error: 
> loongarch-multilib.h: No such file or directory
>82 | #include "loongarch-multilib.h"
>   |  ^~
> compilation terminated.
> make[1]: *** [Makefile:2966: build/genpreds.o] Error 1
> make[1]: Leaving directory 
> '/var/lib/laminar/run/gcc-loongarch64-linux-gnuf32/44/toolchain-build/gcc'
> make: *** [Makefile:4659: all-gcc] Error 2
> 
> 
> So it failed to execute the t-multilib fragment? Happens for all my
> loongarch compilation tests:
> 
> http://toolchain.lug-owl.de/laminar/jobs/gcc-loongarch64-linux/45
> http://toolchain.lug-owl.de/laminar/jobs/gcc-loongarch64-linux-gnuf32/44
> http://toolchain.lug-owl.de/laminar/jobs/gcc-loongarch64-linux-gnuf64/44
> http://toolchain.lug-owl.de/laminar/jobs/gcc-loongarch64-linux-gnusf/44
>

Thanks for the testing!

This error seems to be difficult to reproduce since it is a makefile dependency
problem.  I think appending loongarch-multilib.h to $(GTM_H) instead of $(TM_H)
could help.

> And when this is fixed, it might be a nice idea to have a
> --with-multilib-list config in ./contrib/config-list.mk .

Thanks, will add this later too.

P.S. Currently support for "f32" is not active, and it should probably be
avoided if you want to build a working rootfs.

Yujie



Re: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case

2023-10-06 Thread juzhe.zh...@rivai.ai
OK.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-10-07 11:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; xuli
Subject: [PATCH] RISC-V: Fix scan-assembler-times of RVV test case
From: xuli 
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler 
times.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
---
.../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c   | 10 +++++-----
.../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c   | 10 +++++-----
2 files changed, 10 insertions(+), 10 deletions(-)
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
index c566f8a4751..2ec9487a6c6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
@@ -88,8 +88,8 @@ void f (void * restrict in, void * restrict out, int n, int 
cond)
   }
}
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 10 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 10 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} { target { no-opts 
"-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" 
no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 19 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
index d0e75258188..bcafce36895 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
@@ -80,8 +80,8 @@ void f (void * restrict in, void * restrict out, int n, int 
cond)
   }
}
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 9 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 9 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } 

[PATCH] RISC-V: Fix scan-assembler-times of RVV test case

2023-10-06 Thread Li Xu
From: xuli 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c: Adjust assembler 
times.
* gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c: Ditto.
---
 .../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c   | 10 +++++-----
 .../gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c   | 10 +++++-----
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
index c566f8a4751..2ec9487a6c6 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-25.c
@@ -88,8 +88,8 @@ void f (void * restrict in, void * restrict out, int n, int 
cond)
   }
 }
 
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 10 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 10 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} { target { no-opts 
"-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" 
no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-times {vsetvli} 19 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
index d0e75258188..bcafce36895 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/vlmax_back_prop-26.c
@@ -80,8 +80,8 @@ void f (void * restrict in, void * restrict out, int n, int 
cond)
   }
 }
 
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 2 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} 3 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e64,\s*m1,\s*t[au],\s*m[au]} 1 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
-/* { dg-final { scan-assembler-times {vsetvli} 9 { target { no-opts "-O0"  
no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts "-funroll-loops" no-opts 
"-g" } } } } */
+/* { dg-final { scan-assembler-times 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e8,\s*mf8,\s*t[au],\s*m[au]} 9 { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e16,\s*mf4,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 
"-funroll-loops" no-opts "-g" } } } } */
+/* { dg-final { scan-assembler-not 
{vsetvli\s+[a-x0-9]+,\s*zero,\s*e32,\s*mf2,\s*t[au],\s*m[au]} { target { 
no-opts "-O0" no-opts "-O1"  no-opts "-Os" no-opts "-Oz" no-opts 

Re: [PATCH 03/13] [APX_EGPR] Initial support for APX_F

2023-10-06 Thread Hongtao Liu
On Fri, Sep 22, 2023 at 6:58 PM Hongyu Wang  wrote:
>
> From: Kong Lingling 
>
> Add -mapx-features= enumeration to separate subfeatures of APX_F.
> -mapxf is treated same as previous ISA flag, while it sets
> -mapx-features=apx_all that enables all subfeatures.
Ok for this and the rest of the patches (04-13).
>
> gcc/ChangeLog:
>
> * common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro.
> (XCR_APX_F_ENABLED_MASK): Likewise.
> (get_available_features): Detect APX_F under
> * common/config/i386/i386-common.cc (OPTION_MASK_ISA2_APX_F_SET): New.
> (OPTION_MASK_ISA2_APX_F_UNSET): Likewise.
> (ix86_handle_option): Handle -mapxf.
> * common/config/i386/i386-cpuinfo.h (FEATURE_APX_F): New.
> * common/config/i386/i386-isas.h: Add entry for APX_F.
> * config/i386/cpuid.h (bit_APX_F): New.
> * config/i386/i386.h (bit_APX_F): (TARGET_APX_EGPR,
> TARGET_APX_PUSH2POP2, TARGET_APX_NDD): New define.
> * config/i386/i386-opts.h (enum apx_features): New enum.
> * config/i386/i386-isa.def (APX_F): New DEF_PTA.
> * config/i386/i386-options.cc (ix86_function_specific_save):
> Save ix86_apx_features.
> (ix86_function_specific_restore): Restore it.
> (ix86_valid_target_attribute_inner_p): Add mapxf.
> (ix86_option_override_internal): Set ix86_apx_features for PTA
> and TARGET_APX_F. Also reports error when APX_F is set but not
> having TARGET_64BIT.
> * config/i386/i386.opt: (-mapxf): New ISA flag option.
> (-mapx=): New enumeration option.
> (apx_features): New enum type.
> (apx_none): New enum value.
> (apx_egpr): Likewise.
> (apx_push2pop2): Likewise.
> (apx_ndd): Likewise.
> (apx_all): Likewise.
> * doc/invoke.texi: Document mapxf.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/apx-1.c: New test.
>
> Co-authored-by: Hongyu Wang 
> Co-authored-by: Hongtao Liu 
> ---
>  gcc/common/config/i386/cpuinfo.h  | 12 +++-
>  gcc/common/config/i386/i386-common.cc | 17 +
>  gcc/common/config/i386/i386-cpuinfo.h |  1 +
>  gcc/common/config/i386/i386-isas.h|  1 +
>  gcc/config/i386/cpuid.h   |  1 +
>  gcc/config/i386/i386-isa.def  |  1 +
>  gcc/config/i386/i386-options.cc   | 18 ++
>  gcc/config/i386/i386-opts.h   |  8 
>  gcc/config/i386/i386.h|  4 
>  gcc/config/i386/i386.opt  | 25 +
>  gcc/doc/invoke.texi   | 11 +++
>  gcc/testsuite/gcc.target/i386/apx-1.c |  8 
>  12 files changed, 102 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
>
> diff --git a/gcc/common/config/i386/cpuinfo.h 
> b/gcc/common/config/i386/cpuinfo.h
> index 24ae0dbf0ac..141d3743316 100644
> --- a/gcc/common/config/i386/cpuinfo.h
> +++ b/gcc/common/config/i386/cpuinfo.h
> @@ -678,6 +678,7 @@ get_available_features (struct __processor_model 
> *cpu_model,
>  #define XSTATE_HI_ZMM  0x80
>  #define XSTATE_TILECFG 0x2
>  #define XSTATE_TILEDATA0x4
> +#define XSTATE_APX_F   0x8
>
>  #define XCR_AVX_ENABLED_MASK \
>(XSTATE_SSE | XSTATE_YMM)
> @@ -685,11 +686,13 @@ get_available_features (struct __processor_model 
> *cpu_model,
>(XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
>  #define XCR_AMX_ENABLED_MASK \
>(XSTATE_TILECFG | XSTATE_TILEDATA)
> +#define XCR_APX_F_ENABLED_MASK XSTATE_APX_F
>
> -  /* Check if AVX and AVX512 are usable.  */
> +  /* Check if AVX, AVX512 and APX are usable.  */
>int avx_usable = 0;
>int avx512_usable = 0;
>int amx_usable = 0;
> +  int apx_usable = 0;
>/* Check if KL is usable.  */
>int has_kl = 0;
>if ((ecx & bit_OSXSAVE))
> @@ -709,6 +712,8 @@ get_available_features (struct __processor_model 
> *cpu_model,
> }
>amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
> == XCR_AMX_ENABLED_MASK);
> +  apx_usable = ((xcrlow & XCR_APX_F_ENABLED_MASK)
> +   == XCR_APX_F_ENABLED_MASK);
>  }
>
>  #define set_feature(f) \
> @@ -922,6 +927,11 @@ get_available_features (struct __processor_model 
> *cpu_model,
>   if (edx & bit_AMX_COMPLEX)
> set_feature (FEATURE_AMX_COMPLEX);
> }
> + if (apx_usable)
> +   {
> + if (edx & bit_APX_F)
> +   set_feature (FEATURE_APX_F);
> +   }
> }
>  }
>
> diff --git a/gcc/common/config/i386/i386-common.cc 
> b/gcc/common/config/i386/i386-common.cc
> index 95468b7c405..86596e96ad1 100644
> --- a/gcc/common/config/i386/i386-common.cc
> +++ b/gcc/common/config/i386/i386-common.cc
> @@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
>  #define OPTION_MASK_ISA2_SM3_SET 

Re: [PATCH 00/18] Support -mevex512 for AVX512

2023-10-06 Thread Hongtao Liu
On Thu, Sep 28, 2023 at 11:23 AM ZiNgA BuRgA  wrote:
>
> That sounds about right.  The code I had in mind would perhaps look like:
>
>
> #if defined(__AVX512BW__) && defined(__AVX512VL__)
>  #if defined(__EVEX256__) && !defined(__EVEX512__)
>  // compiled code is AVX10.1/256 and AVX512 compatible
>  #else
>  // compiled code is only AVX512 compatible
>  #endif
>
>  // some code which only uses 256b instructions
>  __m256i...
> #endif
>
>
> The '__EVEX256__' define would avoid needing to check compiler versions.
Sounds reasonable. Regarding how to set __EVEX256__, I think it should
be set/unset along with __AVX512VL__, and __EVEX512__ should not unset
__EVEX256__.

> Hopefully you can align it with whatever Clang does:
> https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661/18

>
> Thanks!
>
> On 28/09/2023 12:26 pm, Hu, Lin1 wrote:
> > Hi,
> >
> > Thanks for you reply.
> >
> > I'd like to verify that our understanding of your requirements is correct, 
> > and that __EVEX256__ can be considered a default macro to determine whether 
> > the compiler supports the __EVEX***__ series of switches.
> >
> > For example:
> >
> > I have a segment of code like:
> > #if defined(__EVEX512__)
> > __mm512.*__;
> > #else
> > __mm256.*__;
> > #endif
> >
> > But if __EVEX512__ is undefined, that doesn't mean I only need 256-bit;
> > maybe I am using gcc-13, so I can still use 512-bit.
> >
> > So the code should be:
> > #if defined(__EVEX512__)
> > __mm512.*__;
> > #elif defined(__EVEX256__):
> > __mm256.*__;
> > #else
> > __mm512.*__;
> > #endif
> >
> > If we understand correctly, we'll consider the request. But since we're 
> > about to have a vacation, follow-up replies may be a bit slower.
> >
> > BRs,
> > Lin
> >
> > -Original Message-
> > From: ZiNgA BuRgA 
> > Sent: Thursday, September 28, 2023 8:32 AM
> > To: Hu, Lin1 ; gcc-patches@gcc.gnu.org
> > Subject: Re: [PATCH 00/18] Support -mevex512 for AVX512
> >
> > Thanks for the new patch!
> >
> > I see that there's a new __EVEX512__ define.  Will there be some 
> > __EVEX256__ (or maybe some max EVEX width) define, so that code can detect 
> > whether the compiler supports AVX10.1/256 without resorting to version 
> > checks?
> >
> >
>


-- 
BR,
Hongtao
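
Putting the thread's discussion together, a hedged sketch of the intended use
of these macros (macro names as proposed above; whether a given compiler
defines them is exactly what the checks are for):

  #include <immintrin.h>

  /* Use 512-bit vectors only when EVEX512 is known to be available;
     on an AVX10.1/256-only configuration fall back to 256 bits.  */
  #if defined(__EVEX512__) || !defined(__EVEX256__)
  typedef __m512i vec_t;   /* older compilers: AVX512VL implies 512 bit */
  #else
  typedef __m256i vec_t;   /* AVX10.1/256: stay within 256-bit EVEX */
  #endif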


Re: Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-10-06 Thread juzhe.zh...@rivai.ai
Thanks for reporting it.

I think we may need to change it into:
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target {! vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { vect_strided5 && vect_load_lanes } } } } */

Could you verify whether it works for you?

Thanks.


juzhe.zh...@rivai.ai
 
From: Andrew Stubbs
Date: 2023-10-06 22:29
To: Juzhe-Zhong; gcc-patches@gcc.gnu.org
CC: rguent...@suse.de; jeffreya...@gmail.com; richard.sandif...@arm.com
Subject: Re: [PATCH] test: Isolate slp-1.c check of target supports 
vect_strided5
On 15/09/2023 10:16, Juzhe-Zhong wrote:
> This test failed in RISC-V:
> FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
> "vectorizing stmts using SLP" 4
> FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
> SLP" 4
> 
> Because this loop:
>/* SLP with unrolling by 8.  */
>for (i = 0; i < N; i++)
>  {
>out[i*5] = 8;
>out[i*5 + 1] = 7;
>out[i*5 + 2] = 81;
>out[i*5 + 3] = 28;
>out[i*5 + 4] = 18;
>  }
> 
> is using vect_load_lanes with array size = 5.
> instead of SLP.
> 
> When we adjust the COST of LANES load store, then it will use SLP.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.dg/vect/slp-1.c: Add vect_strided5.
> 
> ---
>   gcc/testsuite/gcc.dg/vect/slp-1.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c 
> b/gcc/testsuite/gcc.dg/vect/slp-1.c
> index 82e4f6469fb..d4a13f12df6 100644
> --- a/gcc/testsuite/gcc.dg/vect/slp-1.c
> +++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
> @@ -122,5 +122,5 @@ int main (void)
>   }
>   
>   /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" 
> } } */
> -
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" 
> { target {! vect_strided5 } } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" 
> { target vect_strided5 } } } */
 
This patch causes a test regression on amdgcn because vect_strided5 is 
true (because check_effective_target_vect_fully_masked is true), but the 
testcase still gives the message 4 times. Perhaps because amdgcn uses 
masking and not vect_load_lanes?
 
Andrew
 


Re: Re: [PATCH v3] RISC-V:Optimize the MASK opt generation

2023-10-06 Thread Feng Wang
Hi, Kito & Jeff
Due to the National Day holiday, I was unable to reply to the email in a
timely manner.
Thank you for making the necessary changes to this patch. As for the
introduction of this bug, I will also carefully take stock of my experience
and the lessons learned to avoid a recurrence of such problems.
Thank you again!
--
Feng Wang
>Proposed fix, verified with "mawk" and "gawk -P" (gawk in POSIX mode) on my
>Linux machine; some others also report that it works on FreeBSD. It is just
>waiting for review :)
>
>https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631785.html
>
>On Tue, Oct 3, 2023 at 2:07 AM Jeff Law  wrote:
>>
>>
>>
>> On 10/2/23 12:03, David Edelsohn wrote:
>> > On Mon, Oct 2, 2023 at 1:59 PM Jeff Law > > > wrote:
>> >
>> >
>> >
>> > On 10/2/23 11:20, David Edelsohn wrote:
>> >  > Wang,
>> >  >
>> >  > The AWK portions of this patch broke bootstrap on AIX.
>> >  >
>> >  > Also, the AWK portions are common code, not RISC-V specific.  I
>> > don't
>> >  > see anywhere that the common portions of the patch were reviewed or
>> >  > approved by anyone with authority to approve the changes to the
>> > AWK files.
>> >  >
>> >  > This patch should not have been committed without approval by a
>> > reviewer
>> >  > with authority for that portion of the compiler and should have been
>> >  > tested on targets other than RISC-V if common parts of the
>> > compiler were
>> >  > changed.
>> > I acked the generic bits.  So the lack of testing on another target is
>> > on me.
>> >
>> >
>> > Hi, Jeff
>> >
>> > Sorry. I didn't see a comment from a global reviewer in the V3 thread.
>> NP.
>>
>> >
>> > I am using Gawk on AIX.  After the change, I see a parse error from
>> > gawk.  I'm rebuilding with a checkout just before the change to confirm
>> > that it was the source of the error, and it seems to be past that
>> > failure location.  I didn't keep the exact error.  Once I get past this
>> > build cycle, I'll reproduce it.
>> I think there's already a patch circulating which fixes this.  It broke
>> at least one other platform.  Hopefully it'll all be sorted out today.
>>
>>
>> jeff

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Robin Dapp
> So if you think you got everything correct the patch is OK as-is,
> I just wasn't sure - maybe the neutral_element change deserves
> a comment as to how MINUS_EXPR is handled.

Heh, I never think I got everything correct ;)

Added this now:

 static bool
 fold_left_reduction_fn (code_helper code, internal_fn *reduc_fn)
 {
+  /* We support MINUS_EXPR by negating the operand.  This also preserves an
+ initial -0.0 since -0.0 - 0.0 (neutral op for MINUS_EXPR) == -0.0 +
+ (-0.0) = -0.0.  */

What I still found is that aarch64 ICEs at the assertion you added
with -frounding-math.  Therefore I changed it to:

- gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
+ if (HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"cannot vectorize fold-left reduction because"
+" signed zeros cannot be preserved.\n");
+ return false;
+   }

No code changes apart from that.  Will leave it until Monday and push then
barring any objections.

Thanks for the pointers.

Regards
 Robin
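
For illustration, the kind of loop being discussed is a conditional in-order
(fold-left) reduction via MINUS_EXPR, roughly (a hedged sketch, not one of
the committed tests):

  double
  f (double *x, int n)
  {
    double r = -0.0;   /* the initial -0.0 that must be preserved */
    for (int i = 0; i < n; i++)
      if (x[i] > 0.0)
        r -= x[i];     /* MINUS_EXPR handled by negating the operand */
    return r;
  }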



[PATCH] fortran: fix handling of options -ffpe-trap and -ffpe-summary [PR110957]

2023-10-06 Thread Harald Anlauf
Dear all,

the attached simple patch fixes a mixup of error messages for -ffpe-trap
and -ffpe-summary.  While at it, I thought it might be useful to accept
'none' as an allowable argument to -ffpe-trap, so that traps previously set
on the command line may be cleared.  This change is also documented.
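
For example, with this change an invocation like
  gfortran -ffpe-trap=invalid,zero -ffpe-trap=none test.f90
(file name illustrative) ends up with no traps enabled, since the later
'none' clears the list accumulated so far.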

Regtested on x86_64-pc-linux-gnu.  OK for mainline?

***

The reporter also suggested detecting and handling -fno-trapping-math
when any trap is enabled.

I am not so sure that this can be required.  In gfortran, specifying
-ffpe-trap sets the FPU mask in the main program and has no further effect.
Or am I missing something?

Any further opinions or insights?

Thanks,
Harald

From 75dc455f21cea07e64b422c9994ab8879df388de Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 6 Oct 2023 22:21:56 +0200
Subject: [PATCH] fortran: fix handling of options -ffpe-trap and -ffpe-summary
 [PR110957]

gcc/fortran/ChangeLog:

	PR fortran/110957
	* invoke.texi: Update documentation to reflect '-ffpe-trap=none'.
	* options.cc (gfc_handle_fpe_option): Fix mixup up of error messages
	for options -ffpe-trap and -ffpe-summary.  Accept '-ffpe-trap=none'
	to clear FPU traps previously set on command line.
---
 gcc/fortran/invoke.texi | 6 --
 gcc/fortran/options.cc  | 9 ++---
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 38150b1e29e..10387e39501 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1294,7 +1294,8 @@ Specify a list of floating point exception traps to enable.  On most
 systems, if a floating point exception occurs and the trap for that
 exception is enabled, a SIGFPE signal will be sent and the program
 being aborted, producing a core file useful for debugging.  @var{list}
-is a (possibly empty) comma-separated list of the following
+is a (possibly empty) comma-separated list of either @samp{none} (to
+clear the set of exceptions to be trapped), or of the following
 exceptions: @samp{invalid} (invalid floating point operation, such as
 @code{SQRT(-1.0)}), @samp{zero} (division by zero), @samp{overflow}
 (overflow in a floating point operation), @samp{underflow} (underflow
@@ -1314,7 +1315,8 @@ If the option is used more than once in the command line, the lists will
 be joined: '@code{ffpe-trap=}@var{list1} @code{ffpe-trap=}@var{list2}'
 is equivalent to @code{ffpe-trap=}@var{list1},@var{list2}.

-Note that once enabled an exception cannot be disabled (no negative form).
+Note that once enabled an exception cannot be disabled (no negative form),
+except by clearing all traps by specifying @samp{none}.

 Many, if not most, floating point operations incur loss of precision
 due to rounding, and hence the @code{ffpe-trap=inexact} is likely to
diff --git a/gcc/fortran/options.cc b/gcc/fortran/options.cc
index 27311961325..2ad22478042 100644
--- a/gcc/fortran/options.cc
+++ b/gcc/fortran/options.cc
@@ -555,9 +555,12 @@ gfc_handle_fpe_option (const char *arg, bool trap)
 	pos++;

   result = 0;
-  if (!trap && strncmp ("none", arg, pos) == 0)
+  if (strncmp ("none", arg, pos) == 0)
 	{
-	  gfc_option.fpe_summary = 0;
+	  if (trap)
+	gfc_option.fpe = 0;
+	  else
+	gfc_option.fpe_summary = 0;
 	  arg += pos;
 	  pos = 0;
 	  continue;
@@ -586,7 +589,7 @@ gfc_handle_fpe_option (const char *arg, bool trap)
 	  break;
 	}
 	  }
-  if (!result && !trap)
+  if (!result && trap)
 	gfc_fatal_error ("Argument to %<-ffpe-trap%> is not valid: %s", arg);
   else if (!result)
 	gfc_fatal_error ("Argument to %<-ffpe-summary%> is not valid: %s", arg);
--
2.35.3



Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-06 Thread Martin Uecker
On Friday, 2023-10-06 at 06:50 -0400, Siddhesh Poyarekar wrote:
> On 2023-10-06 01:11, Martin Uecker wrote:
> > On Thursday, 2023-10-05 at 15:35 -0700, Kees Cook wrote:
> > > On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:
> > > > 2. How would you handle signedness of the size field?  The size gets
> > > > converted to sizetype everywhere it is used and overflows/underflows may
> > > > produce interesting results.  Do you want to limit the types to 
> > > > unsigned or
> > > > do you want to add a disclaimer in the docs?  The former seems like the
> > > > *right* thing to do given that it is a new feature; best to enforce the
> > > > cleaner habit at the outset.
> > > 
> > > The Linux kernel has a lot of "int" counters, so the goal is to catch
> > > negative offsets just like too-large offsets at runtime with the sanitizer
> > > and report 0 for __bdos. Refactoring all these to be unsigned is going
> > > to take time since at least some of them use the negative values as
> > > special values unrelated to array indexing. :(
> > > 
> > > So, perhaps if unsigned counters are worth enforcing, can this be a
> > > separate warning the kernel can turn off initially?
> > > 
> > 
> > I think unsigned counters are much more problematic than signed ones
> > because wraparound errors are more difficult to find.
> > 
> > With unsigned you could potentially diagnose wraparound, but only if we
> > add -fsanitize=unsigned-overflow *and* add mechanism to mark intentional
> > wraparound *and* everybody adds this annotation after carefully screening
> > their code *and* rewriting all operations such as (counter - 3) + 5
> > where the wraparound in the intermediate expression is harmless.
> > 
> > For this reason, I do not think we should ever enforce some rule that
> > the counter has to be unsigned.
> > 
> > What we could do, is detect *storing* negative values into the
> > counter at run-time using UBSan. (but if negative values are
> > used for special cases, one also should be able to turn this
> > off).
> 
> All of the object size detection relies on object sizes being sizetype. 
> The closest we could do with that is detect (sz != SIZE_MAX && sz >
> SIZE_MAX / 2), since allocators typically cannot allocate more than
> SIZE_MAX / 2.

I was talking about the counter in:

struct {
  int counter;
  char buf[] __counted_by__((counter));
};

which could be checked to be positive either when stored to or 
when buf is used.

And yes, we could also check the size of buf.  Not sure what is
done for VLAs now, but I guess it could be similar.

Best,
Martin



> 
> Sid
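
A hedged sketch of the store-time check Martin describes (names hypothetical;
this is not a proposed implementation):

  struct buf { int counter; char data[]; };

  void
  set_counter (struct buf *b, int n)
  {
    /* Diagnose storing a negative value into the counter; UBSan would
       emit a similar check at the store.  */
    if (n < 0)
      __builtin_trap ();
    b->counter = n;
  }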



[COMMITTED] RISC-V: const: hide mvconst splitter from IRA

2023-10-06 Thread Vineet Gupta
Vlad recently introduced a new gate @ira_in_progress, similar to
counterparts @{reload,lra}_in_progress.

Use this to hide the constant synthesis splitter from being recog* ()
by IRA register equivalence logic which is eager to undo the splits,
generating worse code for constants (and sometimes no code at all).

See PR/109279 (large constant), PR/110748 (const -0.0) ...

Granted the IRA logic is subsided with -fsched-pressure which is now
enabled for RISC-V backend, the gate makes this future-proof in
addition to helping with -O1 etc.

This fixes 1 additional test:

                    = Summary of gcc testsuite =
                             | # of unexpected case / # of unique unexpected case
                             |      gcc     |   g++   | gfortran |
    rv32imac/  ilp32/ medlow |  416 /   103 | 13 /  6 | 67 / 12  |
  rv32imafdc/ ilp32d/ medlow |  416 /   103 | 13 /  6 | 24 /  4  |
    rv64imac/   lp64/ medlow |  417 /   104 |  9 /  3 | 67 / 12  |
  rv64imafdc/  lp64d/ medlow |  416 /   103 |  5 /  2 |  6 /  1  |

Also similar to v1, this doesn't move RISC-V SPEC scores at all.

gcc/ChangeLog:
* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.

Suggested-by: Jeff Law 
Signed-off-by: Vineet Gupta 
---
 gcc/config/riscv/riscv.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 1ebe8f92284d..da84b9357bd3 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1997,13 +1997,16 @@
 
 ;; Pretend to have the ability to load complex const_int in order to get
 ;; better code generation around them.
-;;
 ;; But avoid constants that are special cased elsewhere.
+;;
+;; Hide it from IRA register equiv recog* () to elide potential undoing of split
+;;
 (define_insn_and_split "*mvconst_internal"
   [(set (match_operand:GPR 0 "register_operand" "=r")
 (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
-  "!(p2m1_shift_operand (operands[1], mode)
- || high_mask_shift_operand (operands[1], mode))"
+  "!ira_in_progress
+   && !(p2m1_shift_operand (operands[1], mode)
+|| high_mask_shift_operand (operands[1], mode))"
   "#"
   "&& 1"
   [(const_int 0)]
-- 
2.34.1
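
As a hypothetical example of what the splitter handles, a constant such as

  long
  big (void)
  {
    return 0x1234567887654321L;   /* illustrative large constant */
  }

needs a multi-insn synthesis sequence on RISC-V, which the IRA equivalence
logic could previously undo (cf. PR 109279).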



Re: [PATCH v2] RISC-V: const: hide mvconst splitter from IRA

2023-10-06 Thread Jeff Law




On 10/6/23 11:49, Vineet Gupta wrote:

Vlad recently introduced a new gate @ira_in_progress, similar to
counterparts @{reload,lra}_in_progress.

Use this to hide the constant synthesis splitter from being recog* ()
by IRA register equivalence logic which is eager to undo the splits,
generating worse code for constants (and sometimes no code at all).

See PR/109279 (large constant), PR/110748 (const -0.0) ...

Granted the IRA logic is subsided with -fsched-pressure which is now
enabled for RISC-V backend, the gate makes this future-proof in
addition to helping with -O1 etc.

This fixes 1 additional test:

                    = Summary of gcc testsuite =
                             | # of unexpected case / # of unique unexpected case
                             |      gcc     |   g++   | gfortran |
    rv32imac/  ilp32/ medlow |  416 /   103 | 13 /  6 | 67 / 12  |
  rv32imafdc/ ilp32d/ medlow |  416 /   103 | 13 /  6 | 24 /  4  |
    rv64imac/   lp64/ medlow |  417 /   104 |  9 /  3 | 67 / 12  |
  rv64imafdc/  lp64d/ medlow |  416 /   103 |  5 /  2 |  6 /  1  |

Also similar to v1, this doesn't move RISC-V SPEC scores at all.

gcc/ChangeLog:
* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.

OK
jeff


Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-06 Thread Jeff Law




On 10/6/23 08:17, Manolis Tsamis wrote:
SNIP

So I was ready to ACK, but realized there weren't any testresults for a
primary platform mentioned.  So I ran this on x86.

It's triggering one regression (code quality).

Specifically gcc.target/i386/pr52146.c

The f-m-o code is slightly worse than without f-m-o.

Without f-m-o we get this:

  9 0000 B88000E0      movl    $-18874240, %eax
  9      FE
 10 0005 67C70000      movl    $0, (%eax)
 10      000000
 11 000c C3            ret

With f-m-o we get this:

  9 0000 B8000000      movl    $0, %eax
  9      00
 10 0005 67C78080      movl    $0, -18874240(%eax)
 10      00E0FE00
 10      000000
 11 0010 C3            ret


The key being that we don't get rid of the original move instruction,
nor does the original move instruction get smaller due to simplification
of its constant.  Additionally, the memory store gets larger.  The net
is a 4 byte increase in code size.


Yes, this case is not good for f-m-o. In theory there could be a cost
calculation step that tries to estimate the benefit of a
transformation, but given that f-m-o cannot transform code in a way
that causes big regressions, it's unclear to me whether complicating
the code is worth it, at least if we can solve the issues in other
ways (also see the discussion below).



This is probably a fairly rare scenario and the original bug report was
for a correctness issue in using addresses in the range
0x8000..0x in x32.  So I wouldn't lose any sleep if we
adjusted the test to pass -fno-fold-mem-offsets.  But before doing that
I wanted to give you the chance to ponder if this is something you'd
prefer to improve in f-m-o itself.   At some level if the base register
collapses down to 0, then we could take the offset as a constant address
and try to recognize that form.  If that fails, then just consider the
change unprofitable rather than trying to recognize it as reg+d.

Anyway, waiting to hear your thoughts...


Yes, this testcase has been bugging me too, I have brought that up in
previous iterations as well.

I must have missed that in the earlier discussion.


I'm also not sure whether this is a code quality or a correctness
issue? From what I understand from the relevant ticket, if we emit
movl $0, -18874240 then it's wrong code?
It's a code quality issue as long as we don't transform the code into 
movl $0, -18874240, at which point it would become a correctness issue.





With regards to the "recognize that the base register is 0", that
would be nice but how would we recognise that? f-m-o can only
calculate the folded offset, but that is not enough to prove whether the
base register is zero or not.
It's a chain of insns that produce an address and use it in the memory 
reference.  We essentially changed the first insn in the chain from movl 
-18874240, %eax into movl 0, %eax.  So we'd have to somehow note that 
the base register in the memory reference has the value zero in the 
chain of instructions.  That may or may not be reasonable to do.




One thought that I've had is that this started being an issue on x86
when I enabled folding of mv REG, INT in addition to the existing ADD
REG, REG, INT. The idea was that a move will be folded to mv REG, 0
and on targets that we have a zero register that can be beneficial for
a number of reasons... but on x86 we don't have a zero register so the
benefit is much more limited anyway. So maybe we could disable folding
of moves on targets that don't have a zero register? That would solve
the issue and I believe it also makes sense in general. If so, is
there a way to query whether the target has such a register?
We don't have a generalized query to do that.  You might be able to ask 
what the cost to load 0 into a register is, but many targets 
artificially decrease that value.


You could use the costing model to cost the entire sequence 
before/after.  There's an interface to walk a sequence and return a 
cost.  In the case of f-m-o the insns are part of the larger chain, so 
we'd need a different API.


The other option would be to declare this is known, but not important 
issue.  I would think that it's going to be rare to have absolute 
addresses and x32 isn't really used much.  The combination of the two 
should be exceedingly rare.  Hence my willingness to use 
-fno-fold-mem-offsets in the test.


Jeff
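
For reference, a hypothetical reproducer in the spirit of pr52146 (the actual
test contents are not shown here) stores through a constant absolute address
in the 0x80000000..0xffffffff range on x32:

  void
  f (void)
  {
    *(volatile int *) 0xfee00080u = 0;   /* illustrative absolute address */
  }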


[PATCH v2] RISC-V: const: hide mvconst splitter from IRA

2023-10-06 Thread Vineet Gupta
Vlad recently introduced a new gate @ira_in_progress, similar to
counterparts @{reload,lra}_in_progress.

Use this to hide the constant synthesis splitter from being recog* ()
by IRA register equivalence logic which is eager to undo the splits,
generating worse code for constants (and sometimes no code at all).

See PR/109279 (large constant), PR/110748 (const -0.0) ...

Granted the IRA logic is subsided with -fsched-pressure which is now
enabled for RISC-V backend, the gate makes this future-proof in
addition to helping with -O1 etc.

This fixes 1 additional test:

                    = Summary of gcc testsuite =
                             | # of unexpected case / # of unique unexpected case
                             |      gcc     |   g++   | gfortran |
    rv32imac/  ilp32/ medlow |  416 /   103 | 13 /  6 | 67 / 12  |
  rv32imafdc/ ilp32d/ medlow |  416 /   103 | 13 /  6 | 24 /  4  |
    rv64imac/   lp64/ medlow |  417 /   104 |  9 /  3 | 67 / 12  |
  rv64imafdc/  lp64d/ medlow |  416 /   103 |  5 /  2 |  6 /  1  |

Also similar to v1, this doesn't move RISC-V SPEC scores at all.

gcc/ChangeLog:
* config/riscv/riscv.md (mvconst_internal): Add !ira_in_progress.

Suggested-by: Jeff Law 
Signed-off-by: Vineet Gupta 
---
changes since v1:
  - Fix bug: new condition to prevent recognition not splitting itself
---
 gcc/config/riscv/riscv.md | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index e00b8ee3579d..9b990ec2566d 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1997,13 +1997,16 @@
 
 ;; Pretend to have the ability to load complex const_int in order to get
 ;; better code generation around them.
-;;
 ;; But avoid constants that are special cased elsewhere.
+;;
+;; Hide it from IRA register equiv recog* () to elide potential undoing of split
+;;
 (define_insn_and_split "*mvconst_internal"
   [(set (match_operand:GPR 0 "register_operand" "=r")
 (match_operand:GPR 1 "splittable_const_int_operand" "i"))]
-  "!(p2m1_shift_operand (operands[1], mode)
- || high_mask_shift_operand (operands[1], mode))"
+  "!ira_in_progress
+   && !(p2m1_shift_operand (operands[1], mode)
+|| high_mask_shift_operand (operands[1], mode))"
   "#"
   "&& 1"
   [(const_int 0)]
-- 
2.34.1



[COMMITTED] Docs: Minimally document standard C/C++ attribute syntax.

2023-10-06 Thread Sandra Loosemore
gcc/ChangeLog:

* doc/extend.texi (Function Attributes): Mention standard attribute
syntax.
(Variable Attributes): Likewise.
(Type Attributes): Likewise.
(Attribute Syntax): Likewise.
---
 gcc/doc/extend.texi | 74 +++--
 1 file changed, 52 insertions(+), 22 deletions(-)

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index b4770f1a149..e1129a4fb95 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -2537,13 +2537,14 @@ for each target.  However, a considerable number of 
attributes are
 supported by most, if not all targets.  Those are described in
 the @ref{Common Function Attributes} section.
 
-Function attributes are introduced by the @code{__attribute__} keyword
-in the declaration of a function, followed by an attribute specification
-enclosed in double parentheses.  You can specify multiple attributes in
-a declaration by separating them by commas within the double parentheses
-or by immediately following one attribute specification with another.
-@xref{Attribute Syntax}, for the exact rules on attribute syntax and
-placement.  Compatible attribute specifications on distinct declarations
+GCC provides two different ways to specify attributes: the traditional
+GNU syntax using @samp{__attribute__ ((...))} annotations, and the
+newer standard C and C++ syntax using @samp{[[...]]} with the
+@samp{gnu::} prefix on attribute names.  Note that the exact rules for
+placement of attributes in your source code are different depending on
+which syntax you use.  @xref{Attribute Syntax}, for details.
+
+Compatible attribute specifications on distinct declarations
 of the same function are merged.  An attribute specification that is not
 compatible with attributes already applied to a declaration of the same
 function is ignored with a warning.
@@ -7433,10 +7434,9 @@ when this attribute is present.
 @cindex attribute of variables
 @cindex variable attributes
 
-The keyword @code{__attribute__} allows you to specify special properties
+You can use attributes to specify special properties
 of variables, function parameters, or structure, union, and, in C++, class
-members.  This @code{__attribute__} keyword is followed by an attribute
-specification enclosed in double parentheses.  Some attributes are currently
+members.  Some attributes are currently
 defined generically for variables.  Other attributes are defined for
 variables on particular target systems.  Other attributes are available
 for functions (@pxref{Function Attributes}), labels (@pxref{Label Attributes}),
@@ -7445,8 +7445,12 @@ enumerators (@pxref{Enumerator Attributes}), statements
 Other front ends might define more attributes
 (@pxref{C++ Extensions,,Extensions to the C++ Language}).
 
-@xref{Attribute Syntax}, for details of the exact syntax for using
-attributes.
+GCC provides two different ways to specify attributes: the traditional
+GNU syntax using @samp{__attribute__ ((...))} annotations, and the
+newer standard C and C++ syntax using @samp{[[...]]} with the
+@samp{gnu::} prefix on attribute names.  Note that the exact rules for
+placement of attributes in your source code are different depending on
+which syntax you use.  @xref{Attribute Syntax}, for details.
 
 @menu
 * Common Variable Attributes::
@@ -8508,7 +8512,7 @@ placed in either the @code{.bss_below100} section or the
 @cindex attribute of types
 @cindex type attributes
 
-The keyword @code{__attribute__} allows you to specify various special
+You can use attributes to specify various special
 properties of types.  Some type attributes apply only to structure and
 union types, and in C++, also class types, while others can apply to
 any type defined via a @code{typedef} declaration.  Unless otherwise
@@ -8521,19 +8525,20 @@ labels (@pxref{Label  Attributes}), enumerators 
(@pxref{Enumerator
 Attributes}), statements (@pxref{Statement Attributes}), and for variables
 (@pxref{Variable Attributes}).
 
-The @code{__attribute__} keyword is followed by an attribute specification
-enclosed in double parentheses.
+GCC provides two different ways to specify attributes: the traditional
+GNU syntax using @samp{__attribute__ ((...))} annotations, and the
+newer standard C and C++ syntax using @samp{[[...]]} with the
+@samp{gnu::} prefix on attribute names.  Note that the exact rules for
+placement of attributes in your source code are different depending on
+which syntax you use.  @xref{Attribute Syntax}, for details.
 
 You may specify type attributes in an enum, struct or union type
 declaration or definition by placing them immediately after the
 @code{struct}, @code{union} or @code{enum} keyword.  You can also place
 them just past the closing curly brace of the definition, but this is less
 preferred because logically the type should be fully defined at 
-the closing brace.
-
-You can also include type attributes in a @code{typedef} declaration.
-@xref{Attribute Syntax}, for details of the 
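
For illustration, the two syntaxes described above look like this in C (a
minimal sketch; not part of the committed documentation):

  __attribute__ ((unused)) static int old_style;   /* GNU syntax */
  [[gnu::unused]] static int new_style;            /* standard C/C++ syntax */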

Re: [PATCH v2 2/2] *: add modern gettext

2023-10-06 Thread Arsen Arsenović
Hi Bruno,

Bruno Haible  writes:

>> * intlmacosx.m4: Import from gettext-0.22 (serial 8).
>
> A further suggestion (can be done in a separate patch, later):
>
> Use intlmacosx.m4 from gettext-0.22.3 (serial 9).
>
> This version enables portability to macOS 14, which was released
> on 2023-09-26. (Older versions of libintl crash on macOS 14, due
> to an incompatible change in macOS.) [1]

Thanks.  I'll update before pushing.
-- 
Arsen Arsenović




Re: [PATCH v2 2/2] *: add modern gettext

2023-10-06 Thread Bruno Haible
Arsen Arsenović wrote:
> * intlmacosx.m4: Import from gettext-0.22 (serial 8).

A further suggestion (can be done in a separate patch, later):

Use intlmacosx.m4 from gettext-0.22.3 (serial 9).

This version enables portability to macOS 14, which was released
on 2023-09-26. (Older versions of libintl crash on macOS 14, due
to an incompatible change in macOS.) [1]

Bruno

[1] https://savannah.gnu.org/news/?id=10520





Re: [PATCH] test: Isolate slp-1.c check of target supports vect_strided5

2023-10-06 Thread Andrew Stubbs

On 15/09/2023 10:16, Juzhe-Zhong wrote:

This test failed in RISC-V:
FAIL: gcc.dg/vect/slp-1.c -flto -ffat-lto-objects  scan-tree-dump-times vect 
"vectorizing stmts using SLP" 4
FAIL: gcc.dg/vect/slp-1.c scan-tree-dump-times vect "vectorizing stmts using 
SLP" 4

Because this loop:
   /* SLP with unrolling by 8.  */
   for (i = 0; i < N; i++)
 {
   out[i*5] = 8;
   out[i*5 + 1] = 7;
   out[i*5 + 2] = 81;
   out[i*5 + 3] = 28;
   out[i*5 + 4] = 18;
 }

is using vect_load_lanes with array size = 5
instead of SLP.

When we adjust the cost of the lanes load/store, then it will use SLP.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/slp-1.c: Add vect_strided5.

---
  gcc/testsuite/gcc.dg/vect/slp-1.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/slp-1.c 
b/gcc/testsuite/gcc.dg/vect/slp-1.c
index 82e4f6469fb..d4a13f12df6 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-1.c
@@ -122,5 +122,5 @@ int main (void)
  }
  
  /* { dg-final { scan-tree-dump-times "vectorized 4 loops" 1 "vect"  } } */

-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" } 
} */
-
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { 
target {! vect_strided5 } } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { 
target vect_strided5 } } } */


This patch causes a test regression on amdgcn because vect_strided5 is 
true (because check_effective_target_vect_fully_masked is true), but the 
testcase still gives the message 4 times. Perhaps because amdgcn uses 
masking and not vect_load_lanes?
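
If masking is indeed the cause, one possible selector adjustment (an
untested sketch only; it assumes the vect_fully_masked effective target
can be combined with vect_strided5 here):

/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 4 "vect" { target { { ! vect_strided5 } || vect_fully_masked } } } } */
/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 3 "vect" { target { vect_strided5 && { ! vect_fully_masked } } } } } */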


Andrew


Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-06 Thread Manolis Tsamis
On Thu, Oct 5, 2023 at 1:05 AM Jeff Law  wrote:
>
>
>
> On 10/3/23 05:45, Manolis Tsamis wrote:
> > This is a new RTL pass that tries to optimize memory offset calculations
> > by moving them from add immediate instructions to the memory loads/stores.
> > For example it can transform this:
> >
> >addi t4,sp,16
> >add  t2,a6,t4
> >shl  t3,t2,1
> >ld   a2,0(t3)
> >addi a2,1
> >sd   a2,8(t2)
> >
> > into the following (one instruction less):
> >
> >add  t2,a6,sp
> >shl  t3,t2,1
> >ld   a2,32(t3)
> >addi a2,1
> >sd   a2,24(t2)
> >
> > Although there are places where this is done already, this pass is more
> > powerful and can handle the more difficult cases that are currently not
> > optimized. Also, it runs late enough and can optimize away unnecessary
> > stack pointer calculations.
> >
> > gcc/ChangeLog:
> >
> >   * Makefile.in: Add fold-mem-offsets.o.
> >   * passes.def: Schedule a new pass.
> >   * tree-pass.h (make_pass_fold_mem_offsets): Declare.
> >   * common.opt: New options.
> >   * doc/invoke.texi: Document new option.
> >   * fold-mem-offsets.cc: New file.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/riscv/fold-mem-offsets-1.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-2.c: New test.
> >   * gcc.target/riscv/fold-mem-offsets-3.c: New test.
> >
> > Signed-off-by: Manolis Tsamis 
>
>
> So I was ready to ACK, but realized there weren't any test results for a
> primary platform mentioned.  So I ran this on x86.
>
> It's triggering one regression (code quality).
>
> Specifically gcc.target/i386/pr52146.c
>
> The f-m-o code is slightly worse than without f-m-o.
>
> Without f-m-o we get this:
>
> 9  B88000E0  movl$-18874240, %eax
> 9  FE
>10 0005 67C7  movl$0, (%eax)
>10  00
>11 000c C3ret
>
> With f-m-o we get this:
>
> 9  B800  movl$0, %eax
> 9  00
>10 0005 67C78080  movl$0, -18874240(%eax)
>10  00E0FE00
>10  00
>11 0010 C3ret
>
>
> The key being that we don't get rid of the original move instruction,
> nor does the original move instruction get smaller due to simplification
> of its constant.  Additionally, the memory store gets larger.  The net
> is a 4 byte increase in code size.
>
Yes, this case is not good for f-m-o. In theory there could be a cost
calculation step that tries to estimate the benefit of a
transformation, but given that f-m-o cannot transform code in a way
that causes big regressions, it's unclear to me whether complicating
the code is worth it, at least if we can solve the issues in other
ways (also see the discussion below).

>
> This is probably a fairly rare scenario and the original bug report was
> for a correctness issue in using addresses in the range
> 0x8000..0x in x32.  So I wouldn't lose any sleep if we
> adjusted the test to pass -fno-fold-mem-offsets.  But before doing that
> I wanted to give you the chance to ponder if this is something you'd
> prefer to improve in f-m-o itself.   At some level if the base register
> collapses down to 0, then we could take the offset as a constant address
> and try to recognize that form.  If that fails, then just consider the
> change unprofitable rather than trying to recognize it as reg+d.
>
> Anyway, waiting to hear your thoughts...
>
Yes, this testcase has been bugging me too; I have brought it up in
previous iterations as well.
I'm also not sure whether this is a code quality or a correctness
issue? From what I understand from the relevant ticket, if we emit
movl $0, -18874240 then it's wrong code?

With regards to the "recognize that the base register is 0", that
would be nice but how would we recognize that? f-m-o can only
calculate the folded offset, but that is not enough to prove whether
the base register is zero.

One thought that I've had is that this started being an issue on x86
when I enabled folding of mv REG, INT in addition to the existing ADD
REG, REG, INT. The idea was that a move will be folded to mv REG, 0
and on targets that have a zero register that can be beneficial for
a number of reasons... but on x86 we don't have a zero register so the
benefit is much more limited anyway. So maybe we could disable folding
of moves on targets that don't have a zero register? That would solve
the issue and I believe it also makes sense in general. If so, is
there a way to query whether the target has such a register?

Thoughts?
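
One illustrative way to approximate such a query without a new target
hook, purely a sketch (the helper name and placement are hypothetical;
set_src_cost is the existing rtl.h helper):

/* Sketch: treat "a zero source is at least as cheap as the original
   source" as a proxy for the target having a zero register, and only
   fold moves where that holds.  */
static bool
fold_of_move_profitable_p (rtx_insn *insn, bool speed)
{
  rtx set = single_set (insn);
  if (!set)
    return false;
  machine_mode mode = GET_MODE (SET_DEST (set));
  return set_src_cost (const0_rtx, mode, speed)
	 <= set_src_cost (SET_SRC (set), mode, speed);
}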

> If we do a V7, then we need to fix one spelling issue that shows up in
> several places (if we go with the v6 we can just fix it prior to
> committing).  Specifically in several places we need to replace
> "recognised" with "recognized".
>
Ok, I can do that :)

Manolis
>
> jeff


[committed] amdgcn: switch mov insns to compact syntax

2023-10-06 Thread Andrew Stubbs
I've just committed this patch. It should have no functional changes 
except to make it easier to add new alternatives into the 
alternative-heavy move instructions.


Andrew

amdgcn: switch mov insns to compact syntax

The move instructions typically have many alternatives (and I'm about to add
more) so are good candidates for the new syntax.

This patch only converts the patterns where there are no significant changes to
the generated files. The other patterns can be converted another time.

gcc/ChangeLog:

* config/gcn/gcn-valu.md (*mov): Convert to compact syntax.
(mov_exec): Likewise.
(mov_sgprbase): Likewise.
* config/gcn/gcn.md (*mov_insn): Likewise.
(*movti_insn): Likewise.

diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
index 284dda73da9..32b170e8522 100644
--- a/gcc/config/gcn/gcn-valu.md
+++ b/gcc/config/gcn/gcn-valu.md
@@ -457,23 +457,21 @@ (define_insn "*mov"
(set_attr "length" "4,8")])
 
 (define_insn "mov_exec"
-  [(set (match_operand:V_1REG 0 "nonimmediate_operand"  "=v, v, v, v, v, m")
+  [(set (match_operand:V_1REG 0 "nonimmediate_operand")
(vec_merge:V_1REG
- (match_operand:V_1REG 1 "general_operand"  "vA, B, v,vA, m, v")
- (match_operand:V_1REG 2 "gcn_alu_or_unspec_operand"
-"U0,U0,vA,vA,U0,U0")
- (match_operand:DI 3 "register_operand" " e, e,cV,Sv, e, e")))
-   (clobber (match_scratch: 4 "=X, X, X, X,,"))]
+ (match_operand:V_1REG 1 "general_operand")
+ (match_operand:V_1REG 2 "gcn_alu_or_unspec_operand")
+ (match_operand:DI 3 "register_operand")))
+   (clobber (match_scratch: 4))]
   "!MEM_P (operands[0]) || REG_P (operands[1])"
-  "@
-   v_mov_b32\t%0, %1
-   v_mov_b32\t%0, %1
-   v_cndmask_b32\t%0, %2, %1, vcc
-   v_cndmask_b32\t%0, %2, %1, %3
-   #
-   #"
-  [(set_attr "type" "vop1,vop1,vop2,vop3a,*,*")
-   (set_attr "length" "4,8,4,8,16,16")])
+  {@ [cons: =0, 1, 2, 3, =4; attrs: type, length]
+  [v,vA,U0,e ,X ;vop1 ,4 ] v_mov_b32\t%0, %1
+  [v,B ,U0,e ,X ;vop1 ,8 ] v_mov_b32\t%0, %1
+  [v,v ,vA,cV,X ;vop2 ,4 ] v_cndmask_b32\t%0, %2, %1, vcc
+  [v,vA,vA,Sv,X ;vop3a,8 ] v_cndmask_b32\t%0, %2, %1, %3
+  [v,m ,U0,e ,*,16] #
+  [m,v ,U0,e ,*,16] #
+  })
 
 ; This variant does not accept an unspec, but does permit MEM
 ; read/modify/write which is necessary for maskstore.
@@ -644,19 +642,18 @@ (define_insn "mov_exec"
 ;   flat_load v, vT
 
 (define_insn "mov_sgprbase"
-  [(set (match_operand:V_1REG 0 "nonimmediate_operand" "= v, v, v, m")
+  [(set (match_operand:V_1REG 0 "nonimmediate_operand")
(unspec:V_1REG
- [(match_operand:V_1REG 1 "general_operand"   " vA,vB, m, v")]
+ [(match_operand:V_1REG 1 "general_operand")]
  UNSPEC_SGPRBASE))
-   (clobber (match_operand: 2 "register_operand"  "=,,,"))]
+   (clobber (match_operand: 2 "register_operand"))]
   "lra_in_progress || reload_completed"
-  "@
-   v_mov_b32\t%0, %1
-   v_mov_b32\t%0, %1
-   #
-   #"
-  [(set_attr "type" "vop1,vop1,*,*")
-   (set_attr "length" "4,8,12,12")])
+  {@ [cons: =0, 1, =2; attrs: type, length]
+  [v,vA,vop1,4 ] v_mov_b32\t%0, %1
+  [v,vB,vop1,8 ] ^
+  [v,m ,*   ,12] #
+  [m,v ,*   ,12] #
+  })
 
 (define_insn "mov_sgprbase"
   [(set (match_operand:V_2REG 0 "nonimmediate_operand" "= v, v, m")
@@ -676,17 +673,17 @@ (define_insn "mov_sgprbase"
(set_attr "length" "8,12,12")])
 
 (define_insn "mov_sgprbase"
-  [(set (match_operand:V_4REG 0 "nonimmediate_operand" "= v, v, m")
+  [(set (match_operand:V_4REG 0 "nonimmediate_operand")
(unspec:V_4REG
- [(match_operand:V_4REG 1 "general_operand"   "vDB, m, v")]
+ [(match_operand:V_4REG 1 "general_operand")]
  UNSPEC_SGPRBASE))
-   (clobber (match_operand: 2 "register_operand"  "=,,"))]
+   (clobber (match_operand: 2 "register_operand"))]
   "lra_in_progress || reload_completed"
-  "v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;v_mov_b32\t%J0, 
%J1\;v_mov_b32\t%K0, %K1
-   #
-   #"
-  [(set_attr "type" "vmult,*,*")
-   (set_attr "length" "8,12,12")])
+  {@ [cons: =0, 1, =2; attrs: type, length]
+  [v,vDB,vmult,8 ] v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, 
%H1\;v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1
+  [v,m  ,*,12] #
+  [m,v  ,*,12] #
+  })
 
 ; reload_in was once a standard name, but here it's only referenced by
 ; gcn_secondary_reload.  It allows a reload with a scratch register.
diff --git a/gcc/config/gcn/gcn.md b/gcc/config/gcn/gcn.md
index 7065acf402b..30fe9e34a35 100644
--- a/gcc/config/gcn/gcn.md
+++ b/gcc/config/gcn/gcn.md
@@ -542,87 +542,76 @@ (define_insn "*movbi"
 ; 32bit move pattern
 
 (define_insn "*mov_insn"
-  [(set (match_operand:SISF 0 "nonimmediate_operand"
- "=SD,SD,SD,SD,RB,Sm,RS,v,Sg, v, v,RF,v,RLRG,   v,SD, v,RM")
-   (match_operand:SISF 1 "gcn_load_operand"
- "SSA, J, B,RB,Sm,RS,Sm,v, v,Sv,RF, v,B,   v,RLRG, Y,RM, v"))]
-  ""
-  "@

[PATCH v2 0/2] Replace intl/ with out-of-tree GNU gettext

2023-10-06 Thread Arsen Arsenović
Afternoon,

This patch is a rebase and rewording of
https://inbox.sourceware.org/20230925150921.894157-1-ar...@aarsen.me/

Changes since v1:
- Implement Bruno's suggested changes to install.texi.
- Elaborate commit message in p2 (as requested by the Binutils
  maintainers).

Arsen Arsenović (2):
  intl: remove, in favor of out-of-tree gettext
  *: add modern gettext

 .gitignore |1 +
 Makefile.def   |   72 +-
 Makefile.in| 1612 +++
 config/gettext-sister.m4   |   35 +-
 config/gettext.m4  |  357 +-
 config/iconv.m4|  313 +-
 config/intlmacosx.m4   |   65 +
 configure  |   44 +-
 configure.ac   |   44 +-
 contrib/download_prerequisites |2 +
 contrib/prerequisites.md5  |1 +
 contrib/prerequisites.sha512   |1 +
 gcc/Makefile.in|8 +-
 gcc/aclocal.m4 |4 +
 gcc/configure  | 2001 +++-
 gcc/doc/install.texi   |   65 +-
 intl/ChangeLog |  306 --
 intl/Makefile.in   |  264 -
 intl/README|   21 -
 intl/VERSION   |1 -
 intl/aclocal.m4|   33 -
 intl/bindtextdom.c |  374 --
 intl/config.h.in   |  280 --
 intl/config.intl.in|   12 -
 intl/configure | 8288 
 intl/configure.ac  |  108 -
 intl/dcgettext.c   |   59 -
 intl/dcigettext.c  | 1238 -
 intl/dcngettext.c  |   60 -
 intl/dgettext.c|   60 -
 intl/dngettext.c   |   62 -
 intl/eval-plural.h |  114 -
 intl/explodename.c |  192 -
 intl/finddomain.c  |  195 -
 intl/gettext.c |   64 -
 intl/gettextP.h|  224 -
 intl/gmo.h |  148 -
 intl/hash-string.h |   59 -
 intl/intl-compat.c |  151 -
 intl/l10nflist.c   |  453 --
 intl/libgnuintl.h  |  341 --
 intl/loadinfo.h|  156 -
 intl/loadmsgcat.c  | 1322 -
 intl/localcharset.c|  398 --
 intl/localcharset.h|   42 -
 intl/locale.alias  |   78 -
 intl/localealias.c |  419 --
 intl/localename.c  |  772 ---
 intl/log.c |  104 -
 intl/ngettext.c|   68 -
 intl/osdep.c   |   24 -
 intl/plural-config.h   |1 -
 intl/plural-exp.c  |  156 -
 intl/plural-exp.h  |  132 -
 intl/plural.c  | 1540 --
 intl/plural.y  |  434 --
 intl/relocatable.c |  439 --
 intl/relocatable.h |   67 -
 intl/textdomain.c  |  142 -
 libcpp/aclocal.m4  |5 +
 libcpp/configure   | 2139 -
 libstdc++-v3/configure |  727 +--
 62 files changed, 5456 insertions(+), 21441 deletions(-)
 create mode 100644 config/intlmacosx.m4
 delete mode 100644 intl/ChangeLog
 delete mode 100644 intl/Makefile.in
 delete mode 100644 intl/README
 delete mode 100644 intl/VERSION
 delete mode 100644 intl/aclocal.m4
 delete mode 100644 intl/bindtextdom.c
 delete mode 100644 intl/config.h.in
 delete mode 100644 intl/config.intl.in
 delete mode 100755 intl/configure
 delete mode 100644 intl/configure.ac
 delete mode 100644 intl/dcgettext.c
 delete mode 100644 intl/dcigettext.c
 delete mode 100644 intl/dcngettext.c
 delete mode 100644 intl/dgettext.c
 delete mode 100644 intl/dngettext.c
 delete mode 100644 intl/eval-plural.h
 delete mode 100644 intl/explodename.c
 delete mode 100644 intl/finddomain.c
 delete mode 100644 intl/gettext.c
 delete mode 100644 intl/gettextP.h
 delete mode 100644 intl/gmo.h
 delete mode 100644 intl/hash-string.h
 delete mode 100644 intl/intl-compat.c
 delete mode 100644 intl/l10nflist.c
 delete mode 100644 intl/libgnuintl.h
 delete mode 100644 intl/loadinfo.h
 delete mode 100644 intl/loadmsgcat.c
 delete mode 100644 intl/localcharset.c
 delete mode 100644 intl/localcharset.h
 delete mode 100644 intl/locale.alias
 delete mode 100644 intl/localealias.c
 delete mode 100644 intl/localename.c
 delete mode 100644 intl/log.c
 delete mode 100644 intl/ngettext.c
 delete mode 100644 intl/osdep.c
 delete mode 100644 intl/plural-config.h
 delete mode 100644 intl/plural-exp.c
 delete mode 100644 intl/plural-exp.h
 delete mode 100644 intl/plural.c
 delete mode 100644 intl/plural.y
 delete mode 100644 intl/relocatable.c
 delete mode 100644 intl/relocatable.h
 delete mode 100644 intl/textdomain.c

-- 
2.42.0



[PATCH v2 1/2] intl: remove, in favor of out-of-tree gettext

2023-10-06 Thread Arsen Arsenović
ChangeLog:

* intl: Remove directory.  Replaced with out-of-tree GNU
gettext.
---
Note that the commit message here doesn't pass the changelog verifier.
What should I reword it as?  mklog suggests:

ChangeLog:

* intl/ChangeLog: Removed.
* intl/Makefile.in: Removed.
* intl/README: Removed.
* intl/VERSION: Removed.
* intl/aclocal.m4: Removed.
* intl/bindtextdom.c: Removed.
* intl/config.h.in: Removed.
* intl/config.intl.in: Removed.
* intl/configure: Removed.
* intl/configure.ac: Removed.
* intl/dcgettext.c: Removed.
* intl/dcigettext.c: Removed.
* intl/dcngettext.c: Removed.
* intl/dgettext.c: Removed.
* intl/dngettext.c: Removed.
* intl/eval-plural.h: Removed.
* intl/explodename.c: Removed.
* intl/finddomain.c: Removed.
* intl/gettext.c: Removed.
* intl/gettextP.h: Removed.
* intl/gmo.h: Removed.
* intl/hash-string.h: Removed.
* intl/intl-compat.c: Removed.
* intl/l10nflist.c: Removed.
* intl/libgnuintl.h: Removed.
* intl/loadinfo.h: Removed.
* intl/loadmsgcat.c: Removed.
* intl/localcharset.c: Removed.
* intl/localcharset.h: Removed.
* intl/locale.alias: Removed.
* intl/localealias.c: Removed.
* intl/localename.c: Removed.
* intl/log.c: Removed.
* intl/ngettext.c: Removed.
* intl/osdep.c: Removed.
* intl/plural-config.h: Removed.
* intl/plural-exp.c: Removed.
* intl/plural-exp.h: Removed.
* intl/plural.c: Removed.
* intl/plural.y: Removed.
* intl/relocatable.c: Removed.
* intl/relocatable.h: Removed.
* intl/textdomain.c: Removed.

 intl/ChangeLog   |  306 --
 intl/Makefile.in |  264 --
 intl/README  |   21 -
 intl/VERSION |1 -
 intl/aclocal.m4  |   33 -
 intl/bindtextdom.c   |  374 --
 intl/config.h.in |  280 --
 intl/config.intl.in  |   12 -
 intl/configure   | 8288 --
 intl/configure.ac|  108 -
 intl/dcgettext.c |   59 -
 intl/dcigettext.c| 1238 ---
 intl/dcngettext.c|   60 -
 intl/dgettext.c  |   60 -
 intl/dngettext.c |   62 -
 intl/eval-plural.h   |  114 -
 intl/explodename.c   |  192 -
 intl/finddomain.c|  195 -
 intl/gettext.c   |   64 -
 intl/gettextP.h  |  224 --
 intl/gmo.h   |  148 -
 intl/hash-string.h   |   59 -
 intl/intl-compat.c   |  151 -
 intl/l10nflist.c |  453 ---
 intl/libgnuintl.h|  341 --
 intl/loadinfo.h  |  156 -
 intl/loadmsgcat.c| 1322 ---
 intl/localcharset.c  |  398 --
 intl/localcharset.h  |   42 -
 intl/locale.alias|   78 -
 intl/localealias.c   |  419 ---
 intl/localename.c|  772 
 intl/log.c   |  104 -
 intl/ngettext.c  |   68 -
 intl/osdep.c |   24 -
 intl/plural-config.h |1 -
 intl/plural-exp.c|  156 -
 intl/plural-exp.h|  132 -
 intl/plural.c| 1540 
 intl/plural.y|  434 ---
 intl/relocatable.c   |  439 ---
 intl/relocatable.h   |   67 -
 intl/textdomain.c|  142 -
 43 files changed, 19401 deletions(-)
 delete mode 100644 intl/ChangeLog
 delete mode 100644 intl/Makefile.in
 delete mode 100644 intl/README
 delete mode 100644 intl/VERSION
 delete mode 100644 intl/aclocal.m4
 delete mode 100644 intl/bindtextdom.c
 delete mode 100644 intl/config.h.in
 delete mode 100644 intl/config.intl.in
 delete mode 100755 intl/configure
 delete mode 100644 intl/configure.ac
 delete mode 100644 intl/dcgettext.c
 delete mode 100644 intl/dcigettext.c
 delete mode 100644 intl/dcngettext.c
 delete mode 100644 intl/dgettext.c
 delete mode 100644 intl/dngettext.c
 delete mode 100644 intl/eval-plural.h
 delete mode 100644 intl/explodename.c
 delete mode 100644 intl/finddomain.c
 delete mode 100644 intl/gettext.c
 delete mode 100644 intl/gettextP.h
 delete mode 100644 intl/gmo.h
 delete mode 100644 intl/hash-string.h
 delete mode 100644 intl/intl-compat.c
 delete mode 100644 intl/l10nflist.c
 delete mode 100644 intl/libgnuintl.h
 delete mode 100644 intl/loadinfo.h
 delete mode 100644 intl/loadmsgcat.c
 delete mode 100644 intl/localcharset.c
 delete mode 100644 intl/localcharset.h
 delete mode 100644 intl/locale.alias
 delete mode 100644 intl/localealias.c
 delete mode 100644 intl/localename.c
 delete mode 100644 intl/log.c
 delete mode 100644 intl/ngettext.c
 delete mode 100644 intl/osdep.c
 delete mode 100644 intl/plural-config.h
 delete mode 100644 intl/plural-exp.c
 delete mode 100644 intl/plural-exp.h
 delete mode 100644 intl/plural.c
 delete mode 100644 intl/plural.y
 delete mode 100644 intl/relocatable.c
 delete mode 100644 intl/relocatable.h
 delete mode 100644 intl/textdomain.c

patch body dropped - it is just removals
-- 
2.42.0



RE: [X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

2023-10-06 Thread Roger Sayle

Grr!  I've done it again.  ENOPATCH.

> -Original Message-
> From: Roger Sayle 
> Sent: 06 October 2023 14:58
> To: 'gcc-patches@gcc.gnu.org' 
> Cc: 'Uros Bizjak' 
> Subject: [X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.
> 
> 
> This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr
> functions to implement doubleword right shifts by 1 bit, using a shift of
the
> highpart that sets the carry flag followed by a rotate-carry-right
> (RCR) instruction on the lowpart.
> 
> Conceptually this is similar to the recent left shift patch, but with two
> complicating factors.  The first is that although the RCR sequence is
shorter, and is
> a ~3x performance improvement on AMD, my micro-benchmarking shows it
> ~10% slower on Intel.  Hence this patch also introduces a new
> X86_TUNE_USE_RCR tuning parameter.  The second is that I believe this is
the
> first time a "rotate-right-through-carry" and a right shift that sets the
carry flag
> from the least significant bit has been modelled in GCC RTL (on a MODE_CC
> target).  For this I've used the i386 back-end's UNSPEC_CC_NE which seems
> appropriate.  Finally rcrsi2 and rcrdi2 are separate define_insns so that
we can
> use their generator functions.
> 
> For the pair of functions:
> unsigned __int128 foo(unsigned __int128 x) { return x >> 1; }
> __int128 bar(__int128 x) { return x >> 1; }
> 
> with -O2 -march=znver4 we previously generated:
> 
> foo:movq%rdi, %rax
> movq%rsi, %rdx
> shrdq   $1, %rsi, %rax
> shrq%rdx
> ret
> bar:movq%rdi, %rax
> movq%rsi, %rdx
> shrdq   $1, %rsi, %rax
> sarq%rdx
> ret
> 
> with this patch we now generate:
> 
> foo:movq%rsi, %rdx
> movq%rdi, %rax
> shrq%rdx
> rcrq%rax
> ret
> bar:movq%rsi, %rdx
> movq%rdi, %rax
> sarq%rdx
> rcrq%rax
> ret
> 
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and
> make -k check, both with and without --target_board=unix{-m32} with no new
> failures.  And to provide additional testing, I've also bootstrapped and regression
> tested a version of this patch where the RCR is always generated (independent of
> the -march target) again with no regressions.  Ok for mainline?
> 
> 
> 2023-10-06  Roger Sayle  
> 
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_split_ashr): Split shifts by
> one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR
> or -Oz.
> (ix86_split_lshr): Likewise, split shifts by one bit into
> lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz.
> * config/i386/i386.h (TARGET_USE_RCR): New backend macro.
> * config/i386/i386.md (rcrsi2): New define_insn for rcrl.
> (rcrdi2): New define_insn for rcrq.
> (3_carry): New define_insn for right shifts that
> set the carry flag from the least significant bit, modelled using
> UNSPEC_CC_NE.
> * config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning
parameter
> controlling use of rcr 1 vs. shrd, which is significantly faster
on
> AMD processors.
> 
> gcc/testsuite/ChangeLog
> * gcc.target/i386/rcr-1.c: New 64-bit test case.
> * gcc.target/i386/rcr-2.c: New 32-bit test case.
> 
> 
> Thanks in advance,
> Roger
> --

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index e42ff27..399eb8e 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -6496,6 +6496,22 @@ ix86_split_ashr (rtx *operands, rtx scratch, 
machine_mode mode)
emit_insn (gen_ashr3 (low[0], low[0],
  GEN_INT (count - half_width)));
}
+  else if (count == 1
+  && (TARGET_USE_RCR || optimize_size > 1))
+   {
+ if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+ if (mode == DImode)
+   {
+ emit_insn (gen_ashrsi3_carry (high[0], high[0]));
+ emit_insn (gen_rcrsi2 (low[0], low[0]));
+   }
+ else
+   {
+ emit_insn (gen_ashrdi3_carry (high[0], high[0]));
+ emit_insn (gen_rcrdi2 (low[0], low[0]));
+   }
+   }
   else
{
  gen_shrd = mode == DImode ? gen_x86_shrd : gen_x86_64_shrd;
@@ -6561,6 +6577,22 @@ ix86_split_lshr (rtx *operands, rtx scratch, 
machine_mode mode)
emit_insn (gen_lshr3 (low[0], low[0],
  GEN_INT (count - half_width)));
}
+  else if (count == 1
+  && (TARGET_USE_RCR || optimize_size > 1))
+   {
+ if (!rtx_equal_p (operands[0], operands[1]))
+   emit_move_insn (operands[0], operands[1]);
+ if (mode == DImode)
+   {
+ emit_insn 

[X86 PATCH] Implement doubleword right shifts by 1 bit using s[ha]r+rcr.

2023-10-06 Thread Roger Sayle


This patch tweaks the i386 back-end's ix86_split_ashr and ix86_split_lshr
functions to implement doubleword right shifts by 1 bit, using a shift
of the highpart that sets the carry flag followed by a rotate-carry-right
(RCR) instruction on the lowpart.

Conceptually this is similar to the recent left shift patch, but with two
complicating factors.  The first is that although the RCR sequence is
shorter, and is a ~3x performance improvement on AMD, my micro-benchmarking
shows it ~10% slower on Intel.  Hence this patch also introduces a new
X86_TUNE_USE_RCR tuning parameter.  The second is that I believe this is
the first time a "rotate-right-through-carry" and a right shift that sets
the carry flag from the least significant bit has been modelled in GCC RTL
(on a MODE_CC target).  For this I've used the i386 back-end's UNSPEC_CC_NE
which seems appropriate.  Finally rcrsi2 and rcrdi2 are separate
define_insns so that we can use their generator functions.

For the pair of functions:
unsigned __int128 foo(unsigned __int128 x) { return x >> 1; }
__int128 bar(__int128 x) { return x >> 1; }

with -O2 -march=znver4 we previously generated:

foo:movq%rdi, %rax
movq%rsi, %rdx
shrdq   $1, %rsi, %rax
shrq%rdx
ret
bar:movq%rdi, %rax
movq%rsi, %rdx
shrdq   $1, %rsi, %rax
sarq%rdx
ret

with this patch we now generate:

foo:movq%rsi, %rdx
movq%rdi, %rax
shrq%rdx
rcrq%rax
ret
bar:movq%rsi, %rdx
movq%rdi, %rax
sarq%rdx
rcrq%rax
ret
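
As a sanity check of the semantics, here is a C model of the shr+rcr
pair (illustrative only, not part of the patch):

unsigned __int128
lshr1 (unsigned __int128 x)
{
  unsigned long long lo = (unsigned long long) x;
  unsigned long long hi = (unsigned long long) (x >> 64);
  unsigned long long cf = hi & 1;	/* shrq %rdx: CF := old bit 0.  */
  hi >>= 1;
  lo = (lo >> 1) | (cf << 63);		/* rcrq %rax: rotate CF into bit 63.  */
  return ((unsigned __int128) hi << 64) | lo;
}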

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  And to provide additional testing, I've also
bootstrapped and regression tested a version of this patch where the
RCR is always generated (independent of the -march target) again with
no regressions.  Ok for mainline?


2023-10-06  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-expand.c (ix86_split_ashr): Split shifts by
one into ashr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR
or -Oz.
(ix86_split_lshr): Likewise, split shifts by one bit into
lshr[sd]i3_carry followed by rcr[sd]i2, if TARGET_USE_RCR or -Oz.
* config/i386/i386.h (TARGET_USE_RCR): New backend macro.
* config/i386/i386.md (rcrsi2): New define_insn for rcrl.
(rcrdi2): New define_insn for rcrq.
(3_carry): New define_insn for right shifts that
set the carry flag from the least significant bit, modelled using
UNSPEC_CC_NE.
* config/i386/x86-tune.def (X86_TUNE_USE_RCR): New tuning parameter
controlling use of rcr 1 vs. shrd, which is significantly faster on
AMD processors.

gcc/testsuite/ChangeLog
* gcc.target/i386/rcr-1.c: New 64-bit test case.
* gcc.target/i386/rcr-2.c: New 32-bit test case.


Thanks in advance,
Roger
--




Re: [PATCH v6] Implement new RTL optimizations pass: fold-mem-offsets.

2023-10-06 Thread Manolis Tsamis
On Thu, Oct 5, 2023 at 5:54 PM Jeff Law  wrote:
>
>
>
> On 10/3/23 05:45, Manolis Tsamis wrote:
> > This is a new RTL pass that tries to optimize memory offset calculations
>
> > +
> > +/* If INSN is a root memory instruction then compute a potentially new 
> > offset
> > +   for it and test if the resulting instruction is valid.  */
> > +static void
> > +do_check_validity (rtx_insn *insn, fold_mem_info *info)
> > +{
> > +  rtx mem, reg;
> > +  HOST_WIDE_INT cur_offset;
> > +  if (!get_fold_mem_root (insn, , , _offset))
> > +return;
> > +
> > +  HOST_WIDE_INT new_offset = cur_offset + info->added_offset;
> > +
> > +  /* Test if it is valid to change MEM's address offset to NEW_OFFSET.  */
> > +  int icode = INSN_CODE (insn);
> > +  rtx mem_addr = XEXP (mem, 0);
> > +  machine_mode mode = GET_MODE (mem_addr);
> > +  if (new_offset != 0)
> > +XEXP (mem, 0) = gen_rtx_PLUS (mode, reg, gen_int_mode (new_offset, 
> > mode));
> > +  else
> > +XEXP (mem, 0) = reg;
> > +
> > +  bool illegal = insn_invalid_p (insn, false)
> > +  || !memory_address_addr_space_p (mode, XEXP (mem, 0),
> > +   MEM_ADDR_SPACE (mem));
> > +
> > +  /* Restore the instruction.  */
> > +  XEXP (mem, 0) = mem_addr;
> > +  INSN_CODE (insn) = icode;
> > +
> > +  if (illegal)
> > +bitmap_ior_into (_fold_insns, info->fold_insns);
> > +  else
> > +bitmap_ior_into (_fold_insns, info->fold_insns);
> > +}
> > +
> So overnight testing with the latest version of your patch triggered a
> fault on the sh3-linux-gnu target with this code at -O2:
>
> > enum
> > {
> >   _ISspace = ((5) < 8 ? ((1 << (5)) << 8) : ((1 << (5)) >> 8)),
> > };
> > extern const unsigned short int **__ctype_b_loc (void)
> >  __attribute__ ((__nothrow__ )) __attribute__ ((__const__));
> > void
> > read_alias_file (const char *fname, int fname_len)
> > {
> >   char buf[400];
> >   char *cp;
> >   cp = buf;
> >   while (((*__ctype_b_loc ())[(int) (((unsigned char) cp[0]))] & 
> > (unsigned short int) _ISspace))
> >++cp;
> > }
> >
>
>
> The problem is we need to clear the INSN_CODE before we call recog.  In
> this specific case we had (mem (plus (reg) (offset))) after f-m-o does
> its job, the offset went to zero so we changed the structure of the RTL
> to (mem (reg)).  But we had the old INSN_CODE still in place which
> caused us to reference operands that no longer exist.
>
> A simple INSN_CODE (insn) = -1 before calling insn_invalid_p is the
> right fix.
>
Thanks for catching that. I had some thoughts that the change that
doesn't emit an rtx PLUS when the offset is zero could do something
like this, but I missed that this change was needed.
Will fix this in the next iteration.
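
For reference, the suggested change amounts to this in the validity
check quoted above (a sketch against the v6 code, not a tested patch):

  /* The folded address may have changed shape, e.g. from
     (mem (plus (reg) (const_int))) to (mem (reg)), so force
     re-recognition before checking validity.  */
  INSN_CODE (insn) = -1;
  bool illegal = insn_invalid_p (insn, false)
		 || !memory_address_addr_space_p (mode, XEXP (mem, 0),
						  MEM_ADDR_SPACE (mem));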

Manolis
> jeff


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Richard Biener
On Fri, 6 Oct 2023, Robin Dapp wrote:

> > We might need a similar assert
> > 
> >   gcc_assert (HONOR_SIGNED_ZEROS (vectype_out)
> >   && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));?
> 
> erm, obviously not that exact assert but more something like
> 
> if (HONOR_SIGNED_ZEROS && !HONOR_SIGN_DEPENDENT_ROUNDING...)
>   {
> if (dump)
>   ...
> return false;
>   }
> 
> or so.

Yeah, of course the whole point of a fold-left reduction is to
_not_ give up without -ffast-math which is why I added the above.
I obviously didn't fully verify what happens for an original
MINUS_EXPR.  I think it's required to give up for -frounding-math,
but I think I might have put the code to do that in a generic
enough place.

For x86 you need --param vect-partial-vector-usage=2 and an
AVX512 enabled arch like -march=skylake-avx512 or -march=znver4.

I think transforming - x to + (-x) works for signed zeros.
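
A quick check of that identity (illustrative only, assuming
round-to-nearest):

#include <assert.h>
#include <math.h>

int
main (void)
{
  /* a - x and a + (-x) agree even for signed zeros.  */
  double a = -0.0, x = +0.0;
  assert (signbit (a - x) == signbit (a + (-x)));
  a = +0.0;
  assert (signbit (a - x) == signbit (a + (-x)));
  return 0;
}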

So if you think you got everything correct the patch is OK as-is,
I just wasn't sure - maybe the neutral_element change deserves
a comment as to how MINUS_EXPR is handled.

Richard.


Re: [PATCH v2][GCC] aarch64: Enable Cortex-X4 CPU

2023-10-06 Thread Saurabh Jha

On 10/6/2023 2:24 PM, Saurabh Jha wrote:

Hey,

This patch adds support for the Cortex-X4 CPU to GCC.

Ran regression testing for the aarch64-none-elf target and found no regressions.

Okay for gcc-master? I don't have commit access so if it looks okay, 
could someone please help me commit this?



Thanks,

Saurabh


gcc/ChangeLog

  * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add support for 
cortex-x4 core.

  * config/aarch64/aarch64-tune.md: Regenerated.
  * doc/invoke.texi: Add command-line option for cortex-x4 core.


Apologies, I forgot to attach the patch file to my previous email.

Saurabh
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
index dd474233872..eae40b29df6 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -182,6 +182,8 @@ AARCH64_CORE("cortex-x2",  cortexx2, cortexa57, V9A,  
(SVE2_BITPERM, MEMTAG, I8M
 
 AARCH64_CORE("cortex-x3",  cortexx3, cortexa57, V9A,  (SVE2_BITPERM, MEMTAG, 
I8MM, BF16), neoversen2, 0x41, 0xd4e, -1)
 
+AARCH64_CORE("cortex-x4",  cortexx4, cortexa57, V9_2A,  (SVE2_BITPERM, MEMTAG, 
PROFILE), neoversen2, 0x41, 0xd81, -1)
+
 AARCH64_CORE("neoverse-n2", neoversen2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversen2, 0x41, 0xd49, -1)
 
 AARCH64_CORE("neoverse-v2", neoversev2, cortexa57, V9A, (I8MM, BF16, 
SVE2_BITPERM, RNG, MEMTAG, PROFILE), neoversev2, 0x41, 0xd4f, -1)
diff --git a/gcc/config/aarch64/aarch64-tune.md 
b/gcc/config/aarch64/aarch64-tune.md
index ccfcad53f80..c969277d617 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,neoversen2,neoversev2,demeter"
+   
"cortexa34,cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,octeontx,octeontxt81,octeontxt83,thunderxt81,thunderxt83,ampere1,ampere1a,emag,xgene1,falkor,qdf24xx,exynosm1,phecda,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa76,cortexa76ae,cortexa77,cortexa78,cortexa78ae,cortexa78c,cortexa65,cortexa65ae,cortexx1,cortexx1c,neoversen1,ares,neoversee1,octeontx2,octeontx2t98,octeontx2t96,octeontx2t93,octeontx2f95,octeontx2f95n,octeontx2f95mm,a64fx,tsv110,thunderx3t110,neoversev1,zeus,neoverse512tvb,saphira,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55,cortexa76cortexa55,cortexr82,cortexa510,cortexa520,cortexa710,cortexa715,cortexa720,cortexx2,cortexx3,cortexx4,neoversen2,neoversev2,demeter"
(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4085fc90907..ace972a5832 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -20631,9 +20631,9 @@ performance of the code.  Permissible values for this 
option are:
 @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},
 @samp{cortex-a75.cortex-a55}, @samp{cortex-a76.cortex-a55},
 @samp{cortex-r82}, @samp{cortex-x1}, @samp{cortex-x1c}, @samp{cortex-x2},
-@samp{cortex-x3}, @samp{cortex-a510}, @samp{cortex-a520}, @samp{cortex-a710},
-@samp{cortex-a715}, @samp{cortex-a720}, @samp{ampere1}, @samp{ampere1a},
-and @samp{native}.
+@samp{cortex-x3}, @samp{cortex-x4}, @samp{cortex-a510}, @samp{cortex-a520},
+@samp{cortex-a710}, @samp{cortex-a715}, @samp{cortex-a720}, @samp{ampere1},
+@samp{ampere1a}, and @samp{native}.
 
 The values @samp{cortex-a57.cortex-a53}, @samp{cortex-a72.cortex-a53},
 @samp{cortex-a73.cortex-a35}, @samp{cortex-a73.cortex-a53},


[PATCH v2][GCC] aarch64: Enable Cortex-X4 CPU

2023-10-06 Thread Saurabh Jha

Hey,

This patch adds support for the Cortex-X4 CPU to GCC.

Ran regression testing for the aarch64-none-elf target and found no regressions.

Okay for gcc-master? I don't have commit access so if it looks okay, 
could someone please help me commit this?



Thanks,

Saurabh


gcc/ChangeLog

  * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add support for 
cortex-x4 core.

  * config/aarch64/aarch64-tune.md: Regenerated.
  * doc/invoke.texi: Add command-line option for cortex-x4 core.



[committed] amdgcn: silence warning

2023-10-06 Thread Andrew Stubbs

I've just committed this simple patch to silence an enum warning.

Andrew

amdgcn: silence warning

gcc/ChangeLog:

* config/gcn/gcn.cc (print_operand): Adjust xcode type to fix warning.

diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
index f6cff659703..ef3b6472a52 100644
--- a/gcc/config/gcn/gcn.cc
+++ b/gcc/config/gcn/gcn.cc
@@ -6991,7 +6991,7 @@ print_operand_address (FILE *file, rtx mem)
 void
 print_operand (FILE *file, rtx x, int code)
 {
-  int xcode = x ? GET_CODE (x) : 0;
+  rtx_code xcode = x ? GET_CODE (x) : UNKNOWN;
   bool invert = false;
   switch (code)
 {


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Robin Dapp
> We might need a similar assert
> 
> gcc_assert (HONOR_SIGNED_ZEROS (vectype_out)
>   && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));?

erm, obviously not that exact assert but more something like

if (HONOR_SIGNED_ZEROS && !HONOR_SIGN_DEPENDENT_ROUNDING...)
  {
if (dump)
  ...
return false;
  }

or so.

Regards
 Robin


Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Robin Dapp
> ... here we probably get PLUS_EXPR for MINUS_EXPR above but IIRC
> for MINUS_EXPR the !as_initial case should return positive zero.
> 
> Can you double-check?

You're referring to the canonicalization from a - CST to a + -CST so
that the neutral op would need to change with it?  Argh, good point.

From what I can tell the only difference for MINUS_EXPR is that we
negate the reduction operand and then just continue as if it were
a PLUS_EXPR (which is the right thing to do also for +-0.0?).
At least I didn't observe a canonicalization and we don't call
neutral_op_for_reduction in between.

What we do have, though, is for the fully-masked case (you added
that recently):

  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
{
  vector_identity = build_zero_cst (vectype_out);
  if (!HONOR_SIGNED_ZEROS (vectype_out))
;
  else
{
  gcc_assert (!HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));
  vector_identity = const_unop (NEGATE_EXPR, vectype_out,
vector_identity);
}
}

So for

  /* Handle MINUS by adding the negative.  */
  if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
{
  tree negated = make_ssa_name (vectype_out);

We might need a similar assert

  gcc_assert (HONOR_SIGNED_ZEROS (vectype_out)
  && !HONOR_SIGN_DEPENDENT_ROUNDING (vectype_out));?

Apart from that, the only call with !as_initial is in
vect_create_epilog_for_reduction.  I just instrumented it with an
assert (false) but i386.exp doesn't trigger it at all. 

Regards
 Robin



[PATCH 3/3] [GCC] arm: vst1_types_x4 ACLE intrinsics

2023-10-06 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1 intrinsic for arm32.
This patch adds the _x4 variants of the vst1 intrinsic.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1_u8_x4, vst1_u16_x4, vst1_u32_x4, vst1_u64_x4): New.
(vst1_s8_x4, vst1_s16_x4, vst1_s32_x4, vst1_s64_x4): New.
(vst1_f16_x4, vst1_f32_x4): New.
(vst1_p8_x4, vst1_p16_x4, vst1_p64_x4): New.
(vst1_bf16_x4): New.
* config/arm/arm_neon_builtins.def (vst1_x4): New entries.
* config/arm/neon.md (vst1_x4): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1_base_xN_1.c: Add new test.
* gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new test.
* gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new test.
* gcc.target/arm/simd/vst1_p64_xN_1.c: Add new test.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vst1_base_xN_1.c  |  62 +-
 .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   6 +-
 .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   7 +-
 7 files changed, 200 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index b01171e5966..41e645d8352 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11258,6 +11258,14 @@ vst1_p64_x3 (poly64_t * __a, poly64x1x3_t __b)
   __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x4 (poly64_t * __a, poly64x1x4_t __b)
+{
+  union { poly64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11351,6 +11359,38 @@ vst1_s64_x3 (int64_t * __a, int64x1x3_t __b)
   __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x4 (int8_t * __a, int8x8x4_t __b)
+{
+  union { int8x8x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x4 (int16_t * __a, int16x4x4_t __b)
+{
+  union { int16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x4 (int32_t * __a, int32x2x4_t __b)
+{
+  union { int32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x4 (int64_t * __a, int64x1x4_t __b)
+{
+  union { int64x1x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11403,6 +11443,24 @@ vst1_f32_x3 (float32_t * __a, float32x2x3_t __b)
   __builtin_neon_vst1_x3v2sf ((__builtin_neon_sf *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x4 (float16_t * __a, float16x4x4_t __b)
+{
+  union { float16x4x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v4hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f32_x4 (float32_t * __a, float32x2x4_t __b)
+{
+  union { float32x2x4_t __i; __builtin_neon_oi __o; } __bu = { __b };
+  __builtin_neon_vst1_x4v2sf ((__builtin_neon_sf *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_u8 (uint8_t * __a, uint8x8_t __b)
@@ -11495,6 +11553,38 @@ vst1_u64_x3 (uint64_t * __a, uint64x1x3_t __b)
   __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_u8_x4 (uint8_t * 

[PATCH 2/3] [GCC] arm: vst1_types_x3 ACLE intrinsics

2023-10-06 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1 intrinsic for arm32.
This patch adds the _x3 variants of the vst1 intrinsic.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1_u8_x3, vst1_u16_x3, vst1_u32_x3, vst1_u64_x3): New.
(vst1_s8_x3, vst1_s16_x3, vst1_s32_x3, vst1_s64_x3): New.
(vst1_f16_x3, vst1_f32_x3): New.
(vst1_p8_x3, vst1_p16_x3, vst1_p64_x3): New.
(vst1_bf16_x3): New.
* config/arm/arm_neon_builtins.def (vst1_x3): New entries.
* config/arm/neon.md (vst1_x3): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1_base_xN_1.c: Add new test.
* gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new test.
* gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new test.
* gcc.target/arm/simd/vst1_p64_xN_1.c: Add new test.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vst1_base_xN_1.c  |  63 +-
 .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |   7 +-
 .../gcc.target/arm/simd/vst1_p64_xN_1.c   |   7 +-
 7 files changed, 202 insertions(+), 7 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 4bd6093281b..b01171e5966 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11250,6 +11250,14 @@ vst1_p64_x2 (poly64_t * __a, poly64x1x2_t __b)
   __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x3 (poly64_t * __a, poly64x1x3_t __b)
+{
+  union { poly64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11311,6 +11319,38 @@ vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
   __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x3 (int8_t * __a, int8x8x3_t __b)
+{
+  union { int8x8x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x3 (int16_t * __a, int16x4x3_t __b)
+{
+  union { int16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x3 (int32_t * __a, int32x2x3_t __b)
+{
+  union { int32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x3 (int64_t * __a, int64x1x3_t __b)
+{
+  union { int64x1x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11345,6 +11385,24 @@ vst1_f32_x2 (float32_t * __a, float32x2x2_t __b)
   __builtin_neon_vst1_x2v2sf ((__builtin_neon_sf *) __a, __bu.__o);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x3 (float16_t * __a, float16x4x3_t __b)
+{
+  union { float16x4x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v4hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f32_x3 (float32_t * __a, float32x2x3_t __b)
+{
+  union { float32x2x3_t __i; __builtin_neon_ei __o; } __bu = { __b };
+  __builtin_neon_vst1_x3v2sf ((__builtin_neon_sf *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_u8 (uint8_t * __a, uint8x8_t __b)
@@ -11405,6 +11463,38 @@ vst1_u64_x2 (uint64_t * __a, uint64x1x2_t __b)
   __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_u8_x3 (uint8_t * 

[PATCH 1/3] [GCC] arm: vst1_types_x2 ACLE intrinsics

2023-10-06 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vst1 intrinsic for arm32.
This patch adds the _x2 variants of the vst1 intrinsic. Tests use xN so that 
the later variants (_x3, _x4) could be added.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/
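
A usage sketch for one of the new intrinsics (illustrative only;
assumes the patch is applied and dst has room for 4 int32_t values):

#include <arm_neon.h>

void
store_pair (int32_t *dst, int32x2x2_t v)
{
  /* Stores both d-registers of the tuple with one call.  */
  vst1_s32_x2 (dst, v);
}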

gcc/ChangeLog:
* config/arm/arm_neon.h
(vst1_u8_x2, vst1_u16_x2, vst1_u32_x2, vst1_u64_x2): New.
(vst1_s8_x2, vst1_s16_x2, vst1_s32_x2, vst1_s64_x2): New.
(vst1_f16_x2, vst1_f32_x2): New.
(vst1_p8_x2, vst1_p16_x2, vst1_p64_x2): New.
(vst1_bf16_x2): New.
* config/arm/arm_neon_builtins.def (vst1_x2): New entries.
* config/arm/neon.md (vst1_x2): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vst1_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vst1_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 114 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vst1_base_xN_1.c  |  67 ++
 .../gcc.target/arm/simd/vst1_bf16_xN_1.c  |  13 ++
 .../gcc.target/arm/simd/vst1_fp16_xN_1.c  |  13 ++
 .../gcc.target/arm/simd/vst1_p64_xN_1.c   |  13 ++
 7 files changed, 231 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vst1_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index c03be9912f8..4bd6093281b 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -11242,6 +11242,14 @@ vst1_p64 (poly64_t * __a, poly64x1_t __b)
   __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_p64_x2 (poly64_t * __a, poly64x1x2_t __b)
+{
+  union { poly64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11271,6 +11279,38 @@ vst1_s64 (int64_t * __a, int64x1_t __b)
   __builtin_neon_vst1di ((__builtin_neon_di *) __a, __b);
 }
 
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s8_x2 (int8_t * __a, int8x8x2_t __b)
+{
+  union { int8x8x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v8qi ((__builtin_neon_qi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s16_x2 (int16_t * __a, int16x4x2_t __b)
+{
+  union { int16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v4hi ((__builtin_neon_hi *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s32_x2 (int32_t * __a, int32x2x2_t __b)
+{
+  union { int32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v2si ((__builtin_neon_si *) __a, __bu.__o);
+}
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_s64_x2 (int64_t * __a, int64x1x2_t __b)
+{
+  union { int64x1x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2di ((__builtin_neon_di *) __a, __bu.__o);
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -11287,6 +11327,24 @@ vst1_f32 (float32_t * __a, float32x2_t __b)
   __builtin_neon_vst1v2sf ((__builtin_neon_sf *) __a, __b);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f16_x2 (float16_t * __a, float16x4x2_t __b)
+{
+  union { float16x4x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v4hf (__a, __bu.__o);
+}
+#endif
+
+__extension__ extern __inline void
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vst1_f32_x2 (float32_t * __a, float32x2x2_t __b)
+{
+  union { float32x2x2_t __i; __builtin_neon_ti __o; } __bu = { __b };
+  __builtin_neon_vst1_x2v2sf ((__builtin_neon_sf *) __a, __bu.__o);
+}
+
 __extension__ extern __inline void
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vst1_u8 

[PATCH 0/3] [GCC] arm: vst1_types_xN ACLE intrinsics

2023-10-06 Thread Ezra.Sitorus
Add xN variants of vst1_types intrinsic.




Re: [V3][PATCH 0/3] New attribute "counted_by" to annotate bounds for C99 FAM(PR108896)

2023-10-06 Thread Siddhesh Poyarekar

On 2023-10-06 01:11, Martin Uecker wrote:

On Thursday, 2023-10-05 at 15:35 -0700, Kees Cook wrote:

On Thu, Oct 05, 2023 at 04:08:52PM -0400, Siddhesh Poyarekar wrote:

2. How would you handle signedness of the size field?  The size gets
converted to sizetype everywhere it is used and overflows/underflows may
produce interesting results.  Do you want to limit the types to unsigned or
do you want to add a disclaimer in the docs?  The former seems like the
*right* thing to do given that it is a new feature; best to enforce the
cleaner habit at the outset.


The Linux kernel has a lot of "int" counters, so the goal is to catch
negative offsets just like too-large offsets at runtime with the sanitizer
and report 0 for __bdos. Refactoring all these to be unsigned is going
to take time since at least some of them use the negative values as
special values unrelated to array indexing. :(

So, perhaps if unsigned counters are worth enforcing, can this be a
separate warning the kernel can turn off initially?



I think unsigned counters are much more problematic than signed ones
because wraparound errors are more difficult to find.

With unsigned you could potentially diagnose wraparound, but only if we
add -fsanitize=unsigned-overflow *and* add a mechanism to mark intentional
wraparound *and* everybody adds this annotation after carefully screening
their code *and* rewriting all operations such as (counter - 3) + 5
where the wraparound in the intermediate expression is harmless.
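
A concrete instance of that, illustrative only:

#include <assert.h>

void
example (void)
{
  unsigned int counter = 1;
  /* (counter - 3) wraps to UINT_MAX - 1, adding 5 wraps back, so the
     result is the intended 3; a wraparound sanitizer would still have
     to flag the intermediate expression.  */
  unsigned int n = (counter - 3) + 5;
  assert (n == 3);
}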

For this reason, I do not think we should ever enforce some rule that
the counter has to be unsigned.

What we could do, is detect *storing* negative values into the
counter at run-time using UBSan. (but if negative values are
used for special cases, one also should be able to turn this
off).


All of the object size detection relies on object sizes being sizetype.
The closest we could do with that is detect (sz != SIZE_MAX && sz >
SIZE_MAX / 2), since allocators typically cannot allocate more than
SIZE_MAX / 2.


Sid


[PATCH 3/3] [GCC] arm: vld1q_types_x4 ACLE intrinsics

2023-10-06 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vld1q intrinsic for arm32.
This patch adds the _x4 variants of the vld1q intrinsic. This depends on
the _x2 patch.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/
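
A usage sketch for one of the new intrinsics (illustrative only;
assumes the patch is applied and src provides 16 contiguous
float32_t values):

#include <arm_neon.h>

float32x4x4_t
load_quad (const float32_t *src)
{
  /* Fills four q-registers with one call.  */
  return vld1q_f32_x4 (src);
}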

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x4, vld1q_u16_x4, vld1q_u32_x4, vld1q_u64_x4): New.
(vld1q_s8_x4, vld1q_s16_x4, vld1q_s32_x4, vld1q_s64_x4): New.
(vld1q_f16_x4, vld1q_f32_x4): New.
(vld1q_p8_x4, vld1q_p16_x4, vld1q_p64_x4): New.
(vld1q_bf16_x4): New.
* config/arm/arm_neon_builtins.def (vld1_x4): New entries.
* config/arm/neon.md (vld1_x4): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 128 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  30 
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  59 
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |   6 +
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |   6 +
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |   6 +
 7 files changed, 236 insertions(+)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 557873ac028..c03be9912f8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10421,6 +10421,15 @@ vld1q_p64_x3 (const poly64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline poly64x2x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_p64_x4 (const poly64_t * __a)
+{
+  union { poly64x2x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10522,6 +10531,42 @@ vld1q_s64_x3 (const int64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline int8x16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s8_x4 (const int8_t * __a)
+{
+  union { int8x16x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s16_x4 (const int16_t * __a)
+{
+  union { int16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v8hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s32_x4 (const int32_t * __a)
+{
+  union { int32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v4si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x2x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s64_x4 (const int64_t * __a)
+{
+  union { int64x2x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10578,6 +10623,26 @@ vld1q_f32_x3 (const float32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f16_x4 (const float16_t * __a)
+{
+  union { float16x8x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x4x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f32_x4 (const float32_t * __a)
+{
+  union { float32x4x4_t __i; __builtin_neon_xi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x4v4sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __inline uint8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1q_u8 (const uint8_t * __a)
@@ -10678,6 +10743,42 @@ vld1q_u64_x3 (const uint64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline uint8x16x4_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_u8_x4 (const uint8_t * __a)
+{
+  union { uint8x16x4_t __i; __builtin_neon_xi __o; } 

[PATCH 2/3] [GCC] arm: vld1q_types_x3 ACLE intrinsics

2023-10-06 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vld1q intrinsic for arm32.
This patch adds the _x3 variants of the vld1q intrinsic. This depends on
the _x2 patch.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x3, vld1q_u16_x3, vld1q_u32_x3, vld1q_u64_x3): New.
(vld1q_s8_x3, vld1q_s16_x3, vld1q_s32_x3, vld1q_s64_x3): New.
(vld1q_f16_x3, vld1q_f32_x3): New.
(vld1q_p8_x3, vld1q_p16_x3, vld1q_p64_x3): New.
(vld1q_bf16_x3): New.
* config/arm/arm_neon_builtins.def (vld1_x3): New entries.
* config/arm/neon.md (vld1_x3): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new tests.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new tests.
---
 gcc/config/arm/arm_neon.h | 128 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  27 
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  63 -
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |   6 +
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |   7 +-
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |   7 +-
 7 files changed, 236 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index 3eb41c6bdc8..557873ac028 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10412,6 +10412,15 @@ vld1q_p64_x2 (const poly64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline poly64x2x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_p64_x3 (const poly64_t * __a)
+{
+  union { poly64x2x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10477,6 +10486,42 @@ vld1q_s64_x2 (const int64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline int8x16x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s8_x3 (const int8_t * __a)
+{
+  union { int8x16x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s16_x3 (const int16_t * __a)
+{
+  union { int16x8x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v8hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s32_x3 (const int32_t * __a)
+{
+  union { int32x4x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v4si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x2x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s64_x3 (const int64_t * __a)
+{
+  union { int64x2x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10513,6 +10558,26 @@ vld1q_f32_x2 (const float32_t * __a)
   return __rv.__i;
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f16_x3 (const float16_t * __a)
+{
+  union { float16x8x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x4x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f32_x3 (const float32_t * __a)
+{
+  union { float32x4x3_t __i; __builtin_neon_ci __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x3v4sf ((const __builtin_neon_sf *) __a);
+  return __rv.__i;
+}
+
 __extension__ extern __inline uint8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
 vld1q_u8 (const uint8_t * __a)
@@ -10577,6 +10642,42 @@ vld1q_u64_x2 (const uint64_t * __a)
   return __rv.__i;
 }
 
+__extension__ extern __inline uint8x16x3_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_u8_x3 (const uint8_t * __a)
+{
+  union { uint8x16x3_t __i; 

[PATCH 1/3] [GCC] arm: vld1q_types_x2 ACLE intrinsics

2023-10-06 Thread Ezra.Sitorus
From: Ezra Sitorus 

This patch is part of a series of patches implementing the _xN variants of the 
vld1q intrinsic for arm32.
This patch adds the _x2 variants of the vld1q intrinsic. Tests use xN so that
the later variants (_x3, _x4) can be added.

ACLE documents are at https://developer.arm.com/documentation/ihi0053/latest/
ISA documents are at https://developer.arm.com/documentation/ddi0487/latest/
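
A minimal usage sketch of one of the new load variants (hedged:
requires a compiler with this patch applied; the function name is
illustrative):

#include <arm_neon.h>

/* Load two q-registers' worth (32 bytes) starting at p.  */
uint8x16x2_t
load_two_u8q (const uint8_t *p)
{
  return vld1q_u8_x2 (p);
}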

gcc/ChangeLog:
* config/arm/arm_neon.h
(vld1q_u8_x2, vld1q_u16_x2, vld1q_u32_x2, vld1q_u64_x2): New.
(vld1q_s8_x2, vld1q_s16_x2, vld1q_s32_x2, vld1q_s64_x2): New.
(vld1q_f16_x2, vld1q_f32_x2): New.
(vld1q_p8_x2, vld1q_p16_x2, vld1q_p64_x2): New.
(vld1q_bf16_x2): New.
* config/arm/arm_neon_builtins.def (vld1_x2): New entries.
* config/arm/neon.md (vld1_x2): New.

gcc/testsuite/ChangeLog:
* gcc.target/arm/simd/vld1q_base_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_bf16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_fp16_xN_1.c: Add new test.
* gcc.target/arm/simd/vld1q_p64_xN_1.c: Add new test.
---
 gcc/config/arm/arm_neon.h | 128 ++
 gcc/config/arm/arm_neon_builtins.def  |   1 +
 gcc/config/arm/neon.md|  10 ++
 .../gcc.target/arm/simd/vld1q_base_xN_1.c |  67 +
 .../gcc.target/arm/simd/vld1q_bf16_xN_1.c |  13 ++
 .../gcc.target/arm/simd/vld1q_fp16_xN_1.c |  14 ++
 .../gcc.target/arm/simd/vld1q_p64_xN_1.c  |  14 ++
 7 files changed, 247 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_base_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_bf16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_fp16_xN_1.c
 create mode 100644 gcc/testsuite/gcc.target/arm/simd/vld1q_p64_xN_1.c

diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h
index cdfdb44259a..3eb41c6bdc8 100644
--- a/gcc/config/arm/arm_neon.h
+++ b/gcc/config/arm/arm_neon.h
@@ -10403,6 +10403,15 @@ vld1q_p64 (const poly64_t * __a)
   return (poly64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }
 
+__extension__ extern __inline poly64x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_p64_x2 (const poly64_t * __a)
+{
+  union { poly64x2x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #pragma GCC pop_options
 __extension__ extern __inline int8x16_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10432,6 +10441,42 @@ vld1q_s64 (const int64_t * __a)
   return (int64x2_t)__builtin_neon_vld1v2di ((const __builtin_neon_di *) __a);
 }
 
+__extension__ extern __inline int8x16x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s8_x2 (const int8_t * __a)
+{
+  union { int8x16x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v16qi ((const __builtin_neon_qi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int16x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s16_x2 (const int16_t * __a)
+{
+  union { int16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8hi ((const __builtin_neon_hi *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int32x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s32_x2 (const int32_t * __a)
+{
+  union { int32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4si ((const __builtin_neon_si *) __a);
+  return __rv.__i;
+}
+
+__extension__ extern __inline int64x2x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_s64_x2 (const int64_t * __a)
+{
+  union { int64x2x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v2di ((const __builtin_neon_di *) __a);
+  return __rv.__i;
+}
+
 #if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
 __extension__ extern __inline float16x8_t
 __attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
@@ -10448,6 +10493,26 @@ vld1q_f32 (const float32_t * __a)
   return (float32x4_t)__builtin_neon_vld1v4sf ((const __builtin_neon_sf *) __a);
 }
 
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
+__extension__ extern __inline float16x8x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f16_x2 (const float16_t * __a)
+{
+  union { float16x8x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v8hf (__a);
+  return __rv.__i;
+}
+#endif
+
+__extension__ extern __inline float32x4x2_t
+__attribute__  ((__always_inline__, __gnu_inline__, __artificial__))
+vld1q_f32_x2 (const float32_t * __a)
+{
+  union { float32x4x2_t __i; __builtin_neon_oi __o; } __rv;
+  __rv.__o = __builtin_neon_vld1_x2v4sf ((const 

[PATCH 0/3] [GCC] arm: vld1q_types_xN ACLE intrinsics

2023-10-06 Thread Ezra.Sitorus
Add xN variants of vld1q_types intrinsic.




Re: [PATCH 01/22] Add condition coverage profiling

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Jan Hubicka wrote:

[...]
> Richi, can you please look at the gimple matching part?

What did you have in mind?  I couldn't find anything obvious in the
patch counting as gimple matching - do you have a pointer?

Thanks,
Richard.


Re: [PATCH v4] [tree-optimization/110279] Consider FMA in get_reassociation_width

2023-10-06 Thread Richard Biener
On Thu, Sep 14, 2023 at 2:43 PM Di Zhao OS
 wrote:
>
> This is a new version of the patch on "nested FMA".
> Sorry for updating this after so long, I've been studying and
> writing micro cases to sort out the cause of the regression.

Sorry for taking so long to reply.

> First, following previous discussion:
> (https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629080.html)
>
> 1. From testing more altered cases, I don't think the
> problem is that reassociation works locally. In that:
>
>   1) On the example with multiplications:
>
> tmp1 = a + c * c + d * d + x * y;
> tmp2 = x * tmp1;
> result += (a + c + d + tmp2);
>
>   Given "result" rewritten by width=2, the performance is
>   worse if we rewrite "tmp1" with width=2. In contrast, if we
>   remove the multiplications from the example (and make "tmp1"
>   not single-used), and still rewrite "result" by width=2, then
>   rewriting "tmp1" with width=2 is better. (Makes sense because
>   the tree's depth at "result" is still smaller if we rewrite
>   "tmp1".)
>
>   2) I tried to modify the assembly code of the example without
>   FMA, so the width of "result" is 4. On Ampere1 there's no
>   obvious improvement. So although this is an interesting
>   problem, it doesn't seem like the cause of the regression.

OK, I see.

> 2. From assembly code of the case with FMA, one problem is
> that, rewriting "tmp1" to parallel didn't decrease the
> minimum CPU cycles (taking MULT_EXPRs into account), but
> increased code size, so the overhead is increased.
>
>a) When "tmp1" is not re-written to parallel:
> fmadd d31, d2, d2, d30
> fmadd d31, d3, d3, d31
> fmadd d31, d4, d5, d31  //"tmp1"
> fmadd d31, d31, d4, d3
>
>b) When "tmp1" is re-written to parallel:
> fmul  d31, d4, d5
> fmadd d27, d2, d2, d30
> fmadd d31, d3, d3, d31
> fadd  d31, d31, d27 //"tmp1"
> fmadd d31, d31, d4, d3
>
> For version a), there are 3 dependent FMAs to calculate "tmp1".
> For version b), there are also 3 dependent instructions in the
> longer path: the 1st, 3rd and 4th.

Yes, it doesn't really change anything.  The patch has

+  /* If there's code like "acc = a * b + c * d + acc" in a tight loop, some
+ uarchs can execute results like:
+
+   _1 = a * b;
+   _2 = .FMA (c, d, _1);
+   acc_1 = acc_0 + _2;
+
+ in parallel, while turning it into
+
+   _1 = .FMA(a, b, acc_0);
+   acc_1 = .FMA(c, d, _1);
+
+ hinders that, because then the first FMA depends on the result
of preceding
+ iteration.  */

I can't see what can be run in parallel for the first case.  The .FMA
depends on the multiplication a * b.  Iff the uarch somehow decomposes
.FMA into multiply + add then the c * d multiply could run in parallel
with the a * b multiply which _might_ be able to hide some of the
latency of the full .FMA.  Like on x86 Zen FMA has a latency of 4
cycles but a multiply only 3.  But I never got confirmation from any
of the CPU designers that .FMAs are issued when the multiply
operands are ready and the add operand can be forwarded.

I also wonder why the multiplications of the two-FMA sequence
then cannot be executed at the same time?  So I have some doubt
of the theory above.

Iff this really is the reason for the sequence to execute with lower
overall latency and we want to attack this on GIMPLE then I think
we need a target hook telling us this fact (I also wonder if such
behavior can be modeled in the scheduler pipeline description at all?)
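
To make the latency argument concrete, a rough critical-path toy model
of the two quoted sequences (assumptions: Zen-like latencies of 4
cycles for FMA and 3 for FMUL/FADD, unlimited issue width; purely
illustrative):

#include <stdio.h>

static int max_i (int a, int b) { return a > b ? a : b; }

int main (void)
{
  const int FMA = 4, FMUL = 3, FADD = 3;  /* assumed latencies */

  /* a) four dependent fmadds: a straight chain.  */
  int t_a = 4 * FMA;

  /* b) the fmul and the first fmadd are independent; the second fmadd
     depends on the fmul; the fadd joins both chains; the final fmadd
     depends on the fadd.  */
  int t_join = max_i (FMA, FMUL + FMA) + FADD;
  int t_b = t_join + FMA;

  printf ("serial chain: %d cycles, parallel rewrite: %d cycles\n",
          t_a, t_b);
  return 0;
}

Under these naive assumptions the parallel rewrite is even shorter
(14 vs. 16 cycles), which only underlines the doubt above: latency
alone does not obviously explain the regression.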

> So it seems to me the current get_reassociation_width algorithm
> isn't optimal in the presence of FMA. So I modified the patch to
> improve get_reassociation_width, rather than check for code
> patterns. (Although there could be some other complicated
> factors so the regression is more obvious when there's "nested
> FMA". But with this patch that should be avoided or reduced.)
>
> With this patch 508.namd_r 1-copy run has 7% improvement on
> Ampere1, on Intel Xeon there's about 3%. While I'm still
> collecting data on other CPUs, I'd like to know how do you
> think of this.
>
> About changes in the patch:
>
> 1. When the op list forms a complete FMA chain, try to search
> for a smaller width considering the benefit of using FMA. With
> a smaller width, the increment of code size is smaller when
> breaking the chain.

But this is all highly target specific (code size even more so).

How I understand your approach to fixing the issue leads me to
the suggestion to prioritize parallel rewriting, thus alter rank_ops_for_fma,
taking the reassoc width into account (the computed width should be
unchanged from rank_ops_for_fma) instead of "fixing up" the parallel
rewriting of FMAs (well, they are not yet formed of course).
get_reassociation_width has 'get_required_cycles', the above theory
could be verified with a very simple toy pipeline model.  We'd have
to ask the target for the reassoc width for MULT_EXPRs as well (or 

Re: [PATCH] ifcvt/vect: Emit COND_ADD for conditional scalar reduction.

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Robin Dapp wrote:

> Hi Tamar,
> 
> > The only comment I have is whether you actually need this helper
> > function? It looks like all the uses of it are in cases you have, or
> > will call conditional_internal_fn_code directly.
> removed the cond_fn_p entirely in the attached v3.
> 
> Bootstrapped and regtested on x86_64, aarch64 and power10.

Looks good - I only have one question, see below ...

> Regards
>  Robin
> 
> Subject: [PATCH v3] ifcvt/vect: Emit COND_ADD for conditional scalar
>  reduction.
> 
> As described in PR111401 we currently emit a COND and a PLUS expression
> for conditional reductions.  This makes it difficult to combine both
> into a masked reduction statement later.
> This patch improves that by directly emitting a COND_ADD during ifcvt and
> adjusting some vectorizer code to handle it.
> 
> It also makes neutral_op_for_reduction return -0 if HONOR_SIGNED_ZEROS
> is true.
> 
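
Why -0.0 rather than +0.0 is the neutral element when signed zeros are
honored can be seen with a minimal check (plain C, illustrative only):

#include <assert.h>
#include <math.h>

int main (void)
{
  /* +0.0 is not neutral: -0.0 + +0.0 rounds to +0.0, losing the sign.  */
  assert (!signbit (-0.0 + 0.0));
  /* -0.0 is neutral: x + -0.0 == x for every x, even x == -0.0.  */
  assert (signbit (-0.0 + -0.0));
  return 0;
}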
> gcc/ChangeLog:
> 
>   PR middle-end/111401
>   * tree-if-conv.cc (convert_scalar_cond_reduction): Emit COND_ADD
>   if supported.
>   (predicate_scalar_phi): Add whitespace.
>   * tree-vect-loop.cc (fold_left_reduction_fn): Add IFN_COND_ADD.
>   (neutral_op_for_reduction): Return -0 for PLUS.
>   (vect_is_simple_reduction): Don't count else operand in
>   COND_ADD.
>   (vect_create_epilog_for_reduction): Fix whitespace.
>   (vectorize_fold_left_reduction): Add COND_ADD handling.
>   (vectorizable_reduction): Don't count else operand in COND_ADD.
>   (vect_transform_reduction): Add COND_ADD handling.
>   * tree-vectorizer.h (neutral_op_for_reduction): Add default
>   parameter.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c: New test.
>   * gcc.target/riscv/rvv/autovec/cond/pr111401.c: New test.
> ---
>  .../vect-cond-reduc-in-order-2-signed-zero.c  | 141 
>  .../riscv/rvv/autovec/cond/pr111401.c | 139 
>  gcc/tree-if-conv.cc   |  63 ++--
>  gcc/tree-vect-loop.cc | 150 ++
>  gcc/tree-vectorizer.h |   2 +-
>  5 files changed, 451 insertions(+), 44 deletions(-)
>  create mode 100644 
> gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/pr111401.c
> 
> diff --git 
> a/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c 
> b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> new file mode 100644
> index 000..7b46e7d8a2a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
> @@ -0,0 +1,141 @@
> +/* Make sure a -0 stays -0 when we perform a conditional reduction.  */
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-add-options ieee } */
> +/* { dg-additional-options "-std=gnu99 -fno-fast-math" } */
> +
> +#include "tree-vect.h"
> +
> +#include 
> +
> +#define N (VECTOR_BITS * 17)
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_plus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_plus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res += a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone))
> +reduc_minus_double (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +double __attribute__ ((noinline, noclone, optimize ("0")))
> +reduc_minus_double_ref (double *restrict a, double init, int *cond, int n)
> +{
> +  double res = init;
> +  for (int i = 0; i < n; i++)
> +if (cond[i])
> +  res -= a[i];
> +  return res;
> +}
> +
> +int __attribute__ ((optimize (1)))
> +main ()
> +{
> +  int n = 19;
> +  double a[N];
> +  int cond1[N], cond2[N];
> +
> +  for (int i = 0; i < N; i++)
> +{
> +  a[i] = (i * 0.1) * (i & 1 ? 1 : -1);
> +  cond1[i] = 0;
> +  cond2[i] = i & 4 ? 1 : 0;
> +  asm volatile ("" ::: "memory");
> +}
> +
> +  double res1 = reduc_plus_double (a, -0.0, cond1, n);
> +  double ref1 = reduc_plus_double_ref (a, -0.0, cond1, n);
> +  double res2 = reduc_minus_double (a, -0.0, cond1, n);
> +  double ref2 = reduc_minus_double_ref (a, -0.0, cond1, n);
> +  double res3 = reduc_plus_double (a, -0.0, cond2, n);
> +  double ref3 = reduc_plus_double_ref (a, -0.0, cond2, n);
> +  double res4 = reduc_minus_double (a, -0.0, cond2, n);
> +  double ref4 = reduc_minus_double_ref (a, -0.0, cond2, n);
> +
> +  if (res1 != ref1 || signbit (res1) != signbit (ref1))
> +

Re: [WIP 3/4] OpenMP: Fortran front-end support for loop transforms.

2023-10-06 Thread Tobias Burnus

Just that it doesn't get forgotten, the attached patch needs to be
applied on top.

It handles 'tile'/'unroll' directive names in the 'contains'/'absent'
clauses of the 'assume'/'assumes' directives.

Currently, we don't do anything with it after parsing; hence, no further
changes are required. (We could add a testcase, if someone thinks that
it is necessary.)

Tobias

On 01.10.23 22:10, Sandra Loosemore wrote:

From: Frederik Harwath 

gcc/fortran/ChangeLog:

...

  * openmp.cc (gfc_free_omp_clauses): Free tile_sizes field.
  (match_tile_sizes): New.
  (enum omp_mask2): Add OMP_CLAUSE_UNROLL_FULL, OMP_CLAUSE_UNROLL_NONE,
  OMP_CLAUSE_UNROLL_PARTIAL, and OMP_CLAUSE_TILE.
  (gfc_match_omp_clauses): Handle OMP_CLAUSE_UNROLL_FULL and
  OMP_CLAUSE_UNROLL_PARTIAL syntax.
  (OMP_UNROLL_CLAUSES): Define.
  (OMP_TILE_CLAUSES): Define.
  (gfc_match_omp_tile): New.
  (gfc_match_omp_unroll): New.
  (find_nested_loop_in_chain): Handle loop transforms.
  (find_nested_loop_or_transform_in_chain): New.
  (find_nested_loop_or_transform_in_block): New.
  (diagnose_intervening_code_errors_1): Handle loop transforms.
  (restructure_intervening_code): Handle loop transforms.
  (is_outer_iteration_variable): Adjust to avoid fencepost error.
  (check_nested_loop_in_chain): Handle loop transforms.
  (expr_uses_intervening_var): Add assertion.
  (is_intervening_var): Add assertion.
  (expr_is_invariant): Adjust to avoid fencepost error.
  (omp_unroll_removes_loop_nest): New.
  (resolve_nested_loop_transforms): New.
  (resolve_omp_unroll): New.
  (resolve_nested_loops): New, split from...
  (resolve_omp_do) ...here.
  (resolve_omp_tile): New.
  (omp_code_to_statement): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.
  (resolve_oacc_nested_loops): Adjust assertion.
  (gfc_resolve_omp_directive): Handle EXEC_OMP_TILE and EXEC_OMP_UNROLL.

...
diff --git a/gcc/fortran/openmp.cc b/gcc/fortran/openmp.cc
index dc0c8013c3d..5570b49b2b5 100644
--- a/gcc/fortran/openmp.cc
+++ b/gcc/fortran/openmp.cc
@@ -105,8 +105,8 @@ static const struct gfc_omp_directive gfc_omp_directives[] = {
   {"task", GFC_OMP_DIR_EXECUTABLE, ST_OMP_TASK},
   {"teams", GFC_OMP_DIR_EXECUTABLE, ST_OMP_TEAMS},
   {"threadprivate", GFC_OMP_DIR_DECLARATIVE, ST_OMP_THREADPRIVATE},
-  /* {"tile", GFC_OMP_DIR_EXECUTABLE, ST_OMP_TILE}, */
-  /* {"unroll", GFC_OMP_DIR_EXECUTABLE, ST_OMP_UNROLL}, */
+  {"tile", GFC_OMP_DIR_EXECUTABLE, ST_OMP_TILE},
+  {"unroll", GFC_OMP_DIR_EXECUTABLE, ST_OMP_UNROLL},
   {"workshare", GFC_OMP_DIR_EXECUTABLE, ST_OMP_WORKSHARE},
 };
 


[PATCH] combine: Fix handling of unsigned constants

2023-10-06 Thread Stefan Schulze Frielinghaus
If a CONST_INT represents an integer of a mode with fewer bits than
HOST_WIDE_INT, then the integer is sign-extended.  For the two
optimizations touched by this patch, the integers of interest have only
the most significant bit set w.r.t. their mode and were therefore
sign-extended.  Thus, in order to get the integer of interest, we have
to chop off the high bits.
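
As a minimal standalone illustration of the issue (plain C, not GCC
internals; the names are made up):

#include <assert.h>
#include <stdint.h>

int main (void)
{
  /* A 32-bit "mode" constant with only the sign bit set, stored in a
     wider HOST_WIDE_INT-like type: it gets sign-extended.  */
  int64_t const_op = INT32_MIN;         /* bits 0xffffffff80000000 */
  uint64_t mode_mask = 0xffffffffu;     /* analogue of GET_MODE_MASK */

  /* The old comparison fails because of the extension...  */
  assert ((uint64_t) const_op != (1ull << 31));
  /* ...masking with the mode mask recovers the mode-sized value.  */
  assert (((uint64_t) const_op & mode_mask) == (1ull << 31));
  return 0;
}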

Bootstrapped and regtested on x64, powerpc64le, and s390.  Ok for
mainline?

gcc/ChangeLog:

* combine.cc (simplify_compare_const): Fix handling of unsigned
constants.
---
 gcc/combine.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 468b7fde911..80c4ff0fbaf 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -11923,7 +11923,7 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
   /* (unsigned) < 0x80000000 is equivalent to >= 0.  */
   else if (is_a <scalar_int_mode> (mode, &int_mode)
	   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
-	   && ((unsigned HOST_WIDE_INT) const_op
+	   && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))
	       == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1)))
{
  const_op = 0;
@@ -11962,7 +11962,7 @@ simplify_compare_const (enum rtx_code code, machine_mode mode,
   /* (unsigned) >= 0x80000000 is equivalent to < 0.  */
   else if (is_a <scalar_int_mode> (mode, &int_mode)
	   && GET_MODE_PRECISION (int_mode) - 1 < HOST_BITS_PER_WIDE_INT
-	   && ((unsigned HOST_WIDE_INT) const_op
+	   && (((unsigned HOST_WIDE_INT) const_op & GET_MODE_MASK (int_mode))
	       == HOST_WIDE_INT_1U << (GET_MODE_PRECISION (int_mode) - 1)))
{
  const_op = 0;
-- 
2.41.0



Re: [PATCH] MATCH: Fix infinite loop between `vec_cond(vec_cond(a, b, 0), c, d)` and `a & b`

2023-10-06 Thread Richard Biener
On Fri, Oct 6, 2023 at 1:15 AM Andrew Pinski  wrote:
>
> Match has a pattern which converts `vec_cond(vec_cond(a,b,0), c, d)`
> into `vec_cond(a & b, c, d)` but since in this case a is a comparison
> fold will change `a & b` back into `vec_cond(a,b,0)` which causes an
> infinite loop.
> The best way to fix this is to enable the patterns for vec_cond(*,vec_cond,*)
> only for GIMPLE so we don't get an infinite loop for fold any more.
>
> Note this is a latent bug since these patterns were added in
> r11-2577-g229752afe3156a and was exposed by r14-3350-g47b833a9abe1,
> where we are now able to remove a VIEW_CONVERT_EXPR.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

OK (also for branches if you like)

Richard.

> PR middle-end/111699
>
> gcc/ChangeLog:
>
> * match.pd ((c ? a : b) op d, (c ? a : b) op (c ? d : e),
> (v ? w : 0) ? a : b, c1 ? c2 ? a : b : b): Enable only for GIMPLE.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.c-torture/compile/pr111699-1.c: New test.
> ---
>  gcc/match.pd | 5 +
>  gcc/testsuite/gcc.c-torture/compile/pr111699-1.c | 7 +++
>  2 files changed, 12 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr111699-1.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 4bdd83e6e06..31bfd8b6b68 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5045,6 +5045,10 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  /* (v ? w : 0) ? a : b is just (v & w) ? a : b
> Currently disabled after pass lvec because ARM understands
> VEC_COND_EXPR but not a plain v==w fed to BIT_IOR_EXPR.  */
> +#if GIMPLE
> +/* These can only be done in gimple as fold likes to convert:
> +   (CMP) & N into (CMP) ? N : 0
> +   and we try to match the same pattern again and again. */
>  (simplify
>   (vec_cond (vec_cond:s @0 @3 integer_zerop) @1 @2)
>   (if (optimize_vectors_before_lowering_p () && types_match (@0, @3))
> @@ -5079,6 +5083,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (vec_cond @0 @3 (vec_cond:s @1 @2 @3))
>   (if (optimize_vectors_before_lowering_p () && types_match (@0, @1))
>(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
> +#endif
>
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> types are compatible.  */
> diff --git a/gcc/testsuite/gcc.c-torture/compile/pr111699-1.c 
> b/gcc/testsuite/gcc.c-torture/compile/pr111699-1.c
> new file mode 100644
> index 000..87b127ed199
> --- /dev/null
> +++ b/gcc/testsuite/gcc.c-torture/compile/pr111699-1.c
> @@ -0,0 +1,7 @@
> +typedef unsigned char __attribute__((__vector_size__ (8))) V;
> +
> +void
> +foo (V *v)
> +{
> +  *v =  (V) 0x107B9A7FF >= (*v <= 0);
> +}
> --
> 2.39.3
>


Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.

2023-10-06 Thread Richard Biener
On Thu, Oct 5, 2023 at 10:46 PM Tamar Christina  wrote:
>
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: Thursday, October 5, 2023 9:26 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > ; Marcus Shawcroft
> > ; Kyrylo Tkachov 
> > Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> >
> > Tamar Christina  writes:
> > >> -Original Message-
> > >> From: Richard Sandiford 
> > >> Sent: Thursday, October 5, 2023 8:29 PM
> > >> To: Tamar Christina 
> > >> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> > >> ; Marcus Shawcroft
> > >> ; Kyrylo Tkachov
> > 
> > >> Subject: Re: [PATCH]AArch64 Add SVE implementation for cond_copysign.
> > >>
> > >> Tamar Christina  writes:
> > >> > Hi All,
> > >> >
> > >> > This adds an implementation for masked copysign along with an
> > >> > optimized pattern for masked copysign (x, -1).
> > >>
> > >> It feels like we're ending up with a lot of AArch64-specific code
> > >> that just hard- codes the observation that changing the sign is
> > >> equivalent to changing the top bit.  We then need to make sure that
> > >> we choose the best way of changing the top bit for any given situation.
> > >>
> > >> Hard-coding the -1/negative case is one instance of that.  But it
> > >> looks like we also fail to use the best sequence for SVE2.  E.g.
> > >> [https://godbolt.org/z/ajh3MM5jv]:
> > >>
> > >> #include 
> > >>
> > >> void f(double *restrict a, double *restrict b) {
> > >> for (int i = 0; i < 100; ++i)
> > >> a[i] = __builtin_copysign(a[i], b[i]); }
> > >>
> > >> void g(uint64_t *restrict a, uint64_t *restrict b, uint64_t c) {
> > >> for (int i = 0; i < 100; ++i)
> > >> a[i] = (a[i] & ~c) | (b[i] & c); }
> > >>
> > >> gives:
> > >>
> > >> f:
> > >> mov x2, 0
> > >> mov w3, 100
> > >> whilelo p7.d, wzr, w3
> > >> .L2:
> > >> ld1dz30.d, p7/z, [x0, x2, lsl 3]
> > >> ld1dz31.d, p7/z, [x1, x2, lsl 3]
> > >> and z30.d, z30.d, #0x7fff
> > >> and z31.d, z31.d, #0x8000
> > >> orr z31.d, z31.d, z30.d
> > >> st1dz31.d, p7, [x0, x2, lsl 3]
> > >> incdx2
> > >> whilelo p7.d, w2, w3
> > >> b.any   .L2
> > >> ret
> > >> g:
> > >> mov x3, 0
> > >> mov w4, 100
> > >> mov z29.d, x2
> > >> whilelo p7.d, wzr, w4
> > >> .L6:
> > >> ld1dz30.d, p7/z, [x0, x3, lsl 3]
> > >> ld1dz31.d, p7/z, [x1, x3, lsl 3]
> > >> bsl z31.d, z31.d, z30.d, z29.d
> > >> st1dz31.d, p7, [x0, x3, lsl 3]
> > >> incdx3
> > >> whilelo p7.d, w3, w4
> > >> b.any   .L6
> > >> ret
> > >>
> > >> I saw that you originally tried to do this in match.pd and that the
> > >> decision was to fold to copysign instead.  But perhaps there's a
> > >> compromise where isel does something with the (new) copysign canonical
> > form?
> > >> I.e. could we go with your new version of the match.pd patch, and add
> > >> some isel stuff as a follow-on?
> > >>
> > >
> > > Sure, if that's what's desired.  But..
> > >
> > > The example you posted above is for instance worse for x86
> > > https://godbolt.org/z/x9ccqxW6T where the first operation has a
> > > dependency chain of 2 and the latter of 3.  It's likely any open coding 
> > > of this operation is going to hurt a target.
> > >
> > > So I'm unsure what isel transform this into...
> >
> > I didn't mean that we should go straight to using isel for the general
> > case, just for the new case.  The example above was instead trying to show the general
> > point that hiding the logic ops in target code is a double-edged sword.
>
> I see.. but the problem here is that transforming copysign (x, -1) into
> (x | 0x800) would require an integer operation on an FP value.  I'm happy
> to do it but it seems like it'll be an AArch64 only thing anyway.
>
> If we want to do this we need to check can_change_mode_class or a hook.
> Most targets including x86 reject the conversion.  So it'll just be
> effectively an AArch64 thing.
>
> You're right that the actual equivalent transformation is this 
> https://godbolt.org/z/KesfrMv5z
> But the target won't allow it.
>
> >
> > The x86_64 example for the -1 case would be
> > https://godbolt.org/z/b9s6MaKs8 where the isel change would be an
> > improvement.  Without that, I guess
> > x86_64 will need to have a similar patch to the AArch64 one.
> >
>
> I think that's to be expected.  I think it's logical that every target just
> needs to implement their optabs optimally.
>
> > That said, https://godbolt.org/z/e6nqoqbMh suggests that powerpc64 is
> > probably relying on the current copysign -> neg/abs transform.
> > (Not sure why the second function uses different IVs from the first.)
> >
> > Personally, I wouldn't be against a target hook that indicated 

Re: [committed] contrib: add mdcompact

2023-10-06 Thread Andrea Corallo
Richard Biener  writes:

> On Thu, Oct 5, 2023 at 5:49 PM Andrea Corallo  wrote:
>>
>> Hello all,
>>
>> this patch checks in mdcompact, the tool written in elisp that I used
>> to mass convert all the multi choice pattern in the aarch64 back-end to
>> the new compact syntax.
>>
>> I tested it on Emacs 29 (might run on older versions as well not
>> sure), also I verified it runs cleanly on a few other back-ends (arm,
>> loongarch).
>>
>> The tool can be used to convert a single pattern, an open buffer or
>> all md files in a directory.
>>
>> The tool might need further adjustment to run on some specific
>> back-end, in case very happy to help.
>>
>> This patch was pre-approved here [1].
>
> Does the result generate identical insn-*.cc files?

No, there can be indentation/aesthetic differences.

BR

  Andrea


Re: [PATCH]middle-end ifcvt: Add support for conditional copysign

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> This adds a masked variant of copysign.  Nothing very exciting just the
> general machinery to define and use a new masked IFN.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Note: This patch is part of a test series and tests for it are added in the
> AArch64 patch that adds support for the optab.
> 
> Ok for master?

OK I guess.

Richard.

> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/109154
>   * internal-fn.def (COPYSIGN): New.
>   * match.pd (UNCOND_BINARY, COND_BINARY): Map IFN_COPYSIGN to
>   IFN_COND_COPYSIGN.
>   * optabs.def (cond_copysign_optab, cond_len_copysign_optab): New.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> index 
> a2023ab9c3d01c28f51eb8a59e08c59e4c39aa7f..d9e6bdef6977f7ab9c0290bf4f4568aad0380456
>  100644
> --- a/gcc/internal-fn.def
> +++ b/gcc/internal-fn.def
> @@ -268,6 +268,7 @@ DEF_INTERNAL_SIGNED_COND_FN (MOD, ECF_CONST, first, smod, 
> umod, binary)
>  DEF_INTERNAL_COND_FN (RDIV, ECF_CONST, sdiv, binary)
>  DEF_INTERNAL_SIGNED_COND_FN (MIN, ECF_CONST, first, smin, umin, binary)
>  DEF_INTERNAL_SIGNED_COND_FN (MAX, ECF_CONST, first, smax, umax, binary)
> +DEF_INTERNAL_COND_FN (COPYSIGN, ECF_CONST, copysign, binary)
>  DEF_INTERNAL_COND_FN (FMIN, ECF_CONST, fmin, binary)
>  DEF_INTERNAL_COND_FN (FMAX, ECF_CONST, fmax, binary)
>  DEF_INTERNAL_COND_FN (AND, ECF_CONST | ECF_NOTHROW, and, binary)
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> e12b508ce8ced64e62d94d6df82734cb630b8c1c..1e8d406e6c196b10b48d3c30dc29bffc1bc27bf4
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -93,14 +93,14 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>plus minus
>mult trunc_div trunc_mod rdiv
>min max
> -  IFN_FMIN IFN_FMAX
> +  IFN_FMIN IFN_FMAX IFN_COPYSIGN
>bit_and bit_ior bit_xor
>lshift rshift)
>  (define_operator_list COND_BINARY
>IFN_COND_ADD IFN_COND_SUB
>IFN_COND_MUL IFN_COND_DIV IFN_COND_MOD IFN_COND_RDIV
>IFN_COND_MIN IFN_COND_MAX
> -  IFN_COND_FMIN IFN_COND_FMAX
> +  IFN_COND_FMIN IFN_COND_FMAX IFN_COND_COPYSIGN
>IFN_COND_AND IFN_COND_IOR IFN_COND_XOR
>IFN_COND_SHL IFN_COND_SHR)
>  
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index 
> 2ccbe4197b7b700dcdb70e2c67cfcf12d7e381b1..93d4c63700cbaa9fea1177b3d6c7a3e12f609361
>  100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -256,6 +256,7 @@ OPTAB_D (cond_fms_optab, "cond_fms$a")
>  OPTAB_D (cond_fnma_optab, "cond_fnma$a")
>  OPTAB_D (cond_fnms_optab, "cond_fnms$a")
>  OPTAB_D (cond_neg_optab, "cond_neg$a")
> +OPTAB_D (cond_copysign_optab, "cond_copysign$F$a")
>  OPTAB_D (cond_one_cmpl_optab, "cond_one_cmpl$a")
>  OPTAB_D (cond_len_add_optab, "cond_len_add$a")
>  OPTAB_D (cond_len_sub_optab, "cond_len_sub$a")
> @@ -281,6 +282,7 @@ OPTAB_D (cond_len_fms_optab, "cond_len_fms$a")
>  OPTAB_D (cond_len_fnma_optab, "cond_len_fnma$a")
>  OPTAB_D (cond_len_fnms_optab, "cond_len_fnms$a")
>  OPTAB_D (cond_len_neg_optab, "cond_len_neg$a")
> +OPTAB_D (cond_len_copysign_optab, "cond_len_copysign$F$a")
>  OPTAB_D (cond_len_one_cmpl_optab, "cond_len_one_cmpl$a")
>  OPTAB_D (cmov_optab, "cmov$a6")
>  OPTAB_D (cstore_optab, "cstore$a4")
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH]middle-end ifcvt: Allow any const IFN in conditional blocks

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Tamar Christina wrote:

> Hi All,
> 
> When ifcvt was initially added masking was not a thing and as such it was
> rather conservative in what it supported.
> 
> For builtins it only allowed C99 builtin functions which it knew it could
> fold away.
> 
> These days the vectorizer is able to deal with needing to mask IFNs itself.
> vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR
> after the operation to emulate the masking.
> 
> This is then used by match.pd to convert the IFN into a masked variant if
> it's available.
> 
> For these reasons the restriction in ifconvert is no longer required and we
> needlessly block vectorization when we can effectively handle the operations.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Note: This patch is part of a test series and tests for it are added in the
> AArch64 patch that adds support for the optab.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/109154
>   * tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.
> 
> --- inline copy of patch -- 
> diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
> index 
> a8c915913aed267edfb3ebd2c530aeca7cf51832..f76e0d8f2e6e0f59073fa8484b0b2c7a6cdc9783
>  100644
> --- a/gcc/tree-if-conv.cc
> +++ b/gcc/tree-if-conv.cc
> @@ -1129,6 +1129,16 @@ if_convertible_stmt_p (gimple *stmt, 
> vec refs)
>   return true;
> }
> }
> +
> + /* There are some IFN_s that are used to replace builtins but have the
> +same semantics.  Even if MASK_CALL cannot handle them, vectorizable_call
> +will insert the proper selection, so do not block conversion.  */
> + int flags = gimple_call_flags (stmt);
> + if ((flags & ECF_CONST)
> + && !(flags & ECF_LOOPING_CONST_OR_PURE)
> + && gimple_call_combined_fn (stmt) != CFN_LAST)
> +   return true;
> +

Can you instead move the check inside the if (fndecl) right before
it, changing it to check gimple_call_combined_fn?

OK with that change.

Richard.

>   return false;
>}
>  
> 
> 
> 
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [committed] contrib: add mdcompact

2023-10-06 Thread Richard Biener
On Thu, Oct 5, 2023 at 5:49 PM Andrea Corallo  wrote:
>
> Hello all,
>
> this patch checks in mdcompact, the tool written in elisp that I used
> to mass convert all the multi choice pattern in the aarch64 back-end to
> the new compact syntax.
>
> I tested it on Emacs 29 (might run on older versions as well not
> sure), also I verified it runs cleanly on a few other back-ends (arm,
> loongarch).
>
> The tool can be used to convert a single pattern, an open buffer or
> all md files in a directory.
>
> The tool might need further adjustment to run on some specific
> back-end, in case very happy to help.
>
> This patch was pre-approved here [1].

Does the result generate identical insn-*.cc files?

> Best Regards
>
>   Andrea Corallo
>
> [1] 
>
> contrib/ChangeLog
>
> * mdcompact/mdcompact-testsuite.el: New file.
> * mdcompact/mdcompact.el: Likewise.
> * mdcompact/tests/1.md: Likewise.
> * mdcompact/tests/1.md.out: Likewise.
> * mdcompact/tests/2.md: Likewise.
> * mdcompact/tests/2.md.out: Likewise.
> * mdcompact/tests/3.md: Likewise.
> * mdcompact/tests/3.md.out: Likewise.
> * mdcompact/tests/4.md: Likewise.
> * mdcompact/tests/4.md.out: Likewise.
> * mdcompact/tests/5.md: Likewise.
> * mdcompact/tests/5.md.out: Likewise.
> * mdcompact/tests/6.md: Likewise.
> * mdcompact/tests/6.md.out: Likewise.
> * mdcompact/tests/7.md: Likewise.
> * mdcompact/tests/7.md.out: Likewise.
> ---
>  contrib/mdcompact/mdcompact-testsuite.el |  56 +
>  contrib/mdcompact/mdcompact.el   | 296 +++
>  contrib/mdcompact/tests/1.md |  36 +++
>  contrib/mdcompact/tests/1.md.out |  32 +++
>  contrib/mdcompact/tests/2.md |  25 ++
>  contrib/mdcompact/tests/2.md.out |  21 ++
>  contrib/mdcompact/tests/3.md |  16 ++
>  contrib/mdcompact/tests/3.md.out |  17 ++
>  contrib/mdcompact/tests/4.md |  17 ++
>  contrib/mdcompact/tests/4.md.out |  17 ++
>  contrib/mdcompact/tests/5.md |  12 +
>  contrib/mdcompact/tests/5.md.out |  11 +
>  contrib/mdcompact/tests/6.md |  11 +
>  contrib/mdcompact/tests/6.md.out |  11 +
>  contrib/mdcompact/tests/7.md |  11 +
>  contrib/mdcompact/tests/7.md.out |  11 +
>  16 files changed, 600 insertions(+)
>  create mode 100644 contrib/mdcompact/mdcompact-testsuite.el
>  create mode 100644 contrib/mdcompact/mdcompact.el
>  create mode 100644 contrib/mdcompact/tests/1.md
>  create mode 100644 contrib/mdcompact/tests/1.md.out
>  create mode 100644 contrib/mdcompact/tests/2.md
>  create mode 100644 contrib/mdcompact/tests/2.md.out
>  create mode 100644 contrib/mdcompact/tests/3.md
>  create mode 100644 contrib/mdcompact/tests/3.md.out
>  create mode 100644 contrib/mdcompact/tests/4.md
>  create mode 100644 contrib/mdcompact/tests/4.md.out
>  create mode 100644 contrib/mdcompact/tests/5.md
>  create mode 100644 contrib/mdcompact/tests/5.md.out
>  create mode 100644 contrib/mdcompact/tests/6.md
>  create mode 100644 contrib/mdcompact/tests/6.md.out
>  create mode 100644 contrib/mdcompact/tests/7.md
>  create mode 100644 contrib/mdcompact/tests/7.md.out
>
> diff --git a/contrib/mdcompact/mdcompact-testsuite.el 
> b/contrib/mdcompact/mdcompact-testsuite.el
> new file mode 100644
> index 000..494c0b5cd68
> --- /dev/null
> +++ b/contrib/mdcompact/mdcompact-testsuite.el
> @@ -0,0 +1,56 @@
> +;;; -*- lexical-binding: t; -*-
> +
> +;; This file is part of GCC.
> +
> +;; GCC is free software: you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published by
> +;; the Free Software Foundation, either version 3 of the License, or
> +;; (at your option) any later version.
> +
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT
> +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +;; License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC.  If not, see .
> +
> +;;; Commentary:
> +
> +;;; Usage:
> +;; $ emacs -batch -l mdcompact.el -l mdcompact-testsuite.el -f 
> ert-run-tests-batch-and-exit
> +
> +;;; Code:
> +
> +(require 'mdcompact)
> +(require 'ert)
> +
> +(defconst mdcompat-test-directory (concat (file-name-directory
> +  (or load-file-name
> +   buffer-file-name))
> + "tests/"))
> +
> +(defun mdcompat-test-run (f)
> +  (with-temp-buffer
> +(insert-file-contents f)
> +(mdcomp-run-at-point)
> +(let ((a (buffer-string))
> + (b (with-temp-buffer
> +  

Re: [PATCH]middle-end: Recursively check is_trivially_copyable_or_pair in vec.h

2023-10-06 Thread Jakub Jelinek
On Fri, Oct 06, 2023 at 02:23:06AM +, Tamar Christina wrote:
> gcc/ChangeLog:
> 
>   * tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
>   (typedef struct ifcvt_arg_entry): New.
>   (cmp_arg_entry): New.
>   (gen_phi_arg_condition, gen_phi_nest_statement,
>   predicate_scalar_phi): Use them.

Ok, thanks.

Jakub



RE: [PATCH]middle-end match.pd: optimize fneg (fabs (x)) to x | (1 << signbit(x)) [PR109154]

2023-10-06 Thread Richard Biener
On Thu, 5 Oct 2023, Tamar Christina wrote:

> > I suppose the idea is that -abs(x) might be easier to optimize with other
> > patterns (consider a - copysign(x,...), optimizing to a + abs(x)).
> > 
> > For abs vs copysign it's a canonicalization, but (negate (abs @0)) is less
> > canonical than copysign.
> > 
> > > Should I try removing this?
> > 
> > I'd say yes (and put the reverse canonicalization next to this pattern).
> > 
> 
> This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
> canonical and allows a target to expand this sequence efficiently.  Such
> sequences are common in scientific code working with gradients.
> 
> various optimizations in match.pd only happened on COPYSIGN but not 
> COPYSIGN_ALL
> which means they exclude IFN_COPYSIGN.  COPYSIGN however is restricted to only

That's not true:

(define_operator_list COPYSIGN
BUILT_IN_COPYSIGNF
BUILT_IN_COPYSIGN
BUILT_IN_COPYSIGNL
IFN_COPYSIGN)

but they miss the extended float builtin variants like
__builtin_copysignf16.  Also see below

> the C99 builtins and so doesn't work for vectors.
> 
> The patch expands these optimizations to work on COPYSIGN_ALL.
> 
> There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
> which I remove since this is a less efficient form.  The testsuite is also
> updated in light of this.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/109154
>   * match.pd: Add new neg+abs rule, remove inverse copysign rule and
>   expand existing copysign optimizations.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/109154
>   * gcc.dg/fold-copysign-1.c: Updated.
>   * gcc.dg/pr55152-2.c: Updated.
>   * gcc.dg/tree-ssa/abs-4.c: Updated.
>   * gcc.dg/tree-ssa/backprop-6.c: Updated.
>   * gcc.dg/tree-ssa/copy-sign-2.c: Updated.
>   * gcc.dg/tree-ssa/mult-abs-2.c: Updated.
>   * gcc.target/aarch64/fneg-abs_1.c: New test.
>   * gcc.target/aarch64/fneg-abs_2.c: New test.
>   * gcc.target/aarch64/fneg-abs_3.c: New test.
>   * gcc.target/aarch64/fneg-abs_4.c: New test.
>   * gcc.target/aarch64/sve/fneg-abs_1.c: New test.
>   * gcc.target/aarch64/sve/fneg-abs_2.c: New test.
>   * gcc.target/aarch64/sve/fneg-abs_3.c: New test.
>   * gcc.target/aarch64/sve/fneg-abs_4.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 4bdd83e6e061b16dbdb2845b9398fcfb8a6c9739..bd6599d36021e119f51a4928354f580ffe82c6e2
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -1074,45 +1074,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  
>  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
>  (for coss (COS COSH)
> - copysigns (COPYSIGN)
> - (simplify
> -  (coss (copysigns @0 @1))
> -   (coss @0)))
> + (for copysigns (COPYSIGN_ALL)

So this ends up generating for example the match
(cosf (copysignl ...)) which doesn't make much sense.

The lock-step iteration did
(cosf (copysignf ..)) ... (ifn_cos (ifn_copysign ...))
which is leaner but misses the case of
(cosf (ifn_copysign ..)) - that's probably what you are
after with this change.

That said, there isn't a nice solution (without altering the match.pd
IL).  There's the explicit solution, spelling out all combinations.

So if we want to go with yout pragmatic solution changing this
to use COPYSIGN_ALL isn't necessary, only changing the lock-step
for iteration to a cross product for iteration is.

Changing just this pattern to

(for coss (COS COSH)
 (for copysigns (COPYSIGN)
  (simplify
   (coss (copysigns @0 @1))
   (coss @0

increases the total number of gimple-match-x.cc lines from
234988 to 235324.

The alternative is to do

(for coss (COS COSH)
 copysigns (COPYSIGN)
 (simplify
  (coss (copysigns @0 @1))
   (coss @0))
 (simplify
  (coss (IFN_COPYSIGN @0 @1))
   (coss @0)))

which properly will diagnose a duplicate pattern.  There are
currently no operator lists with just builtins defined (that
could be fixed, see gencfn-macros.cc); suppose we'd have
COS_C we could do

(for coss (COS_C COSH_C IFN_COS IFN_COSH)
 copysigns (COPYSIGN_C COPYSIGN_C IFN_COPYSIGN IFN_COPYSIGN 
IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN IFN_COPYSIGN 
IFN_COPYSIGN)
 (simplify
  (coss (copysigns @0 @1))
   (coss @0)))

which of course still looks ugly ;) (some syntax extension like
allowing to specify IFN_COPYSIGN*8 would be nice here and easy
enough to do)

Can you split out the part changing COPYSIGN to COPYSIGN_ALL,
re-do it to only split the fors, keeping COPYSIGN and provide
some statistics on the gimple-match-* size?  I think this might
be the pragmatic solution for now.

Richard - can you think of a clever way to express the desired
iteration?  How do RTL macro iterations address cases like this?

Richard.

> +  (simplify
> +   (coss (copysigns @0 @1))
> +(coss @0
>  
>  /*