[PATCH] [testsuite] [powerpc] adjust -m32 counts for fold-vec-extract*

2023-05-23 Thread Alexandre Oliva via Gcc-patches


Codegen changes caused addi instruction count mismatches on
ppc-*-linux-gnu and other 32-bit ppc targets.  At some point the
expected counts were adjusted for lp64, but the ilp32 differences
remained, as published test results confirm.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for  gcc/testsuite/ChangeLog

* gcc.target/powerpc/fold-vec-extract-char.p7.c: Adjust addi
counts for ilp32.
* gcc.target/powerpc/fold-vec-extract-double.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-float.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-float.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p7.c: Likewise.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Likewise.
---
 .../gcc.target/powerpc/fold-vec-extract-char.p7.c  |3 ++-
 .../powerpc/fold-vec-extract-double.p7.c   |2 +-
 .../gcc.target/powerpc/fold-vec-extract-float.p7.c |2 +-
 .../gcc.target/powerpc/fold-vec-extract-float.p8.c |2 +-
 .../gcc.target/powerpc/fold-vec-extract-int.p7.c   |2 +-
 .../gcc.target/powerpc/fold-vec-extract-int.p8.c   |2 +-
 .../gcc.target/powerpc/fold-vec-extract-short.p7.c |2 +-
 .../gcc.target/powerpc/fold-vec-extract-short.p8.c |2 +-
 8 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
index 29a8aa84db282..c6647431d09c9 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p7.c
@@ -11,7 +11,8 @@
 /* one extsb (extend sign-bit) instruction generated for each test against
unsigned types */
 
-/* { dg-final { scan-assembler-times {\maddi\M} 9 } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 9 { target { lp64 } } } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 6 { target { ilp32 } } } } */
 /* { dg-final { scan-assembler-times {\mli\M} 6 } } */
 /* { dg-final { scan-assembler-times {\mstxvw4x\M|\mstvx\M|\mstxv\M} 6 } } */
 /* -m32 target uses rlwinm in place of rldicl. */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
index 3cae644b90b71..db325efbb07ff 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-double.p7.c
@@ -14,7 +14,7 @@
 /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
 /* -m32 target has an 'add' in place of one of the 'addi'. */
 /* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } } 
*/
-/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } } 
*/
+/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target ilp32 } } } 
*/
 /* -m32 target has a rlwinm in place of a rldic .  */
 /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
index 59a4979457dcb..42ec69475fd07 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p7.c
@@ -13,7 +13,7 @@
 /* { dg-final { scan-assembler-times {\mli\M} 1 } } */
 /* -m32 as an add in place of an addi. */
 /* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target lp64 } } } 
*/
-/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 3 { target ilp32 } } } 
*/
+/* { dg-final { scan-assembler-times {\maddi\M|\madd\M} 2 { target ilp32 } } } 
*/
 /* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstvx\M|\mstxv\M} 1 } } */
 /* -m32 uses rlwinm in place of rldic */
 /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
index 4b1d75ee26d0f..68de4b307 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-float.p8.c
@@ -26,7 +26,7 @@
 /* { dg-final { scan-assembler-times {\mstxvd2x\M} 1 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\madd\M} 1 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlfs\M} 1 { target ilp32 } } } */
-/* { dg-final { scan-assembler-times {\maddi\M} 2 { target ilp32 } } } */
+/* { dg-final { scan-assembler-times {\maddi\M} 1 { target ilp32 } } } */
 
 
 #include 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p7.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p7.c
index 3729a1646e9c9..e8130693ee953 100644
--- 

[PATCH] [libstdc++] [testsuite] xfail to_chars/long_double on x86-vxworks

2023-05-23 Thread Alexandre Oliva via Gcc-patches


Just as on aarch64, x86's wider long double experiences loss of
precision when from_chars is implemented in terms of double.  Expect
the execution to fail.
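As a minimal illustration (not taken from the testsuite, and assuming
the usual x86 80-bit long double), a decimal string parsed through
double loses bits that the wider type would keep:

#include <charconv>
#include <cstdio>

int main()
{
  // Illustrative sketch only; assumes x86 80-bit long double.
  const char str[] = "0.1";
  long double parsed = 0.0L;
  std::from_chars(str, str + sizeof(str) - 1, parsed);
  // Prints 1 with a full-precision from_chars; prints 0 when
  // from_chars parses via double, since (long double)0.1 != 0.1L.
  std::printf("%d\n", parsed == 0.1L);
  return 0;
}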

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for  libstdc++-v3/ChangeLog

* testsuite/20_util/to_chars/long_double.cc: Expect execution
fail on x86-vxworks.
---
 .../testsuite/20_util/to_chars/long_double.cc  |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc 
b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
index 263144bd42cba..08363d9d04003 100644
--- a/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
+++ b/libstdc++-v3/testsuite/20_util/to_chars/long_double.cc
@@ -36,7 +36,7 @@
 
 // On systems that use double-precision from_chars for long double,
 // this is expected to fail.
-// { dg-xfail-run-if "from_chars limited to double-precision" { 
aarch64-*-vxworks* } }
+// { dg-xfail-run-if "from_chars limited to double-precision" { 
aarch64-*-vxworks* i*86-*-vxworks* } }
 
 // { dg-require-effective-target ieee_floats }
 // { dg-require-effective-target size32plus }

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] [testsuite] [x86] cope with --enable-frame-pointer

2023-05-23 Thread Alexandre Oliva via Gcc-patches


Various x86 tests fail if the toolchain is configured with
--enable-frame-pointer, because the unexpected extra insns mess with
the expected asm counts.  Add -fomit-frame-pointer so that they can
still pass.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for  gcc/testsuite/ChangeLog

* gcc.target/i386/pieces-memcpy-7.c: Add -fomit-frame-pointer.
* gcc.target/i386/pieces-memcpy-8.c: Likewise.
* gcc.target/i386/pieces-memcpy-9.c: Likewise.
* gcc.target/i386/pieces-memset-1.c: Likewise.
* gcc.target/i386/pieces-memset-36.c: Likewise.
* gcc.target/i386/pieces-memset-4.c: Likewise.
* gcc.target/i386/pieces-memset-40.c: Likewise.
* gcc.target/i386/pieces-memset-41.c: Likewise.
* gcc.target/i386/pieces-memset-7.c: Likewise.
* gcc.target/i386/pieces-memset-8.c: Likewise.
* gcc.target/i386/pieces-memset-9.c: Likewise.
* gcc.target/i386/pr102230.c: Likewise.
* gcc.target/i386/pr78103-2.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c  |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c  |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c  |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memset-1.c  |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memset-36.c |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memset-4.c  |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memset-40.c |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memset-41.c |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memset-7.c  |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memset-8.c  |2 ++
 gcc/testsuite/gcc.target/i386/pieces-memset-9.c  |2 ++
 gcc/testsuite/gcc.target/i386/pr102230.c |2 ++
 gcc/testsuite/gcc.target/i386/pr78103-2.c|2 ++
 13 files changed, 26 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c 
b/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c
index 3d248d447ea42..64fd8b4176cec 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-7.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic" } */
+/* Cope with --enable-frame-pointer.  */
+/* { dg-additional-options "-fomit-frame-pointer" } */
 
 void
 foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src)
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c 
b/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c
index c13a2beb2f017..fc60c46c58900 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-8.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx2 -mavx -mtune=generic" } */
+/* Cope with --enable-frame-pointer.  */
+/* { dg-additional-options "-fomit-frame-pointer" } */
 
 void
 foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src)
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c 
b/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c
index 238f88b275eb7..62fcb6f569204 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memcpy-9.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512f -mtune=generic" } */
+/* Cope with --enable-frame-pointer.  */
+/* { dg-additional-options "-fomit-frame-pointer" } */
 
 void
 foo (int a1, int a2, int a3, int a4, int a5, int a6, char *dst, char *src)
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-1.c 
b/gcc/testsuite/gcc.target/i386/pieces-memset-1.c
index f7487ba9c5b28..0002c6838ab76 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-1.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-1.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic -mno-stackrealign" } */
+/* Cope with --enable-frame-pointer.  */
+/* { dg-additional-options "-fomit-frame-pointer" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-36.c 
b/gcc/testsuite/gcc.target/i386/pieces-memset-36.c
index d1f1263c7b211..d1bbfa204a7f8 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-36.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-36.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx512f -mavx2 -mtune=generic" } */
+/* Cope with --enable-frame-pointer.  */
+/* { dg-additional-options "-fomit-frame-pointer" } */
 
 extern char *dst;
 
diff --git a/gcc/testsuite/gcc.target/i386/pieces-memset-4.c 
b/gcc/testsuite/gcc.target/i386/pieces-memset-4.c
index a12b9dda28bd3..8b3f3b00214f8 100644
--- a/gcc/testsuite/gcc.target/i386/pieces-memset-4.c
+++ b/gcc/testsuite/gcc.target/i386/pieces-memset-4.c
@@ -1,5 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O2 -mno-avx -msse2 -mtune=generic -mno-stackrealign" } */
+/* Cope with --enable-frame-pointer.  */
+/* { dg-additional-options "-fomit-frame-pointer" } */
 
 extern char *dst;
 
diff --git 

[PATCH] [x86] reenable dword MOVE_MAX for better memmove inlining

2023-05-23 Thread Alexandre Oliva via Gcc-patches


MOVE_MAX on x86* used to accept up to 16 bytes, even without SSE,
which enabled inlining of small memmove by loading and then storing
the entire range.  After the "x86: Update piecewise move and store"
r12-2666 change, memmove of more than 4 bytes would not be inlined in
gimple_fold_builtin_memory_op, failing the expectations of a few tests.

I can see how lowering it for MOVE_MAX_PIECES can get us better
codegen decisions overall, but surely inlining memmove with 2 32-bit
loads and stores is better than an out-of-line call that requires
setting up 3 arguments.  I suppose even 3 or 4 could do better.  But
maybe it
is gimple_fold_builtin_memory_op that needs tweaking?
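As an illustration (not one of the affected tests), this is the shape
of memmove the tests expect to be expanded inline:

/* Illustrative only, not taken from the affected tests.  With
   MOVE_MAX >= 8, an 8-byte memmove like this can be expanded on
   non-SSE 32-bit x86 as two 32-bit loads followed by two 32-bit
   stores, instead of a call that needs three arguments set up.  */
void
move8 (char *dst, const char *src)
{
  __builtin_memmove (dst, src, 8);
}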

Anyhow, this patch raises MOVE_MAX back a little for non-SSE targets,
while preserving the new value for MOVE_MAX_PIECES.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for gcc/ChangeLog

* config/i386/i386.h (MOVE_MAX): Rename to...
(MOVE_MAX_VEC): ... this.  Add NONVEC parameter, and use it as
the last resort, instead of UNITS_PER_WORD.
(MOVE_MAX): Reintroduce in terms of MOVE_MAX_VEC, with
2*UNITS_PER_WORD.
(MOVE_MAX_PIECES): Likewise, but with UNITS_PER_WORD.
---
 gcc/config/i386/i386.h |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index c7439f89bdf92..5293a332a969a 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1801,7 +1801,9 @@ typedef struct ix86_args {
is the number of bytes at a time which we can move efficiently.
MOVE_MAX_PIECES defaults to MOVE_MAX.  */
 
-#define MOVE_MAX \
+#define MOVE_MAX MOVE_MAX_VEC (2 * UNITS_PER_WORD)
+#define MOVE_MAX_PIECES MOVE_MAX_VEC (UNITS_PER_WORD)
+#define MOVE_MAX_VEC(NONVEC) \
   ((TARGET_AVX512F \
 && (ix86_move_max == PVW_AVX512 \
|| ix86_store_max == PVW_AVX512)) \
@@ -1813,7 +1815,7 @@ typedef struct ix86_args {
   : ((TARGET_SSE2 \
  && TARGET_SSE_UNALIGNED_LOAD_OPTIMAL \
  && TARGET_SSE_UNALIGNED_STORE_OPTIMAL) \
-? 16 : UNITS_PER_WORD)))
+? 16 : (NONVEC
 
 /* STORE_MAX_PIECES is the number of bytes at a time that we can store
efficiently.  Allow 16/32/64 bytes only if inter-unit move is enabled

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] [testsuite] [ppc] xfail uninit-pred-9_b bogus warn on ppc32 too

2023-05-23 Thread Alexandre Oliva via Gcc-patches


The bogus warning is present on 32-bit ppc-vx7r2 too, so drop the 64
from the powerpc xfail triplet.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for  gcc/testsuite/ChangeLog

* gcc.dg/uninit-pred-9_b.c: Xfail bogus warning on 32-bit ppc
as well.
---
 gcc/testsuite/gcc.dg/uninit-pred-9_b.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c 
b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
index c8f427b12c0ab..0f508fa56e1c5 100644
--- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
+++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c
@@ -17,7 +17,7 @@ int foo (int n, int l, int m, int r)
 
   if (l > 100)
 if ( (n <= 9) &&  (m < 100)  && (r < 19) )
-  blah(v); /* { dg-bogus "uninitialized" "bogus warning" { xfail 
powerpc64*-*-* cris-*-* riscv*-*-* } } */
+  blah(v); /* { dg-bogus "uninitialized" "bogus warning" { xfail 
powerpc*-*-* cris-*-* riscv*-*-* } } */
 
   if ( (n <= 8) &&  (m < 99)  && (r < 19) )
   blah(v); /* { dg-bogus "uninitialized" "pr101674" { xfail mmix-*-* } } */

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] [testsuite] [i386] enable sse2 for signbit-2.c

2023-05-23 Thread Hongtao Liu via Gcc-patches
On Wed, May 24, 2023 at 1:24 PM Alexandre Oliva via Gcc-patches
 wrote:
>
>
> The expected results for signbit-2 only arise on x86 with avx512f
> disabled and sse2 enabled.  The patch already disables avx512f
> explicitly, but it fails to enable sse2.
>
> Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
> with gcc-12.
>
> for  gcc/testsuite/ChangeLog
>
> * gcc.dg/signbit-2.c: Add -msse2 on x86.
Ok.
> ---
>  gcc/testsuite/gcc.dg/signbit-2.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.dg/signbit-2.c 
> b/gcc/testsuite/gcc.dg/signbit-2.c
> index d7b406effc62d..62bb4047d7421 100644
> --- a/gcc/testsuite/gcc.dg/signbit-2.c
> +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> @@ -2,7 +2,7 @@
>  /* { dg-options "-O3 -fdump-tree-optimized" } */
>
>  /* This test does not work when the truth type does not match vector type.  
> */
> -/* { dg-additional-options "-mno-avx512f" { target { i?86-*-* x86_64-*-* } } 
> } */
> +/* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* 
> x86_64-*-* } } } */
>  /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
>  /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
>  /* { dg-skip-if "no fallback for MVE" { arm_mve } } */
>
> --
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 



-- 
BR,
Hongtao


Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required

2023-05-23 Thread Alexandre Oliva via Gcc-patches
On May  5, 2022, Alexandre Oliva  wrote:

> for  gcc/ChangeLog

>   PR target/100106
>   * emit-rtl.cc (validate_subreg): Reject a SUBREG of a MEM that
>   requires stricter alignment than MEM's.

> for  gcc/testsuite/ChangeLog

>   PR target/100106
>   * gcc.target/powerpc/pr100106-sa.c: New.

Ping?
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594166.html

The testcase variant was approved, but the reformatted patch is still
pending review, despite support from Vlad Makarov for the original one;
the suggested separate followup patch, mentioned in the linked email,
turned out to be far more involved than anticipated, and needs further
work, but it's independent from this self-contained fix.

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-23 Thread Kewen.Lin via Gcc-patches
on 2023/5/24 06:30, Peter Bergner wrote:
> On 5/23/23 12:24 AM, Kewen.Lin wrote:
>> on 2023/5/23 01:31, Carl Love wrote:
>>> The builtins were requested for use in GLibC.  As of version 2.31 they
>>> were added as inline asm.  They requested a builtin so the asm could be
>>> removed.
>>
>> So IMHO we also want the similar support for mffscrn, that is to make
>> use of mffscrn and mffscrni on Power9 and later, but falls back to 
>> __builtin_set_fpscr_rn + mffs similar on older platforms.
> 
> So __builtin_set_fpscr_rn everything we want (sets the RN bits) and
> uses mffscrn/mffscrni on P9 and later and uses older insns on pre-P9.
> The only problem is we don't return the current FPSCR bits, as the bif
> is defined to return void.

Yes.

> Crazy idea, but could we extend the built-in
> with an overload that returns the FPSCR bits?  

So you agree that we should make this proposed new bif handle pre-P9 just
like some other existing bifs do. :)  I think extending it is good and doable,
but the only concern here is the bif name "__builtin_set_fpscr_rn", which
matches the existing behavior (only set the rounding mode) but doesn't match
the proposed extended behavior (set the rounding mode and get some env bits
back).  Maybe it's not a big deal if the documentation clarifies it well.
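For reference, here is a minimal sketch (not from any patch) of the
two-step combination mentioned above, using only the existing
built-ins:

/* Sketch only: what callers have to do today, reading the FPSCR with
   the existing __builtin_mffs and setting the rounding mode
   separately; the proposed overload would fold this into a single
   built-in that also returns the old bits.  */
double
save_fpscr_and_set_rn (int rn)
{
  double old_fpscr = __builtin_mffs ();  /* existing built-in */
  __builtin_set_fpscr_rn (rn);           /* existing, returns void */
  return old_fpscr;
}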


> To be honest, I like
> the __builtin_set_fpscr_rn name better than __builtin_mffscrn[i].

+1

BR,
Kewen

> The built-in machinery can see that the usage is expecting a return value
> or not and for the pre-P9 code, can skip generating the ending mffs if
> we don't want the return value.
> 
> Peter
> 
>


Re: [PATCH] Check for sysconf decl on vxworks

2023-05-23 Thread Olivier Hainque via Gcc-patches


Good for me, thanks Alex!


> On 24 May 2023, at 07:08, Alexandre Oliva  wrote:
> 
> 
> The sysconf function is only available in rtp mode on vxworks.  In
> kernel mode, it is not even declared, but the feature test macro in
> the testsuite doesn't notice its absence because it's a link test, and
> vxworks kernel mode uses partial linking.
> 
> This patch introduces an alternate test on vxworks targets to check
> for a declaration and for an often-used sysconf parameter.
> 
> Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
> with gcc-12.
> 
> 
> for  gcc/testsuite/ChangeLog
> 
>   * lib/target-supports.exp (check_effective_target_sysconf):
>   Check for declaration and _SC_PAGESIZE on vxworks.
> ---
> gcc/testsuite/lib/target-supports.exp |   11 +++
> 1 file changed, 11 insertions(+)
> 
> diff --git a/gcc/testsuite/lib/target-supports.exp 
> b/gcc/testsuite/lib/target-supports.exp
> index bd9f432e4a761..263ef35a2e4df 100644
> --- a/gcc/testsuite/lib/target-supports.exp
> +++ b/gcc/testsuite/lib/target-supports.exp
> @@ -1146,6 +1146,17 @@ proc check_effective_target_mmap {} {
> # Return 1 if the target supports sysconf, 0 otherwise.
> 
> proc check_effective_target_sysconf {} {
> +# VxWorks has sysconf in rtp mode only, but our way to test can't
> +# tell kernel mode doesn't, as we're doing partial links for
> +# kernel modules.  We can tell by checking for a declaration, or
> +# for some sysconf parm, because configurations that don't offer
> +# sysconf don't have either.
> +if { [istarget *-*-vxworks*] } {
> + return [check_no_compiler_messages sysconfdecl assembly {
> + #include 
> + int f() { return sysconf(_SC_PAGESIZE); }
> + }];
> +}
> return [check_function_available "sysconf"]
> }
> 
> 
> 
> -- 
> Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
>   Free Software Activist   GNU Toolchain Engineer
> Disinformation flourishes because many people care deeply about injustice
> but very few check the facts.  Ask me about 



[PATCH] [testsuite] [i386] enable sse2 for signbit-2.c

2023-05-23 Thread Alexandre Oliva via Gcc-patches


The expected results for signbit-2 only arise on x86 with avx512f
disabled and sse2 enabled.  The patch already disables avx512f
explicitly, but it fails to enable sse2.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for  gcc/testsuite/ChangeLog

* gcc.dg/signbit-2.c: Add -msse2 on x86.
---
 gcc/testsuite/gcc.dg/signbit-2.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
index d7b406effc62d..62bb4047d7421 100644
--- a/gcc/testsuite/gcc.dg/signbit-2.c
+++ b/gcc/testsuite/gcc.dg/signbit-2.c
@@ -2,7 +2,7 @@
 /* { dg-options "-O3 -fdump-tree-optimized" } */
 
 /* This test does not work when the truth type does not match vector type.  */
-/* { dg-additional-options "-mno-avx512f" { target { i?86-*-* x86_64-*-* } } } 
*/
+/* { dg-additional-options "-msse2 -mno-avx512f" { target { i?86-*-* 
x86_64-*-* } } } */
 /* { dg-additional-options "-march=armv8-a" { target aarch64_sve } } */
 /* { dg-additional-options "-maltivec" { target powerpc_altivec_ok } } */
 /* { dg-skip-if "no fallback for MVE" { arm_mve } } */

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH 2/2] vect: Enhance cost evaluation in vect_transform_slp_perm_load_1

2023-05-23 Thread Kewen.Lin via Gcc-patches
on 2023/5/23 14:19, Richard Biener wrote:
> On Tue, May 23, 2023 at 5:01 AM Kewen.Lin  wrote:
>>
>> Hi Richi,
>>
>> Thanks for the review!
>>
>> on 2023/5/22 21:44, Richard Biener wrote:
>>> On Wed, May 17, 2023 at 8:15 AM Kewen.Lin  wrote:

 Hi,

 Following Richi's suggestion in [1], I'm working on deferring
 cost evaluation next to the transformation, this patch is
 to enhance function vect_transform_slp_perm_load_1 which
 could under-cost for vector permutation, since the costing
 doesn't try to consider nvectors_per_build, it's inconsistent
 with the transformation part.

 Bootstrapped and regtested on x86_64-redhat-linux,
 aarch64-linux-gnu and powerpc64{,le}-linux-gnu.

 Is it ok for trunk?

 [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563624.html

 BR,
 Kewen
 -
 gcc/ChangeLog:

 * tree-vect-slp.cc (vect_transform_slp_perm_load_1): Adjust the
 calculation on n_perms by considering nvectors_per_build.

 gcc/testsuite/ChangeLog:

 * gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c: New test.
 ---
  .../vect/costmodel/ppc/costmodel-slp-perm.c   | 23 +++
  gcc/tree-vect-slp.cc  | 66 ++-
  2 files changed, 57 insertions(+), 32 deletions(-)
  create mode 100644 
 gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c

 diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c 
 b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
 new file mode 100644
 index 000..e5c4dceddfb
 --- /dev/null
 +++ b/gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-slp-perm.c
 @@ -0,0 +1,23 @@
 +/* { dg-do compile } */
 +/* { dg-require-effective-target vect_int } */
 +/* { dg-require-effective-target powerpc_p9vector_ok } */
 +/* Specify power9 to ensure the vectorization is profitable
 +   and test point stands, otherwise it could be not profitable
 +   to vectorize.  */
 +/* { dg-additional-options "-mdejagnu-cpu=power9 -mpower9-vector" } */
 +
 +/* Verify we cost the exact count for required vec_perm.  */
 +
 +int x[1024], y[1024];
 +
 +void
 +foo ()
 +{
 +  for (int i = 0; i < 512; ++i)
 +{
 +  x[2 * i] = y[1023 - (2 * i)];
 +  x[2 * i + 1] = y[1023 - (2 * i + 1)];
 +}
 +}
 +
 +/* { dg-final { scan-tree-dump-times "2 times vec_perm" 1 "vect" } } */
 diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
 index e5c9d7e766e..af9a6dd4fa9 100644
 --- a/gcc/tree-vect-slp.cc
 +++ b/gcc/tree-vect-slp.cc
 @@ -8115,12 +8115,12 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, 
 slp_tree node,

mode = TYPE_MODE (vectype);
poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 +  unsigned int nstmts = SLP_TREE_NUMBER_OF_VEC_STMTS (node);

/* Initialize the vect stmts of NODE to properly insert the generated
   stmts later.  */
if (! analyze_only)
 -for (unsigned i = SLP_TREE_VEC_STMTS (node).length ();
 -i < SLP_TREE_NUMBER_OF_VEC_STMTS (node); i++)
 +for (unsigned i = SLP_TREE_VEC_STMTS (node).length (); i < nstmts; 
 i++)
SLP_TREE_VEC_STMTS (node).quick_push (NULL);

/* Generate permutation masks for every NODE. Number of masks for each 
 NODE
 @@ -8161,7 +8161,10 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, 
 slp_tree node,
  (b) the permutes only need a single vector input.  */
mask.new_vector (nunits, group_size, 3);
nelts_to_build = mask.encoded_nelts ();
 -  nvectors_per_build = SLP_TREE_VEC_STMTS (node).length ();
 +  /* It's possible to obtain zero nstmts during analyze_only, so make
 +it at least one to ensure the later computation for n_perms
 +proceed.  */
 +  nvectors_per_build = nstmts > 0 ? nstmts : 1;
in_nlanes = DR_GROUP_SIZE (stmt_info) * 3;
  }
else
 @@ -8252,40 +8255,39 @@ vect_transform_slp_perm_load_1 (vec_info *vinfo, 
 slp_tree node,
   return false;
 }

 - ++*n_perms;
 -
 + tree mask_vec = NULL_TREE;
   if (!analyze_only)
 -   {
 - tree mask_vec = vect_gen_perm_mask_checked (vectype, 
 indices);
 +   mask_vec = vect_gen_perm_mask_checked (vectype, indices);

 - if (second_vec_index == -1)
 -   second_vec_index = first_vec_index;
 + if (second_vec_index == -1)
 +   second_vec_index = first_vec_index;

 - for (unsigned int ri = 0; ri < nvectors_per_build; ++ri)
 + for (unsigned int 

[PATCH] [testsuite] require profiling for -pg

2023-05-23 Thread Alexandre Oliva via Gcc-patches


Fix two tests that use -pg but don't declare their requirement for
profiling support.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for  gcc/testsuite/ChangeLog

* gcc.target/i386/mcount_pic.c: Add dg-require-profiling.
* gcc.target/i386/pr104447: Likewise.
---
 gcc/testsuite/gcc.target/i386/mcount_pic.c |1 +
 gcc/testsuite/gcc.target/i386/pr104447.c   |1 +
 2 files changed, 2 insertions(+)

diff --git a/gcc/testsuite/gcc.target/i386/mcount_pic.c 
b/gcc/testsuite/gcc.target/i386/mcount_pic.c
index 5546933d1946d..732be66b7b33d 100644
--- a/gcc/testsuite/gcc.target/i386/mcount_pic.c
+++ b/gcc/testsuite/gcc.target/i386/mcount_pic.c
@@ -3,6 +3,7 @@
 /* { dg-do run } */
 /* { dg-require-effective-target fpic } */
 /* { dg-require-effective-target ia32 } */
+/* { dg-require-profiling "-pg" } */
 /* { dg-options "-O2 -fpic -pg -save-temps" } */
 
 int main ()
diff --git a/gcc/testsuite/gcc.target/i386/pr104447.c 
b/gcc/testsuite/gcc.target/i386/pr104447.c
index bf11e8696e68f..cb618c7b8bb32 100644
--- a/gcc/testsuite/gcc.target/i386/pr104447.c
+++ b/gcc/testsuite/gcc.target/i386/pr104447.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-profiling "-pg" } */
 /* { dg-options "-O2 -pg" } */
 
 int

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] [testsuite] require pthread for openmp

2023-05-23 Thread Alexandre Oliva via Gcc-patches


Fix test that uses -fopenmp without declaring requirement for pthread
support.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for  gcc/testsuite/ChangeLog

* g++.dg/pr80481.C: Add explicit pthread requirement.
---
 gcc/testsuite/g++.dg/pr80481.C |2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/testsuite/g++.dg/pr80481.C b/gcc/testsuite/g++.dg/pr80481.C
index 78c463b8e3b58..3a8869914634f 100644
--- a/gcc/testsuite/g++.dg/pr80481.C
+++ b/gcc/testsuite/g++.dg/pr80481.C
@@ -1,4 +1,6 @@
 // { dg-do compile { target { i?86-*-* x86_64-*-* }  && { ! *-*-solaris* } } }
+// -fopenmp implies -pthread
+// { dg-require-effective-target pthread } 
 // { dg-options "-Ofast -funroll-loops -fopenmp -march=knl" }
 // Disabling epilogues until we find a better way to deal with scans.
 // { dg-additional-options "--param vect-epilogues-nomask=0" }

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] [testsuite] require pic for pr103074.c

2023-05-23 Thread Alexandre Oliva via Gcc-patches


Fix test that uses -fPIC without stating the requirement for PIC
support.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.

for  gcc/testsuite/ChangeLog

* gcc.target/i386/pr103074.c: Require fpic support.
---
 gcc/testsuite/gcc.target/i386/pr103074.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.target/i386/pr103074.c 
b/gcc/testsuite/gcc.target/i386/pr103074.c
index 276ad82a1de1e..668d9531096dc 100644
--- a/gcc/testsuite/gcc.target/i386/pr103074.c
+++ b/gcc/testsuite/gcc.target/i386/pr103074.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target fpic } */
 /* { dg-options "-march=bonnell -Os -fPIC -fschedule-insns -w" } */
 
 void

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] [testsuite] tsvc: skip include malloc.h when unavailable

2023-05-23 Thread Alexandre Oliva via Gcc-patches


tsvc tests all fail on systems that don't offer a malloc.h, other than
those that explicitly rule that out.  Use the preprocessor to test for
malloc.h's availability.

tsvc.h also expects a definition for struct timeval, but it doesn't
include sys/time.h.  Add a conditional include thereof.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.


for  gcc/testsuite/ChangeLog

* gcc.dg/vect/tsvc/tsvc.h: Test for and conditionally include
malloc.h and sys/time.h.

---
 gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h 
b/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
index 75494c24cfa62..cd39c041903dd 100644
--- a/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
+++ b/gcc/testsuite/gcc.dg/vect/tsvc/tsvc.h
@@ -11,9 +11,12 @@
 
 #include 
 #include 
-#if !defined(__APPLE__) && !defined(__DragonFly__)
+#if __has_include()
 #include 
 #endif
+#if __has_include()
+#include 
+#endif
 #include 
 #include 
 

-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


[PATCH] Check for sysconf decl on vxworks

2023-05-23 Thread Alexandre Oliva via Gcc-patches


The sysconf function is only available in rtp mode on vxworks.  In
kernel mode, it is not even declared, but the feature test macro in
the testsuite doesn't notice its absence because it's a link test, and
vxworks kernel mode uses partial linking.

This patch introduces an alternate test on vxworks targets to check
for a declaration and for an often-used sysconf parameter.

Bootstrapped on x86_64-linux-gnu.  Also tested on ppc- and x86-vx7r2
with gcc-12.


for  gcc/testsuite/ChangeLog

* lib/target-supports.exp (check_effective_target_sysconf):
Check for declaration and _SC_PAGESIZE on vxworks.
---
 gcc/testsuite/lib/target-supports.exp |   11 +++
 1 file changed, 11 insertions(+)

diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index bd9f432e4a761..263ef35a2e4df 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1146,6 +1146,17 @@ proc check_effective_target_mmap {} {
 # Return 1 if the target supports sysconf, 0 otherwise.
 
 proc check_effective_target_sysconf {} {
+# VxWorks has sysconf in rtp mode only, but our way to test can't
+# tell kernel mode doesn't, as we're doing partial links for
+# kernel modules.  We can tell by checking for a declaration, or
+# for some sysconf parm, because configurations that don't offer
+# sysconf don't have either.
+if { [istarget *-*-vxworks*] } {
+   return [check_no_compiler_messages sysconfdecl assembly {
+   #include 
+   int f() { return sysconf(_SC_PAGESIZE); }
+   }];
+}
 return [check_function_available "sysconf"]
 }
 


-- 
Alexandre Oliva, happy hacker   https://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
Disinformation flourishes because many people care deeply about injustice
but very few check the facts.  Ask me about 


Re: [PATCH] Replace __gnu_cxx::__ops::__negate with std::not_fn

2023-05-23 Thread François Dumont via Gcc-patches



On 22/05/2023 22:55, Jonathan Wakely wrote:



On Mon, 22 May 2023 at 21:51, François Dumont via Libstdc++
<libstdc+...@gcc.gnu.org> wrote:


I was thinking that it might be nice to get rid of
predefined_ops.h content.

So here is a start with __negate. Drawback is that stl_algo.h has to
include .


We definitely don't want that. std::not_fn could be moved to its own
header.


But I'm not sure this is a good change anyway, as we can't do it 
unconditionally. Pre-C++17 code would still be using the 
predefined_ops.h function objects, so we can't remove that code. And 
we'll get template bloat from instantiating the algos twice, once with 
the old function objects and once with std::not_fn.


True, what do you advise then? Should I just forget about it?
Introduce a std::__not_fn for pre-C++17?


I am studying this last proposal, let me know if it is just impossible 
or a waste of time.
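For discussion, a rough sketch (purely illustrative, assuming C++11 as
the baseline, and not an actual libstdc++ change) of the kind of
pre-C++17 negation helper being considered:

#include <utility>

// Illustrative sketch only: a pre-C++17 stand-in for std::not_fn that
// the algorithms could use in all modes without pulling in <functional>.
// Names and placement are assumptions, not an actual libstdc++ patch.
template<typename _Fn>
  struct __not_fn_t
  {
    _Fn _M_fn;

    template<typename... _Args>
      bool
      operator()(_Args&&... __args)
      { return !_M_fn(std::forward<_Args>(__args)...); }
  };

template<typename _Fn>
  __not_fn_t<_Fn>
  __not_fn(_Fn __fn)
  { return __not_fn_t<_Fn>{__fn}; }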




For now I just get rid of stl_algo.h include in
 to rather use stl_algobase.h. But maybe it would be
better
to also isolate std::not_fn in a dedicated header file so that
stl_algo.h do not have to include all .

 libstdc++: Replace __gnu_cxx::__ops::__negate with std::not_fn

 Replace the internal __gnu_cxx::__ops::__negate function and
associated
 __gnu_cxx::__ops::_Iter_negate by the C++17 std::not_fn.

 libstdc++-v3/ChangeLog:

 * include/bits/predefined_ops.h: Include .


No, please don't include  anywhere. If you do that, it means 
 now defines every feature test macro in the entire 
library, which makes it look like you can get smart pointers and 
ranges and constexpr math all from .


Ok, I wasn't aware about the interest of . I see now, limited 
to user code.


I'm testing only the move of std::search to stl_algobase.h to avoid 
stl_algo.h include in . I'll submit it later.


RE: [PATCH V5] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Wednesday, May 24, 2023 11:35 AM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; kito.ch...@gmail.com; pal...@dabbelt.com; 
pal...@rivosinc.com; jeffreya...@gmail.com; rdapp@gmail.com; Richard 
Sandiford 
Subject: Re: [PATCH V5] RISC-V: Add RVV comparison autovectorization

LGTM

On Wed, May 24, 2023 at 11:29 AM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch enable RVV auto-vectorization including floating-point 
> unorder and order comparison.
>
> The testcases are leveraged from Richard.
> So include Richard as co-author.
>
> And this patch is the prerequisite patch for my current middle-end work.
> Without this patch, I can't support len_mask_xxx middle-end pattern 
> since the mask is generated by comparison.
>
> For example,
> for (int i...; i < n.)
>   if (cond[i])
>  a[i] = b[i]
>
> We need len_mask_load/len_mask_store for such code and I am gonna 
> support them in the middle-end after this patch is merged.
>
> Both integer && floating (order and unorder) are tested.
> built && regression passed.
>
> Ok for trunk?
>
> Thanks.
>
> Co-Authored-By: Richard Sandiford 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (@vcond_mask_): New pattern.
> (vec_cmp): New pattern.
> (vec_cmpu): New pattern.
> (vcond): New pattern.
> (vcondu): New pattern.
> * config/riscv/riscv-protos.h (enum insn_type): Add new enum.
> (emit_vlmax_merge_insn): New function.
> (emit_vlmax_cmp_insn): Ditto.
> (emit_vlmax_cmp_mu_insn): Ditto.
> (expand_vec_cmp): Ditto.
> (expand_vec_cmp_float): Ditto.
> (expand_vcond): Ditto.
> * config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
> (emit_vlmax_cmp_insn): Ditto.
> (emit_vlmax_cmp_mu_insn): Ditto.
> (get_cmp_insn_code): Ditto.
> (expand_vec_cmp): Ditto.
> (expand_vec_cmp_float): Ditto.
> (expand_vcond): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp:
> * gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 112 
>  gcc/config/riscv/riscv-protos.h   |   9 +
>  gcc/config/riscv/riscv-v.cc   | 255 ++
>  .../riscv/rvv/autovec/cmp/vcond-1.c   | 157 +++
>  .../riscv/rvv/autovec/cmp/vcond-2.c   |  75 ++
>  .../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
>  .../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 
>  .../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 ++
>  .../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
>  10 files changed, 754 insertions(+)
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md 
> index 7c87b6012f6..4eeeab624a4 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -162,3 +162,115 @@
>  riscv_vector::RVV_BINOP, operands);
>DONE;
>  })
> +
> +;; 
> +=
> +
> +;; == Comparisons and selects
> +;; 
> +=
> +
> +
> +;; 
> +-
> + ;;  [INT,FP] Select based on masks ;; 
> +-
> +
> +;; Includes merging patterns for:
> +;; - vmerge.vv
> +;; - vmerge.vx
> +;; - vfmerge.vf
> +;; 
> +-
> +
> +
> +(define_expand "@vcond_mask_"
> +  [(match_operand:V 0 "register_operand")
> +   (match_operand: 3 "register_operand")
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "register_operand")]
> +  "TARGET_VECTOR"
> +  {
> +/* The order of vcond_mask is opposite to pred_merge.  */
> +std::swap (operands[1], operands[2]);
> +

Re: [PATCH V5] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread Kito Cheng via Gcc-patches
LGTM

On Wed, May 24, 2023 at 11:29 AM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch enable RVV auto-vectorization including floating-point
> unorder and order comparison.
>
> The testcases are leveraged from Richard.
> So include Richard as co-author.
>
> And this patch is the prerequisite patch for my current middle-end work.
> Without this patch, I can't support len_mask_xxx middle-end pattern since
> the mask is generated by comparison.
>
> For example,
> for (int i...; i < n.)
>   if (cond[i])
>  a[i] = b[i]
>
> We need len_mask_load/len_mask_store for such code and I am gonna support them
> in the middle-end after this patch is merged.
>
> Both integer && floating (order and unorder) are tested.
> built && regression passed.
>
> Ok for trunk?
>
> Thanks.
>
> Co-Authored-By: Richard Sandiford 
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (@vcond_mask_): New pattern.
> (vec_cmp): New pattern.
> (vec_cmpu): New pattern.
> (vcond): New pattern.
> (vcondu): New pattern.
> * config/riscv/riscv-protos.h (enum insn_type): Add new enum.
> (emit_vlmax_merge_insn): New function.
> (emit_vlmax_cmp_insn): Ditto.
> (emit_vlmax_cmp_mu_insn): Ditto.
> (expand_vec_cmp): Ditto.
> (expand_vec_cmp_float): Ditto.
> (expand_vcond): Ditto.
> * config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
> (emit_vlmax_cmp_insn): Ditto.
> (emit_vlmax_cmp_mu_insn): Ditto.
> (get_cmp_insn_code): Ditto.
> (expand_vec_cmp): Ditto.
> (expand_vec_cmp_float): Ditto.
> (expand_vcond): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp:
> * gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 112 
>  gcc/config/riscv/riscv-protos.h   |   9 +
>  gcc/config/riscv/riscv-v.cc   | 255 ++
>  .../riscv/rvv/autovec/cmp/vcond-1.c   | 157 +++
>  .../riscv/rvv/autovec/cmp/vcond-2.c   |  75 ++
>  .../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
>  .../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 
>  .../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 ++
>  .../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
>  gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
>  10 files changed, 754 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 7c87b6012f6..4eeeab624a4 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -162,3 +162,115 @@
>  riscv_vector::RVV_BINOP, operands);
>DONE;
>  })
> +
> +;; =
> +;; == Comparisons and selects
> +;; =
> +
> +;; -
> +;;  [INT,FP] Select based on masks
> +;; -
> +;; Includes merging patterns for:
> +;; - vmerge.vv
> +;; - vmerge.vx
> +;; - vfmerge.vf
> +;; -
> +
> +(define_expand "@vcond_mask_"
> +  [(match_operand:V 0 "register_operand")
> +   (match_operand: 3 "register_operand")
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "register_operand")]
> +  "TARGET_VECTOR"
> +  {
> +/* The order of vcond_mask is opposite to pred_merge.  */
> +std::swap (operands[1], operands[2]);
> +riscv_vector::emit_vlmax_merge_insn (code_for_pred_merge (mode),
> +   riscv_vector::RVV_MERGE_OP, operands);
> +DONE;
> +  }
> +)
> +
> +;; -
> +;;  [INT,FP] Comparisons
> +;; -
> +;; Includes:
> +;; - vms.
> +;; -
> +
> 

RE: [PATCH] RISC-V: Support RVV VREINTERPRET from vbool*_t to vuint*m1_t

2023-05-23 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Wednesday, May 24, 2023 11:22 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang 
Subject: Re: [PATCH] RISC-V: Support RVV VREINTERPRET from vbool*_t to 
vuint*m1_t

ok

On Thu, May 18, 2023 at 2:32 PM Pan Li via Gcc-patches 
 wrote:
>
> From: Pan Li 
>
> This patch support the RVV VREINTERPRET from the vbool*_t to the 
> vuint*m1_t.  Aka:
>
> vuint*m1_t __riscv_vreinterpret_x_x(vbool*_t);
>
> These APIs help the users to convert vector the vbool*_t to the LMUL=1 
> unsigned integer vint*_t.  According to the RVV intrinsic SPEC as 
> below, the reinterpret intrinsics only change the types of the underlying 
> contents.
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-int
> rinsic-rfc.md#reinterpret-vbool-o-vintm1
>
> For example, given below code.
> vuint8m1_t test_vreinterpret_v_b1_vuint8m1 (vbool1_t src) {
>   return __riscv_vreinterpret_v_b1_u8m1 (src); }
>
> It will generate the assembly code similar as below:
> vsetvli a5,zero,e8,m8,ta,ma
> vlm.v   v1,0(a1)
> vs1r.v  v1,0(a0)
> ret
>
> Please NOTE the test files doesn't cover all the possible combinations 
> of the intrinsic APIs introduced by this PATCH due to too many.
> This is the last PATCH for the reinterpret between the signed/unsigned 
> and the bool vector types.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/genrvv-type-indexer.cc (main): Add
> unsigned_eew*_lmul1_interpret for indexer.
> * config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
> Register vuint*m1_t interpret function.
> * config/riscv/riscv-vector-builtins-types.def 
> (DEF_RVV_UNSIGNED_EEW8_LMUL1_INTERPRET_OPS):
> New macro for vuint8m1_t.
> (DEF_RVV_UNSIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_UNSIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_UNSIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
> (vbool1_t): Add to unsigned_eew*_interpret_ops.
> (vbool2_t): Likewise.
> (vbool4_t): Likewise.
> (vbool8_t): Likewise.
> (vbool16_t): Likewise.
> (vbool32_t): Likewise.
> (vbool64_t): Likewise.
> * config/riscv/riscv-vector-builtins.cc 
> (DEF_RVV_UNSIGNED_EEW8_LMUL1_INTERPRET_OPS):
> New macro for vuint*m1_t.
> (DEF_RVV_UNSIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_UNSIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_UNSIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
> (required_extensions_p): Add vuint*m1_t interpret case.
> * config/riscv/riscv-vector-builtins.def 
> (unsigned_eew8_lmul1_interpret):
> Add vuint*m1_t interpret to base type.
> (unsigned_eew16_lmul1_interpret): Likewise.
> (unsigned_eew32_lmul1_interpret): Likewise.
> (unsigned_eew64_lmul1_interpret): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
> Enrich test cases.
> ---
>  gcc/config/riscv/genrvv-type-indexer.cc   | 12 
>  .../riscv/riscv-vector-builtins-functions.def |  4 ++
>  .../riscv/riscv-vector-builtins-types.def | 64 +
>  gcc/config/riscv/riscv-vector-builtins.cc | 70 +++
>  gcc/config/riscv/riscv-vector-builtins.def|  6 ++
>  .../rvv/base/misc_vreinterpret_vbool_vint.c   | 20 +-
>  6 files changed, 174 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
> b/gcc/config/riscv/genrvv-type-indexer.cc
> index 5148abdda0f..18e1b375396 100644
> --- a/gcc/config/riscv/genrvv-type-indexer.cc
> +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> @@ -229,6 +229,10 @@ main (int argc, const char **argv)
> fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
>  inttype (eew, LMUL1_LOG2, /* unsigned_p 
> */false).c_str ());
>
> +  for (unsigned eew : EEW_SIZE_LIST)
> +   fprintf (fp, "  /*UNSIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
> +inttype (eew, LMUL1_LOG2, /* unsigned_p */true).c_str 
> + ());
> +
>for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
> {
>   unsigned multiple_of_lmul = 1 << lmul_log2_offset; @@ -322,6 
> +326,10 @@ main (int argc, const char **argv)
>   fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n",
>eew);
>
> +   for (unsigned eew : EEW_SIZE_LIST)
> + fprintf (fp, "  /*UNSIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n",
> +  eew);
> +
> for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
>   {
> unsigned multiple_of_lmul = 1 << lmul_log2_offset; @@ 
> -387,6 +395,10 @@ main (int argc, const char **argv)
>   for (unsigned eew : EEW_SIZE_LIST)
> fprintf (fp, "  

Re: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
Thanks a lot.  Part of the comments had already been fixed in V4,
but forget about the V4 patch.
Could you continue the review with the V5 patch that I just sent?
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619366.html
All of your comments have been addressed there.
with all comments from you have been fixed.
Thanks.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-24 11:20
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Richard 
Sandiford
Subject: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization
> +void
> +expand_vec_cmp (rtx target, rtx_code code, rtx mask, rtx maskoff, rtx op0,
> +   rtx op1)
> ...
> +  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
> +  rtx ops[RVV_CMP_OP + 2] = {target, mask, maskoff, cmp, op0, op1};
> +  emit_vlmax_cmp_insn (icode, RVV_CMP_OP + 2, ops);
 
It's too magic.
 
> +/* This function emits cmp instruction.  */
> +void
> +emit_vlmax_cmp_insn (unsigned icode, int op_num, rtx *ops)
> +{
> +  machine_mode mode = GET_MODE (ops[0]);
> +  bool fully_unmasked_p = op_num == RVV_CMP_OP ? true : false;
> +  bool use_real_merge_p = op_num == RVV_CMP_OP ? false : true;
 
Don't do that, plz separate break this function into two.
 
> +  /* We have a maximum of 11 operands for RVV instruction patterns according 
> to
> +   * vector.md.  */
> +  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
> +  /*FULLY_UNMASKED_P*/ fully_unmasked_p,
> +  /*USE_REAL_MERGE_P*/ use_real_merge_p,
> +  /*HAS_AVL_P*/ true,
> +  /*VLMAX_P*/ true,
> +  /*DEST_MODE*/ mode, /*MASK_MODE*/ mode);
> +  e.set_policy (op_num == RVV_CMP_OP ? MASK_UNDISTURBED : MASK_ANY);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
> +
>  /* Expand series const vector.  */
>
>  void
> +void
> +expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1)
> +{
> +  machine_mode mask_mode = GET_MODE (target);
> +  machine_mode data_mode = GET_MODE (op0);
> +  insn_code icode = get_cmp_insn_code (code, data_mode);
> +
> +  if (code == LTGT)
> +{
> +  rtx gt = gen_reg_rtx (mask_mode);
> +  rtx lt = gen_reg_rtx (mask_mode);
> +  expand_vec_cmp (gt, GT, op0, op1);
> +  expand_vec_cmp (lt, LT, op0, op1);
> +  icode = code_for_pred (IOR, mask_mode);
> +  rtx ops[3] = {target, gt, lt};
 
rtx ops[] = {target, gt, lt};
 
> +  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +  return;
> +}
> +
> +  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
> +  rtx ops[RVV_CMP_OP] = {target, cmp, op0, op1};
 
rtx ops[] = {target, cmp, op0, op1};
 
> +  emit_vlmax_cmp_insn (icode, RVV_CMP_OP, ops);
> +}
> +
 
> +  /* There is native support for the inverse comparison.  */
> +  code = reverse_condition_maybe_unordered (code);
> +  if (code == ORDERED)
> +emit_move_insn (target, eq0);
> +  else
> +expand_vec_cmp (eq0, code, eq0, eq0, op0, op1);
> +
> +  if (can_invert_p)
> +{
> +  emit_move_insn (target, eq0);
> +  return true;
> +}
> +  insn_code icode = code_for_pred_not (mask_mode);
> +  rtx ops[RVV_UNOP] = {target, eq0};
> +  emit_vlmax_insn (icode, RVV_UNOP, ops);
 
rtx ops[] = {target, eq0};
 


RE: [PATCH v2] RISC-V: Support RVV VREINTERPRET from vbool*_t to vint*m1_t

2023-05-23 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Kito Cheng  
Sent: Wednesday, May 24, 2023 11:22 AM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; Wang, 
Yanzhang 
Subject: Re: [PATCH v2] RISC-V: Support RVV VREINTERPRET from vbool*_t to 
vint*m1_t

LGTM

On Thu, May 18, 2023 at 2:37 PM Pan Li via Gcc-patches 
 wrote:
>
> From: Pan Li 
>
> This patch support the RVV VREINTERPRET from the vbool*_t to the 
> vint*m1_t.  Aka:
>
> vint*m1_t __riscv_vreinterpret_x_x(vbool*_t);
>
> These APIs help the users to convert vector the vbool*_t to the LMUL=1 
> signed integer vint*_t.  According to the RVV intrinsic SPEC as below, 
> the reinterpret intrinsics only change the types of the underlying contents.
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-int
> rinsic-rfc.md#reinterpret-vbool-o-vintm1
>
> For example, given below code.
> vint8m1_t test_vreinterpret_v_b1_vint8m1 (vbool1_t src) {
>   return __riscv_vreinterpret_v_b1_i8m1 (src); }
>
> It will generate the assembly code similar as below:
> vsetvli a5,zero,e8,m8,ta,ma
> vlm.v   v1,0(a1)
> vs1r.v  v1,0(a0)
> ret
>
> Please NOTE the test files doesn't cover all the possible combinations 
> of the intrinsic APIs introduced by this PATCH due to too many.
> The reinterpret from vbool*_t to vuint*m1_t with lmul=1 will be 
> coverred in another PATCH.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/genrvv-type-indexer.cc (EEW_SIZE_LIST): New macro
> for the eew size list.
> (LMUL1_LOG2): New macro for the log2 value of lmul=1.
> (main): Add signed_eew*_lmul1_interpret for indexer.
> * config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
> Register vint*m1_t interpret function.
> * config/riscv/riscv-vector-builtins-types.def 
> (DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
> New macro for vint8m1_t.
> (DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
> (vbool1_t): Add to signed_eew*_interpret_ops.
> (vbool2_t): Likewise.
> (vbool4_t): Likewise.
> (vbool8_t): Likewise.
> (vbool16_t): Likewise.
> (vbool32_t): Likewise.
> (vbool64_t): Likewise.
> * config/riscv/riscv-vector-builtins.cc 
> (DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
> New macro for vint*m1_t.
> (DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
> (required_extensions_p): Add vint8m1_t interpret case.
> * config/riscv/riscv-vector-builtins.def 
> (signed_eew8_lmul1_interpret):
> Add vint*m1_t interpret to base type.
> (signed_eew16_lmul1_interpret): Likewise.
> (signed_eew32_lmul1_interpret): Likewise.
> (signed_eew64_lmul1_interpret): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
> Enrich the test cases.
> ---
>  gcc/config/riscv/genrvv-type-indexer.cc   | 13 
>  .../riscv/riscv-vector-builtins-functions.def |  4 ++
>  .../riscv/riscv-vector-builtins-types.def | 64 +
>  gcc/config/riscv/riscv-vector-builtins.cc | 70 +++
>  gcc/config/riscv/riscv-vector-builtins.def|  6 ++
>  .../rvv/base/misc_vreinterpret_vbool_vint.c   | 19 -
>  6 files changed, 175 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
> b/gcc/config/riscv/genrvv-type-indexer.cc
> index 33738e41d7c..5148abdda0f 100644
> --- a/gcc/config/riscv/genrvv-type-indexer.cc
> +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> @@ -24,6 +24,8 @@ along with GCC; see the file COPYING3.  If not see  
> #include 
>
>  #define BOOL_SIZE_LIST {1, 2, 4, 8, 16, 32, 64}
> +#define EEW_SIZE_LIST {8, 16, 32, 64} #define LMUL1_LOG2 0
>
>  std::string
>  to_lmul (int lmul_log2)
> @@ -223,6 +225,10 @@ main (int argc, const char **argv)
>for (unsigned boolsize : BOOL_SIZE_LIST)
> fprintf (fp, "  /*BOOL%d_INTERPRET*/ INVALID,\n", boolsize);
>
> +  for (unsigned eew : EEW_SIZE_LIST)
> +   fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
> +inttype (eew, LMUL1_LOG2, /* unsigned_p */false).c_str ());
> +
>for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
> {
>   unsigned multiple_of_lmul = 1 << lmul_log2_offset;
> @@ -312,6 +318,10 @@ main (int argc, const char **argv)
>: "INVALID");
>   }
>
> +   for (unsigned eew : EEW_SIZE_LIST)
> + fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n",
> +  eew);
> +
> for (unsigned lmul_log2_offset : {1, 

[PATCH V5] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enables RVV auto-vectorization of comparisons, including
floating-point ordered and unordered comparisons.
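
A minimal C sketch (editorial addition, not part of the patch) of the kind of
scalar loop this comparison support targets: the vectorizer turns the compare
into a vec_cmp mask and the select into a vcond_mask merge.  The function name
and signature are illustrative only.

void
vmax (int *out, const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] > b[i] ? a[i] : b[i];
}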

The testcases are leveraged from Richard,
so Richard is included as co-author.

And this patch is a prerequisite for my current middle-end work.
Without this patch, I can't support the len_mask_xxx middle-end patterns,
since the mask is generated by a comparison.

For example,
for (int i...; i < n.)
  if (cond[i])
 a[i] = b[i]

We need len_mask_load/len_mask_store for such code (a compilable sketch of
such a loop follows below), and I am going to support them in the middle-end
after this patch is merged.
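
A compilable version of the loop above (editorial addition; the function name
and signature are illustrative only): a conditional store whose mask comes
from the comparison, and which therefore needs the masked load/store support
described here.

void
cond_copy (int *a, const int *b, const int *cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = b[i];
}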

Both integer && floating-point (ordered and unordered) comparisons are tested.
Built && regression tests passed.

Ok for trunk?

Thanks.

Co-Authored-By: Richard Sandiford 

gcc/ChangeLog:

* config/riscv/autovec.md (@vcond_mask_): New pattern.
(vec_cmp): New pattern.
(vec_cmpu): New pattern.
(vcond): New pattern.
(vcondu): New pattern.
* config/riscv/riscv-protos.h (enum insn_type): Add new enum.
(emit_vlmax_merge_insn): New function.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(emit_vlmax_cmp_mu_insn): Ditto.
(get_cmp_insn_code): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp:
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.

---
 gcc/config/riscv/autovec.md   | 112 
 gcc/config/riscv/riscv-protos.h   |   9 +
 gcc/config/riscv/riscv-v.cc   | 255 ++
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 157 +++
 .../riscv/rvv/autovec/cmp/vcond-2.c   |  75 ++
 .../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
 .../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 
 .../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 ++
 .../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 10 files changed, 754 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7c87b6012f6..4eeeab624a4 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -162,3 +162,115 @@
 riscv_vector::RVV_BINOP, operands);
   DONE;
 })
+
+;; =
+;; == Comparisons and selects
+;; =
+
+;; -
+;;  [INT,FP] Select based on masks
+;; -
+;; Includes merging patterns for:
+;; - vmerge.vv
+;; - vmerge.vx
+;; - vfmerge.vf
+;; -
+
+(define_expand "@vcond_mask_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 3 "register_operand")
+   (match_operand:V 1 "nonmemory_operand")
+   (match_operand:V 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+/* The order of vcond_mask is opposite to pred_merge.  */
+std::swap (operands[1], operands[2]);
+riscv_vector::emit_vlmax_merge_insn (code_for_pred_merge (mode),
+   riscv_vector::RVV_MERGE_OP, operands);
+DONE;
+  }
+)
+
+;; -
+;;  [INT,FP] Comparisons
+;; -
+;; Includes:
+;; - vms.
+;; -
+
+(define_expand "vec_cmp"
+  [(set (match_operand: 0 "register_operand")
+   (match_operator: 1 "comparison_operator"
+ [(match_operand:VI 2 "register_operand")
+  (match_operand:VI 3 "register_operand")]))]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_cmp (operands[0], 

Re: [PATCH v2] RISC-V: Support RVV VREINTERPRET from vbool*_t to vint*m1_t

2023-05-23 Thread Kito Cheng via Gcc-patches
LGTM

On Thu, May 18, 2023 at 2:37 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> This patch supports the RVV VREINTERPRET from the vbool*_t to the
> vint*m1_t.  Aka:
>
> vint*m1_t __riscv_vreinterpret_x_x(vbool*_t);
>
> These APIs help the users to convert the vbool*_t vector types to the LMUL=1
> signed integer vint*_t.  According to the RVV intrinsic SPEC below,
> the reinterpret intrinsics only change the types of the underlying contents.
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1
>
> For example, given the code below:
> vint8m1_t test_vreinterpret_v_b1_vint8m1 (vbool1_t src) {
>   return __riscv_vreinterpret_v_b1_i8m1 (src);
> }
>
> It will generate assembly code similar to the below:
> vsetvli a5,zero,e8,m8,ta,ma
> vlm.v   v1,0(a1)
> vs1r.v  v1,0(a0)
> ret
>
> Please NOTE the test files don't cover all the possible combinations
> of the intrinsic APIs introduced by this PATCH, because there are too many.
> The reinterpret from vbool*_t to vuint*m1_t with lmul=1 will be covered
> in another PATCH.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/genrvv-type-indexer.cc (EEW_SIZE_LIST): New macro
> for the eew size list.
> (LMUL1_LOG2): New macro for the log2 value of lmul=1.
> (main): Add signed_eew*_lmul1_interpret for indexer.
> * config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
> Register vint*m1_t interpret function.
> * config/riscv/riscv-vector-builtins-types.def 
> (DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
> New macro for vint8m1_t.
> (DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
> (vbool1_t): Add to signed_eew*_interpret_ops.
> (vbool2_t): Likewise.
> (vbool4_t): Likewise.
> (vbool8_t): Likewise.
> (vbool16_t): Likewise.
> (vbool32_t): Likewise.
> (vbool64_t): Likewise.
> * config/riscv/riscv-vector-builtins.cc 
> (DEF_RVV_SIGNED_EEW8_LMUL1_INTERPRET_OPS):
> New macro for vint*m1_t.
> (DEF_RVV_SIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_SIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_SIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
> (required_extensions_p): Add vint8m1_t interpret case.
> * config/riscv/riscv-vector-builtins.def 
> (signed_eew8_lmul1_interpret):
> Add vint*m1_t interpret to base type.
> (signed_eew16_lmul1_interpret): Likewise.
> (signed_eew32_lmul1_interpret): Likewise.
> (signed_eew64_lmul1_interpret): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
> Enrich the test cases.
> ---
>  gcc/config/riscv/genrvv-type-indexer.cc   | 13 
>  .../riscv/riscv-vector-builtins-functions.def |  4 ++
>  .../riscv/riscv-vector-builtins-types.def | 64 +
>  gcc/config/riscv/riscv-vector-builtins.cc | 70 +++
>  gcc/config/riscv/riscv-vector-builtins.def|  6 ++
>  .../rvv/base/misc_vreinterpret_vbool_vint.c   | 19 -
>  6 files changed, 175 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
> b/gcc/config/riscv/genrvv-type-indexer.cc
> index 33738e41d7c..5148abdda0f 100644
> --- a/gcc/config/riscv/genrvv-type-indexer.cc
> +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> @@ -24,6 +24,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include 
>
>  #define BOOL_SIZE_LIST {1, 2, 4, 8, 16, 32, 64}
> +#define EEW_SIZE_LIST {8, 16, 32, 64}
> +#define LMUL1_LOG2 0
>
>  std::string
>  to_lmul (int lmul_log2)
> @@ -223,6 +225,10 @@ main (int argc, const char **argv)
>for (unsigned boolsize : BOOL_SIZE_LIST)
> fprintf (fp, "  /*BOOL%d_INTERPRET*/ INVALID,\n", boolsize);
>
> +  for (unsigned eew : EEW_SIZE_LIST)
> +   fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
> +inttype (eew, LMUL1_LOG2, /* unsigned_p */false).c_str ());
> +
>for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
> {
>   unsigned multiple_of_lmul = 1 << lmul_log2_offset;
> @@ -312,6 +318,10 @@ main (int argc, const char **argv)
>: "INVALID");
>   }
>
> +   for (unsigned eew : EEW_SIZE_LIST)
> + fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n",
> +  eew);
> +
> for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
>   {
> unsigned multiple_of_lmul = 1 << lmul_log2_offset;
> @@ -374,6 +384,9 @@ main (int argc, const char **argv)
>   for (unsigned boolsize : BOOL_SIZE_LIST)
> fprintf (fp, "  /*BOOL%d_INTERPRET*/ INVALID,\n", boolsize);
>
> + for (unsigned 

Re: [PATCH] RISC-V: Support RVV VREINTERPRET from vbool*_t to vuint*m1_t

2023-05-23 Thread Kito Cheng via Gcc-patches
ok

On Thu, May 18, 2023 at 2:32 PM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> This patch supports the RVV VREINTERPRET from the vbool*_t to the
> vuint*m1_t.  Aka:
>
> vuint*m1_t __riscv_vreinterpret_x_x(vbool*_t);
>
> These APIs help the users to convert the vbool*_t vector types to the LMUL=1
> unsigned integer vint*_t.  According to the RVV intrinsic SPEC below,
> the reinterpret intrinsics only change the types of the underlying contents.
>
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1
>
> For example, given the code below:
> vuint8m1_t test_vreinterpret_v_b1_vuint8m1 (vbool1_t src) {
>   return __riscv_vreinterpret_v_b1_u8m1 (src);
> }
>
> It will generate assembly code similar to the below:
> vsetvli a5,zero,e8,m8,ta,ma
> vlm.v   v1,0(a1)
> vs1r.v  v1,0(a0)
> ret
>
> Please NOTE the test files don't cover all the possible combinations
> of the intrinsic APIs introduced by this PATCH, because there are too many.
> This is the last PATCH for the reinterpret between the signed/unsigned
> and the bool vector types.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/genrvv-type-indexer.cc (main): Add
> unsigned_eew*_lmul1_interpret for indexer.
> * config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
> Register vuint*m1_t interpret function.
> * config/riscv/riscv-vector-builtins-types.def 
> (DEF_RVV_UNSIGNED_EEW8_LMUL1_INTERPRET_OPS):
> New macro for vuint8m1_t.
> (DEF_RVV_UNSIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_UNSIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_UNSIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
> (vbool1_t): Add to unsigned_eew*_interpret_ops.
> (vbool2_t): Likewise.
> (vbool4_t): Likewise.
> (vbool8_t): Likewise.
> (vbool16_t): Likewise.
> (vbool32_t): Likewise.
> (vbool64_t): Likewise.
> * config/riscv/riscv-vector-builtins.cc 
> (DEF_RVV_UNSIGNED_EEW8_LMUL1_INTERPRET_OPS):
> New macro for vuint*m1_t.
> (DEF_RVV_UNSIGNED_EEW16_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_UNSIGNED_EEW32_LMUL1_INTERPRET_OPS): Likewise.
> (DEF_RVV_UNSIGNED_EEW64_LMUL1_INTERPRET_OPS): Likewise.
> (required_extensions_p): Add vuint*m1_t interpret case.
> * config/riscv/riscv-vector-builtins.def 
> (unsigned_eew8_lmul1_interpret):
> Add vuint*m1_t interpret to base type.
> (unsigned_eew16_lmul1_interpret): Likewise.
> (unsigned_eew32_lmul1_interpret): Likewise.
> (unsigned_eew64_lmul1_interpret): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c:
> Enrich test cases.
> ---
>  gcc/config/riscv/genrvv-type-indexer.cc   | 12 
>  .../riscv/riscv-vector-builtins-functions.def |  4 ++
>  .../riscv/riscv-vector-builtins-types.def | 64 +
>  gcc/config/riscv/riscv-vector-builtins.cc | 70 +++
>  gcc/config/riscv/riscv-vector-builtins.def|  6 ++
>  .../rvv/base/misc_vreinterpret_vbool_vint.c   | 20 +-
>  6 files changed, 174 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
> b/gcc/config/riscv/genrvv-type-indexer.cc
> index 5148abdda0f..18e1b375396 100644
> --- a/gcc/config/riscv/genrvv-type-indexer.cc
> +++ b/gcc/config/riscv/genrvv-type-indexer.cc
> @@ -229,6 +229,10 @@ main (int argc, const char **argv)
> fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
>  inttype (eew, LMUL1_LOG2, /* unsigned_p */false).c_str ());
>
> +  for (unsigned eew : EEW_SIZE_LIST)
> +   fprintf (fp, "  /*UNSIGNED_EEW%d_LMUL1_INTERPRET*/ %s,\n", eew,
> +inttype (eew, LMUL1_LOG2, /* unsigned_p */true).c_str ());
> +
>for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
> {
>   unsigned multiple_of_lmul = 1 << lmul_log2_offset;
> @@ -322,6 +326,10 @@ main (int argc, const char **argv)
>   fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n",
>eew);
>
> +   for (unsigned eew : EEW_SIZE_LIST)
> + fprintf (fp, "  /*UNSIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n",
> +  eew);
> +
> for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
>   {
> unsigned multiple_of_lmul = 1 << lmul_log2_offset;
> @@ -387,6 +395,10 @@ main (int argc, const char **argv)
>   for (unsigned eew : EEW_SIZE_LIST)
> fprintf (fp, "  /*SIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n", 
> eew);
>
> + for (unsigned eew : EEW_SIZE_LIST)
> +   fprintf (fp, "  /*UNSIGNED_EEW%d_LMUL1_INTERPRET*/ INVALID,\n",
> +eew);
> +
>   for (unsigned lmul_log2_offset : {1, 2, 3, 4, 5, 6})
> {
>   unsigned multiple_of_lmul = 1 << 

Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread Kito Cheng via Gcc-patches
> +void
> +expand_vec_cmp (rtx target, rtx_code code, rtx mask, rtx maskoff, rtx op0,
> +   rtx op1)
> ...
> +  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
> +  rtx ops[RVV_CMP_OP + 2] = {target, mask, maskoff, cmp, op0, op1};
> +  emit_vlmax_cmp_insn (icode, RVV_CMP_OP + 2, ops);

It's too magic.

> +/* This function emits cmp instruction.  */
> +void
> +emit_vlmax_cmp_insn (unsigned icode, int op_num, rtx *ops)
> +{
> +  machine_mode mode = GET_MODE (ops[0]);
> +  bool fully_unmasked_p = op_num == RVV_CMP_OP ? true : false;
> +  bool use_real_merge_p = op_num == RVV_CMP_OP ? false : true;

Don't do that, plz break this function into two.

> +  /* We have a maximum of 11 operands for RVV instruction patterns according to
> +   * vector.md.  */
> +  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
> +  /*FULLY_UNMASKED_P*/ fully_unmasked_p,
> +  /*USE_REAL_MERGE_P*/ use_real_merge_p,
> +  /*HAS_AVL_P*/ true,
> +  /*VLMAX_P*/ true,
> +  /*DEST_MODE*/ mode, /*MASK_MODE*/ mode);
> +  e.set_policy (op_num == RVV_CMP_OP ? MASK_UNDISTURBED : MASK_ANY);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
> +
>  /* Expand series const vector.  */
>
>  void
> +void
> +expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1)
> +{
> +  machine_mode mask_mode = GET_MODE (target);
> +  machine_mode data_mode = GET_MODE (op0);
> +  insn_code icode = get_cmp_insn_code (code, data_mode);
> +
> +  if (code == LTGT)
> +{
> +  rtx gt = gen_reg_rtx (mask_mode);
> +  rtx lt = gen_reg_rtx (mask_mode);
> +  expand_vec_cmp (gt, GT, op0, op1);
> +  expand_vec_cmp (lt, LT, op0, op1);
> +  icode = code_for_pred (IOR, mask_mode);
> +  rtx ops[3] = {target, gt, lt};

rtx ops[] = {target, gt, lt};

> +  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +  return;
> +}
> +
> +  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
> +  rtx ops[RVV_CMP_OP] = {target, cmp, op0, op1};

rtx ops[] = {target, cmp, op0, op1};

> +  emit_vlmax_cmp_insn (icode, RVV_CMP_OP, ops);
> +}
> +

> +  /* There is native support for the inverse comparison.  */
> +  code = reverse_condition_maybe_unordered (code);
> +  if (code == ORDERED)
> +emit_move_insn (target, eq0);
> +  else
> +expand_vec_cmp (eq0, code, eq0, eq0, op0, op1);
> +
> +  if (can_invert_p)
> +{
> +  emit_move_insn (target, eq0);
> +  return true;
> +}
> +  insn_code icode = code_for_pred_not (mask_mode);
> +  rtx ops[RVV_UNOP] = {target, eq0};
> +  emit_vlmax_insn (icode, RVV_UNOP, ops);

rtx ops[] = {target, eq0};


RE: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to vbool[2-64]_t

2023-05-23 Thread Li, Pan2 via Gcc-patches
Just to make sure the 2 VREINTERPRET related PATCHes below are still in the queue!

https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618877.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618881.html

Pan

From: Kito Cheng 
Sent: Saturday, May 20, 2023 9:58 PM
To: Li, Pan2 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 

Subject: Re: [PATCH] RISC-V: Support RVV VREINTERPRET from v{u}int*_t to 
vbool[2-64]_t

Lgtm

On Wed, May 17, 2023 at 4:14 PM Pan Li <pan2...@intel.com> wrote:
From: Pan Li <pan2...@intel.com>

This patch supports the RVV VREINTERPRET from the int to the
vbool[2|4|8|16|32|64]_t.  Aka:

vbool[2|4|8|16|32|64]_t __riscv_vreinterpret_x_x(v{u}int[8|16|32|64]_t);

These APIs help the users to convert LMUL=1 integer vectors to
vbool[2-64]_t.  According to the RVV intrinsic SPEC below,
the reinterpret intrinsics only change the types of the underlying
contents.

https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#reinterpret-vbool-o-vintm1

For example, given the code below:
vbool64_t test_vreinterpret_v_u8m1_b64 (vuint8m1_t src) {
  return __riscv_vreinterpret_v_u8m1_b64 (src);
}

It will generate assembly code similar to the below:
vsetvli a5,zero,e8,mf8,ta,ma
vlm.v   v1,0(a1)
vsm.v   v1,0(a0)
ret

Please NOTE the test files don't cover all the possible combinations
of the intrinsic APIs introduced by this PATCH, because there are too many.
The reinterpret from vbool*_t to v{u}int*_t with lmul=1 will be covered
in another PATCH.

Signed-off-by: Pan Li <pan2...@intel.com>

gcc/ChangeLog:

* config/riscv/genrvv-type-indexer.cc (BOOL_SIZE_LIST): Add the
rest bool size, aka 2, 4, 8, 16, 32, 64.
* config/riscv/riscv-vector-builtins-functions.def (vreinterpret):
Register vbool[2|4|8|16|32|64] interpret function.
* config/riscv/riscv-vector-builtins-types.def 
(DEF_RVV_BOOL2_INTERPRET_OPS):
New macro for vbool2_t.
(DEF_RVV_BOOL4_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL8_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL16_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL32_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL64_INTERPRET_OPS): Likewise.
(vint8m1_t): Add the type to bool[2|4|8|16|32|64]_interpret_ops.
(vint16m1_t): Likewise.
(vint32m1_t): Likewise.
(vint64m1_t): Likewise.
(vuint8m1_t): Likewise.
(vuint16m1_t): Likewise.
(vuint32m1_t): Likewise.
(vuint64m1_t): Likewise.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_BOOL2_INTERPRET_OPS):
New macro for vbool2_t.
(DEF_RVV_BOOL4_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL8_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL16_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL32_INTERPRET_OPS): Likewise.
(DEF_RVV_BOOL64_INTERPRET_OPS): Likewise.
(required_extensions_p): Add vbool[2|4|8|16|32|64] interpret case.
* config/riscv/riscv-vector-builtins.def (bool2_interpret): Add
vbool2_t interprect to base type.
(bool4_interpret): Likewise.
(bool8_interpret): Likewise.
(bool16_interpret): Likewise.
(bool32_interpret): Likewise.
(bool64_interpret): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/misc_vreinterpret_vbool_vint.c: Add
test cases for vbool[2|4|8|16|32|64]_t.
---
 gcc/config/riscv/genrvv-type-indexer.cc   |   2 +-
 .../riscv/riscv-vector-builtins-functions.def |   6 +
 .../riscv/riscv-vector-builtins-types.def |  97 +++-
 gcc/config/riscv/riscv-vector-builtins.cc | 105 +-
 gcc/config/riscv/riscv-vector-builtins.def|   9 +-
 .../rvv/base/misc_vreinterpret_vbool_vint.c   |  52 -
 6 files changed, 265 insertions(+), 6 deletions(-)

diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
b/gcc/config/riscv/genrvv-type-indexer.cc
index 2f0375568a8..33738e41d7c 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #include 

-#define BOOL_SIZE_LIST {1}
+#define BOOL_SIZE_LIST {1, 2, 4, 8, 16, 32, 64}

 std::string
 to_lmul (int lmul_log2)
diff --git a/gcc/config/riscv/riscv-vector-builtins-functions.def 
b/gcc/config/riscv/riscv-vector-builtins-functions.def
index 72032c6a52c..7c89a20cb24 100644
--- a/gcc/config/riscv/riscv-vector-builtins-functions.def
+++ b/gcc/config/riscv/riscv-vector-builtins-functions.def
@@ -509,6 +509,12 @@ DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, 
iu_v_eew16_interpret_ops)
 DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_eew32_interpret_ops)
 DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_eew64_interpret_ops)
 DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool1_interpret_ops)
+DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, iu_v_bool2_interpret_ops)
+DEF_RVV_FUNCTION (vreinterpret, misc, none_preds, 

Re: [PATCH V4] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
Hi, this patch is the version that fixes all the comments from Robin.

And this patch is a prerequisite for my current middle-end work.
Without this patch, I can't support the len_mask_xxx middle-end patterns,
since the mask is generated by a comparison.

For example,
for (int i...; i < n.)
  if (cond[i])
 a[i] = b[i]

We need len_mask_load/len_mask_store for such code, and I am going to support
them in the middle-end after this patch is merged.

Both integer && floating-point (ordered and unordered) comparisons are tested.
Built && regression tests passed.

Ok for trunk?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-24 11:11
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; 
Juzhe-Zhong; Richard Sandiford
Subject: [PATCH V4] RISC-V: Add RVV comparison autovectorization
From: Juzhe-Zhong 
 
This patch enables RVV auto-vectorization of comparisons, including
floating-point ordered and unordered comparisons.
 
The testcases are leveraged from Richard,
so Richard is included as co-author.
 
Co-Authored-By: Richard Sandiford 
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (@vcond_mask_): New pattern.
(vec_cmp): Ditto.
(vec_cmpu): Ditto.
(vcond): Ditto.
(vcondu): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add new enum.
(emit_vlmax_merge_insn): New function.
(emit_vlmax_cmp_insn): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(get_cmp_insn_code): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add RVV comparison testcases.
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.
 
---
gcc/config/riscv/autovec.md   | 112 
gcc/config/riscv/riscv-protos.h   |   8 +
gcc/config/riscv/riscv-v.cc   | 242 ++
.../riscv/rvv/autovec/cmp/vcond-1.c   | 157 
.../riscv/rvv/autovec/cmp/vcond-2.c   |  75 ++
.../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
.../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 
.../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 ++
.../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
10 files changed, 740 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7c87b6012f6..4eeeab624a4 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -162,3 +162,115 @@
riscv_vector::RVV_BINOP, operands);
   DONE;
})
+
+;; =
+;; == Comparisons and selects
+;; =
+
+;; -
+;;  [INT,FP] Select based on masks
+;; -
+;; Includes merging patterns for:
+;; - vmerge.vv
+;; - vmerge.vx
+;; - vfmerge.vf
+;; -
+
+(define_expand "@vcond_mask_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 3 "register_operand")
+   (match_operand:V 1 "nonmemory_operand")
+   (match_operand:V 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+/* The order of vcond_mask is opposite to pred_merge.  */
+std::swap (operands[1], operands[2]);
+riscv_vector::emit_vlmax_merge_insn (code_for_pred_merge (mode),
+riscv_vector::RVV_MERGE_OP, operands);
+DONE;
+  }
+)
+
+;; -
+;;  [INT,FP] Comparisons
+;; -
+;; Includes:
+;; - vms.
+;; -
+
+(define_expand "vec_cmp"
+  [(set (match_operand: 0 "register_operand")
+ (match_operator: 1 "comparison_operator"
+   

[PATCH V4] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enables RVV auto-vectorization of comparisons, including
floating-point ordered and unordered comparisons.

The testcases are leveraged from Richard,
so Richard is included as co-author.

Co-Authored-By: Richard Sandiford 

gcc/ChangeLog:

* config/riscv/autovec.md (@vcond_mask_): New pattern.
(vec_cmp): Ditto.
(vec_cmpu): Ditto.
(vcond): Ditto.
(vcondu): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add new enum.
(emit_vlmax_merge_insn): New function.
(emit_vlmax_cmp_insn): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(get_cmp_insn_code): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add RVV comparison testcases.
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.

---
 gcc/config/riscv/autovec.md   | 112 
 gcc/config/riscv/riscv-protos.h   |   8 +
 gcc/config/riscv/riscv-v.cc   | 242 ++
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 157 
 .../riscv/rvv/autovec/cmp/vcond-2.c   |  75 ++
 .../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
 .../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 
 .../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 ++
 .../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 10 files changed, 740 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7c87b6012f6..4eeeab624a4 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -162,3 +162,115 @@
 riscv_vector::RVV_BINOP, operands);
   DONE;
 })
+
+;; =
+;; == Comparisons and selects
+;; =
+
+;; -
+;;  [INT,FP] Select based on masks
+;; -
+;; Includes merging patterns for:
+;; - vmerge.vv
+;; - vmerge.vx
+;; - vfmerge.vf
+;; -
+
+(define_expand "@vcond_mask_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 3 "register_operand")
+   (match_operand:V 1 "nonmemory_operand")
+   (match_operand:V 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+/* The order of vcond_mask is opposite to pred_merge.  */
+std::swap (operands[1], operands[2]);
+riscv_vector::emit_vlmax_merge_insn (code_for_pred_merge (mode),
+   riscv_vector::RVV_MERGE_OP, operands);
+DONE;
+  }
+)
+
+;; -
+;;  [INT,FP] Comparisons
+;; -
+;; Includes:
+;; - vms.
+;; -
+
+(define_expand "vec_cmp"
+  [(set (match_operand: 0 "register_operand")
+   (match_operator: 1 "comparison_operator"
+ [(match_operand:VI 2 "register_operand")
+  (match_operand:VI 3 "register_operand")]))]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+DONE;
+  }
+)
+
+(define_expand "vec_cmpu"
+  [(set (match_operand: 0 "register_operand")
+   (match_operator: 1 "comparison_operator"
+ [(match_operand:VI 2 "register_operand")
+  (match_operand:VI 3 "register_operand")]))]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+DONE;
+  }
+)
+
+(define_expand "vec_cmp"
+  [(set (match_operand: 0 

RE: [PATCH V2] RISC-V: Fix incorrect code of reaching inaccessible memory address

2023-05-23 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Wednesday, May 24, 2023 10:55 AM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; jeffreya...@gmail.com; kito.ch...@gmail.com; 
pal...@dabbelt.com; pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH V2] RISC-V: Fix incorrect code of reaching inaccessible 
memory address

Lgtm, thanks

On Wed, May 24, 2023 at 10:39 AM  wrote:

> From: Juzhe-Zhong 
>
> To fix this issue, we separate the VL operand from the normal operands.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md: Adjust for new interface.
> * config/riscv/riscv-protos.h (emit_vlmax_insn): Add VL operand.
> (emit_nonvlmax_insn): Add AVL operand.
> * config/riscv/riscv-v.cc (emit_vlmax_insn): Add VL operand.
> (emit_nonvlmax_insn): Add AVL operand.
> (sew64_scalar_helper): Adjust for new interface.
> (expand_tuple_move): Ditto.
> * config/riscv/vector.md: Ditto.
>
> ---
>  gcc/config/riscv/autovec.md |  4 ++--
>  gcc/config/riscv/riscv-protos.h |  4 ++--
>  gcc/config/riscv/riscv-v.cc | 30 +++---
>  gcc/config/riscv/vector.md  |  4 ++--
>  4 files changed, 25 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md 
> index 04b4459222a..7c87b6012f6 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -32,7 +32,7 @@
>"TARGET_VECTOR"
>  {
>riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
> -   riscv_vector::RVV_UNOP, operands);
> +   riscv_vector::RVV_UNOP, operands,
> operands[2]);
>DONE;
>  })
>
> @@ -44,7 +44,7 @@
>"TARGET_VECTOR"
>  {
>riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
> -   riscv_vector::RVV_UNOP, operands);
> +   riscv_vector::RVV_UNOP, operands,
> operands[2]);
>DONE;
>  })
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index 0ae4656befb..159b51a1210 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -172,8 +172,8 @@ bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
>  bool legitimize_move (rtx, rtx);
>  void emit_vlmax_vsetvl (machine_mode, rtx);
>  void emit_hard_vlmax_vsetvl (machine_mode, rtx);
> -void emit_vlmax_insn (unsigned, int, rtx *);
> -void emit_nonvlmax_insn (unsigned, int, rtx *);
> +void emit_vlmax_insn (unsigned, int, rtx *, rtx = 0);
> +void emit_nonvlmax_insn (unsigned, int, rtx *, rtx);
>  enum vlmul_type get_vlmul (machine_mode);
>  unsigned int get_ratio (machine_mode);
>  unsigned int get_nf (machine_mode);
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index fa61a850a22..1cdc4a99701 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -71,7 +71,8 @@ public:
>m_fully_unmasked_p (false), m_use_real_merge_p (false),
>m_needs_avl_p (false), m_vlmax_p (false), m_has_tail_policy_p 
> (false),
>m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY),
> -  m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode
> (VOIDmode)
> +  m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode
> (VOIDmode),
> +  m_vl_op (NULL_RTX)
>{}
>
>/* Initializer for various configurations.  */
> @@ -83,7 +84,8 @@ public:
>m_use_real_merge_p (use_real_merge_p), m_needs_avl_p (needs_avl_p),
>m_vlmax_p (vlmax_p), m_has_tail_policy_p (false),
>m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY),
> -  m_mask_policy (MASK_ANY), m_dest_mode (dest_mode), m_mask_mode
> (mask_mode)
> +  m_mask_policy (MASK_ANY), m_dest_mode (dest_mode),
> +  m_mask_mode (mask_mode), m_vl_op (NULL_RTX)
>{}
>
>void set_policy (enum tail_policy ta)
> @@ -96,6 +98,7 @@ public:
>  m_has_mask_policy_p = true;
>  m_mask_policy = ma;
>}
> +  void set_vl (rtx vl) { m_vl_op = vl; }
>
>void add_output_operand (rtx x, machine_mode mode)
>{
> @@ -169,7 +172,7 @@ public:
>
>  if (m_needs_avl_p)
>{
> -   rtx len = ops[m_op_num];
> +   rtx len = m_vl_op;
> if (m_vlmax_p)
>   {
> if (const_vlmax_p (m_dest_mode))
> @@ -228,6 +231,7 @@ private:
>enum mask_policy m_mask_policy;
>machine_mode m_dest_mode;
>machine_mode m_mask_mode;
> +  rtx m_vl_op;
>expand_operand m_ops[MAX_OPERANDS];
>  };
>
> @@ -339,7 +343,7 @@ autovec_use_vlmax_p (void)
>  /* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by
> the
>   * actual operation.  */
>  void
> -emit_vlmax_insn (unsigned icode, int op_num, rtx *ops)
> +emit_vlmax_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
>  {
>machine_mode data_mode = GET_MODE (ops[0]);
>machine_mode mask_mode = get_mask_mode 

Re: [PATCH V2] RISC-V: Fix incorrect code of reaching inaccessible memory address

2023-05-23 Thread Kito Cheng via Gcc-patches
Lgtm, thanks

On Wed, May 24, 2023 at 10:39 AM  wrote:

> From: Juzhe-Zhong 
>
> To fix this issue, we separate the VL operand from the normal operands.
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md: Adjust for new interface.
> * config/riscv/riscv-protos.h (emit_vlmax_insn): Add VL operand.
> (emit_nonvlmax_insn): Add AVL operand.
> * config/riscv/riscv-v.cc (emit_vlmax_insn): Add VL operand.
> (emit_nonvlmax_insn): Add AVL operand.
> (sew64_scalar_helper): Adjust for new interface.
> (expand_tuple_move): Ditto.
> * config/riscv/vector.md: Ditto.
>
> ---
>  gcc/config/riscv/autovec.md |  4 ++--
>  gcc/config/riscv/riscv-protos.h |  4 ++--
>  gcc/config/riscv/riscv-v.cc | 30 +++---
>  gcc/config/riscv/vector.md  |  4 ++--
>  4 files changed, 25 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 04b4459222a..7c87b6012f6 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -32,7 +32,7 @@
>"TARGET_VECTOR"
>  {
>riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
> -   riscv_vector::RVV_UNOP, operands);
> +   riscv_vector::RVV_UNOP, operands,
> operands[2]);
>DONE;
>  })
>
> @@ -44,7 +44,7 @@
>"TARGET_VECTOR"
>  {
>riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
> -   riscv_vector::RVV_UNOP, operands);
> +   riscv_vector::RVV_UNOP, operands,
> operands[2]);
>DONE;
>  })
>
> diff --git a/gcc/config/riscv/riscv-protos.h
> b/gcc/config/riscv/riscv-protos.h
> index 0ae4656befb..159b51a1210 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -172,8 +172,8 @@ bool const_vec_all_same_in_range_p (rtx,
> HOST_WIDE_INT, HOST_WIDE_INT);
>  bool legitimize_move (rtx, rtx);
>  void emit_vlmax_vsetvl (machine_mode, rtx);
>  void emit_hard_vlmax_vsetvl (machine_mode, rtx);
> -void emit_vlmax_insn (unsigned, int, rtx *);
> -void emit_nonvlmax_insn (unsigned, int, rtx *);
> +void emit_vlmax_insn (unsigned, int, rtx *, rtx = 0);
> +void emit_nonvlmax_insn (unsigned, int, rtx *, rtx);
>  enum vlmul_type get_vlmul (machine_mode);
>  unsigned int get_ratio (machine_mode);
>  unsigned int get_nf (machine_mode);
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index fa61a850a22..1cdc4a99701 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -71,7 +71,8 @@ public:
>m_fully_unmasked_p (false), m_use_real_merge_p (false),
>m_needs_avl_p (false), m_vlmax_p (false), m_has_tail_policy_p
> (false),
>m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY),
> -  m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode
> (VOIDmode)
> +  m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode
> (VOIDmode),
> +  m_vl_op (NULL_RTX)
>{}
>
>/* Initializer for various configurations.  */
> @@ -83,7 +84,8 @@ public:
>m_use_real_merge_p (use_real_merge_p), m_needs_avl_p (needs_avl_p),
>m_vlmax_p (vlmax_p), m_has_tail_policy_p (false),
>m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY),
> -  m_mask_policy (MASK_ANY), m_dest_mode (dest_mode), m_mask_mode
> (mask_mode)
> +  m_mask_policy (MASK_ANY), m_dest_mode (dest_mode),
> +  m_mask_mode (mask_mode), m_vl_op (NULL_RTX)
>{}
>
>void set_policy (enum tail_policy ta)
> @@ -96,6 +98,7 @@ public:
>  m_has_mask_policy_p = true;
>  m_mask_policy = ma;
>}
> +  void set_vl (rtx vl) { m_vl_op = vl; }
>
>void add_output_operand (rtx x, machine_mode mode)
>{
> @@ -169,7 +172,7 @@ public:
>
>  if (m_needs_avl_p)
>{
> -   rtx len = ops[m_op_num];
> +   rtx len = m_vl_op;
> if (m_vlmax_p)
>   {
> if (const_vlmax_p (m_dest_mode))
> @@ -228,6 +231,7 @@ private:
>enum mask_policy m_mask_policy;
>machine_mode m_dest_mode;
>machine_mode m_mask_mode;
> +  rtx m_vl_op;
>expand_operand m_ops[MAX_OPERANDS];
>  };
>
> @@ -339,7 +343,7 @@ autovec_use_vlmax_p (void)
>  /* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by
> the
>   * actual operation.  */
>  void
> -emit_vlmax_insn (unsigned icode, int op_num, rtx *ops)
> +emit_vlmax_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
>  {
>machine_mode data_mode = GET_MODE (ops[0]);
>machine_mode mask_mode = get_mask_mode (data_mode).require ();
> @@ -352,13 +356,16 @@ emit_vlmax_insn (unsigned icode, int op_num, rtx
> *ops)
>/*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
>e.set_policy (TAIL_ANY);
>e.set_policy (MASK_ANY);
> +  /* According to LRA mov pattern in vector.md, we have a clobber operand
> + to be used ad VL operand.  */
> +  e.set_vl (vl);
>e.emit_insn ((enum insn_code) icode, 

Re: Re: [PATCH] RISC-V: Fix incorrect code of touching inaccessible memory address

2023-05-23 Thread juzhe.zh...@rivai.ai
Thanks. I fixed it by separating the VL and normal operands.
V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619356.html 

Does it look more reasonable to you?
Just finished the build && regression tests.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-24 10:10
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix incorrect code of touching inaccessible memory 
address
I am a little hesitant about that, since I feel the vl and normal op
should be passed in separately; otherwise the meaning of m_op_num is kind
of unclear.  We have comments there, but I think it's not ideal since it
is really context sensitive and hard to determine.
 
And I suspect gcc_assert (ops[m_op_num]); is not too useful, since it
might just be an out-of-range access if we forgot to pass the vl
operand.
 
I am thinking we might need to introduce something like llvm::ArrayRef
to have a better sanity check, e.g. check the length of ops.
One possible solution is just using std::vector, which can achieve the same
purpose too, but comes with more cost.
 
 
On Wed, May 24, 2023 at 9:57 AM  wrote:
>
> From: Juzhe-Zhong 
>
> For the VLMAX situation, rtx len = ops[m_op_num] is incorrect, since
> the last element of the ops array should be ops[m_op_num - 1].
>
> I noticed this issue while debugging the code.
> This is a code bug even though the following code hides this issue.
> We still need this minor fix.
>
> Built && Regression PASSed.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc: Fix bug of touching inaccessible memory.
>
> ---
>  gcc/config/riscv/riscv-v.cc | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index fa61a850a22..a0992773644 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -169,7 +169,11 @@ public:
>
>  if (m_needs_avl_p)
>{
> -   rtx len = ops[m_op_num];
> +   /* The variable "m_op_num" means the real operation operands except VL
> +  operand. For VLMAX patterns (no VL operand), the last operand is
> +  ops[m_op_num -1]. Wheras for non-VLMAX patterns, the last operand 
> is
> +  VL operand which is ops[m_op_num].  */
> +   rtx len = NULL_RTX;
> if (m_vlmax_p)
>   {
> if (const_vlmax_p (m_dest_mode))
> @@ -185,6 +189,20 @@ public:
> len = gen_reg_rtx (Pmode);
> emit_vlmax_vsetvl (m_dest_mode, len);
>   }
> +   else
> + {
> +   /* According to LRA mov pattern in vector.md. The VL operand 
> is
> +  always the last operand.  */
> +   gcc_assert (ops[m_op_num]);
> +   len = ops[m_op_num];
> + }
> + }
> +   else
> + {
> +   /* For non-VLMAX patterns. The VL operand is always the last
> +* operand.  */
> +   gcc_assert (ops[m_op_num]);
> +   len = ops[m_op_num];
>   }
> add_input_operand (len, Pmode);
>}
> --
> 2.36.3
>
 


[PATCH V2] RISC-V: Fix incorrect code of reaching inaccessible memory address

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

To fix this issue, we separate the VL operand from the normal operands.

gcc/ChangeLog:

* config/riscv/autovec.md: Adjust for new interface.
* config/riscv/riscv-protos.h (emit_vlmax_insn): Add VL operand.
(emit_nonvlmax_insn): Add AVL operand.
* config/riscv/riscv-v.cc (emit_vlmax_insn): Add VL operand.
(emit_nonvlmax_insn): Add AVL operand.
(sew64_scalar_helper): Adjust for new interface.
(expand_tuple_move): Ditto.
* config/riscv/vector.md: Ditto.

---
 gcc/config/riscv/autovec.md |  4 ++--
 gcc/config/riscv/riscv-protos.h |  4 ++--
 gcc/config/riscv/riscv-v.cc | 30 +++---
 gcc/config/riscv/vector.md  |  4 ++--
 4 files changed, 25 insertions(+), 17 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 04b4459222a..7c87b6012f6 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -32,7 +32,7 @@
   "TARGET_VECTOR"
 {
   riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
-   riscv_vector::RVV_UNOP, operands);
+   riscv_vector::RVV_UNOP, operands, 
operands[2]);
   DONE;
 })
 
@@ -44,7 +44,7 @@
   "TARGET_VECTOR"
 {
   riscv_vector::emit_nonvlmax_insn (code_for_pred_mov (mode),
-   riscv_vector::RVV_UNOP, operands);
+   riscv_vector::RVV_UNOP, operands, 
operands[2]);
   DONE;
 })
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 0ae4656befb..159b51a1210 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -172,8 +172,8 @@ bool const_vec_all_same_in_range_p (rtx, HOST_WIDE_INT, 
HOST_WIDE_INT);
 bool legitimize_move (rtx, rtx);
 void emit_vlmax_vsetvl (machine_mode, rtx);
 void emit_hard_vlmax_vsetvl (machine_mode, rtx);
-void emit_vlmax_insn (unsigned, int, rtx *);
-void emit_nonvlmax_insn (unsigned, int, rtx *);
+void emit_vlmax_insn (unsigned, int, rtx *, rtx = 0);
+void emit_nonvlmax_insn (unsigned, int, rtx *, rtx);
 enum vlmul_type get_vlmul (machine_mode);
 unsigned int get_ratio (machine_mode);
 unsigned int get_nf (machine_mode);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fa61a850a22..1cdc4a99701 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -71,7 +71,8 @@ public:
   m_fully_unmasked_p (false), m_use_real_merge_p (false),
   m_needs_avl_p (false), m_vlmax_p (false), m_has_tail_policy_p (false),
   m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY),
-  m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode (VOIDmode)
+  m_mask_policy (MASK_ANY), m_dest_mode (VOIDmode), m_mask_mode (VOIDmode),
+  m_vl_op (NULL_RTX)
   {}
 
   /* Initializer for various configurations.  */
@@ -83,7 +84,8 @@ public:
   m_use_real_merge_p (use_real_merge_p), m_needs_avl_p (needs_avl_p),
   m_vlmax_p (vlmax_p), m_has_tail_policy_p (false),
   m_has_mask_policy_p (false), m_tail_policy (TAIL_ANY),
-  m_mask_policy (MASK_ANY), m_dest_mode (dest_mode), m_mask_mode 
(mask_mode)
+  m_mask_policy (MASK_ANY), m_dest_mode (dest_mode),
+  m_mask_mode (mask_mode), m_vl_op (NULL_RTX)
   {}
 
   void set_policy (enum tail_policy ta)
@@ -96,6 +98,7 @@ public:
 m_has_mask_policy_p = true;
 m_mask_policy = ma;
   }
+  void set_vl (rtx vl) { m_vl_op = vl; }
 
   void add_output_operand (rtx x, machine_mode mode)
   {
@@ -169,7 +172,7 @@ public:
 
 if (m_needs_avl_p)
   {
-   rtx len = ops[m_op_num];
+   rtx len = m_vl_op;
if (m_vlmax_p)
  {
if (const_vlmax_p (m_dest_mode))
@@ -228,6 +231,7 @@ private:
   enum mask_policy m_mask_policy;
   machine_mode m_dest_mode;
   machine_mode m_mask_mode;
+  rtx m_vl_op;
   expand_operand m_ops[MAX_OPERANDS];
 };
 
@@ -339,7 +343,7 @@ autovec_use_vlmax_p (void)
 /* This function emits a {VLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
  * actual operation.  */
 void
-emit_vlmax_insn (unsigned icode, int op_num, rtx *ops)
+emit_vlmax_insn (unsigned icode, int op_num, rtx *ops, rtx vl)
 {
   machine_mode data_mode = GET_MODE (ops[0]);
   machine_mode mask_mode = get_mask_mode (data_mode).require ();
@@ -352,13 +356,16 @@ emit_vlmax_insn (unsigned icode, int op_num, rtx *ops)
   /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
   e.set_policy (TAIL_ANY);
   e.set_policy (MASK_ANY);
+  /* According to LRA mov pattern in vector.md, we have a clobber operand
+ to be used ad VL operand.  */
+  e.set_vl (vl);
   e.emit_insn ((enum insn_code) icode, ops);
 }
 
 /* This function emits a {NONVLMAX, TAIL_ANY, MASK_ANY} vsetvli followed by the
  * actual operation.  */
 void
-emit_nonvlmax_insn (unsigned icode, int op_num, rtx *ops)
+emit_nonvlmax_insn (unsigned icode, int op_num, rtx *ops, rtx avl)
 {
   machine_mode data_mode = 

RE: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread Li, Pan2 via Gcc-patches
Committed, thanks Palmer.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Palmer Dabbelt
Sent: Wednesday, May 24, 2023 9:37 AM
To: juzhe.zh...@rivai.ai
Cc: gcc-patches@gcc.gnu.org; Kito Cheng ; 
kito.ch...@sifive.com; jeffreya...@gmail.com; rdapp@gmail.com
Subject: Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander

On Tue, 23 May 2023 18:34:00 PDT (-0700), juzhe.zh...@rivai.ai wrote:
> Yeah. Can I merge it?

You built it?  Then I'm fine with merging it.

> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Palmer Dabbelt
> Date: 2023-05-24 09:32
> To: juzhe.zhong
> CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; 
> juzhe.zhong
> Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander
> On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:
>> From: Juzhe-Zhong 
>>
>> This simple patch fixes the magic numbers; removing them makes the code
>> more reasonable.
>>
>> Ok for trunk ?
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
>> (expand_const_vector): Ditto.
>> (legitimize_move): Ditto.
>> (sew64_scalar_helper): Ditto.
>> (expand_tuple_move): Ditto.
>> (expand_vector_init_insert_elems): Ditto.
>> * config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
>>
>> ---
>>  gcc/config/riscv/riscv-v.cc | 53 +
>>  gcc/config/riscv/riscv.cc   |  2 +-
>>  2 files changed, 26 insertions(+), 29 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>> index 478a052a779..fa61a850a22 100644
>> --- a/gcc/config/riscv/riscv-v.cc
>> +++ b/gcc/config/riscv/riscv-v.cc
>> @@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>int shift = exact_log2 (INTVAL (step));
>>rtx shift_amount = gen_int_mode (shift, Pmode);
>>insn_code icode = code_for_pred_scalar (ASHIFT, mode);
>> -   rtx ops[3] = {step_adj, vid, shift_amount};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, shift_amount};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>  
> Looks like it also removes the "riscv_vector" namespace from some of 
> the constants?  No big deal, it's just a different cleanup (assuming 
> it still builds and such).
>  
>>  }
>>else
>>  {
>>insn_code icode = code_for_pred_scalar (MULT, mode);
>> -   rtx ops[3] = {step_adj, vid, step};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, step};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>>  }
>>  }
>>
>> @@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>  {
>>rtx result = gen_reg_rtx (mode);
>>insn_code icode = code_for_pred_scalar (PLUS, mode);
>> -  rtx ops[3] = {result, step_adj, base};
>> -  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +  rtx ops[] = {result, step_adj, base};
>> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
>>emit_move_insn (dest, result);
>>  }
>>  }
>> @@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
>>gcc_assert (
>>  const_vec_duplicate_p (src, )
>>  && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
>> -  rtx ops[2] = {target, src};
>> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, 
>> ops);
>> +  rtx ops[] = {target, src};
>> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>return;
>>  }
>>
>> @@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
>>  we use vmv.v.i instruction.  */
>>if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 
>> (src))  {
>> -   rtx ops[2] = {tmp, src};
>> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
>> -ops);
>> +   rtx ops[] = {tmp, src};
>> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>  }
>>else
>>  {
>>elt = force_reg (elt_mode, elt);
>> -   rtx ops[2] = {tmp, elt};
>> -   emit_vlmax_insn (code_for_pred_broadcast (mode),
>> -riscv_vector::RVV_UNOP, ops);
>> +   rtx ops[] = {tmp, elt};
>> +   emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>>  }
>>
>>if (tmp != target)
>> @@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
>>rtx tmp = gen_reg_rtx (mode);
>>if (MEM_P (src))
>>  {
>> -   rtx ops[2] = {tmp, src};
>> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
>> -ops);
>> +   rtx ops[] = {tmp, src};
>> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>  }
>>else
>>  emit_move_insn (tmp, src);
>> @@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
>>if (satisfies_constraint_vu (src))
>>  return false;
>>
>> -  rtx ops[2] = {dest, src};
>> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, 
>> ops);
>> +  rtx ops[] 

Re: [PATCH] RISC-V: Fix incorrect code of touching inaccessible memory address

2023-05-23 Thread Kito Cheng via Gcc-patches
I am a little hesitant about that, since I feel the vl and normal op
should be passed in separately; otherwise the meaning of m_op_num is kind
of unclear.  We have comments there, but I think it's not ideal since it
is really context sensitive and hard to determine.

And I suspect gcc_assert (ops[m_op_num]); is not too useful, since it
might just be an out-of-range access if we forgot to pass the vl
operand.

I am thinking we might need to introduce something like llvm::ArrayRef
to have a better sanity check, e.g. check the length of ops.
One possible solution is just using std::vector, which can achieve the same
purpose too, but comes with more cost.
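
Below is a minimal, self-contained sketch (editorial addition, not GCC code)
of the kind of length-checked operand view suggested above: a non-owning
pointer/size pair, so an access like ops[m_op_num] trips an assert instead of
silently reading past the end of the array.  All names are illustrative.

#include <cassert>
#include <cstddef>

/* Non-owning view over an operand array, similar in spirit to
   llvm::ArrayRef: it remembers the length so indexing can be checked.  */
template <typename T>
class operand_ref
{
public:
  template <std::size_t N>
  operand_ref (T (&array)[N]) : m_data (array), m_size (N) {}

  T &operator[] (std::size_t i) const
  {
    assert (i < m_size && "operand index out of range");
    return m_data[i];
  }

  std::size_t size () const { return m_size; }

private:
  T *m_data;
  std::size_t m_size;
};

int
main ()
{
  int ops[3] = {1, 2, 3};
  operand_ref<int> view (ops);
  assert (view.size () == 3);
  return view[2] == 3 ? 0 : 1;  /* view[3] would assert, not read garbage.  */
}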


On Wed, May 24, 2023 at 9:57 AM  wrote:
>
> From: Juzhe-Zhong 
>
> For VLMAX situation, rtx len = ops[m_op_num] is incorrect since
> the last element the ops array should be ops[m_op_num - 1];
>
> I notice this issue when I am debugging code.
> This is a code bug even though the following codes will hide this issue.
> We still should need this minor fix.
>
> Built && Regression PASSed.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc: Fix bug of touching inaccessible memory.
>
> ---
>  gcc/config/riscv/riscv-v.cc | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index fa61a850a22..a0992773644 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -169,7 +169,11 @@ public:
>
>  if (m_needs_avl_p)
>{
> -   rtx len = ops[m_op_num];
> +   /* The variable "m_op_num" means the real operation operands except VL
> +  operand. For VLMAX patterns (no VL operand), the last operand is
> +  ops[m_op_num -1]. Wheras for non-VLMAX patterns, the last operand 
> is
> +  VL operand which is ops[m_op_num].  */
> +   rtx len = NULL_RTX;
> if (m_vlmax_p)
>   {
> if (const_vlmax_p (m_dest_mode))
> @@ -185,6 +189,20 @@ public:
> len = gen_reg_rtx (Pmode);
> emit_vlmax_vsetvl (m_dest_mode, len);
>   }
> +   else
> + {
> +   /* According to LRA mov pattern in vector.md. The VL operand 
> is
> +  always the last operand.  */
> +   gcc_assert (ops[m_op_num]);
> +   len = ops[m_op_num];
> + }
> + }
> +   else
> + {
> +   /* For non-VLMAX patterns. The VL operand is always the last
> +* operand.  */
> +   gcc_assert (ops[m_op_num]);
> +   len = ops[m_op_num];
>   }
> add_input_operand (len, Pmode);
>}
> --
> 2.36.3
>


Re: [PATCH] MIPS: don't expand large block move

2023-05-23 Thread YunQiang Su
On Sat, 20 May 2023 at 03:21, Maciej W. Rozycki  wrote:
>
> On Fri, 19 May 2023, Jeff Law wrote:
>
> > > diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
> > > index ca491b981a3..00f26d5e923 100644
> > > --- a/gcc/config/mips/mips.cc
> > > +++ b/gcc/config/mips/mips.cc
> > > @@ -8313,6 +8313,12 @@ mips_expand_block_move (rtx dest, rtx src, rtx
> > > length)
> > > }
> > > else if (optimize)
> > > {
> > > + /* When the length is big enough, the lib call has better performace
> > > +than load/store insns.
> > > +In most platform, the value is about 64-128.
> > > +And in fact lib call may be optimized with SIMD */
> > > + if (INTVAL(length) >= 64)
> > > +   return false;
> > Just a formatting nit.  Space between INTVAL and the open paren for its
> > argument list.
>
>  This is oddly wrapped too.  I'd move "performace" (typo there!) to the
> second line, to align better with the rest of the text.
>
>  Plus s/platform/platforms/ and there's a full stop missing along with two
> spaces at the end.  Also there's inconsistent style around <= and >=; the
> GNU Coding Standards ask for spaces around binary operators.  And "don't"
> in the change heading ought to be capitalised.
>
>  In fact, I'd justify the whole paragraph as each sentence doesn't have to
> start on a new line, and the commit description could benefit from some
> reformatting too, as it's now odd to read.
>

Thank you. I will fix these problems.

> > OK with that change.
>
>  I think the conditional would be better readable if it was flattened
> though:
>
>   if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_STRAIGHT)
> ...
>   else if (INTVAL (length) >= 64)
> ...
>   else if (optimize)
> ...
>

This sounds good.

> or even:
>
>   if (INTVAL (length) <= MIPS_MAX_MOVE_BYTES_STRAIGHT)
> ...
>   else if (INTVAL (length) < 64 && optimize)
> ...
>

I don't think this is a good option, since somebody may add some code,
and may break our logic.

> One just wouldn't write it as proposed if creating the whole piece from
> scratch rather than retrofitting this extra conditional.
>
>  Ultimately it may have to be tunable as LWL/LWR, etc. may be subject to
> fusion and may be faster after all.
>

oohhh, you are right.
And in fact this patch has some problems:
If the data is aligned, the value is about 1024, instead of 64-128.

>   Maciej


[PATCH] RISC-V: Fix incorrect code of touching inaccessible memory address

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

For the VLMAX situation, rtx len = ops[m_op_num] is incorrect, since
the last element of the ops array should be ops[m_op_num - 1].

I noticed this issue while debugging the code.
This is a code bug even though the following code hides this issue.
We still need this minor fix.
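
A tiny standalone illustration of the off-by-one (editorial addition, not
part of the patch): with m_op_num real operands, the last valid index is
m_op_num - 1, so ops[m_op_num] only makes sense when the caller deliberately
appends an extra VL operand.

#include <cassert>

int
main ()
{
  const int m_op_num = 3;
  int ops[m_op_num] = {10, 20, 30};  /* valid indices: 0 .. m_op_num - 1 */
  assert (ops[m_op_num - 1] == 30);  /* last real operand */
  /* ops[m_op_num] would read one element past the end unless an extra
     VL operand was intentionally stored there by the caller.  */
  return 0;
}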

Built && Regression PASSed.

Ok for trunk?
  
gcc/ChangeLog:

* config/riscv/riscv-v.cc: Fix bug of touching inaccessible memory.

---
 gcc/config/riscv/riscv-v.cc | 20 +++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index fa61a850a22..a0992773644 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -169,7 +169,11 @@ public:
 
 if (m_needs_avl_p)
   {
-   rtx len = ops[m_op_num];
+   /* The variable "m_op_num" means the real operation operands except VL
+  operand. For VLMAX patterns (no VL operand), the last operand is
+  ops[m_op_num -1]. Wheras for non-VLMAX patterns, the last operand is
+  VL operand which is ops[m_op_num].  */
+   rtx len = NULL_RTX;
if (m_vlmax_p)
  {
if (const_vlmax_p (m_dest_mode))
@@ -185,6 +189,20 @@ public:
len = gen_reg_rtx (Pmode);
emit_vlmax_vsetvl (m_dest_mode, len);
  }
+   else
+ {
+   /* According to LRA mov pattern in vector.md. The VL operand is
+  always the last operand.  */
+   gcc_assert (ops[m_op_num]);
+   len = ops[m_op_num];
+ }
+ }
+   else
+ {
+   /* For non-VLMAX patterns. The VL operand is always the last
+* operand.  */
+   gcc_assert (ops[m_op_num]);
+   len = ops[m_op_num];
  }
add_input_operand (len, Pmode);
   }
-- 
2.36.3



Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread Palmer Dabbelt

On Tue, 23 May 2023 18:37:39 PDT (-0700), juzhe.zh...@rivai.ai wrote:

Yes, I built it and regression has passed.


OK, thanks!





juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt

Date: 2023-05-24 09:37
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:34:00 PDT (-0700), juzhe.zh...@rivai.ai wrote:

Yeah. Can I merge it?
 
You built it?  Then I'm fine with merging it.
 




juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt

Date: 2023-05-24 09:32
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; juzhe.zhong
Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

This simple patch fixes the magic number, remove magic number make codes more 
reasonable.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
(expand_vector_init_insert_elems): Ditto.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 53 +
 gcc/config/riscv/riscv.cc   |  2 +-
 2 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 478a052a779..fa61a850a22 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
   int shift = exact_log2 (INTVAL (step));
   rtx shift_amount = gen_int_mode (shift, Pmode);
   insn_code icode = code_for_pred_scalar (ASHIFT, mode);
-   rtx ops[3] = {step_adj, vid, shift_amount};
-   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+   rtx ops[] = {step_adj, vid, shift_amount};
+   emit_vlmax_insn (icode, RVV_BINOP, ops);
 
Looks like it also removes the "riscv_vector" namespace from some of the 
constants?  No big deal, it's just a different cleanup (assuming it 
still builds and such).
 

 }
   else
 {
   insn_code icode = code_for_pred_scalar (MULT, mode);
-   rtx ops[3] = {step_adj, vid, step};
-   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+   rtx ops[] = {step_adj, vid, step};
+   emit_vlmax_insn (icode, RVV_BINOP, ops);
 }
 }

@@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
 {
   rtx result = gen_reg_rtx (mode);
   insn_code icode = code_for_pred_scalar (PLUS, mode);
-  rtx ops[3] = {result, step_adj, base};
-  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+  rtx ops[] = {result, step_adj, base};
+  emit_vlmax_insn (icode, RVV_BINOP, ops);
   emit_move_insn (dest, result);
 }
 }
@@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
   gcc_assert (
 const_vec_duplicate_p (src, )
 && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-  rtx ops[2] = {target, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[] = {target, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return;
 }

@@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
 we use vmv.v.i instruction.  */
   if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
 {
-   rtx ops[2] = {tmp, src};
-   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-ops);
+   rtx ops[] = {tmp, src};
+   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
 }
   else
 {
   elt = force_reg (elt_mode, elt);
-   rtx ops[2] = {tmp, elt};
-   emit_vlmax_insn (code_for_pred_broadcast (mode),
-riscv_vector::RVV_UNOP, ops);
+   rtx ops[] = {tmp, elt};
+   emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
 }

   if (tmp != target)
@@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
   rtx tmp = gen_reg_rtx (mode);
   if (MEM_P (src))
 {
-   rtx ops[2] = {tmp, src};
-   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-ops);
+   rtx ops[] = {tmp, src};
+   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
 }
   else
 emit_move_insn (tmp, src);
@@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
   if (satisfies_constraint_vu (src))
 return false;

-  rtx ops[2] = {dest, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[] = {dest, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return true;
 }

@@ -813,7 +810,7 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
 *scalar_op = force_reg (scalar_mode, *scalar_op);

   rtx tmp = gen_reg_rtx (vector_mode);
-  rtx ops[3] = {tmp, *scalar_op, vl};
+  rtx ops[] = {tmp, *scalar_op, vl};
   

Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread juzhe.zh...@rivai.ai
I always finish the build and the regression testsuite before I post patches.



juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-05-24 09:37
To: palmer
CC: gcc-patches; kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
Yes, I built it and regression has passed.



juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-05-24 09:37
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:34:00 PDT (-0700), juzhe.zh...@rivai.ai wrote:
> Yeah. Can I merge it?
 
You built it?  Then I'm fine with merging it.
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Palmer Dabbelt
> Date: 2023-05-24 09:32
> To: juzhe.zhong
> CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; juzhe.zhong
> Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
> expander
> On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:
>> From: Juzhe-Zhong 
>>
>> This simple patch fixes the magic number, remove magic number make codes 
>> more reasonable.
>>
>> Ok for trunk ?
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
>> (expand_const_vector): Ditto.
>> (legitimize_move): Ditto.
>> (sew64_scalar_helper): Ditto.
>> (expand_tuple_move): Ditto.
>> (expand_vector_init_insert_elems): Ditto.
>> * config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
>>
>> ---
>>  gcc/config/riscv/riscv-v.cc | 53 +
>>  gcc/config/riscv/riscv.cc   |  2 +-
>>  2 files changed, 26 insertions(+), 29 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>> index 478a052a779..fa61a850a22 100644
>> --- a/gcc/config/riscv/riscv-v.cc
>> +++ b/gcc/config/riscv/riscv-v.cc
>> @@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>int shift = exact_log2 (INTVAL (step));
>>rtx shift_amount = gen_int_mode (shift, Pmode);
>>insn_code icode = code_for_pred_scalar (ASHIFT, mode);
>> -   rtx ops[3] = {step_adj, vid, shift_amount};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, shift_amount};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>  
> Looks like it also removes the "riscv_vector" namespace from some of the 
> constants?  No big deal, it's just a different cleanup (assuming it 
> still builds and such).
>  
>>  }
>>else
>>  {
>>insn_code icode = code_for_pred_scalar (MULT, mode);
>> -   rtx ops[3] = {step_adj, vid, step};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, step};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>>  }
>>  }
>>
>> @@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>  {
>>rtx result = gen_reg_rtx (mode);
>>insn_code icode = code_for_pred_scalar (PLUS, mode);
>> -  rtx ops[3] = {result, step_adj, base};
>> -  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +  rtx ops[] = {result, step_adj, base};
>> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
>>emit_move_insn (dest, result);
>>  }
>>  }
>> @@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
>>gcc_assert (
>>  const_vec_duplicate_p (src, )
>>  && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
>> -  rtx ops[2] = {target, src};
>> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, 
>> ops);
>> +  rtx ops[] = {target, src};
>> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>return;
>>  }
>>
>> @@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
>>  we use vmv.v.i instruction.  */
>>if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
>>  {
>> -   rtx ops[2] = {tmp, src};
>> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
>> -ops);
>> +   rtx ops[] = {tmp, src};
>> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>  }
>>else
>>  {
>>elt = force_reg (elt_mode, elt);
>> -   rtx ops[2] = {tmp, elt};
>> -   emit_vlmax_insn (code_for_pred_broadcast (mode),
>> -riscv_vector::RVV_UNOP, ops);
>> +   rtx ops[] = {tmp, elt};
>> +   emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>>  }
>>
>>if (tmp != target)
>> @@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
>>rtx tmp = gen_reg_rtx (mode);
>>if (MEM_P (src))
>>  {
>> -   rtx ops[2] = {tmp, src};
>> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
>> -ops);
>> +   rtx ops[] = {tmp, src};
>> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>  }
>>else
>>  emit_move_insn (tmp, src);
>> @@ -548,8 +545,8 @@ 

Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread juzhe.zh...@rivai.ai
Yes, I built it and regression has passed.



juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-05-24 09:37
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:34:00 PDT (-0700), juzhe.zh...@rivai.ai wrote:
> Yeah. Can I merge it?
 
You built it?  Then I'm fine with merging it.
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Palmer Dabbelt
> Date: 2023-05-24 09:32
> To: juzhe.zhong
> CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; juzhe.zhong
> Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
> expander
> On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:
>> From: Juzhe-Zhong 
>>
>> This simple patch fixes the magic number, remove magic number make codes 
>> more reasonable.
>>
>> Ok for trunk ?
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
>> (expand_const_vector): Ditto.
>> (legitimize_move): Ditto.
>> (sew64_scalar_helper): Ditto.
>> (expand_tuple_move): Ditto.
>> (expand_vector_init_insert_elems): Ditto.
>> * config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
>>
>> ---
>>  gcc/config/riscv/riscv-v.cc | 53 +
>>  gcc/config/riscv/riscv.cc   |  2 +-
>>  2 files changed, 26 insertions(+), 29 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>> index 478a052a779..fa61a850a22 100644
>> --- a/gcc/config/riscv/riscv-v.cc
>> +++ b/gcc/config/riscv/riscv-v.cc
>> @@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>int shift = exact_log2 (INTVAL (step));
>>rtx shift_amount = gen_int_mode (shift, Pmode);
>>insn_code icode = code_for_pred_scalar (ASHIFT, mode);
>> -   rtx ops[3] = {step_adj, vid, shift_amount};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, shift_amount};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>  
> Looks like it also removes the "riscv_vector" namespace from some of the 
> constants?  No big deal, it's just a different cleanup (assuming it 
> still builds and such).
>  
>>  }
>>else
>>  {
>>insn_code icode = code_for_pred_scalar (MULT, mode);
>> -   rtx ops[3] = {step_adj, vid, step};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, step};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>>  }
>>  }
>>
>> @@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>  {
>>rtx result = gen_reg_rtx (mode);
>>insn_code icode = code_for_pred_scalar (PLUS, mode);
>> -  rtx ops[3] = {result, step_adj, base};
>> -  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +  rtx ops[] = {result, step_adj, base};
>> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
>>emit_move_insn (dest, result);
>>  }
>>  }
>> @@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
>>gcc_assert (
>>  const_vec_duplicate_p (src, )
>>  && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
>> -  rtx ops[2] = {target, src};
>> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, 
>> ops);
>> +  rtx ops[] = {target, src};
>> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>return;
>>  }
>>
>> @@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
>>  we use vmv.v.i instruction.  */
>>if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
>>  {
>> -   rtx ops[2] = {tmp, src};
>> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
>> -ops);
>> +   rtx ops[] = {tmp, src};
>> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>  }
>>else
>>  {
>>elt = force_reg (elt_mode, elt);
>> -   rtx ops[2] = {tmp, elt};
>> -   emit_vlmax_insn (code_for_pred_broadcast (mode),
>> -riscv_vector::RVV_UNOP, ops);
>> +   rtx ops[] = {tmp, elt};
>> +   emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>>  }
>>
>>if (tmp != target)
>> @@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
>>rtx tmp = gen_reg_rtx (mode);
>>if (MEM_P (src))
>>  {
>> -   rtx ops[2] = {tmp, src};
>> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
>> -ops);
>> +   rtx ops[] = {tmp, src};
>> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>  }
>>else
>>  emit_move_insn (tmp, src);
>> @@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
>>if (satisfies_constraint_vu (src))
>>  return false;
>>
>> -  rtx ops[2] = {dest, src};
>> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
>> +  rtx ops[] = {dest, src};
>> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>return 

Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread Palmer Dabbelt

On Tue, 23 May 2023 18:34:00 PDT (-0700), juzhe.zh...@rivai.ai wrote:

Yeah. Can I merge it?


You built it?  Then I'm fine with merging it.





juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt

Date: 2023-05-24 09:32
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; juzhe.zhong
Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

This simple patch fixes the magic number, remove magic number make codes more 
reasonable.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
(expand_vector_init_insert_elems): Ditto.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 53 +
 gcc/config/riscv/riscv.cc   |  2 +-
 2 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 478a052a779..fa61a850a22 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
   int shift = exact_log2 (INTVAL (step));
   rtx shift_amount = gen_int_mode (shift, Pmode);
   insn_code icode = code_for_pred_scalar (ASHIFT, mode);
-   rtx ops[3] = {step_adj, vid, shift_amount};
-   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+   rtx ops[] = {step_adj, vid, shift_amount};
+   emit_vlmax_insn (icode, RVV_BINOP, ops);
 
Looks like it also removes the "riscv_vector" namespace from some of the 
constants?  No big deal, it's just a different cleanup (assuming it 
still builds and such).
 

 }
   else
 {
   insn_code icode = code_for_pred_scalar (MULT, mode);
-   rtx ops[3] = {step_adj, vid, step};
-   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+   rtx ops[] = {step_adj, vid, step};
+   emit_vlmax_insn (icode, RVV_BINOP, ops);
 }
 }

@@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
 {
   rtx result = gen_reg_rtx (mode);
   insn_code icode = code_for_pred_scalar (PLUS, mode);
-  rtx ops[3] = {result, step_adj, base};
-  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+  rtx ops[] = {result, step_adj, base};
+  emit_vlmax_insn (icode, RVV_BINOP, ops);
   emit_move_insn (dest, result);
 }
 }
@@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
   gcc_assert (
 const_vec_duplicate_p (src, )
 && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-  rtx ops[2] = {target, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[] = {target, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return;
 }

@@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
 we use vmv.v.i instruction.  */
   if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
 {
-   rtx ops[2] = {tmp, src};
-   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-ops);
+   rtx ops[] = {tmp, src};
+   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
 }
   else
 {
   elt = force_reg (elt_mode, elt);
-   rtx ops[2] = {tmp, elt};
-   emit_vlmax_insn (code_for_pred_broadcast (mode),
-riscv_vector::RVV_UNOP, ops);
+   rtx ops[] = {tmp, elt};
+   emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
 }

   if (tmp != target)
@@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
   rtx tmp = gen_reg_rtx (mode);
   if (MEM_P (src))
 {
-   rtx ops[2] = {tmp, src};
-   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-ops);
+   rtx ops[] = {tmp, src};
+   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
 }
   else
 emit_move_insn (tmp, src);
@@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
   if (satisfies_constraint_vu (src))
 return false;

-  rtx ops[2] = {dest, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[] = {dest, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return true;
 }

@@ -813,7 +810,7 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
 *scalar_op = force_reg (scalar_mode, *scalar_op);

   rtx tmp = gen_reg_rtx (vector_mode);
-  rtx ops[3] = {tmp, *scalar_op, vl};
+  rtx ops[] = {tmp, *scalar_op, vl};
   riscv_vector::emit_nonvlmax_insn (code_for_pred_broadcast (vector_mode),
 riscv_vector::RVV_UNOP, ops);
   emit_vector_func (operands, tmp);
@@ -1122,9 +1119,9 @@ expand_tuple_move (rtx *ops)

   if (fractional_p)
 {
-   rtx operands[3] = {subreg, mem, ops[4]};
-   emit_vlmax_insn (code_for_pred_mov (subpart_mode),
- riscv_vector::RVV_UNOP, operands);
+   rtx 

Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread juzhe.zh...@rivai.ai
Yeah. Can I merge it?



juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-05-24 09:32
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; juzhe.zhong
Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
>
> This simple patch fixes the magic number, remove magic number make codes more 
> reasonable.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
> (expand_const_vector): Ditto.
> (legitimize_move): Ditto.
> (sew64_scalar_helper): Ditto.
> (expand_tuple_move): Ditto.
> (expand_vector_init_insert_elems): Ditto.
> * config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
>
> ---
>  gcc/config/riscv/riscv-v.cc | 53 +
>  gcc/config/riscv/riscv.cc   |  2 +-
>  2 files changed, 26 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 478a052a779..fa61a850a22 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>int shift = exact_log2 (INTVAL (step));
>rtx shift_amount = gen_int_mode (shift, Pmode);
>insn_code icode = code_for_pred_scalar (ASHIFT, mode);
> -   rtx ops[3] = {step_adj, vid, shift_amount};
> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +   rtx ops[] = {step_adj, vid, shift_amount};
> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
 
Looks like it also removes the "riscv_vector" namespace from some of the 
constants?  No big deal, it's just a different cleanup (assuming it 
still builds and such).
 
>  }
>else
>  {
>insn_code icode = code_for_pred_scalar (MULT, mode);
> -   rtx ops[3] = {step_adj, vid, step};
> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +   rtx ops[] = {step_adj, vid, step};
> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>  }
>  }
>
> @@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>  {
>rtx result = gen_reg_rtx (mode);
>insn_code icode = code_for_pred_scalar (PLUS, mode);
> -  rtx ops[3] = {result, step_adj, base};
> -  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +  rtx ops[] = {result, step_adj, base};
> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
>emit_move_insn (dest, result);
>  }
>  }
> @@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
>gcc_assert (
>  const_vec_duplicate_p (src, )
>  && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
> -  rtx ops[2] = {target, src};
> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, 
> ops);
> +  rtx ops[] = {target, src};
> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>return;
>  }
>
> @@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
>  we use vmv.v.i instruction.  */
>if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
>  {
> -   rtx ops[2] = {tmp, src};
> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
> -ops);
> +   rtx ops[] = {tmp, src};
> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>  }
>else
>  {
>elt = force_reg (elt_mode, elt);
> -   rtx ops[2] = {tmp, elt};
> -   emit_vlmax_insn (code_for_pred_broadcast (mode),
> -riscv_vector::RVV_UNOP, ops);
> +   rtx ops[] = {tmp, elt};
> +   emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>  }
>
>if (tmp != target)
> @@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
>rtx tmp = gen_reg_rtx (mode);
>if (MEM_P (src))
>  {
> -   rtx ops[2] = {tmp, src};
> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
> -ops);
> +   rtx ops[] = {tmp, src};
> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>  }
>else
>  emit_move_insn (tmp, src);
> @@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
>if (satisfies_constraint_vu (src))
>  return false;
>
> -  rtx ops[2] = {dest, src};
> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
> +  rtx ops[] = {dest, src};
> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>return true;
>  }
>
> @@ -813,7 +810,7 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx 
> vl,
>  *scalar_op = force_reg (scalar_mode, *scalar_op);
>
>rtx tmp = gen_reg_rtx (vector_mode);
> -  rtx ops[3] = {tmp, *scalar_op, vl};
> +  rtx ops[] = {tmp, *scalar_op, vl};
>riscv_vector::emit_nonvlmax_insn (code_for_pred_broadcast (vector_mode),
>  riscv_vector::RVV_UNOP, ops);
>emit_vector_func (operands, tmp);
> @@ -1122,9 +1119,9 @@ expand_tuple_move (rtx *ops)
>
>if (fractional_p)
>  {
> -   rtx operands[3] = {subreg, 

Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread Palmer Dabbelt

On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

This simple patch fixes the magic number, remove magic number make codes more 
reasonable.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
(expand_vector_init_insert_elems): Ditto.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 53 +
 gcc/config/riscv/riscv.cc   |  2 +-
 2 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 478a052a779..fa61a850a22 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
  int shift = exact_log2 (INTVAL (step));
  rtx shift_amount = gen_int_mode (shift, Pmode);
  insn_code icode = code_for_pred_scalar (ASHIFT, mode);
- rtx ops[3] = {step_adj, vid, shift_amount};
- emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+ rtx ops[] = {step_adj, vid, shift_amount};
+ emit_vlmax_insn (icode, RVV_BINOP, ops);


Looks like it also removes the "riscv_vector" namespace from some of the 
constants?  No big deal, it's just a different cleanup (assuming it 
still builds and such).



}
   else
{
  insn_code icode = code_for_pred_scalar (MULT, mode);
- rtx ops[3] = {step_adj, vid, step};
- emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+ rtx ops[] = {step_adj, vid, step};
+ emit_vlmax_insn (icode, RVV_BINOP, ops);
}
 }

@@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
 {
   rtx result = gen_reg_rtx (mode);
   insn_code icode = code_for_pred_scalar (PLUS, mode);
-  rtx ops[3] = {result, step_adj, base};
-  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+  rtx ops[] = {result, step_adj, base};
+  emit_vlmax_insn (icode, RVV_BINOP, ops);
   emit_move_insn (dest, result);
 }
 }
@@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
   gcc_assert (
const_vec_duplicate_p (src, )
&& (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-  rtx ops[2] = {target, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[] = {target, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return;
 }

@@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
 we use vmv.v.i instruction.  */
   if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
{
- rtx ops[2] = {tmp, src};
- emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-  ops);
+ rtx ops[] = {tmp, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
}
   else
{
  elt = force_reg (elt_mode, elt);
- rtx ops[2] = {tmp, elt};
- emit_vlmax_insn (code_for_pred_broadcast (mode),
-  riscv_vector::RVV_UNOP, ops);
+ rtx ops[] = {tmp, elt};
+ emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
}

   if (tmp != target)
@@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
   rtx tmp = gen_reg_rtx (mode);
   if (MEM_P (src))
{
- rtx ops[2] = {tmp, src};
- emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-  ops);
+ rtx ops[] = {tmp, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
}
   else
emit_move_insn (tmp, src);
@@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
   if (satisfies_constraint_vu (src))
 return false;

-  rtx ops[2] = {dest, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[] = {dest, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return true;
 }

@@ -813,7 +810,7 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
 *scalar_op = force_reg (scalar_mode, *scalar_op);

   rtx tmp = gen_reg_rtx (vector_mode);
-  rtx ops[3] = {tmp, *scalar_op, vl};
+  rtx ops[] = {tmp, *scalar_op, vl};
   riscv_vector::emit_nonvlmax_insn (code_for_pred_broadcast (vector_mode),
 riscv_vector::RVV_UNOP, ops);
   emit_vector_func (operands, tmp);
@@ -1122,9 +1119,9 @@ expand_tuple_move (rtx *ops)

  if (fractional_p)
{
- rtx operands[3] = {subreg, mem, ops[4]};
- emit_vlmax_insn (code_for_pred_mov (subpart_mode),
-  

[PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

This simple patch removes the magic numbers to make the code more
reasonable.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
(expand_vector_init_insert_elems): Ditto.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 53 +
 gcc/config/riscv/riscv.cc   |  2 +-
 2 files changed, 26 insertions(+), 29 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 478a052a779..fa61a850a22 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
  int shift = exact_log2 (INTVAL (step));
  rtx shift_amount = gen_int_mode (shift, Pmode);
  insn_code icode = code_for_pred_scalar (ASHIFT, mode);
- rtx ops[3] = {step_adj, vid, shift_amount};
- emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+ rtx ops[] = {step_adj, vid, shift_amount};
+ emit_vlmax_insn (icode, RVV_BINOP, ops);
}
   else
{
  insn_code icode = code_for_pred_scalar (MULT, mode);
- rtx ops[3] = {step_adj, vid, step};
- emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+ rtx ops[] = {step_adj, vid, step};
+ emit_vlmax_insn (icode, RVV_BINOP, ops);
}
 }
 
@@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
 {
   rtx result = gen_reg_rtx (mode);
   insn_code icode = code_for_pred_scalar (PLUS, mode);
-  rtx ops[3] = {result, step_adj, base};
-  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+  rtx ops[] = {result, step_adj, base};
+  emit_vlmax_insn (icode, RVV_BINOP, ops);
   emit_move_insn (dest, result);
 }
 }
@@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
   gcc_assert (
const_vec_duplicate_p (src, )
&& (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-  rtx ops[2] = {target, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[] = {target, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return;
 }
 
@@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
 we use vmv.v.i instruction.  */
   if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
{
- rtx ops[2] = {tmp, src};
- emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-  ops);
+ rtx ops[] = {tmp, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
}
   else
{
  elt = force_reg (elt_mode, elt);
- rtx ops[2] = {tmp, elt};
- emit_vlmax_insn (code_for_pred_broadcast (mode),
-  riscv_vector::RVV_UNOP, ops);
+ rtx ops[] = {tmp, elt};
+ emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
}
 
   if (tmp != target)
@@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
   rtx tmp = gen_reg_rtx (mode);
   if (MEM_P (src))
{
- rtx ops[2] = {tmp, src};
- emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-  ops);
+ rtx ops[] = {tmp, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
}
   else
emit_move_insn (tmp, src);
@@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
   if (satisfies_constraint_vu (src))
 return false;
 
-  rtx ops[2] = {dest, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[] = {dest, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return true;
 }
 
@@ -813,7 +810,7 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx vl,
 *scalar_op = force_reg (scalar_mode, *scalar_op);
 
   rtx tmp = gen_reg_rtx (vector_mode);
-  rtx ops[3] = {tmp, *scalar_op, vl};
+  rtx ops[] = {tmp, *scalar_op, vl};
   riscv_vector::emit_nonvlmax_insn (code_for_pred_broadcast (vector_mode),
 riscv_vector::RVV_UNOP, ops);
   emit_vector_func (operands, tmp);
@@ -1122,9 +1119,9 @@ expand_tuple_move (rtx *ops)
 
  if (fractional_p)
{
- rtx operands[3] = {subreg, mem, ops[4]};
- emit_vlmax_insn (code_for_pred_mov (subpart_mode),
-   riscv_vector::RVV_UNOP, operands);
+ rtx operands[] = {subreg, mem, ops[4]};
+ emit_vlmax_insn (code_for_pred_mov (subpart_mode), RVV_UNOP,
+  operands);
}

Re: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
OK. Let's wait for more comments from Kito.
Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-24 05:07
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; Jeff Law; 
richard.sandiford
Subject: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization
>>> Don't you want to use your shiny new operand passing style here as
>>> with the other expanders?
> H, I do this just following ARM code style.
> You can see I do pass rtx[] for expand_vcond and pass rtx,rtx,rtx for 
> expand_vec_cmp.
> Well, I just follow ARM SVE implementation (You can check aarch64-sve.md, we 
> are the same)  :)
> If don't like it, could give me more information then I change it for you.
 
It doesn't matter that much in the end.  I just wondered that we just introduced
a new style of passing operands to the insn_expander and then immediately not
use it in the first follow up :)
 
Nit:
+  e.set_policy (op_num == RVV_CMP_OP ? MASK_UNDISTURBED : MASK_ANY);
 
This looks weird in an emit__cmp_insn.  Without a comment it's unclear
why anything else but a CMP_OP would ever be used here.  The double meaning
of the enum (that I wanted to be an instruction type rather than a "number
of operands") doesn't help.  But well, fixable in the future.  We just
need to make sure not to accumulate too many of these warts.
 
From the expander side V3 looks clean now.  The integer parts look OK to me
but I haven't checked the FP side at all.
 
Regards
Robin
 


[PATCH] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

This simple patch replaces the magic numbers with enum values to make the
code more reasonable.

Ok for trunk ?

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_series): Fix magic number.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(expand_vector_init_insert_elems): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 39 +
 1 file changed, 18 insertions(+), 21 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 478a052a779..524e8c7f858 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
  int shift = exact_log2 (INTVAL (step));
  rtx shift_amount = gen_int_mode (shift, Pmode);
  insn_code icode = code_for_pred_scalar (ASHIFT, mode);
- rtx ops[3] = {step_adj, vid, shift_amount};
- emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+ rtx ops[RVV_BINOP] = {step_adj, vid, shift_amount};
+ emit_vlmax_insn (icode, RVV_BINOP, ops);
}
   else
{
  insn_code icode = code_for_pred_scalar (MULT, mode);
- rtx ops[3] = {step_adj, vid, step};
- emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+ rtx ops[RVV_BINOP] = {step_adj, vid, step};
+ emit_vlmax_insn (icode, RVV_BINOP, ops);
}
 }
 
@@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
 {
   rtx result = gen_reg_rtx (mode);
   insn_code icode = code_for_pred_scalar (PLUS, mode);
-  rtx ops[3] = {result, step_adj, base};
-  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+  rtx ops[RVV_BINOP] = {result, step_adj, base};
+  emit_vlmax_insn (icode, RVV_BINOP, ops);
   emit_move_insn (dest, result);
 }
 }
@@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
   gcc_assert (
const_vec_duplicate_p (src, )
&& (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
-  rtx ops[2] = {target, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[RVV_UNOP] = {target, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return;
 }
 
@@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
 we use vmv.v.i instruction.  */
   if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
{
- rtx ops[2] = {tmp, src};
- emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-  ops);
+ rtx ops[RVV_UNOP] = {tmp, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
}
   else
{
  elt = force_reg (elt_mode, elt);
- rtx ops[2] = {tmp, elt};
- emit_vlmax_insn (code_for_pred_broadcast (mode),
-  riscv_vector::RVV_UNOP, ops);
+ rtx ops[RVV_UNOP] = {tmp, elt};
+ emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
}
 
   if (tmp != target)
@@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
   rtx tmp = gen_reg_rtx (mode);
   if (MEM_P (src))
{
- rtx ops[2] = {tmp, src};
- emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
-  ops);
+ rtx ops[RVV_UNOP] = {tmp, src};
+ emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
}
   else
emit_move_insn (tmp, src);
@@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
   if (satisfies_constraint_vu (src))
 return false;
 
-  rtx ops[2] = {dest, src};
-  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
+  rtx ops[RVV_UNOP] = {dest, src};
+  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
   return true;
 }
 
@@ -1281,8 +1278,8 @@ expand_vector_init_insert_elems (rtx target, const 
rvv_builder ,
   unsigned int unspec
= FLOAT_MODE_P (mode) ? UNSPEC_VFSLIDE1DOWN : UNSPEC_VSLIDE1DOWN;
   insn_code icode = code_for_pred_slide (unspec, mode);
-  rtx ops[3] = {target, target, builder.elt (i)};
-  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
+  rtx ops[RVV_BINOP] = {target, target, builder.elt (i)};
+  emit_vlmax_insn (icode, RVV_BINOP, ops);
 }
 }
 
-- 
2.36.3



[PATCH] Dump if a pattern fails after having printed applying it

2023-05-23 Thread Andrew Pinski via Gcc-patches
While trying to understand how to use the ! operand for match
patterns, I noticed that the debug dumps would print out that a
pattern was being applied but nothing when it was rejected in the
end, which was confusing.
This patch adds a dump for the failed case.
Note the patch is a little more complex than a single print, because
we don't want to print anything when the debug counter rejected the
pattern, and we then need to fix up how we mark whether a label is needed.
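
As a sketch, the generated matcher code is intended to end up with
roughly the following shape (label names and dump strings are only
illustrative of the structure, not the exact generated text):

  if (UNLIKELY (!dbg_cnt (match))) goto next_after_fail42_1;  /* counter rejection: no dumps at all */
  if (UNLIKELY (debug_dump)) fprintf (dump_file, "Applying pattern ...\n");
  /* ... pattern body; any failing check jumps to next_after_fail42 ... */
next_after_fail42:;
  if (UNLIKELY (debug_dump)) fprintf (dump_file, "Pattern failed ...\n");
next_after_fail42_1:;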

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* genmatch.cc (needs_label): New variable.
(expr::gen_transform): Set needs_label
if we use the local_label.
(dt_simplify::gen_1): Use `_1` for the debug count label.
After the local label, emit debug print for the failure.
Emit `_1` label if needed.
---
 gcc/genmatch.cc | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/gcc/genmatch.cc b/gcc/genmatch.cc
index 177c13d87cb..2ea80d341a2 100644
--- a/gcc/genmatch.cc
+++ b/gcc/genmatch.cc
@@ -2433,6 +2433,7 @@ capture_info::walk_c_expr (c_expr *e)
 /* The current label failing the current matched pattern during
code generation.  */
 static char *fail_label;
+static bool needs_label;
 
 /* Code generation off the decision tree and the refered AST nodes.  */
 
@@ -2611,6 +2612,7 @@ expr::gen_transform (FILE *f, int indent, const char 
*dest, bool gimple,
   fprintf_indent (f, indent,
  "if (!_r%d) goto %s;\n",
  depth, fail_label);
+  needs_label = true;
   if (*opr == CONVERT_EXPR)
{
  indent -= 4;
@@ -2640,11 +2642,13 @@ expr::gen_transform (FILE *f, int indent, const char 
*dest, bool gimple,
{
  fprintf_indent (f, indent, "if (!_r%d)\n", depth);
  fprintf_indent (f, indent, "  goto %s;\n", fail_label);
+ needs_label = true;
}
   if (force_leaf)
{
  fprintf_indent (f, indent, "if (EXPR_P (_r%d))\n", depth);
  fprintf_indent (f, indent, "  goto %s;\n", fail_label);
+ needs_label = true;
}
   if (*opr == CONVERT_EXPR)
{
@@ -3409,7 +3413,8 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   char local_fail_label[256];
   snprintf (local_fail_label, 256, "next_after_fail%u", ++fail_label_cnt);
   fail_label = local_fail_label;
-  bool needs_label = false;
+  needs_label = false;
+  bool needs_label_1 = false;
 
   /* Analyze captures and perform early-outs on the incoming arguments
  that cover cases we cannot handle.  */
@@ -3484,8 +3489,8 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
 
   if (s->kind == simplify::SIMPLIFY)
 {
-  fprintf_indent (f, indent, "if (UNLIKELY (!dbg_cnt (match))) goto 
%s;\n", fail_label);
-  needs_label = true;
+  fprintf_indent (f, indent, "if (UNLIKELY (!dbg_cnt (match))) goto 
%s_1;\n", fail_label);
+  needs_label_1 = true;
 }
 
   fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
@@ -3718,7 +3723,22 @@ dt_simplify::gen_1 (FILE *f, int indent, bool gimple, 
operand *result)
   indent -= 2;
   fprintf_indent (f, indent, "}\n");
   if (needs_label)
-fprintf (f, "%s:;\n", fail_label);
+{
+  fprintf (f, "%s:;\n", fail_label);
+  if (s->kind == simplify::SIMPLIFY)
+   {
+ fprintf_indent (f, indent, "if (UNLIKELY (debug_dump)) "
+ "fprintf (dump_file, \"Pattern failed ");
+ fprintf (f, "%%s:%%d, %%s:%%d\\n\", ");
+ output_line_directive (f,
+result ? result->location : 
s->match->location, true,
+true);
+ fprintf (f, ", __FILE__, __LINE__);\n");
+   }
+}
+  if (needs_label_1)
+fprintf (f, "%s_1:;\n", fail_label);
+  needs_label = false;
   fail_label = NULL;
 }
 
-- 
2.31.1



Re: [PATCH v2] rs6000: Add buildin for mffscrn instructions

2023-05-23 Thread Peter Bergner via Gcc-patches
On 5/23/23 12:24 AM, Kewen.Lin wrote:
> on 2023/5/23 01:31, Carl Love wrote:
>> The builtins were requested for use in GLibC.  As of version 2.31 they
>> were added as inline asm.  They requested a builtin so the asm could be
>> removed.
>
> So IMHO we also want the similar support for mffscrn, that is to make
> use of mffscrn and mffscrni on Power9 and later, but falls back to 
> __builtin_set_fpscr_rn + mffs similar on older platforms.

So __builtin_set_fpscr_rn does everything we want (it sets the RN bits) and
uses mffscrn/mffscrni on P9 and later and older insns on pre-P9.
The only problem is we don't return the current FPSCR bits, as the bif
is defined to return void.  Crazy idea, but could we extend the built-in
with an overload that returns the FPSCR bits?  To be honest, I like
the __builtin_set_fpscr_rn name better than __builtin_mffscrn[i].
The built-in machinery can see whether the usage expects a return value,
and for the pre-P9 code it can skip generating the ending mffs if
we don't want the return value.
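
Roughly what I am imagining, as a sketch only (the overload and its
return type are hypothetical; nothing like this exists today):

  /* Hypothetical overloaded form: set the RN bits and get the old FPSCR back.  */
  unsigned long long old_fpscr = __builtin_set_fpscr_rn (0x3);  /* mffscrni-like on P9 and later */
  /* ... code that relies on the new rounding mode ... */
  __builtin_set_fpscr_rn (0x0);   /* the existing void form keeps working */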

Peter




[PATCH] libstdc++: Add missing constexpr to simd_neon

2023-05-23 Thread Matthias Kretz via Gcc-patches

Signed-off-by: Matthias Kretz 

libstdc++-v3/ChangeLog:

PR libstdc++/109261
* include/experimental/bits/simd_neon.h (_S_reduce): Add
constexpr and make NEON implementation conditional on
not __builtin_is_constant_evaluated.
---
 .../include/experimental/bits/simd_neon.h | 76 +--
 1 file changed, 36 insertions(+), 40 deletions(-)


--
──
 Dr. Matthias Kretz   https://mattkretz.github.io
 GSI Helmholtz Centre for Heavy Ion Research   https://gsi.de
 stdₓ::simd
──
diff --git a/libstdc++-v3/include/experimental/bits/simd_neon.h b/libstdc++-v3/include/experimental/bits/simd_neon.h
index 637b121b130..8f732d7587b 100644
--- a/libstdc++-v3/include/experimental/bits/simd_neon.h
+++ b/libstdc++-v3/include/experimental/bits/simd_neon.h
@@ -84,50 +84,46 @@ _S_masked_store_nocvt(_SimdWrapper<_Tp, _Np> __v, _Tp* __mem,
 // }}}
 // _S_reduce {{{
 template 
-  _GLIBCXX_SIMD_INTRINSIC static _Tp
+  _GLIBCXX_SIMD_INTRINSIC static constexpr _Tp
   _S_reduce(simd<_Tp, _Abi> __x, _BinaryOperation&& __binary_op)
   {
-	constexpr size_t _Np = __x.size();
-	if constexpr (sizeof(__x) == 16 && _Np >= 4
-		  && !_Abi::template _S_is_partial<_Tp>)
-	  {
-	const auto __halves = split>>(__x);
-	const auto __y = __binary_op(__halves[0], __halves[1]);
-	return _SimdImplNeon>::_S_reduce(
-	  __y, static_cast<_BinaryOperation&&>(__binary_op));
-	  }
-	else if constexpr (_Np == 8)
-	  {
-	__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
- __vector_permute<1, 0, 3, 2, 5, 4, 7, 6>(
-   __x._M_data)));
-	__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
- __vector_permute<3, 2, 1, 0, 7, 6, 5, 4>(
-   __x._M_data)));
-	__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
- __vector_permute<7, 6, 5, 4, 3, 2, 1, 0>(
-   __x._M_data)));
-	return __x[0];
-	  }
-	else if constexpr (_Np == 4)
-	  {
-	__x
-	  = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
-   __vector_permute<1, 0, 3, 2>(__x._M_data)));
-	__x
-	  = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
-   __vector_permute<3, 2, 1, 0>(__x._M_data)));
-	return __x[0];
-	  }
-	else if constexpr (_Np == 2)
+	if (not __builtin_is_constant_evaluated())
 	  {
-	__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
- __vector_permute<1, 0>(__x._M_data)));
-	return __x[0];
+	constexpr size_t _Np = __x.size();
+	if constexpr (sizeof(__x) == 16 && _Np >= 4
+			&& !_Abi::template _S_is_partial<_Tp>)
+	  {
+		const auto __halves = split>>(__x);
+		const auto __y = __binary_op(__halves[0], __halves[1]);
+		return _SimdImplNeon>::_S_reduce(
+			 __y, static_cast<_BinaryOperation&&>(__binary_op));
+	  }
+	else if constexpr (_Np == 8)
+	  {
+		__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
+	 __vector_permute<1, 0, 3, 2, 5, 4, 7, 6>(__x._M_data)));
+		__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
+	 __vector_permute<3, 2, 1, 0, 7, 6, 5, 4>(__x._M_data)));
+		__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
+	 __vector_permute<7, 6, 5, 4, 3, 2, 1, 0>(__x._M_data)));
+		return __x[0];
+	  }
+	else if constexpr (_Np == 4)
+	  {
+		__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
+	 __vector_permute<1, 0, 3, 2>(__x._M_data)));
+		__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
+	 __vector_permute<3, 2, 1, 0>(__x._M_data)));
+		return __x[0];
+	  }
+	else if constexpr (_Np == 2)
+	  {
+		__x = __binary_op(__x, _Base::template _M_make_simd<_Tp, _Np>(
+	 __vector_permute<1, 0>(__x._M_data)));
+		return __x[0];
+	  }
 	  }
-	else
-	  return _Base::_S_reduce(__x,
-  static_cast<_BinaryOperation&&>(__binary_op));
+	return _Base::_S_reduce(__x, static_cast<_BinaryOperation&&>(__binary_op));
   }
 
 // }}}


Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread Robin Dapp via Gcc-patches
>>> Don't you want to use your shiny new operand passing style here as
>>> with the other expanders?
> H, I do this just following ARM code style.
> You can see I do pass rtx[] for expand_vcond and pass rtx,rtx,rtx for 
> expand_vec_cmp.
> Well, I just follow ARM SVE implementation (You can check aarch64-sve.md, we 
> are the same)  :)
> If don't like it, could give me more information then I change it for you.

It doesn't matter that much in the end.  I just wondered that we just introduced
a new style of passing operands to the insn_expander and then immediately not
use it in the first follow up :)

Nit:
+  e.set_policy (op_num == RVV_CMP_OP ? MASK_UNDISTURBED : MASK_ANY);

This looks weird in an emit__cmp_insn.  Without a comment it's unclear
why anything else but a CMP_OP would ever be used here.  The double meaning
of the enum (that I wanted to be an instruction type rather than a "number
of operands") doesn't help.  But well, fixable in the future.  We just
need to make sure not to accumulate too many of these warts.

From the expander side V3 looks clean now.  The integer parts look OK to me
but I haven't checked the FP side at all.

Regards
 Robin


Re: [PATCH 2/2] xtensa: Merge '*addx' and '*subx' insn patterns into one

2023-05-23 Thread Max Filippov via Gcc-patches
On Mon, May 22, 2023 at 12:06 AM Takayuki 'January June' Suwa
 wrote:
>
> By making use of the 'addsub_operator' added in the last patch.
>
> gcc/ChangeLog:
>
> * config/xtensa/xtensa.md (*addsubx): Rename from '*addx',
> and change to also accept '*subx' pattern.
> (*subx): Remove.
> ---
>  gcc/config/xtensa/xtensa.md | 31 +--
>  1 file changed, 13 insertions(+), 18 deletions(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


Re: [PATCH v2] xtensa: Optimize '(x & CST1_POW2) != 0 ? CST2_POW2 : 0'

2023-05-23 Thread Max Filippov via Gcc-patches
On Mon, May 22, 2023 at 10:48 PM Takayuki 'January June' Suwa
 wrote:
>
> On 2023/05/23 11:27, Max Filippov wrote:
> > Hi Suwa-san,
>
> Hi!
>
> > This change introduces a bunch of test failures on big endian configuration.
> > I believe that's because the starting bit position for zero_extract is 
> > counted
> > from different ends depending on the endianness.
>
> Oops, what a stupid mistake... X(
>
> ===
> This patch removes one machine instruction from the "single bit extraction
> with shifting" operation, and tries to eliminate the conditional
> branch, with the help of the ifcvt optimization, if CST2_POW2 doesn't
> fit into signed 12 bits.
>
> /* example #1 */
> int test0(int x) {
>   return (x & 1048576) != 0 ? 1024 : 0;
> }
> extern int foo(void);
> int test1(void) {
>   return (foo() & 1048576) != 0 ? 16777216 : 0;
> }
>
> ;; before
> test0:
> movia9, 0x400
> sraia2, a2, 10
> and a2, a2, a9
> ret.n
> test1:
> addisp, sp, -16
> s32i.n  a0, sp, 12
> call0   foo
> extui   a2, a2, 20, 1
> sllia2, a2, 20
> beqz.n  a2, .L2
> movi.n  a2, 1
> sllia2, a2, 24
> .L2:
> l32i.n  a0, sp, 12
> addisp, sp, 16
> ret.n
>
> ;; after
> test0:
> extui   a2, a2, 20, 1
> sllia2, a2, 10
> ret.n
> test1:
> addisp, sp, -16
> s32i.n  a0, sp, 12
> call0   foo
> l32i.n  a0, sp, 12
> extui   a2, a2, 20, 1
> sllia2, a2, 24
> addisp, sp, 16
> ret.n
>
> In addition, if the left shift amount ('exact_log2(CST2_POW2)') is
> between 1 and 3 and either an addition or a subtraction with another
> register follows, emit an ADDX[248] or SUBX[248] machine instruction
> instead of separate left-shift and add/subtract ones.
>
> /* example #2 */
> int test2(int x, int y) {
>   return ((x & 1048576) != 0 ? 4 : 0) + y;
> }
> int test3(int x, int y) {
>   return ((x & 2) != 0 ? 8 : 0) - y;
> }
>
> ;; before
> test2:
> movi.n  a9, 4
> sraia2, a2, 18
> and a2, a2, a9
> add.n   a2, a2, a3
> ret.n
> test3:
> movi.n  a9, 8
> sllia2, a2, 2
> and a2, a2, a9
> sub a2, a2, a3
> ret.n
>
> ;; after
> test2:
> extui   a2, a2, 20, 1
> addx4   a2, a2, a3
> ret.n
> test3:
> extui   a2, a2, 1, 1
> subx8   a2, a2, a3
> ret.n
>
> gcc/ChangeLog:
>
> * config/xtensa/predicates.md (addsub_operator): New.
> * config/xtensa/xtensa.md (*extzvsi-1bit_ashlsi3,
> *extzvsi-1bit_addsubx): New insn_and_split patterns.
> * config/xtensa/xtensa.cc (xtensa_rtx_costs):
> Add a special case about ifcvt 'noce_try_cmove()' to handle
> constant loads that do not fit into signed 12 bits in the
> patterns added above.
> ---
>  gcc/config/xtensa/predicates.md |  3 ++
>  gcc/config/xtensa/xtensa.cc |  3 +-
>  gcc/config/xtensa/xtensa.md | 83 +
>  3 files changed, 88 insertions(+), 1 deletion(-)

Regtested for target=xtensa-linux-uclibc, no new regressions.
Committed to master.

-- 
Thanks.
-- Max


RISC-V: Use extension instructions instead of bitwise "and"

2023-05-23 Thread Jivan Hakobyan via Gcc-patches
In the case where the target supports extension instructions,
it is preferable to use them instead of doing the same thing in other ways.
For the following case

void foo (unsigned long a, unsigned long* ptr) {
ptr[0] = a & 0xffffffffUL;
ptr[1] &= 0xffffffffUL;
}

GCC generates
foo:
li  a5,-1
srlia5,a5,32
and a0,a0,a5
sd  a0,0(a1)
ld  a4,8(a1)
and a5,a4,a5
sd  a5,8(a1)
ret

but it will be profitable to generate this one

foo:
  zext.w a0,a0
  sd a0,0(a1)
  lwu a5,8(a1)
  sd a5,8(a1)
  ret

This patch fixes the mentioned issue.
It supports HI -> DI, HI -> SI and SI -> DI extensions.

gcc/ChangeLog:
* config/riscv/riscv.md (and3): New expander.
(*and3) New pattern.
* config/riscv/predicates.md (arith_operand_or_mode_mask): New
predicate.

gcc/testsuite/ChangeLog:
* gcc.target/riscv/and-extend-1.c: New test.
* gcc.target/riscv/and-extend-2.c: New test.

-- 
With the best regards
Jivan Hakobyan
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index ffcbb9a7589..70f570153ae 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -27,6 +27,12 @@
   (ior (match_operand 0 "const_arith_operand")
(match_operand 0 "register_operand")))
 
+(define_predicate "arith_operand_or_mode_mask"
+  (ior (match_operand 0 "arith_operand")
+   (and (match_code "const_int")
+(match_test "INTVAL (op) == GET_MODE_MASK (HImode)
+ || INTVAL (op) == GET_MODE_MASK (SImode)"
+
 (define_predicate "lui_operand"
   (and (match_code "const_int")
(match_test "LUI_OPERAND (INTVAL (op))")))
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 124d8c95804..6492812 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -1342,9 +1342,46 @@
 ;; For RV64, we don't expose the SImode operations to the rtl expanders,
 ;; but SImode versions exist for combine.
 
+(define_expand "and3"
+  [(set (match_operand:X0 "register_operand")
+(and:X (match_operand:X 1 "register_operand")
+   (match_operand:X 2 "arith_operand_or_mode_mask")))]
+  ""
+{
+  if (CONST_INT_P (operands[2]))
+  {
+enum machine_mode tmode = VOIDmode;
+if (INTVAL (operands[2]) == GET_MODE_MASK (HImode))
+  tmode = HImode;
+else if (INTVAL (operands[2]) == GET_MODE_MASK (SImode))
+  tmode = SImode;
+
+if (tmode != VOIDmode)
+{
+  rtx tmp = gen_lowpart (tmode, operands[1]);
+  emit_insn (gen_extend_insn (operands[0], tmp, mode, tmode, 1));
+  DONE;
+}
+  }
+  else
+  {
+emit_move_insn (operands[0], gen_rtx_AND (mode, operands[1], operands[2]));
+DONE;
+  }
+})
+
+(define_insn "*and3"
+  [(set (match_operand:X0 "register_operand" "=r,r")
+	(and:X (match_operand:X 1 "register_operand" "%r,r")
+		   (match_operand:X 2 "arith_operand"" r,I")))]
+  ""
+  "and%i2\t%0,%1,%2"
+  [(set_attr "type" "logical")
+   (set_attr "mode" "")])
+
 (define_insn "3"
   [(set (match_operand:X0 "register_operand" "=r,r")
-	(any_bitwise:X (match_operand:X 1 "register_operand" "%r,r")
+	(any_or:X (match_operand:X 1 "register_operand" "%r,r")
 		   (match_operand:X 2 "arith_operand"" r,I")))]
   ""
   "%i2\t%0,%1,%2"
diff --git a/gcc/testsuite/gcc.target/riscv/and-extend-1.c b/gcc/testsuite/gcc.target/riscv/and-extend-1.c
new file mode 100644
index 000..a270d287374
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/and-extend-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zba_zbb -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+void
+foo(unsigned long a, unsigned long* ptr)
+{
+ptr[0] = a & 0xffffffffUL;
+ptr[1] &= 0xffffffffUL;
+}
+
+void
+foo2(unsigned long a, unsigned long* ptr)
+{
+ptr[0] = a & 0xffff;
+ptr[1] &= 0xffff;
+}
+
+void
+foo3(unsigned int a, unsigned int* ptr)
+{
+ptr[0] = a & 0xffff;
+ptr[1] &= 0xffff;
+}
+
+/* { dg-final { scan-assembler-times "zext.w" 1 } } */
+/* { dg-final { scan-assembler-times "zext.h" 2 } } */
+/* { dg-final { scan-assembler-times "lwu" 1 } } */
+/* { dg-final { scan-assembler-times "lhu" 2 } } */
+/* { dg-final { scan-assembler-not "and\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/and-extend-2.c b/gcc/testsuite/gcc.target/riscv/and-extend-2.c
new file mode 100644
index 000..fe639cd1e82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/and-extend-2.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gc_zba_zbb -mabi=ilp32" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+void
+foo(unsigned long a, unsigned long* ptr)
+{
+ptr[0] = a & 0xUL;
+ptr[1] &= 0xUL;
+}
+
+void
+foo2(unsigned long a, unsigned long* ptr)
+{
+ptr[0] = a & 0x;
+ptr[1] &= 0x;
+}
+
+void
+foo3(unsigned int a, unsigned 

[PATCH] libstdc++: use using instead of typedef for type_traits

2023-05-23 Thread Ken Matsui via Gcc-patches
Since the type_traits header is a C++11 header file, alias-declarations (using)
can be used instead of typedef.  This patch improves readability, especially
for long type names.
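
As a small illustration (not taken from the header), the alias-declaration
keeps the introduced name on the left, which is easier to scan when the
aliased type is long:

#include <type_traits>

typedef std::integral_constant<int, 42> answer_t;     // C++03 spelling
using answer_type = std::integral_constant<int, 42>;  // C++11 spelling, name first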

libstdc++-v3/ChangeLog:

* include/std/type_traits: Use using instead of typedef.
---
 libstdc++-v3/include/std/type_traits | 158 +--
 1 file changed, 79 insertions(+), 79 deletions(-)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index bc6982f9e64..0e7a9c9c7f3 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -61,9 +61,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 struct integral_constant
 {
-  static constexpr _Tp  value = __v;
-  typedef _Tp   value_type;
-  typedef integral_constant<_Tp, __v>   type;
+  static constexpr _Tp value = __v;
+  using value_type = _Tp;
+  using type = integral_constant<_Tp, __v>;
   constexpr operator value_type() const noexcept { return value; }
 #if __cplusplus > 201103L
 
@@ -109,7 +109,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // Partial specialization for true.
   template
 struct enable_if
-{ typedef _Tp type; };
+{ using type = _Tp; };
 
   // __enable_if_t (std::enable_if_t for C++11)
   template
@@ -946,7 +946,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __is_destructible_impl
 : public __do_is_destructible_impl
 {
-  typedef decltype(__test<_Tp>(0)) type;
+  using type = decltype(__test<_Tp>(0));
 };
 
   template(0)) type;
+  using type = decltype(__test<_Tp>(0));
 };
 
   template())) type;
+  using type = decltype(__test(declval<_Tp>()));
 };
 
   template
@@ -1422,7 +1422,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 is_array<_To>>::value>
 struct __is_convertible_helper
 {
-  typedef typename is_void<_To>::type type;
+  using type = typename is_void<_To>::type;
 };
 
 #pragma GCC diagnostic push
@@ -1443,7 +1443,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__test(...);
 
 public:
-  typedef decltype(__test<_From, _To>(0)) type;
+  using type = decltype(__test<_From, _To>(0));
 };
 #pragma GCC diagnostic pop
 
@@ -1521,20 +1521,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   /// remove_const
   template
 struct remove_const
-{ typedef _Tp type; };
+{ using type = _Tp; };
 
   template
 struct remove_const<_Tp const>
-{ typedef _Tp type; };
+{ using type = _Tp; };
 
   /// remove_volatile
   template
 struct remove_volatile
-{ typedef _Tp type; };
+{ using type = _Tp; };
 
   template
 struct remove_volatile<_Tp volatile>
-{ typedef _Tp type; };
+{ using type = _Tp; };
 
   /// remove_cv
 #if __has_builtin(__remove_cv)
@@ -1658,83 +1658,83 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct __cv_selector<_Unqualified, false, false>
-{ typedef _Unqualified __type; };
+{ using __type = _Unqualified; };
 
   template
 struct __cv_selector<_Unqualified, false, true>
-{ typedef volatile _Unqualified __type; };
+{ using __type = volatile _Unqualified; };
 
   template
 struct __cv_selector<_Unqualified, true, false>
-{ typedef const _Unqualified __type; };
+{ using __type = const _Unqualified; };
 
   template
 struct __cv_selector<_Unqualified, true, true>
-{ typedef const volatile _Unqualified __type; };
+{ using __type = const volatile _Unqualified; };
 
   template::value,
   bool _IsVol = is_volatile<_Qualified>::value>
 class __match_cv_qualifiers
 {
-  typedef __cv_selector<_Unqualified, _IsConst, _IsVol> __match;
+  using __match = __cv_selector<_Unqualified, _IsConst, _IsVol>;
 
 public:
-  typedef typename __match::__type __type;
+  using __type = typename __match::__type;
 };
 
   // Utility for finding the unsigned versions of signed integral types.
   template
 struct __make_unsigned
-{ typedef _Tp __type; };
+{ using __type = _Tp; };
 
   template<>
 struct __make_unsigned
-{ typedef unsigned char __type; };
+{ using __type = unsigned char; };
 
   template<>
 struct __make_unsigned
-{ typedef unsigned char __type; };
+{ using __type = unsigned char; };
 
   template<>
 struct __make_unsigned
-{ typedef unsigned short __type; };
+{ using __type = unsigned short; };
 
   template<>
 struct __make_unsigned
-{ typedef unsigned int __type; };
+{ using __type = unsigned int; };
 
   template<>
 struct __make_unsigned
-{ typedef unsigned long __type; };
+{ using __type = unsigned long; };
 
   template<>
 struct __make_unsigned
-{ typedef unsigned long long __type; };
+{ using __type = unsigned long long; };
 
 #if defined(__GLIBCXX_TYPE_INT_N_0)
   __extension__
   template<>
 struct __make_unsigned<__GLIBCXX_TYPE_INT_N_0>
-{ typedef unsigned 

[PATCH] PR middle-end/109840: Preserve popcount/parity type in match.pd.

2023-05-23 Thread Roger Sayle

PR middle-end/109840 is a regression introduced by my recent patch to
fold popcount(bswap(x)) as popcount(x).  When the bswap and the popcount
have the same precision, everything works fine, but this optimization also
allowed a zero-extension between the two.  The oversight is that we need
to be strict with type conversions, both to avoid accidentally changing
the argument type to popcount, and also to reflect the effects of
argument/return-value promotion in the call to bswap, so this zero extension
needs to be preserved/explicit in the optimized form.
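
A sketch of the affected shape (the types are the point, not the values):

int popcount_of_swapped (unsigned short x)
{
  unsigned short t1 = __builtin_bswap16 (x); /* 16-bit result */
  unsigned int t2 = t1;                      /* zero extension must stay explicit */
  return __builtin_popcount (t2);            /* folds to popcount of (unsigned int) x */
}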

Interestingly, match.pd should (in theory) be able to narrow calls to
popcount and parity, removing a zero-extension from its argument, but
that is an independent optimization that needs to check IFN_ support.
Many thanks to Andrew Pinski for his help/fixes with these transformations.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-05-23  Roger Sayle  

gcc/ChangeLog
PR middle-end/109840
* match.pd : Preserve zero-extension when
optimizing popcount((T)bswap(x)) and popcount((T)rotate(x,y)) as
popcount((T)x), so the popcount's argument keeps the same type.
:  Likewise preserve extensions when
simplifying parity((T)bswap(x)) and parity((T)rotate(x,y)) as
parity((T)x), so that the parity's argument type is the same.

gcc/testsuite/ChangeLog
PR middle-end/109840
* gcc.dg/fold-parity-8.c: New test.
* gcc.dg/fold-popcount-11.c: Likewise.


Thanks in advance, and apologies for any inconvenience. 
Roger
--

diff --git a/gcc/match.pd b/gcc/match.pd
index 1fe0559..6e32f47 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7865,10 +7865,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (popcount (convert?@0 (bswap:s@1 @2)))
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
   && INTEGRAL_TYPE_P (TREE_TYPE (@1)))
-   (with { unsigned int prec0 = TYPE_PRECISION (TREE_TYPE (@0));
-   unsigned int prec1 = TYPE_PRECISION (TREE_TYPE (@1)); }
- (if (prec0 == prec1 || (prec0 > prec1 && TYPE_UNSIGNED (TREE_TYPE 
(@1
-   (popcount @2)))
+   (with { tree type0 = TREE_TYPE (@0);
+   tree type1 = TREE_TYPE (@1);
+   unsigned int prec0 = TYPE_PRECISION (type0);
+   unsigned int prec1 = TYPE_PRECISION (type1); }
+ (if (prec0 == prec1 || (prec0 > prec1 && TYPE_UNSIGNED (type1)))
+   (popcount (convert:type0 (convert:type1 @2)
 
 /* popcount(rotate(X Y)) is popcount(X).  */
 (for popcount (POPCOUNT)
@@ -7878,10 +7880,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
   && INTEGRAL_TYPE_P (TREE_TYPE (@1))  
   && (GIMPLE || !TREE_SIDE_EFFECTS (@3)))
-   (with { unsigned int prec0 = TYPE_PRECISION (TREE_TYPE (@0));
-   unsigned int prec1 = TYPE_PRECISION (TREE_TYPE (@1)); }
- (if (prec0 == prec1 || (prec0 > prec1 && TYPE_UNSIGNED (TREE_TYPE 
(@1
-   (popcount @2)))
+   (with { tree type0 = TREE_TYPE (@0);
+   tree type1 = TREE_TYPE (@1);
+   unsigned int prec0 = TYPE_PRECISION (type0);
+   unsigned int prec1 = TYPE_PRECISION (type1); }
+ (if (prec0 == prec1 || (prec0 > prec1 && TYPE_UNSIGNED (type1)))
+   (popcount (convert:type0 @2
 
 /* Canonicalize POPCOUNT(x)&1 as PARITY(X).  */
 (simplify
@@ -7923,7 +7927,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && INTEGRAL_TYPE_P (TREE_TYPE (@1))
   && TYPE_PRECISION (TREE_TYPE (@0))
  >= TYPE_PRECISION (TREE_TYPE (@1)))
-   (parity @2)
+   (with { tree type0 = TREE_TYPE (@0);
+   tree type1 = TREE_TYPE (@1); }
+ (parity (convert:type0 (convert:type1 @2
 
 /* parity(rotate(X Y)) is parity(X).  */
 (for parity (PARITY)
@@ -7935,7 +7941,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && (GIMPLE || !TREE_SIDE_EFFECTS (@3))
   && TYPE_PRECISION (TREE_TYPE (@0))
  >= TYPE_PRECISION (TREE_TYPE (@1)))
-   (parity @2)
+   (with { tree type0 = TREE_TYPE (@0); }
+ (parity (convert:type0 @2)))
 
 /* parity(X)^parity(Y) is parity(X^Y).  */
 (simplify
diff --git a/gcc/testsuite/gcc.dg/fold-parity-8.c 
b/gcc/testsuite/gcc.dg/fold-parity-8.c
new file mode 100644
index 000..48e1f7f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fold-parity-8.c
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int foo(unsigned short x)
+{
+  unsigned short t1 = __builtin_bswap16(x);
+  unsigned int t2 = t1;
+  return __builtin_parity (t2);
+}
+
+int fool(unsigned short x)
+{
+  unsigned short t1 = __builtin_bswap16(x);
+  unsigned long t2 = t1;
+  return __builtin_parityl (t2);
+}
+
+int fooll(unsigned short x)
+{
+  unsigned short t1 = 

Re: [patch] mcore: Fix sprintf length warning

2023-05-23 Thread Jeff Law via Gcc-patches




On 5/22/23 17:58, Jan-Benedict Glaw wrote:

Hi!

One of the supplied argument strings is unnecessarily long (c-sky, using
basically the same code, fixed it to a shorter length) and this fixes overflow
warnings, as GCC fails to deduce that the full 256 bytes for load_op[] are
not used at all.
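
A minimal sketch of the pattern that trips the warning (names shortened, not
the actual mcore.cc code): both buffers are 256 bytes, so GCC assumes the %s
argument may already occupy up to 255 of them and flags the fixed text plus
the formatted numbers as a possible overflow.

#include <stdio.h>

static char buf[256];
static char load_op[256];

void emit (long value)
{
  /* -Wformat-overflow: the output may not fit into the 256-byte buf.  */
  sprintf (buf, "%s\n\tixw\t%%0,%%1\t// %ld 0x%lx",
           load_op, value, (unsigned long) value);
}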


make[1]: Entering directory 
'/var/lib/laminar/run/gcc-mcore-elf/38/toolchain-build/gcc'
[...]
/usr/lib/gcc-snapshot/bin/g++  -fno-PIE -c   -g -O2   -DIN_GCC  
-DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings 
-Wcast-qual -Wmissing-format-attribute -Wconditionally-supported 
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common  -DHAVE_CONFIG_H -I. -I. 
-I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include  
-I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libcody  
-I../../gcc/gcc/../libdecnumber -I../../gcc/gcc/../libdecnumber/dpd 
-I../libdecnumber -I../../gcc/gcc/../libbacktrace   -o mcore.o -MT mcore.o -MMD 
-MP -MF ./.deps/mcore.TPo ../../gcc/gcc/config/mcore/mcore.cc
../../gcc/gcc/config/mcore/mcore.cc: In function 'const char* 
output_inline_const(machine_mode, rtx_def**)':
../../gcc/gcc/config/mcore/mcore.cc:1264:24: error: '
ixw ' directive writing 6 bytes into a region of size between 1 and 
256 [-Werror=format-overflow=]
  1264 |   sprintf (buf, "%s\n\tixw\t%s,%s\t// %ld 0x%lx", load_op, 
dst_fmt, dst_fmt, value, value);
   |^
../../gcc/gcc/config/mcore/mcore.cc:1264:21: note: using the range [0, 
18446744073709551615] for directive argument
  1264 |   sprintf (buf, "%s\n\tixw\t%s,%s\t// %ld 0x%lx", load_op, 
dst_fmt, dst_fmt, value, value);
   | ^~~~
../../gcc/gcc/config/mcore/mcore.cc:1264:15: note: 'sprintf' output between 21 
and 310 bytes into a destination of size 256
  1264 |   sprintf (buf, "%s\n\tixw\t%s,%s\t// %ld 0x%lx", load_op, 
dst_fmt, dst_fmt, value, value);
   |   
^~~~
../../gcc/gcc/config/mcore/mcore.cc:1261:24: error: '
ixh ' directive writing 6 bytes into a region of size between 1 and 
256 [-Werror=format-overflow=]
  1261 |   sprintf (buf, "%s\n\tixh\t%s,%s\t// %ld 0x%lx", load_op, 
dst_fmt, dst_fmt, value, value);
   |^
../../gcc/gcc/config/mcore/mcore.cc:1261:21: note: using the range [0, 
18446744073709551615] for directive argument
  1261 |   sprintf (buf, "%s\n\tixh\t%s,%s\t// %ld 0x%lx", load_op, 
dst_fmt, dst_fmt, value, value);
   | ^~~~
../../gcc/gcc/config/mcore/mcore.cc:1261:15: note: 'sprintf' output between 21 
and 310 bytes into a destination of size 256
  1261 |   sprintf (buf, "%s\n\tixh\t%s,%s\t// %ld 0x%lx", load_op, 
dst_fmt, dst_fmt, value, value);
   |   
^~~~
../../gcc/gcc/config/mcore/mcore.cc:1258:24: error: '
lsli' directive writing 7 bytes into a region of size between 1 and 
256 [-Werror=format-overflow=]
  1258 |   sprintf (buf, "%s\n\tlsli\t%s,%%2\t// %ld 0x%lx", load_op, 
dst_fmt, value, value);
   |^~
../../gcc/gcc/config/mcore/mcore.cc:1258:21: note: using the range [0, 
18446744073709551615] for directive argument
  1258 |   sprintf (buf, "%s\n\tlsli\t%s,%%2\t// %ld 0x%lx", load_op, 
dst_fmt, value, value);
   | ^~
../../gcc/gcc/config/mcore/mcore.cc:1258:15: note: 'sprintf' output between 22 
and 311 bytes into a destination of size 256
  1258 |   sprintf (buf, "%s\n\tlsli\t%s,%%2\t// %ld 0x%lx", load_op, 
dst_fmt, value, value);
   |   
^
../../gcc/gcc/config/mcore/mcore.cc:1255:24: error: '
rotli   ' directive writing 8 bytes into a region of size between 1 and 
256 [-Werror=format-overflow=]
  1255 |   sprintf (buf, "%s\n\trotli\t%s,%%2\t// %ld 0x%lx", load_op, 
dst_fmt, value, value);
   |^~~
../../gcc/gcc/config/mcore/mcore.cc:1255:21: note: using the range [0, 
18446744073709551615] for directive argument
  1255 |   sprintf (buf, "%s\n\trotli\t%s,%%2\t// %ld 0x%lx", load_op, 
dst_fmt, value, value);
   | ^~~
../../gcc/gcc/config/mcore/mcore.cc:1255:15: note: 'sprintf' output between 23 
and 312 bytes into a destination of size 256
  1255 |   sprintf (buf, "%s\n\trotli\t%s,%%2\t// %ld 0x%lx", load_op, 
dst_fmt, value, value);
   |   
^~
../../gcc/gcc/config/mcore/mcore.cc:1252:24: error: '

[avr,committed] Fix cost computation for bit insertions.

2023-05-23 Thread Georg-Johann Lay

Applied this patchlet that implements proper cost computation of

(set (zero_extract (...) ...))

kind patterns that do single-bit (inverted) bit insertions.


Johann

--

Improve cost computation for single-bit bit insertions.

Some miscomputation of rtx_costs led to sub-optimal code for
single-bit bit insertions.  This patch implements TARGET_INSN_COST,
which has a chance to see the whole insn during insn combination;
in particular the SET_DEST of (set (zero_extract (...) ...)).
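
For reference, the kind of single-bit insertion this is about at the source
level (illustration only; whether combine actually forms the zero_extract
depends on the surrounding code):

unsigned char copy_bit3 (unsigned char dst, unsigned char src)
{
  /* Insert bit 3 of src into bit 3 of dst; AVR can do this with BST/BLD.  */
  return (dst & (unsigned char) ~(1u << 3)) | (src & (1u << 3));
}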

gcc/
* config/avr/avr.cc (avr_insn_cost): New static function.
(TARGET_INSN_COST): Define to that function.

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 9fa50ca230d..4fa6f5309b2 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -11514,6 +11514,52 @@ avr_rtx_costs (rtx x, machine_mode mode, int 
outer_code,

 }


+/* Implement `TARGET_INSN_COST'.  */
+/* For some insns, it is not enough to look at the cost of the SET_SRC.
+   In that case, have a look at the entire insn, e.g. during insn 
combine.  */

+
+static int
+avr_insn_cost (rtx_insn *insn, bool speed)
+{
+  const int unknown_cost = -1;
+  int cost = unknown_cost;
+
+  rtx set = single_set (insn);
+
+  if (set
+  && ZERO_EXTRACT == GET_CODE (SET_DEST (set)))
+{
+  // Try find anything that would flip the extracted bit.
+  bool not_bit_p = false;
+
+  subrtx_iterator::array_type array;
+  FOR_EACH_SUBRTX (iter, array, SET_SRC (set), NONCONST)
+   {
+ enum rtx_code code = GET_CODE (*iter);
+ not_bit_p |= code == NOT || code == XOR || code == GE;
+   }
+
+  // Don't go too deep into the analysis.  In almost all cases,
+  // using BLD/BST is the best we can do for single-bit moves,
+  // even considering CSE.
+  cost = COSTS_N_INSNS (2 + not_bit_p);
+}
+
+  if (cost != unknown_cost)
+{
+  if (avr_log.rtx_costs)
+   avr_edump ("\n%? (%s) insn_cost=%d\n%r\n",
+  speed ? "speed" : "size", cost, insn);
+  return cost;
+}
+
+  // Resort to what rtlanal.cc::insn_cost() implements as a default
+  // when targetm.insn_cost() is not implemented.
+
+  return pattern_cost (PATTERN (insn), speed);
+}
+
+
 /* Implement `TARGET_ADDRESS_COST'.  */

 static int
@@ -14574,6 +14620,8 @@ avr_float_lib_compare_returns_bool (machine_mode 
mode, enum rtx_code)

 #undef  TARGET_ASM_FINAL_POSTSCAN_INSN
 #define TARGET_ASM_FINAL_POSTSCAN_INSN avr_asm_final_postscan_insn

+#undef  TARGET_INSN_COST
+#define TARGET_INSN_COST avr_insn_cost
 #undef  TARGET_REGISTER_MOVE_COST
 #define TARGET_REGISTER_MOVE_COST avr_register_move_cost
 #undef  TARGET_MEMORY_MOVE_COST


Re: [PATCH] tree-optimization/109747 - SLP cost of CTORs

2023-05-23 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> The x86 backend looks at the SLP node passed to the add_stmt_cost
> hook when costing vec_construct, looking for elements that require
> a move from a GPR to a vector register and cost that.  But since
> vect_prologue_cost_for_slp decomposes the cost for an external
> SLP node into individual pieces this cost gets applied N times
> without a chance for the backend to know it's just dealing with
> a part of the SLP node.  Just looking at a part is also not perfect
> since the GPR to XMM move cost applies only once per distinct
> element so handling the whole SLP node one more correctly reflects
> cost (albeit without considering other external SLP nodes).
>
> The following addresses the issue by passing down the SLP node
> only for one piece and nullptr for the rest.  The x86 backend
> is currently the only one looking at it.
>
> In the future the cost of external elements is something to deal
> with globally but that would require the full SLP tree be available
> to costing.
>
> It's difficult to write a testcase, at the tipping point not
> vectorizing is better so I'll followup with x86 specific adjustments
> and will see to add a testcase later.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> Richard, we talked about this issue two weeks ago and I was looking
> for a solution that would be OK for backporting if the need arises.
> The following is what I could come up with that retains the whole
> SLP-node wide "CSE" of the element move cost.  Is that OK until
> we come up with a better plan for trunk at some point?

Yeah, seems like a neat workaround to me FWIW.

Thanks,
Richard

>
> Thanks,
> Richard.
>
>   PR tree-optimization/109747
>   * tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down
>   the SLP node only once to the cost hook.
> ---
>  gcc/tree-vect-slp.cc | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index e5c9d7e766e..a6f277c5e21 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6069,6 +6069,7 @@ vect_prologue_cost_for_slp (slp_tree node,
>  }
>/* ???  We're just tracking whether vectors in a single node are the same.
>   Ideally we'd do something more global.  */
> +  bool passed = false;
>for (unsigned int start : starts)
>  {
>vect_cost_for_stmt kind;
> @@ -6078,7 +6079,15 @@ vect_prologue_cost_for_slp (slp_tree node,
>   kind = scalar_to_vec;
>else
>   kind = vec_construct;
> -  record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
> +  /* The target cost hook has no idea which part of the SLP node
> +  we are costing so avoid passing it down more than once.  Pass
> +  it to the first vec_construct or scalar_to_vec part since for those
> +  the x86 backend tries to account for GPR to XMM register moves.  */
> +  record_stmt_cost (cost_vec, 1, kind,
> + (kind != vector_load && !passed) ? node : nullptr,
> + vectype, 0, vect_prologue);
> +  if (kind != vector_load)
> + passed = true;
>  }
>  }


[COMMITTED] i386: Add V8QI and V4QImode partial vector shift operations

2023-05-23 Thread Uros Bizjak via Gcc-patches
Add V8QImode and V4QImode vector shift patterns that call into
ix86_expand_vecop_qihi_partial.  Generate special sequences
for constant count operands.

The patch regresses g++.dg/pr91838.C - as explained in PR91838, the
test returns different results, depending on whether the V8QImode shift
pattern is present in the target *.md files.  The tree optimizers produce:

V f (V x)
{
  V _2;

   [local count: 1073741824]:
  _2 = x_1(D) >> 8;
  return _2;

}

and without the named expander:

V f (V x)
{
   [local count: 1073741824]:
  return { 0, 0, 0, 0, 0, 0, 0, 0 };

}

The RTL part just expands from there.
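
For context, the relevant shape of g++.dg/pr91838.C (reconstructed sketch,
not copied verbatim from the testsuite):

typedef unsigned char V __attribute__ ((vector_size (8)));

V f (V x)
{
  return x >> 8;  /* shift amount equals the element width */
}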

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
Call ix86_expand_vec_shift_qihi_constant for shifts
with constant count operand.
* config/i386/i386.cc (ix86_shift_rotate_cost):
Handle V4QImode and V8QImode.
* config/i386/mmx.md (v8qi3): New insn pattern.
(v4qi3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/vect-shiftv4qi.c: New test.
* gcc.target/i386/vect-shiftv8qi.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 50d9d34ebcb..ff3d382f1b4 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23294,6 +23294,16 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, 
rtx dest, rtx op1, rtx op2)
   else
 qop2 = op2;
 
+  qdest = gen_reg_rtx (V16QImode);
+
+  if (CONST_INT_P (op2)
+  && (code == ASHIFT || code == LSHIFTRT || code == ASHIFTRT)
+  && ix86_expand_vec_shift_qihi_constant (code, qdest, qop1, qop2))
+{
+  emit_move_insn (dest, gen_lowpart (qimode, qdest));
+  return;
+}
+
   switch (code)
 {
 case MULT:
@@ -23358,8 +23368,6 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx 
dest, rtx op1, rtx op2)
   bool ok;
   int i;
 
-  qdest = gen_reg_rtx (V16QImode);
-
   /* Merge the data back into the right place.  */
   d.target = qdest;
   d.op0 = qres;
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38125ce284a..2710c6dfc56 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20580,6 +20580,37 @@ ix86_shift_rotate_cost (const struct processor_costs 
*cost,
 
   switch (mode)
{
+   case V4QImode:
+   case V8QImode:
+ if (TARGET_AVX2)
+   /* Use vpbroadcast.  */
+   extra = cost->sse_op;
+ else
+   extra = cost->sse_load[2];
+
+ if (constant_op1)
+   {
+ if (code == ASHIFTRT)
+   {
+ count = 4;
+ extra *= 2;
+   }
+ else
+   count = 2;
+   }
+ else if (TARGET_AVX512BW && TARGET_AVX512VL)
+   {
+ count = 3;
+ return ix86_vec_cost (mode, cost->sse_op * count);
+   }
+ else if (TARGET_SSE4_1)
+   count = 4;
+ else if (code == ASHIFTRT)
+   count = 5;
+ else
+   count = 4;
+ return ix86_vec_cost (mode, cost->sse_op * count) + extra;
+
case V16QImode:
  if (TARGET_XOP)
{
@@ -20600,7 +20631,12 @@ ix86_shift_rotate_cost (const struct processor_costs 
*cost,
}
  /* FALLTHRU */
case V32QImode:
- extra = (mode == V16QImode) ? cost->sse_load[2] : cost->sse_load[3];
+ if (TARGET_AVX2)
+   /* Use vpbroadcast.  */
+   extra = cost->sse_op;
+ else
+   extra = (mode == V16QImode) ? cost->sse_load[2] : cost->sse_load[3];
+
  if (constant_op1)
{
  if (code == ASHIFTRT)
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 45773673049..a37811f 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2680,6 +2680,28 @@
(const_string "0")))
(set_attr "mode" "TI")])
 
+(define_expand "v8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+   (any_shift:V8QI (match_operand:V8QI 1 "register_operand")
+   (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  ix86_expand_vecop_qihi_partial (, operands[0],
+ operands[1], operands[2]);
+  DONE;
+})
+
+(define_expand "v4qi3"
+  [(set (match_operand:V4QI 0 "register_operand")
+   (any_shift:V4QI (match_operand:V4QI 1 "register_operand")
+   (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_SSE2"
+{
+  ix86_expand_vecop_qihi_partial (, operands[0],
+ operands[1], operands[2]);
+  DONE;
+})
+
 (define_insn_and_split "v2qi3"
   [(set (match_operand:V2QI 0 "register_operand" "=Q")
 (any_shift:V2QI
diff --git a/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c 
b/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c
new file mode 100644
index 000..c06dfb87bd1
--- /dev/null
+++ 

Re: [PATCH] Account for vector splat GPR->XMM move cost

2023-05-23 Thread Uros Bizjak via Gcc-patches
On Tue, May 23, 2023 at 5:18 PM Richard Biener  wrote:
>
> The following also accounts for a GPR->XMM move cost for splat
> operations and properly guards eliding the cost when moving from
> memory only for SSE4.1 or HImode or larger operands.  This
> doesn't fix the PR fully yet.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>
> Thanks,
> Richard.
>
> PR target/109944
> * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
> For vector construction or splats apply GPR->XMM move
> costing.  QImode memory can be handled directly only
> with SSE4.1 pinsrb.

OK.

Thanks,
Uros.

> ---
>  gcc/config/i386/i386.cc | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 38125ce284a..011a1fb0d6d 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -23654,7 +23654,7 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
>stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
>  }
> -  else if (kind == vec_construct
> +  else if ((kind == vec_construct || kind == scalar_to_vec)
>&& node
>&& SLP_TREE_DEF_TYPE (node) == vect_external_def
>&& INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
> @@ -23687,7 +23687,9 @@ ix86_vector_costs::add_stmt_cost (int count, 
> vect_cost_for_stmt kind,
>  Likewise with a BIT_FIELD_REF extracting from a vector
>  register we can hope to avoid using a GPR.  */
>   if (!is_gimple_assign (def)
> - || (!gimple_assign_load_p (def)
> + || ((!gimple_assign_load_p (def)
> +  || (!TARGET_SSE4_1
> +  && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) == 1))
>   && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
>   || !VECTOR_TYPE_P (TREE_TYPE
> (TREE_OPERAND (gimple_assign_rhs1 (def), 
> 0))
> --
> 2.35.3


[PATCH] Account for vector splat GPR->XMM move cost

2023-05-23 Thread Richard Biener via Gcc-patches
The following also accounts for a GPR->XMM move cost for splat
operations and properly guards eliding the cost when moving from
memory only for SSE4.1 or HImode or larger operands.  This
doesn't fix the PR fully yet.
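
An example of the splat case now being costed (illustration, not a testcase
from the patch): the scalar sits in a GPR and needs a GPR->XMM move before it
can be broadcast.

typedef int v4si __attribute__ ((vector_size (16)));

v4si splat (int x)
{
  v4si v = { x, x, x, x };  /* scalar_to_vec: move to XMM, then broadcast */
  return v;
}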

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

PR target/109944
* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
For vector construction or splats apply GPR->XMM move
costing.  QImode memory can be handled directly only
with SSE4.1 pinsrb.
---
 gcc/config/i386/i386.cc | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38125ce284a..011a1fb0d6d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23654,7 +23654,7 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
   stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
   stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
 }
-  else if (kind == vec_construct
+  else if ((kind == vec_construct || kind == scalar_to_vec)
   && node
   && SLP_TREE_DEF_TYPE (node) == vect_external_def
   && INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
@@ -23687,7 +23687,9 @@ ix86_vector_costs::add_stmt_cost (int count, 
vect_cost_for_stmt kind,
 Likewise with a BIT_FIELD_REF extracting from a vector
 register we can hope to avoid using a GPR.  */
  if (!is_gimple_assign (def)
- || (!gimple_assign_load_p (def)
+ || ((!gimple_assign_load_p (def)
+  || (!TARGET_SSE4_1
+  && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) == 1))
  && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
  || !VECTOR_TYPE_P (TREE_TYPE
(TREE_OPERAND (gimple_assign_rhs1 (def), 0))
-- 
2.35.3


[PATCH] tree-optimization/109747 - SLP cost of CTORs

2023-05-23 Thread Richard Biener via Gcc-patches
The x86 backend looks at the SLP node passed to the add_stmt_cost
hook when costing vec_construct, looking for elements that require
a move from a GPR to a vector register and cost that.  But since
vect_prologue_cost_for_slp decomposes the cost for an external
SLP node into individual pieces this cost gets applied N times
without a chance for the backend to know it's just dealing with
a part of the SLP node.  Just looking at a part is also not perfect
since the GPR to XMM move cost applies only once per distinct
element so handling the whole SLP node one more correctly reflects
cost (albeit without considering other external SLP nodes).
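
The kind of external SLP node this concerns (illustrative sketch): a
constructor built from scalar integers living in GPRs, where the GPR to XMM
move per distinct element should be charged once for the whole node.

typedef int v4si __attribute__ ((vector_size (16)));

v4si build (int a, int b)
{
  v4si v = { a, b, a, b };  /* two distinct elements -> two GPR->XMM moves */
  return v;
}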

The following addresses the issue by passing down the SLP node
only for one piece and nullptr for the rest.  The x86 backend
is currently the only one looking at it.

In the future the cost of external elements is something to deal
with globally but that would require the full SLP tree be available
to costing.

It's difficult to write a testcase, at the tipping point not
vectorizing is better so I'll followup with x86 specific adjustments
and will see to add a testcase later.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard, we talked about this issue two weeks ago and I was looking
for a solution that would be OK for backporting if the need arises.
The following is what I could come up with that retains the whole
SLP-node wide "CSE" of the element move cost.  Is that OK until
we come up with a better plan for trunk at some point?

Thanks,
Richard.

PR tree-optimization/109747
* tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down
the SLP node only once to the cost hook.
---
 gcc/tree-vect-slp.cc | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index e5c9d7e766e..a6f277c5e21 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -6069,6 +6069,7 @@ vect_prologue_cost_for_slp (slp_tree node,
 }
   /* ???  We're just tracking whether vectors in a single node are the same.
  Ideally we'd do something more global.  */
+  bool passed = false;
   for (unsigned int start : starts)
 {
   vect_cost_for_stmt kind;
@@ -6078,7 +6079,15 @@ vect_prologue_cost_for_slp (slp_tree node,
kind = scalar_to_vec;
   else
kind = vec_construct;
-  record_stmt_cost (cost_vec, 1, kind, node, vectype, 0, vect_prologue);
+  /* The target cost hook has no idea which part of the SLP node
+we are costing so avoid passing it down more than once.  Pass
+it to the first vec_construct or scalar_to_vec part since for those
+the x86 backend tries to account for GPR to XMM register moves.  */
+  record_stmt_cost (cost_vec, 1, kind,
+   (kind != vector_load && !passed) ? node : nullptr,
+   vectype, 0, vect_prologue);
+  if (kind != vector_load)
+   passed = true;
 }
 }
 
-- 
2.35.3


[PATCH] Generic vector op costing adjustment

2023-05-23 Thread Richard Biener via Gcc-patches
This is a small adjustment to the work done for PR108752 and
better reflects the cost of the generated sequence.
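
For orientation, a rough sketch of the word_mode lowering whose cost is being
modelled here (not the exact sequence GCC emits): an emulated per-byte add in
a 32-bit word needs two replicated mask constants plus a few scalar ops,
whereas bitwise AND/IOR/XOR map to a single scalar op.

typedef unsigned int word;

word add_v4qi (word a, word b)
{
  const word high = 0x80808080u;  /* per-lane sign bits */
  const word low  = 0x7f7f7f7fu;  /* remaining lane bits */
  return ((a & low) + (b & low)) ^ ((a ^ b) & high);
}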

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/108752
* tree-vect-stmts.cc (vectorizable_operation): For bit
operations with generic word_mode vectors do not cost
an extra stmt.  For plus, minus and negate also cost the
constant materialization.
---
 gcc/tree-vect-stmts.cc | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 0022b878767..127b987cd62 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6466,8 +6466,8 @@ vectorizable_operation (vec_info *vinfo,
{
  /* The above vect_model_simple_cost call handles constants
 in the prologue and (mis-)costs one of the stmts as
-vector stmt.  See tree-vect-generic.cc:do_plus_minus/do_negate
-for the actual lowering that will be applied.  */
+vector stmt.  See below for the actual lowering that will
+be applied.  */
  unsigned n
= slp_node ? SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node) : ncopies;
  switch (code)
@@ -6481,9 +6481,20 @@ vectorizable_operation (vec_info *vinfo,
case NEGATE_EXPR:
  n *= 4;
  break;
-   default:;
+   default:
+ /* Bit operations do not have extra cost and are accounted
+as vector stmt by vect_model_simple_cost.  */
+ n = 0;
+ break;
+   }
+ if (n != 0)
+   {
+ /* We also need to materialize two large constants.  */
+ record_stmt_cost (cost_vec, 2, scalar_stmt, stmt_info,
+   0, vect_prologue);
+ record_stmt_cost (cost_vec, n, scalar_stmt, stmt_info,
+   0, vect_body);
}
- record_stmt_cost (cost_vec, n, scalar_stmt, stmt_info, 0, vect_body);
}
   return true;
 }
-- 
2.35.3


Re: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread 钟居哲
Hi, Robin.

>> Don't you want to use your shiny new operand passing style here as
>> with the other expanders?
I do this just following the ARM code style.
You can see I pass rtx[] for expand_vcond and pass rtx, rtx, rtx for
expand_vec_cmp.
Well, I just follow the ARM SVE implementation (you can check aarch64-sve.md,
we are the same)  :)
If you don't like it, could you give me more information?  Then I will change it for you.

>> I don't think we need the same comment in each of these.  Same for
>> /*DEST_MODE*/ and /*MASK_MODE*/ which would be redundant if data_mode
>> were called dest_mode.
Ok

>> Swap lt and gt here for consistency's sake.
Ok.

I have fixed as you suggested.
Would you mind review V3 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619324.html 

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-23 22:12
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; Richard 
Sandiford
Subject: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization
> +(define_expand "vec_cmp"
> +  [(set (match_operand: 0 "register_operand")
> + (match_operator: 1 "comparison_operator"
> +   [(match_operand:VI 2 "register_operand")
> +(match_operand:VI 3 "register_operand")]))]
> +  "TARGET_VECTOR"
> +  {
> +riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
> +   operands[2], operands[3]);
> +DONE;
> +  }
> +)
> +
> +(define_expand "vec_cmpu"
> +  [(set (match_operand: 0 "register_operand")
> + (match_operator: 1 "comparison_operator"
> +   [(match_operand:VI 2 "register_operand")
> +(match_operand:VI 3 "register_operand")]))]
> +  "TARGET_VECTOR"
> +  {
> +riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
> +   operands[2], operands[3]);
> +DONE;
> +  }
> +)
> +
> +(define_expand "vec_cmp"
> +  [(set (match_operand: 0 "register_operand")
> + (match_operator: 1 "comparison_operator"
> +   [(match_operand:VF 2 "register_operand")
> +(match_operand:VF 3 "register_operand")]))]
> +  "TARGET_VECTOR"
> +  {
> +riscv_vector::expand_vec_cmp_float (operands[0], GET_CODE (operands[1]),
> + operands[2], operands[3], false);
> +DONE;
> +  }
> +)
 
Don't you want to use your shiny new operand passing style here as
with the other expanders?
 
> +  /* We have a maximum of 11 operands for RVV instruction patterns according 
> to
> +   * vector.md.  */
> +  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
> +/*FULLY_UNMASKED_P*/ false,
> +/*USE_REAL_MERGE_P*/ false, /*HAS_AVL_P*/ true,
> +/*VLMAX_P*/ true,
> +/*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
> +  e.set_policy (TAIL_ANY);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
 
I don't think we need the same comment in each of these.  Same for
/*DEST_MODE*/ and /*MASK_MODE*/ which would be redundant if data_mode
were called dest_mode.
> +/* Expand an RVV comparison.  */
> +
> +void
> +expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1)
> +{
> +  machine_mode mask_mode = GET_MODE (target);
> +  machine_mode data_mode = GET_MODE (op0);
> +  insn_code icode = get_cmp_insn_code (code, data_mode);
> +
> +  if (code == LTGT)
> +{
> +  rtx gt = gen_reg_rtx (mask_mode);
> +  rtx lt = gen_reg_rtx (mask_mode);
> +  expand_vec_cmp (gt, GT, op0, op1);
> +  expand_vec_cmp (lt, LT, op0, op1);
> +  icode = code_for_pred (IOR, mask_mode);
> +  rtx ops[3] = {target, gt, lt};
> +  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +  return;
> +}
 
Swap lt and gt here for consistency's sake.
 
Regards
Robin

[PATCH V3] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enables RVV auto-vectorization, including floating-point
unordered and ordered comparisons.
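
A typical loop this lets the vectorizer handle (illustration only; the
included testcases are more thorough):

void select_min (float *r, const float *a, const float *b, int n)
{
  for (int i = 0; i < n; ++i)
    r[i] = a[i] < b[i] ? a[i] : b[i];  /* vec_cmp + vcond_mask/vmerge */
}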

The testcases are leveraged from Richard.
So include Richard as co-author.

Co-Authored-By: Richard Sandiford 

gcc/ChangeLog:

* config/riscv/autovec.md (@vcond_mask_): New pattern.
(vec_cmp): Ditto.
(vec_cmpu): Ditto.
(vcond): Ditto.
(vcondu): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add new enum.
(emit_vlmax_merge_insn): New function.
(emit_vlmax_cmp_insn): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float):Ditto.
(expand_vcond):Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(get_cmp_insn_code): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add RVV comparison testcases.
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.

---
 gcc/config/riscv/autovec.md   | 112 
 gcc/config/riscv/riscv-protos.h   |   7 +
 gcc/config/riscv/riscv-v.cc   | 266 +-
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 157 +++
 .../riscv/rvv/autovec/cmp/vcond-2.c   |  75 +
 .../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
 .../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 
 .../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 +
 .../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 10 files changed, 756 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 04b4459222a..e0258e8b798 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -162,3 +162,115 @@
 riscv_vector::RVV_BINOP, operands);
   DONE;
 })
+
+;; =
+;; == Comparisons and selects
+;; =
+
+;; -
+;;  [INT,FP] Select based on masks
+;; -
+;; Includes merging patterns for:
+;; - vmerge.vv
+;; - vmerge.vx
+;; - vfmerge.vf
+;; -
+
+(define_expand "@vcond_mask_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 3 "register_operand")
+   (match_operand:V 1 "nonmemory_operand")
+   (match_operand:V 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+/* The order of vcond_mask is opposite to pred_merge.  */
+std::swap (operands[1], operands[2]);
+riscv_vector::emit_vlmax_merge_insn (code_for_pred_merge (mode),
+   riscv_vector::RVV_MERGE_OP, operands);
+DONE;
+  }
+)
+
+;; -
+;;  [INT,FP] Comparisons
+;; -
+;; Includes:
+;; - vms.
+;; -
+
+(define_expand "vec_cmp"
+  [(set (match_operand: 0 "register_operand")
+   (match_operator: 1 "comparison_operator"
+ [(match_operand:VI 2 "register_operand")
+  (match_operand:VI 3 "register_operand")]))]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+DONE;
+  }
+)
+
+(define_expand "vec_cmpu"
+  [(set (match_operand: 0 "register_operand")
+   (match_operator: 1 "comparison_operator"
+ [(match_operand:VI 2 "register_operand")
+  (match_operand:VI 3 "register_operand")]))]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+DONE;
+  }
+)
+
+(define_expand "vec_cmp"
+  [(set 

[PATCH 1/2] Missed opportunity to use [SU]ABD

2023-05-23 Thread Oluwatamilore Adebayo via Gcc-patches
From: oluade01 

This adds a recognition pattern for the non-widening
absolute difference (ABD).
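
The scalar idiom the new pattern is meant to catch (illustrative sketch):

void sabd (int *out, const int *a, const int *b, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = __builtin_abs (a[i] - b[i]);  /* abs (a - b) -> ABD */
}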

gcc/ChangeLog:

* doc/md.texi (sabd, uabd): Document them.
* internal-fn.def (ABD): Use new optab.
* optabs.def (sabd_optab, uabd_optab): New optabs,
* tree-vect-patterns.cc (vect_recog_absolute_difference):
Recognize the following idiom abs (a - b).
(vect_recog_sad_pattern): Refactor to use
vect_recog_absolute_difference.
(vect_recog_abd_pattern): Use patterns found by
vect_recog_absolute_difference to build a new ABD
internal call.
---
 gcc/doc/md.texi   |  10 ++
 gcc/internal-fn.def   |   3 +
 gcc/optabs.def|   2 +
 gcc/tree-vect-patterns.cc | 231 +-
 4 files changed, 215 insertions(+), 31 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 
07bf8bdebffb2e523f25a41f2b57e43c0276b745..77c77c6ecb0c29ef764e914e88d1090c45fb9a9e
 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5778,6 +5778,16 @@ Other shift and rotate instructions, analogous to the
 Vector shift and rotate instructions that take vectors as operand 2
 instead of a scalar type.
 
+@cindex @code{uabd@var{m}} instruction pattern
+@cindex @code{sabd@var{m}} instruction pattern
+@item @samp{uabd@var{m}}, @samp{sabd@var{m}}
+Signed and unsigned absolute difference instructions.  These
+instructions find the difference between operands 1 and 2
+then return the absolute value.  A C code equivalent would be:
+@smallexample
+op0 = op1 > op2 ? op1 - op2 : op2 - op1;
+@end smallexample
+
 @cindex @code{avg@var{m}3_floor} instruction pattern
 @cindex @code{uavg@var{m}3_floor} instruction pattern
 @item @samp{avg@var{m}3_floor}
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 
7fe742c2ae713e7152ab05cfdfba86e4e0aa3456..0f1724ecf37a31c231572edf90b5577e2d82f468
 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -167,6 +167,9 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, fms, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (ABD, ECF_CONST | ECF_NOTHROW, first,
+ sabd, uabd, binary)
+
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
  savg_floor, uavg_floor, binary)
 DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 
695f5911b300c9ca5737de9be809fa01aabe5e01..29bc92281a2175f898634cbe6af63c18021e5268
 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -359,6 +359,8 @@ OPTAB_D (mask_fold_left_plus_optab, 
"mask_fold_left_plus_$a")
 OPTAB_D (extract_last_optab, "extract_last_$a")
 OPTAB_D (fold_extract_last_optab, "fold_extract_last_$a")
 
+OPTAB_D (uabd_optab, "uabd$a3")
+OPTAB_D (sabd_optab, "sabd$a3")
 OPTAB_D (savg_floor_optab, "avg$a3_floor")
 OPTAB_D (uavg_floor_optab, "uavg$a3_floor")
 OPTAB_D (savg_ceil_optab, "avg$a3_ceil")
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 
a49b09539776c0056e77f99b10365d0a8747fbc5..3a2248263cf67834a1cb41167a1783a3b6400014
 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -770,6 +770,93 @@ vect_split_statement (vec_info *vinfo, stmt_vec_info 
stmt2_info, tree new_rhs,
 }
 }
 
+/* Look for the following pattern
+   X = x[i]
+   Y = y[i]
+   DIFF = X - Y
+   DAD = ABS_EXPR
+
+   ABS_STMT should point to a statement of code ABS_EXPR or ABSU_EXPR.
+   If REJECT_UNSIGNED is true it aborts if the type of ABS_STMT is unsigned.
+   HALF_TYPE and UNPROM will be set should the statement be found to
+   be a widened operation.
+   DIFF_OPRNDS will be set to the two inputs of the MINUS_EXPR preceding
+   ABS_STMT, otherwise it will be set the operations found by
+   vect_widened_op_tree.
+ */
+static bool
+vect_recog_absolute_difference (vec_info *vinfo, gassign *abs_stmt,
+   tree *half_type,
+   vect_unpromoted_value unprom[2],
+   tree diff_oprnds[2])
+{
+  if (!abs_stmt)
+return false;
+
+  /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a phi
+ inside the loop (in case we are analyzing an outer-loop).  */
+  enum tree_code code = gimple_assign_rhs_code (abs_stmt);
+  if (code != ABS_EXPR && code != ABSU_EXPR)
+return false;
+
+  tree abs_oprnd = gimple_assign_rhs1 (abs_stmt);
+  tree abs_type = TREE_TYPE (abs_oprnd);
+  if (!abs_oprnd)
+return false;
+  if (!ANY_INTEGRAL_TYPE_P (abs_type)
+  || TYPE_OVERFLOW_WRAPS (abs_type)
+  || TYPE_UNSIGNED (abs_type))
+return false;
+
+  /* Peel off conversions from the ABS input.  This can involve sign
+ changes (e.g.  from an unsigned subtraction to a signed ABS input)
+ or signed promotion, but it can't include unsigned promotion.
+ (Note that ABS 

Re: [PATCH] [arm] testsuite: make mve_intrinsic_type_overloads-int.c libc-agnostic

2023-05-23 Thread Stamatis Markianos-Wright via Gcc-patches



On 23/05/2023 15:41, Christophe Lyon wrote:

Glibc defines int32_t as 'int' while newlib defines it as 'long int'.

Although these correspond to the same size, g++ complains when using the
   'wrong' version:
   invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
or
   invalid conversion from 'int*' to 'int32_t*' {aka 'long int*'} [-fpermissive]

when calling vst1q(int32*, int32x4_t) with a first parameter of type
'long int *' (resp. 'int *')

To make this test pass with any type of toolchain, this patch defines
'word_type' according to which libc is in use.


Thank you for spotting this! I think this fix is needed on all of
GCC 12, 13 and trunk, btw (it should apply cleanly).





2023-05-23  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c:
Support both definitions of int32_t.
---
  .../mve_intrinsic_type_overloads-int.c| 28 ++-
  1 file changed, 15 insertions(+), 13 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
index 7947dc024bc..ab51cc8b323 100644
--- 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
+++ 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
@@ -47,14 +47,22 @@ foo2 (short * addr, int16x8_t value)
vst1q (addr, value);
  }
  
-void

-foo3 (int * addr, int32x4_t value)
-{
-  vst1q (addr, value); /* { dg-warning "invalid conversion" "" { target c++ } 
} */
-}
+/* Glibc defines int32_t as 'int' while newlib defines it as 'long int'.
+
+   Although these correspond to the same size, g++ complains when using the
+   'wrong' version:
+  invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
+
+  The trick below is to make this test pass whether using glibc-based or
+  newlib-based toolchains.  */
  
+#if defined(__GLIBC__)

+#define word_type int
+#else
+#define word_type long int
+#endif
  void
-foo4 (long * addr, int32x4_t value)
+foo3 (word_type * addr, int32x4_t value)
  {
vst1q (addr, value);
  }
@@ -78,13 +86,7 @@ foo7 (unsigned short * addr, uint16x8_t value)
  }
  
  void

-foo8 (unsigned int * addr, uint32x4_t value)
-{
-  vst1q (addr, value); /* { dg-warning "invalid conversion" "" { target c++ } 
} */
-}
-
-void
-foo9 (unsigned long * addr, uint32x4_t value)
+foo8 (unsigned word_type * addr, uint32x4_t value)
  {
vst1q (addr, value);
  }


[PATCH] [arm] testsuite: make mve_intrinsic_type_overloads-int.c libc-agnostic

2023-05-23 Thread Christophe Lyon via Gcc-patches
Glibc defines int32_t as 'int' while newlib defines it as 'long int'.

Although these correspond to the same size, g++ complains when using the
   'wrong' version:
  invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
or
  invalid conversion from 'int*' to 'int32_t*' {aka 'long int*'} [-fpermissive]

when calling vst1q(int32*, int32x4_t) with a first parameter of type
'long int *' (resp. 'int *')

To make this test pass with any type of toolchain, this patch defines
'word_type' according to which libc is in use.
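
To see why the pointer type rather than its size is what matters, a small
sketch of the mismatch (take_int32 is a hypothetical stand-in for the vst1q
overload taking int32_t*):

#include <cstdint>

void take_int32 (std::int32_t *);

void call (long *pl)
{
  take_int32 (pl);  /* accepted with newlib (int32_t == long int),
                       rejected with glibc (int32_t == int) */
}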

2023-05-23  Christophe Lyon  

gcc/testsuite/
* gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c:
Support both definitions of int32_t.
---
 .../mve_intrinsic_type_overloads-int.c| 28 ++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
index 7947dc024bc..ab51cc8b323 100644
--- 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
+++ 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c
@@ -47,14 +47,22 @@ foo2 (short * addr, int16x8_t value)
   vst1q (addr, value);
 }
 
-void
-foo3 (int * addr, int32x4_t value)
-{
-  vst1q (addr, value); /* { dg-warning "invalid conversion" "" { target c++ } 
} */
-}
+/* Glibc defines int32_t as 'int' while newlib defines it as 'long int'.
+
+   Although these correspond to the same size, g++ complains when using the
+   'wrong' version:
+  invalid conversion from 'long int*' to 'int32_t*' {aka 'int*'} [-fpermissive]
+
+  The trick below is to make this test pass whether using glibc-based or
+  newlib-based toolchains.  */
 
+#if defined(__GLIBC__)
+#define word_type int
+#else
+#define word_type long int
+#endif
 void
-foo4 (long * addr, int32x4_t value)
+foo3 (word_type * addr, int32x4_t value)
 {
   vst1q (addr, value);
 }
@@ -78,13 +86,7 @@ foo7 (unsigned short * addr, uint16x8_t value)
 }
 
 void
-foo8 (unsigned int * addr, uint32x4_t value)
-{
-  vst1q (addr, value); /* { dg-warning "invalid conversion" "" { target c++ } 
} */
-}
-
-void
-foo9 (unsigned long * addr, uint32x4_t value)
+foo8 (unsigned word_type * addr, uint32x4_t value)
 {
   vst1q (addr, value);
 }
-- 
2.34.1



Re: [PATCH] vect: Missed opportunity to use [SU]ABD

2023-05-23 Thread Oluwatamilore Adebayo via Gcc-patches
> > +  if (reject_unsigned && TYPE_UNSIGNED (abs_type))
> > +return false;
> > +  if (!ANY_INTEGRAL_TYPE_P (abs_type) || TYPE_OVERFLOW_WRAPS (abs_type))
> > +return false;
> 
> Could you explain the reject_unsigned behaviour?  I'd have expected
> TYPE_OVERFLOW_WRAPS (abs_type) to reject the unsigned case anyway.

When REJECT_UNSIGNED is true, the statement is rejected when the abs type is
unsigned, or when the unpromoted diff type is unsigned and not equal in
precision to the abs type.

vect_recog_absolute_difference replaces some of the logic in 
vect_recog_sad_pattern
and is used by vect_recog_abd_pattern.
vect_recog_sad_pattern aborts if the abs type is unsigned or when the unprom
diff type isn't the same precision as abs type and unsigned.
vect_recog_abd_pattern doesn't do the same, so REJECT_UNSIGNED is a flag for 
this.

I found it to be unnecessary as you suggested, so it's been dropped.

> > +  if (half_type)
> > +{
> > +  if (!SAME_TYPE (unprom[0].type, unprom[1].type))
> > +   return NULL;
> 
> I wouldn't have expected this to be unecessary.  half_type is supposed
> to be a common type that can hold all values of unprom[0].type and
> unprom[1].type.  We should just be able to do:
> 
> > +  tree diff_type = TREE_TYPE (diff_oprnds[0]);
> > +  if (TYPE_PRECISION (out_type) != TYPE_PRECISION (diff_type))
> > +   {
> > + tree vectype = get_vectype_for_scalar_type (vinfo, half_type);
> > + vect_convert_inputs (vinfo, stmt_vinfo, 2, abd_oprnds,
> > +  half_type, unprom, vectype);
> 
> ...this vect_convert_inputs unconditionally.  We need to check that
> the get_vectype_for_scalar_type call succeeds though.
> 
> So does it work as:
> 
>   if (half_type)
> {
>   tree vectype = get_vectype_for_scalar_type (vinfo, half_type);
>   if (!vectype)
> return false;
>   vect_convert_inputs (vinfo, stmt_vinfo, 2, abd_oprnds,
>half_type, unprom, vectype);
> }
> 
> ?

The proposed solution works.

> > +   }
> > +  else
> > +   {
> > + abd_oprnds[0] = diff_oprnds[0];
> > + abd_oprnds[1] = diff_oprnds[1];
> > +   }
> > +}
> > +  else
> > +{
> > +  if (unprom[0].op && unprom[1].op
> > + && (!SAME_TYPE (unprom[0].type, unprom[1].type)
> > + || !SAME_TYPE (unprom[0].type, out_type)))
> > +   return NULL;
> 
> AIUI, the !half_type case shouldn't look at unprom, since it's handling
> simple MINUS_EXPRs.  I think we can just delete this "if" statement.

If statement removed.

> > +  unprom[0].op = diff_oprnds[0];
> > +  unprom[1].op = diff_oprnds[1];
> > +  tree signed_out = signed_type_for (out_type);
> > +  tree signed_out_vectype = get_vectype_for_scalar_type (vinfo, 
> > signed_out);
> 
> We need to check for success here too.

Add a check.

> > +  vect_convert_inputs (vinfo, stmt_vinfo, 2, abd_oprnds,
> > +  signed_out, unprom, signed_out_vectype);
> > +
> > +  if (!SAME_TYPE (TREE_TYPE (diff_oprnds[0]), TREE_TYPE 
> > (abd_oprnds[0])))
> > +   return NULL;
> 
> I don't think this is needed.

Statement removed.

> > +}
> > +
> > +  if (!SAME_TYPE (TREE_TYPE (abd_oprnds[0]), TREE_TYPE (abd_oprnds[1]))
> > +  || !SAME_TYPE (TREE_TYPE (abd_oprnds[0]), out_type))
> > +return NULL;
> 
> I also don't think this is needed.  AIUI, the previous code has done
> all the necessary correctness checks.

Statements removed.

> > +  vect_pattern_detected ("vect_recog_abd_pattern", last_stmt);
> > +
> > +  tree vectype = get_vectype_for_scalar_type (vinfo, out_type);
> 
> I think instead we want the vector types computed above.  That is:
> 
> - The ABD should be done on the vector version of half_type
>   if the subtraction was on promoted inputs.  The result of
>   the ABD should then be zero-extended (using vect_convert_output)
>   to out_type.
> 
>   In particular, it's the sign of HALF_TYPE that decides whether
>   it's signed or unsigned ABD.
> 
> - The ABD should be done on the vector version of signed_outtype
>   if the subtraction was on unpromoted inputs.  We then might need
>   to sign-cast it to outtype, if outtype is unsigned.  We can
>   use vect_convert_output for that too.
> 
>   In other words, this case must use signed ABD.

In the half_type case out_type is set to be half_type.

Patch is in the next response.


RE: [PATCH] RISC-V: Fix warning of vxrm pattern

2023-05-23 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Tuesday, May 23, 2023 9:43 PM
To: juzhe.zh...@rivai.ai; gcc-patches@gcc.gnu.org
Cc: kito.ch...@gmail.com; kito.ch...@sifive.com; pal...@dabbelt.com; 
pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Fix warning of vxrm pattern



On 5/23/23 04:09, juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
> 
> I just notice the warning:
> ../../../riscv-gcc/gcc/config/riscv/vector.md:618:1: warning: source missing 
> a mode?
> 
> gcc/ChangeLog:
> 
>  * config/riscv/vector.md: Add mode.
While I'm a big fan of the gen* warnings, I do wish they had a bit more smarts 
to avoid the missing mode warnings for arguments which are restricted to 
CONST_INTs which don't have modes.  Oh well.

OK for the trunk.

jeff


Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread Robin Dapp via Gcc-patches
> +(define_expand "vec_cmp"
> +  [(set (match_operand: 0 "register_operand")
> + (match_operator: 1 "comparison_operator"
> +   [(match_operand:VI 2 "register_operand")
> +(match_operand:VI 3 "register_operand")]))]
> +  "TARGET_VECTOR"
> +  {
> +riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
> +   operands[2], operands[3]);
> +DONE;
> +  }
> +)
> +
> +(define_expand "vec_cmpu"
> +  [(set (match_operand: 0 "register_operand")
> + (match_operator: 1 "comparison_operator"
> +   [(match_operand:VI 2 "register_operand")
> +(match_operand:VI 3 "register_operand")]))]
> +  "TARGET_VECTOR"
> +  {
> +riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
> +   operands[2], operands[3]);
> +DONE;
> +  }
> +)
> +
> +(define_expand "vec_cmp"
> +  [(set (match_operand: 0 "register_operand")
> + (match_operator: 1 "comparison_operator"
> +   [(match_operand:VF 2 "register_operand")
> +(match_operand:VF 3 "register_operand")]))]
> +  "TARGET_VECTOR"
> +  {
> +riscv_vector::expand_vec_cmp_float (operands[0], GET_CODE (operands[1]),
> + operands[2], operands[3], false);
> +DONE;
> +  }
> +)

Don't you want to use your shiny new operand passing style here as
with the other expanders?

> +  /* We have a maximum of 11 operands for RVV instruction patterns according 
> to
> +   * vector.md.  */
> +  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
> +/*FULLY_UNMASKED_P*/ false,
> +/*USE_REAL_MERGE_P*/ false, /*HAS_AVL_P*/ true,
> +/*VLMAX_P*/ true,
> +/*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
> +  e.set_policy (TAIL_ANY);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}

I don't think we need the same comment in each of these.  Same for
/*DEST_MODE*/ and /*MASK_MODE*/ which would be redundant if data_mode
were called dest_mode.
> +/* Expand an RVV comparison.  */
> +
> +void
> +expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1)
> +{
> +  machine_mode mask_mode = GET_MODE (target);
> +  machine_mode data_mode = GET_MODE (op0);
> +  insn_code icode = get_cmp_insn_code (code, data_mode);
> +
> +  if (code == LTGT)
> +{
> +  rtx gt = gen_reg_rtx (mask_mode);
> +  rtx lt = gen_reg_rtx (mask_mode);
> +  expand_vec_cmp (gt, GT, op0, op1);
> +  expand_vec_cmp (lt, LT, op0, op1);
> +  icode = code_for_pred (IOR, mask_mode);
> +  rtx ops[3] = {target, gt, lt};
> +  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +  return;
> +}

Swap lt and gt here for consistency's sake.

Regards
 Robin
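
(As an aside, and purely as an illustration rather than part of the patch:
the LTGT case expanded above is the "ordered and not equal" comparison.
A scalar C equivalent is

int
ltgt (double a, double b)
{
  /* True iff a and b are ordered and unequal, i.e. (a < b) || (a > b);
     false when either operand is a NaN.  */
  return __builtin_islessgreater (a, b);
}

which is why the expansion builds the result as the IOR of a GT mask and
an LT mask.)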


[PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch enables RVV auto-vectorization, including floating-point
unordered and ordered comparisons.

The testcases are leveraged from Richard, so Richard is included as
co-author.

Co-Authored-By: Richard Sandiford 

gcc/ChangeLog:

* config/riscv/autovec.md (@vcond_mask_): New pattern.
(vec_cmp): Ditto.
(vec_cmpu): Ditto.
(vcond): Ditto.
(vcondu): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add new enum.
(emit_vlmax_merge_insn): New function.
(emit_vlmax_cmp_insn): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(get_cmp_insn_code): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/rvv.exp: Add RVV comparison testcases.
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.

---
 gcc/config/riscv/autovec.md   | 112 
 gcc/config/riscv/riscv-protos.h   |   7 +
 gcc/config/riscv/riscv-v.cc   | 258 +-
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 157 +++
 .../riscv/rvv/autovec/cmp/vcond-2.c   |  75 +
 .../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
 .../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 
 .../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 ++
 .../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
 10 files changed, 754 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 04b4459222a..e0258e8b798 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -162,3 +162,115 @@
 riscv_vector::RVV_BINOP, operands);
   DONE;
 })
+
+;; =
+;; == Comparisons and selects
+;; =
+
+;; -
+;;  [INT,FP] Select based on masks
+;; -
+;; Includes merging patterns for:
+;; - vmerge.vv
+;; - vmerge.vx
+;; - vfmerge.vf
+;; -
+
+(define_expand "@vcond_mask_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 3 "register_operand")
+   (match_operand:V 1 "nonmemory_operand")
+   (match_operand:V 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+/* The order of vcond_mask is opposite to pred_merge.  */
+std::swap (operands[1], operands[2]);
+riscv_vector::emit_vlmax_merge_insn (code_for_pred_merge (mode),
+   riscv_vector::RVV_MERGE_OP, operands);
+DONE;
+  }
+)
+
+;; -
+;;  [INT,FP] Comparisons
+;; -
+;; Includes:
+;; - vms.
+;; -
+
+(define_expand "vec_cmp"
+  [(set (match_operand: 0 "register_operand")
+   (match_operator: 1 "comparison_operator"
+ [(match_operand:VI 2 "register_operand")
+  (match_operand:VI 3 "register_operand")]))]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+DONE;
+  }
+)
+
+(define_expand "vec_cmpu"
+  [(set (match_operand: 0 "register_operand")
+   (match_operator: 1 "comparison_operator"
+ [(match_operand:VI 2 "register_operand")
+  (match_operand:VI 3 "register_operand")]))]
+  "TARGET_VECTOR"
+  {
+riscv_vector::expand_vec_cmp (operands[0], GET_CODE (operands[1]),
+ operands[2], operands[3]);
+DONE;
+  }
+)
+
+(define_expand "vec_cmp"
+  [(set 

Re: [PATCH] RISC-V: Fix warning of vxrm pattern

2023-05-23 Thread Jeff Law via Gcc-patches




On 5/23/23 04:09, juzhe.zh...@rivai.ai wrote:

From: Juzhe-Zhong 

I just noticed the warning:
../../../riscv-gcc/gcc/config/riscv/vector.md:618:1: warning: source missing a 
mode?

gcc/ChangeLog:

 * config/riscv/vector.md: Add mode.
While I'm a big fan of the gen* warnings, I do wish they had a bit more 
smarts to avoid the missing mode warnings for arguments which are 
restricted to CONST_INTs which don't have modes.  Oh well.


OK for the trunk.

jeff


Re: [COMMITTED] Remove buggy special case in irange::invert [PR109934].

2023-05-23 Thread Aldy Hernandez via Gcc-patches
BTW, we should probably backport this to god knows how many branches.

Aldy

On Tue, May 23, 2023 at 2:58 PM Aldy Hernandez  wrote:
>
> [Andrew, do you remotely remember what if anything this did?  It came
> from a wholesale merge from our long forgotten branch, so there's no
> history on the specifics of it.  Not important, I'm just curious.  It
> was probably me high on something.]
>
> This patch removes a buggy special case in irange::invert which seems
> to have been broken for a while, and probably never triggered because
> the legacy code was handled elsewhere, and the non-legacy code was
> using an int_range_max of int_range<255>, which made it extremely
> unlikely for num_ranges to reach 255.  However, with auto-resizing ranges,
> int_range_max will start off at 3 and can hit this bogus code in the
> unswitching code.
>
> PR tree-optimization/109934
>
> gcc/ChangeLog:
>
> * value-range.cc (irange::invert): Remove buggy special case.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr109934.c: New test.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr109934.c | 22 ++
>  gcc/value-range.cc   |  8 
>  2 files changed, 22 insertions(+), 8 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109934.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c
> new file mode 100644
> index 000..08bd5ce95c6
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c
> @@ -0,0 +1,22 @@
> +// { dg-do run }
> +// { dg-options "-O3" }
> +
> +int printf(const char *, ...);
> +short a;
> +long b = 3, c;
> +int d(int e) {
> +  switch (e)
> +  case 111:
> +  case 222:
> +  case 44:
> +return 0;
> +  return e;
> +}
> +int main() {
> +  for (; a >= 0; --a)
> +if (d(c + 23) - 23)
> +  b = 0;
> +
> +  if (b != 3)
> +__builtin_abort ();
> +}
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index 45b1e655967..874a1843ebf 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -1650,14 +1650,6 @@ irange::invert ()
>wide_int type_min = wi::min_value (prec, sign);
>wide_int type_max = wi::max_value (prec, sign);
>m_nonzero_mask = wi::minus_one (prec);
> -  if (m_num_ranges == m_max_ranges
> -  && lower_bound () != type_min
> -  && upper_bound () != type_max)
> -{
> -  m_base[1] = type_max;
> -  m_num_ranges = 1;
> -  return;
> -}
>
>// At this point, we need one extra sub-range to represent the
>// inverse.
> --
> 2.40.1
>



[COMMITTED] Remove buggy special case in irange::invert [PR109934].

2023-05-23 Thread Aldy Hernandez via Gcc-patches
[Andrew, do you remotely remember what if anything this did?  It came
from a wholesale merge from our long forgotten branch, so there's no
history on the specifics of it.  Not important, I'm just curious.  It
was probably me high on something.]

This patch removes a buggy special case in irange::invert which seems
to have been broken for a while, and probably never triggered because
the legacy code was handled elsewhere, and the non-legacy code was
using an int_range_max of int_range<255>, which made it extremely
unlikely for num_ranges to reach 255.  However, with auto-resizing ranges,
int_range_max will start off at 3 and can hit this bogus code in the
unswitching code.

PR tree-optimization/109934

gcc/ChangeLog:

* value-range.cc (irange::invert): Remove buggy special case.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr109934.c: New test.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr109934.c | 22 ++
 gcc/value-range.cc   |  8 
 2 files changed, 22 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109934.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c
new file mode 100644
index 000..08bd5ce95c6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109934.c
@@ -0,0 +1,22 @@
+// { dg-do run }
+// { dg-options "-O3" }
+
+int printf(const char *, ...);
+short a;
+long b = 3, c;
+int d(int e) {
+  switch (e)
+  case 111:
+  case 222:
+  case 44:
+return 0;
+  return e;
+}
+int main() {
+  for (; a >= 0; --a)
+if (d(c + 23) - 23)
+  b = 0;
+
+  if (b != 3)
+__builtin_abort ();
+}
diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index 45b1e655967..874a1843ebf 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -1650,14 +1650,6 @@ irange::invert ()
   wide_int type_min = wi::min_value (prec, sign);
   wide_int type_max = wi::max_value (prec, sign);
   m_nonzero_mask = wi::minus_one (prec);
-  if (m_num_ranges == m_max_ranges
-  && lower_bound () != type_min
-  && upper_bound () != type_max)
-{
-  m_base[1] = type_max;
-  m_num_ranges = 1;
-  return;
-}
 
   // At this point, we need one extra sub-range to represent the
   // inverse.
-- 
2.40.1



[patch]: Implement PR104327 for avr

2023-05-23 Thread Georg-Johann Lay

PR target/104327 not only affects s390 but also avr:
The avr backend pre-sets some options depending on optimization level.
The inliner then thinks that always_inline functions are not eligible
for inlining and terminates with an error.
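
A minimal, hypothetical reproducer (not taken from the PR; the optimize
attribute is only one way to end up with differing option sets) would be
an always_inline callee whose caller carries a different optimization
level:

static inline __attribute__((always_inline)) int
add1 (int x)
{
  return x + 1;
}

int __attribute__((optimize("O0")))
call_add1 (int x)
{
  /* Used to be rejected with "inlining failed in call to 'always_inline'
     'add1'" because caller and callee option sets differ.  */
  return add1 (x);
}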

Proposing the following patch that implements TARGET_CAN_INLINE_P.

Ok to apply?

Johann

--

target/104327: Allow more inlining between different optimization levels.

avr-common.cc introduces the following options that are set depending
on optimization level: -mgas-isr-prologues, -mmain-is-OS-task and
-fsplit-wide-types-early.  The inliner thinks that different options
disallow cross-optimization inlining, so provide can_inline_p.

gcc/
PR target/104327
* config/avr/avr.cc (avr_can_inline_p): New static function.
(TARGET_CAN_INLINE_P): Define to that function.
diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index 9fa50ca230d..55b48f63865 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -1018,6 +1018,22 @@ avr_no_gccisr_function_p (tree func)
   return avr_lookup_function_attribute1 (func, "no_gccisr");
 }

+
+/* Implement `TARGET_CAN_INLINE_P'.  */
+/* Some options like -mgas-isr-prologues depend on optimization level,
+   and the inliner might think that due to different options, inlining
+   is not permitted; see PR104327.  */
+
+static bool
+avr_can_inline_p (tree /* caller */, tree callee)
+{
+  // For now, don't allow inlining of ISRs.  If the user actually wants
+  // to inline ISR code, they have to turn the body of the ISR into an
+  // ordinary function.
+
+  return ! avr_interrupt_function_p (callee);
+}
+
 /* Implement `TARGET_SET_CURRENT_FUNCTION'.  */
 /* Sanity cheching for above function attributes.  */

@@ -14713,6 +14729,9 @@ avr_float_lib_compare_returns_bool (machine_mode 
mode, enum rtx_code)

 #undef  TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST avr_md_asm_adjust

+#undef  TARGET_CAN_INLINE_P
+#define TARGET_CAN_INLINE_P avr_can_inline_p
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 



Re: [PATCH V12] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-23 Thread juzhe.zh...@rivai.ai
Bootstrap on X86 passed.
Ok for trunk?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-22 16:38
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH V12] VECT: Add decrement IV iteration loop control by variable 
amount support
From: Ju-Zhe Zhong 
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_adjust_loop_lens_control): New function.
(vect_set_loop_controls_directly): Add decrement IV support.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc: Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
 
---
gcc/tree-vect-loop-manip.cc | 184 +++-
gcc/tree-vect-loop.cc   |  10 ++
gcc/tree-vectorizer.h   |   8 ++
3 files changed, 199 insertions(+), 3 deletions(-)
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ff6159e08d5..94b38d1e0fb 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -385,6 +385,66 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, 
rgroup_controls *dest_rgm,
   return false;
}
+/* Try to use adjust loop lens for non-SLP multiple-rgroups.
+
+ _36 = MIN_EXPR ;
+
+ First length (MIN (X, VF/N)):
+   loop_len_15 = MIN_EXPR <_36, VF/N>;
+
+ Second length:
+   tmp = _36 - loop_len_15;
+   loop_len_16 = MIN (tmp, VF/N);
+
+ Third length:
+   tmp2 = tmp - loop_len_16;
+   loop_len_17 = MIN (tmp2, VF/N);
+
+ Last length:
+   loop_len_18 = tmp2 - loop_len_17;
+*/
+
+static void
+vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
+rgroup_controls *dest_rgm,
+rgroup_controls *src_rgm, tree step)
+{
+  tree ctrl_type = dest_rgm->type;
+  poly_uint64 nitems_per_ctrl
+= TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
+  tree length_limit = build_int_cst (iv_type, nitems_per_ctrl);
+
+  for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
+{
+  if (!step)
+ step = src_rgm->controls[i / dest_rgm->controls.length ()];
+  tree ctrl = dest_rgm->controls[i];
+  if (i == 0)
+ {
+   /* First iteration: MIN (X, VF/N) capped to the range [0, VF/N].  */
+   gassign *assign
+ = gimple_build_assign (ctrl, MIN_EXPR, step, length_limit);
+   gimple_seq_add_stmt (seq, assign);
+ }
+  else if (i == dest_rgm->controls.length () - 1)
+ {
+   /* Last iteration: Remain capped to the range [0, VF/N].  */
+   gassign *assign = gimple_build_assign (ctrl, MINUS_EXPR, step,
+ dest_rgm->controls[i - 1]);
+   gimple_seq_add_stmt (seq, assign);
+ }
+  else
+ {
+   /* (MIN (remain, VF*I/N)) capped to the range [0, VF/N].  */
+   step = gimple_build (seq, MINUS_EXPR, iv_type, step,
+dest_rgm->controls[i - 1]);
+   gassign *assign
+ = gimple_build_assign (ctrl, MIN_EXPR, step, length_limit);
+   gimple_seq_add_stmt (seq, assign);
+ }
+}
+}
+
/* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
for all the rgroup controls in RGC and return a control that is nonzero
when the loop needs to iterate.  Add any new preheader statements to
@@ -468,9 +528,78 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   standard_iv_increment_position (loop, _gsi, _after);
-  create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE,
-  loop, _gsi, insert_after, _before_incr,
-  _after_incr);
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
+  tree step = make_ssa_name (iv_type);
+  /* Create decrement IV.  */
+  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, _gsi,
+ insert_after, _before_incr, _after_incr);
+  tree temp = gimple_build (header_seq, MIN_EXPR, iv_type,
+ index_before_incr, nitems_step);
+  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, temp));
+
+  if (rgc->max_nscalars_per_iter == 1)
+ {
+   /* single rgroup:
+  ...
+  _10 = (unsigned long) count_12(D);
+  ...
+  # ivtmp_9 = PHI 
+  _36 = MIN_EXPR ;
+  ...
+  vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
+  ...
+  ivtmp_35 = ivtmp_9 - _36;
+  ...
+  if (ivtmp_35 != 0)
+goto ; [83.33%]
+  else
+goto ; [16.67%]
+   */
+   gassign *assign = gimple_build_assign (rgc->controls[0], step);
+   gimple_seq_add_stmt (header_seq, assign);
+ }
+  else
+ {
+   /* Multiple rgroup (SLP):
+  ...
+  _38 = (unsigned long) bnd.7_29;
+  _39 = _38 * 2;
+  ...
+  # ivtmp_41 = PHI 
+  ...
+  _43 = MIN_EXPR ;
+  loop_len_26 = MIN_EXPR <_43, 16>;
+  loop_len_25 = _43 - loop_len_26;
+  ...
+  .LEN_STORE (_6, 8B, loop_len_26, ...);
+  ...
+  .LEN_STORE (_25, 8B, loop_len_25, ...);
+  _33 = loop_len_26 / 2;
+  ...
+  .LEN_STORE (_8, 16B, _33, ...);
+  _36 = 

Re: [PATCH] Fix type error of 'switch (SUBREG_BYTE (op)).'

2023-05-23 Thread Richard Sandiford via Gcc-patches
Jeff Law via Gcc-patches  writes:
> On 5/17/23 03:03, Jin Ma wrote:
>> For example:
>> (define_insn "mov_lowpart_sidi2"
>>    [(set (match_operand:SI 0 "register_operand" "=r")
>>  (subreg:SI (match_operand:DI 1 "register_operand" " r") 0))]
>>"TARGET_64BIT"
>>"mov\t%0,%1")
>> 
>> (define_insn "mov_highpart_sidi2"
>>    [(set (match_operand:SI 0 "register_operand" "=r")
>>  (subreg:SI (match_operand:DI 1 "register_operand" " r") 1))]
>>"TARGET_64BIT"
>>"movh\t%0,%1")
>> 
>> When defining the above patterns, the generated file insn-recog.cc will
>> contain 'switch (SUBREG_BYTE (op))', but since the return value of
>> SUBREG_BYTE is poly_uint16_pod, the following error will occur:
>> "error: switch quantity not an integer".
>> 
>> gcc/ChangeLog:
>> 
>>  * genrecog.cc (print_nonbool_test): Fix type error of
>>  'switch (SUBREG_BYTE (op))'.
> Thanks.  Installed.

We shouldn't add to_constant just because it's a convenient
way of getting rid of errors :)  There has to be a good reason
in principle why the value is known at compile time.

So I think this should be reverted.  Nothing guarantees that
SUBREG_BYTEs are constant on AArch64 and RISC-V.  And for SVE
it's common for them not to be.

If we want to support the above, I think we need to make the
generator use known_eq instead.

The patterns don't look right though.  An SI subreg of a DI
can't have a SUBREG_BYTE of 1.  And the lowpart SUBREG_BYTE
depends on endianness.  So I think a better way of writing
the lowpart pattern above is to use subreg_lowpart_operator
(which riscv already has).

The high part can't be done using subregs though.

Thanks,
Richard
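
A C-level picture of that last point, given only as an illustration
(assuming 64-bit long long and 32-bit int): the low part is a plain
truncation, which is what a lowpart subreg models and why its byte offset
is endian-dependent, while the high part needs a shift and so cannot be
expressed as a subreg at all.

unsigned int
di_lowpart (unsigned long long x)
{
  return (unsigned int) x;            /* lowpart: truncation / subreg */
}

unsigned int
di_highpart (unsigned long long x)
{
  return (unsigned int) (x >> 32);    /* high part: a shift, not a subreg */
}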





Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread Richard Sandiford via Gcc-patches
"juzhe.zh...@rivai.ai"  writes:
> Yeah. I know. 
> Like ARM does everywhere:
> (define_expand "vcond"
>   [(set (match_operand:SVE_ALL 0 "register_operand")
>   (if_then_else:SVE_ALL
> (match_operator 3 "comparison_operator"
>   [(match_operand:SVE_I 4 "register_operand")
>(match_operand:SVE_I 5 "nonmemory_operand")])
> (match_operand:SVE_ALL 1 "nonmemory_operand")
> (match_operand:SVE_ALL 2 "nonmemory_operand")))]
>   "TARGET_SVE &&  == "
>   {
> aarch64_expand_sve_vcond (mode, mode, operands);
> DONE;
>   }
> )
>
> passing "operands" looks codes much cleaner.

FWIW, I think we only do that when we're reusing optab patterns.
The handling of operand 3 is forced by the definition of vcond_optab.

When there's a choice, we generally use "@" patterns instead, and
pass codes and modes to the expander.

Thanks,
Richard


Re: [PATCH] c-family: implement -ffp-contract=on

2023-05-23 Thread Alexander Monakov via Gcc-patches


On Tue, 23 May 2023, Richard Biener wrote:
> > Ah, no, I deliberately decided against that, because that way we would go
> > via gimplify_arg, which would emit all side effects in *pre_p. That seems
> > wrong if arguments had side-effects that should go in *post_p.
> 
> Ah, true - that warrants a comment though.

Incrementally fixed up in my tree like this:

diff --git a/gcc/c-family/c-gimplify.cc b/gcc/c-family/c-gimplify.cc
index f7635d3b0c..17b0610a89 100644
--- a/gcc/c-family/c-gimplify.cc
+++ b/gcc/c-family/c-gimplify.cc
@@ -803,6 +803,7 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
ATTRIBUTE_UNUSED,
else
  ops[2] = build1 (NEGATE_EXPR, type, ops[2]);
  }
+   /* Avoid gimplify_arg: it emits all side effects into *PRE_P.  */
for (auto & : ops)
  if (gimplify_expr (, pre_p, post_p, is_gimple_val, fb_rvalue)
  == GS_ERROR)

Alexander
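
To make the *pre_p/*post_p concern concrete, an illustrative example (my
own, not from the thread): an operand with a post-increment has its side
effect queued to *post_p by the normal gimplification path, so routing the
operands through gimplify_arg, which only has *pre_p, would change where
that side effect is emitted.

double
contract_candidate (double a, double b, double *p)
{
  /* With -ffp-contract=on the multiply-add is a contraction candidate;
     the old value of *p is the addend, and the increment is a post_p
     side effect.  */
  return a * b + (*p)++;
}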


RE: [PATCH] arm: Fix ICE due to infinite splitting [PR109800]

2023-05-23 Thread Kyrylo Tkachov via Gcc-patches
Hi Alex,

> -Original Message-
> From: Alex Coplan 
> Sent: Thursday, May 11, 2023 12:15 PM
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo Tkachov
> 
> Subject: [PATCH] arm: Fix ICE due to infinite splitting [PR109800]
> 
> Hi,
> 
> In r11-966-g9a182ef9ee011935d827ab5c6c9a7cd8e22257d8 we introduce a
> simplification to emit_move_insn that attempts to simplify moves of the
> form:
> 
> (set (subreg:M1 (reg:M2 ...)) (constant C))
> 
> where M1 and M2 are of equal mode size. That is problematic for the splitter
> vfp.md:no_literal_pool_df_immediate in the arm backend, which tries to pun
> an
> lvalue DFmode pseudo into DImode and assign a constant to it with
> emit_move_insn, as the new transformation simply undoes this, and we end
> up
> splitting indefinitely.
> 
> This patch changes things around in the arm backend so that we use a
> DImode temporary (instead of DFmode) and first load the DImode constant
> into the pseudo, and then pun the pseudo into DFmode as an rvalue in a
> reg -> reg move. I believe this should be semantically equivalent but
> avoids the pathological behaviour seen in the PR.
> 
> Bootstrapped/regtested on arm-linux-gnueabihf, regtested on
> arm-none-eabi and armeb-none-eabi.
> 
> OK for trunk and backports?

Ok but the testcase...

> 
> Thanks,
> Alex
> 
> gcc/ChangeLog:
> 
>   PR target/109800
>   * config/arm/arm.md (movdf): Generate temporary pseudo in
> DImode
>   instead of DFmode.
>   * config/arm/vfp.md (no_literal_pool_df_immediate): Rather than
> punning an
>   lvalue DFmode pseudo into DImode, use a DImode pseudo and pun it
> into
>   DFmode as an rvalue.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR target/109800
>   * gcc.target/arm/pr109800.c: New test.

diff --git a/gcc/testsuite/gcc.target/arm/pr109800.c 
b/gcc/testsuite/gcc.target/arm/pr109800.c
new file mode 100644
index 000..71d1ede13dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr109800.c
@@ -0,0 +1,3 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv7-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 
-mbig-endian -mpure-code" } */
+double f() { return 5.0; }

... The arm testsuite options are kinda hard to get right with all the
effective targets and multilibs, and such hardcoded ABI and -march options
tend to break on some targets.
I suggest you put this testcase in gcc.target/arm/pure-code and add a 
dg-skip-if to skip the test if the multilib options specify a different 
float-abi.

Thanks,
Kyrill
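
For illustration, the suggested markup might look roughly like this
(the dg-skip-if operands are written from memory, so treat them as a
sketch rather than the exact directive), with the file placed under
gcc.target/arm/pure-code/:

/* { dg-do compile } */
/* { dg-skip-if "needs hard float-abi" { *-*-* } { "-mfloat-abi=soft" "-mfloat-abi=softfp" } { "" } } */
/* { dg-options "-O2 -march=armv7-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -mbig-endian -mpure-code" } */
double f() { return 5.0; }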


Re: [PATCH 2/2] aarch64: Provide FPR alternatives for some bit insertions [PR109632]

2023-05-23 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> On Tue, May 23, 2023 at 12:38 PM Richard Sandiford via Gcc-patches
>  wrote:
>>
>> At -O2, and so with SLP vectorisation enabled:
>>
>> struct complx_t { float re, im; };
>> complx_t add(complx_t a, complx_t b) {
>>   return {a.re + b.re, a.im + b.im};
>> }
>>
>> generates:
>>
>> fmov w3, s1
>> fmov x0, d0
>> fmov x1, d2
>> fmov w2, s3
>> bfi x0, x3, 32, 32
>> fmov d31, x0
>> bfi x1, x2, 32, 32
>> fmov d30, x1
>> fadd v31.2s, v31.2s, v30.2s
>> fmov x1, d31
>> lsr x0, x1, 32
>> fmov s1, w0
>> lsr w0, w1, 0
>> fmov s0, w0
>> ret
>>
>> This is because complx_t is passed and returned in FPRs, but GCC gives
>> it DImode.
>
> Isn't that the choice of the target?  Of course "FPRs" might mean a
> single FPR here and arguably DFmode would be similarly bad?

Yeah, the problem is really the "single register" aspect, rather than
the exact choice of mode.  We're happy to store DImode values in FPRs
if it makes sense (and we will do for this example, after the patch).

V2SFmode or DFmode would be just as bad, like you say.

> That said, to the ppc folks who also recently tried to change how
> argument passing materializes I suggested to piggy-back on a
> SRA style analysis (you could probably simply build the access
> tree for all function parameters using its infrastructure) to drive
> RTL expansion heuristics (it's heuristics after all...) what exact
> (set of) pseudo / stack slot we want to form from the actual
> argument hard registers.

My long-term plan is to allow a DECL_RTL to be a PARALLEL of pseudos,
just like the DECL_INCOMING_RTL of a PARM_DECL can be a PARALLEL of
hard registers.  This also makes it possible to store things that are
currently BLKmode (but still passed and returned in registers).
E.g. it means that a 12-byte structure can be stored in registers
rather than being forced to the stack frame.

The idea (at least at first) is to handle only those cases that
make sense from an ABI point of view.  We'd still be relying on
SRA to split up operations on individual fields.

I have a WIP patch that gives some promising improvements,
but it needs more time.

Thanks,
Richard


[PATCH] Dump ANTIC_OUT before pruning it

2023-05-23 Thread Richard Biener via Gcc-patches
This dumps ANTIC_OUT before pruning clobbered mems from it as part
of the ANTIC_IN compute.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

* tree-ssa-pre.cc (compute_antic_aux): Dump the correct
ANTIC_OUT.
---
 gcc/tree-ssa-pre.cc | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index d56431b4145..b1ceea90a8e 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -2216,6 +2216,10 @@ compute_antic_aux (basic_block block, bool 
block_has_abnormal_pred_edge)
}
 }
 
+  /* Dump ANTIC_OUT before it's pruned.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+print_bitmap_set (dump_file, ANTIC_OUT, "ANTIC_OUT", block->index);
+
   /* Prune expressions that are clobbered in block and thus become
  invalid if translated from ANTIC_OUT to ANTIC_IN.  */
   prune_clobbered_mems (ANTIC_OUT, block);
@@ -2270,9 +2274,6 @@ compute_antic_aux (basic_block block, bool 
block_has_abnormal_pred_edge)
  maybe_dump_sets:
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
-  if (ANTIC_OUT)
-   print_bitmap_set (dump_file, ANTIC_OUT, "ANTIC_OUT", block->index);
-
   if (changed)
fprintf (dump_file, "[changed] ");
   print_bitmap_set (dump_file, ANTIC_IN (block), "ANTIC_IN",
-- 
2.35.3


Re: [PATCH 2/2] aarch64: Provide FPR alternatives for some bit insertions [PR109632]

2023-05-23 Thread Richard Biener via Gcc-patches
On Tue, May 23, 2023 at 12:38 PM Richard Sandiford via Gcc-patches
 wrote:
>
> At -O2, and so with SLP vectorisation enabled:
>
> struct complx_t { float re, im; };
> complx_t add(complx_t a, complx_t b) {
>   return {a.re + b.re, a.im + b.im};
> }
>
> generates:
>
> fmov w3, s1
> fmov x0, d0
> fmov x1, d2
> fmov w2, s3
> bfi x0, x3, 32, 32
> fmov d31, x0
> bfi x1, x2, 32, 32
> fmov d30, x1
> fadd v31.2s, v31.2s, v30.2s
> fmov x1, d31
> lsr x0, x1, 32
> fmov s1, w0
> lsr w0, w1, 0
> fmov s0, w0
> ret
>
> This is because complx_t is passed and returned in FPRs, but GCC gives
> it DImode.

Isn't that the choice of the target?  Of course "FPRs" might mean a
single FPR here and arguably DFmode would be similarly bad?

That said, to the ppc folks who also recently tried to change how
argument passing materializes I suggested to piggy-back on a
SRA style analysis (you could probably simply build the access
tree for all function parameters using its infrastructure) to drive
RTL expansion heuristics (it's heuristics after all...) what exact
(set of) pseudo / stack slot we want to form from the actual
argument hard registers.

> We therefore “need” to assemble a DImode pseudo from the
> two individual floats, bitcast it to a vector, do the arithmetic,
> bitcast it back to a DImode pseudo, then extract the individual floats.
>
> There are many problems here.  The most basic is that we shouldn't
> use SLP for such a trivial example.  But SLP should in principle be
> beneficial for more complicated examples, so preventing SLP for the
> example above just changes the reproducer needed.  A more fundamental
> problem is that it doesn't make sense to use single DImode pseudos in a
> testcase like this.  I have a WIP patch to allow re and im to be stored
> in individual SFmode pseudos instead, but it's quite an invasive change
> and might end up going nowhere.
>
> A simpler problem to tackle is that we allow DImode pseudos to be stored
> in FPRs, but we don't provide any patterns for inserting values into
> them, even though INS makes that easy for element-like insertions.
> This patch adds some patterns for that.
>
> Doing that showed that aarch64_modes_tieable_p was too strict:
> it didn't allow SFmode and DImode values to be tied, even though
> both of them occupy a single GPR and FPR, and even though we allow
> both classes to change between the modes.
>
> The *aarch64_bfidi_subreg_ pattern is
> especially ugly, but it's not clear what target-independent
> code ought to simplify it to, if it was going to simplify it.
>
> We should probably do the same thing for extractions, but that's left
> as future work.
>
> After the patch we generate:
>
> ins v0.s[1], v1.s[0]
> ins v2.s[1], v3.s[0]
> fadd v0.2s, v0.2s, v2.2s
> fmov x0, d0
> ushr d1, d0, 32
> lsr w0, w0, 0
> fmov s0, w0
> ret
>
> which seems like a step in the right direction.
>
> All in all, there's nothing elegant about this patch.  It just
> seems like the least worst option.
>
> Tested on aarch64-linux-gnu and aarch64_be-elf (including ILP32).
> Pushed to trunk.
>
> Richard
>
>
> gcc/
> PR target/109632
> * config/aarch64/aarch64.cc (aarch64_modes_tieable_p): Allow
> subregs between any scalars that are 64 bits or smaller.
> * config/aarch64/iterators.md (SUBDI_BITS): New int iterator.
> (bits_etype): New int attribute.
> * config/aarch64/aarch64.md (*insv_reg_)
> (*aarch64_bfi_): New patterns.
> (*aarch64_bfidi_subreg_): Likewise.
>
> gcc/testsuite/
> * gcc.target/aarch64/ins_bitfield_1.c: New test.
> * gcc.target/aarch64/ins_bitfield_2.c: Likewise.
> * gcc.target/aarch64/ins_bitfield_3.c: Likewise.
> * gcc.target/aarch64/ins_bitfield_4.c: Likewise.
> * gcc.target/aarch64/ins_bitfield_5.c: Likewise.
> * gcc.target/aarch64/ins_bitfield_6.c: Likewise.
> ---
>  gcc/config/aarch64/aarch64.cc |  12 ++
>  gcc/config/aarch64/aarch64.md |  62 +++
>  gcc/config/aarch64/iterators.md   |   4 +
>  .../gcc.target/aarch64/ins_bitfield_1.c   | 142 
>  .../gcc.target/aarch64/ins_bitfield_2.c   | 142 
>  .../gcc.target/aarch64/ins_bitfield_3.c   | 156 ++
>  .../gcc.target/aarch64/ins_bitfield_4.c   | 156 ++
>  .../gcc.target/aarch64/ins_bitfield_5.c   | 139 
>  .../gcc.target/aarch64/ins_bitfield_6.c   | 139 
>  9 files changed, 952 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c
>  create mode 100644 

[PATCH 1/2] md: Allow <FOO> to refer to the value of int iterator FOO

2023-05-23 Thread Richard Sandiford via Gcc-patches
In a follow-up patch, I wanted to use an int iterator to iterate
over various possible values of a const_int.  But one problem
with int iterators was that there was no way of referring to the
current value of the iterator.  This is unlike modes and codes,
which provide automatic "mode", "MODE", "code" and "CODE"
attributes.  These automatic definitions are the equivalent
of an explicit:

  (define_mode_attr mode [(QI "qi") (HI "hi") ...])

We obviously can't do that for every possible value of an int.

One option would have been to go for some kind of lazily-populated
attribute.  But that sounds quite complicated.  This patch instead
goes for the simpler approach of allowing <FOO> to refer to the
current value of FOO.

In principle it would be possible to allow the same thing
for mode and code iterators.  But for modes there are at least
4 realistic possibilities:

  - the E_* enumeration value (which is what this patch would give)
  - the user-facing C token, like DImode, SFmode, etc.
  - the equivalent of <MODE>
  - the equivalent of <mode>

Because of this ambiguity, it seemed better to stick to the
current approach for modes.  For codes it's less clear-cut,
but <CODE> and <code> are both realistic possibilities, so again
it seemed better to be explicit.

The patch also removes “Each @var{int} must have the same rtx format.
@xref{RTL Classes}.”, which was erroneously copied from the code
iterator section.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Also tested by
checking that no port had an int iterator whose name matched an
existing attribute.  Pushed to trunk.

Richard


gcc/
* doc/md.texi: Document that  can be used to refer to the
numerical value of an int iterator FOO.  Tweak other parts of
the int iterator documentation.
* read-rtl.cc (iterator_group::has_self_attr): New field.
(map_attr_string): When has_self_attr is true, make <FOO>
expand to the current value of iterator FOO.
(initialize_iterators): Set has_self_attr for int iterators.
---
 gcc/doc/md.texi | 15 +--
 gcc/read-rtl.cc | 26 ++
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8ebce31ba78..6a435eb4461 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -11502,11 +11502,10 @@ The construct:
 @end smallexample
 
 defines a pseudo integer constant @var{name} that can be instantiated as
-@var{inti} if condition @var{condi} is true.  Each @var{int} must have the
-same rtx format.  @xref{RTL Classes}.  Int iterators can appear in only
-those rtx fields that have 'i', 'n', 'w', or 'p' as the specifier.  This
-means that each @var{int} has to be a constant defined using define_constant
-or define_c_enum.
+@var{inti} if condition @var{condi} is true.  Int iterators can appear in
+only those rtx fields that have `i', `n', `w', or `p' as the specifier.
+This means that each @var{int} has to be a constant defined using
+@samp{define_constant} or @samp{define_c_enum}.
 
 As with mode and code iterators, each pattern that uses @var{name} will be
 expanded @var{n} times, once with all uses of @var{name} replaced by
@@ -11517,9 +11516,13 @@ It is possible to define attributes for ints as well 
as for codes and modes.
 Attributes are defined using:
 
 @smallexample
-(define_int_attr @var{name} [(@var{int1} "@var{value1}") @dots{} (@var{intn} 
"@var{valuen}")])
+(define_int_attr @var{attr_name} [(@var{int1} "@var{value1}") @dots{} 
(@var{intn} "@var{valuen}")])
 @end smallexample
 
+In addition to these user-defined attributes, it is possible to use
+@samp{<@var{name}>} to refer to the current expansion of iterator
+@var{name} (such as @var{int1}, @var{int2}, and so on).
+
 Here's an example of int iterators in action, taken from the ARM port:
 
 @smallexample
diff --git a/gcc/read-rtl.cc b/gcc/read-rtl.cc
index 8cb25aebdbb..292f8b72d43 100644
--- a/gcc/read-rtl.cc
+++ b/gcc/read-rtl.cc
@@ -80,6 +80,12 @@ struct iterator_group {
 
   /* Return the C token for the given standard mode, code, etc.  */
   const char *(*get_c_token) (int);
+
+  /* True if each iterator name should be treated as an attribute that
+ maps to the C token produced by get_c_token.  This means that for
+ an iterator ITER, <ITER> can be used in strings to refer to the
+ current value of ITER, as a C token.  */
+  bool has_self_attr;
 };
 
 /* Records one use of an iterator.  */
@@ -472,6 +478,25 @@ map_attr_string (file_location loc, const char *p, mapping 
**iterator_out = 0)
  || iterator->name[iterator_name_len] != 0))
continue;
 
+  if (iterator->group->has_self_attr
+ && strcmp (attr, iterator->name) == 0)
+   {
+ if (iterator_out)
+   *iterator_out = iterator;
+ int number = iterator->current_value->number;
+ const char *string = iterator->group->get_c_token (number);
+ if (res && strcmp (string, res->string) != 0)
+   {
+ error_at (loc, "ambiguous 

[PATCH 2/2] aarch64: Provide FPR alternatives for some bit insertions [PR109632]

2023-05-23 Thread Richard Sandiford via Gcc-patches
At -O2, and so with SLP vectorisation enabled:

struct complx_t { float re, im; };
complx_t add(complx_t a, complx_t b) {
  return {a.re + b.re, a.im + b.im};
}

generates:

fmov w3, s1
fmov x0, d0
fmov x1, d2
fmov w2, s3
bfi x0, x3, 32, 32
fmov d31, x0
bfi x1, x2, 32, 32
fmov d30, x1
fadd v31.2s, v31.2s, v30.2s
fmov x1, d31
lsr x0, x1, 32
fmov s1, w0
lsr w0, w1, 0
fmov s0, w0
ret

This is because complx_t is passed and returned in FPRs, but GCC gives
it DImode.  We therefore “need” to assemble a DImode pseudo from the
two individual floats, bitcast it to a vector, do the arithmetic,
bitcast it back to a DImode pseudo, then extract the individual floats.

There are many problems here.  The most basic is that we shouldn't
use SLP for such a trivial example.  But SLP should in principle be
beneficial for more complicated examples, so preventing SLP for the
example above just changes the reproducer needed.  A more fundamental
problem is that it doesn't make sense to use single DImode pseudos in a
testcase like this.  I have a WIP patch to allow re and im to be stored
in individual SFmode pseudos instead, but it's quite an invasive change
and might end up going nowhere.

A simpler problem to tackle is that we allow DImode pseudos to be stored
in FPRs, but we don't provide any patterns for inserting values into
them, even though INS makes that easy for element-like insertions.
This patch adds some patterns for that.

Doing that showed that aarch64_modes_tieable_p was too strict:
it didn't allow SFmode and DImode values to be tied, even though
both of them occupy a single GPR and FPR, and even though we allow
both classes to change between the modes.

The *aarch64_bfidi_subreg_ pattern is
especially ugly, but it's not clear what target-independent
code ought to simplify it to, if it was going to simplify it.

We should probably do the same thing for extractions, but that's left
as future work.

After the patch we generate:

ins v0.s[1], v1.s[0]
ins v2.s[1], v3.s[0]
fadd v0.2s, v0.2s, v2.2s
fmov x0, d0
ushr d1, d0, 32
lsr w0, w0, 0
fmov s0, w0
ret

which seems like a step in the right direction.

All in all, there's nothing elegant about this patch.  It just
seems like the least worst option.

Tested on aarch64-linux-gnu and aarch64_be-elf (including ILP32).
Pushed to trunk.

Richard


gcc/
PR target/109632
* config/aarch64/aarch64.cc (aarch64_modes_tieable_p): Allow
subregs between any scalars that are 64 bits or smaller.
* config/aarch64/iterators.md (SUBDI_BITS): New int iterator.
(bits_etype): New int attribute.
* config/aarch64/aarch64.md (*insv_reg_)
(*aarch64_bfi_): New patterns.
(*aarch64_bfidi_subreg_): Likewise.

gcc/testsuite/
* gcc.target/aarch64/ins_bitfield_1.c: New test.
* gcc.target/aarch64/ins_bitfield_2.c: Likewise.
* gcc.target/aarch64/ins_bitfield_3.c: Likewise.
* gcc.target/aarch64/ins_bitfield_4.c: Likewise.
* gcc.target/aarch64/ins_bitfield_5.c: Likewise.
* gcc.target/aarch64/ins_bitfield_6.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc |  12 ++
 gcc/config/aarch64/aarch64.md |  62 +++
 gcc/config/aarch64/iterators.md   |   4 +
 .../gcc.target/aarch64/ins_bitfield_1.c   | 142 
 .../gcc.target/aarch64/ins_bitfield_2.c   | 142 
 .../gcc.target/aarch64/ins_bitfield_3.c   | 156 ++
 .../gcc.target/aarch64/ins_bitfield_4.c   | 156 ++
 .../gcc.target/aarch64/ins_bitfield_5.c   | 139 
 .../gcc.target/aarch64/ins_bitfield_6.c   | 139 
 9 files changed, 952 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_2.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_3.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_4.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_5.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ins_bitfield_6.c

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index d6fc94015fa..146c2ad4988 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24827,6 +24827,18 @@ aarch64_modes_tieable_p (machine_mode mode1, 
machine_mode mode2)
   if (GET_MODE_CLASS (mode1) == GET_MODE_CLASS (mode2))
 return true;
 
+  /* Allow changes between scalar modes if both modes fit within 64 bits.
+ This is because:
+
+ - We allow all such modes for both FPRs and GPRs.
+ - They occupy a single register for both FPRs and GPRs.
+ - We can 

RE: Re: [PATCH V2] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread Li, Pan2 via Gcc-patches
Committed, thanks Kito.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Kito Cheng via Gcc-patches
Sent: Tuesday, May 23, 2023 5:57 PM
To: 钟居哲 
Cc: Robin Dapp ; gcc-patches ; 
Kito.cheng ; palmer ; palmer 
; jeffreyalaw 
Subject: Re: Re: [PATCH V2] RISC-V: Refactor the framework of RVV 
auto-vectorization

Lgtm, we can always improve later; I don't intend to block things either :)

juzhe.zh...@rivai.ai wrote on Tue, May 23, 2023 at 17:46:

> Oh, Thanks.
> Let's wait for Kito's final approved.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Robin Dapp
> Date: 2023-05-23 17:44
> To: juzhe.zhong; gcc-patches
> CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
> Subject: Re: [PATCH V2] RISC-V: Refactor the framework of RVV 
> auto-vectorization Hi Juzhe,
>
> thanks, IMHO it's clearer with the changes now.  There are still 
> things that could be improved but it is surely an improvement over 
> what we currently have.  Therefore I'd vote to go ahead so we can 
> continue with more expanders and changes.
>
> Still, we should be prepared for more refactoring changes in the future.
>
> Regards
> Robin
>
>


[PATCH] RISC-V: Fix warning of vxrm pattern

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

I just noticed the warning:
../../../riscv-gcc/gcc/config/riscv/vector.md:618:1: warning: source missing a 
mode?

gcc/ChangeLog:

* config/riscv/vector.md: Add mode.

---
 gcc/config/riscv/vector.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index ac244430970..13b94862693 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -617,7 +617,7 @@
 ;; Set VXRM
 (define_insn "vxrmsi"
   [(set (reg:SI VXRM_REGNUM)
-   (match_operand 0 "const_int_operand" "i"))]
+   (match_operand:SI 0 "const_int_operand" "i"))]
   "TARGET_VECTOR"
   "csrwi\tvxrm,%0"
   [(set_attr "type" "wrvxrm")
-- 
2.36.3



[PATCH][committed] aarch64: PR target/109855 Add predicate and constraints to define_subst in aarch64-simd.md

2023-05-23 Thread Kyrylo Tkachov via Gcc-patches
Hi all,

In this PR we ICE because the substituted pattern for mla "lost" its predicate 
and constraint for operand 0
because the define_subst template:
  [(set (match_operand: 0)
(vec_concat:
 (match_dup 1)
 (match_operand:VDZ 2 "aarch64_simd_or_scalar_imm_zero")))])

Uses match_operand instead of match_dup for operand 0. We can't use match_dup 0 
for it because we need to specify the widened mode.
The problem is fixed by adding a "register_operand" predicate and "=w" 
constraint to the match_operand.
This makes sense conceptually too as the transformation we're targeting only 
applies to instructions that write a "w" register.
With this change the mddump pattern that ICEs goes from:
(define_insn ("aarch64_mlav4hi_vec_concatz_le")
 [
(set (match_operand:V8HI 0 ("") ("")) <<-- Missing constraint!
(vec_concat:V8HI (plus:V4HI (mult:V4HI (match_operand:V4HI 2 
("register_operand") ("w"))
(match_operand:V4HI 3 ("register_operand") ("w")))
(match_operand:V4HI 1 ("register_operand") ("0")))
(match_operand:V4HI 4 ("aarch64_simd_or_scalar_imm_zero") 
(""
] ("(!BYTES_BIG_ENDIAN) && (TARGET_SIMD)") ("mla\t%0.4h, %2.4h, %3.4h")

to the proper:
(define_insn ("aarch64_mlav4hi_vec_concatz_le")
 [
(set (match_operand:V8HI 0 ("register_operand") ("=w")) << 
Constraint in the right place
(vec_concat:V8HI (plus:V4HI (mult:V4HI (match_operand:V4HI 2 
("register_operand") ("w"))
(match_operand:V4HI 3 ("register_operand") ("w")))
(match_operand:V4HI 1 ("register_operand") ("0")))
(match_operand:V4HI 4 ("aarch64_simd_or_scalar_imm_zero") 
(""
] ("(!BYTES_BIG_ENDIAN) && (TARGET_SIMD)") ("mla\t%0.4h, %2.4h, %3.4h")

This seems to do the right thing for multi-alternative patterns as well, the 
annotated pattern for aarch64_cmltv8qi is:
(define_insn ("aarch64_cmltv8qi")
 [
(set (match_operand:V8QI 0 ("register_operand") ("=w,w"))
(neg:V8QI (lt:V8QI (match_operand:V8QI 1 ("register_operand") 
("w,w"))
(match_operand:V8QI 2 ("aarch64_simd_reg_or_zero") 
("w,ZDz")
]

whereas the substituted version now looks like:
(define_insn ("aarch64_cmltv8qi_vec_concatz_le")
 [
(set (match_operand:V16QI 0 ("register_operand") ("=w,w"))
(vec_concat:V16QI (neg:V8QI (lt:V8QI (match_operand:V8QI 1 
("register_operand") ("w,w"))
(match_operand:V8QI 2 ("aarch64_simd_reg_or_zero") 
("w,ZDz"
(match_operand:V8QI 3 ("aarch64_simd_or_scalar_imm_zero") 
(""
]

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

PR target/109855
* config/aarch64/aarch64-simd.md (add_vec_concat_subst_le): Add 
predicate
and constraint for operand 0.
(add_vec_concat_subst_be): Likewise.

gcc/testsuite/ChangeLog:

PR target/109855
* gcc.target/aarch64/pr109855.c: New test.


subst-pred.patch
Description: subst-pred.patch


Re: Re: [PATCH V2] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread Kito Cheng via Gcc-patches
Lgtm, we can always improve later; I don't intend to block things either :)

juzhe.zh...@rivai.ai wrote on Tue, May 23, 2023 at 17:46:

> Oh, Thanks.
> Let's wait for Kito's final approved.
>
>
>
> juzhe.zh...@rivai.ai
>
> From: Robin Dapp
> Date: 2023-05-23 17:44
> To: juzhe.zhong; gcc-patches
> CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
> Subject: Re: [PATCH V2] RISC-V: Refactor the framework of RVV
> auto-vectorization
> Hi Juzhe,
>
> thanks, IMHO it's clearer with the changes now.  There are still
> things that could be improved but it is surely an improvement over
> what we currently have.  Therefore I'd vote to go ahead so we can
> continue with more expanders and changes.
>
> Still, we should be prepared for more refactoring changes in the future.
>
> Regards
> Robin
>
>


[PATCH] tree-optimization/109849 - missed code hoisting

2023-05-23 Thread Richard Biener via Gcc-patches
The following fixes code hoisting to properly consider ANTIC_OUT instead
of ANTIC_IN.  That's a bit expensive to re-compute but since we no
longer iterate we're doing this only once per BB which should be
acceptable.  This avoids missing hoistings to the end of blocks where
something in the block clobbers the hoisted value.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/109849
* tree-ssa-pre.cc (do_hoist_insertion): Compute ANTIC_OUT
and use that to determine what to hoist.

* gcc.dg/tree-ssa/ssa-hoist-8.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c | 22 ++
 gcc/tree-ssa-pre.cc | 48 ++---
 2 files changed, 64 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c
new file mode 100644
index 000..66bb48e0dc1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-hoist-8.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-pre-stats" } */
+
+int mem;
+void foo ();
+int bar (int flag)
+{
+  int res;
+  foo ();
+  /* Hoist the load of mem here even though foo () clobbers it.  */
+  if (flag)
+res = mem;
+  else
+{
+  res = mem;
+  mem = 2;
+}
+  return res;
+}
+
+/* { dg-final { scan-tree-dump "HOIST inserted: 1" "pre" } } */
+/* { dg-final { scan-tree-dump-times " = mem;" 1 "pre" } } */
diff --git a/gcc/tree-ssa-pre.cc b/gcc/tree-ssa-pre.cc
index 1f7eea93c16..d56431b4145 100644
--- a/gcc/tree-ssa-pre.cc
+++ b/gcc/tree-ssa-pre.cc
@@ -3622,19 +3622,51 @@ do_hoist_insertion (basic_block block)
   && stmt_ends_bb_p (gsi_stmt (last)))
 return false;
 
-  /* Compute the set of hoistable expressions from ANTIC_IN.  First compute
+  /* We have multiple successors, compute ANTIC_OUT by taking the intersection
+ of all of ANTIC_IN translating through PHI nodes.  Note we do not have to
+ worry about iteration stability here so just intersect the expression sets
+ as well.  This is a simplification of what we do in compute_antic_aux.  */
+  bitmap_set_t ANTIC_OUT = bitmap_set_new ();
+  bool first = true;
+  FOR_EACH_EDGE (e, ei, block->succs)
+{
+  if (first)
+   {
+ phi_translate_set (ANTIC_OUT, ANTIC_IN (e->dest), e);
+ first = false;
+   }
+  else if (!gimple_seq_empty_p (phi_nodes (e->dest)))
+   {
+ bitmap_set_t tmp = bitmap_set_new ();
+ phi_translate_set (tmp, ANTIC_IN (e->dest), e);
+ bitmap_and_into (_OUT->values, >values);
+ bitmap_and_into (_OUT->expressions, >expressions);
+ bitmap_set_free (tmp);
+   }
+  else
+   {
+ bitmap_and_into (_OUT->values, _IN (e->dest)->values);
+ bitmap_and_into (_OUT->expressions,
+  _IN (e->dest)->expressions);
+   }
+}
+
+  /* Compute the set of hoistable expressions from ANTIC_OUT.  First compute
  hoistable values.  */
   bitmap_set hoistable_set;
 
-  /* A hoistable value must be in ANTIC_IN(block)
+  /* A hoistable value must be in ANTIC_OUT(block)
  but not in AVAIL_OUT(BLOCK).  */
   bitmap_initialize (_set.values, _bitmap_obstack);
   bitmap_and_compl (_set.values,
-   _IN (block)->values, _OUT (block)->values);
+   _OUT->values, _OUT (block)->values);
 
   /* Short-cut for a common case: hoistable_set is empty.  */
   if (bitmap_empty_p (_set.values))
-return false;
+{
+  bitmap_set_free (ANTIC_OUT);
+  return false;
+}
 
   /* Compute which of the hoistable values is in AVAIL_OUT of
  at least one of the successors of BLOCK.  */
@@ -3652,11 +3684,14 @@ do_hoist_insertion (basic_block block)
 
   /* Short-cut for a common case: availout_in_some is empty.  */
   if (bitmap_empty_p (_in_some))
-return false;
+{
+  bitmap_set_free (ANTIC_OUT);
+  return false;
+}
 
   /* Hack hoitable_set in-place so we can use sorted_array_from_bitmap_set.  */
   bitmap_move (_set.values, _in_some);
-  hoistable_set.expressions = ANTIC_IN (block)->expressions;
+  hoistable_set.expressions = ANTIC_OUT->expressions;
 
   /* Now finally construct the topological-ordered expression set.  */
   vec exprs = sorted_array_from_bitmap_set (_set);
@@ -3731,6 +3766,7 @@ do_hoist_insertion (basic_block block)
 }
 
   exprs.release ();
+  bitmap_set_free (ANTIC_OUT);
 
   return new_stuff;
 }
-- 
2.35.3


Re: Re: [PATCH V2] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
Oh, Thanks.
Let's wait for Kito's final approved.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-23 17:44
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Refactor the framework of RVV auto-vectorization
Hi Juzhe,
 
thanks, IMHO it's clearer with the changes now.  There are still
things that could be improved but it is surely an improvement over
what we currently have.  Therefore I'd vote to go ahead so we can
continue with more expanders and changes.
 
Still, we should be prepared for more refactoring changes in the future.
 
Regards
Robin
 


Re: [PATCH V2] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

thanks, IMHO it's clearer with the changes now.  There are still
things that could be improved but it is surely an improvement over
what we currently have.  Therefore I'd vote to go ahead so we can
continue with more expanders and changes.

Still, we should be prepared for more refactoring changes in the future.

Regards
 Robin


[PATCH] testsuite, analyzer: Fix testcases with fclose

2023-05-23 Thread Christophe Lyon via Gcc-patches
The gcc.dg/analyzer/data-model-4.c and
gcc.dg/analyzer/torture/conftest-1.c tests fail with recent glibc headers
and succeed with older headers.

The new error message is:
warning: use of possibly-NULL 'f' where non-null expected [CWE-690] 
[-Wanalyzer-possible-null-argument]

Like similar previous fixes in this area, this patch updates the
testcase so that this warning isn't reported.

2023-05-23  Christophe Lyon  

gcc/testsuite/
* gcc.dg/analyzer/data-model-4.c: Exit if fopen returns NULL.
* gcc.dg/analyzer/torture/conftest-1.c: Likewise.
---
 gcc/testsuite/gcc.dg/analyzer/data-model-4.c   | 2 ++
 gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/analyzer/data-model-4.c 
b/gcc/testsuite/gcc.dg/analyzer/data-model-4.c
index 33f90871dfb..d41868d6dbc 100644
--- a/gcc/testsuite/gcc.dg/analyzer/data-model-4.c
+++ b/gcc/testsuite/gcc.dg/analyzer/data-model-4.c
@@ -8,6 +8,8 @@ int
 main ()
 {
   FILE *f = fopen ("conftest.out", "w");
+  if (f == NULL)
+return 1;
   return ferror (f) || fclose (f) != 0;
 
   ;
diff --git a/gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c 
b/gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c
index 0cf85f0ebe1..9631bcf73e0 100644
--- a/gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c
+++ b/gcc/testsuite/gcc.dg/analyzer/torture/conftest-1.c
@@ -3,6 +3,8 @@ int
 main ()
 {
   FILE *f = fopen ("conftest.out", "w");
+  if (f == NULL)
+return 1;
   return ferror (f) || fclose (f) != 0;
 
   ;
-- 
2.34.1



Re: [PATCH] libatomic: Provide gthr.h default implementation

2023-05-23 Thread Sebastian Huber

On 10.01.23 16:38, Sebastian Huber wrote:

On 19/12/2022 17:02, Sebastian Huber wrote:

Build libatomic for all targets.  Use gthr.h to provide a default
implementation.  If the thread model is "single", then this 
implementation will

not work if for example atomic operations are used for thread/interrupt
synchronization.


Is this and the related -fprofile-update=atomic patch something for GCC 14?


Now that the GCC 14 development is in progress, what about this patch?

--
embedded brains GmbH
Herr Sebastian HUBER
Dornierstr. 4
82178 Puchheim
Germany
email: sebastian.hu...@embedded-brains.de
phone: +49-89-18 94 741 - 16
fax:   +49-89-18 94 741 - 08

Commercial register: Amtsgericht München
Registration number: HRB 157899
Authorized managing directors: Peter Rasmussen, Thomas Dörfler
Our privacy policy can be found here:
https://embedded-brains.de/datenschutzerklaerung/


Re: [C PATCH v3] Fix ICEs related to VM types in C 2/2

2023-05-23 Thread Martin Uecker via Gcc-patches
On Tuesday, 2023-05-23 at 10:18 +0200, Richard Biener wrote:
> On Tue, May 23, 2023 at 8:24 AM Martin Uecker 
> wrote:
> > 
> > > On Tuesday, 2023-05-23 at 08:13 +0200, Richard Biener wrote:
> > > On Mon, May 22, 2023 at 7:24 PM Martin Uecker via Gcc-patches
> > >  wrote:
> > > > 
> > > > 
> > > > 
> > > > This version contains the middle-end changes for PR109450
> > > > and test cases as before.  The main middle-end change is that
> > > > we use gimplify_type_sizes also for parameters and remove
> > > > the special code that also walked into pointers (which is
> > > > incorrect).
> > > > 
> > > > In addition, in the C FE this patch now also adds DECL_EXPR
> > > > for vm-types which are pointed-to by parameters declared
> > > > as arrays.  The new function created contains the exact
> > > > code previously used only for regular pointers, and is
> > > > now also called for parameters declared as arrays.
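> > > > 
> > > > For illustration, a hypothetical declaration of the affected kind
> > > > (not taken from the new tests): the parameter is declared as an
> > > > array, and its element type points to a variably-modified array:
> > > > 
> > > >   /* 'a' is an array of pointers to double[n], a VM type.  */
> > > >   void f (int n, double (*a[4])[n]);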
> > > > 
> > > > 
> > > > Martin
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > > 
> > > >     Fix ICEs related to VM types in C 2/2 [PR109450]
> > > > 
> > > >     Size expressions were sometimes lost and not gimplified correctly,
> > > >     leading to ICEs and incorrect evaluation order.  Fix this by 1) not
> > > >     recursing pointers when gimplifying parameters in the middle-end
> > > >     (the code is merged with gimplify_type_sizes), which is incorrect
> > > >     because it might access variables declared later for incomplete
> > > >     structs, and 2) adding a decl expr for variably-modified arrays
> > > >     that are pointed to by parameters declared as arrays.
> > > > 
> > > >     PR c/109450
> > > > 
> > > >     gcc/
> > > >     * c/c-decl.cc (add_decl_expr): New function.
> > > >     (grokdeclarator): Add decl expr for size expression in
> > > >     types pointed to by parameters declared as arrays.
> > > >     * function.cc (gimplify_parm_type): Remove function.
> > > >     (gimplify_parameters): Call gimplify_parm_sizes.
> > > >     * gimplify.cc (gimplify_type_sizes): Make function static.
> > > >     (gimplify_parm_sizes): New function.
> > > > 
> > > >     gcc/testsuite/
> > > >     * gcc.dg/pr109450-1.c: New test.
> > > >     * gcc.dg/pr109450-2.c: New test.
> > > >     * gcc.dg/vla-26.c: New test.
> > > > 
> > > > diff --git a/gcc/c/c-decl.cc b/gcc/c/c-decl.cc
> > > > index 494d3cf1747..c35347734b2 100644
> > > > --- a/gcc/c/c-decl.cc
> > > > +++ b/gcc/c/c-decl.cc
> > > > @@ -6490,6 +6490,55 @@ smallest_type_quals_location (const location_t *locations,
> > > >    return loc;
> > > >  }
> > > > 
> > > > +
> > > > +/* We attach an artificial TYPE_DECL to pointed-to type
> > > > +   and arrange for it to be included in a DECL_EXPR.  This
> > > > +   forces the sizes evaluation at a safe point and ensures it
> > > > +   is not deferred until e.g. within a deeper conditional context.
> > > > +
> > > > +   PARM contexts have no enclosing statement list that
> > > > +   can hold the DECL_EXPR, so we need to use a BIND_EXPR
> > > > +   instead, and add it to the list of expressions that
> > > > +   need to be evaluated.
> > > > +
> > > > +   TYPENAME contexts do have an enclosing statement list,
> > > > +   but it would be incorrect to use it, as the size should
> > > > +   only be evaluated if the containing expression is
> > > > +   evaluated.  We might also be in the middle of an
> > > > +   expression with side effects on the pointed-to type size
> > > > +   "arguments" prior to the pointer declaration point and
> > > > +   the fake TYPE_DECL in the enclosing context would force
> > > > +   the size evaluation prior to the side effects.  We therefore
> > > > +   use BIND_EXPRs in TYPENAME contexts too.  */
> > > > +static void
> > > > +add_decl_expr (location_t loc, enum decl_context decl_context,
> > > > +               tree type, tree *expr)
> > > > +{
> > > > +  tree bind = NULL_TREE;
> > > > +  if (decl_context == TYPENAME || decl_context == PARM
> > > > +      || decl_context == FIELD)
> > > > +    {
> > > > +      bind = build3 (BIND_EXPR, void_type_node, NULL_TREE,
> > > > +                     NULL_TREE, NULL_TREE);
> > > > +      TREE_SIDE_EFFECTS (bind) = 1;
> > > > +      BIND_EXPR_BODY (bind) = push_stmt_list ();
> > > > +      push_scope ();
> > > > +    }
> > > > +
> > > > +  tree decl = build_decl (loc, TYPE_DECL, NULL_TREE, type);
> > > > +  pushdecl (decl);
> > > > +  DECL_ARTIFICIAL (decl) = 1;
> > > > +  add_stmt (build_stmt (DECL_SOURCE_LOCATION (decl), DECL_EXPR, decl));
> > > > +  TYPE_NAME (type) = decl;
> > > > +
> > > > +  if (bind)
> > > > +    {
> > > > +      pop_scope ();
> > > > +      BIND_EXPR_BODY (bind) = pop_stmt_list (BIND_EXPR_BODY (bind));
> > > > +      if (*expr)
> > > > +       *expr = build2 (COMPOUND_EXPR, void_type_node, *expr, bind);
> > > > +   

Re: Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
Yeah. I know. 
Like ARM does everywhere:
(define_expand "vcond"
  [(set (match_operand:SVE_ALL 0 "register_operand")
  (if_then_else:SVE_ALL
(match_operator 3 "comparison_operator"
  [(match_operand:SVE_I 4 "register_operand")
   (match_operand:SVE_I 5 "nonmemory_operand")])
(match_operand:SVE_ALL 1 "nonmemory_operand")
(match_operand:SVE_ALL 2 "nonmemory_operand")))]
  "TARGET_SVE &&  == "
  {
aarch64_expand_sve_vcond (mode, mode, operands);
DONE;
  }
)

passing "operands" looks codes much cleaner.

Hi, kito. Could you take a look at the V2 refactor patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619291.html 
This is important for us since we can't post more autovec patches without the
refactoring patch.

Thanks


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-23 16:45
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; gcc-patches; Kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: Re: [PATCH] RISC-V: Refactor the framework of RVV 
auto-vectorization
> ARM uses rtx operands[] in many places and I personally prefer this way since
> it makes the code much cleaner.
> I dislike making a function take each operand as a separate argument, like
> this:
> void func (rtx dest, rtx src1, rtx src2, ...)
> If we do this, we will need to keep adding helpers forever...
 
Don't forget we are using C++, so we have function overloading or
default arguments :)
 


[PATCH V2] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread juzhe . zhong
From: Juzhe-Zhong 

This patch refactors the framework of RVV auto-vectorization.  We found that
we keep adding helpers && wrappers when implementing auto-vectorization,
which makes the RVV auto-vectorization code very messy.

After double-checking my downstream RVV GCC and assembling all the
auto-vectorization patterns we are going to have, I refactored the RVV
framework based on this information to make it easier and more flexible
for future use.

For example, we will definitely implement len_mask_load/len_mask_store
patterns, which have both a length && a mask operand and use an undefined
merge operand.

len_cond_div or cond_div will have a length or mask operand and use a real
merge operand instead of an undefined merge operand.

Also, some patterns will use tail undisturbed and mask any.

etc.  We will definitely have various features.

Based on these circumstances, we add the following private members:
  
  int m_op_num;
  /* It's true when the pattern has a dest operand.  Most of the patterns
     have a dest operand, whereas some patterns like STOREs do not.  */
  bool m_has_dest_p;
  bool m_fully_unmasked_p;
  bool m_use_real_merge_p;
  bool m_has_avl_p;
  bool m_vlmax_p;
  bool m_has_tail_policy_p;
  bool m_has_mask_policy_p;
  enum tail_policy m_tail_policy;
  enum mask_policy m_mask_policy;
  machine_mode m_dest_mode;
  machine_mode m_mask_mode;

I believe these variables can cover all potential situations.

The instruction generator wrapper is "emit_insn", which adds the operands and
emits the instruction according to the variables mentioned above.

After this is done, we can easily add helpers without changing the base class
"insn_expander".

Currently, we have "emit_vlmax_tany_many" and "emit_nonvlmax_tany_many".

For example, when we want to emit a binary operation, we have:
#define RVV_BINOP_NUM 3 (number including the output)

Then we just use emit_vlmax_tany_many (...RVV_BINOP_NUM...).

So, if we support a ternary operation in the future, it's quite simple:
#define RVV_TERNOP_NUM 4 (number including the output)
emit_vlmax_tany_many (...RVV_TERNOP_NUM...)

"*_tany_many" means we are using tail any and mask any.

We will definitely need tail undisturbed or mask undisturbed when we support
these patterns in the middle-end.  It's very simple to extend such a helper
based on the current framework; we can do that in the future like this:

void
emit_nonvlmax_tu_mu (unsigned icode, int op_num, rtx *ops)
{
  machine_mode data_mode = GET_MODE (ops[0]);
  machine_mode mask_mode = get_mask_mode (data_mode).require ();
  /* The number = 11 is because we have maximum 11 operands for
 RVV instruction patterns according to vector.md.  */
  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
   /*USE_ALL_TRUES_MASK_P*/ true,
   /*USE_UNDEF_MERGE_P*/ true, /*HAS_AVL_P*/ true,
   /*VLMAX_P*/ false,
   /*HAS_TAIL_POLICY_P*/ true, /*HAS_MASK_POLICY_P*/ true,
   /*TAIL_POLICY*/ TAIL_UNDISTURBED, /*MASK_POLICY*/ 
MASK_UNDISTURBED,
   /*DEST_MODE*/ data_mode, /*MASK_MODE*/ mask_mode);
  e.emit_insn ((enum insn_code) icode, ops);
}

That's enough (I have tested it fully in my downstream RVV GCC).
I didn't add it in this patch.

Thanks.

gcc/ChangeLog:

* config/riscv/autovec.md: Refactor the framework of RVV 
auto-vectorization.
* config/riscv/riscv-protos.h (RVV_MISC_OP_NUM): Ditto.
(RVV_UNOP_NUM): New macro.
(RVV_BINOP_NUM): Ditto.
(legitimize_move): Refactor the framework of RVV auto-vectorization.
(emit_vlmax_op): Ditto.
(emit_vlmax_reg_op): Ditto.
(emit_len_op): Ditto.
(emit_len_binop): Ditto.
(emit_vlmax_tany_many): Ditto.
(emit_nonvlmax_tany_many): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
* config/riscv/riscv-v.cc (emit_pred_op): Ditto.
(emit_pred_binop): Ditto.
(emit_vlmax_op): Ditto.
(emit_vlmax_tany_many): New function.
(emit_len_op): Remove.
(emit_nonvlmax_tany_many): New function.
(emit_vlmax_reg_op): Remove.
(emit_len_binop): Ditto.
(emit_index_op): Ditto.
(expand_vec_series): Refactor the framework of RVV auto-vectorization.
(expand_const_vector): Ditto.
(legitimize_move): Ditto.
(sew64_scalar_helper): Ditto.
(expand_tuple_move): Ditto.
(expand_vector_init_insert_elems): Ditto.
* config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
* config/riscv/vector.md: Ditto.

---
 gcc/config/riscv/autovec.md |  40 +---
 gcc/config/riscv/riscv-protos.h |  19 +-
 gcc/config/riscv/riscv-v.cc | 354 ++--
 gcc/config/riscv/riscv.cc   |   8 +-
 gcc/config/riscv/vector.md  |  40 +---
 5 files changed, 232 insertions(+), 229 deletions(-)

diff --git a/gcc/config/riscv/autovec.md 

Re: Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread Kito Cheng via Gcc-patches
> ARM uses rtx operands[] in many places and I personally prefer this way since
> it makes the code much cleaner.
> I dislike making a function take each operand as a separate argument, like
> this:
> void func (rtx dest, rtx src1, rtx src2, ...)
> If we do this, we will need to keep adding helpers forever...

Don't forget we are using C++, so we have function overloading or
default arguments :)


Re: Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread juzhe.zh...@rivai.ai

Hi, Robin.

>> Why does a store not have a destination (as commented below)?
OK, V2 patch will have more comments.

>> m_all_unmasked_p or m_fully_unmasked_p?
OK.

>> Apart from the insn-centric name, couldn't we also decide this
>> based on the context later?  In the vector-builtins.cc we have
>> use_real_mask_p and use_real_merge_p that do this.
Ok. V2 will follow the builtin framework.

>> This means "has avl operand" I suppose?  From the caller's point
>> of view (and also the vsetvl pass) something like "needs avl" or so
>> would be more descriptive but undecided here.
Ok.

>> Do we need to expose these in the constructor?  As far as I can
>> tell we can decide later whether the instruction has a policy
>> or not (as I did in my patch, depending on whether all inputs
>> are masks or so).

Maybe we can add helpers to set the policies. I will send V2 and let you see.

>> Having the mask mode be automatically deduced from the destination
>> is good, it was just obnoxious before to pass ...
Ok

>> I don't particularly like the names ;) Going back to vlmax and
>> nonvlmax I don't mind but do we really need to have the policies
>> encoded in the name now?  Especially since "many" is a word and
>> the default is ANY anyway.  Why not emit_vlmax_insn/emit_vlmax_op
>> for now and add the tu/mu later?

Ok

>> You can just drop the "The number = 11 is because" and say
>> "We have a maximum of 11 operands for...".
>> The eleven arguments seem a bit clunky here ;)  I would suggest
>> changing this again in the future bur for now let's just go ahead
>> with it in order to make progress.
Ok

>> The rtx operands[] array I like least of the changes in this patch.
>> It's essentially an untyped array whose meaning is dependent on context
>> containing source operands and the length that is sometimes empty and
>> sometimes not.  I can't think of something that wouldn't complicate things
>> though but before we at least had functions called _len that would take
>> a length (NULL or not) and _vlmax that wouldn't.  It's pretty easy to mess
>> up here on the caller's side.

ARM uses rtx operands[] in many places and I personally prefer this way since
it makes the code much cleaner.
I dislike making a function take each operand as a separate argument, like this:
void func (rtx dest, rtx src1, rtx src2, ...)
If we do this, we will need to keep adding helpers forever...

Sending V2 patch soon.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-23 16:06
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization
Hi Juzhe,
 
in general I find the revised structure quite logical and it is definitely
an improvement.  Some abstractions are still a bit leaky but we can always
refactor "on the fly".  Some comments on the general parts, skipping
over the later details.
 
>   bool m_has_dest_p;
 
Why does a store not have a destination (as commented below)?
 
>   /* It't true if the pattern uses all trues mask operand.  */
>   bool m_use_all_trues_mask_p;
 
m_all_unmasked_p or m_fully_unmasked_p?
 
>   /* It's true if the pattern uses undefined merge operand.  */
>   bool m_use_undef_merge_p;
 
Apart from the insn-centric name, couldn't we also decide this
based on the context later?  In the vector-builtins.cc we have
use_real_mask_p and use_real_merge_p that do this.
 
>   bool m_has_avl_p;
 
This means "has avl operand" I suppose?  From the caller's point
of view (and also the vsetvl pass) something like "needs avl" or so
would be more descriptive but undecided here.
 
>   bool m_vlmax_p;
>   bool m_has_tail_policy_p;
>   bool m_has_mask_policy_p;
 
Do we need to expose these in the constructor?  As far as I can
tell we can decide later whether the instruction has a policy
or not (as I did in my patch, depending on whether all inputs
are masks or so).
 
>   enum tail_policy m_tail_policy;
>   enum mask_policy m_mask_policy;
 
>   machine_mode m_dest_mode;
>   machine_mode m_mask_mode;
 
Having the mask mode be automatically deduced from the destination
is good, it was just obnoxious before to pass ...
 
> Currently, we have "emit_vlmax_tany_many" and "emit_nonvlmax_tany_many".
 
I don't particularly like the names ;) Going back to vlmax and
nonvlmax I don't mind but do we really need to have the policies
encoded in the name now?  Especially since "many" is a word and
the default is ANY anyway.  Why not emit_vlmax_insn/emit_vlmax_op
for now and add the tu/mu later?
> #define RVV_BINOP_NUM 3 (number including the output)
 
Could make this into an "instruction type" rather than just a
number  (i.e. RVV_BINOP) and then set the number of operands
internally according to the type.  This would also make it clearer
in case we later want to set other options depending on the type.
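 
Roughly (an invented sketch; the names are placeholders, not from the patch):
 
enum rvv_insn_type { RVV_UNOP, RVV_BINOP, RVV_TERNOP };
 
static int
rvv_insn_type_op_num (enum rvv_insn_type type)
{
  switch (type)
    {
    case RVV_UNOP:   return 2; /* output + 1 input.  */
    case RVV_BINOP:  return 3; /* output + 2 inputs.  */
    case RVV_TERNOP: return 4; /* output + 3 inputs.  */
    default: gcc_unreachable ();
    }
}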
> Then just use emit_vlmax_tany_many (...RVV_BINOP_NUM...)
> 
> So, if we support ternary operation in the future. It's quite simple:
> #define 
