Re: [PATCH v4] c++: Check for indirect change of active union member in constexpr [PR101631,PR102286]

2023-09-22 Thread Jonathan Wakely
On Sat, 23 Sept 2023, 01:39 Nathaniel Shead via Libstdc++, <
libstd...@gcc.gnu.org> wrote:

> Now that bootstrap has finished, I have gotten regressions in the
> following libstdc++ tests:
>
> Running libstdc++:libstdc++-dg/conformance.exp ...
> FAIL: 20_util/bitset/access/constexpr.cc -std=gnu++23 (test for excess errors)
> FAIL: 20_util/bitset/access/constexpr.cc -std=gnu++26 (test for excess errors)
> FAIL: 20_util/variant/constexpr.cc -std=gnu++20 (test for excess errors)
> FAIL: 20_util/variant/constexpr.cc -std=gnu++26 (test for excess errors)
> FAIL: 21_strings/basic_string/cons/char/constexpr.cc -std=gnu++20 (test for excess errors)
> FAIL: 21_strings/basic_string/cons/char/constexpr.cc -std=gnu++26 (test for excess errors)
> FAIL: 21_strings/basic_string/cons/wchar_t/constexpr.cc -std=gnu++20 (test for excess errors)
> FAIL: 21_strings/basic_string/cons/wchar_t/constexpr.cc -std=gnu++26 (test for excess errors)
> FAIL: 21_strings/basic_string/modifiers/swap/constexpr-wchar_t.cc -std=gnu++20 (test for excess errors)
> FAIL: 21_strings/basic_string/modifiers/swap/constexpr-wchar_t.cc -std=gnu++26 (test for excess errors)
> FAIL: 21_strings/basic_string/modifiers/swap/constexpr.cc -std=gnu++20 (test for excess errors)
> FAIL: 21_strings/basic_string/modifiers/swap/constexpr.cc -std=gnu++26 (test for excess errors)
> FAIL: std/ranges/adaptors/join_with/1.cc -std=gnu++23 (test for excess errors)
> UNRESOLVED: std/ranges/adaptors/join_with/1.cc -std=gnu++23 compilation failed to produce executable
> FAIL: std/ranges/adaptors/join_with/1.cc -std=gnu++26 (test for excess errors)
> UNRESOLVED: std/ranges/adaptors/join_with/1.cc -std=gnu++26 compilation failed to produce executable
>
> On investigation though it looks like the issue might be with libstdc++
> rather than the patch itself; running the failing tests using clang with
> libstdc++ also produces similar errors, and my reading of the code
> suggests that this is correct.
>
> What's the way forward here? Should I look at creating a patch to fix
> the libstdc++ issues before resubmitting this patch for the C++
> frontend? Or should I submit a version of this patch without the
> `std::construct_at` changes and wait till libstdc++ gets fixed for that?
>

I think we should fix libstdc++. There are probably only a few places that
need a fix, which cause all those failures.

I can help with those fixes. I'll look into it after the weekend.
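
For reference, the distinction the patch description quoted below draws
between `u.a = ` and `*(&u.a) = ` can be sketched roughly as follows.  This
is only an illustration: the names U and switch_directly are hypothetical
and are not taken from the patch or its testcases, and the macro merely
keeps the snippet buildable with pre-C++20 dialects.

```cpp
// Illustrative sketch; U and switch_directly are hypothetical names.
#if __cplusplus >= 202002L
#define CONSTEXPR_SINCE_CXX20 constexpr
#else
#define CONSTEXPR_SINCE_CXX20 inline
#endif

union U {
  int a;
  float b;
};

// Assigning through a member access expression may change the active
// member; since C++20 this is also allowed during constant evaluation.
CONSTEXPR_SINCE_CXX20 float switch_directly() {
  U u{1};      // aggregate initialisation makes 'a' the active member
  u.b = 2.0f;  // 'u.b = ...' makes 'b' the active member
  return u.b;
}

// The same store written indirectly is not a member access expression,
// so it does not activate 'b'.  This is the kind of code the patch is
// meant to diagnose in a constant expression:
//
//   constexpr float switch_indirectly() {
//     U u{1};
//     float *p = &u.b;
//     *p = 2.0f;   // 'b' is not the active member here
//     return u.b;
//   }

#if __cplusplus >= 202002L
static_assert(switch_directly() == 2.0f, "direct switch is constant");
#endif
```

The commented-out variant is left uncompiled on purpose, since whether and
how it is rejected depends on the compiler version in use.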



> On Sat, Sep 23, 2023 at 01:01:20AM +1000, Nathaniel Shead wrote:
> > On Fri, Sep 22, 2023 at 02:21:15PM +0100, Jason Merrill wrote:
> > > On 9/21/23 09:41, Nathaniel Shead wrote:
> > > > > I've updated the error messages, and also fixed another bug I found
> > > > > while retesting (value-initialised unions weren't considered to have any
> > > > > active member yet).
> > > >
> > > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > >
> > > > -- >8 --
> > > >
> > > > > This patch adds checks for attempting to change the active member of a
> > > > > union by methods other than a member access expression.
> > > > >
> > > > > To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
> > > > > patch redoes the solution for c++/59950 to avoid extraneous *&; it
> > > > > seems that the only case that needed the workaround was when copying
> > > > > empty classes.
> > > > >
> > > > > This patch also ensures that constructors for a union field mark that
> > > > > field as the active member before entering the call itself; this ensures
> > > > > that modifications of the field within the constructor's body don't
> > > > > cause false positives (as these will not appear to be member access
> > > > > expressions). This means that we no longer need to start the lifetime of
> > > > > empty union members after the constructor body completes.
> > > > >
> > > > > As a drive-by fix, this patch also ensures that value-initialised unions
> > > > > are considered to have activated their initial member for the purpose of
> > > > > checking stores, which catches some additional mistakes pre-C++20.
> > > >
> > > >   PR c++/101631
> > > >
> > > > gcc/cp/ChangeLog:
> > > >
> > > >   * call.cc (build_over_call): Fold more indirect refs for trivial
> > > >   assignment op.
> > > >   * class.cc (type_has_non_deleted_trivial_default_ctor): Create.
> > > >   * constexpr.cc (cxx_eval_call_expression): Start lifetime of
> > > >   union member before entering constructor.
> > > >   (cxx_eval_store_expression): Activate member for
> > > >   value-initialised union. Check for accessing inactive union
> > > >   member indirectly.
> > > >   * cp-tree.h (type_has_non_deleted_trivial_default_ctor):
> > > >   Forward declare.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >   * g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
> > > >   * g++.dg/cpp1y/constexpr-union6.C: New test.
> > > >   * g++.dg/cpp2a/constexpr-union2.C: New test.
> > > >   * g++.dg/cpp2a/constexpr-union3.C: New test.
> > > >   * g++.dg/cpp2a/constexp

[PATCH v3] Introduce -finline-stringops

2023-09-22 Thread Alexandre Oliva
On Sep 21, 2023, Alexandre Oliva  wrote:

> On Sep 15, 2023, Alexandre Oliva  wrote:
>> On Jun 22, 2023, Alexandre Oliva  wrote:
>>> On Jun  2, 2023, Alexandre Oliva  wrote:
 Introduce -finline-stringops

>>> Ping?  https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620472.html

>> Ping?

> Here's a refreshed and improved patch, that improves the memcmp
> expansion a little, dropping redundant barriers and conditional
> branches.

I was advised by linaro's CI that there are some failures to inline
memcpy (bcopy, really) in the memmove test, that only required memmove
inlining.  Oops.  That was a mistake in the testcase.  There was no
reason to demand memcpy to be inlined.


While looking into it, I found another case in which the memset expander
could loop forever.  The assumption that can_store_by_pieces would
return true at least for 1-byte stores doesn't hold on aarch64.  A
1-byte store only expands with setmem, not with store_by_pieces.

I had expected store_by_pieces to always pass for QImode constants, and
perhaps also for other natural machine modes, but...  I guess I can't
count on that.  I see aarch64's SET_RATIO and CLEAR_RATIO are sometimes
set to zero, so as to disable the by-pieces machinery even for
single-instruction stores, and that setmem actually outputs better code
than by_pieces for all but single-byte stores, and equivalent code for
single-byte stores.

So it looks like integrating setmem into by-multiple-pieces would be the
way to go, but...  there's no way to test whether setmem is available
for a certain combination of operands, as there is for store_by_pieces.
So, given that -finline-stringops is not so much about performance as it
is about avoiding runtime dependencies, I'm only making sure we have a
fallback to use for the single-byte case.

I suppose I could introduce some target hook for machines to test for
supported setmem operands without compile-time rtl allocation, and use
that if defined, but it could be cumbersome to introduce only to improve
performance in scenarios where performance is not critical.

Regstrapped on x86_64-linux-gnu.  Also tested on an aarch64-elf target.
Ok to install?


Introduce -finline-stringops

try_store_by_multiple_pieces was added not long ago, enabling
variable-sized memset to be expanded inline whenever a constant-length
memset of the worst-case in-range length would be, using conditional
blocks with powers of two to cover all possibilities of length and
alignment.

This patch introduces -finline-stringops[=fn] to request expansions to
start with a loop, so as to still take advantage of known alignment
even with long lengths, but without necessarily adding store blocks
for every power of two.

This makes it possible for the supported stringops (memset, memcpy,
memmove, memcmp) to be expanded, even if storing a single byte per
iteration.  Surely efficient implementations can run faster, with a
pre-loop to increase alignment, but that would likely be excessive for
inline expansions.

Still, in some cases, such as in freestanding environments, users
prefer to inline such stringops, especially those that the compiler
may introduce itself, even if the expansion is not as performant as a
highly optimized C library implementation could be, to avoid
depending on a C runtime library.
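
As a rough illustration of the shape described above (a leading loop
followed by conditional power-of-two blocks), here is a conceptual model
in C++.  It is a sketch under assumptions, not the RTL the expander
emits: store_block and fill_loop_then_pieces are hypothetical names, and
the block size is assumed to be a power of two.

```cpp
#include <cstddef>

// Hypothetical helper standing in for a fixed-size inline store.
static void store_block(unsigned char *p, unsigned char v, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    p[i] = v;
}

// Fill 'len' bytes at 'p' with 'v'.  'block' is assumed to be a power of
// two; it models the largest store the expansion is willing to emit.
void fill_loop_then_pieces(unsigned char *p, unsigned char v,
                           std::size_t len, std::size_t block) {
  // Leading loop: handles arbitrarily long lengths without needing one
  // conditional block per power of two.
  while (len >= block) {
    store_block(p, v, block);
    p += block;
    len -= block;
  }
  // Remaining length is now below 'block'; one conditional block per
  // smaller power of two covers every possible remainder.
  for (std::size_t b = block / 2; b != 0; b /= 2) {
    if (len & b) {
      store_block(p, v, b);
      p += b;
      len -= b;
    }
  }
}
```

With block set to 1 the tail disappears and the model degenerates to the
single-byte-per-iteration loop mentioned above.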


for  gcc/ChangeLog

* expr.cc (emit_block_move_hints): Take ctz of len.  Obey
-finline-stringops.  Use oriented or sized loop.
(emit_block_move): Take ctz of len, and pass it on.
(emit_block_move_via_sized_loop): New.
(emit_block_move_via_oriented_loop): New.
(emit_block_move_via_loop): Take incr.  Move an incr-sized
block per iteration.
(emit_block_cmp_via_cmpmem): Take ctz of len.  Obey
-finline-stringops.
(emit_block_cmp_via_loop): New.
* expr.h (emit_block_move): Add ctz of len defaulting to zero.
(emit_block_move_hints): Likewise.
(emit_block_cmp_hints): Likewise.
* builtins.cc (expand_builtin_memory_copy_args): Pass ctz of
len to emit_block_move_hints.
(try_store_by_multiple_pieces): Support starting with a loop.
(expand_builtin_memcmp): Pass ctz of len to
emit_block_cmp_hints.
(expand_builtin): Allow inline expansion of memset, memcpy,
memmove and memcmp if requested.
* common.opt (finline-stringops): New.
(ilsop_fn): New enum.
* flag-types.h (enum ilsop_fn): New.
* doc/invoke.texi (-finline-stringops): Add.

for  gcc/testsuite/ChangeLog

* gcc.dg/torture/inline-mem-cmp-1.c: New.
* gcc.dg/torture/inline-mem-cpy-1.c: New.
* gcc.dg/torture/inline-mem-cpy-cmp-1.c: New.
* gcc.dg/torture/inline-mem-move-1.c: New.
* gcc.dg/torture/inline-mem-set-1.c: New.
---
 gcc/builtins.cc|  149 +++-
 gcc/common.opt |   34 ++
 gcc/doc/invoke.texi|   15 +
 gcc/expr.cc   

Re: [PATCH] RISC-V/testsuite: Fix ILP32 RVV failures from missing

2023-09-22 Thread Jeff Law




On 9/22/23 17:18, Maciej W. Rozycki wrote:

In non-multilib installations system headers may not be available for
compilation options using a non-default model, causing build errors such
as:

In file included from .../include/features.h:527,
                 from .../include/assert.h:35,
                 from .../gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h:2,
                 from .../gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c:4:
.../include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No such file or directory

Therefore we have to be very cautious when trying to use a non-default
model in the testsuite, preferably avoiding reliance on headers that have
not been supplied by GCC itself, or otherwise verifying in a preparatory
step whether the given model is buildable in a given test environment.

In this case however we can easily avoid the issue, because <assert.h>
facilities are not used at all by "vmv-imm-template.h", which includes
the header.  Remove the inclusion then, turning these issues:

FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize (test for excess errors)
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.i 32
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.x 8
FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize (test for excess errors)
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.i 32
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.x 8

into successful results:

PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize (test for excess errors)
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.i 32
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.x 8
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize (test for excess errors)
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.i 32
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  scan-assembler-times vmv.v.x 8

in a plain LP64 `riscv64-linux-gnu' configuration.

gcc/testsuite/
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Remove
 <assert.h> inclusion.

OK
jeff


Re: Re: [Committed] RISC-V: Support VLS INT <-> FP conversions

2023-09-22 Thread 钟居哲
Confirmed: this is a latent bug that has existed for a long time; we were
just lucky not to trigger it before.

This patch did not introduce a new bug.

Li Pan from Intel will send a patch to fix it soon.

Thanks for the report.



juzhe.zh...@rivai.ai
 
From: Edwin Lu
Date: 2023-09-23 06:38
To: Juzhe-Zhong; gcc-patches
CC: patrick; gnu-toolchain
Subject: Re: [Committed] RISC-V: Support VLS INT <-> FP conversions
Hi Juzhe,
 
I was testing this patch and found it introduced a gfortran regression 
in gfortran.dg/host_assoc_function_7.f90. More info here: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111545
 
Edwin
 
On 9/20/2023 7:17 PM, Juzhe-Zhong wrote:
> Support INT <-> FP VLS auto-vectorization patterns.
> 
> Regression passed.
> Committed.
> 
> gcc/ChangeLog:
> 
> * config/riscv/autovec.md: Extend VLS modes.
> * config/riscv/vector-iterators.md: Ditto.
> * config/riscv/vector.md: Ditto.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/riscv/rvv/autovec/vls/convert-1.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-10.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-11.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-12.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-2.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-3.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-4.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-5.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-6.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-7.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-8.c: New test.
> * gcc.target/riscv/rvv/autovec/vls/convert-9.c: New test.
> 
> ---
>   gcc/config/riscv/autovec.md   |  12 +-
>   gcc/config/riscv/vector-iterators.md  | 202 ++
>   gcc/config/riscv/vector.md|  20 +-
>   .../riscv/rvv/autovec/vls/convert-1.c |  74 +++
>   .../riscv/rvv/autovec/vls/convert-10.c|  80 +++
>   .../riscv/rvv/autovec/vls/convert-11.c|  54 +
>   .../riscv/rvv/autovec/vls/convert-12.c|  36 
>   .../riscv/rvv/autovec/vls/convert-2.c |  74 +++
>   .../riscv/rvv/autovec/vls/convert-3.c |  58 +
>   .../riscv/rvv/autovec/vls/convert-4.c |  36 
>   .../riscv/rvv/autovec/vls/convert-5.c |  80 +++
>   .../riscv/rvv/autovec/vls/convert-6.c |  55 +
>   .../riscv/rvv/autovec/vls/convert-7.c |  37 
>   .../riscv/rvv/autovec/vls/convert-8.c |  58 +
>   .../riscv/rvv/autovec/vls/convert-9.c |  22 ++
>   15 files changed, 882 insertions(+), 16 deletions(-)
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-1.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-10.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-11.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-12.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-2.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-3.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-4.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-5.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-6.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-7.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-8.c
>   create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-9.c
> 
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 75ed7ae4f2e..55c0a04df3b 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -847,7 +847,7 @@
>   (define_insn_and_split "2"
> [(set (match_operand: 0 "register_operand")
>   (any_fix:
> -   (match_operand:VF 1 "register_operand")))]
> +   (match_operand:V_VLSF 1 "register_operand")))]
> "TARGET_VECTOR && can_create_pseudo_p ()"
> "#"
> "&& 1"
> @@ -868,8 +868,8 @@
>   ;; -
>   
>   (define_insn_and_split "2"
> -  [(set (match_operand:VF 0 "register_operand")
> - (any_float:VF
> +  [(set (match_operand:V_VLSF 0 "register_operand")
> + (any_float:V_VLSF
> (match_operand: 1 "register_operand")))]
> "TARGET_VECTOR && can_create_pseudo_p ()"
> "#"
> @@ -916,8 +916,8 @@
>   ;; - vfwcvt.f.x.v
>   ;; -
>   (define_insn_and_split "2"
> -  [(set (match_operand:VF 0 "register_operand")
> - (any_float:VF
> +  [(set (match_operand:V_VLSF 0 "register_operand")
> + (any_float:V_VLSF
> (match_operand: 1 "register_operand")))]
> "TARGET_VECTOR && can_create_pseudo_p ()"
> "#"
> @@ -940,7 +940,7 @@
>   (define_insn_

[Committed] RISC-V: Add VLS unary combine patterns

2023-09-22 Thread Juzhe-Zhong
gcc/ChangeLog:

* config/riscv/autovec-opt.md: Add VLS modes for conditional ABS/SQRT.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/cond_abs-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_sqrt-1.c: New test.

---
 gcc/config/riscv/autovec-opt.md   | 30 +--
 .../riscv/rvv/autovec/vls/cond_abs-1.c| 50 +++
 .../riscv/rvv/autovec/vls/cond_sqrt-1.c   | 50 +++
 3 files changed, 113 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_abs-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_sqrt-1.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index ed9c0777eb9..6c6609d24bb 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -99,41 +99,37 @@
 ;; Currently supported operations:
 ;;   abs(FP)
 (define_insn_and_split "*cond_abs"
-  [(set (match_operand:VF 0 "register_operand")
-(if_then_else:VF
-  (match_operand: 3 "register_operand")
-  (abs:VF (match_operand:VF 1 "nonmemory_operand"))
-  (match_operand:VF 2 "register_operand")))]
+  [(set (match_operand:V_VLSF 0 "register_operand")
+(if_then_else:V_VLSF
+  (match_operand: 1 "register_operand")
+  (abs:V_VLSF (match_operand:V_VLSF 2 "nonmemory_operand"))
+  (match_operand:V_VLSF 3 "register_operand")))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
   [(const_int 0)]
 {
-  emit_insn (gen_cond_len_abs (operands[0], operands[3], operands[1],
-operands[2],
-gen_int_mode (GET_MODE_NUNITS 
(mode), Pmode),
-const0_rtx));
+  insn_code icode = code_for_pred (ABS, mode);
+  riscv_vector::expand_cond_unop (icode, operands);
   DONE;
 }
 [(set_attr "type" "vector")])
 
 ;; Combine vfsqrt.v and cond_mask
 (define_insn_and_split "*cond_"
-  [(set (match_operand:VF 0 "register_operand")
- (if_then_else:VF
+  [(set (match_operand:V_VLSF 0 "register_operand")
+ (if_then_else:V_VLSF
(match_operand: 1 "register_operand")
-   (any_float_unop:VF
- (match_operand:VF 2 "register_operand"))
-   (match_operand:VF 3 "register_operand")))]
+   (any_float_unop:V_VLSF
+ (match_operand:V_VLSF 2 "register_operand"))
+   (match_operand:V_VLSF 3 "register_operand")))]
   "TARGET_VECTOR && can_create_pseudo_p ()"
   "#"
   "&& 1"
   [(const_int 0)]
 {
   insn_code icode = code_for_pred (, mode);
-  rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
-   gen_int_mode (GET_MODE_NUNITS (mode), Pmode)};
-  riscv_vector::expand_cond_len_unop (icode, ops);
+  riscv_vector::expand_cond_unop (icode, operands);
   DONE;
 }
 [(set_attr "type" "vector")])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_abs-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_abs-1.c
new file mode 100644
index 000..3eaabce9611
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_abs-1.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 
--param=riscv-autovec-lmul=m8 -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_COND_UNOP (cond_abs, 4, v4hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 8, v8hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 16, v16hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 32, v32hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 64, v64hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 128, v128hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 256, v256hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 512, v512hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 1024, v1024hf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 2048, v2048hf, __builtin_fabs)
+
+DEF_COND_UNOP (cond_abs, 4, v4sf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 8, v8sf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 16, v16sf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 32, v32sf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 64, v64sf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 128, v128sf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 256, v256sf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 512, v512sf, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 1024, v1024sf, __builtin_fabs)
+
+DEF_COND_UNOP (cond_abs, 4, v4df, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 8, v8df, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 16, v16df, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 32, v32df, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 64, v64df, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 128, v128df, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 256, v256df, __builtin_fabs)
+DEF_COND_UNOP (cond_abs, 512, v512df, __builtin_fabs)
+
+/* { dg-final { scan-assembler-times {vfabs\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0.t} 
27 } } */
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-assembler-not {vm

Re: [PATCH v3] RISC-V: Suport FP floor auto-vectorization

2023-09-22 Thread 钟居哲
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-23 09:19
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v3] RISC-V: Suport FP floor auto-vectorization
From: Pan Li 
 
This patch adds auto-vectorization support for the floor API in
math.h. It depends on the -ffast-math option.

For a call to floor/floorf such as v2 = floor (v1), we convert it
into the insns below (following the LLVM implementation).

* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3

However, a floating-point value does not need the cvt above if its
fractional part is zero. For example, in single precision:

  +-----------+---------------+-------------+
  | raw float | binary layout | after floor |
  +-----------+---------------+-------------+
  | 8388607.5 | 0x4affffff    | 8388607.0   |
  | 8388608.0 | 0x4b000000    | 8388608.0   |
  | 8388609.0 | 0x4b000001    | 8388609.0   |
  +-----------+---------------+-------------+

Every single-precision value whose magnitude is greater than or equal to
8388608.0 has no fractional part. We use vmflt to build a mask that
filters such values out, and perform the cvt only on the masked elements.
 
Before this patch:
math-floor-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    ceilf
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi   2   // Rounding Down
.L4:
  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2
  bne         .L4
.L14:
  fsrm        a6
  ret
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (floor2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_floor): New function decl.
* config/riscv/riscv-v.cc (gen_floor_const_fp): New function impl.
(expand_vec_floor): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-floor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-floor-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 11 
gcc/config/riscv/riscv-protos.h   |  5 ++
gcc/config/riscv/riscv-v.cc   | 35 +++-
.../riscv/rvv/autovec/unop/math-floor-0.c | 23 
.../riscv/rvv/autovec/unop/math-floor-1.c | 23 
.../riscv/rvv/autovec/unop/math-floor-2.c | 23 
.../riscv/rvv/autovec/unop/math-floor-3.c | 25 +
.../riscv/rvv/autovec/unop/math-floor-run-1.c | 39 +
.../riscv/rvv/autovec/unop/math-floor-run-2.c | 39 +
.../riscv/rvv/autovec/vls/math-floor-1.c  | 56 +++
10 files changed, 277 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-floor-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 6f35fb1bd9e..a005e17457e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2209,6 +2209,7 @@ (define_expand "avg3_ceil"
;; -
;; Includes:
;; - ceil/ceilf
+;; - floor/floorf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2219,3 +2220,13 @@ (define_expand "ceil2"
 DONE;
   }
)
+
+(define_expand "floor2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_floor (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 34becfbaba8..63eb2475705 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -253,6 +253,9 @@ enum insn_flags : unsigned int
   /* Means INSN has FRM operand and the value is FRM_RUP.  */
   FRM_RUP_P = 1 << 16,
+
+  /* Means INSN has FRM operand and the value is FRM_RDN.  */
+  FRM_RDN_P = 1 << 17,
};
enum insn_type : unsig

[PATCH v3] RISC-V: Suport FP floor auto-vectorization

2023-09-22 Thread pan2 . li
From: Pan Li 

This patch adds auto-vectorization support for the floor API in
math.h. It depends on the -ffast-math option.

For a call to floor/floorf such as v2 = floor (v1), we convert it
into the insns below (following the LLVM implementation).

* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3

However, a floating-point value does not need the cvt above if its
fractional part is zero. For example, in single precision:

  +-----------+---------------+-------------+
  | raw float | binary layout | after floor |
  +-----------+---------------+-------------+
  | 8388607.5 | 0x4affffff    | 8388607.0   |
  | 8388608.0 | 0x4b000000    | 8388608.0   |
  | 8388609.0 | 0x4b000001    | 8388609.0   |
  +-----------+---------------+-------------+

Every single-precision value whose magnitude is greater than or equal to
8388608.0 has no fractional part. We use vmflt to build a mask that
filters such values out, and perform the cvt only on the masked elements.
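
The masking scheme can be modelled per element in scalar C++ roughly as
below.  This is only a sketch of the idea, not the code in
riscv_vector::expand_vec_floor: floor_via_cvt is a hypothetical name, and
the truncating cast plus adjustment stands in for a conversion performed
under the RDN rounding mode.

```cpp
#include <cmath>

// Hypothetical scalar model of one vector lane.
float floor_via_cvt(float x) {
  const float limit = 8388608.0f;  // 2^23: no fractional bits at or above this

  // Models vfabs.v + vmflt.vf: the lane is active only if |x| < 2^23.
  // NaN compares false and is therefore passed through unchanged.
  if (std::fabs(x) < limit) {
    // Models vfcvt.x.f.v under RDN.  The C++ cast truncates toward zero,
    // so step down by one when truncation rounded a negative value up.
    int i = static_cast<int>(x);
    if (static_cast<float>(i) > x)
      --i;
    // Models vfcvt.f.x.v.
    float r = static_cast<float>(i);
    // Models vfsgnj.vv: reapply the sign of the input so -0.0 stays -0.0.
    return std::copysign(r, x);
  }
  // Inactive lane: the value is already integral (or NaN/Inf).
  return x;
}
```

For instance, 8388607.5f takes the conversion path and yields 8388607.0f,
while 8388608.0f and 8388609.0f bypass it, matching the table above.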

Before this patch:
math-floor-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    ceilf
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi   2   // Rounding Down
.L4:
  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2
  bne         .L4
.L14:
  fsrm        a6
  ret

Please note VLS mode is also involved in this patch and covered by the
test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (floor2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_floor): New function decl.
* config/riscv/riscv-v.cc (gen_floor_const_fp): New function impl.
(expand_vec_floor): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-floor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-floor-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   | 11 
 gcc/config/riscv/riscv-protos.h   |  5 ++
 gcc/config/riscv/riscv-v.cc   | 35 +++-
 .../riscv/rvv/autovec/unop/math-floor-0.c | 23 
 .../riscv/rvv/autovec/unop/math-floor-1.c | 23 
 .../riscv/rvv/autovec/unop/math-floor-2.c | 23 
 .../riscv/rvv/autovec/unop/math-floor-3.c | 25 +
 .../riscv/rvv/autovec/unop/math-floor-run-1.c | 39 +
 .../riscv/rvv/autovec/unop/math-floor-run-2.c | 39 +
 .../riscv/rvv/autovec/vls/math-floor-1.c  | 56 +++
 10 files changed, 277 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-floor-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 6f35fb1bd9e..a005e17457e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2209,6 +2209,7 @@ (define_expand "avg3_ceil"
 ;; -
 ;; Includes:
 ;; - ceil/ceilf
+;; - floor/floorf
 ;; -
 (define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2219,3 +2220,13 @@ (define_expand "ceil2"
 DONE;
   }
 )
+
+(define_expand "floor2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_floor (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 34becfbaba8..63eb2475705 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -253,6 +253,9 @@ enum insn_flags : unsigned int
 
   /* Means INSN has FRM operand and the value is FRM_RUP.  */
   FRM_RUP_P = 1 << 16,
+
+  /* Means INSN has FRM operand and the value is FRM_RDN.  */
+  FRM_RDN_P = 1 << 17,
 };
 
 enum insn_type : unsigned int
@@ -294,6 +297,7 @@ enum insn_type : unsigned int
   UNARY_OP_TAMU = __M

RE: [PATCH v1] RISC-V: Remove FP run test for ceil.

2023-09-22 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: 钟居哲 
Sent: Saturday, September 23, 2023 9:07 AM
To: Li, Pan2; gcc-patches
Cc: Li, Pan2; Wang, Yanzhang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Remove FP run test for ceil.

Ok


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-23 09:06
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Remove FP run test for ceil.
From: Pan Li <pan2...@intel.com>

FP16 is not well reconciled when linking.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: Remove.

Signed-off-by: Pan Li <pan2...@intel.com>
---
.../riscv/rvv/autovec/unop/math-ceil-run-0.c  | 39 ---
1 file changed, 39 deletions(-)
delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
deleted file mode 100644
index 600c161159d..000
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
+++ /dev/null
@@ -1,39 +0,0 @@
-/* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-std=c2x -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
-
-#include "test-math.h"
-
-#define ARRAY_SIZE 128
-
-_Float16 in[ARRAY_SIZE];
-_Float16 out[ARRAY_SIZE];
-_Float16 ref[ARRAY_SIZE];
-
-TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
-TEST_ASSERT (_Float16)
-
-TEST_INIT (_Float16, 1.2, 2.0, 1)
-TEST_INIT (_Float16, -1.2, -1.0, 2)
-TEST_INIT (_Float16, 3.0, 3.0, 3)
-TEST_INIT (_Float16, 1023.5, 1024.0, 4)
-TEST_INIT (_Float16, 1025.0, 1025.0, 5)
-TEST_INIT (_Float16, 0.0, 0.0, 6)
-TEST_INIT (_Float16, -0.0, -0.0, 7)
-TEST_INIT (_Float16, -1023.5, -1023.0, 8)
-TEST_INIT (_Float16, -1024.0, -1024.0, 9)
-
-int
-main ()
-{
-  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-
-  return 0;
-}
--
2.34.1




Re: [PATCH v1] RISC-V: Remove FP run test for ceil.

2023-09-22 Thread 钟居哲
Ok



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-23 09:06
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Remove FP run test for ceil.
From: Pan Li 
 
FP16 is not well reconciled when linking.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: Remove.
 
Signed-off-by: Pan Li 
---
.../riscv/rvv/autovec/unop/math-ceil-run-0.c  | 39 ---
1 file changed, 39 deletions(-)
delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
deleted file mode 100644
index 600c161159d..000
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
+++ /dev/null
@@ -1,39 +0,0 @@
-/* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-std=c2x -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
-
-#include "test-math.h"
-
-#define ARRAY_SIZE 128
-
-_Float16 in[ARRAY_SIZE];
-_Float16 out[ARRAY_SIZE];
-_Float16 ref[ARRAY_SIZE];
-
-TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
-TEST_ASSERT (_Float16)
-
-TEST_INIT (_Float16, 1.2, 2.0, 1)
-TEST_INIT (_Float16, -1.2, -1.0, 2)
-TEST_INIT (_Float16, 3.0, 3.0, 3)
-TEST_INIT (_Float16, 1023.5, 1024.0, 4)
-TEST_INIT (_Float16, 1025.0, 1025.0, 5)
-TEST_INIT (_Float16, 0.0, 0.0, 6)
-TEST_INIT (_Float16, -0.0, -0.0, 7)
-TEST_INIT (_Float16, -1023.5, -1023.0, 8)
-TEST_INIT (_Float16, -1024.0, -1024.0, 9)
-
-int
-main ()
-{
-  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-
-  return 0;
-}
-- 
2.34.1
 
 


[PATCH v1] RISC-V: Remove FP run test for ceil.

2023-09-22 Thread pan2 . li
From: Pan Li 

FP16 is not well reconciled when linking.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: Remove.

Signed-off-by: Pan Li 
---
 .../riscv/rvv/autovec/unop/math-ceil-run-0.c  | 39 ---
 1 file changed, 39 deletions(-)
 delete mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
deleted file mode 100644
index 600c161159d..000
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
+++ /dev/null
@@ -1,39 +0,0 @@
-/* { dg-do run { target { riscv_vector } } } */
-/* { dg-additional-options "-std=c2x -O3 -ftree-vectorize -fno-vect-cost-model 
-ffast-math" } */
-
-#include "test-math.h"
-
-#define ARRAY_SIZE 128
-
-_Float16 in[ARRAY_SIZE];
-_Float16 out[ARRAY_SIZE];
-_Float16 ref[ARRAY_SIZE];
-
-TEST_UNARY_CALL (_Float16, __builtin_ceilf16)
-TEST_ASSERT (_Float16)
-
-TEST_INIT (_Float16, 1.2, 2.0, 1)
-TEST_INIT (_Float16, -1.2, -1.0, 2)
-TEST_INIT (_Float16, 3.0, 3.0, 3)
-TEST_INIT (_Float16, 1023.5, 1024.0, 4)
-TEST_INIT (_Float16, 1025.0, 1025.0, 5)
-TEST_INIT (_Float16, 0.0, 0.0, 6)
-TEST_INIT (_Float16, -0.0, -0.0, 7)
-TEST_INIT (_Float16, -1023.5, -1023.0, 8)
-TEST_INIT (_Float16, -1024.0, -1024.0, 9)
-
-int
-main ()
-{
-  RUN_TEST (_Float16, 1, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 2, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 3, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 4, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 5, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 6, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 7, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 8, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-  RUN_TEST (_Float16, 9, __builtin_ceilf16, in, out, ref, ARRAY_SIZE);
-
-  return 0;
-}
-- 
2.34.1



[PATCH] PHIOPT: Fix minmax_replacement for three way

2023-09-22 Thread Andrew Pinski
When diamond bb support was added to minmax_replacement in
r13-1950-g9bb19e143cfe, the code made an assumption about alt_middle_bb
in the threeway_p case. Once factor_out_conditional_conversion was used
to factor out conversions, that assumption turned out to be wrong: we
could end up with threeway_p being true and middle_bb being empty while
alt_middle_bb was not empty, which causes wrong code in many cases.

This patch fixes the issue by adding a test, in the 2 cases that rely on
that assumption, that the other bb is empty when threeway_p is true.
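
For readers who want the distinction at the source level, here is a
minimal C sketch. It is not part of the patch, and the names min_plain,
min_with_call, side_effect and calls are made up for illustration. The
first function has two arms that only assign, so treating it as
MIN (a, b) is fine. The second computes the same value, but its else arm
also contains a call that must run only when a <= b, so that arm cannot
be handled as if it were empty.

```c
/* Counts how often the side effect ran (illustrative only).  */
static int calls;

static void
side_effect (void)
{
  ++calls;
}

/* Both arms only assign: equivalent to MIN (a, b).  */
unsigned short
min_plain (unsigned short a, unsigned short b)
{
  unsigned short d;
  if (a > b)
    d = b;
  else
    d = a;
  return d;
}

/* Same value as above, but the else arm is not empty: the call has to
   stay conditional on a <= b.  */
unsigned short
min_with_call (unsigned short a, unsigned short b)
{
  unsigned short d;
  if (a > b)
    d = b;
  else
    {
      side_effect ();
      d = a;
    }
  return d;
}
```

Calling min_with_call (3, 2) must leave calls untouched, which mirrors
what the new pr111469-1.c test relies on when it calls gg (3, 2).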

Changes made:
v2: Fix test for `(a <= u) b = MAX(a, d) else b = u`.

Note my plan for GCC 15 is to remove minmax_replacement, as match.pd will
catch all cases at that point.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/111469

gcc/ChangeLog:

* tree-ssa-phiopt.cc (minmax_replacement): Fix
the assumption for the `non-diamond` handling cases
of diamond code.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr111469-1.c: New test.
---
 .../gcc.c-torture/execute/pr111469-1.c| 38 +++
 gcc/tree-ssa-phiopt.cc|  9 -
 2 files changed, 45 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr111469-1.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr111469-1.c 
b/gcc/testsuite/gcc.c-torture/execute/pr111469-1.c
new file mode 100644
index 000..b68d5989eac
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr111469-1.c
@@ -0,0 +1,38 @@
+/* PR tree-optimization/111469 */
+
+long f;
+char *g;
+__attribute__((noinline))
+char o() {
+  char l;
+  while (f)
+;
+  l = *g;
+  return l;
+}
+
+/* factor_out_conditional_conversion is able to remove the casts
+   from the 2 bbs (correctly)
+   but then minmax_replacement should not optimize this to a MIN_EXPR
+   as o has side effects. */
+
+__attribute__((noinline))
+unsigned short gg(unsigned short a, unsigned short b)
+{
+  short d;
+  if (a > b)
+  {
+d= b;
+  }
+  else
+  {
+o();
+d = a;
+  }
+  return d;
+}
+
+int main(void)
+{
+  gg(3, 2);
+}
diff --git a/gcc/tree-ssa-phiopt.cc b/gcc/tree-ssa-phiopt.cc
index 3835d25d08c..312a6f9082b 100644
--- a/gcc/tree-ssa-phiopt.cc
+++ b/gcc/tree-ssa-phiopt.cc
@@ -1823,7 +1823,9 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
   arg_false = arg0;
 }
 
-  if (empty_block_p (middle_bb))
+  if (empty_block_p (middle_bb)
+  && (!threeway_p
+ || empty_block_p (alt_middle_bb)))
 {
   if ((operand_equal_for_phi_arg_p (arg_true, smaller)
   || (alt_smaller
@@ -2006,7 +2008,8 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
 
   return true;
 }
-  else
+  else if (!threeway_p
+  || empty_block_p (alt_middle_bb))
 {
   /* Recognize the following case, assuming d <= u:
 
@@ -2182,6 +2185,8 @@ minmax_replacement (basic_block cond_bb, basic_block 
middle_bb, basic_block alt_
  SSA_OP_DEF));
   gsi_move_before (&gsi_from, &gsi);
 }
+  else
+return false;
 
   /* Emit the statement to compute min/max.  */
   gimple_seq stmts = NULL;
-- 
2.31.1



Re: [PATCH v2] RISC-V: Suport FP floor auto-vectorization

2023-09-22 Thread 钟居哲
LGTM. But I think you should remove the FP16 run tests.

So please first send a patch that removes the FP16 run test for CEIL.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-23 08:40
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Suport FP floor auto-vectorization
From: Pan Li 
 
This patch adds auto-vectorization support for the floor API in
math.h. It depends on the -ffast-math option.

A call to floor/floorf such as v2 = floor (v1) is converted into the
insns below (following the implementation in LLVM).
 
* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3
 
However, a floating-point value does not need the cvt above if it has
no fractional part. For example, consider the single-precision values below.
 
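  +-----------+---------------+-------------+
  | raw float | binary layout | after floor |
  +-----------+---------------+-------------+
  | 8388607.5 | 0x4affffff    | 8388607.0   |
  | 8388608.0 | 0x4b000000    | 8388608.0   |
  | 8388609.0 | 0x4b000001    | 8388609.0   |
  +-----------+---------------+-------------+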
 
All single-precision floating-point values with magnitude >= 8388608.0
have no fractional bits, so they are already integral. We leverage vmflt
and a mask to filter them out in the vector and only do the cvt on the
masked elements.
 
Before this patch:
math-floor-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    ceilf
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3
 
After this patch:
  ...
  fsrmi   2   // Rounding Down
.L4:
  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2
  bne         .L4
.L14:
  fsrm        a6
  ret
 
Please note VLS mode is also involved in this patch and covered by the
test cases.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (floor2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_floor): New function decl.
* config/riscv/riscv-v.cc (gen_floor_const_fp): New function impl.
(expand_vec_floor): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-floor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-floor-1.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/autovec.md   | 11 
gcc/config/riscv/riscv-protos.h   |  5 ++
gcc/config/riscv/riscv-v.cc   | 35 +++-
.../riscv/rvv/autovec/unop/math-floor-0.c | 23 
.../riscv/rvv/autovec/unop/math-floor-1.c | 23 
.../riscv/rvv/autovec/unop/math-floor-2.c | 23 
.../riscv/rvv/autovec/unop/math-floor-3.c | 25 +
.../riscv/rvv/autovec/unop/math-floor-run-0.c | 39 +
.../riscv/rvv/autovec/unop/math-floor-run-1.c | 39 +
.../riscv/rvv/autovec/unop/math-floor-run-2.c | 39 +
.../riscv/rvv/autovec/vls/math-floor-1.c  | 56 +++
11 files changed, 316 insertions(+), 2 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-floor-1.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 6f35fb1bd9e..a005e17457e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2209,6 +2209,7 @@ (define_expand "avg3_ceil"
;; -
;; Includes:
;; - ceil/ceilf
+;; - floor/floorf
;; -
(define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2219,3 +2220,13 @@ (define_expand "ceil2"
 DONE;
   }
)
+
+(define_expand "floor2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_floor (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 34becfbaba8..63eb2475705 100644
-

[PATCH v2] RISC-V: Suport FP floor auto-vectorization

2023-09-22 Thread pan2 . li
From: Pan Li 

This patch adds auto-vectorization support for the floor API in
math.h. It depends on the -ffast-math option.

A call to floor/floorf such as v2 = floor (v1) is converted into the
insns below (following the implementation in LLVM).

* vfcvt.x.f v3, v1, RDN
* vfcvt.f.x v2, v3

However, a floating-point value does not need the cvt above if it has
no fractional part. For example, consider the single-precision values below.

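  +-----------+---------------+-------------+
  | raw float | binary layout | after floor |
  +-----------+---------------+-------------+
  | 8388607.5 | 0x4affffff    | 8388607.0   |
  | 8388608.0 | 0x4b000000    | 8388608.0   |
  | 8388609.0 | 0x4b000001    | 8388609.0   |
  +-----------+---------------+-------------+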

All single-precision floating-point values with magnitude >= 8388608.0
have no fractional bits, so they are already integral. We leverage vmflt
and a mask to filter them out in the vector and only do the cvt on the
masked elements.
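
As a rough illustration of the idea, the masked sequence can be modelled
per element in plain C. This is only a sketch under assumptions, not the
code in riscv-v.cc: floor_model is a made-up name, and since a C cast
truncates toward zero rather than rounding down, the model steps down by
one to imitate the RDN conversion.

```c
#include <stdint.h>

/* Hypothetical scalar model of the masked sequence; illustrative only.  */
float
floor_model (float x)
{
  const float limit = 8388608.0f;	/* 2^23 */

  /* Mask test, in the spirit of vfabs + vmflt: only |x| < 2^23 can have
     a fractional part.  Larger values, infinities and NaN fail the test
     and pass through unchanged.  */
  if (!(x < limit && x > -limit))
    return x;

  /* Stand-in for vfcvt.x.f (RDN) followed by vfcvt.f.x.  */
  int32_t i = (int32_t) x;
  float r = (float) i;

  /* Already integral (this includes -0.0): return the input itself so
     its sign survives, which is the job vfsgnj does in the vector code.  */
  if (r == x)
    return x;

  /* The cast rounded toward zero; for negative inputs that is one too
     high compared with rounding down.  */
  if (r > x)
    r -= 1.0f;
  return r;
}
```

For instance, floor_model (8388607.5f) goes through the conversion and
gives 8388607.0f, while floor_model (8388609.0f) is returned untouched by
the mask test.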

Before this patch:
math-floor-1.c:21:1: missed: couldn't vectorize loop
  ...
.L3:
  flw     fa0,0(s0)
  addi    s0,s0,4
  addi    s1,s1,4
  call    ceilf
  fsw     fa0,-4(s1)
  bne     s0,s2,.L3

After this patch:
  ...
  fsrmi   2   // Rounding Down
.L4:
  vfabs.v     v1,v2
  vmflt.vf    v0,v1,fa5
  vfcvt.x.f.v v3,v2,v0.t
  vfcvt.f.x.v v1,v3,v0.t
  vfsgnj.vv   v1,v1,v2
  bne         .L4
.L14:
  fsrm        a6
  ret

Please note VLS mode is also involved in this patch and covered by the
test cases.

gcc/ChangeLog:

* config/riscv/autovec.md (floor2): New pattern.
* config/riscv/riscv-protos.h (enum insn_flags): New enum type.
(enum insn_type): Ditto.
(expand_vec_floor): New function decl.
* config/riscv/riscv-v.cc (gen_floor_const_fp): New function impl.
(expand_vec_floor): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-floor-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-2.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-3.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/math-floor-1.c: New test.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/autovec.md   | 11 
 gcc/config/riscv/riscv-protos.h   |  5 ++
 gcc/config/riscv/riscv-v.cc   | 35 +++-
 .../riscv/rvv/autovec/unop/math-floor-0.c | 23 
 .../riscv/rvv/autovec/unop/math-floor-1.c | 23 
 .../riscv/rvv/autovec/unop/math-floor-2.c | 23 
 .../riscv/rvv/autovec/unop/math-floor-3.c | 25 +
 .../riscv/rvv/autovec/unop/math-floor-run-0.c | 39 +
 .../riscv/rvv/autovec/unop/math-floor-run-1.c | 39 +
 .../riscv/rvv/autovec/unop/math-floor-run-2.c | 39 +
 .../riscv/rvv/autovec/vls/math-floor-1.c  | 56 +++
 11 files changed, 316 insertions(+), 2 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-3.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-0.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-1.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-floor-run-2.c
 create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/math-floor-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 6f35fb1bd9e..a005e17457e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -2209,6 +2209,7 @@ (define_expand "avg3_ceil"
 ;; -
 ;; Includes:
 ;; - ceil/ceilf
+;; - floor/floorf
 ;; -
 (define_expand "ceil2"
   [(match_operand:V_VLSF 0 "register_operand")
@@ -2219,3 +2220,13 @@ (define_expand "ceil2"
 DONE;
   }
 )
+
+(define_expand "floor2"
+  [(match_operand:V_VLSF 0 "register_operand")
+   (match_operand:V_VLSF 1 "register_operand")]
+  "TARGET_VECTOR && !flag_trapping_math && !flag_rounding_math"
+  {
+riscv_vector::expand_vec_floor (operands[0], operands[1], mode, 
mode);
+DONE;
+  }
+)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 34becfbaba8..63eb2475705 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -253,6 +253,9 @@ enum insn_flags : unsigned int
 
   /* Means INSN has FRM operand and the value is FRM_RUP

[PATCH v4] c++: Check for indirect change of active union member in constexpr [PR101631,PR102286]

2023-09-22 Thread Nathaniel Shead
Now that bootstrap has finished, I have gotten regressions in the
following libstdc++ tests:

Running libstdc++:libstdc++-dg/conformance.exp ...
FAIL: 20_util/bitset/access/constexpr.cc -std=gnu++23 (test for excess errors)
FAIL: 20_util/bitset/access/constexpr.cc -std=gnu++26 (test for excess errors)
FAIL: 20_util/variant/constexpr.cc -std=gnu++20 (test for excess errors)
FAIL: 20_util/variant/constexpr.cc -std=gnu++26 (test for excess errors)
FAIL: 21_strings/basic_string/cons/char/constexpr.cc -std=gnu++20 (test for 
excess errors)
FAIL: 21_strings/basic_string/cons/char/constexpr.cc -std=gnu++26 (test for 
excess errors)
FAIL: 21_strings/basic_string/cons/wchar_t/constexpr.cc -std=gnu++20 (test for 
excess errors)
FAIL: 21_strings/basic_string/cons/wchar_t/constexpr.cc -std=gnu++26 (test for 
excess errors)
FAIL: 21_strings/basic_string/modifiers/swap/constexpr-wchar_t.cc -std=gnu++20 
(test for excess errors)
FAIL: 21_strings/basic_string/modifiers/swap/constexpr-wchar_t.cc -std=gnu++26 
(test for excess errors)
FAIL: 21_strings/basic_string/modifiers/swap/constexpr.cc -std=gnu++20 (test 
for excess errors)
FAIL: 21_strings/basic_string/modifiers/swap/constexpr.cc -std=gnu++26 (test 
for excess errors)
FAIL: std/ranges/adaptors/join_with/1.cc -std=gnu++23 (test for excess errors)
UNRESOLVED: std/ranges/adaptors/join_with/1.cc -std=gnu++23 compilation failed 
to produce executable
FAIL: std/ranges/adaptors/join_with/1.cc -std=gnu++26 (test for excess errors)
UNRESOLVED: std/ranges/adaptors/join_with/1.cc -std=gnu++26 compilation failed 
to produce executable

On investigation though it looks like the issue might be with libstdc++
rather than the patch itself; running the failing tests using clang with
libstdc++ also produces similar errors, and my reading of the code
suggests that this is correct.

What's the way forward here? Should I look at creating a patch to fix
the libstdc++ issues before resubmitting this patch for the C++
frontend? Or should I submit a version of this patch without the
`std::construct_at` changes and wait till libstdc++ gets fixed for that?

On Sat, Sep 23, 2023 at 01:01:20AM +1000, Nathaniel Shead wrote:
> On Fri, Sep 22, 2023 at 02:21:15PM +0100, Jason Merrill wrote:
> > On 9/21/23 09:41, Nathaniel Shead wrote:
> > > I've updated the error messages, and also fixed another bug I found
> > > while retesting (value-initialised unions weren't considered to have any
> > > active member yet).
> > > 
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > > 
> > > -- >8 --
> > > 
> > > This patch adds checks for attempting to change the active member of a
> > > union by methods other than a member access expression.
> > > 
> > > To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
> > > patch redoes the solution for c++/59950 to avoid extraneous *&; it
> > > seems that the only case that needed the workaround was when copying
> > > empty classes.
> > > 
> > > This patch also ensures that constructors for a union field mark that
> > > field as the active member before entering the call itself; this ensures
> > > that modifications of the field within the constructor's body don't
> > > cause false positives (as these will not appear to be member access
> > > expressions). This means that we no longer need to start the lifetime of
> > > empty union members after the constructor body completes.
> > > 
> > > As a drive-by fix, this patch also ensures that value-initialised unions
> > > are considered to have activated their initial member for the purpose of
> > > checking stores, which catches some additional mistakes pre-C++20.
> > > 
> > >   PR c++/101631
> > > 
> > > gcc/cp/ChangeLog:
> > > 
> > >   * call.cc (build_over_call): Fold more indirect refs for trivial
> > >   assignment op.
> > >   * class.cc (type_has_non_deleted_trivial_default_ctor): Create.
> > >   * constexpr.cc (cxx_eval_call_expression): Start lifetime of
> > >   union member before entering constructor.
> > >   (cxx_eval_store_expression): Activate member for
> > >   value-initialised union. Check for accessing inactive union
> > >   member indirectly.
> > >   * cp-tree.h (type_has_non_deleted_trivial_default_ctor):
> > >   Forward declare.
> > > 
> > > gcc/testsuite/ChangeLog:
> > > 
> > >   * g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
> > >   * g++.dg/cpp1y/constexpr-union6.C: New test.
> > >   * g++.dg/cpp2a/constexpr-union2.C: New test.
> > >   * g++.dg/cpp2a/constexpr-union3.C: New test.
> > >   * g++.dg/cpp2a/constexpr-union4.C: New test.
> > >   * g++.dg/cpp2a/constexpr-union5.C: New test.
> > > 
> > > Signed-off-by: Nathaniel Shead 
> > > ---
> > >   gcc/cp/call.cc|  11 +-
> > >   gcc/cp/class.cc   |   8 ++
> > >   gcc/cp/constexpr.cc   | 129 +-
> > >   gcc/cp/cp-tree.h  |   1 +
> > >   .../g++.dg/cpp1y/constexpr-89336-3.C  |   2 +-
> >

Re: [Committed] RISC-V: Extend VLS modes in 'VWEXTI' iterator

2023-09-22 Thread Patrick O'Neill

Hi Juzhe,

I'm seeing a few new regressions from this patch on glibc rv32gcv.
I filed a bugzilla for the ICE: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111546


Patrick

On 9/19/23 19:24, Juzhe-Zhong wrote:

This patch extends the 'VWEXT' iterator so that we support the integer
extension/integer truncate/integer average VLS patterns.

This patch resolves the following FAILs:

FAIL: gcc.dg/pr92301.c execution test
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c -flto -ffat-lto-objects  scan-tree-dump-times 
slp2 "optimized: basic block" 2
XPASS: gcc.dg/vect/bb-slp-subgroups-3.c scan-tree-dump-times slp2 "optimized: basic 
block" 2

pr92301.c exposes a latent bug in the middle-end GIMPLE fold.
We are just lucky that the test passes with this patch, which happens to
avoid triggering the GIMPLE fold bug.
  
gcc/ChangeLog:


* config/riscv/riscv-v.cc (can_find_related_mode_p): New function.
(vectorize_related_mode): Add VLS related modes.
* config/riscv/vector-iterators.md: Extend VLS modes.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/narrow-1.c: Adapt testcase.
* gcc.target/riscv/rvv/autovec/binop/narrow-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/narrow-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/cmp/vcond-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-18.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-19.c: Ditto.
* gcc.target/riscv/rvv/autovec/pr110950.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/def.h: Ditto.
* gcc.target/riscv/rvv/autovec/vls/div-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-9.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/avg-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/avg-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/avg-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/avg-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/avg-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/avg-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/ext-1.c: New test.
* gcc.targ

[PATCH] RISC-V/testsuite: Fix ILP32 RVV failures from missing

2023-09-22 Thread Maciej W. Rozycki
In non-multilib installations system headers may not be available for 
compilation options using a non-default model, causing build errors such 
as:

In file included from .../include/features.h:527,
 from .../include/assert.h:35,
 from 
.../gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h:2,
 from 
.../gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c:4:
.../include/gnu/stubs.h:11:11: fatal error: gnu/stubs-ilp32d.h: No such file or 
directory

Therefore we have to be very cautious when trying to use a non-default
model in the testsuite, preferably avoiding reliance on headers that have
not been supplied by GCC itself, or otherwise verifying in a preparatory
step whether the given model is buildable in a given test environment.
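
As an aside, one common way for a test to check results without any
system header is to abort through a compiler builtin. The snippet below
is only a sketch of that idiom, not what this patch changes; CHECK,
add_one and run_checks are hypothetical names.

```c
/* Hypothetical CHECK macro: uses a compiler builtin instead of a system
   header, so nothing from the C library's include directory is needed.  */
#define CHECK(cond) do { if (!(cond)) __builtin_abort (); } while (0)

/* Illustrative function under test.  */
__attribute__ ((noinline)) int
add_one (int x)
{
  return x + 1;
}

/* Returns 0 if every check held; aborts otherwise.  */
int
run_checks (void)
{
  CHECK (add_one (1) == 2);
  CHECK (add_one (-1) == 0);
  return 0;
}
```

A test's main could simply return run_checks ().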

In this case however we can easily avoid the issue, because <assert.h>
facilities are not used at all by "vmv-imm-template.h", which includes
the header.  Remove the inclusion then, turning these issues:

FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize 
(test for excess errors)
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 
-ftree-vectorize  scan-assembler-times vmv.v.i 32
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 
-ftree-vectorize  scan-assembler-times vmv.v.x 8
FAIL: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize (test 
for excess errors)
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  
scan-assembler-times vmv.v.i 32
UNRESOLVED: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  
scan-assembler-times vmv.v.x 8

into successful results:

PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize 
(test for excess errors)
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  
scan-assembler-times vmv.v.i 32
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c -O3 -ftree-vectorize  
scan-assembler-times vmv.v.x 8
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize (test 
for excess errors)
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  
scan-assembler-times vmv.v.i 32
PASS: gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c -O3 -ftree-vectorize  
scan-assembler-times vmv.v.x 8

in a plain LP64 `riscv64-linux-gnu' configuration.

gcc/testsuite/
* gcc.target/riscv/rvv/autovec/vmv-imm-template.h: Remove
<assert.h> inclusion.
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h |1 -
 1 file changed, 1 deletion(-)

gcc-test-riscv-rvv-assert.diff
Index: gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h
===
--- gcc.orig/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h
+++ gcc/gcc/testsuite/gcc.target/riscv/rvv/autovec/vmv-imm-template.h
@@ -1,5 +1,4 @@
 #include 
-#include <assert.h>
 
 #define VMV_POS(TYPE,VAL)  \
   __attribute__ ((noipa))   \


Re: [Committed] RISC-V: Support VLS INT <-> FP conversions

2023-09-22 Thread Edwin Lu

Hi Juzhe,

I was testing this patch and found it introduced a gfortran regression 
in gfortran.dg/host_assoc_function_7.f90. More info here: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111545


Edwin

On 9/20/2023 7:17 PM, Juzhe-Zhong wrote:

Support INT <-> FP VLS auto-vectorization patterns.

Regression passed.
Committed.

gcc/ChangeLog:

* config/riscv/autovec.md: Extend VLS modes.
* config/riscv/vector-iterators.md: Ditto.
* config/riscv/vector.md: Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/convert-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-10.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-11.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-12.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-6.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-7.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-8.c: New test.
* gcc.target/riscv/rvv/autovec/vls/convert-9.c: New test.

---
  gcc/config/riscv/autovec.md   |  12 +-
  gcc/config/riscv/vector-iterators.md  | 202 ++
  gcc/config/riscv/vector.md|  20 +-
  .../riscv/rvv/autovec/vls/convert-1.c |  74 +++
  .../riscv/rvv/autovec/vls/convert-10.c|  80 +++
  .../riscv/rvv/autovec/vls/convert-11.c|  54 +
  .../riscv/rvv/autovec/vls/convert-12.c|  36 
  .../riscv/rvv/autovec/vls/convert-2.c |  74 +++
  .../riscv/rvv/autovec/vls/convert-3.c |  58 +
  .../riscv/rvv/autovec/vls/convert-4.c |  36 
  .../riscv/rvv/autovec/vls/convert-5.c |  80 +++
  .../riscv/rvv/autovec/vls/convert-6.c |  55 +
  .../riscv/rvv/autovec/vls/convert-7.c |  37 
  .../riscv/rvv/autovec/vls/convert-8.c |  58 +
  .../riscv/rvv/autovec/vls/convert-9.c |  22 ++
  15 files changed, 882 insertions(+), 16 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-1.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-10.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-11.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-12.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-2.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-3.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-4.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-5.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-6.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-7.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-8.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/convert-9.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 75ed7ae4f2e..55c0a04df3b 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -847,7 +847,7 @@
  (define_insn_and_split "2"
[(set (match_operand: 0 "register_operand")
(any_fix:
- (match_operand:VF 1 "register_operand")))]
+ (match_operand:V_VLSF 1 "register_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
"#"
"&& 1"
@@ -868,8 +868,8 @@
  ;; -
  
  (define_insn_and_split "2"

-  [(set (match_operand:VF 0 "register_operand")
-   (any_float:VF
+  [(set (match_operand:V_VLSF 0 "register_operand")
+   (any_float:V_VLSF
  (match_operand: 1 "register_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
"#"
@@ -916,8 +916,8 @@
  ;; - vfwcvt.f.x.v
  ;; -
  (define_insn_and_split "2"
-  [(set (match_operand:VF 0 "register_operand")
-   (any_float:VF
+  [(set (match_operand:V_VLSF 0 "register_operand")
+   (any_float:V_VLSF
  (match_operand: 1 "register_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
"#"
@@ -940,7 +940,7 @@
  (define_insn_and_split "2"
[(set (match_operand: 0 "register_operand")
(any_fix:
- (match_operand:VF 1 "register_operand")))]
+ (match_operand:V_VLSF 1 "register_operand")))]
"TARGET_VECTOR && can_create_pseudo_p ()"
"#"
"&& 1"
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index 053d84c0c7d..19f3ec3ef74 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-itera

[PATCH] fortran: error recovery on duplicate declaration of class variable [PR95710]

2023-09-22 Thread Harald Anlauf
Dear all,

the attached simple and obvious patch fixes several NULL pointer
dereferences that are encountered for a duplicate declaration of
a class variable.  Another one from Gerhard's torture tests...

Regtested on x86_64-pc-linux-gnu.

I intend to commit within 24h unless there are comments.

Thanks,
Harald

From 0027c58c172889bdb5c09ecea0faf3c48624dc21 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Fri, 22 Sep 2023 21:06:00 +0200
Subject: [PATCH] fortran: error recovery on duplicate declaration of class
 variable [PR95710]

gcc/fortran/ChangeLog:

	PR fortran/95710
	* class.cc (gfc_build_class_symbol): Do not try to build class
	container for invalid typespec.
	* resolve.cc (resolve_fl_var_and_proc): Prevent NULL pointer
	dereference.
	(resolve_symbol): Likewise.

gcc/testsuite/ChangeLog:

	PR fortran/95710
	* gfortran.dg/pr95710.f90: New test.
---
 gcc/fortran/class.cc  |  4 
 gcc/fortran/resolve.cc|  4 +++-
 gcc/testsuite/gfortran.dg/pr95710.f90 | 17 +
 3 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/pr95710.f90

diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index 9d0c802b867..5c43b77dba3 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -647,6 +647,10 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,

   gcc_assert (as);

+  /* We cannot build the class container now.  */
+  if (attr->class_ok && (!ts->u.derived || !ts->u.derived->components))
+return false;
+
   /* Class container has already been built with same name.  */
   if (attr->class_ok
   && ts->u.derived->components->attr.dimension >= attr->dimension
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 1042b8c18e8..861f69ac20f 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -13326,6 +13326,7 @@ resolve_fl_var_and_proc (gfc_symbol *sym, int mp_flag)
 	  && sym->ts.u.derived
 	  && !sym->attr.select_type_temporary
 	  && !UNLIMITED_POLY (sym)
+	  && CLASS_DATA (sym)
 	  && CLASS_DATA (sym)->ts.u.derived
 	  && !gfc_type_is_extensible (CLASS_DATA (sym)->ts.u.derived))
 	{
@@ -16068,7 +16069,8 @@ resolve_symbol (gfc_symbol *sym)
   specification_expr = saved_specification_expr;
 }

-  if (sym->ts.type == BT_CLASS && sym->attr.class_ok && sym->ts.u.derived)
+  if (sym->ts.type == BT_CLASS && sym->attr.class_ok && sym->ts.u.derived
+  && CLASS_DATA (sym))
 {
   as = CLASS_DATA (sym)->as;
   class_attr = CLASS_DATA (sym)->attr;
diff --git a/gcc/testsuite/gfortran.dg/pr95710.f90 b/gcc/testsuite/gfortran.dg/pr95710.f90
new file mode 100644
index 000..566c38d0a9d
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr95710.f90
@@ -0,0 +1,17 @@
+! { dg-do compile }
+! PR fortran/95710 - ICE on duplicate declaration of class variable
+! Contributed by G.Steinmetz
+
+module m
+  interface
+ module function s()
+ end
+  end interface
+end
+submodule(m) m2
+contains
+  module function s()
+class(*), allocatable :: x
+class(*), allocatable :: x ! { dg-error "Unclassifiable statement" }
+  end
+end
--
2.35.3



Re: [pushed] c++: unroll pragma in templates [PR111529]

2023-09-22 Thread Andrew Pinski
On Fri, Sep 22, 2023 at 6:01 AM Jason Merrill  wrote:
>
> Tested x86_64-pc-linux-gnu, applying to trunk.
>
> -- 8< --
>
> We were failing to handle ANNOTATE_EXPR in tsubst_copy_and_build, leading to
> problems with substitution of any wrapped expressions.
>
> Let's also not tell users that lambda templates are available in C++14.

This part of the patch fixes
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108026 .

>
> PR c++/111529
>
> gcc/cp/ChangeLog:
>
> * parser.cc (cp_parser_lambda_declarator_opt): Don't suggest
> -std=c++14 for lambda templates.
> * pt.cc (tsubst_expr): Move ANNOTATE_EXPR handling...
> (tsubst_copy_and_build): ...here.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ext/unroll-4.C: New test.
> ---
>  gcc/cp/parser.cc|  7 ++-
>  gcc/cp/pt.cc| 14 +++---
>  gcc/testsuite/g++.dg/ext/unroll-4.C | 16 
>  3 files changed, 25 insertions(+), 12 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/ext/unroll-4.C
>
> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
> index 0e1cbbfe051..f3abae716fe 100644
> --- a/gcc/cp/parser.cc
> +++ b/gcc/cp/parser.cc
> @@ -11695,11 +11695,8 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, tree lambda_expr)
>   an opening angle if present.  */
>if (cp_lexer_next_token_is (parser->lexer, CPP_LESS))
>  {
> -  if (cxx_dialect < cxx14)
> -   pedwarn (parser->lexer->next_token->location, OPT_Wc__14_extensions,
> -"lambda templates are only available with "
> -"%<-std=c++14%> or %<-std=gnu++14%>");
> -  else if (pedantic && cxx_dialect < cxx20)
> +  if (cxx_dialect < cxx20
> + && (pedantic || cxx_dialect < cxx14))
> pedwarn (parser->lexer->next_token->location, OPT_Wc__20_extensions,
>  "lambda templates are only available with "
>  "%<-std=c++20%> or %<-std=gnu++20%>");
> diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
> index 9b100e12a23..ea5379098a5 100644
> --- a/gcc/cp/pt.cc
> +++ b/gcc/cp/pt.cc
> @@ -19913,13 +19913,6 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
> templated_operator_saved_lookups (t),
> complain));
>
> -case ANNOTATE_EXPR:
> -  tmp = RECUR (TREE_OPERAND (t, 0));
> -  RETURN (build3_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
> - TREE_TYPE (tmp), tmp,
> - RECUR (TREE_OPERAND (t, 1)),
> - RECUR (TREE_OPERAND (t, 2))));
> -
>  case PREDICT_EXPR:
>RETURN (add_stmt (copy_node (t)));
>
> @@ -21868,6 +21861,13 @@ tsubst_copy_and_build (tree t,
> RETURN (op);
>}
>
> +case ANNOTATE_EXPR:
> +  op1 = RECUR (TREE_OPERAND (t, 0));
> +  RETURN (build3_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
> + TREE_TYPE (op1), op1,
> + RECUR (TREE_OPERAND (t, 1)),
> + RECUR (TREE_OPERAND (t, 2))));
> +
>  default:
>/* Handle Objective-C++ constructs, if appropriate.  */
>{
> diff --git a/gcc/testsuite/g++.dg/ext/unroll-4.C b/gcc/testsuite/g++.dg/ext/unroll-4.C
> new file mode 100644
> index 000..d488aca974e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/unroll-4.C
> @@ -0,0 +1,16 @@
> +// PR c++/111529
> +// { dg-do compile { target c++11 } }
> +// { dg-additional-options -Wno-c++20-extensions }
> +
> +template 
> +void f() {
> +  []() {
> +#pragma GCC unroll 9
> +for (int i = 1; i; --i) {
> +}
> +  };
> +}
> +
> +int main() {
> +  f<0>();
> +}
>
> base-commit: 4c496020764057453415f1ae599950724ec0e871
> --
> 2.39.3
>


Re: [PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.

2023-09-22 Thread Vladimir Makarov



On 9/22/23 06:56, Hongyu Wang wrote:

Like base_reg_class, INDEX_REG_CLASS also does not support backend insn.
Add index_reg_class with insn argument for lra/reload usage.

gcc/ChangeLog:

* addresses.h (index_reg_class): New wrapper function like
base_reg_class.
* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
* doc/tm.texi.in: Ditto.
* lra-constraints.cc (index_part_to_reg): Pass index_class.
(process_address_1): Calls index_reg_class with curr_insn and
replace INDEX_REG_CLASS with its return value index_cl.
* reload.cc (find_reloads_address): Likewise.
(find_reloads_address_1): Likewise.


The patch is ok for me to commit it to the trunk.  Thank you.

So all changes to the RA have been reviewed.  You just need approval 
for the rest of the patches from an x86-64 maintainer.




Re: [PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-22 Thread Vladimir Makarov



On 9/22/23 06:56, Hongyu Wang wrote:

From: Kong Lingling 

Current reload infrastructure does not support selective base_reg_class
for backend insn. Add new macros with insn parameters to base_reg_class
for lra/reload usage.

gcc/ChangeLog:

* addresses.h (base_reg_class): Add insn argument and new macro
INSN_BASE_REG_CLASS.
(regno_ok_for_base_p_1): Add insn argument and new macro
REGNO_OK_FOR_INSN_BASE_P.
(regno_ok_for_base_p): Add insn argument and parse to ok_for_base_p_1.
* doc/tm.texi: Document INSN_BASE_REG_CLASS and
REGNO_OK_FOR_INSN_BASE_P.
* doc/tm.texi.in: Ditto.
* lra-constraints.cc (process_address_1): Pass insn to
base_reg_class.
(curr_insn_transform): Ditto.
* reload.cc (find_reloads): Ditto.
(find_reloads_address): Ditto.
(find_reloads_address_1): Ditto.
(find_reloads_subreg_address): Ditto.
* reload1.cc (maybe_fix_stack_asms): Ditto.


The patch is ok for committing to the trunk.  Thank you.

It would be nice to add to the documentation that INSN_BASE_REG_CLASS, 
INSN_INDEX_REG_CLASS, and REGNO_OK_FOR_INSN_BASE_P, if defined, have 
priority over the older corresponding macros, as is already documented 
for REGNO_MODE_CODE_OK_FOR_BASE_P relative to REGNO_OK_FOR_BASE_P. But 
this small issue can be addressed later.
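
The priority rule suggested above could look roughly like the sketch below. It is a self-contained illustration, not the code from addresses.h: `reg_class`, `rtx_insn`, and both macro definitions are simplified stand-ins (assumptions), and only the lookup order is the point.

```cpp
// Simplified stand-ins; the real GCC types and macros are richer.
enum reg_class { NO_REGS, LEGACY_INDEX_REGS, INDEX_REGS };

struct rtx_insn
{
  bool needs_legacy_regs;  // hypothetical per-insn property
};

// Older, insn-independent macro.
#define INDEX_REG_CLASS INDEX_REGS

// Newer macro; when a target defines it, it takes priority.
#define INSN_INDEX_REG_CLASS(INSN) \
  ((INSN)->needs_legacy_regs ? LEGACY_INDEX_REGS : INDEX_REGS)

// Wrapper in the spirit of the ChangeLog entry: consult the insn-aware
// macro first, then fall back to the older one.
inline reg_class
index_reg_class (const rtx_insn *insn = nullptr)
{
#ifdef INSN_INDEX_REG_CLASS
  if (insn != nullptr)
    return INSN_INDEX_REG_CLASS (insn);
#endif
  return INDEX_REG_CLASS;
}
```

A caller without an insn at hand gets the old behaviour unchanged, which is why the fallback path stays in place.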





[PATCH v4] c++: Check for indirect change of active union member in constexpr [PR101631,PR102286]

2023-09-22 Thread Nathaniel Shead
On Fri, Sep 22, 2023 at 02:21:15PM +0100, Jason Merrill wrote:
> On 9/21/23 09:41, Nathaniel Shead wrote:
> > I've updated the error messages, and also fixed another bug I found
> > while retesting (value-initialised unions weren't considered to have any
> > active member yet).
> > 
> > Bootstrapped and regtested on x86_64-pc-linux-gnu.
> > 
> > -- >8 --
> > 
> > This patch adds checks for attempting to change the active member of a
> > union by methods other than a member access expression.
> > 
> > To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
> > patch redoes the solution for c++/59950 to avoid extraneous *&; it
> > seems that the only case that needed the workaround was when copying
> > empty classes.
> > 
> > This patch also ensures that constructors for a union field mark that
> > field as the active member before entering the call itself; this ensures
> > that modifications of the field within the constructor's body don't
> > cause false positives (as these will not appear to be member access
> > expressions). This means that we no longer need to start the lifetime of
> > empty union members after the constructor body completes.
> > 
> > As a drive-by fix, this patch also ensures that value-initialised unions
> > are considered to have activated their initial member for the purpose of
> > checking stores, which catches some additional mistakes pre-C++20.
> > 
> > PR c++/101631
> > 
> > gcc/cp/ChangeLog:
> > 
> > * call.cc (build_over_call): Fold more indirect refs for trivial
> > assignment op.
> > * class.cc (type_has_non_deleted_trivial_default_ctor): Create.
> > * constexpr.cc (cxx_eval_call_expression): Start lifetime of
> > union member before entering constructor.
> > (cxx_eval_store_expression): Activate member for
> > value-initialised union. Check for accessing inactive union
> > member indirectly.
> > * cp-tree.h (type_has_non_deleted_trivial_default_ctor):
> > Forward declare.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > * g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
> > * g++.dg/cpp1y/constexpr-union6.C: New test.
> > * g++.dg/cpp2a/constexpr-union2.C: New test.
> > * g++.dg/cpp2a/constexpr-union3.C: New test.
> > * g++.dg/cpp2a/constexpr-union4.C: New test.
> > * g++.dg/cpp2a/constexpr-union5.C: New test.
> > 
> > Signed-off-by: Nathaniel Shead 
> > ---
> >   gcc/cp/call.cc|  11 +-
> >   gcc/cp/class.cc   |   8 ++
> >   gcc/cp/constexpr.cc   | 129 +-
> >   gcc/cp/cp-tree.h  |   1 +
> >   .../g++.dg/cpp1y/constexpr-89336-3.C  |   2 +-
> >   gcc/testsuite/g++.dg/cpp1y/constexpr-union6.C |  13 ++
> >   gcc/testsuite/g++.dg/cpp2a/constexpr-union2.C |  30 
> >   gcc/testsuite/g++.dg/cpp2a/constexpr-union3.C |  45 ++
> >   gcc/testsuite/g++.dg/cpp2a/constexpr-union4.C |  29 
> >   gcc/testsuite/g++.dg/cpp2a/constexpr-union5.C |  71 ++
> >   10 files changed, 296 insertions(+), 43 deletions(-)
> >   create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-union6.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union2.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union3.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union4.C
> >   create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union5.C
> > 
> > diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
> > index e8dafbd8ba6..c1fb8807d3f 100644
> > --- a/gcc/cp/call.cc
> > +++ b/gcc/cp/call.cc
> > @@ -10330,10 +10330,7 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
> >&& DECL_OVERLOADED_OPERATOR_IS (fn, NOP_EXPR)
> >&& trivial_fn_p (fn))
> >   {
> > -  /* Don't use cp_build_fold_indirect_ref, op= returns an lvalue even if
> > -the object argument isn't one.  */
> > -  tree to = cp_build_indirect_ref (input_location, argarray[0],
> > -  RO_ARROW, complain);
> > +  tree to = cp_build_fold_indirect_ref (argarray[0]);
> > tree type = TREE_TYPE (to);
> > tree as_base = CLASSTYPE_AS_BASE (type);
> > tree arg = argarray[1];
> > @@ -10341,7 +10338,11 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
> > if (is_really_empty_class (type, /*ignore_vptr*/true))
> > {
> > - /* Avoid copying empty classes.  */
> > + /* Avoid copying empty classes, but ensure op= returns an lvalue even
> > +if the object argument isn't one. This isn't needed in other cases
> > +since MODIFY_EXPR is always considered an lvalue.  */
> > + to = cp_build_addr_expr (to, tf_none);
> > + to = cp_build_indirect_ref (input_location, to, RO_ARROW, complain);
> >   val = build2 (COMPOUND_EXPR, type, arg, to);
> >   suppress_warning (val, OPT_Wunused);
> > }
> > di

[pushed] c++ __integer_pack conversion again [PR111357]

2023-09-22 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

As Jakub pointed out, the real problem here is that in a partial
substitution we're forgetting the conversion to the type of the non-type
template argument, because maybe_convert_nontype_argument doesn't do
anything with value-dependent arguments.  I'm experimenting with changing
that, but in the meantime we can work around it here.

PR c++/111357

gcc/cp/ChangeLog:

* pt.cc (expand_integer_pack): Use IMPLICIT_CONV_EXPR.
---
 gcc/cp/pt.cc | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index ea5379098a5..73ac1cb597c 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -3769,6 +3769,13 @@ expand_integer_pack (tree call, tree args, tsubst_flags_t complain,
 {
   if (hi != ohi)
{
+ /* Work around maybe_convert_nontype_argument not doing this for
+dependent arguments.  Don't use IMPLICIT_CONV_EXPR_NONTYPE_ARG
+because that will make tsubst_copy_and_build ignore it.  */
+ tree type = tsubst (TREE_TYPE (ohi), args, complain, in_decl);
+ if (!TREE_TYPE (hi) || !same_type_p (type, TREE_TYPE (hi)))
+   hi = build1 (IMPLICIT_CONV_EXPR, type, hi);
+
  call = copy_node (call);
  CALL_EXPR_ARG (call, 0) = hi;
}
@@ -3779,8 +3786,6 @@ expand_integer_pack (tree call, tree args, tsubst_flags_t complain,
 }
   else
 {
-  hi = perform_implicit_conversion_flags (integer_type_node, hi, complain,
- LOOKUP_IMPLICIT);
   hi = instantiate_non_dependent_expr (hi, complain);
   hi = cxx_constant_value (hi, complain);
   int len = valid_constant_size_p (hi) ? tree_to_shwi (hi) : -1;

base-commit: 22cda0ca5fb406f22925bbf51ab152a958e3319d
-- 
2.39.3



[pushed] c++: constexpr and designated initializer

2023-09-22 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

The change of active member being non-constant (before C++20) results in a
CONSTRUCTOR with a null value for the first field; don't crash.

gcc/cp/ChangeLog:

* constexpr.cc (free_constructor): Handle null ce->value.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-union7.C: New test.
---
 gcc/cp/constexpr.cc   | 2 +-
 gcc/testsuite/g++.dg/cpp2a/constexpr-union7.C | 6 ++
 2 files changed, 7 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union7.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index a673a6022f1..2a6601c0cbc 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -1753,7 +1753,7 @@ free_constructor (tree t)
{
  constructor_elt *ce;
  for (HOST_WIDE_INT i = 0; vec_safe_iterate (elts, i, &ce); ++i)
-   if (TREE_CODE (ce->value) == CONSTRUCTOR)
+   if (ce->value && TREE_CODE (ce->value) == CONSTRUCTOR)
  vec_safe_push (ctors, ce->value);
  ggc_free (elts);
}
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-union7.C b/gcc/testsuite/g++.dg/cpp2a/constexpr-union7.C
new file mode 100644
index 000..230fa6e7d06
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-union7.C
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++14 } }
+// { dg-options "" }
+
+union U { int i; float f; };
+constexpr auto g (U u) { return (u.i = 42); } // { dg-error "active member" "" { target c++17_down } }
+static_assert (g({.f = 3.14}) == 42); // { dg-error "non-constant" "" { target c++17_down } }

base-commit: 9c62af101e11e1cce573c2b3d2e18b403412dbc8
-- 
2.39.3



Re: [PATCH v3] c++: Catch indirect change of active union member in constexpr [PR101631]

2023-09-22 Thread Jason Merrill

On 9/21/23 09:41, Nathaniel Shead wrote:

I've updated the error messages, and also fixed another bug I found
while retesting (value-initialised unions weren't considered to have any
active member yet).

Bootstrapped and regtested on x86_64-pc-linux-gnu.

-- >8 --

This patch adds checks for attempting to change the active member of a
union by methods other than a member access expression.

To be able to properly distinguish `*(&u.a) = ` from `u.a = `, this
patch redoes the solution for c++/59950 to avoid extraneous *&; it
seems that the only case that needed the workaround was when copying
empty classes.

This patch also ensures that constructors for a union field mark that
field as the active member before entering the call itself; this ensures
that modifications of the field within the constructor's body don't
cause false positives (as these will not appear to be member access
expressions). This means that we no longer need to start the lifetime of
empty union members after the constructor body completes.

As a drive-by fix, this patch also ensures that value-initialised unions
are considered to have activated their initial member for the purpose of
checking stores, which catches some additional mistakes pre-C++20.
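
To make the distinction concrete, here is a small sketch of the cases described above. It is an illustration written for this thread, not one of the new tests; `U` and the function names are made up, and the indirect store is left in a comment because whether it is diagnosed depends on the compiler having this check.

```cpp
union U { int a; float b; };

// Valid since C++14: `U u{}` initialises the first member, so `a` is
// already active and the store does not change the active member.
constexpr int
store_to_active ()
{
  U u{};
  u.a = 42;
  return u.a;
}
static_assert (store_to_active () == 42, "");

#if __cplusplus >= 202002L
// Valid in C++20: `u.b = ...` is a member access expression, so the
// assignment starts the lifetime of `b` and makes it the active member.
constexpr float
switch_directly ()
{
  U u{};
  u.b = 1.5f;
  return u.b;
}
static_assert (switch_directly () == 1.5f);

// Not a constant expression, and the kind of store the new check is
// meant to catch: it goes through a pointer, so it is not a member
// access expression and does not activate `b`.
//
//   constexpr float
//   switch_indirectly ()
//   {
//     U u{};
//     *(&u.b) = 1.5f;
//     return u.b;
//   }
#endif
```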

PR c++/101631

gcc/cp/ChangeLog:

* call.cc (build_over_call): Fold more indirect refs for trivial
assignment op.
* class.cc (type_has_non_deleted_trivial_default_ctor): Create.
* constexpr.cc (cxx_eval_call_expression): Start lifetime of
union member before entering constructor.
(cxx_eval_store_expression): Activate member for
value-initialised union. Check for accessing inactive union
member indirectly.
* cp-tree.h (type_has_non_deleted_trivial_default_ctor):
Forward declare.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-89336-3.C: Fix union initialisation.
* g++.dg/cpp1y/constexpr-union6.C: New test.
* g++.dg/cpp2a/constexpr-union2.C: New test.
* g++.dg/cpp2a/constexpr-union3.C: New test.
* g++.dg/cpp2a/constexpr-union4.C: New test.
* g++.dg/cpp2a/constexpr-union5.C: New test.

Signed-off-by: Nathaniel Shead 
---
  gcc/cp/call.cc|  11 +-
  gcc/cp/class.cc   |   8 ++
  gcc/cp/constexpr.cc   | 129 +-
  gcc/cp/cp-tree.h  |   1 +
  .../g++.dg/cpp1y/constexpr-89336-3.C  |   2 +-
  gcc/testsuite/g++.dg/cpp1y/constexpr-union6.C |  13 ++
  gcc/testsuite/g++.dg/cpp2a/constexpr-union2.C |  30 
  gcc/testsuite/g++.dg/cpp2a/constexpr-union3.C |  45 ++
  gcc/testsuite/g++.dg/cpp2a/constexpr-union4.C |  29 
  gcc/testsuite/g++.dg/cpp2a/constexpr-union5.C |  71 ++
  10 files changed, 296 insertions(+), 43 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-union6.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union3.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union4.C
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-union5.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index e8dafbd8ba6..c1fb8807d3f 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10330,10 +10330,7 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
   && DECL_OVERLOADED_OPERATOR_IS (fn, NOP_EXPR)
   && trivial_fn_p (fn))
  {
-  /* Don't use cp_build_fold_indirect_ref, op= returns an lvalue even if
-the object argument isn't one.  */
-  tree to = cp_build_indirect_ref (input_location, argarray[0],
-  RO_ARROW, complain);
+  tree to = cp_build_fold_indirect_ref (argarray[0]);
tree type = TREE_TYPE (to);
tree as_base = CLASSTYPE_AS_BASE (type);
tree arg = argarray[1];
@@ -10341,7 +10338,11 @@ build_over_call (struct z_candidate *cand, int flags, tsubst_flags_t complain)
  
if (is_really_empty_class (type, /*ignore_vptr*/true))

{
- /* Avoid copying empty classes.  */
+ /* Avoid copying empty classes, but ensure op= returns an lvalue even
+if the object argument isn't one. This isn't needed in other cases
+since MODIFY_EXPR is always considered an lvalue.  */
+ to = cp_build_addr_expr (to, tf_none);
+ to = cp_build_indirect_ref (input_location, to, RO_ARROW, complain);
  val = build2 (COMPOUND_EXPR, type, arg, to);
  suppress_warning (val, OPT_Wunused);
}
diff --git a/gcc/cp/class.cc b/gcc/cp/class.cc
index b71333af1f8..e31aeb8e68b 100644
--- a/gcc/cp/class.cc
+++ b/gcc/cp/class.cc
@@ -5688,6 +5688,14 @@ type_has_virtual_destructor (tree type)
return (dtor && DECL_VIRTUAL_P (dtor));
  }
  
+/* True iff class TYPE has a non-deleted trivial default

+   constructor.  */
+
+bool type_has_non_de

[pushed] c++: unroll pragma in templates [PR111529]

2023-09-22 Thread Jason Merrill
Tested x86_64-pc-linux-gnu, applying to trunk.

-- 8< --

We were failing to handle ANNOTATE_EXPR in tsubst_copy_and_build, leading to
problems with substitution of any wrapped expressions.

Let's also not tell users that lambda templates are available in C++14.
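
As an illustration of the construct involved, here is a minimal sketch of `#pragma GCC unroll` on a loop in a function template. It assumes, as the ChangeLog suggests, that the pragma is represented by an ANNOTATE_EXPR that template substitution must rebuild; `sum_to` is a made-up name. The case in the PR additionally puts the loop inside a lambda, as unroll-4.C does.

```cpp
// Sums 1..N.  The pragma asks GCC to unroll the loop four times; other
// compilers may ignore it, possibly with an unknown-pragma warning.
template <int N>
int
sum_to ()
{
  int total = 0;
#pragma GCC unroll 4
  for (int i = 1; i <= N; ++i)
    total += i;
  return total;
}
```

Instantiating `sum_to<4> ()` forces the loop, pragma included, through template substitution.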

PR c++/111529

gcc/cp/ChangeLog:

* parser.cc (cp_parser_lambda_declarator_opt): Don't suggest
-std=c++14 for lambda templates.
* pt.cc (tsubst_expr): Move ANNOTATE_EXPR handling...
(tsubst_copy_and_build): ...here.

gcc/testsuite/ChangeLog:

* g++.dg/ext/unroll-4.C: New test.
---
 gcc/cp/parser.cc|  7 ++-
 gcc/cp/pt.cc| 14 +++---
 gcc/testsuite/g++.dg/ext/unroll-4.C | 16 
 3 files changed, 25 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ext/unroll-4.C

diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
index 0e1cbbfe051..f3abae716fe 100644
--- a/gcc/cp/parser.cc
+++ b/gcc/cp/parser.cc
@@ -11695,11 +11695,8 @@ cp_parser_lambda_declarator_opt (cp_parser* parser, tree lambda_expr)
  an opening angle if present.  */
   if (cp_lexer_next_token_is (parser->lexer, CPP_LESS))
 {
-  if (cxx_dialect < cxx14)
-   pedwarn (parser->lexer->next_token->location, OPT_Wc__14_extensions,
-"lambda templates are only available with "
-"%<-std=c++14%> or %<-std=gnu++14%>");
-  else if (pedantic && cxx_dialect < cxx20)
+  if (cxx_dialect < cxx20
+ && (pedantic || cxx_dialect < cxx14))
pedwarn (parser->lexer->next_token->location, OPT_Wc__20_extensions,
 "lambda templates are only available with "
 "%<-std=c++20%> or %<-std=gnu++20%>");
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 9b100e12a23..ea5379098a5 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -19913,13 +19913,6 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
templated_operator_saved_lookups (t),
complain));
 
-case ANNOTATE_EXPR:
-  tmp = RECUR (TREE_OPERAND (t, 0));
-  RETURN (build3_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
- TREE_TYPE (tmp), tmp,
- RECUR (TREE_OPERAND (t, 1)),
- RECUR (TREE_OPERAND (t, 2))));
-
 case PREDICT_EXPR:
   RETURN (add_stmt (copy_node (t)));
 
@@ -21868,6 +21861,13 @@ tsubst_copy_and_build (tree t,
RETURN (op);
   }
 
+case ANNOTATE_EXPR:
+  op1 = RECUR (TREE_OPERAND (t, 0));
+  RETURN (build3_loc (EXPR_LOCATION (t), ANNOTATE_EXPR,
+ TREE_TYPE (op1), op1,
+ RECUR (TREE_OPERAND (t, 1)),
+ RECUR (TREE_OPERAND (t, 2))));
+
 default:
   /* Handle Objective-C++ constructs, if appropriate.  */
   {
diff --git a/gcc/testsuite/g++.dg/ext/unroll-4.C b/gcc/testsuite/g++.dg/ext/unroll-4.C
new file mode 100644
index 000..d488aca974e
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/unroll-4.C
@@ -0,0 +1,16 @@
+// PR c++/111529
+// { dg-do compile { target c++11 } }
+// { dg-additional-options -Wno-c++20-extensions }
+
+template 
+void f() {
+  []() {
+#pragma GCC unroll 9
+for (int i = 1; i; --i) {
+}
+  };
+}
+
+int main() {
+  f<0>();
+}

base-commit: 4c496020764057453415f1ae599950724ec0e871
-- 
2.39.3



RE: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, September 22, 2023 8:19 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.

LGTM.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-22 20:16
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.
From: Pan Li <pan2...@intel.com>

We already vectorize the ceil code below.

void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1

After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2

We can generate better code, including the items below.

* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.
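
For readers following the instruction sequence, a scalar model of what one lane computes may help. This is only a sketch under stated assumptions: the threshold 2^23 stands in for the constant built by gen_ceil_const_fp, the conversion is assumed to round up, and inactive lanes are assumed to keep their previous value; `ceil_lane` is a made-up name.

```cpp
#include <cmath>
#include <cstdint>

// Scalar model of one vector lane of the "after" sequence.
float
ceil_lane (float x)
{
  const float limit = 8388608.0f;   // 2^23, assumed threshold for float
  float ax = std::fabs (x);         // vfabs.v
  bool active = ax < limit;         // vmflt.vf builds the mask
  float r = ax;                     // inactive lanes keep |x| (assumed)
  if (active)
    {
      // vfcvt.x.f.v: float -> integer, modelled here as truncation
      // followed by a round-up correction.
      std::int32_t i = static_cast<std::int32_t> (x);
      if (static_cast<float> (i) < x)
        ++i;
      r = static_cast<float> (i);   // vfcvt.f.x.v
    }
  return std::copysign (r, x);      // vfsgnj.vv restores the sign
}
```

The final sign injection is what keeps results such as -0.0 for inputs in (-1, 0), and it also returns large or NaN inputs unchanged in this model.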

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(emit_vec_float_cmp_mask): Rename.
(expand_vec_copysign): Ditto.
(emit_vec_copysign): Ditto.
(emit_vec_abs): New function impl.
(emit_vec_cvt_x_f): Ditto.
(emit_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/config/riscv/riscv-v.cc   | 81 ---
.../riscv/rvv/autovec/unop/math-ceil-0.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-1.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-2.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-3.c  |  5 +-
5 files changed, 54 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..251d827d973 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3557,36 +3557,27 @@ gen_ceil_const_fp (machine_mode inner_mode)
}
static rtx
-expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
-machine_mode vec_fp_mode)
+emit_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
+ machine_mode vec_fp_mode)
{
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
   return mask;
}
static void
-expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
-  machine_mode vec_mode)
+emit_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
+machine_mode vec_mode)
{
   rtx sgnj_ops[] = {op_dest, op_src_0, op_src_1};
   insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, vec_mode);
@@ -3594,30 +3585,58 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
}
+static void
+emit_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+emit_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+   insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+emit_vec_cvt_f_x

Re: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 20:16
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.
From: Pan Li 
 
We already vectorize the ceil code below.

void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1

After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2

This patch improves the generated code as follows.
 
* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(emit_vec_float_cmp_mask): Rename.
(expand_vec_copysign): Ditto.
(emit_vec_copysign): Ditto.
(emit_vec_abs): New function impl.
(emit_vec_cvt_x_f): Ditto.
(emit_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc   | 81 ---
.../riscv/rvv/autovec/unop/math-ceil-0.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-1.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-2.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-3.c  |  5 +-
5 files changed, 54 insertions(+), 47 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..251d827d973 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3557,36 +3557,27 @@ gen_ceil_const_fp (machine_mode inner_mode)
}
static rtx
-expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
-machine_mode vec_fp_mode)
+emit_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
+ machine_mode vec_fp_mode)
{
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
   return mask;
}
static void
-expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
-  machine_mode vec_mode)
+emit_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
+machine_mode vec_mode)
{
   rtx sgnj_ops[] = {op_dest, op_src_0, op_src_1};
   insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, vec_mode);
@@ -3594,30 +3585,58 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
}
+static void
+emit_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+emit_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+   insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+emit_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
+   insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred (FLOAT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_fp_ops);
+}
+
void
expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
machine_mode vec_int_mode)
{
-  /* Step-1: Generate the mask on const fp.  */
+  /* Step-1: Get the abs float value for mask generation.  */
+  emit_vec_abs (op_0, op_1, vec_fp_mode);
+
+  /* Step-2: Generate the mask

[PATCH v2] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread pan2 . li
From: Pan Li 

We already vectorize the ceil code below.

void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1

After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2

This patch improves the generated code as follows.

* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.
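
As a rough illustration of what the refined five-instruction sequence computes per element, here is a scalar C model of a single vector lane. It is only a sketch under stated assumptions: the names ceil_lane, ceil_model, float_bits and bits_float are hypothetical, and the threshold 2^23 is an assumption about the constant that gen_ceil_const_fp produces for float, which this mail does not state.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a float as its bit pattern and back.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

static float
bits_float (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Scalar model of one lane of the "after" sequence.  */
static float
ceil_lane (float x)
{
  /* Assumed threshold: 2^23, at or above which every float is an integer.  */
  const float limit = 8388608.0f;

  /* vfabs.v: clear the sign bit.  */
  float r = bits_float (float_bits (x) & 0x7fffffffu);

  /* vmflt.vf: the lane is active only when |x| < limit (false for NaN).  */
  if (r < limit)
    {
      /* vfcvt.x.f.v with round-up: truncate, then step up if needed.  */
      int32_t i = (int32_t) x;
      if ((float) i < x)
        i++;
      /* vfcvt.f.x.v: convert the integer back to float.  */
      r = (float) i;
    }

  /* vfsgnj.vv: take the magnitude of r and the sign of x.  Inactive lanes
     still hold |x|, so this restores x; it also keeps the sign of -0.0.  */
  return bits_float ((float_bits (r) & 0x7fffffffu)
                     | (float_bits (x) & 0x80000000u));
}

/* Apply the lane model to a whole array, mirroring test_ceil.  */
void
ceil_model (float *out, const float *in, unsigned count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = ceil_lane (in[i]);
}
```

The final sign-injection step is what lets the masked conversions write into the same register as the absolute value: lanes that are masked off (large values, infinities, NaN) keep |x| and get their original sign back.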

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(emit_vec_float_cmp_mask): Rename.
(expand_vec_copysign): Ditto.
(emit_vec_copysign): Ditto.
(emit_vec_abs): New function impl.
(emit_vec_cvt_x_f): Ditto.
(emit_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc   | 81 ---
 .../riscv/rvv/autovec/unop/math-ceil-0.c  |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-1.c  |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-2.c  |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-3.c  |  5 +-
 5 files changed, 54 insertions(+), 47 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..251d827d973 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3557,36 +3557,27 @@ gen_ceil_const_fp (machine_mode inner_mode)
 }
 
 static rtx
-expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
-  machine_mode vec_fp_mode)
+emit_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
+machine_mode vec_fp_mode)
 {
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
 
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
 
   return mask;
 }
 
 static void
-expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
-machine_mode vec_mode)
+emit_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
+  machine_mode vec_mode)
 {
   rtx sgnj_ops[] = {op_dest, op_src_0, op_src_1};
   insn_code icode = code_for_pred (UNSPEC_VCOPYSIGN, vec_mode);
@@ -3594,30 +3585,58 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
 }
 
+static void
+emit_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+emit_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+ insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+emit_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
+ insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred (FLOAT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_fp_ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 machine_mode vec_int_mode)
 {
-  /* Step-1: Generate the mask on const fp.  */
+  /* Step-1: Get the abs float value for mask generation.  */
+  emit_vec_abs (op_0, op_1, vec_fp_mode);
+
+  /* Step-2: Gen

RE: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread Li, Pan2
Sure thing, will send V2 for this.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, September 22, 2023 7:26 PM
To: Li, Pan2; gcc-patches
Cc: Li, Pan2; Wang, Yanzhang; kito.cheng
Subject: Re: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.

I would prefer renaming expand_vec_copysign to emit_vec_copysign.

Likewise emit_fabs, etc.


juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-22 19:19
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.
From: Pan Li <pan2...@intel.com>

We already vectorize the ceil code below.

void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1

After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2

This patch improves the generated code as follows.

* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(expand_vec_abs): New function impl.
(expand_vec_cvt_x_f): Ditto.
(expand_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Refine.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.

Signed-off-by: Pan Li <pan2...@intel.com>
---
gcc/config/riscv/riscv-v.cc   | 71 ---
.../riscv/rvv/autovec/unop/math-ceil-0.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-1.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-2.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-3.c  |  5 +-
5 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..ea2b01f6a6e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3560,26 +3560,17 @@ static rtx
expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
   machine_mode vec_fp_mode)
{
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
   return mask;
}
@@ -3594,29 +3585,57 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
}
+static void
+expand_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+expand_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+ insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+expand_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
+ insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred (FLOAT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_fp_ops);
+}
+
void
expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
machine_mode vec_int_mode)
{
-  /* Step-1: Generate the mask on const fp.  */
+  /* Step-1: Get the abs float value for mask gen

Re: [PATCH v2 1/2] c++: Initial support for P0847R7 (Deducing This) [PR102609]

2023-09-22 Thread Jason Merrill

On 9/21/23 07:28, waffl3x wrote:

This seems like a reasonable place for it since 'this' is supposed to
precede the decl-specifiers, and since we are parsing initial attributes
here rather than in the caller. You will want to give an error if
found_decl_spec is set. And elsewhere complain about 'this' on
parameters after the first (in cp_parser_parameter_declaration_list?),
or in a non-member/lambda (in grokdeclarator?).


Bringing this back up, I recalled another detail regarding this.

I'm pretty sure that found_decl_spec can be false when parsing the
second or a later decl-specifier. I tested it quickly and I believe I am
correct. I raise this because my diagnostics patch introduces another
variable to track whether we are on the first decl-specifier; given the
results of my quick test, I believe that was the correct choice.


Makes sense.


This kind of unclear machinery is what makes me really want to refactor
this code, but I've resisted as it would be inappropriate to try to do
so while implementing a feature. Once I am finished implementing
`deducing this`, would you be open to me refactoring grokdeclarator and
its various auxiliary functions?


Yes, but I'll warn you that grokdeclarator has resisted refactoring for 
a long time...


Jason



Re: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread juzhe.zh...@rivai.ai
I would prefer renaming expand_vec_copysign to emit_vec_copysign.

Likewise emit_fabs, etc.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 19:19
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.
From: Pan Li 
 
We already vectorize the ceil code below.

void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1

After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2

This patch improves the generated code as follows.
 
* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.
 
gcc/ChangeLog:
 
* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(expand_vec_abs): New function impl.
(expand_vec_cvt_x_f): Ditto.
(expand_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Refine.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-v.cc   | 71 ---
.../riscv/rvv/autovec/unop/math-ceil-0.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-1.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-2.c  |  5 +-
.../riscv/rvv/autovec/unop/math-ceil-3.c  |  5 +-
5 files changed, 49 insertions(+), 42 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..ea2b01f6a6e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3560,26 +3560,17 @@ static rtx
expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
   machine_mode vec_fp_mode)
{
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
   return mask;
}
@@ -3594,29 +3585,57 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
}
+static void
+expand_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+expand_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+ insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+expand_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
+ insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred (FLOAT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_fp_ops);
+}
+
void
expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
machine_mode vec_int_mode)
{
-  /* Step-1: Generate the mask on const fp.  */
+  /* Step-1: Get the abs float value for mask generation.  */
+  expand_vec_abs (op_0, op_1, vec_fp_mode);
+
+  /* Step-2: Generate the mask on const fp.  */
   rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode));
-  rtx mask = expand_vec_float_cmp_mask (op_1, LT, const_fp, vec_fp_mode);
+  rtx mask = expand_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
-  /* Step-2: Convert to integer on mask, with rounding up (aka ceil).  */
+  /* Step-3: Convert to integer on mask, with rounding up (aka ceil).  */
   rtx tmp = gen_reg_rtx (vec_int_mode);
-  rtx cvt_x_ops[] = {t

[PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.

2023-09-22 Thread pan2 . li
From: Pan Li 

We already vectorize the ceil code below.

void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:
vfmv.v.f    v4,fa0     // can be removed
vfabs.v     v0,v1
vmv1r.v     v2,v1      // can be removed
vmflt.vv    v0,v0,v4   // can be refined to vmflt.vf
vfcvt.x.f.v v3,v1,v0.t
vfcvt.f.x.v v2,v3,v0.t
vfsgnj.vv   v2,v2,v1

After this patch:
vfabs.v     v1,v2
vmflt.vf    v0,v1,fa5
vfcvt.x.f.v v3,v2,v0.t
vfcvt.f.x.v v1,v3,v0.t
vfsgnj.vv   v1,v1,v2

This patch improves the generated code as follows.

* Remove vfmv.v.f.
* Take vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
(expand_vec_abs): New function impl.
(expand_vec_cvt_x_f): Ditto.
(expand_vec_cvt_f_x): Ditto.
(expand_vec_ceil): Refine.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.

Signed-off-by: Pan Li 
---
 gcc/config/riscv/riscv-v.cc   | 71 ---
 .../riscv/rvv/autovec/unop/math-ceil-0.c  |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-1.c  |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-2.c  |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-3.c  |  5 +-
 5 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..ea2b01f6a6e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3560,26 +3560,17 @@ static rtx
 expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
   machine_mode vec_fp_mode)
 {
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
 
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
 
   return mask;
 }
@@ -3594,29 +3585,57 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
 }
 
+static void
+expand_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+expand_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+   insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+expand_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
+   insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred (FLOAT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_fp_ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 machine_mode vec_int_mode)
 {
-  /* Step-1: Generate the mask on const fp.  */
+  /* Step-1: Get the abs float value for mask generation.  */
+  expand_vec_abs (op_0, op_1, vec_fp_mode);
+
+  /* Step-2: Generate the mask on const fp.  */
   rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode));
-  rtx mask = expand_vec_float_cmp_mask (op_1, LT, const_fp, vec_fp_mode);
+  rtx mask = expand_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
 
-  /* Step-2: Convert to integer on mask, with rounding up (aka ceil).  */
+  /* Step-3: Convert to integer on mask, with rounding up (aka ceil).  */
   rtx tmp = gen_reg_rtx (vec_int_mode);
-  rtx cvt_x_ops[] = {tmp, mask, tmp, op_1};
-  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP_TAMU_FRM_RUP, cvt

[PATCH 09/13] [APX EGPR] Handle legacy insn that only support GPR16 (1/5)

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

These legacy insns in opcode map0/1 only support GPR16 and have no
vex/evex counterpart, so directly adjust their constraints and add the
gpr32 attr to the patterns.

insn list:
1. xsave/xsave64, xrstor/xrstor64
2. xsaves/xsaves64, xrstors/xrstors64
3. xsavec/xsavec64
4. xsaveopt/xsaveopt64
5. fxsave64/fxrstor64

gcc/ChangeLog:

* config/i386/i386.md (): Set attr gpr32 0 and constraint
jm.
(_rex64): Likewise.
(_rex64): Likewise.
(64): Likewise.
(fxsave64): Likewise.
(fxrstor64): Likewise.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Add apxf check.
* gcc.target/i386/apx-legacy-insn-check-norex2.c: New test.
* gcc.target/i386/apx-legacy-insn-check-norex2-asm.c: New assembler test.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/i386.md   | 18 +++
 .../i386/apx-legacy-insn-check-norex2-asm.c   |  5 
 .../i386/apx-legacy-insn-check-norex2.c   | 30 +++
 gcc/testsuite/lib/target-supports.exp | 10 +++
 4 files changed, 57 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index b9eaea78f00..6cf86b798a8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -25626,11 +25626,12 @@ (define_insn "fxsave"
 (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "fxsave64"
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=jm")
(unspec_volatile:BLK [(const_int 0)] UNSPECV_FXSAVE64))]
   "TARGET_64BIT && TARGET_FXSR"
   "fxsave64\t%0"
   [(set_attr "type" "other")
+   (set_attr "gpr32" "0")
(set_attr "memory" "store")
(set (attr "length")
 (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
@@ -25646,11 +25647,12 @@ (define_insn "fxrstor"
 (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "fxrstor64"
-  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "m")]
+  [(unspec_volatile [(match_operand:BLK 0 "memory_operand" "jm")]
UNSPECV_FXRSTOR64)]
   "TARGET_64BIT && TARGET_FXSR"
   "fxrstor64\t%0"
   [(set_attr "type" "other")
+   (set_attr "gpr32" "0")
(set_attr "memory" "load")
(set (attr "length")
 (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
@@ -25704,7 +25706,7 @@ (define_insn ""
 (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "_rex64"
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=jm")
(unspec_volatile:BLK
 [(match_operand:SI 1 "register_operand" "a")
  (match_operand:SI 2 "register_operand" "d")]
@@ -25713,11 +25715,12 @@ (define_insn "_rex64"
   "\t%0"
   [(set_attr "type" "other")
(set_attr "memory" "store")
+   (set_attr "gpr32" "0")
(set (attr "length")
 (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn ""
-  [(set (match_operand:BLK 0 "memory_operand" "=m")
+  [(set (match_operand:BLK 0 "memory_operand" "=jm")
(unspec_volatile:BLK
 [(match_operand:SI 1 "register_operand" "a")
  (match_operand:SI 2 "register_operand" "d")]
@@ -25726,6 +25729,7 @@ (define_insn ""
   "\t%0"
   [(set_attr "type" "other")
(set_attr "memory" "store")
+   (set_attr "gpr32" "0")
(set (attr "length")
 (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
 
@@ -25743,7 +25747,7 @@ (define_insn ""
 
 (define_insn "_rex64"
[(unspec_volatile:BLK
- [(match_operand:BLK 0 "memory_operand" "m")
+ [(match_operand:BLK 0 "memory_operand" "jm")
   (match_operand:SI 1 "register_operand" "a")
   (match_operand:SI 2 "register_operand" "d")]
  ANY_XRSTOR)]
@@ -25751,12 +25755,13 @@ (define_insn "_rex64"
   "\t%0"
   [(set_attr "type" "other")
(set_attr "memory" "load")
+   (set_attr "gpr32" "0")
(set (attr "length")
 (symbol_ref "ix86_attr_length_address_default (insn) + 3"))])
 
 (define_insn "64"
[(unspec_volatile:BLK
- [(match_operand:BLK 0 "memory_operand" "m")
+ [(match_operand:BLK 0 "memory_operand" "jm")
   (match_operand:SI 1 "register_operand" "a")
   (match_operand:SI 2 "register_operand" "d")]
  ANY_XRSTOR64)]
@@ -25764,6 +25769,7 @@ (define_insn "64"
   "64\t%0"
   [(set_attr "type" "other")
(set_attr "memory" "load")
+   (set_attr "gpr32" "0")
(set (attr "length")
 (symbol_ref "ix86_attr_length_address_default (insn) + 4"))])
 
diff --git a/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c b/gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
new file mode 100644
index 000..7ecc861435f
--- /dev/null

[PATCH 13/13] [APX EGPR] Handle vex insns that only support GPR16 (5/5)

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

These vex insns may have a legacy counterpart that supports EGPR,
but they have no evex counterpart. Split the vex part out of each
pattern and mark it as not supporting EGPR by adjusting the
constraints and attr_gpr32.

insn list:
1. vmovmskpd/vmovmskps
2. vpmovmskb
3. vrsqrtss/vrsqrtps
4. vrcpss/vrcpps
5. vhaddpd/vhaddps, vhsubpd/vhsubps
6. vldmxcsr/vstmxcsr
7. vaddsubpd/vaddsubps
8. vlddqu
9. vtestps/vtestpd
10. vmaskmovps/vmaskmovpd, vpmaskmovd/vpmaskmovq
11. vperm2f128/vperm2i128
12. vinserti128/vinsertf128
13. vbroadcasti128/vbroadcastf128
14. vcmppd/vcmpps, vcmpss/vcmpsd
15. vgatherdps/vgatherqps, vgatherdpd/vgatherqpd

gcc/ChangeLog:

* config/i386/constraints.md (jb): New constraint for vsib memory
that does not allow gpr32.
* config/i386/i386.md (setcc__sse): Replace m with jm for avx
alternative and set attr_gpr32 to 0.
(movmsk_df): Split avx/noavx alternatives and replace "r" with "jr" for
avx alternative.
(_rcp2): Split avx/noavx alternatives and replace
"m/Bm" to "jm/ja" for avx alternative, set its gpr32 attr to 0.
(*rsqrtsf2_sse): Likewise.
* config/i386/mmx.md (mmx_pmovmskb): Split alternative 1 to
avx/noavx and assign jr/r constraint to dest.
* config/i386/sse.md (_movmsk):
Split avx/noavx alternatives and replace "r" to "jr" for avx 
alternative.
(*_movmsk_ext): Likewise.
(*_movmsk_lt): Likewise.
(*_movmsk_ext_lt): Likewise.
(*_movmsk_shift): Likewise.
(*_movmsk_ext_shift): Likewise.
(_pmovmskb): Likewise.
(*_pmovmskb_zext): Likewise.
(*sse2_pmovmskb_ext): Likewise.
(*_pmovmskb_lt): Likewise.
(*_pmovmskb_zext_lt): Likewise.
(*sse2_pmovmskb_ext_lt): Likewise.
(_rcp2): Split avx/noavx alternatives and replace
"m/Bm" to "jm/ja" for avx alternative, set its attr_gpr32 to 0.
(sse_vmrcpv4sf2): Likewise.
(*sse_vmrcpv4sf2): Likewise.
(rsqrt2): Likewise.
(sse_vmrsqrtv4sf2): Likewise.
(*sse_vmrsqrtv4sf2): Likewise.
(avx_hv4df3): Likewise.
(sse3_hsubv2df3): Likewise.
(avx_hv8sf3): Likewise.
(sse3_hv4sf3): Likewise.
(_lddqu): Likewise.
(avx_cmp3): Likewise.
(avx_vmcmp3): Likewise.
(*sse2_gt3): Likewise.
(sse_ldmxcsr): Likewise.
(sse_stmxcsr): Likewise.
(avx_vtest): Replace m to jm for
avx alternative and set attr_gpr32 to 0.
(avx2_permv2ti): Likewise.
(*avx_vperm2f128_full): Likewise.
(*avx_vperm2f128_nozero): Likewise.
(vec_set_lo_v32qi): Likewise.
(_maskload): Likewise.
(_maskstore): Likewise.
(avx_cmp3): Likewise.
(avx_vmcmp3): Likewise.
(*_maskcmp3_comm): Likewise.
(*avx2_gathersi): Replace Tv to jb and set
attr_gpr32 to 0.
(*avx2_gathersi_2): Likewise.
(*avx2_gatherdi): Likewise.
(*avx2_gatherdi_2): Likewise.
(*avx2_gatherdi_3): Likewise.
(*avx2_gatherdi_4): Likewise.
(avx_vbroadcastf128_): Restrict non-egpr alternative to
noavx512vl, set its constraint to jm and set attr_gpr32 to 0.
(vec_set_lo_): Likewise.
(vec_set_lo_): Likewise for SF/SI modes.
(vec_set_hi_): Likewise.
(vec_set_hi_): Likewise for SF/SI modes.
(vec_set_hi_): Likewise.
(vec_set_lo_): Likewise.
(avx2_set_hi_v32qi): Likewise.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/constraints.md |   6 +
 gcc/config/i386/i386.md|  47 +++--
 gcc/config/i386/mmx.md |  11 +-
 gcc/config/i386/sse.md | 320 +
 4 files changed, 242 insertions(+), 142 deletions(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index 36c268d7f9b..dc91bd94b27 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -428,3 +428,9 @@ (define_special_memory_constraint "ja"
   (and (match_operand 0 "vector_memory_operand")
(not (and (match_test "TARGET_APX_EGPR")
 (match_test "x86_extended_rex2reg_mentioned_p (op)")
+
+(define_address_constraint "jb"
+  "VSIB address operand without EGPR"
+  (and (match_operand 0 "vsib_address_operand")
+   (not (and (match_test "TARGET_APX_EGPR")
+(match_test "x86_extended_rex2reg_mentioned_p (op)")
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index c09ee3989cb..a0ba1752a54 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -554,7 +554,8 @@ (define_attr "isa" 
"base,x64,nox64,x64_sse2,x64_sse4,x64_sse4_noavx,
avx,noavx,avx2,noavx2,bmi,bmi2,fma4,fma,avx512f,noavx512f,
avx512bw,noavx512bw,avx512dq,noavx512dq,fma_or_avx512vl,
avx512vl,noavx512vl,avxvnni,avx512vnnivl,avx512fp16,avxifma,

[PATCH 08/13] [APX EGPR] Handle GPR16 only vector move insns

2023-09-22 Thread Hongyu Wang
For vector move insns like vmovdqa/vmovdqu, their EVEX counterparts
require an explicit suffix 64/32/16/8. The use of these instructions
is prohibited under AVX10_1 or AVX512F, so we select
vmovaps/vmovups for vector load/store insns that contain EGPR if
there is no AVX512VL, and keep the original move insn selection
otherwise.
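
The selection described above can be sketched as follows. This is a
simplified model of the DImode/TImode/OImode branch only, not the real
ix86_get_ssemov: the function name select_mov_opcode and its plain
boolean parameters are assumptions standing in for evex_reg_p, egpr_p,
TARGET_AVX512VL and misaligned_p.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one branch of ix86_get_ssemov.  The real
// function inspects rtx operands and target flags; here they are booleans.
std::string select_mov_opcode (bool evex_reg_p, bool egpr_p,
                               bool avx512vl, bool misaligned_p)
{
  // EGPR together with AVX512VL may use the suffixed EVEX mnemonics.
  bool egpr_vl = egpr_p && avx512vl;

  if (evex_reg_p || egpr_vl)
    return misaligned_p ? "vmovdqu64" : "vmovdqa64";
  // EGPR without AVX512VL: fall back to vmovups/vmovaps.
  if (egpr_p)
    return misaligned_p ? "%vmovups" : "%vmovaps";
  // No EGPR involved: keep the original selection.
  return misaligned_p ? "%vmovdqu" : "%vmovdqa";
}
```

The QImode/HImode and HFmode/BFmode branches in the patch additionally
pick vmovdqu8/vmovdqu16 when AVX512BW is available; that case is left
out of the sketch.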

gcc/ChangeLog:

* config/i386/i386.cc (ix86_get_ssemov): Check if egpr is used,
adjust mnemonic for vmovdqu/vmovdqa.
* config/i386/sse.md 
(*_vinsert_0):
Check if egpr is used, adjust mnemonic for vmovdqu/vmovdqa.
(avx_vec_concat): Likewise, and separate alternative 0 to
avx_noavx512f.

Co-authored-by: Kong Lingling 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/i386.cc | 42 +++--
 gcc/config/i386/sse.md  | 34 +++--
 2 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index ea94663eb68..5d47c2af25e 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -5478,6 +5478,12 @@ ix86_get_ssemov (rtx *operands, unsigned size,
   bool evex_reg_p = (size == 64
 || EXT_REX_SSE_REG_P (operands[0])
 || EXT_REX_SSE_REG_P (operands[1]));
+
+  bool egpr_p = (TARGET_APX_EGPR
+&& (x86_extended_rex2reg_mentioned_p (operands[0])
+|| x86_extended_rex2reg_mentioned_p (operands[1])));
+  bool egpr_vl = egpr_p && TARGET_AVX512VL;
+
   machine_mode scalar_mode;
 
   const char *opcode = NULL;
@@ -5550,12 +5556,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
{
case E_HFmode:
case E_BFmode:
- if (evex_reg_p)
+ if (evex_reg_p || egpr_vl)
opcode = (misaligned_p
  ? (TARGET_AVX512BW
 ? "vmovdqu16"
 : "vmovdqu64")
  : "vmovdqa64");
+ else if (egpr_p)
+   opcode = (misaligned_p
+ ? (TARGET_AVX512BW
+? "vmovdqu16"
+: "%vmovups")
+ : "%vmovaps");
  else
opcode = (misaligned_p
  ? (TARGET_AVX512BW
@@ -5570,8 +5582,10 @@ ix86_get_ssemov (rtx *operands, unsigned size,
  opcode = misaligned_p ? "%vmovupd" : "%vmovapd";
  break;
case E_TFmode:
- if (evex_reg_p)
+ if (evex_reg_p || egpr_vl)
opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+ else if (egpr_p)
+   opcode = misaligned_p ? "%vmovups" : "%vmovaps";
  else
opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
  break;
@@ -5584,12 +5598,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
   switch (scalar_mode)
{
case E_QImode:
- if (evex_reg_p)
+ if (evex_reg_p || egpr_vl)
opcode = (misaligned_p
  ? (TARGET_AVX512BW
 ? "vmovdqu8"
 : "vmovdqu64")
  : "vmovdqa64");
+ else if (egpr_p)
+   opcode = (misaligned_p
+ ? (TARGET_AVX512BW
+? "vmovdqu8"
+: "%vmovups")
+ : "%vmovaps");
  else
opcode = (misaligned_p
  ? (TARGET_AVX512BW
@@ -5598,12 +5618,18 @@ ix86_get_ssemov (rtx *operands, unsigned size,
  : "%vmovdqa");
  break;
case E_HImode:
- if (evex_reg_p)
+ if (evex_reg_p || egpr_vl)
opcode = (misaligned_p
  ? (TARGET_AVX512BW
 ? "vmovdqu16"
 : "vmovdqu64")
  : "vmovdqa64");
+ else if (egpr_p)
+   opcode = (misaligned_p
+ ? (TARGET_AVX512BW
+? "vmovdqu16"
+: "%vmovups")
+ : "%vmovaps");
  else
opcode = (misaligned_p
  ? (TARGET_AVX512BW
@@ -5612,16 +5638,20 @@ ix86_get_ssemov (rtx *operands, unsigned size,
  : "%vmovdqa");
  break;
case E_SImode:
- if (evex_reg_p)
+ if (evex_reg_p || egpr_vl)
opcode = misaligned_p ? "vmovdqu32" : "vmovdqa32";
+ else if (egpr_p)
+   opcode = misaligned_p ? "%vmovups" : "%vmovaps";
  else
opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
  break;
case E_DImode:
case E_TImode:
case E_OImode:
- if (evex_reg_p)
+ if (evex_reg_p || egpr_vl)
opcode = misaligned_p ? "vmovdqu64" : "vmovdqa64";
+ else if (egpr_p)
+   opcode = misaligned_p ? "%vmovups" : "%vmovaps";
  else
opcode = misaligned_p ? "%vmovdqu" : "%vmovdqa";
  break;
diff

[PATCH 10/13] [APX EGPR] Handle legacy insns that only support GPR16 (2/5)

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

These legacy insns in opcode map2/3 have a VEX but no EVEX
counterpart. Disable EGPR for them by adjusting alternatives and
attr_gpr32.

insn list:
1. phaddw/vphaddw, phaddd/vphaddd, phaddsw/vphaddsw
2. phsubw/vphsubw, phsubd/vphsubd, phsubsw/vphsubsw
3. psignb/vpsignb, psignw/vpsignw, psignd/vpsignd
4. blendps/vblendps, blendpd/vblendpd
5. blendvps/vblendvps, blendvpd/vblendvpd
6. pblendvb/vpblendvb, pblendw/vpblendw
7. mpsadbw/vmpsadbw
8. dpps/vdpps, dppd/vdppd
9. pcmpeqq/vpcmpeqq, pcmpgtq/vpcmpgtq

gcc/ChangeLog:

* config/i386/sse.md (avx2_phwv16hi3): Set
attr gpr32 0 and constraint jm/ja to all mem alternatives.
(ssse3_phwv8hi3): Likewise.
(ssse3_phwv4hi3): Likewise.
(avx2_phdv8si3): Likewise.
(ssse3_phdv4si3): Likewise.
(ssse3_phdv2si3): Likewise.
(_psign3): Likewise.
(ssse3_psign3): Likewise.
(_blend_blendv_blendv_lt): Likewise.
(*_blendv_not_ltint): Likewise.
(_dp): Likewise.
(_mpsadbw): Likewise.
(_pblendvb): Likewise.
(*_pblendvb_lt): Likewise.
(sse4_1_pblend): Likewise.
(*avx2_pblend): Likewise.
(avx2_permv2ti): Likewise.
(*avx_vperm2f128_nozero): Likewise.
(*avx2_eq3): Likewise.
(*sse4_1_eqv2di3): Likewise.
(sse4_2_gtv2di3): Likewise.
(avx2_gt3): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add
sse/vex intrinsic tests.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/sse.md|  73 
 .../i386/apx-legacy-insn-check-norex2.c   | 106 ++
 2 files changed, 155 insertions(+), 24 deletions(-)

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 256b0eedbbb..a7858a7f8cf 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -16831,7 +16831,7 @@ (define_insn "*avx2_eq3"
   [(set (match_operand:VI_256 0 "register_operand" "=x")
(eq:VI_256
  (match_operand:VI_256 1 "nonimmediate_operand" "%x")
- (match_operand:VI_256 2 "nonimmediate_operand" "xm")))]
+ (match_operand:VI_256 2 "nonimmediate_operand" "jm")))]
   "TARGET_AVX2 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "vpcmpeq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "type" "ssecmp")
@@ -16839,6 +16839,7 @@ (define_insn "*avx2_eq3"
  (if_then_else (eq (const_string "mode") (const_string "V4DImode"))
   (const_string "1")
   (const_string "*")))
+   (set_attr "gpr32" "0")
(set_attr "prefix" "vex")
(set_attr "mode" "OI")])
 
@@ -17021,7 +17022,7 @@ (define_insn "*sse4_1_eqv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
(eq:V2DI
  (match_operand:V2DI 1 "vector_operand" "%0,0,x")
- (match_operand:V2DI 2 "vector_operand" "YrBm,*xBm,xm")))]
+ (match_operand:V2DI 2 "vector_operand" "Yrja,*xja,xjm")))]
   "TARGET_SSE4_1 && !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
pcmpeqq\t{%2, %0|%0, %2}
@@ -17029,6 +17030,7 @@ (define_insn "*sse4_1_eqv2di3"
vpcmpeqq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
@@ -17037,13 +17039,14 @@ (define_insn "*sse2_eq3"
   [(set (match_operand:VI124_128 0 "register_operand" "=x,x")
(eq:VI124_128
  (match_operand:VI124_128 1 "vector_operand" "%0,x")
- (match_operand:VI124_128 2 "vector_operand" "xBm,xm")))]
+ (match_operand:VI124_128 2 "vector_operand" "xBm,xjm")))]
   "TARGET_SSE2
&& !(MEM_P (operands[1]) && MEM_P (operands[2]))"
   "@
pcmpeq\t{%2, %0|%0, %2}
vpcmpeq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,avx")
+   (set_attr "gpr32" "1,0")
(set_attr "type" "ssecmp")
(set_attr "prefix" "orig,vex")
(set_attr "mode" "TI")])
@@ -17052,7 +17055,7 @@ (define_insn "sse4_2_gtv2di3"
   [(set (match_operand:V2DI 0 "register_operand" "=Yr,*x,x")
(gt:V2DI
  (match_operand:V2DI 1 "register_operand" "0,0,x")
- (match_operand:V2DI 2 "vector_operand" "YrBm,*xBm,xm")))]
+ (match_operand:V2DI 2 "vector_operand" "Yrja,*xja,xjm")))]
   "TARGET_SSE4_2"
   "@
pcmpgtq\t{%2, %0|%0, %2}
@@ -17060,6 +17063,7 @@ (define_insn "sse4_2_gtv2di3"
vpcmpgtq\t{%2, %1, %0|%0, %1, %2}"
   [(set_attr "isa" "noavx,noavx,avx")
(set_attr "type" "ssecmp")
+   (set_attr "gpr32" "0")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
(set_attr "mode" "TI")])
@@ -17068,7 +17072,7 @@ (define_insn "avx2_gt3"
   [(set (match_operand:VI_256 0 "register_operand" "=x")
(gt:VI_256
  (match_operand:VI_256 1 "register_operand" "x")
- (match_operand:VI_256 2 "nonimmediate_operand" "xm")))]
+ (match_operand:VI_256 2 "nonimmediat

[PATCH 06/13] [APX EGPR] Add backend hook for base_reg_class/index_reg_class.

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

Add backend helper functions to verify whether an rtx_insn can use EGPR
for the base/index reg of its memory operand. The verification rules are:
  1. For asm insns, enable/disable EGPR by ix86_apx_inline_asm_use_gpr32.
  2. Disable EGPR for unrecognized insns.
  3. If which_alternative is not decided, loop through the enabled
  alternatives and check their attr_gpr32. Only enable EGPR when all
  enabled alternatives have attr_gpr32 = 1.
  4. If which_alternative is decided, enable/disable EGPR by its corresponding
  attr_gpr32.
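
The four rules above can be sketched as a small decision function. This
is an illustrative model only: InsnModel, its fields and may_use_egpr
are hypothetical names, not GCC APIs, and the sketch assumes APX EGPR is
enabled and a real insn is passed (the patch returns true otherwise).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the facts the helper looks at for one insn.
struct InsnModel
{
  bool is_asm;                 // inline asm insn
  bool recognized;             // models INSN_CODE (insn) >= 0
  int which_alternative;       // -1 when not yet decided
  std::vector<bool> enabled;   // enabled alternatives
  std::vector<bool> gpr32;     // attr_gpr32 per alternative
};

bool may_use_egpr (const InsnModel &insn, bool inline_asm_use_gpr32)
{
  // Rule 1: asm insns follow the inline-asm option.
  if (insn.is_asm)
    return inline_asm_use_gpr32;
  // Rule 2: unrecognized insns never get EGPR.
  if (!insn.recognized)
    return false;
  // Rule 3: alternative undecided, so every enabled alternative must allow it.
  if (insn.which_alternative < 0)
    {
      for (std::size_t i = 0; i < insn.gpr32.size (); ++i)
        if (insn.enabled[i] && !insn.gpr32[i])
          return false;
      return true;
    }
  // Rule 4: alternative decided, so use its own attr_gpr32.
  return insn.gpr32[static_cast<std::size_t> (insn.which_alternative)];
}
```

Being conservative in rule 3 means a single GPR16-only alternative keeps
EGPR out of the address until the alternative is actually chosen.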

gcc/ChangeLog:

* config/i386/i386-protos.h (ix86_insn_base_reg_class): New
prototype.
(ix86_regno_ok_for_insn_base_p): Likewise.
(ix86_insn_index_reg_class): Likewise.
* config/i386/i386.cc (ix86_memory_address_use_extended_reg_class_p):
New helper function to scan the insn.
(ix86_insn_base_reg_class): New function to choose BASE_REG_CLASS.
(ix86_regno_ok_for_insn_base_p): Likewise for base regno.
(ix86_insn_index_reg_class): Likewise for INDEX_REG_CLASS.
* config/i386/i386.h (INSN_BASE_REG_CLASS): Define.
(REGNO_OK_FOR_INSN_BASE_P): Likewise.
(INSN_INDEX_REG_CLASS): Likewise.
(enum reg_class): Add INDEX_GPR16.
(GENERAL_GPR16_REGNO_P): Define.
* config/i386/i386.md (gpr32): New attribute.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/i386-protos.h |  3 ++
 gcc/config/i386/i386.cc   | 89 +++
 gcc/config/i386/i386.h| 17 ++-
 gcc/config/i386/i386.md   |  3 ++
 4 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index bd4782800c4..a54e3f6b1dc 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -79,6 +79,9 @@ extern bool ix86_expand_set_or_cpymem (rtx, rtx, rtx, rtx, 
rtx, rtx,
   rtx, rtx, rtx, rtx, bool);
 extern bool ix86_expand_cmpstrn_or_cmpmem (rtx, rtx, rtx, rtx, rtx, bool);
 
+extern enum reg_class ix86_insn_base_reg_class (rtx_insn *);
+extern bool ix86_regno_ok_for_insn_base_p (int, rtx_insn *);
+extern enum reg_class ix86_insn_index_reg_class (rtx_insn *);
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index fb1672f0b3d..5af0de4dae7 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -11062,6 +11062,95 @@ ix86_validate_address_register (rtx op)
   return NULL_RTX;
 }
 
+/* Return true if insn memory address can use any available reg
+   in BASE_REG_CLASS or INDEX_REG_CLASS, otherwise false.
+   For APX, some instruction can't be encoded with gpr32
+   which is BASE_REG_CLASS or INDEX_REG_CLASS, for that case
+   returns false.  */
+static bool
+ix86_memory_address_use_extended_reg_class_p (rtx_insn* insn)
+{
+  /* LRA will do some initialization with insn == NULL,
+ return the maximum reg class for that.
+ For other cases, real insn will be passed and checked.  */
+  bool ret = true;
+  if (TARGET_APX_EGPR && insn)
+{
+  if (asm_noperands (PATTERN (insn)) >= 0
+ || GET_CODE (PATTERN (insn)) == ASM_INPUT)
+   return ix86_apx_inline_asm_use_gpr32;
+
+  if (INSN_CODE (insn) < 0)
+   return false;
+
+  /* Try recog the insn before calling get_attr_gpr32. Save
+the current recog_data first.  */
+  /* Also save which_alternative for current recog.  */
+
+  struct recog_data_d recog_data_save = recog_data;
+  int which_alternative_saved = which_alternative;
+
+  /* Update the recog_data for alternative check. */
+  if (recog_data.insn != insn)
+   extract_insn_cached (insn);
+
+  /* If alternative is not set, loop through each alternative
+of insn and get gpr32 attr for all enabled alternatives.
+If any enabled alternatives has 0 value for gpr32, disallow
+gpr32 for addressing.  */
+  if (which_alternative_saved == -1)
+   {
+ alternative_mask enabled = get_enabled_alternatives (insn);
+ bool curr_insn_gpr32 = false;
+ for (int i = 0; i < recog_data.n_alternatives; i++)
+   {
+ if (!TEST_BIT (enabled, i))
+   continue;
+ which_alternative = i;
+ curr_insn_gpr32 = get_attr_gpr32 (insn);
+ if (!curr_insn_gpr32)
+   ret = false;
+   }
+   }
+  else
+   {
+ which_alternative = which_alternative_saved;
+ ret = get_attr_gpr32 (insn);
+   }
+
+  recog_data = recog_data_save;
+  which_alternative = which_alternative_saved;
+}
+
+  return ret;
+}
+
+/* For APX, some instructions can't be encoded with gpr32.  */
+enum reg_class
+ix86_insn_base_reg_class (rtx_insn* insn)
+{
+  if (ix86_memory_address_use_extended_reg_cl

[PATCH 12/13] [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

APX-enabled hardware should also be AVX10 enabled, so for map2/3 insns
with an EVEX counterpart we assume auto promotion to EGPR under APX_F if the
insn uses GPR32. For the insns below, we therefore disable EGPR usage for
their SSE mnemonics while allowing EGPR generation for their v-prefixed
mnemonics.

insn list:
1. pabsb/pabsw/pabsd
2. pextrb/pextrw/pextrd/pextrq
3. pinsrb/pinsrd/pinsrq
4. pshufb
5. extractps/insertps
6. pmaddubsw
7. pmulhrsw
8. packusdw
9. palignr
10. movntdqa
11. mpsadbw
12. pmuldq/pmulld
13. pmaxsb/pmaxsd, pminsb/pminsd
pmaxud/pmaxuw, pminud/pminuw
14. (pmovsxbw/pmovsxbd/pmovsxbq,
 pmovsxwd/pmovsxwq, pmovsxdq
 pmovzxbw/pmovzxbd/pmovzxbq,
 pmovzxwd/pmovzxwq, pmovzxdq)
15. aesdec/aesdeclast, aesenc/aesenclast
16. pclmulqdq
17. gf2p8affineqb/gf2p8affineinvqb/gf2p8mulb

gcc/ChangeLog:

* config/i386/i386.md (*movhi_internal): Split out non-gpr
supported pextrw with mem constraint to avx/noavx alternatives,
set jm and attr gpr32 0 to the noavx alternative.
(*mov_internal): Likewise.
* config/i386/mmx.md (mmx_pshufbv8qi3): Change "r/m/Bm" to
"jr/jm/ja" and set_attr gpr32 0 for noavx alternative.
(mmx_pshufbv4qi3): Likewise.
(*mmx_pinsrd): Likewise.
(*mmx_pinsrb): Likewise.
(*pinsrb): Likewise.
(mmx_pshufbv8qi3): Likewise.
(mmx_pshufbv4qi3): Likewise.
(@sse4_1_insertps_): Likewise.
(*mmx_pextrw): Split alternatives and map non-EGPR
constraints, attr_gpr32 and attr_isa to noavx mnemonics.
(*movv2qi_internal): Likewise.
(*pextrw): Likewise.
(*mmx_pextrb): Likewise.
(*mmx_pextrb_zext): Likewise.
(*pextrb): Likewise.
(*pextrb_zext): Likewise.
(vec_extractv2si_1): Likewise.
(vec_extractv2si_1_zext): Likewise.
* config/i386/sse.md: (vi128_h_r): New mode attr for
pinsr{bw}/pextr{bw} with reg operand.
(*abs2): Split alternatives and %v in mnemonics, map
non-EGPR constraints, gpr32 and isa attrs to noavx mnemonics.
(*vec_extract): Likewise.
(*vec_extract): Likewise for HFBF pattern.
(*vec_extract_zext): Likewise.
(*vec_extractv4si_1): Likewise.
(*vec_extractv4si_zext): Likewise.
(*vec_extractv2di_1): Likewise.
(*vec_concatv2si_sse4_1): Likewise.
(_pinsr): Likewise.
(vec_concatv2di): Likewise.
(*sse4_1_v2qiv2di2_1): Likewise.
(ssse3_avx2>_pshufb3): Change "r/m/Bm" to
"jr/jm/ja" and set_attr gpr32 0 for noavx alternative, split
%v for avx/noavx alternatives if necessary.
(*vec_concatv2sf_sse4_1): Likewise.
(*sse4_1_extractps): Likewise.
(vec_set_0): Likewise for VI4F_128.
(*vec_setv4sf_sse4_1): Likewise.
(@sse4_1_insertps): Likewise.
(ssse3_pmaddubsw128): Likewise.
(*_pmulhrsw3): Likewise.
(_packusdw): Likewise.
(_palignr): Likewise.
(_movntdqa): Likewise.
(_mpsadbw): Likewise.
(*sse4_1_mulv2siv2di3): Likewise.
(*_mul3): Likewise.
(*sse4_1_3): Likewise.
(*v8hi3): Likewise.
(*v16qi3): Likewise.
(*sse4_1_v8qiv8hi2_1): Likewise.
(*sse4_1_zero_extendv8qiv8hi2_3): Likewise.
(*sse4_1_zero_extendv8qiv8hi2_4): Likewise.
(*sse4_1_v4qiv4si2_1): Likewise.
(*sse4_1_v4hiv4si2_1): Likewise.
(*sse4_1_zero_extendv4hiv4si2_3): Likewise.
(*sse4_1_zero_extendv4hiv4si2_4): Likewise.
(*sse4_1_v2hiv2di2_1): Likewise.
(*sse4_1_v2siv2di2_1): Likewise.
(*sse4_1_zero_extendv2siv2di2_3): Likewise.
(*sse4_1_zero_extendv2siv2di2_4): Likewise.
(aesdec): Likewise.
(aesdeclast): Likewise.
(aesenc): Likewise.
(aesenclast): Likewise.
(pclmulqdq): Likewise.
(vgf2p8affineinvqb_): Likewise.
(vgf2p8affineqb_): Likewise.
(vgf2p8mulb_): Likewise.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/i386.md |  42 +++---
 gcc/config/i386/mmx.md  | 143 -
 gcc/config/i386/sse.md  | 274 ++--
 3 files changed, 289 insertions(+), 170 deletions(-)

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 271d417146c..c09ee3989cb 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -2868,9 +2868,9 @@ (define_peephole2
 
 (define_insn "*movhi_internal"
   [(set (match_operand:HI 0 "nonimmediate_operand"
-"=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,m")
+"=r,r,r,m ,*k,*k ,r ,m ,*k ,?r,?*v,*v,*v,*v,jm,m")
(match_operand:HI 1 "general_operand"
-"r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C ,*v,m ,*v"))]
+"r ,n,m,rn,r ,*km,*k,*k,CBC,*v,r  ,C ,*v,m ,*x,*v"))]
   "!(MEM_P (operands[0]) && MEM_P (operands[1]))
&& ix86_hardreg_mov_ok (operands[0], operands[1])"
 {
@@ -2925,15 +2925,21 @@ (define_insn "*movhi_in

[PATCH 04/13] [APX EGPR] Add 16 new integer general purpose registers

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

Extend GENERAL_REGS with extra r16-r31 registers like REX registers,
named as REX2 registers. They will only be enabled under
TARGET_APX_EGPR.
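
As a rough sketch of what "only enabled under TARGET_APX_EGPR" means for
the register set: the new registers are removed from the accessible set
unless EGPR is on in 64-bit mode, mirroring the loop this patch adds to
ix86_conditional_register_usage. The register numbering (16..31) and the
names below are assumptions for illustration; GCC's real
FIRST_REX2_INT_REG/LAST_REX2_INT_REG values differ.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical numbering: bits 0..15 model the legacy/REX GPRs and
// bits 16..31 model the new REX2 registers r16-r31.
constexpr int kFirstRex2Reg = 16;
constexpr int kLastRex2Reg = 31;

std::bitset<32> accessible_gprs (bool apx_egpr, bool is_64bit)
{
  std::bitset<32> regs;
  regs.set ();  // start with every general register accessible

  // If APX EGPR is disabled (or not in 64-bit mode), drop r16-r31.
  if (!(apx_egpr && is_64bit))
    for (int i = kFirstRex2Reg; i <= kLastRex2Reg; ++i)
      regs.reset (i);

  return regs;
}
```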

gcc/ChangeLog:

* config/i386/i386-protos.h (x86_extended_rex2reg_mentioned_p):
New function prototype.
* config/i386/i386.cc (regclass_map): Add mapping for 16 new
general registers.
(debugger64_register_map): Likewise.
(ix86_conditional_register_usage): Clear REX2 register when APX
disabled.
(ix86_code_end): Add handling for REX2 reg.
(print_reg): Likewise.
(ix86_output_jmp_thunk_or_indirect): Likewise.
(ix86_output_indirect_branch_via_reg): Likewise.
(ix86_attr_length_vex_default): Likewise.
(ix86_emit_save_regs): Adjust to allow saving r31.
(ix86_register_priority): Set REX2 reg priority same as REX.
(x86_extended_reg_mentioned_p): Add check for REX2 regs.
(x86_extended_rex2reg_mentioned_p): New function.
* config/i386/i386.h (CALL_USED_REGISTERS): Add new extended
registers.
(REG_ALLOC_ORDER): Likewise.
(FIRST_REX2_INT_REG): Define.
(LAST_REX2_INT_REG): Ditto.
(GENERAL_REGS): Add 16 new registers.
(INT_SSE_REGS): Likewise.
(FLOAT_INT_REGS): Likewise.
(FLOAT_INT_SSE_REGS): Likewise.
(INT_MASK_REGS): Likewise.
(ALL_REGS): Likewise.
(REX2_INT_REG_P): Define.
(REX2_INT_REGNO_P): Ditto.
(GENERAL_REGNO_P): Add REX2_INT_REGNO_P.
(REGNO_OK_FOR_INDEX_P): Ditto.
(REG_OK_FOR_INDEX_NONSTRICT_P): Add new extended registers.
* config/i386/i386.md: Add 16 new integer general
registers.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-egprs-names.c: New test.
* gcc.target/i386/apx-spill_to_egprs-1.c: Likewise.
* gcc.target/i386/apx-interrupt-1.c: Likewise.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/i386-protos.h |   1 +
 gcc/config/i386/i386.cc   |  67 ++--
 gcc/config/i386/i386.h|  46 +---
 gcc/config/i386/i386.md   |  18 +++-
 .../gcc.target/i386/apx-egprs-names.c |  17 +++
 .../gcc.target/i386/apx-interrupt-1.c | 102 ++
 .../gcc.target/i386/apx-spill_to_egprs-1.c|  25 +
 7 files changed, 252 insertions(+), 24 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 9ffb125fc2b..bd4782800c4 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -64,6 +64,7 @@ extern bool symbolic_reference_mentioned_p (rtx);
 extern bool extended_reg_mentioned_p (rtx);
 extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
+extern bool x86_extended_rex2reg_mentioned_p (rtx);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 477e6cecc38..fb1672f0b3d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -169,7 +169,12 @@ enum reg_class const regclass_map[FIRST_PSEUDO_REGISTER] =
   ALL_SSE_REGS, ALL_SSE_REGS, ALL_SSE_REGS, ALL_SSE_REGS,
   /* Mask registers.  */
   ALL_MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS,
-  MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS
+  MASK_REGS, MASK_REGS, MASK_REGS, MASK_REGS,
+  /* REX2 registers */
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
+  GENERAL_REGS, GENERAL_REGS, GENERAL_REGS, GENERAL_REGS,
 };
 
 /* The "default" register map used in 32bit mode.  */
@@ -227,7 +232,10 @@ int const debugger64_register_map[FIRST_PSEUDO_REGISTER] =
   /* AVX-512 registers 24-31 */
   75, 76, 77, 78, 79, 80, 81, 82,
   /* Mask registers */
-  118, 119, 120, 121, 122, 123, 124, 125
+  118, 119, 120, 121, 122, 123, 124, 125,
+  /* REX2 extended integer registers */
+  130, 131, 132, 133, 134, 135, 136, 137,
+  138, 139, 140, 141, 142, 143, 144, 145
 };
 
 /* Define the register numbers to be used in Dwarf debugging information.
@@ -521,6 +529,13 @@ ix86_conditional_register_usage (void)
 
   accessible_reg_set &= ~reg_class_contents[ALL_MASK_REGS];
 }
+
+  /* If APX is disabled, disable the registers.  */
+  if (! (TARGET_APX_EGPR && TARGET_64BIT))
+{
+  for (i = FIRST_REX2_INT_REG; i <= LAST_REX2_INT_REG; i++)
+   CLEAR_HARD_REG_BIT (accessible_reg_set, i);
+}
 }
 
 /* Canonicalize a comparison from one we don't have to one we do have.  */
@@ -6188,6 +6203,13 @@ ix86_co

[PATCH 11/13] [APX EGPR] Handle legacy insns that only support GPR16 (3/5)

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

Disable EGPR usage for below legacy insns in opcode map2/3 that have vex
but no evex counterpart.

insn list:
1. phminposuw/vphminposuw
2. ptest/vptest
3. roundps/vroundps, roundpd/vroundpd,
   roundss/vroundss, roundsd/vroundsd
4. pcmpestri/vpcmpestri, pcmpestrm/vpcmpestrm
5. pcmpistri/vpcmpistri, pcmpistrm/vpcmpistrm
6. aesimc/vaesimc, aeskeygenassist/vaeskeygenassist

gcc/ChangeLog:

* config/i386/i386-protos.h (x86_evex_reg_mentioned_p): New
prototype.
* config/i386/i386.cc (x86_evex_reg_mentioned_p): New
function.
* config/i386/i386.md (sse4_1_round2): Set attr gpr32 0
and constraint jm to all non-evex alternatives, adjust
alternative outputs if evex reg is mentioned.
* config/i386/sse.md (_ptest): Set attr gpr32 0
and constraint jm/ja to all non-evex alternatives.
(ptesttf2): Likewise.
(_round): Likewise.
(sse4_2_pcmpestri): Likewise.
(sse4_2_pcmpestrm): Likewise.
(sse4_2_pcmpestr_cconly): Likewise.
(sse4_2_pcmpistr): Likewise.
(sse4_2_pcmpistri): Likewise.
(sse4_2_pcmpistrm): Likewise.
(sse4_2_pcmpistr_cconly): Likewise.
(aesimc): Likewise.
(aeskeygenassist): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-legacy-insn-check-norex2.c: Add intrinsic
tests.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/i386.cc   | 13 +++
 gcc/config/i386/i386.md   |  3 +-
 gcc/config/i386/sse.md| 93 +--
 .../i386/apx-legacy-insn-check-norex2.c   | 55 ++-
 5 files changed, 132 insertions(+), 33 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a54e3f6b1dc..28d0eab11d5 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -65,6 +65,7 @@ extern bool extended_reg_mentioned_p (rtx);
 extern bool x86_extended_QIreg_mentioned_p (rtx_insn *);
 extern bool x86_extended_reg_mentioned_p (rtx);
 extern bool x86_extended_rex2reg_mentioned_p (rtx);
+extern bool x86_evex_reg_mentioned_p (rtx [], int);
 extern bool x86_maybe_negate_const_int (rtx *, machine_mode);
 extern machine_mode ix86_cc_mode (enum rtx_code, rtx, rtx);
 
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5d47c2af25e..58fa054635a 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22937,6 +22937,19 @@ x86_extended_rex2reg_mentioned_p (rtx insn)
   return false;
 }
 
+/* Return true when rtx operands mentions register that must be encoded using
+   evex prefix.  */
+bool
+x86_evex_reg_mentioned_p (rtx operands[], int nops)
+{
+  int i;
+  for (i = 0; i < nops; i++)
+if (EXT_REX_SSE_REG_P (operands[i])
+   || x86_extended_rex2reg_mentioned_p (operands[i]))
+  return true;
+  return false;
+}
+
 /* If profitable, negate (without causing overflow) integer constant
of mode MODE at location LOC.  Return true in this case.  */
 bool
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 6cf86b798a8..271d417146c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -21603,7 +21603,7 @@ (define_expand "significand2"
 (define_insn "sse4_1_round2"
   [(set (match_operand:MODEFH 0 "register_operand" "=x,x,x,v,v")
(unspec:MODEFH
- [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,m,v,m")
+ [(match_operand:MODEFH 1 "nonimmediate_operand" "0,x,jm,v,m")
   (match_operand:SI 2 "const_0_to_15_operand")]
  UNSPEC_ROUND))]
   "TARGET_SSE4_1"
@@ -21616,6 +21616,7 @@ (define_insn "sse4_1_round2"
   [(set_attr "type" "ssecvt")
(set_attr "prefix_extra" "1,1,1,*,*")
(set_attr "length_immediate" "1")
+   (set_attr "gpr32" "1,1,0,1,1")
(set_attr "prefix" "maybe_vex,maybe_vex,maybe_vex,evex,evex")
(set_attr "isa" "noavx512f,noavx512f,noavx512f,avx512f,avx512f")
(set_attr "avx_partial_xmm_update" "false,false,true,false,true")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index a7858a7f8cf..4db3940e422 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -22610,11 +22610,12 @@ (define_insn "avx2_pblendd"
 
 (define_insn "sse4_1_phminposuw"
   [(set (match_operand:V8HI 0 "register_operand" "=Yr,*x,x")
-   (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "YrBm,*xBm,xm")]
+   (unspec:V8HI [(match_operand:V8HI 1 "vector_operand" "Yrja,*xja,xjm")]
 UNSPEC_PHMINPOSUW))]
   "TARGET_SSE4_1"
   "%vphminposuw\t{%1, %0|%0, %1}"
   [(set_attr "isa" "noavx,noavx,avx")
+   (set_attr "gpr32" "0")
(set_attr "type" "sselog1")
(set_attr "prefix_extra" "1")
(set_attr "prefix" "orig,orig,vex")
@@ -23803,12 +23804,13 @@ (define_insn "avx_vtest"
 (define_insn "*_ptest"
   [(set (reg FLAGS_REG)
(unspec [(match_operand:V_AVX 0 "register_op

[PATCH 01/13] [APX EGPR] middle-end: Add insn argument to base_reg_class

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

The current reload infrastructure does not support selecting
base_reg_class per backend insn. Add new macros that take an insn
parameter to base_reg_class for lra/reload usage.
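
The shape of the interface change can be sketched as below: the insn
becomes a trailing parameter with a NULL default, so callers that have
no insn at hand keep working and get the widest class. All names here
(reg_class_model, insn_model, target_insn_base_reg_class,
base_reg_class_model) are hypothetical stand-ins, not the real
addresses.h declarations.

```cpp
#include <cassert>

// Hypothetical register classes and insn model for illustration.
enum reg_class_model { GPR16_REGS, GPR32_REGS };
struct insn_model { bool gpr32_ok; };

// Stand-in for a target's INSN_BASE_REG_CLASS (insn) hook.
reg_class_model target_insn_base_reg_class (const insn_model *insn)
{
  // No insn available: return the maximum class.
  if (!insn || insn->gpr32_ok)
    return GPR32_REGS;
  return GPR16_REGS;
}

// Stand-in for base_reg_class: the insn argument defaults to null,
// so existing call sites that do not pass one still compile.
reg_class_model base_reg_class_model (const insn_model *insn = nullptr)
{
  return target_insn_base_reg_class (insn);
}
```

Defaulting the new parameter keeps the change source-compatible for
targets and passes that never define or use the insn-aware macros.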

gcc/ChangeLog:

* addresses.h (base_reg_class): Add insn argument and new macro
INSN_BASE_REG_CLASS.
(regno_ok_for_base_p_1): Add insn argument and new macro
REGNO_OK_FOR_INSN_BASE_P.
(regno_ok_for_base_p): Add insn argument and pass it to ok_for_base_p_1.
* doc/tm.texi: Document INSN_BASE_REG_CLASS and
REGNO_OK_FOR_INSN_BASE_P.
* doc/tm.texi.in: Ditto.
* lra-constraints.cc (process_address_1): Pass insn to
base_reg_class.
(curr_insn_transform): Ditto.
* reload.cc (find_reloads): Ditto.
(find_reloads_address): Ditto.
(find_reloads_address_1): Ditto.
(find_reloads_subreg_address): Ditto.
* reload1.cc (maybe_fix_stack_asms): Ditto.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/addresses.h| 19 +++
 gcc/doc/tm.texi| 14 ++
 gcc/doc/tm.texi.in | 14 ++
 gcc/lra-constraints.cc | 15 +--
 gcc/reload.cc  | 30 ++
 gcc/reload1.cc |  2 +-
 6 files changed, 71 insertions(+), 23 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index 3519c241c6d..2c92927bd51 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -28,8 +28,12 @@ inline enum reg_class
 base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
addr_space_t as ATTRIBUTE_UNUSED,
enum rtx_code outer_code ATTRIBUTE_UNUSED,
-   enum rtx_code index_code ATTRIBUTE_UNUSED)
+   enum rtx_code index_code ATTRIBUTE_UNUSED,
+   rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
 {
+#ifdef INSN_BASE_REG_CLASS
+  return INSN_BASE_REG_CLASS (insn);
+#else
 #ifdef MODE_CODE_BASE_REG_CLASS
   return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
   index_code);
@@ -44,6 +48,7 @@ base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
   return BASE_REG_CLASS;
 #endif
 #endif
+#endif
 }
 
 /* Wrapper function to unify target macros REGNO_MODE_CODE_OK_FOR_BASE_P,
@@ -56,8 +61,12 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
 machine_mode mode ATTRIBUTE_UNUSED,
 addr_space_t as ATTRIBUTE_UNUSED,
 enum rtx_code outer_code ATTRIBUTE_UNUSED,
-enum rtx_code index_code ATTRIBUTE_UNUSED)
+enum rtx_code index_code ATTRIBUTE_UNUSED,
+rtx_insn* insn ATTRIBUTE_UNUSED = NULL)
 {
+#ifdef REGNO_OK_FOR_INSN_BASE_P
+  return REGNO_OK_FOR_INSN_BASE_P (regno, insn);
+#else
 #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
   return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
outer_code, index_code);
@@ -72,6 +81,7 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
   return REGNO_OK_FOR_BASE_P (regno);
 #endif
 #endif
+#endif
 }
 
 /* Wrapper around ok_for_base_p_1, for use after register allocation is
@@ -79,12 +89,13 @@ ok_for_base_p_1 (unsigned regno ATTRIBUTE_UNUSED,
 
 inline bool
 regno_ok_for_base_p (unsigned regno, machine_mode mode, addr_space_t as,
-enum rtx_code outer_code, enum rtx_code index_code)
+enum rtx_code outer_code, enum rtx_code index_code,
+rtx_insn *insn = NULL)
 {
   if (regno >= FIRST_PSEUDO_REGISTER && reg_renumber[regno] >= 0)
 regno = reg_renumber[regno];
 
-  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code);
+  return ok_for_base_p_1 (regno, mode, as, outer_code, index_code, insn);
 }
 
 #endif /* GCC_ADDRESSES_H */
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index b0779724d30..5b1e2a11f89 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2568,6 +2568,13 @@ of an address, @code{ADDRESS} for something that occurs 
in an
 index expression if @var{outer_code} is @code{PLUS}; @code{SCRATCH} otherwise.
 @end defmac
 
+@defmac INSN_BASE_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+base register for a specified @var{insn} must belong. This macro is
+used when some backend insns may have limited usage of base register
+compared with other insns.
+@end defmac
+
 @defmac INDEX_REG_CLASS
 A macro whose definition is the name of the class to which a valid
 index register must belong.  An index register is one used in an
@@ -2618,6 +2625,13 @@ corresponding index expression if @var{outer_code} is 
@code{PLUS};
 that appear outside a @code{MEM}, i.e., as an @code{address_operand}.
 @end defmac
 
+@defmac REGNO_OK_FOR_INSN_BASE_P (@var{num}, @var{insn})
+A C expression which is nonzero if register number @var{num} is
+suitable for use as a base register in operand addresses for a specified
+@var{insn}. This macro is used when some backend insn may have limi

[PATCH 03/13] [APX_EGPR] Initial support for APX_F

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

Add -mapx-features= enumeration to separate subfeatures of APX_F.
-mapxf is treated the same as earlier ISA flags, and it additionally sets
-mapx-features=apx_all, which enables all subfeatures.
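
The relationship between the umbrella flag and the subfeature enumeration can be sketched as a bitmask. The enumerator names below come from the ChangeLog; the numeric values and the helpers apx_enabled and features_for_mapxf are assumptions made for illustration, not taken from the patch.

```cpp
// Assumed encoding: one bit per subfeature, apx_all is their union.
enum apx_features : unsigned
{
  apx_none = 0,
  apx_egpr = 1u << 0,
  apx_push2pop2 = 1u << 1,
  apx_ndd = 1u << 2,
  apx_all = apx_egpr | apx_push2pop2 | apx_ndd
};

// Hypothetical helper: is subfeature F present in MASK?
constexpr bool
apx_enabled (unsigned mask, apx_features f)
{
  return (mask & f) != 0;
}

// Hypothetical helper: the subfeature mask implied by -mapxf.
constexpr unsigned
features_for_mapxf ()
{
  return apx_all;
}
```

With this encoding a check such as TARGET_APX_EGPR reduces to a single bit test on the saved mask.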

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (XSTATE_APX_F): New macro.
(XCR_APX_F_ENABLED_MASK): Likewise.
(get_available_features): Detect APX_F under
* common/config/i386/i386-common.cc (OPTION_MASK_ISA2_APX_F_SET): New.
(OPTION_MASK_ISA2_APX_F_UNSET): Likewise.
(ix86_handle_option): Handle -mapxf.
* common/config/i386/i386-cpuinfo.h (FEATURE_APX_F): New.
* common/config/i386/i386-isas.h: Add entry for APX_F.
* config/i386/cpuid.h (bit_APX_F): New.
* config/i386/i386.h (bit_APX_F): (TARGET_APX_EGPR,
TARGET_APX_PUSH2POP2, TARGET_APX_NDD): New define.
* config/i386/i386-opts.h (enum apx_features): New enum.
* config/i386/i386-isa.def (APX_F): New DEF_PTA.
* config/i386/i386-options.cc (ix86_function_specific_save):
Save ix86_apx_features.
(ix86_function_specific_restore): Restore it.
(ix86_valid_target_attribute_inner_p): Add mapxf.
(ix86_option_override_internal): Set ix86_apx_features for PTA
and TARGET_APX_F. Also report an error when APX_F is set without
TARGET_64BIT.
* config/i386/i386.opt: (-mapxf): New ISA flag option.
(-mapx=): New enumeration option.
(apx_features): New enum type.
(apx_none): New enum value.
(apx_egpr): Likewise.
(apx_push2pop2): Likewise.
(apx_ndd): Likewise.
(apx_all): Likewise.
* doc/invoke.texi: Document mapxf.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-1.c: New test.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/common/config/i386/cpuinfo.h  | 12 +++-
 gcc/common/config/i386/i386-common.cc | 17 +
 gcc/common/config/i386/i386-cpuinfo.h |  1 +
 gcc/common/config/i386/i386-isas.h|  1 +
 gcc/config/i386/cpuid.h   |  1 +
 gcc/config/i386/i386-isa.def  |  1 +
 gcc/config/i386/i386-options.cc   | 18 ++
 gcc/config/i386/i386-opts.h   |  8 
 gcc/config/i386/i386.h|  4 
 gcc/config/i386/i386.opt  | 25 +
 gcc/doc/invoke.texi   | 11 +++
 gcc/testsuite/gcc.target/i386/apx-1.c |  8 
 12 files changed, 102 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c

diff --git a/gcc/common/config/i386/cpuinfo.h b/gcc/common/config/i386/cpuinfo.h
index 24ae0dbf0ac..141d3743316 100644
--- a/gcc/common/config/i386/cpuinfo.h
+++ b/gcc/common/config/i386/cpuinfo.h
@@ -678,6 +678,7 @@ get_available_features (struct __processor_model *cpu_model,
 #define XSTATE_HI_ZMM  0x80
 #define XSTATE_TILECFG 0x2
 #define XSTATE_TILEDATA0x4
+#define XSTATE_APX_F   0x8
 
 #define XCR_AVX_ENABLED_MASK \
   (XSTATE_SSE | XSTATE_YMM)
@@ -685,11 +686,13 @@ get_available_features (struct __processor_model 
*cpu_model,
   (XSTATE_SSE | XSTATE_YMM | XSTATE_OPMASK | XSTATE_ZMM | XSTATE_HI_ZMM)
 #define XCR_AMX_ENABLED_MASK \
   (XSTATE_TILECFG | XSTATE_TILEDATA)
+#define XCR_APX_F_ENABLED_MASK XSTATE_APX_F
 
-  /* Check if AVX and AVX512 are usable.  */
+  /* Check if AVX, AVX512 and APX are usable.  */
   int avx_usable = 0;
   int avx512_usable = 0;
   int amx_usable = 0;
+  int apx_usable = 0;
   /* Check if KL is usable.  */
   int has_kl = 0;
   if ((ecx & bit_OSXSAVE))
@@ -709,6 +712,8 @@ get_available_features (struct __processor_model *cpu_model,
}
   amx_usable = ((xcrlow & XCR_AMX_ENABLED_MASK)
== XCR_AMX_ENABLED_MASK);
+  apx_usable = ((xcrlow & XCR_APX_F_ENABLED_MASK)
+   == XCR_APX_F_ENABLED_MASK);
 }
 
 #define set_feature(f) \
@@ -922,6 +927,11 @@ get_available_features (struct __processor_model 
*cpu_model,
  if (edx & bit_AMX_COMPLEX)
set_feature (FEATURE_AMX_COMPLEX);
}
+ if (apx_usable)
+   {
+ if (edx & bit_APX_F)
+   set_feature (FEATURE_APX_F);
+   }
}
 }
 
diff --git a/gcc/common/config/i386/i386-common.cc 
b/gcc/common/config/i386/i386-common.cc
index 95468b7c405..86596e96ad1 100644
--- a/gcc/common/config/i386/i386-common.cc
+++ b/gcc/common/config/i386/i386-common.cc
@@ -123,6 +123,7 @@ along with GCC; see the file COPYING3.  If not see
 #define OPTION_MASK_ISA2_SM3_SET OPTION_MASK_ISA2_SM3
 #define OPTION_MASK_ISA2_SHA512_SET OPTION_MASK_ISA2_SHA512
 #define OPTION_MASK_ISA2_SM4_SET OPTION_MASK_ISA2_SM4
+#define OPTION_MASK_ISA2_APX_F_SET OPTION_MASK_ISA2_APX_F
 
 /* SSE4 includes both SSE4.1 and SSE4.2. -msse4 should be the same
as -msse4.2.  */
@@ -309,6 +310,7 @@ along with GCC; see the fi

[PATCH 07/13] [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR constraint.

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

In inline asm, we do not know whether the insn can use EGPR, so disable
EGPR usage by default by mapping the common reg/mem constraints to
non-EGPR constraints.

The full mapping is:

  "g" -> "jrjmi"
  "r" -> "jr"
  "m" -> "jm"
  "<" -> "j<"
  ">" -> "j>"
  "o" -> "jo"
  "V" -> "jV"
  "p" -> "jp"
  "Bm" -> "ja"

For memory constraints, we add an option -mapx-inline-asm-use-gpr32
to allow/disallow gpr32 usage in any memory related constraints, as
base_reg_class/index_reg_class cannot tell whether the asm insn
supports gpr32 or not.
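
The mapping table above can be exercised outside the compiler with a small standalone sketch. This is an approximation of the idea using std::string rather than GCC's vec/auto_vec, and map_constraint is a hypothetical name; two-letter constraints starting with T, Y, W or j are copied through unchanged, as in the patch.

```cpp
#include <cstddef>
#include <string>

// Rewrite one constraint string so that GPR-related letters become their
// "j"-prefixed, EGPR-prohibiting counterparts.
std::string
map_constraint (const std::string &cur)
{
  // Flag outputs such as "=@ccz" are left untouched.
  if (cur.rfind ("=@cc", 0) == 0)
    return cur;

  std::string out;
  for (std::size_t j = 0; j < cur.size (); ++j)
    {
      const char c = cur[j];
      switch (c)
	{
	case 'g':
	  // No single counterpart exists, so split "g" into r, m and i.
	  out += "jrjmi";
	  break;
	case 'r': case 'm': case '<': case '>':
	case 'o': case 'V': case 'p':
	  out += 'j';
	  out += c;
	  break;
	case 'B':
	  if (j + 1 < cur.size () && cur[j + 1] == 'm')
	    {
	      out += "ja";
	      ++j;
	      break;
	    }
	  /* Any other B-prefixed constraint is copied like T/Y/W/j.  */
	  /* FALLTHRU */
	case 'T': case 'Y': case 'W': case 'j':
	  // Two-letter constraint: copy both letters unchanged.
	  out += c;
	  if (j + 1 < cur.size ())
	    out += cur[++j];
	  break;
	default:
	  out += c;
	  break;
	}
    }
  return out;
}
```

For example, an operand written as "=r" becomes "=jr", and "rm" becomes "jrjm", so both alternatives stay within the 16 legacy GPRs.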

gcc/ChangeLog:

* config/i386/i386.cc (map_egpr_constraints): New function to
map common constraints to EGPR prohibited constraints.
(ix86_md_asm_adjust): Call map_egpr_constraints.
* config/i386/i386.opt: Add option mapx-inline-asm-use-gpr32.

gcc/testsuite/ChangeLog:

* gcc.target/i386/apx-inline-gpr-norex2.c: New test.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/i386.cc   | 92 +++
 gcc/config/i386/i386.opt  |  5 +
 .../gcc.target/i386/apx-inline-gpr-norex2.c   | 25 +
 3 files changed, 122 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 5af0de4dae7..ea94663eb68 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -17,6 +17,7 @@ You should have received a copy of the GNU General Public 
License
 along with GCC; see the file COPYING3.  If not see
 .  */
 
+#define INCLUDE_STRING
 #define IN_TARGET_CODE 1
 
 #include "config.h"
@@ -23161,6 +23162,93 @@ ix86_c_mode_for_suffix (char suffix)
   return VOIDmode;
 }
 
+/* Helper function to map common constraints to non-EGPR ones.
+   All related constraints have j prefix, and j plus Upper letter
+   means the constraint is strictly EGPR enabled, while j plus
+   lower letter indicates the constraint is strictly gpr16 only.
+
+   Specially for "g" constraint, split it to rmi as there is
+   no corresponding general constraint define for backend.
+
+   Here is the full list to map constraints that may involve
+   gpr to j prefixed.
+
+   "g" -> "jrjmi"
+   "r" -> "jr"
+   "m" -> "jm"
+   "<" -> "j<"
+   ">" -> "j>"
+   "o" -> "jo"
+   "V" -> "jV"
+   "p" -> "jp"
+   "Bm" -> "ja"
+*/
+
+static void map_egpr_constraints (vec<const char *> &constraints)
+{
+  for (size_t i = 0; i < constraints.length(); i++)
+{
+  const char *cur = constraints[i];
+
+  if (startswith (cur, "=@cc"))
+   continue;
+
+  int len = strlen (cur);
+  auto_vec<char> buf;
+
+  for (int j = 0; j < len; j++)
+   {
+ switch (cur[j])
+   {
+   case 'g':
+ buf.safe_push ('j');
+ buf.safe_push ('r');
+ buf.safe_push ('j');
+ buf.safe_push ('m');
+ buf.safe_push ('i');
+ break;
+   case 'r':
+   case 'm':
+   case '<':
+   case '>':
+   case 'o':
+   case 'V':
+   case 'p':
+ buf.safe_push ('j');
+ buf.safe_push (cur[j]);
+ break;
+   case 'B':
+ if (cur[j + 1] == 'm')
+   {
+ buf.safe_push ('j');
+ buf.safe_push ('a');
+ j++;
+   }
+ else
+   {
+ buf.safe_push (cur[j]);
+ buf.safe_push (cur[j + 1]);
+ j++;
+   }
+ break;
+   case 'T':
+   case 'Y':
+   case 'W':
+   case 'j':
+ buf.safe_push (cur[j]);
+ buf.safe_push (cur[j + 1]);
+ j++;
+ break;
+   default:
+ buf.safe_push (cur[j]);
+ break;
+   }
+   }
+  buf.safe_push ('\0');
+  constraints[i] = xstrdup (buf.address ());
+}
+}
+
 /* Worker function for TARGET_MD_ASM_ADJUST.
 
We implement asm flag outputs, and maintain source compatibility
@@ -23175,6 +23263,10 @@ ix86_md_asm_adjust (vec &outputs, vec & 
/*inputs*/,
   bool saw_asm_flag = false;
 
   start_sequence ();
+
+  if (TARGET_APX_EGPR && !ix86_apx_inline_asm_use_gpr32)
+map_egpr_constraints (constraints);
+
   for (unsigned i = 0, n = outputs.length (); i < n; ++i)
 {
   const char *con = constraints[i];
diff --git a/gcc/config/i386/i386.opt b/gcc/config/i386/i386.opt
index d89b5bbc5e8..d4a7b7ec839 100644
--- a/gcc/config/i386/i386.opt
+++ b/gcc/config/i386/i386.opt
@@ -1335,3 +1335,8 @@ Enum(apx_features) String(ndd) Value(apx_ndd) Set(4)
 
 EnumValue
 Enum(apx_features) String(all) Value(apx_all) Set(1)
+
+mapx-inline-asm-use-gpr32
+Target Var(ix86_apx_inline_asm_use_gpr32) Init(0)
+Enable GPR32 in inline asm when APX_EGPR enabled, do not
+hook reg or mem constraint in inline asm to GPR16.
diff --git a/gcc/

[PATCH 05/13] [APX EGPR] Add register and memory constraints that disallow EGPR

2023-09-22 Thread Hongyu Wang
From: Kong Lingling 

For APX, since we extended the GENERAL_REG_CLASS, new constraints are
needed to restrict insns that cannot use EGPR in either their register or
memory operands. We add a series of constraints as counterparts of the
general/backend ones that relate to GPR usage. All of them are prefixed
with "j" to indicate that the constraint does not allow EGPR.
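
The memory constraints below all share one shape: the ordinary predicate must hold, and when APX EGPR is enabled the operand must not mention an extended register. A minimal model of that rule follows, assuming architectural register indices where r16-r31 are the extended GPRs. The address struct and the helpers is_egpr, mentions_egpr and satisfies_jm are hypothetical and do not correspond to GCC's internal register numbering.

```cpp
#include <optional>

// Hypothetical address model: optional base and index registers, given as
// architectural indices 0..31.
struct address
{
  std::optional<int> base;
  std::optional<int> index;
};

// r16-r31 are the extended GPRs added by APX.
constexpr bool
is_egpr (int regno)
{
  return regno >= 16 && regno <= 31;
}

inline bool
mentions_egpr (const address &a)
{
  return (a.base && is_egpr (*a.base)) || (a.index && is_egpr (*a.index));
}

// Model of a "jm"-style check: reject the operand only when EGPR support
// is on and the address actually uses an extended register.
inline bool
satisfies_jm (const address &a, bool target_apx_egpr)
{
  return !(target_apx_egpr && mentions_egpr (a));
}
```

When EGPR support is off, the check degenerates to the plain memory predicate, which is why the constraints can be used unconditionally in patterns.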

gcc/ChangeLog:

* config/i386/constraints.md (jr): New register constraint
that prohibits EGPR.
(jR): Constraint that forces usage of EGPR.
(jm): New memory constraint that prohibits EGPR.
(ja): Likewise for Bm constraint.
(jb): Likewise for Tv constraint.
(j<): New auto-dec memory constraint that prohibits EGPR.
(j>): Likewise for ">" constraint.
(jo): Likewise for "o" constraint.
(jV): Likewise for "V" constraint.
(jp): Likewise for "p" constraint.
* config/i386/i386.h (enum reg_class): Add new reg class
GENERAL_GPR16.

Co-authored-by: Hongyu Wang 
Co-authored-by: Hongtao Liu 
---
 gcc/config/i386/constraints.md | 59 +-
 gcc/config/i386/i386.h |  4 +++
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/constraints.md b/gcc/config/i386/constraints.md
index fd490f39110..36c268d7f9b 100644
--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -19,7 +19,7 @@
 
 ;;; Unused letters:
 ;;;   H
-;;;   h j   z
+;;;   j   z
 
 ;; Integer register constraints.
 ;; It is not necessary to define 'r' here.
@@ -371,3 +371,60 @@ (define_address_constraint "Tv"
 (define_address_constraint "Ts"
   "Address operand without segment register"
   (match_operand 0 "address_no_seg_operand"))
+
+;; Constraint that force to use EGPR, can only adopt to register class.
+(define_register_constraint  "jR" "GENERAL_REGS")
+
+(define_register_constraint  "jr"
+ "TARGET_APX_EGPR ? GENERAL_GPR16 : GENERAL_REGS")
+
+(define_memory_constraint "jm"
+  "@internal memory operand without GPR32."
+  (and (match_operand 0 "memory_operand")
+   (not (and (match_test "TARGET_APX_EGPR")
+(match_test "x86_extended_rex2reg_mentioned_p (op)")
+
+(define_constraint "j<"
+  "@internal auto-dec memory operand without GPR32."
+  (and (and (match_code "mem")
+   (ior (match_test "GET_CODE (XEXP (op, 0)) == PRE_DEC")
+(match_test "GET_CODE (XEXP (op, 0)) == POST_DEC")))
+   (not (and (match_test "TARGET_APX_EGPR")
+(match_test "x86_extended_rex2reg_mentioned_p (op)")
+
+(define_constraint "j>"
+  "@internal auto-inc memory operand without GPR32."
+  (and (and (match_code "mem")
+   (ior (match_test "GET_CODE (XEXP (op, 0)) == PRE_INC")
+(match_test "GET_CODE (XEXP (op, 0)) == POST_INC")))
+   (not (and (match_test "TARGET_APX_EGPR")
+(match_test "x86_extended_rex2reg_mentioned_p (op)")
+
+(define_memory_constraint "jo"
+  "@internal offsetable memory operand without GPR32."
+  (and (and (match_code "mem")
+   (match_test "offsettable_nonstrict_memref_p (op)"))
+   (not (and (match_test "TARGET_APX_EGPR")
+(match_test "x86_extended_rex2reg_mentioned_p (op)")
+
+(define_constraint "jV"
+  "@internal non-offsetable memory operand without GPR32."
+  (and (and (match_code "mem")
+   (match_test "memory_address_addr_space_p (GET_MODE (op),
+ XEXP (op, 0),
+ MEM_ADDR_SPACE (op))")
+   (not (match_test "offsettable_nonstrict_memref_p (op)")))
+   (not (and (match_test "TARGET_APX_EGPR")
+(match_test "x86_extended_rex2reg_mentioned_p (op)")
+
+(define_address_constraint "jp"
+  "@internal general address operand without GPR32"
+  (and (match_test "address_operand (op, VOIDmode)")
+   (not (and (match_test "TARGET_APX_EGPR")
+(match_test "x86_extended_rex2reg_mentioned_p (op)")
+
+(define_special_memory_constraint "ja"
+  "@internal vector memory operand without GPR32."
+  (and (match_operand 0 "vector_memory_operand")
+   (not (and (match_test "TARGET_APX_EGPR")
+(match_test "x86_extended_rex2reg_mentioned_p (op)")
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 215f6b8db55..66b8764e82b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1295,6 +1295,8 @@ enum reg_class
   %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15
   %r16 %r17 %r18 %r19 %r20 %r21 %r22 %r23
   %r24 %r25 %r26 %r27 %r28 %r29 %r30 %r31 */
+  GENERAL_GPR16,   /* %eax %ebx %ecx %edx %esi %edi %ebp %esp
+  %r8 %r9 %r10 %r11 %r12 %r13 %r14 %r15 */
   FP_TOP_REG, FP_SECOND_REG,   /* %st(0) %st(1) */
   FLOAT_REGS,
   SSE_FIRST_

[PATCH 02/13] [APX EGPR] middle-end: Add index_reg_class with insn argument.

2023-09-22 Thread Hongyu Wang
Like base_reg_class, INDEX_REG_CLASS cannot be selected per backend insn.
Add index_reg_class with an insn argument for LRA/reload usage.

gcc/ChangeLog:

* addresses.h (index_reg_class): New wrapper function like
base_reg_class.
* doc/tm.texi: Document INSN_INDEX_REG_CLASS.
* doc/tm.texi.in: Ditto.
* lra-constraints.cc (index_part_to_reg): Pass index_class.
(process_address_1): Call index_reg_class with curr_insn and
replace INDEX_REG_CLASS with its return value index_cl.
* reload.cc (find_reloads_address): Likewise.
(find_reloads_address_1): Likewise.

Co-authored-by: Kong Lingling 
Co-authored-by: Hongtao Liu 
---
 gcc/addresses.h| 10 ++
 gcc/doc/tm.texi|  7 +++
 gcc/doc/tm.texi.in |  7 +++
 gcc/lra-constraints.cc | 17 +
 gcc/reload.cc  |  4 ++--
 5 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/gcc/addresses.h b/gcc/addresses.h
index 2c92927bd51..08bf39cd56c 100644
--- a/gcc/addresses.h
+++ b/gcc/addresses.h
@@ -51,6 +51,16 @@ base_reg_class (machine_mode mode ATTRIBUTE_UNUSED,
 #endif
 }
 
+inline enum reg_class
+index_reg_class (rtx_insn *insn ATTRIBUTE_UNUSED = NULL)
+{
+#ifdef INSN_INDEX_REG_CLASS
+  return INSN_INDEX_REG_CLASS (insn);
+#else
+  return INDEX_REG_CLASS;
+#endif
+}
+
 /* Wrapper function to unify target macros REGNO_MODE_CODE_OK_FOR_BASE_P,
REGNO_MODE_OK_FOR_REG_BASE_P, REGNO_MODE_OK_FOR_BASE_P and
REGNO_OK_FOR_BASE_P.
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 5b1e2a11f89..c566f7a1105 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -2582,6 +2582,13 @@ address where its value is either multiplied by a scale 
factor or
 added to another register (as well as added to a displacement).
 @end defmac
 
+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register for a specified @var{insn} must belong. This macro is
+used when some backend insns may have limited usage of index register
+compared with other insns.
+@end defmac
+
 @defmac REGNO_OK_FOR_BASE_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses.
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index f6e63ad8871..3182d0d7c75 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -2164,6 +2164,13 @@ address where its value is either multiplied by a scale 
factor or
 added to another register (as well as added to a displacement).
 @end defmac
 
+@defmac INSN_INDEX_REG_CLASS (@var{insn})
+A C expression whose value is the register class to which a valid
+index register for a specified @var{insn} must belong. This macro is
+used when some backend insns may have limited usage of index register
+compared with other insns.
+@end defmac
+
 @defmac REGNO_OK_FOR_BASE_P (@var{num})
 A C expression which is nonzero if register number @var{num} is
 suitable for use as a base register in operand addresses.
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index 6dc77af86cd..0c8e28e0194 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -3399,12 +3399,12 @@ base_plus_disp_to_reg (struct address_info *ad, rtx 
disp)
 /* Make reload of index part of address AD.  Return the new
pseudo.  */
 static rtx
-index_part_to_reg (struct address_info *ad)
+index_part_to_reg (struct address_info *ad, enum reg_class index_class)
 {
   rtx new_reg;
 
   new_reg = lra_create_new_reg (GET_MODE (*ad->index), NULL_RTX,
-   INDEX_REG_CLASS, NULL, "index term");
+   index_class, NULL, "index term");
   expand_mult (GET_MODE (*ad->index), *ad->index_term,
   GEN_INT (get_index_scale (ad)), new_reg, 1);
   return new_reg;
@@ -3659,13 +3659,14 @@ process_address_1 (int nop, bool check_only_p,
   /* If INDEX_REG_CLASS is assigned to base_term already and isn't to
  index_term, swap them so to avoid assigning INDEX_REG_CLASS to both
  when INDEX_REG_CLASS is a single register class.  */
+  enum reg_class index_cl = index_reg_class (curr_insn);
   if (ad.base_term != NULL
   && ad.index_term != NULL
-  && ira_class_hard_regs_num[INDEX_REG_CLASS] == 1
+  && ira_class_hard_regs_num[index_cl] == 1
   && REG_P (*ad.base_term)
   && REG_P (*ad.index_term)
-  && in_class_p (*ad.base_term, INDEX_REG_CLASS, NULL)
-  && ! in_class_p (*ad.index_term, INDEX_REG_CLASS, NULL))
+  && in_class_p (*ad.base_term, index_cl, NULL)
+  && ! in_class_p (*ad.index_term, index_cl, NULL))
 {
   std::swap (ad.base, ad.index);
   std::swap (ad.base_term, ad.index_term);
@@ -3689,7 +3690,7 @@ process_address_1 (int nop, bool check_only_p,
 }
   if (ad.index_term != NULL
   && process_addr_reg (ad.index_term, check_only_p,
-  before, NULL, INDEX_REG_CLASS))
+  before, NUL

[PATCH v2 00/13] Support Intel APX EGPR

2023-09-22 Thread Hongyu Wang
Hi,

This is a v2 patch for APX support, following up on the previous discussion in
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628904.html

As discussed in the previous thread, the inverse approach of extending base/index
reg support with new memory constraints requires much more effort in both the
middle-end and the backend, so we keep the backend implementation logic. Also,
for inline asm it is hard to provide a memory constraint that forces EGPR usage
due to the limitation of the base/index reg class hook, so we just add one
register class that forces EGPR usage.

The major changes are

  1. Add new macros INSN_BASE_REG_CLASS/REGNO_OK_FOR_INSN_BASE_P instead of
  using old MODE_CODE macros.
  2. Add a series of constraints that prohibit EGPR as counterparts of common
  constraints that may involve register/memory/address. All of them are prefixed
  with "j" to avoid conflicts with existing constraints.
  3. Support constraint mapping for all gpr related common constraints in
  inline asm. 

Bootstrapped/regtested x86_64-linux-gnu.
Ok for trunk?

Hongyu Wang (2):
  [APX EGPR] middle-end: Add index_reg_class with insn argument.
  [APX EGPR] Handle GPR16 only vector move insns

Kong Lingling (11):
  [APX EGPR] middle-end: Add insn argument to base_reg_class
  [APX_EGPR] Initial support for APX_F
  [APX EGPR] Add 16 new integer general purpose registers
  [APX EGPR] Add register and memory constraints that disallow EGPR
  [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
  [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR
constraint.
  [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
  [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
  [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
  [APX EGPR] Handle vex insns that only support GPR16 (5/5)

 gcc/addresses.h   |  29 +-
 gcc/common/config/i386/cpuinfo.h  |  12 +-
 gcc/common/config/i386/i386-common.cc |  17 +
 gcc/common/config/i386/i386-cpuinfo.h |   1 +
 gcc/common/config/i386/i386-isas.h|   1 +
 gcc/config/i386/constraints.md|  65 +-
 gcc/config/i386/cpuid.h   |   1 +
 gcc/config/i386/i386-isa.def  |   1 +
 gcc/config/i386/i386-options.cc   |  18 +
 gcc/config/i386/i386-opts.h   |   8 +
 gcc/config/i386/i386-protos.h |   5 +
 gcc/config/i386/i386.cc   | 303 ++-
 gcc/config/i386/i386.h|  69 +-
 gcc/config/i386/i386.md   | 131 ++-
 gcc/config/i386/i386.opt  |  30 +
 gcc/config/i386/mmx.md| 154 ++--
 gcc/config/i386/sse.md| 792 --
 gcc/doc/invoke.texi   |  11 +-
 gcc/doc/tm.texi   |  21 +
 gcc/doc/tm.texi.in|  21 +
 gcc/lra-constraints.cc|  32 +-
 gcc/reload.cc |  34 +-
 gcc/reload1.cc|   2 +-
 gcc/testsuite/gcc.target/i386/apx-1.c |   8 +
 .../gcc.target/i386/apx-egprs-names.c |  17 +
 .../gcc.target/i386/apx-inline-gpr-norex2.c   |  25 +
 .../gcc.target/i386/apx-interrupt-1.c | 102 +++
 .../i386/apx-legacy-insn-check-norex2-asm.c   |   5 +
 .../i386/apx-legacy-insn-check-norex2.c   | 181 
 .../gcc.target/i386/apx-spill_to_egprs-1.c|  25 +
 gcc/testsuite/lib/target-supports.exp |  10 +
 31 files changed, 1683 insertions(+), 448 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
 create mode 100644 
gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c

-- 
2.31.1



[PATCH] light expander sra

2023-09-22 Thread Jiufu Guo
Hi,

There are a few PRs (meta-bug PR101926) on various targets.
Their root causes are similar: aggregate parameters/returns
are passed in multiple registers, but they are first stored
to the stack from those registers, and the parameter is then
accessed through the stack slot.

This leads to suboptimal instructions, because the RTL
passes may not be able to optimize the stack accesses away.

As a general idea, it would be better to access the
aggregate parameters/returns directly through registers
and get faster code.  This idea is a kind of SRA
(using the scalar registers to access the aggregate
parameters/returns).

Adopting an sra for parameters/returns in the expander
would be practical. 
For aggregate parameters that are passed via multi-registers,
the INCOMING_RTL contains a list of the "incoming hard
registers" in a parallel rtx (or using the first register
and total size to indicate the list of registers). 
It is similar for returns, the outgoing registers are
recorded in DECL_RTL of DECL_RESULT for the function.
When accessing the aggregate parm/return, we could figure
out which parts of incoming/outgoing are accessed.

The initial implementation of this sra (light-expander-sra)
contains the parts below:

a. Check if the aggregate parameters/returns are ok/profitable
  to scalarize the accesses, and set the scalar pseudos for
  the aggregate parameter/return.
  - This is done in "expand_function_start", after the
incoming/outgoing hard registers are determined for the
    parameter(s)/return.
    Now, the scalarized registers are recorded in DECL_RTL
    for the parameter/return in parallel form.
  - When setting the DECL_RTL, "scalarizable_aggregate" is called
    to check whether the accesses are ok/profitable to scalarize.
To support more cases, we can enhance this function.
For example:
- 'reverse storage order' can be supported.
- 'TImode/vector-mode from multi-regs' may be supported.
- some cases on 'writing to parameter'/'overlap accesses'
  maybe ok to support.

b. When expanding the accesses on the parameters/returns,
  according to the access info (e.g. bitpos,bitsize, mode),
  compute the scalars(pseudos), and use the pseudos to 
  expand the access.  For example:
  - Expand the component access of a parameter: "_1 = arg.f1"
or the access on a whole parameter: rhs of "_2 = arg"
  - Expand the assignment to a return val:
"D.xx = yy; or D.xx.f = zz" where D.xx occurs on return
stmt.
  - This is mainly done in expr.cc(expand_expr_real_1, 
expand_assignment).  To expand accesses, function
"extract_sub_member" is used to compute the scalar rtx.
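
To make part b more concrete, here is a rough sketch of the bookkeeping it implies: given the list of registers that hold an aggregate and an access described by bit position and bit size, find which registers the access touches. This is not the patch's "extract_sub_member"; the types reg_piece and reg_range and the function covering_registers are hypothetical names used only for this example.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// One register holding part of the aggregate: where its bits start within
// the aggregate and how many bits it holds.
struct reg_piece
{
  unsigned bit_offset;
  unsigned bit_size;
};

// Inclusive range of indices into the piece list.
struct reg_range
{
  std::size_t first;
  std::size_t last;
};

// Return the registers overlapped by the access [bitpos, bitpos + bitsize),
// or nothing if the access is empty or lies outside every register.
std::optional<reg_range>
covering_registers (const std::vector<reg_piece> &pieces,
		    unsigned bitpos, unsigned bitsize)
{
  if (bitsize == 0)
    return std::nullopt;

  const unsigned end = bitpos + bitsize;
  std::optional<std::size_t> first, last;
  for (std::size_t i = 0; i < pieces.size (); ++i)
    {
      const unsigned pstart = pieces[i].bit_offset;
      const unsigned pend = pstart + pieces[i].bit_size;
      if (pstart < end && bitpos < pend)
	{
	  if (!first)
	    first = i;
	  last = i;
	}
    }
  if (!first)
    return std::nullopt;
  return reg_range{*first, *last};
}
```

An access that maps to exactly one register and covers it fully can use that pseudo directly; an access that spans registers or covers only part of one needs extra extraction or combination work, which is where the profitability check comes in.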

Besides the two parts above, some work is done on the GIMPLE tree:
collect sra candidates for parameters/returns, and collect
the SRA access info.
This is mainly done at the beginning of the expander pass by
the class (named expand_sra) and its member functions.
Below are two items of this part.
 - Collect light-expand-sra candidates.
  Each parameter is checked if it has the proper aggregate type.
  Collect return val (VAR_P) on each return stmt if the function
  is returning via registers.  
  This is implemented in expand_sra::collect_sra_candidates. 

 - Build/collect/manage all the access on the candidates.
  The function "scan_function" is used to do this work, it goes
  through all basicblocks and all interesting stmts (phi, return,
  assign, call, asm) are checked.
  If there is an interesting expression (e.g. COMPONENT_REF or
  PARM_DECL) on the aggregate parameter, record the required info
  for the access (e.g. pos, size, type, base).
  For some accesses, it may not be possible to use scalar registers.
  e.g.: the aggregate parameter is address-taken and
  accessed via memory. "foo(struct S arg) {bar (&arg);}"
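
The distinction between a scalarizable access and an address-taken one can be shown with a small example. The struct S and the functions below are illustrative only, extending the "foo(struct S arg) {bar (&arg);}" fragment above; whether a given target passes S in registers depends on its ABI.

```cpp
// A small aggregate that many 64-bit ABIs pass in two registers.
struct S
{
  long f1;
  long f2;
};

// Takes the address of its argument, so the caller's copy must live in
// memory.
long
bar (const S *p)
{
  return p->f1 + p->f2;
}

// Candidate for scalarization: only a component of ARG is read, so the
// value could come straight from the incoming register.
long
read_field (S arg)
{
  return arg.f1;
}

// Not a candidate: ARG is address-taken and accessed via memory.
long
foo (S arg)
{
  return bar (&arg);
}
```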

Another aspect of this patch: it tries to share common code among
light-expand-sra, tree-sra, and ipa-sra.
 - Now, the class "expand_sra" is based on "sra_default_analyzer".
  "sra_default_analyzer" provides stmt analyzing interfaces.
 - "scan_function" provides the capability to go through functions
  and check interesting sra stmts.
 - We can continue refactoring the code to share similar frameworks
  even if the behaviors are slightly different between those SRAs.

This patch is tested on ppc64{,le}.
Thanks in advance for your review.


BR,
Jeff (Jiufu Guo)

PR target/65421

gcc/ChangeLog:

* cfgexpand.cc (struct access): New class.
(struct expand_sra): New class.
(expand_sra::collect_sra_candidates): New member function.
(expand_sra::add_sra_candidate): Likewise.
(expand_sra::build_access): Likewise.
(expand_sra::analyze_phi): Likewise.
(expand_sra::analyze_assign): Likewise.
(expand_sra::visit_base): Likewise.
(expand_sra::protect_mem_access_in_stmt): Likewise.
(expand_sra::expand_sra):  Class constructor.
(expand_sra::~expand_sra): Class destructor.
(expand_sra::scalarizable_access):  New member function.
(ex

RE: [PATCH v1] RISC-V: Move ceil test cases to unop folder

2023-09-22 Thread Li, Pan2
Committed, thanks Juzhe.

Pan

From: juzhe.zh...@rivai.ai 
Sent: Friday, September 22, 2023 5:14 PM
To: Li, Pan2 ; gcc-patches 
Cc: Li, Pan2 ; Wang, Yanzhang ; 
kito.cheng 
Subject: Re: [PATCH v1] RISC-V: Move ceil test cases to unop folder

ok



juzhe.zh...@rivai.ai

From: pan2.li
Date: 2023-09-22 17:11
To: gcc-patches
CC: juzhe.zhong; 
pan2.li; 
yanzhang.wang; 
kito.cheng
Subject: [PATCH v1] RISC-V: Move ceil test cases to unop folder
From: Pan Li <pan2...@intel.com>

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/math-ceil-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c: ...here.
* gcc.target/riscv/rvv/autovec/test-math.h: Moved to...
* gcc.target/riscv/rvv/autovec/unop/test-math.h: ...here.

Signed-off-by: Pan Li <pan2...@intel.com>
---
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-0.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-1.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-2.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-3.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-0.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-1.c | 0
.../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-2.c | 0
gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/test-math.h | 0
8 files changed, 0 insertions(+), 0 deletions(-)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-0.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-1.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-2.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-3.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-0.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-1.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-2.c 
(100%)
rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/test-math.h (100%)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil

Re: [PATCH] c++: improve class NTTP object pretty printing [PR111471]

2023-09-22 Thread Jason Merrill

On 9/21/23 11:53, Jason Merrill wrote:

On 9/20/23 10:13, Patrick Palka wrote:

On Tue, 19 Sep 2023, Patrick Palka wrote:


On Tue, 19 Sep 2023, Jason Merrill wrote:


On 9/19/23 12:40, Patrick Palka wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/13?


OK for trunk.  What's your argument for backporting?


Thanks.  I don't feel strongly about it, but I was thinking that since
we typically backport C++20-only correctness fixes to the most recent
release branch, C++20-only diagnostic improvements might be suitable
too?




-- >8 --

1. Move class NTTP object pretty printing to a more general spot in
 the pretty printer.


FWIW this first change isn't just a refactoring, it means we now pretty
print an NTTP object that appears elsewhere besides in a template
argument list, e.g. in a parameter mapping:

Before:

diagnostic19.C:8:15: note: the expression ‘((const A)V).value [with V = _ZTAXtl1AEE]’ evaluated to ‘false’


After:

diagnostic19.C:8:15: note: the expression ‘(V).value [with V = A{false}]’ evaluated to ‘false’


Ah, that is a pretty big improvement.  The patch is OK.


...for 13 as well.

Jason



Re: [PATCH v1] RISC-V: Move ceil test cases to unop folder

2023-09-22 Thread juzhe.zh...@rivai.ai
ok




juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-09-22 17:11
To: gcc-patches
CC: juzhe.zhong; pan2.li; yanzhang.wang; kito.cheng
Subject: [PATCH v1] RISC-V: Move ceil test cases to unop folder
 
 


[PATCH v1] RISC-V: Move ceil test cases to unop folder

2023-09-22 Thread pan2 . li
From: Pan Li 

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/math-ceil-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-3.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-0.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-1.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c: ...here.
* gcc.target/riscv/rvv/autovec/math-ceil-run-2.c: Moved to...
* gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c: ...here.
* gcc.target/riscv/rvv/autovec/test-math.h: Moved to...
* gcc.target/riscv/rvv/autovec/unop/test-math.h: ...here.

Signed-off-by: Pan Li 
---
 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-0.c | 0
 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-1.c | 0
 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-2.c | 0
 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-3.c | 0
 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-0.c | 0
 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-1.c | 0
 .../gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-2.c | 0
 gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/test-math.h | 0
 8 files changed, 0 insertions(+), 0 deletions(-)
 rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-0.c 
(100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-1.c 
(100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-2.c 
(100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-3.c 
(100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-0.c 
(100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-1.c 
(100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/math-ceil-run-2.c 
(100%)
 rename gcc/testsuite/gcc.target/riscv/rvv/autovec/{ => unop}/test-math.h (100%)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-0.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-1.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-2.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-3.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-0.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-0.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-1.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-1.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/math-ceil-run-2.c
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-run-2.c
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
similarity index 100%
rename from gcc/testsuite/gcc.target/riscv/rvv/autovec/test-math.h
rename to gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/test-math.h
-- 
2.34.1



[PING] [PATCH v2] aarch64: Fine-grained ldp and stp policies with test-cases.

2023-09-22 Thread Manos Anagnostakis
Kind ping for reviewing this patch. It's tested and does not cause
regressions:
https://patchwork.sourceware.org/project/gcc/patch/20230828143744.7574-1-manos.anagnosta...@vrull.eu/

Thank you in advance!

On Mon, Aug 28, 2023 at 5:37 PM Manos Anagnostakis <
manos.anagnosta...@vrull.eu> wrote:

> This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
> to provide the requested behaviour for handling ldp and stp:
>
>   /* Allow the tuning structure to disable LDP instruction formation
>  from combining instructions (e.g., in peephole2).
>  TODO: Implement fine-grained tuning control for LDP and STP:
>1. control policies for load and store separately;
>2. support the following policies:
>   - default (use what is in the tuning structure)
>   - always
>   - never
>   - aligned (only if the compiler can prove that the
> load will be aligned to 2 * element_size)  */
>
> It provides two new command-line options, -mldp-policy and -mstp-policy,
> which control the load and store policies separately, as stated in
> part 1 of the TODO.
>
> The accepted values for both options are:
> - default: Use the ldp/stp policy defined in the corresponding tuning
>   structure.
> - always: Emit ldp/stp regardless of alignment.
> - never: Do not emit ldp/stp.
> - aligned: In order to emit ldp/stp, first check if the load/store will
>   be aligned to 2 * element_size.
>
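A minimal sketch of the policy semantics listed above, not the code from the patch. The names `PairPolicy`, `parse_pair_policy` and `may_form_pair` are hypothetical, and the alignment argument is assumed to be the alignment the compiler can prove for the access:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical names for illustration only; the enums added by the patch
// live in aarch64-protos.h and are not reproduced here.
enum class PairPolicy { Default, Always, Never, Aligned };

// Map an option value (e.g. the argument of -mldp-policy) to a policy.
// Returns false for an unrecognised string and leaves *out untouched.
bool parse_pair_policy(const char *arg, PairPolicy *out)
{
  static const struct { const char *name; PairPolicy policy; } table[] = {
    {"default", PairPolicy::Default},
    {"always", PairPolicy::Always},
    {"never", PairPolicy::Never},
    {"aligned", PairPolicy::Aligned},
  };
  for (const auto &entry : table)
    if (std::strcmp(arg, entry.name) == 0)
      {
        *out = entry.policy;
        return true;
      }
  return false;
}

// Decide whether a load/store pair may be formed.  TUNING is the policy
// taken from the tuning structure and is used when REQUESTED is Default;
// it is assumed never to be Default itself.  Sizes are in bytes and
// alignments are assumed to be powers of two.
bool may_form_pair(PairPolicy requested, PairPolicy tuning,
                   std::uint64_t known_align, std::uint64_t element_size)
{
  assert(tuning != PairPolicy::Default);
  PairPolicy effective
    = requested == PairPolicy::Default ? tuning : requested;
  switch (effective)
    {
    case PairPolicy::Always:
      return true;
    case PairPolicy::Never:
      return false;
    case PairPolicy::Aligned:
      return known_align >= 2 * element_size;
    default:
      return false;
    }
}
```

With 8-byte elements, an access known to be 16-byte aligned passes the "aligned" check, while one known only to be 8-byte aligned does not.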
> gcc/ChangeLog:
> * config/aarch64/aarch64-protos.h (struct tune_params): Add
> appropriate enums for the policies.
> * config/aarch64/aarch64-tuning-flags.def
> (AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
> options.
> * config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
> function to parse ldp-policy option.
> (aarch64_parse_stp_policy): New function to parse stp-policy
> option.
> (aarch64_override_options_internal): Call parsing functions.
> (aarch64_operands_ok_for_ldpstp): Add option-value check and
> alignment check and remove superseded ones.
> (aarch64_operands_adjust_ok_for_ldpstp): Add option-value check and
> alignment check and remove superseded ones.
> * config/aarch64/aarch64.opt: Add options.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/aarch64/ldp_aligned.c: New test.
> * gcc.target/aarch64/ldp_always.c: New test.
> * gcc.target/aarch64/ldp_never.c: New test.
> * gcc.target/aarch64/stp_aligned.c: New test.
> * gcc.target/aarch64/stp_always.c: New test.
> * gcc.target/aarch64/stp_never.c: New test.
>
> Signed-off-by: Manos Anagnostakis 
> ---
> Changes in v2:
> - Fixed committed ldp tests to correctly trigger
>   and test aarch64_operands_adjust_ok_for_ldpstp in aarch64.cc.
> - Added "-mcpu=generic" to committed tests to guarantee generic
>   target code generation and not cause the regressions of v1.
>
>  gcc/config/aarch64/aarch64-protos.h   |  24 ++
>  gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
>  gcc/config/aarch64/aarch64.cc | 229 ++
>  gcc/config/aarch64/aarch64.opt|   8 +
>  .../gcc.target/aarch64/ldp_aligned.c  |  66 +
>  gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 +
>  gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 +
>  .../gcc.target/aarch64/stp_aligned.c  |  60 +
>  gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +
>  gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +
>  10 files changed, 586 insertions(+), 61 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h
> b/gcc/config/aarch64/aarch64-protos.h
> index 70303d6fd95..be1d73490ed 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -568,6 +568,30 @@ struct tune_params
>/* Place prefetch struct pointer at the end to enable type checking
>   errors when tune_params misses elements (e.g., from erroneous
> merges).  */
>const struct cpu_prefetch_tune *prefetch;
> +/* An enum specifying how to handle load pairs using a fine-grained
> policy:
> +   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
> +   to at least double the alignment of the type.
> +   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
> +   - LDP_POLICY_NEVER: Do not emit ldp.  */
> +
> +  enum aarch64_ldp_policy_model
> +  {
> +LDP_POLICY_ALIGNED,
> +LDP_P

[Committed] RISC-V: Remove @ of vec_duplicate pattern

2023-09-22 Thread Juzhe-Zhong
The @ on the vec_duplicate pattern is clearly redundant.

Regression passed.

Committed.
gcc/ChangeLog:

* config/riscv/riscv-v.cc (gen_const_vector_dup): Use global expand function.
* config/riscv/vector.md (@vec_duplicate): Remove @.
(vec_duplicate): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 4 +---
 gcc/config/riscv/vector.md  | 2 +-
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index a3672bad521..4d0e1d8d1a9 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -696,9 +696,7 @@ gen_const_vector_dup (machine_mode mode, poly_int64 val)
 {
   /* When VAL is const_poly_int value, we need to explicitly broadcast
 it into a vector using RVV broadcast instruction.  */
-  rtx dup = gen_reg_rtx (mode);
-  emit_insn (gen_vec_duplicate (mode, dup, c));
-  return dup;
+  return expand_vector_broadcast (mode, c);
 }
return gen_const_vec_duplicate (mode, c);
 }
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 73f90dea36b..d5300a33946 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -1371,7 +1371,7 @@
 ;; This pattern only handles duplicates of non-constant inputs.
 ;; Constant vectors go through the movm pattern instead.
 ;; So "direct_broadcast_operand" can only be mem or reg, no CONSTANT.
-(define_insn_and_split "@vec_duplicate"
+(define_insn_and_split "vec_duplicate"
   [(set (match_operand:V_VLS 0 "register_operand")
 (vec_duplicate:V_VLS
   (match_operand: 1 "direct_broadcast_operand")))]
-- 
2.36.3



[PATCH 3/3] aarch64: Convert aarch64 multi choice patterns to new syntax

2023-09-22 Thread Andrea Corallo
[Resending this with the patch compressed as it's more than 400 KB...]

Hi all,
this patch converts a number of multi choice patterns within the
aarch64 backend to the new syntax.

The list of the converted patterns is in the Changelog.

For completeness, here is the list of multi choice patterns that my
parser rejected for conversion. They typically have some C as asm
output and require manual intervention:
aarch64_simd_vec_set, aarch64_get_lane,
aarch64_cmdi, aarch64_cmdi, aarch64_cmtstdi,
*aarch64_movv8di, *aarch64_be_mov, *aarch64_be_movci,
*aarch64_be_mov, *aarch64_be_movxi, *aarch64_sve_mov_le,
*aarch64_sve_mov_be, @aarch64_pred_mov,
@aarch64_sve_gather_prefetch,
@aarch64_sve_gather_prefetch,
*aarch64_sve_gather_prefetch_sxtw,
*aarch64_sve_gather_prefetch_uxtw,
@aarch64_vec_duplicate_vq_le, *vec_extract_0,
*vec_extract_v128, *cmp_and,
*fcm_and_combine, @aarch64_sve_ext,
@aarch64_sve2_aba, *sibcall_insn, *sibcall_value_insn,
*xor_one_cmpl3, *insv_reg_,
*aarch64_bfi_,
*aarch64_bfidi_subreg_, *aarch64_bfxil,
*aarch64_bfxilsi_uxtw,
*aarch64_cvtf2_mult,
atomic_store.

Bootstrapped and regression tested on aarch64-unknown-linux-gnu. I also
analysed tmp-mddump.md (from 'make mddump') and could not find any
effective differences. Okay for trunk?

Bests

  Andrea

gcc/ChangeLog:

* config/aarch64/aarch64.md (@ccmp)
(@ccmp_rev, *call_insn, *call_value_insn)
(*mov_aarch64, load_pair_sw_)
(load_pair_dw_)
(store_pair_sw_)
(store_pair_dw_, *extendsidi2_aarch64)
(*zero_extendsidi2_aarch64, *load_pair_zero_extendsidi2_aarch64)
(*extend2_aarch64)
(*zero_extend2_aarch64)
(*extendqihi2_aarch64, *zero_extendqihi2_aarch64)
(*add3_aarch64, *addsi3_aarch64_uxtw, *add3_poly_1)
(add3_compare0, *addsi3_compare0_uxtw)
(*add3_compareC_cconly, add3_compareC)
(*add3_compareV_cconly_imm, add3_compareV_imm)
(*add3nr_compare0, subdi3, subv_imm)
(*cmpv_insn, sub3_compare1_imm, neg2)
(cmp, fcmp, fcmpe, *cmov_insn)
(*cmovsi_insn_uxtw, 3, *si3_uxtw)
(*and3_compare0, *andsi3_compare0_uxtw, one_cmpl2)
(*_one_cmpl3, *and3nr_compare0)
(*aarch64_ashl_sisd_or_int_3)
(*aarch64_lshr_sisd_or_int_3)
(*aarch64_ashr_sisd_or_int_3, *ror3_insn)
(*si3_insn_uxtw, _trunc2)
(2)
(3)
(3)
(*aarch64_3_cssc, copysign3_insn): Update
to new syntax.

* config/aarch64/aarch64-sve2.md (@aarch64_scatter_stnt)
(@aarch64_scatter_stnt_)
(*aarch64_mul_unpredicated_)
(@aarch64_pred_, *cond__2)
(*cond__3, *cond__any)
(*cond__z, @aarch64_pred_)
(*cond__2, *cond__3)
(*cond__any, @aarch64_sve_)
(@aarch64_sve__lane_)
(@aarch64_sve_add_mul_lane_)
(@aarch64_sve_sub_mul_lane_, @aarch64_sve2_xar)
(*aarch64_sve2_bcax, @aarch64_sve2_eor3)
(*aarch64_sve2_nor, *aarch64_sve2_nand)
(*aarch64_sve2_bsl, *aarch64_sve2_nbsl)
(*aarch64_sve2_bsl1n, *aarch64_sve2_bsl2n)
(*aarch64_sve2_sra, @aarch64_sve_add_)
(*aarch64_sve2_aba, @aarch64_sve_add_)
(@aarch64_sve_add__lane_)
(@aarch64_sve_qadd_)
(@aarch64_sve_qadd__lane_)
(@aarch64_sve_sub_)
(@aarch64_sve_sub__lane_)
(@aarch64_sve_qsub_)
(@aarch64_sve_qsub__lane_)
(@aarch64_sve_, @aarch64__lane_)
(@aarch64_pred_)
(@aarch64_pred_, *cond__2)
(*cond__z, @aarch64_sve_)
(@aarch64__lane_, @aarch64_sve_)
(@aarch64__lane_, @aarch64_pred_)
(*cond__any_relaxed)
(*cond__any_strict)
(@aarch64_pred_, *cond_)
(@aarch64_pred_, *cond_)
(*cond__strict): Update to new syntax.

* config/aarch64/aarch64-sve.md (*aarch64_sve_mov_ldr_str)
(*aarch64_sve_mov_no_ldr_str, @aarch64_pred_mov)
(*aarch64_sve_mov, aarch64_wrffr)
(mask_scatter_store)
(*mask_scatter_store_xtw_unpacked)
(*mask_scatter_store_sxtw)
(*mask_scatter_store_uxtw)
(@aarch64_scatter_store_trunc)
(@aarch64_scatter_store_trunc)
(*aarch64_scatter_store_trunc_sxtw)
(*aarch64_scatter_store_trunc_uxtw)
(*vec_duplicate_reg, vec_shl_insert_)
(vec_series, @extract__)
(@aarch64_pred_, *cond__2)
(*cond__any, @aarch64_pred_)
(@aarch64_sve_revbhw_)
(@cond_)
(*2)
(@aarch64_pred_sxt)
(@aarch64_cond_sxt)
(*cond_uxt_2, *cond_uxt_any, *cnot)
(*cond_cnot_2, *cond_cnot_any)
(@aarch64_pred_, *cond__2_relaxed)
(*cond__2_strict, *cond__any_relaxed)
(*cond__any_strict, @aarch64_pred_)
(*cond__2, *cond__3)
(*cond__any, add3, sub3)
(@aarch64_pred_abd, *aarch64_cond_abd_2)
(*aarch64_cond_abd_3, *aarch64_cond_abd_any)
(@aarch64_sve_, @aarch64_pred_)
(*cond__2, *cond__z)

[PATCH 1/3] recog: Improve parser for pattern new compact syntax

2023-09-22 Thread Andrea Corallo
From: Richard Sandiford 

Hi all,

This patch adds support in the new compact pattern syntax for the case
where the constraints appear unsorted, as in:

(define_insn "*si3_insn_uxtw"
  [(set (match_operand:DI 0 "register_operand")
(zero_extend:DI (SHIFT_no_rotate:SI
 (match_operand:SI 1 "register_operand")
 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si"))))]
  ""
  {@ [cons: =0, 2,   1]
 [  r,  Uss, r] \\t%w0, %w1, %2
 [  r,  r,   r] \\t%w0, %w1, %w2
  }
  [(set_attr "type" "bfx,shift_reg")]
)
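
The ordering problem can be sketched outside gensupport.cc as follows. This is an illustration under stated assumptions, not the patch itself: `place_by_operand` is a hypothetical helper, the header is assumed to have been reduced to its operand numbers in written order (0, 2, 1 for the pattern above), and each alternative row is assumed to be already split into one constraint string per column.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: HEADER holds the operand numbers in the order they
// were written in the "cons:" list, ROW holds one alternative's
// constraints in that same written order.  The result is indexed by
// operand number, which is the order the rest of the pattern expects.
std::vector<std::string>
place_by_operand(const std::vector<int> &header,
                 const std::vector<std::string> &row)
{
  if (header.size() != row.size())
    throw std::invalid_argument("row width does not match header");

  std::vector<std::string> by_operand;
  std::vector<bool> seen;
  for (std::size_t col = 0; col < header.size(); ++col)
    {
      if (header[col] < 0)
        throw std::invalid_argument("negative cons number");
      std::size_t idx = static_cast<std::size_t>(header[col]);
      if (idx >= by_operand.size())
        {
          by_operand.resize(idx + 1);
          seen.resize(idx + 1, false);
        }
      // Mirrors the "duplicate cons number found" check in the patch.
      if (seen[idx])
        throw std::invalid_argument("duplicate cons number");
      by_operand[idx] = row[col];
      seen[idx] = true;
    }
  return by_operand;
}
```

For the first alternative above, the written row `r, Uss, r` under the header `0, 2, 1` ends up as `r` for operand 0, `r` for operand 1 and `Uss` for operand 2. Reading the row in written order first and placing by operand number afterwards is the point of moving the sorting loop after the alternatives are parsed.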

Best Regards

  Andrea

gcc/Changelog

2023-09-20  Richard Sandiford  

* gensupport.cc (convert_syntax): Updated to support unordered
constraints in compact syntax.
---
 gcc/gensupport.cc | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index f7164b3214d..7e125e3d8db 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -896,19 +896,6 @@ convert_syntax (rtx x, file_location loc)
 
   parse_section_layout (loc, &templ, "cons:", tconvec, true);
 
-  /* Check for any duplicate cons entries and sort based on i.  */
-  for (auto e : tconvec)
-{
-  unsigned idx = e.idx;
-  if (idx >= convec.size ())
-   convec.resize (idx + 1);
-
-  if (convec[idx].idx >= 0)
-   fatal_at (loc, "duplicate cons number found: %d", idx);
-  convec[idx] = e;
-}
-  tconvec.clear ();
-
   if (*templ != ']')
 {
   if (*templ == ';')
@@ -951,13 +938,13 @@ convert_syntax (rtx x, file_location loc)
  new_templ += '\n';
  new_templ.append (buffer);
  /* Parse the constraint list, then the attribute list.  */
- if (convec.size () > 0)
-   parse_section (&templ, convec.size (), alt_no, convec, loc,
+ if (tconvec.size () > 0)
+   parse_section (&templ, tconvec.size (), alt_no, tconvec, loc,
   "constraint");
 
  if (attrvec.size () > 0)
{
- if (convec.size () > 0 && !expect_char (&templ, ';'))
+ if (tconvec.size () > 0 && !expect_char (&templ, ';'))
fatal_at (loc, "expected `;' to separate constraints "
   "and attributes in alternative %d", alt_no);
 
@@ -1027,6 +1014,19 @@ convert_syntax (rtx x, file_location loc)
   ++alt_no;
 }
 
+  /* Check for any duplicate cons entries and sort based on i.  */
+  for (auto e : tconvec)
+{
+  unsigned idx = e.idx;
+  if (idx >= convec.size ())
+   convec.resize (idx + 1);
+
+  if (convec[idx].idx >= 0)
+   fatal_at (loc, "duplicate cons number found: %d", idx);
+  convec[idx] = e;
+}
+  tconvec.clear ();
+
   /* Write the constraints and attributes into their proper places.  */
   if (convec.size () > 0)
 add_constraints (x, loc, convec);
-- 
2.25.1



[PATCH 2/3] recog: Support space in "[ cons"

2023-09-22 Thread Andrea Corallo
Hi all,

This patch allows spaces before "cons:" in the definitions of
patterns using the new compact syntax, e.g.:

(define_insn "aarch64_simd_dup"
  [(set (match_operand:VDQ_I 0 "register_operand")
(vec_duplicate:VDQ_I
  (match_operand: 1 "register_operand")))]
  "TARGET_SIMD"
  {@ [ cons: =0 , 1  ; attrs: type  ]
 [ w, w  ; neon_dup  ] dup\t%0., %1.[0]
 [ w, ?r ; neon_from_gp  ] dup\t%0., %1
  }
)
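
A rough sketch of the idea, assuming the text handed to the check is whatever follows the opening `[` of the header. The helpers `skip_blanks` and `starts_section` are hypothetical stand-ins and are not the functions used in gensupport.cc:

```cpp
#include <cctype>
#include <cstring>

// Hypothetical stand-in for a whitespace skipper: advance *P past any
// leading whitespace.
static void skip_blanks(const char **p)
{
  while (std::isspace(static_cast<unsigned char>(**p)))
    ++*p;
}

// Return true if TEXT, ignoring leading whitespace, begins with the
// section name NAME (for example "cons:").
bool starts_section(const char *text, const char *name)
{
  skip_blanks(&text);
  return std::strncmp(text, name, std::strlen(name)) == 0;
}
```

Without the skipping step, a header written as `[ cons: ...` would not be recognised because the comparison would start at the space.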

gcc/Changelog

2023-09-20  Andrea Corallo  

* gensupport.cc (convert_syntax): Skip spaces before "cons:"
in new compact pattern syntax.
---
 gcc/gensupport.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/gensupport.cc b/gcc/gensupport.cc
index 7e125e3d8db..dd920d673b4 100644
--- a/gcc/gensupport.cc
+++ b/gcc/gensupport.cc
@@ -894,6 +894,8 @@ convert_syntax (rtx x, file_location loc)
   if (!expect_char (&templ, '['))
 fatal_at (loc, "expecing `[' to begin section list");
 
+  skip_spaces (&templ);
+
   parse_section_layout (loc, &templ, "cons:", tconvec, true);
 
   if (*templ != ']')
-- 
2.25.1



[PATCH] RISC-V: Add VLS conditional patterns support

2023-09-22 Thread Juzhe-Zhong
Regression passed.

Committed.

gcc/ChangeLog:

* config/riscv/autovec.md: Add VLS conditional patterns.
* config/riscv/riscv-protos.h (expand_cond_unop): Ditto.
(expand_cond_binop): Ditto.
(expand_cond_ternop): Ditto.
* config/riscv/riscv-v.cc (expand_cond_unop): Ditto.
(expand_cond_binop): Ditto.
(expand_cond_ternop): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add VLS conditional tests.
* gcc.target/riscv/rvv/autovec/vls/cond_add-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_add-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_and-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_div-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_div-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fma-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fms-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnma-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnma-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_fnms-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_ior-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_max-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_max-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_min-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_min-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mod-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mul-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_mul-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_neg-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_not-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_shift-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_shift-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_sub-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_sub-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/cond_xor-1.c: New test.

---
 gcc/config/riscv/autovec.md   | 200 +++---
 gcc/config/riscv/riscv-protos.h   |   3 +
 gcc/config/riscv/riscv-v.cc   |  45 
 .../riscv/rvv/autovec/vls/cond_add-1.c| 104 +
 .../riscv/rvv/autovec/vls/cond_add-2.c|  50 +
 .../riscv/rvv/autovec/vls/cond_and-1.c| 104 +
 .../riscv/rvv/autovec/vls/cond_div-1.c|  58 +
 .../riscv/rvv/autovec/vls/cond_div-2.c|  50 +
 .../riscv/rvv/autovec/vls/cond_fma-1.c|  62 ++
 .../riscv/rvv/autovec/vls/cond_fma-2.c|  50 +
 .../riscv/rvv/autovec/vls/cond_fms-1.c|  50 +
 .../riscv/rvv/autovec/vls/cond_fnma-1.c   |  62 ++
 .../riscv/rvv/autovec/vls/cond_fnma-2.c   |  50 +
 .../riscv/rvv/autovec/vls/cond_fnms-1.c   |  50 +
 .../riscv/rvv/autovec/vls/cond_ior-1.c| 104 +
 .../riscv/rvv/autovec/vls/cond_max-1.c| 104 +
 .../riscv/rvv/autovec/vls/cond_max-2.c|  50 +
 .../riscv/rvv/autovec/vls/cond_min-1.c| 104 +
 .../riscv/rvv/autovec/vls/cond_min-2.c|  50 +
 .../riscv/rvv/autovec/vls/cond_mod-1.c|  58 +
 .../riscv/rvv/autovec/vls/cond_mul-1.c| 104 +
 .../riscv/rvv/autovec/vls/cond_mul-2.c|  50 +
 .../riscv/rvv/autovec/vls/cond_neg-1.c|  62 ++
 .../riscv/rvv/autovec/vls/cond_neg-2.c|  50 +
 .../riscv/rvv/autovec/vls/cond_not-1.c|  62 ++
 .../riscv/rvv/autovec/vls/cond_shift-1.c  |  57 +
 .../riscv/rvv/autovec/vls/cond_shift-2.c  |  56 +
 .../riscv/rvv/autovec/vls/cond_sub-1.c| 104 +
 .../riscv/rvv/autovec/vls/cond_sub-2.c|  50 +
 .../riscv/rvv/autovec/vls/cond_xor-1.c| 104 +
 .../gcc.target/riscv/rvv/autovec/vls/def.h|  70 ++
 31 files changed, 2059 insertions(+), 118 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_add-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_add-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_and-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_div-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_div-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_fma-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_fma-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_fms-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/cond_fnma-1.c
 create mode 100644 gcc/testsuite/gcc.

[PATCH 2/2] RISC-V: Fix ICE by expansion and register coercion

2023-09-22 Thread Tsukasa OI
From: Tsukasa OI 

The "prefetch" instruction pattern in RISC-V GCC emits a machine hint
instruction directly when the 'Zicbop' extension is enabled, but it can
cause an ICE when the address argument of __builtin_prefetch is an integral
constant (such as 0 [NULL] or some, but possibly not all, other fixed
addresses).

This patch fixes the problem by changing "prefetch" from a native
instruction to an expansion and coercing the address to a register there.

gcc/ChangeLog:

* config/riscv/riscv.md (prefetch): Expand to a native prefetch
instruction instead of emitting a machine instruction directly.
Coerce the address argument into a register.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbop-by-common-ice-1.c: New ICE test.
* gcc.target/riscv/cmo-zicbop-by-common-ice-2.c: Ditto.
---
 gcc/config/riscv/riscv.md | 43 ---
 .../riscv/cmo-zicbop-by-common-ice-1.c| 13 ++
 .../riscv/cmo-zicbop-by-common-ice-2.c|  7 +++
 3 files changed, 48 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index eaa8b6a9f085..12e78b60980e 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -3451,21 +3451,6 @@
   [(set_attr "type" "cbo")]
 )
 
-(define_insn "prefetch"
-  [(prefetch (match_operand 0 "address_operand" "r")
- (match_operand 1 "imm5_operand" "i")
- (match_operand 2 "const_int_operand" "n"))]
-  "TARGET_ZICBOP"
-{
-  switch (INTVAL (operands[1]))
-  {
-case 0: return "prefetch.r\t%a0";
-case 1: return "prefetch.w\t%a0";
-default: gcc_unreachable ();
-  }
-}
-  [(set_attr "type" "cbo")])
-
 (define_insn "riscv_prefetch_i_"
   [(unspec_volatile:X [(match_operand:X 0 "register_operand" "r")
   (match_operand:X 1 "const_int_operand" "n")]
@@ -3490,6 +3475,34 @@
   "prefetch.w\t%1(%0)"
   [(set_attr "type" "cbo")])
 
+(define_expand "prefetch"
+  [(prefetch (match_operand 0 "address_operand" "")
+(match_operand 1 "const_int_operand" "")
+(match_operand 2 "const_int_operand" ""))]
+  "TARGET_ZICBOP"
+{
+  operands[0] = force_reg (Pmode, operands[0]);
+  switch (INTVAL (operands[1]))
+{
+case 0:
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_prefetch_r_di (operands[0], const0_rtx));
+  else
+   emit_insn (gen_riscv_prefetch_r_si (operands[0], const0_rtx));
+  break;
+case 1:
+  if (TARGET_64BIT)
+   emit_insn (gen_riscv_prefetch_w_di (operands[0], const0_rtx));
+  else
+   emit_insn (gen_riscv_prefetch_w_si (operands[0], const0_rtx));
+  break;
+default:
+  gcc_unreachable ();
+}
+  DONE;
+}
+  [(set_attr "type" "cbo")])
+
 (define_expand "extv"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(sign_extract:GPR (match_operand:GPR 1 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
new file mode 100644
index ..47e83f29cc5c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32i_zicbop -mabi=ilp32" } */
+
+void foo (void)
+{
+  /* Second argument defaults to zero (read).  */
+  __builtin_prefetch (0);
+  __builtin_prefetch (0, 0);
+  __builtin_prefetch (0, 1);
+}
+
+/* { dg-final { scan-assembler-times "prefetch\\.r" 2 } } */
+/* { dg-final { scan-assembler-times "prefetch\\.w" 1 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c
new file mode 100644
index ..a245b8163c1f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64i_zicbop -mabi=lp64" } */
+
+#include "cmo-zicbop-by-common-ice-1.c"
+
+/* { dg-final { scan-assembler-times "prefetch\\.r" 2 } } */
+/* { dg-final { scan-assembler-times "prefetch\\.w" 1 } } */
-- 
2.42.0



[PATCH 1/2] RISC-V: Define not broken prefetch builtins

2023-09-22 Thread Tsukasa OI
From: Tsukasa OI 

__builtin_riscv_zicbop_cbo_prefetchi (corresponding to the "prefetch.i"
instruction from the 'Zicbop' extension) is completely broken and a new
builtin is required as a replacement.

However, this required more than defining a new builtin and/or instruction:

1.  Support for variable argument function prototypes for RISC-V builtins
(corresponding to "..." in C-based languages)
2.  Support for (non-vector) RISC-V builtins with custom expansion
(on RVV intrinsics, custom expansion is already implemented)

Along with other minor changes, a working "prefetch.i" intrinsic is
defined as follows:

void __builtin_riscv_prefetch_i (void *addr, ...);

The optional second argument (defaults to zero and must be a compile-time
constant integer) is the offset from the given address (-2048 <= x < 2048
and must be a multiple of 32, due to "prefetch.i" constraints).  Third and
later arguments are ignored (as with other builtin functions).

This commit also defines builtin functions for "prefetch.r" and "prefetch.w"
instructions for consistency:

void __builtin_riscv_prefetch_r (void *addr, ...);
void __builtin_riscv_prefetch_w (void *addr, ...);

Those instructions can also be emitted using __builtin_prefetch, but it
gives no control over the offset field.
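
As a rough illustration of how the proposed "prefetch.r" builtin might be
used next to __builtin_prefetch, here is a minimal sketch.  It assumes the
builtin name and the "(void *addr, ...)" prototype described above; the
helper sum_with_prefetch and the 64-byte distance are hypothetical and not
part of this patch.  On compilers without the proposed builtin it falls
back to __builtin_prefetch on the same target address.

```c
#include <stddef.h>

/* Older compilers may lack __has_builtin; treat every query as "no".  */
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

/* Hypothetical helper: sum an array while hinting a read 64 bytes ahead
   of the current element.  64 is a multiple of 32 and lies within
   [-2048, 2048), so it satisfies the offset constraints above.  */
long sum_with_prefetch (const long *v, size_t n)
{
  const size_t ahead = 64 / sizeof (long);  /* elements per 64 bytes */
  long sum = 0;

  for (size_t i = 0; i < n; i++)
    {
      /* Only hint addresses that still lie inside the array.  */
      if (i + ahead < n)
        {
#if __has_builtin(__builtin_riscv_prefetch_r)
          /* Proposed builtin: base address plus a constant offset.  */
          __builtin_riscv_prefetch_r ((void *) &v[i], 64);
#else
          /* Generic builtin: the offset is folded into the address.  */
          __builtin_prefetch (&v[i + ahead], 0);
#endif
        }
      sum += v[i];
    }
  return sum;
}
```

Both branches hint the same address; the difference is only that the
proposed builtin lets the caller choose the offset field explicitly.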

gcc/ChangeLog:

* config/riscv/riscv-builtins.cc: Rename availabilities
"prefetchi{32,64}" to "prefetch{32,64}".
(RISCV_FTYPE_NAME_VAR1): Similar to RISCV_FTYPE_NAME1 but for
variable argument function prototype.
(DEF_RISCV_FTYPE_VAR): Similar to DEF_RISCV_FTYPE but calls
RISCV_FTYPE_NAME_VAR* instead.
(enum riscv_builtin_type): Add RISCV_BUILTIN_CUSTOM for builtin
with custom expansion.
(struct riscv_builtin_description): Add custom expansion function.
(RISCV_BUILTIN): Modified to set "expand_function".
(RISCV_CUSTOM_BUILTIN): New.  Similar to RISCV_BUILTIN but only for
builtins with custom expansion function.
(riscv_expand_builtin): Handle RISCV_BUILTIN_CUSTOM builtin.
(expand_builtin_prefetch_riscv): New custom expansion function for
"prefetch.[irw]" instructions from the 'Zicbop' extension.
* config/riscv/riscv-cmo.def
(__builtin_riscv_zicbop_cbo_prefetchi): Remove since it's broken.
(__builtin_riscv_prefetch_i): New.
(__builtin_riscv_prefetch_r): New.
(__builtin_riscv_prefetch_w): New.
* config/riscv/riscv-ftypes.def: Add variable argument prototype
for "void func(void*, ...)".
* config/riscv/riscv.md (unspecv): Remove UNSPECV_PREI and add
UNSPECV_PREFETCH_[IRW].
(riscv_prefetchi_): Remove.
(riscv_prefetch_i_): New.
(riscv_prefetch_r_): New.
(riscv_prefetch_w_): New.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/cmo-zicbop-1.c: Refine to test new builtins.
* gcc.target/riscv/cmo-zicbop-2.c: Ditto.
* gcc.target/riscv/cmo-zicbop-3.c: New NULL prefetching test.
* gcc.target/riscv/cmo-zicbop-4.c: New failure test.
* gcc.target/riscv/cmo-zicbop-5.c: Ditto.
* gcc.target/riscv/cmo-zicbop-6.c: Ditto.
* gcc.target/riscv/cmo-zicbop-by-common-1.c: New test for
__builtin_prefetch and the 'Zicbop' extension.
* gcc.target/riscv/cmo-zicbop-by-common-2.c: Ditto.
* gcc.target/riscv/cmo-zicbop-by-common-3.c: Ditto.
---
 gcc/config/riscv/riscv-builtins.cc| 112 +-
 gcc/config/riscv/riscv-cmo.def|   8 +-
 gcc/config/riscv/riscv-ftypes.def |   1 +
 gcc/config/riscv/riscv.md |  30 -
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c |  41 ---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c |  33 ++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-3.c |  29 +
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-4.c |  14 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-5.c |  14 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-6.c |  38 ++
 .../gcc.target/riscv/cmo-zicbop-by-common-1.c |  17 +++
 .../gcc.target/riscv/cmo-zicbop-by-common-2.c |   7 ++
 .../gcc.target/riscv/cmo-zicbop-by-common-3.c |  13 ++
 13 files changed, 305 insertions(+), 52 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-3.c

diff --git a/gcc/config/riscv/riscv-builtins.cc b/gcc/config/riscv/riscv-builtins.cc
index 3fe3a89dcc25..4f422e0891c2 100644
--- a/gcc/config/riscv/riscv-builtins.cc
+++ b/gcc/config/riscv/riscv-builtins.cc
@@ -47,12 +47,16 @@ along with GCC; see the file COPYING3.  If not see
 #define

[PATCH 0/2] RISC-V: Define not broken prefetch builtins

2023-09-22 Thread Tsukasa OI
Hello,

As I explained earlier, the builtin function for RISC-V
"__builtin_riscv_zicbop_cbo_prefetchi" is completely broken.  Instead, this
patch set (in PATCH 1/2) creates three new, working builtins.

void __builtin_riscv_prefetch_i(void *addr, [intptr_t offset,] ...);
void __builtin_riscv_prefetch_r(void *addr, [intptr_t offset,] ...);
void __builtin_riscv_prefetch_w(void *addr, [intptr_t offset,] ...);


For consistency with "prefetch.i" and the reason I describe later (which
requires native instructions for "prefetch.r" and "prefetch.w"), I decided
to make builtin functions for "prefetch.[rw]" as well.

The optional second argument (named "offset" here) defaults to zero and must
be a compile-time integral constant.  Also, it must be a valid offset for a
"prefetch.[irw]" HINT instruction (x % 32 == 0 && x >= -2048 && x < 2048).
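
The offset condition above can be written as a small C predicate.  This is
only a sketch of the stated constraint, not code from the patch; the name
zicbop_offset_ok is hypothetical.

```c
#include <stdbool.h>

/* Returns true when OFFSET is a multiple of 32 in the range
   [-2048, 2048), i.e. the condition quoted above for the offset of a
   "prefetch.[irw]" instruction.  */
bool zicbop_offset_ok (long offset)
{
  return offset % 32 == 0 && offset >= -2048 && offset < 2048;
}
```

For example, 0, 32 and -2048 are accepted, while 16 (not a multiple of 32)
and 2048 (out of range) are rejected.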

They are defined if the 'Zicbop' extension is supported and expand to:

> prefetch.i offset(addr_reg)  ; __builtin_riscv_prefetch_i
> prefetch.r offset(addr_reg)  ; __builtin_riscv_prefetch_r
> prefetch.w offset(addr_reg)  ; __builtin_riscv_prefetch_w


The hardest part of this patch set was supporting builtin functions with
variable arguments (making "offset" optional).  It required:

1.  Support for variable argument function prototype for RISC-V builtins
(corresponding "..." on C-based languages)
2.  Support for (non-vector) RISC-V builtins with custom expansion
(on RVV intrinsics, custom expansion is already implemented)


PATCH 2/2 fixes an ICE that I found while investigating the regular
prefetch builtin (__builtin_prefetch).  If the 'Zicbop' extension is enabled,
__builtin_prefetch with the first argument being NULL or some (but not all)
fixed addresses (like ((void*)0x20)) can cause an ICE.  This is because
the "r" constraint is not checked and a constant can become the first
argument of the target-specific "prefetch" RTL instruction.

PATCH 2/2 fixes this issue by:

1.  Making "prefetch" not an instruction but instead an expansion
(this is not rare; e.g. on i386) and
2.  Coercing the address argument into a register in the expansion
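
For reference, calls of the shape described above look like the following
sketch.  The function name is hypothetical; the constant addresses are the
ones mentioned in this message.  Per the description, compiling such calls
for a 'Zicbop' target (e.g. with "-march=rv64i_zicbop -mabi=lp64") could hit
the ICE before PATCH 2/2.

```c
/* Hypothetical reproducer: every call passes a constant address, so the
   old "prefetch" pattern could receive a constant where it expected a
   register operand.  */
void prefetch_constant_addresses (void)
{
  __builtin_prefetch (0);                 /* NULL, read (default)   */
  __builtin_prefetch ((void *) 0x20, 0);  /* fixed address, read    */
  __builtin_prefetch ((void *) 0x20, 1);  /* fixed address, write   */
}
```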

This requires separate instructions for "prefetch.[rw]" and I decided to make
those prefetch instructions very similar to "prefetch.i".  That's one of the
reasons I created builtins corresponding to those.


Sincerely,
Tsukasa




Tsukasa OI (2):
  RISC-V: Define not broken prefetch builtins
  RISC-V: Fix ICE by expansion and register coercion

 gcc/config/riscv/riscv-builtins.cc| 112 +-
 gcc/config/riscv/riscv-cmo.def|   8 +-
 gcc/config/riscv/riscv-ftypes.def |   1 +
 gcc/config/riscv/riscv.md |  67 ---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c |  41 ---
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c |  33 ++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-3.c |  29 +
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-4.c |  14 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-5.c |  14 +++
 gcc/testsuite/gcc.target/riscv/cmo-zicbop-6.c |  38 ++
 .../gcc.target/riscv/cmo-zicbop-by-common-1.c |  17 +++
 .../gcc.target/riscv/cmo-zicbop-by-common-2.c |   7 ++
 .../gcc.target/riscv/cmo-zicbop-by-common-3.c |  13 ++
 .../riscv/cmo-zicbop-by-common-ice-1.c|  13 ++
 .../riscv/cmo-zicbop-by-common-ice-2.c|   7 ++
 15 files changed, 350 insertions(+), 64 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-5.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-6.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c


base-commit: 40ac613627205dd4d24ae136917e48b357fee758
-- 
2.42.0