Re: [PATCH v2] RISC-V: Add autovec FP unary operations.

2023-06-15 Thread Michael Collison

Hi Robin,

Looks good to me, except to note that this seems to depend on a new
function, emit_vlmax_fp_insn, which appears to be part of your autovec FP
binary operation patch. From what I can see, that patch would need to be
merged first.


On 6/15/23 11:12, Robin Dapp via Gcc-patches wrote:

Hi,

changes from V1:
   - Use VF_AUTO iterator.
   - Don't mention vfsqrt7.

This patch adds floating-point autovec expanders for vfneg, vfabs as well as
vfsqrt and the accompanying tests.

Similarly to the binop tests, there are flavors for zvfh now.
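
To make the scope concrete, the loops below are the kind of source the new expanders are meant to cover. This is only a sketch under assumptions, not the template from the patch: the function names are hypothetical, only `float` is shown, and vfsqrt is left out so the example does not need libm.

```c
/* Hypothetical stand-ins for the unop test loops; the names neg_f32 and
   abs_f32 are made up for this sketch and do not come from the patch.  */

/* Candidate for vfneg.v when autovectorized.  */
void
neg_f32 (float *dst, const float *a, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = -a[i];
}

/* Candidate for vfabs.v when autovectorized.  */
void
abs_f32 (float *dst, const float *a, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] = __builtin_fabsf (a[i]);
}
```

Compiled with options along the lines of `-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d`, each loop body would be expected to map to one vector instruction; that expectation follows from the patch description and has not been verified here.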

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><mode>2): Add unop expanders.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/unop/abs-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c: New test.
---
  gcc/config/riscv/autovec.md   | 36 ++-
  .../riscv/rvv/autovec/unop/abs-run.c  |  6 ++--
  .../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  3 +-
  .../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  3 +-
  .../riscv/rvv/autovec/unop/abs-template.h | 14 +++-
  .../riscv/rvv/autovec/unop/abs-zvfh-run.c | 35 ++
  .../riscv/rvv/autovec/unop/vfsqrt-run.c   | 29 +++
  .../riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c   | 10 ++
  .../riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c   | 10 ++
  .../riscv/rvv/autovec/unop/vfsqrt-template.h  | 31 
  .../riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c  | 32 +
  .../riscv/rvv/autovec/unop/vneg-run.c |  6 ++--
  .../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  3 +-
  .../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  3 +-
  .../riscv/rvv/autovec/unop/vneg-template.h|  5 ++-
  .../riscv/rvv/autovec/unop/vneg-zvfh-run.c| 26 ++
  16 files changed, 241 insertions(+), 11 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-zvfh-run.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-run.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv32gcv.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-rv64gcv.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-template.h
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vfsqrt-zvfh-run.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-zvfh-run.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 94452c932a4..5b84eaaf052 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -513,7 +513,7 @@ (define_expand "<optab><mode>2"
  })
  
  ;; ---
-;; - ABS expansion to vmslt and vneg
+;; - [INT] ABS expansion to vmslt and vneg.
  ;; ---
  
  (define_expand "abs<mode>2"
@@ -532,6 +532,40 @@ (define_expand "abs<mode>2"
DONE;
  })
  
+;; ---
+;;  [FP] Unary operations
+;; ---
+;; Includes:
+;; - vfneg.v/vfabs.v
+;; ---
+(define_expand "<optab><mode>2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop_nofrm:VF_AUTO
+ (match_operand:VF_AUTO 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
+;; ---
+;; - [FP] Square root
+;; ---
+;; Includes:
+;; - vfsqrt.v
+;; ---
+(define_expand "<optab><mode>2"
+  [(set (match_operand:VF_AUTO 0 "register_operand")
+(any_float_unop:VF_AUTO
+ (match_operand:VF_AUTO 

Re: [PATCH v2] RISC-V: Add autovec FP binary operations.

2023-06-15 Thread Michael Collison

Robin,

Why do we need '-ffast-math' with the tests?

On 6/15/23 11:10, Robin Dapp via Gcc-patches wrote:

Hi,

changes from V1:
  - Add VF_AUTO iterator and use it.
  - Ensured we don't ICE with -march=rv64gcv_zfhmin.

this implements the floating-point autovec expanders for binary
operations: vfadd, vfsub, vfdiv, vfmul, vfmax, vfmin and adds
tests.

The existing tests are split up into non-_Float16 and _Float16
flavors as we cannot rely on the zvfh extension being present.

As long as we do not have full middle-end support we need
-ffast-math for the tests.
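
As a rough illustration of what the binop tests exercise, the macro below stamps out one loop per type and operator, in the same style as the existing integer templates. The macro and function names are assumptions made for this sketch; the real templates in the patch may differ, and `_Float16` is omitted because it depends on zvfh. The sketch itself does not depend on -ffast-math.

```c
/* Hypothetical FP variant of a binop test template.  DEF_LOOP and the
   generated names (vadd_float, ...) are illustrative only.  */
#define DEF_LOOP(NAME, TYPE, OP)                                        \
  void NAME##_##TYPE (TYPE *dst, const TYPE *a, const TYPE *b, int n)  \
  {                                                                     \
    for (int i = 0; i < n; i++)                                         \
      dst[i] = a[i] OP b[i];                                            \
  }

DEF_LOOP (vadd, float, +)   /* candidate for vfadd.vv */
DEF_LOOP (vsub, float, -)   /* candidate for vfsub.vv */
DEF_LOOP (vmul, double, *)  /* candidate for vfmul.vv */
DEF_LOOP (vdiv, double, /)  /* candidate for vfdiv.vv */
```

A run test in this style would call each generated function on small arrays and compare the results against scalar arithmetic.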

gcc/ChangeLog:

* config/riscv/autovec.md (<optab><mode>3): Implement binop
expander.
* config/riscv/riscv-protos.h (emit_vlmax_fp_insn): Declare.
(emit_vlmax_fp_minmax_insn): Declare.
(enum frm_field_enum): Rename this...
(enum rounding_mode): ...to this.
* config/riscv/riscv-v.cc (emit_vlmax_fp_insn): New function
(emit_vlmax_fp_minmax_insn): New function.
* config/riscv/riscv.cc (riscv_const_insns): Clarify const
vector handling.
(riscv_libgcc_floating_mode_supported_p): Adjust comment.
(riscv_excess_precision): Do not convert to float for ZVFH.
* config/riscv/vector-iterators.md: Add VF_AUTO iterator.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmax-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmin-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vmul-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-run.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-rv64gcv.c: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vsub-template.h: Add FP.
* gcc.target/riscv/rvv/autovec/binop/vadd-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vdiv-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmax-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmin-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vmul-zvfh-run.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vsub-zvfh-run.c: New test.
---
  gcc/config/riscv/autovec.md   | 36 +
  gcc/config/riscv/riscv-protos.h   |  5 +-
  gcc/config/riscv/riscv-v.cc   | 74 ++-
  gcc/config/riscv/riscv.cc | 27 +--
  gcc/config/riscv/vector-iterators.md  | 28 +++
  .../riscv/rvv/autovec/binop/vadd-run.c| 12 ++-
  .../riscv/rvv/autovec/binop/vadd-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vadd-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vadd-template.h   | 11 ++-
  .../riscv/rvv/autovec/binop/vadd-zvfh-run.c   | 54 ++
  .../riscv/rvv/autovec/binop/vdiv-run.c|  8 +-
  .../riscv/rvv/autovec/binop/vdiv-rv32gcv.c|  7 +-
  .../riscv/rvv/autovec/binop/vdiv-rv64gcv.c|  7 +-
  .../riscv/rvv/autovec/binop/vdiv-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vdiv-zvfh-run.c   | 37 ++
  .../riscv/rvv/autovec/binop/vmax-run.c|  9 ++-
  .../riscv/rvv/autovec/binop/vmax-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmax-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmax-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vmax-zvfh-run.c   | 38 ++
  .../riscv/rvv/autovec/binop/vmin-run.c| 10 ++-
  .../riscv/rvv/autovec/binop/vmin-rv32gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmin-rv64gcv.c|  3 +-
  .../riscv/rvv/autovec/binop/vmin-template.h   |  8 +-
  .../riscv/rvv/autovec/binop/vmin-zvfh-run.c   | 37 ++
  .../riscv/rvv/autovec/binop/vmul-run.c|  8 +-
  

Re: [PATCH v6 0/9] RISC-V: autovec: Add autovec support

2023-05-05 Thread Michael Collison
Because everyone was commenting that we needed vector load/store support 
(including Juzhe). Juzhe specifically pointed me to his patch for the 
load/store patterns in his review of my code. Would you like me to 
remove the patterns?


On 5/5/23 12:34, Kito Cheng wrote:

Errr, why did you just mix JuZhe’s patch set into this patch set?

On Fri, May 5, 2023 at 23:47, Michael Collison wrote:

This series of patches adds foundational support for RISC-V
auto-vectorization. These patches are based on the current
upstream rvv vector intrinsic support and are not a new
implementation. Most of the implementation consists of adding the
new vector cost model, the autovectorization patterns themselves
and target hooks. This implementation only provides support for
integer addition and subtraction as a proof of concept. This patch
set should not be construed to be feature complete. Based on
conversations with the community these patches are intended to lay
the groundwork for feature completion and collaboration within the
RISC-V community.

These patches are largely based off the work of Juzhe Zhong
(juzhe.zh...@rivai.ai) of RiVAI. More
specifically, the rvv-next branch at
https://github.com/riscv-collab/riscv-gcc.git is the foundation
of this patch set.

As discussed on this list, if these patches are approved they will
be merged into an "auto-vectorization" branch once gcc-13 branches
for release. There are two known issues related to crashes (assert
failures) associated with tree vectorization; one of which I have
sent a patch for and have received feedback.

Changes in v6:
- Incorporated upstream comments, added target hook for
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
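
The decision this hook makes, as it appears in patch 4/9 of this series, can be sketched in plain C. The function and parameter names here are hypothetical stand-ins: `target_vector` and `strict_alignment` model TARGET_VECTOR and STRICT_ALIGNMENT, `has_movmisalign` models the optab_handler lookup for the movmisalign pattern, and `default_answer` models the result of the default hook.

```c
#include <stdbool.h>

/* Sketch of the misalignment decision; not the GCC implementation.
   A MISALIGNMENT of -1 means the misalignment is unknown at compile time.  */
bool
support_vector_misalignment_sketch (bool target_vector,
                                    bool strict_alignment,
                                    bool has_movmisalign,
                                    int misalignment,
                                    bool default_answer)
{
  /* Without the vector extension, defer to the default hook.  */
  if (!target_vector)
    return default_answer;

  if (strict_alignment)
    {
      /* No movmisalign pattern for this mode.  */
      if (!has_movmisalign)
        return false;

      /* Misalignment is unknown at compile time.  */
      if (misalignment == -1)
        return false;
    }

  return true;
}
```

Read this way, misaligned vector accesses are always accepted when strict alignment is off, which matches the `-mno-strict-align` option used by the tests in patch 9/9.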

Changes in v5:

- Incorporated upstream comments, largely to delete unnecessary code

Changes in v4:

- Added support for binary integer operations and test cases
- Fixed bug to support 8-bit integer vectorization
- Fixed several assert errors related to non-multiple of two
vector modes

Changes in v3:

- Removed the cost model and cost hooks based on feedback from
Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions
- Fixed ChangeLog email formatting
- Fixed gnu formatting issues in the code

Kevin Lee (1):
  RISC-V:autovec: This patch supports 8 bit auto-vectorization in
    riscv.

Michael Collison (8):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V:autovec: Add auto-vectorization support functions
  RISC-V:autovec: Add target vectorization hooks
  RISC-V:autovec: Add autovectorization patterns for binary integer &
    len_load/store
  RISC-V:autovec: Add autovectorization tests for add & sub
  vect: Verify that GET_MODE_NUNITS is a multiple of 2.
  RISC-V:autovec: Add autovectorization tests for binary integer

 gcc/config/riscv/riscv-opts.h                 |  10 ++
 gcc/config/riscv/riscv-protos.h               |   9 ++
 gcc/config/riscv/riscv-v.cc                   |  91 
 gcc/config/riscv/riscv-vector-builtins.cc     |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h      |   3 +
 gcc/config/riscv/riscv.cc                     | 130 ++
 gcc/config/riscv/riscv.md                     |   1 +
 gcc/config/riscv/vector-auto.md               |  74 ++
 gcc/config/riscv/vector.md                    |   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 
 .../riscv/rvv/autovec/loop-and-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 
 .../riscv/rvv/autovec/loop-div-rv32.c         |  27 
 .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 
 .../riscv/rvv/autovec/loop-max-rv32.c         |  26 
 .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 
 .../riscv/rvv/autovec/loop-min-rv32.c         |  26 
 .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 
 .../riscv/rvv/autovec/loop-mod-rv32.c         |  27 
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 
 .../riscv/rvv/autovec/loop-mul-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 
 .../riscv/rvv/autovec/loop-or-rv32.c          |  25 
 .../gcc.target/riscv/rvv/autovec/loop-or.c    |  25 
 .../riscv/rvv/autovec/loop-sub-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 
 .../riscv/rvv/autovec/loop-xor-rv32.c         |  25 
 .../gcc.target/riscv/rvv/autovec/loop-xor.c  

[PATCH v6 6/9] RISC-V:autovec: Add autovectorization tests for add & sub

2023-05-05 Thread Michael Collison
2023-03-02  Michael Collison  
Vineet Gupta 

* gcc.target/riscv/rvv/autovec: New directory
for autovectorization tests.
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
test to verify code generation of vector add on rv32.
* gcc.target/riscv/rvv/autovec/loop-add.c: New
test to verify code generation of vector add on rv64.
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
test to verify code generation of vector subtract on rv32.
* gcc.target/riscv/rvv/autovec/loop-sub.c: New
test to verify code generation of vector subtract on rv64.
---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] - b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)

[PATCH v6 9/9] RISC-V:autovec: This patch supports 8 bit auto-vectorization in riscv.

2023-05-05 Thread Michael Collison
From: Kevin Lee 

2023-04-14 Kevin Lee 
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: Support 8bit
type
* gcc.target/riscv/rvv/autovec/loop-add.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor.c: Ditto
---
 .../gcc.target/riscv/rvv/autovec/loop-add-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-and-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-div-rv32.c   | 10 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c  | 10 ++
 .../gcc.target/riscv/rvv/autovec/loop-max-rv32.c   |  9 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c  |  9 +
 .../gcc.target/riscv/rvv/autovec/loop-min-rv32.c   |  9 +
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c  |  9 +
 .../gcc.target/riscv/rvv/autovec/loop-mod-rv32.c   | 10 ++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c  | 10 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-or-rv32.c|  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c   |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-sub-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c  |  7 ---
 .../gcc.target/riscv/rvv/autovec/loop-xor-rv32.c   |  7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c  |  7 ---
 20 files changed, 92 insertions(+), 68 deletions(-)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
index bdc3b6892e9..d2765e67d0d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=fixed-vlmax -mno-strict-align" } */
 
 #include <stdint.h>
 
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
index d7f992c7d27..c43f6d3e8cb 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d --param=riscv-autovec-preference=fixed-vlmax -mno-strict-align" } */
 
 #include <stdint.h>
 
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
index eb1ac5b44fd..703f4843c2b 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
+++ 

[PATCH v6 7/9] RISC-V: autovec: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-05-05 Thread Michael Collison
While working on autovectorizing for the RISCV port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
where GET_MODE_NUNITS is equal to one.
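
The difference between the two helpers involved can be sketched with plain integers. This is a simplified model under stated assumptions: GCC's exact_div and multiple_p operate on poly_int values, while the `_sketch` functions below are hypothetical scalar stand-ins that only cover the constant case.

```c
#include <assert.h>
#include <stdbool.h>

/* Like exact_div: the caller promises that A is a multiple of B, and a
   violated promise trips an assertion instead of being reported.  */
long
exact_div_sketch (long a, long b)
{
  assert (b != 0 && a % b == 0);
  return a / b;
}

/* Like the three-argument multiple_p: report whether A is a multiple of
   B and, if so, store the quotient in *QUOTIENT.  */
bool
multiple_p_sketch (long a, long b, long *quotient)
{
  if (b == 0 || a % b != 0)
    return false;
  *quotient = a / b;
  return true;
}

/* Model of the guarded path: return half the element count, or -1 when
   the mode cannot be split in two (for example, a single element).  */
long
half_nelts_or_fail (long nunits)
{
  long half;
  if (!multiple_p_sketch (nunits, 2, &half))
    return -1;
  return half;
}
```

With a one-element mode, `half_nelts_or_fail` reports failure, whereas calling `exact_div_sketch (1, 2)` would abort; the patch below moves the check into the condition for the same reason.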

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
---
 gcc/tree-vect-slp.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b299e209b5b..3b7a21724ec 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
(GET_MODE_BITSIZE (int_mode), 1);
  tree vector_type
= get_vectype_for_scalar_type (vinfo, int_type, count);
+ poly_int64 half_nelts;
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
+2, &half_nelts))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
@@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
  poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
  vec_perm_builder sel1 (nelts, 2, 3);
  vec_perm_builder sel2 (nelts, 2, 3);
- poly_int64 half_nelts = exact_div (nelts, 2);
+
  for (unsigned int i = 0; i < 3; ++i)
{
  sel1.quick_push (i);
-- 
2.34.1



[PATCH v6 8/9] RISC-V:autovec: Add autovectorization tests for binary integer

2023-05-05 Thread Michael Collison
2023-04-05  Michael Collison  

* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: New
test to verify code generation of vector "and" on rv32.
* gcc.target/riscv/rvv/autovec/loop-and.c: New
test to verify code generation of vector "and" on rv64.
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: New
test to verify code generation of vector divide on rv32.
* gcc.target/riscv/rvv/autovec/loop-div.c: New
test to verify code generation of vector divide on rv64.
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: New
test to verify code generation of vector maximum on rv32.
* gcc.target/riscv/rvv/autovec/loop-max.c: New
test to verify code generation of vector maximum on rv64.
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: New
test to verify code generation of vector minimum on rv32.
* gcc.target/riscv/rvv/autovec/loop-min.c: New
test to verify code generation of vector minimum on rv64.
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: New
test to verify code generation of vector modulus on rv32.
* gcc.target/riscv/rvv/autovec/loop-mod.c: New
test to verify code generation of vector modulus on rv64.
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: New
test to verify code generation of vector multiply on rv32.
* gcc.target/riscv/rvv/autovec/loop-mul.c: New
test to verify code generation of vector multiply on rv64.
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: New
test to verify code generation of vector "or" on rv32.
* gcc.target/riscv/rvv/autovec/loop-or.c: New
test to verify code generation of vector "or" on rv64.
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: New
test to verify code generation of vector xor on rv32.
* gcc.target/riscv/rvv/autovec/loop-xor.c: New
test to verify code generation of vector xor on rv64.
---
 .../riscv/rvv/autovec/loop-and-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-and.c   | 24 ++
 .../riscv/rvv/autovec/loop-div-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-div.c   | 25 +++
 .../riscv/rvv/autovec/loop-max-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-max.c   | 25 +++
 .../riscv/rvv/autovec/loop-min-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-min.c   | 25 +++
 .../riscv/rvv/autovec/loop-mod-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   | 25 +++
 .../riscv/rvv/autovec/loop-mul-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   | 24 ++
 .../riscv/rvv/autovec/loop-or-rv32.c  | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-or.c| 24 ++
 .../riscv/rvv/autovec/loop-xor-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   | 24 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  4 +++
 17 files changed, 396 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
new file mode 100644
index 000..eb1ac5b44fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)\
+  vo

[PATCH v6 4/9] RISC-V:autovec: Add target vectorization hooks

2023-05-05 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(riscv_support_vector_misalignment): Implement
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT): Ditto.
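
The arithmetic behind riscv_estimated_poly_value, as written in the diff that follows, can be sketched with plain integers. This is a simplified model, not the GCC code: `c0` and `c1` stand in for VAL.coeffs[0] and VAL.coeffs[1], the enum and function names are hypothetical, a `width_bits` of 0 models the scalable case, and the bitmask handling of the vector width is left out.

```c
#include <assert.h>

/* Hypothetical stand-in for poly_value_estimate_kind.  */
enum estimate_kind { EST_MIN, EST_MAX, EST_LIKELY };

/* Estimate c0 + c1 * x for the requested KIND.  */
long
estimated_poly_value_sketch (long c0, long c1, enum estimate_kind kind,
                             unsigned int width_bits)
{
  /* No core-specific width: min and likely use the constant part only,
     max uses the multiplier of 15 found in the patch.  */
  if (width_bits == 0)
    return kind == EST_MAX ? c0 + c1 * 15 : c0;

  /* Known width: scale the variable part by the number of 128-bit
     chunks beyond the first.  */
  assert (width_bits >= 128);
  long over_128 = (long) width_bits - 128;
  return c0 + c1 * over_128 / 128;
}
```

For example, with `c0 = 4` and `c1 = 4`, a known width of 256 bits gives 8, while the scalable case gives 4 for the minimum and likely estimates and 64 for the maximum.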
---
 gcc/config/riscv/riscv.cc | 130 ++
 1 file changed, 130 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 1e328f6a801..1425f50d80a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -7138,6 +7147,112 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known RVV width all three estimates are the same.
+   For generic RVV tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with RVV when it is a win
+   even for 128-bit RVV.
+   When RVV width information is available VAL.coeffs[1] is multiplied by
+   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+   poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+: (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+ values are based on 128-bit vectors and the maximum is based on
+ the architectural maximum of 65536 bits.  */
+  if (width_source == RVV_SCALABLE)
+switch (kind)
+  {
+  case POLY_VALUE_MIN:
+  case POLY_VALUE_LIKELY:
+   return val.coeffs[0];
+
+  case POLY_VALUE_MAX:
+   return val.coeffs[0] + val.coeffs[1] * 15;
+  }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely.  This could be made more general if future -mtune
+ options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+width_source = 1 << floor_log2 (width_source);
+  else
+width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_128 = width_source - 128;
+  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+  if (TARGET_VECTOR)
+return riscv_vector::riscv_vector_preferred_simd_mode (mode);
+
+  return word_mode;
+}
+
+bool
+riscv_support_vector_misalignment (machine_mode mode,
+  const_tree type ATTRIBUTE_UNUSED,
+  int misalignment,
+  bool is_packed ATTRIBUTE_UNUSED)
+{
+  if (TARGET_VECTOR)
+{
+  if (STRICT_ALIGNMENT)
+   {
+ /* Return if movmisalign pattern is not supported for this mode.  */
+ if (optab_handler (movmisalign_optab, mode) == CODE_FOR_nothing)
+   return false;
+
+ /* Misalignment factor is unknown at compile time.  */
+ if (misalignment == -1)
+   return false;
+   }
+  return true;
+}
+
+  return default_builtin_support_vector_misalignment (mode, type, misalignment,
+ is_packed);
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE.  */
+
+static opt_machine_mode
+riscv_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mode = VOI
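
The arithmetic described for riscv_estimated_poly_value can be tried outside of GCC with a small model. This is only a sketch under stated assumptions, not the patch's code: Poly2, EstimateKind and estimate are hypothetical names, a width of 0 stands in for the RVV_SCALABLE case, and the bitmask handling of the width is left out. The factor 15 is taken as-is from the patch as posted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a two-coefficient poly_int64:
// the runtime value is c0 + c1 * x for some non-negative integer x.
struct Poly2 {
  std::int64_t c0;
  std::int64_t c1;
};

// Hypothetical mirror of the three estimate kinds named in the comment.
enum class EstimateKind { Min, Max, Likely };

// width_bits == 0 models "no core-specific width known" (an assumption of
// this sketch); any other value is taken as the known vector width in bits.
std::int64_t estimate(Poly2 val, EstimateKind kind, unsigned width_bits) {
  assert(width_bits == 0 || width_bits >= 128);
  if (width_bits == 0) {
    // Min and likely assume 128-bit vectors, so only c0 contributes.
    if (kind == EstimateKind::Max)
      return val.c0 + val.c1 * 15;
    return val.c0;
  }
  // Known width: scale c1 by the number of 128-bit chunks beyond the first.
  std::int64_t over_128 = static_cast<std::int64_t>(width_bits) - 128;
  return val.c0 + val.c1 * over_128 / 128;
}
```

With a value of 16 + 16x, the likely estimate without width information is 16, while a known 512-bit width gives 16 + 16 * 384 / 128 = 64.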

[PATCH v6 3/9] RISC-V:autovec: Add auto-vectorization support functions

2023-05-05 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc
(riscv_vector_preferred_simd_mode): New function.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
---
 gcc/config/riscv/riscv-v.cc | 91 +
 1 file changed, 91 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 99c414cc910..7faffb55046 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 
 using namespace riscv_vector;
@@ -176,6 +178,56 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }
 
+/* SCALABLE means that the vector-length is agnostic (run-time invariant and
+   compile-time unknown). FIXED means that the vector-length is specific
+   (compile-time known). Both RVV_SCALABLE and RVV_FIXED_VLMAX are doing
+   auto-vectorization using VLMAX vsetvl configuration.  */
+static bool
+autovec_use_vlmax_p (void)
+{
+  return riscv_autovec_preference == RVV_SCALABLE
+|| riscv_autovec_preference == RVV_FIXED_VLMAX;
+}
+
+/* Return the vectorization machine mode for RVV according to LMUL.  */
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode)
+{
+  /* We only enable auto-vectorization when TARGET_MIN_VLEN >= 128 &&
+ riscv_autovec_lmul < RVV_M2, because the GCC loop vectorizer reports an
+ ICE in 'can_duplicate_and_interleave_p' of tree-vect-slp.cc when we enable
+ -march=rv64gc_zve32* and -march=rv32gc_zve64*.  Since we have
+ VNx1SImode in -march=*zve32* and VNx1DImode in -march=*zve64*, they are
+ enabled in targetm.vector_mode_supported_p and the SLP vectorizer will try
+ to use them.  Currently, we can support auto-vectorization in
+ -march=rv32_zve32x_zvl128b, whereas -march=rv32_zve32x_zvl32b or
+ -march=rv32_zve32x_zvl64b are disabled.
+ */
+  if (autovec_use_vlmax_p ())
+{
+  /* If TARGET_MIN_VLEN < 128, we don't allow LMUL < 2
+auto-vectorization since Loop Vectorizer may use VNx1SImode or
+VNx1DImode to vectorize which will create ICE in the
+'can_duplicate_and_interleave_p' of tree-vect-slp.cc.  */
+  if (TARGET_MIN_VLEN < 128 && riscv_autovec_lmul < RVV_M2)
+   return word_mode;
+  /* We use LMUL = 1 as base bytesize which is BYTES_PER_RISCV_VECTOR and
+riscv_autovec_lmul as the multiplication factor to calculate the NUNITS to
+get the auto-vectorization mode.  */
+  poly_uint64 nunits;
+  poly_uint64 vector_size
+   = BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul);
+  poly_uint64 scalar_size = GET_MODE_SIZE (mode);
+  gcc_assert (multiple_p (vector_size, scalar_size, &nunits));
+  machine_mode rvv_mode;
+  if (get_vector_mode (mode, nunits).exists (&rvv_mode))
+   return rvv_mode;
+}
+  /* TODO: We will support minimum length VLS auto-vectorization in the future.
+   */
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -430,6 +482,45 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
+/* Return the mask policy for no predication.  */
+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return the tail policy for no predication.  */
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV mask mode.  */
+bool
+riscv_vector_mask_mode_p (machine_mode mode)
+{
+  return (mode == VNx1BImode || mode == VNx2BImode || mode == VNx4BImode
+ || mode == VNx8BImode || mode == VNx16BImode || mode == VNx32BImode
+ || mode == VNx64BImode);
+}
+
+/* Return the appropriate mask mode for MODE.  */
+
+opt_machine_mode
+riscv_vector_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mode;
+  int nf = 1;
+
+  FOR_EACH_MODE_IN_CLASS (mask_mode, MODE_VECTOR_BOOL)
+  if (GET_MODE_INNER (mask_mode) == BImode
+  && known_eq (GET_MODE_NUNITS (mask_mode) * nf, GET_MODE_NUNITS (mode))
+  && riscv_vector_mask_mode_p (mask_mode))
+return mask_mode;
+  return default_get_mask_mode (mode);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
This function is not only used by builtins, but also will be used by
auto-vectorization in the future.  */
-- 
2.34.1
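
As a rough illustration of how riscv_vector_preferred_simd_mode derives NUNITS (LMUL = 1 byte size times riscv_autovec_lmul, divided by the scalar size), here is a minimal sketch outside of GCC. PolyU64, multiple_of and preferred_nunits are hypothetical names for this example only, and unlike the patch, which asserts on the division, the sketch simply reports failure.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for poly_uint64 with two coefficients:
// the runtime value is c0 + c1 * x for some non-negative integer x.
struct PolyU64 {
  std::uint64_t c0;
  std::uint64_t c1;
};

// Simplified analogue of the three-argument multiple_p for a constant
// divisor: succeeds only if A is a multiple of B for every x, and stores
// the quotient on success.
bool multiple_of(PolyU64 a, std::uint64_t b, PolyU64 *quotient) {
  assert(b != 0 && quotient != nullptr);
  if (a.c0 % b != 0 || a.c1 % b != 0)
    return false;
  *quotient = PolyU64{a.c0 / b, a.c1 / b};
  return true;
}

// NUNITS = (bytes per LMUL=1 vector * LMUL) / bytes per scalar element.
bool preferred_nunits(PolyU64 bytes_per_vector, unsigned lmul,
                      std::uint64_t scalar_bytes, PolyU64 *nunits) {
  PolyU64 vector_size{bytes_per_vector.c0 * lmul, bytes_per_vector.c1 * lmul};
  return multiple_of(vector_size, scalar_bytes, nunits);
}
```

For a 16 + 16x byte vector at LMUL = 1 and a 4-byte scalar this yields 4 + 4x units; an 8 + 8x byte vector cannot hold a whole 16-byte element, so the division fails.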



[PATCH v6 2/9] RISC-V: autovec: Export policy functions to global scope

2023-05-05 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 434bd8e157b..f0ebc095fa7 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2496,7 +2496,7 @@ use_real_merge_p (enum predication_type_index pred)
 
 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2506,7 +2506,7 @@ get_tail_policy_for_pred (enum predication_type_index pred)
 
 /* Get MASK policy for predication. If predication indicates MU, return the MU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index 8ffb9d33e33..de3fd6ca290 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -483,6 +483,9 @@ extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
 extern function_instance get_read_vl_instance (void);
 extern tree get_read_vl_decl (void);
 
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);
+
 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
 {
-- 
2.34.1
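
The conditions visible in the two exported functions can be summarised in a small standalone sketch. It is illustrative only: the Pred enum is a hypothetical subset named after the PRED_TYPE_* values shown in the diff, and the helpers return booleans rather than the rtx policy operands that the real functions produce.

```cpp
#include <cassert>

// Hypothetical subset of predication kinds, named after the PRED_TYPE_*
// values that appear in the patch.
enum class Pred { none, tu, tum, tumu, mu };

// True when the predication kind asks for tail-undisturbed (TU) behaviour;
// otherwise the preferred default tail policy would be used.
bool wants_tail_undisturbed(Pred pred) {
  return pred == Pred::tu || pred == Pred::tum || pred == Pred::tumu;
}

// True when the predication kind asks for mask-undisturbed (MU) behaviour;
// otherwise the preferred default mask policy would be used.
bool wants_mask_undisturbed(Pred pred) {
  return pred == Pred::tumu || pred == Pred::mu;
}

// The "no predication" case used by auto-vectorization requests neither TU
// nor MU, so both policies fall back to the defaults.
void check_no_pred_uses_defaults() {
  assert(!wants_tail_undisturbed(Pred::none));
  assert(!wants_mask_undisturbed(Pred::none));
}
```

This is why passing PRED_TYPE_none to both functions, as the *_no_pred helpers in the series do, ends up selecting the default policies.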



[PATCH v6 5/9] RISC-V:autovec: Add autovectorization patterns for binary integer & len_load/store

2023-05-05 Thread Michael Collison
2023-04-25  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md (riscv_vector_preferred_simd_mode): Include
vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
---
 gcc/config/riscv/riscv.md   |  1 +
 gcc/config/riscv/vector-auto.md | 74 +
 gcc/config/riscv/vector.md  |  4 +-
 3 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index c508ee3ad89..e9b49eda617 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -140,6 +140,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")
 
 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 00000000000..83d2ab6957a
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,74 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; len_load/len_store is a sub-optimal pattern for RVV auto-vectorization support.
+;; We will replace them when len_maskload/len_maskstore is supported in loop vectorizer.
+(define_expand "len_load_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "memory_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+(define_expand "len_store_"
+  [(match_operand:V 0 "memory_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+;; -
+;;  [INT] Vector binary patterns
+;; -
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+   (any_int_binop:VI (match_operand:VI 1 "")
+ (match_operand:VI 2 "")))]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (mode);
+  rtx vl = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (mode, vl);
+  rtx mask_policy = get_mask_policy_no_pred ();
+  rtx tail_policy = get_tail_policy_no_pred ();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx (NONVLMAX);
+
+  emit_insn (gen_pred_ (operands[0], mask, merge, operands[1], 
operands[2],
+vl, tail_policy, mask_policy, 
vlmax_avl_p));
+
+  DONE;
+})
+
+
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 1642822d098..5c9252c281b 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
 ;; - Auto-vectorization (TBD)
 ;; - Combine optimization (TBD)
 
-(include "vector-iterators.md")
-
 (define_constants [
    (INVALID_ATTRIBUTE            255)
(X0_REGNUM  0)
@@ -368,6 +366,8 @@
   (symbol_ref "INTVAL (operands[4])")]
(const_int INVALID_ATTRIBUTE)))
 
+(include "vector-auto.md")
+
 ;; -
 ;;  Miscellaneous Operations
 ;; -
-- 
2.34.1



[PATCH v6 1/9] RISC-V: autovec: Add new predicates and function prototypes

2023-05-05 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h
(riscv_vector_preferred_simd_mode): New.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/riscv-opts.h (riscv_vector_lmul_enum): New enum.
---
 gcc/config/riscv/riscv-opts.h   | 10 ++
 gcc/config/riscv/riscv-protos.h |  9 +
 2 files changed, 19 insertions(+)

diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 4207db240ea..00c4ab222ae 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,7 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+
 /* RISC-V auto-vectorization preference.  */
 enum riscv_autovec_preference_enum {
   NO_AUTOVEC,
@@ -82,6 +83,15 @@ enum riscv_autovec_lmul_enum {
   RVV_M8 = 8
 };
 
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 33eb574aadc..fb39b856735 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -243,4 +243,13 @@ th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
 #endif
 
 extern bool riscv_use_divmod_expander (void);
+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
-- 
2.34.1



[PATCH v6 0/9] RISC-V: autovec: Add autovec support

2023-05-05 Thread Michael Collison
This series of patches adds foundational support for RISC-V auto-vectorization. 
These patches are based on the current upstream RVV vector intrinsic support 
and are not a new implementation. Most of the implementation consists of 
adding the new vector cost model, the autovectorization patterns themselves and 
target hooks. This implementation only provides support for integer addition 
and subtraction as a proof of concept. This patch set should not be construed 
to be feature complete. Based on conversations with the community, these patches 
are intended to lay the groundwork for feature completion and collaboration 
within the RISC-V community.

These patches are largely based on the work of Juzhe Zhong 
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch at 
https://github.com/riscv-collab/riscv-gcc.git is the foundation of this patch 
set. 

As discussed on this list, if these patches are approved they will be merged 
into an "auto-vectorization" branch once gcc-13 branches for release. There are 
two known issues related to crashes (assert failures) associated with tree 
vectorization; one of which I have sent a patch for and have received feedback. 

Changes in v6:
- Incorporated upstream comments, added target hook for 
TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT

Changes in v5:

- Incorporated upstream comments, largely to delete unnecessary code

Changes in v4:

- Added support for binary integer operations and test cases
- Fixed bug to support 8-bit integer vectorization
- Fixed several assert errors related to non-multiple of two vector modes

Changes in v3:

- Removed the cost model and cost hooks based on feedback from Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions 
- Fixed ChangeLog email formatting 
- Fixed gnu formatting issues in the code 

Kevin Lee (1):
  RISC-V:autovec: This patch supports 8 bit auto-vectorization in riscv.

Michael Collison (8):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V:autovec: Add auto-vectorization support functions
  RISC-V:autovec: Add target vectorization hooks
  RISC-V:autovec: Add autovectorization patterns for binary integer &
len_load/store
  RISC-V:autovec: Add autovectorization tests for add & sub
  vect: Verify that GET_MODE_NUNITS is a multiple of 2.
  RISC-V:autovec: Add autovectorization tests for binary integer

 gcc/config/riscv/riscv-opts.h |  10 ++
 gcc/config/riscv/riscv-protos.h   |   9 ++
 gcc/config/riscv/riscv-v.cc   |  91 
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/riscv.cc | 130 ++
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/vector-auto.md   |  74 ++
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 
 .../riscv/rvv/autovec/loop-and-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 
 .../riscv/rvv/autovec/loop-div-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 
 .../riscv/rvv/autovec/loop-max-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 
 .../riscv/rvv/autovec/loop-min-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 
 .../riscv/rvv/autovec/loop-mod-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 
 .../riscv/rvv/autovec/loop-mul-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 
 .../riscv/rvv/autovec/loop-or-rv32.c  |  25 
 .../gcc.target/riscv/rvv/autovec/loop-or.c|  25 
 .../riscv/rvv/autovec/loop-sub-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 
 .../riscv/rvv/autovec/loop-xor-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   |  25 
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   4 +
 gcc/tree-vect-slp.cc  |   7 +-
 31 files changed, 843 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 cr

Re: [PATCH v5 03/10] RISC-V:autovec: Add auto-vectorization support functions

2023-05-03 Thread Michael Collison

Hi Kito,

I see there have been many comments on the 
"riscv_vector_preferred_simd_mode" hook. Is there an updated version?


On 5/3/23 06:53, Kito Cheng wrote:

@@ -176,6 +178,46 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
return ratio;
  }

+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode)

JuZhe's patch[1] has been implemented and his version handles
types/modes in the right way IMO,
so I would like to take his version for this hook.

[1] 
https://patchwork.sourceware.org/project/gcc/patch/20230419164214.1032017-3-juzhe.zh...@rivai.ai/


Re: [PATCH v4 05/10] RISC-V: autovec: Add autovectorization patterns for binary integer operations

2023-04-26 Thread Michael Collison

Hi Robin and Juzhe,

Just took a look and I like the approach.

On 4/26/23 19:43, juzhe.zhong wrote:

Yeah, Robin's stuff is what I want and makes perfect sense to me.
---- Replied Message ----
From: Robin Dapp
Date: 04/27/2023 02:15
To: juzhe.zh...@rivai.ai, collison, gcc-patches
Cc: jeffreyalaw, Kito.cheng, kito.cheng, palmer, palmer
Subject: Re: [PATCH v4 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer operations


Hi Michael,

I have the diff below for the binops in my tree locally.
Maybe something like this works for you? Untested but compiles and
the expander helpers would need to be fortified obviously.

Regards
Robin

--

gcc/ChangeLog:

   * config/riscv/autovec.md (3): New binops expander.
   * config/riscv/riscv-protos.h (emit_nonvlmax_binop): Define.
   * config/riscv/riscv-v.cc (emit_pred_binop): New function.
   (emit_nonvlmax_binop): New function.
   * config/riscv/vector-iterators.md: New iterator.
---
gcc/config/riscv/autovec.md  | 12 
gcc/config/riscv/riscv-protos.h  |  1 +
gcc/config/riscv/riscv-v.cc  | 89 
gcc/config/riscv/vector-iterators.md | 20 +++
4 files changed, 97 insertions(+), 25 deletions(-)

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b5d46ff57ab..c21d241f426 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -47,3 +47,15 @@ (define_expand "len_store_"
                 operands[1], operands[2], mode);
  DONE;
})
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+    (any_int_binop:VI (match_operand:VI 1 "register_operand")
+              (match_operand:VI 2 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_binop (code_for_pred (, 
mode),

+                 operands[0], operands[1], operands[2],
+                 gen_reg_rtx (Pmode), mode);
+  DONE;
+})
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index f6ea6846736..5cca543c773 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -163,6 +163,7 @@ void emit_hard_vlmax_vsetvl (machine_mode, rtx);
void emit_vlmax_op (unsigned, rtx, rtx, machine_mode);
void emit_vlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
void emit_nonvlmax_op (unsigned, rtx, rtx, rtx, machine_mode);
+void emit_nonvlmax_binop (unsigned, rtx, rtx, rtx, rtx, machine_mode);
enum vlmul_type get_vlmul (machine_mode);
unsigned int get_ratio (machine_mode);
int get_ta (rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 5e69427ac54..98ebc052340 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -52,7 +52,7 @@ namespace riscv_vector {
template <int MAX_OPERANDS> class insn_expander
{
public:
-  insn_expander () : m_opno (0) {}
+  insn_expander () : m_opno (0), has_dest(false) {}
  void add_output_operand (rtx x, machine_mode mode)
  {
create_output_operand (&m_ops[m_opno++], x, mode);
@@ -83,6 +83,44 @@ public:
add_input_operand (gen_int_mode (type, Pmode), Pmode);
  }

+  void set_dest_and_mask (rtx mask, rtx dest, machine_mode mask_mode)
+  {
+    dest_mode = GET_MODE (dest);
+    has_dest = true;
+
+    add_output_operand (dest, dest_mode);
+
+    if (mask)
+  add_input_operand (mask, GET_MODE (mask));
+    else
+  add_all_one_mask_operand (mask_mode);
+
+    add_vundef_operand (dest_mode);
+  }
+
+  void set_len_and_policy (rtx len, bool vlmax_p)
+    {
+  gcc_assert (has_dest);
+  gcc_assert (len || vlmax_p);
+
+  if (len)
+    add_input_operand (len, Pmode);
+  else
+    {
+      rtx vlmax = gen_reg_rtx (Pmode);
+      emit_vlmax_vsetvl (dest_mode, vlmax);
+      add_input_operand (vlmax, Pmode);
+    }
+
+  if (GET_MODE_CLASS (dest_mode) != MODE_VECTOR_BOOL)
+    add_policy_operand (get_prefer_tail_policy (), get_prefer_mask_policy ());
+
+  if (vlmax_p)
+    add_avl_type_operand (avl_type::VLMAX);
+  else
+    add_avl_type_operand (avl_type::NONVLMAX);
+    }
+
  void expand (enum insn_code icode, bool temporary_volatile_p = false)
  {
if (temporary_volatile_p)
@@ -96,6 +134,8 @@ public:

private:
  int m_opno;
+  bool has_dest;
+  machine_mode dest_mode;
  expand_operand m_ops[MAX_OPERANDS];
};

@@ -183,37 +223,29 @@ emit_pred_op (unsigned icode, rtx mask, rtx 
dest, rtx src, rtx len,

     machine_mode mask_mode, bool vlmax_p)
{
  insn_expander<8> e;
-  machine_mode mode = GET_MODE (dest);
+  e.set_dest_and_mask (mask, dest, mask_mode);

-  e.add_output_operand (dest, mode);
-
-  if (mask)
-    e.add_input_operand (mask, GET_MODE (mask));
-  else
-    e.add_all_one_mask_operand (mask_mode);
+  

[PATCH v5 04/10] RISC-V:autovec: Add target vectorization hooks

2023-04-26 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_autovectorize_vector_modes): Implement
TARGET_AUTOVECTORIZE_VECTOR_MODES.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
---
 gcc/config/riscv/riscv.cc | 129 ++
 1 file changed, 129 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dc47434fac4..77209b161f6 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -275,6 +284,9 @@ poly_uint16 riscv_vector_chunks;
 /* The number of bytes in a vector chunk.  */
 unsigned riscv_bytes_per_vector_chunk;
 
+/* Prefer vf for auto-vectorizer.  */
+unsigned riscv_vectorization_factor;
+
 /* Index R is the smallest register class that contains register R.  */
 const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   GR_REGS, GR_REGS,GR_REGS,GR_REGS,
@@ -6363,6 +6375,9 @@ riscv_option_override (void)
 
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits ();
+
+  if (TARGET_VECTOR)
+riscv_vectorization_factor = RVV_LMUL1;
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -7057,6 +7072,105 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known RVV width all three estimates are the same.
+   For generic RVV tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with RVV when it is a win
+   even for 128-bit RVV.
+   When RVV width information is available VAL.coeffs[1] is multiplied by
+   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+   poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+: (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+ values are based on 128-bit vectors and the maximum is based on
+ the architectural maximum of 65536 bits.  */
+  if (width_source == RVV_SCALABLE)
+switch (kind)
+  {
+  case POLY_VALUE_MIN:
+  case POLY_VALUE_LIKELY:
+   return val.coeffs[0];
+
+  case POLY_VALUE_MAX:
+   return val.coeffs[0] + val.coeffs[1] * 15;
+  }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely.  This could be made more general if future -mtune
+ options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+width_source = 1 << floor_log2 (width_source);
+  else
+width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_128 = width_source - 128;
+  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+  if (TARGET_VECTOR)
+return riscv_vector::riscv_vector_preferred_simd_mode (mode);
+
+  return word_mode;
+}
+
+/* Implement TARGET_AUTOVECTORIZE_VECTOR_MODES for RVV.  */
+static unsigned int
+riscv_autovectorize_vector_modes (vector_modes *modes, bool)
+{
+  if (!TARGET_VECTOR)
+return 0;
+
+  if (riscv_vectorization_factor == RVV_LMUL1)
+{
+  modes->safe_push (VNx16QImode);
+  modes->safe_push (VNx8QImode);
+  modes->saf

[PATCH v5 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-26 Thread Michael Collison
While working on autovectorization for the RISC-V port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode)
where GET_MODE_NUNITS is equal to one.

Tested on RISCV and x86_64-linux-gnu. Okay?
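
To make the failure mode concrete, the following standalone sketch models the guard this patch adds: halve the element count only when it is known to be even, instead of dividing unconditionally. PolyNunits and half_if_even are hypothetical names for illustration, and the two-coefficient struct is only a simplified stand-in for GCC's poly_uint64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a poly NUNITS value: c0 + c1 * x elements.
struct PolyNunits {
  std::uint64_t c0;
  std::uint64_t c1;
};

// Store NUNITS / 2 in *half and return true only if NUNITS is a multiple of
// two for every x; otherwise leave *half untouched and return false so the
// caller can skip this vector mode instead of hitting an exact-division
// assertion.
bool half_if_even(PolyNunits nunits, PolyNunits *half) {
  assert(half != nullptr);
  if (nunits.c0 % 2 != 0 || nunits.c1 % 2 != 0)
    return false;
  *half = PolyNunits{nunits.c0 / 2, nunits.c1 / 2};
  return true;
}
```

A mode with 1 + 1x elements, which is how this sketch models VNx1DImode, is rejected by the guard, while a mode with 2 + 2x elements is accepted with a half count of 1 + 1x.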

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
---
 gcc/tree-vect-slp.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index d73deaecce0..a64fe454e19 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
(GET_MODE_BITSIZE (int_mode), 1);
  tree vector_type
= get_vectype_for_scalar_type (vinfo, int_type, count);
+ poly_int64 half_nelts;
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
+2, &half_nelts))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
@@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned int count,
  poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
  vec_perm_builder sel1 (nelts, 2, 3);
  vec_perm_builder sel2 (nelts, 2, 3);
- poly_int64 half_nelts = exact_div (nelts, 2);
+
  for (unsigned int i = 0; i < 3; ++i)
{
  sel1.quick_push (i);
-- 
2.34.1



[PATCH v5 02/10] RISC-V: autovec: Export policy functions to global scope

2023-04-26 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 01cea23d3e6..1ed9e4acc40 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2493,7 +2493,7 @@ use_real_merge_p (enum predication_type_index pred)
 
 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2503,7 +2503,7 @@ get_tail_policy_for_pred (enum predication_type_index pred)
 
 /* Get MASK policy for predication. If predication indicates MU, return the MU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h 
b/gcc/config/riscv/riscv-vector-builtins.h
index 8ffb9d33e33..de3fd6ca290 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -483,6 +483,9 @@ extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 
1];
 extern function_instance get_read_vl_instance (void);
 extern tree get_read_vl_decl (void);
 
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);
+
 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
 {
-- 
2.34.1



[PATCH v5 10/10] RISC-V: autovec: This patch supports 8 bit auto-vectorization in riscv.

2023-04-26 Thread Michael Collison
From: Kevin Lee 

2023-04-14 Kevin Lee 
gcc/testsuite/ChangeLog:

* config/riscv/riscv.cc (riscv_autovectorize_vector_modes): Add
new vector mode
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: Support 8bit
type
* gcc.target/riscv/rvv/autovec/loop-add.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor.c: Ditto
---
 gcc/config/riscv/riscv.cc | 1 +
 .../gcc.target/riscv/rvv/autovec/loop-add-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-and-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-div-rv32.c  | 8 +---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/loop-max-rv32.c  | 7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c | 7 ---
 .../gcc.target/riscv/rvv/autovec/loop-min-rv32.c  | 7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c | 7 ---
 .../gcc.target/riscv/rvv/autovec/loop-mod-rv32.c  | 8 +---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/loop-mul-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c  | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-sub-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-xor-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c | 5 +++--
 21 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 77209b161f6..f293414acd1 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7143,6 +7143,7 @@ riscv_autovectorize_vector_modes (vector_modes *modes, 
bool)
   modes->safe_push (VNx8QImode);
   modes->safe_push (VNx4QImode);
   modes->safe_push (VNx2QImode);
+  modes->safe_push (VNx1QImode);
 }
 
   return 0;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
index bdc3b6892e9..76f5a3a3ff5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
index d7f992c7d27..3d1e10bf4e1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
index eb1ac5b44fd..a4c7abfb0ad 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
+++ 
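For reference, a host-side sketch of what the TEST_TYPE/TEST_ALL macros in loop-add.c expand to once the two 8-bit types are listed. The `<stdint.h>` include and the aligned backslashes are assumptions added so the fragment compiles on its own; the real test is only compiled and its assembly scanned. The macro emits one `vadd_<type>` function per listed type, which is presumably why the expected `vadd.vv` count moves from 6 to 8.

```c
#include <stdint.h>

/* Same shape as the macro in loop-add.c: one function per element type.  */
#define TEST_TYPE(TYPE)                                  \
  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)  \
  {                                                      \
    for (int i = 0; i < n; i++)                          \
      dst[i] = a[i] + b[i];                              \
  }

/* Eight types after the patch: the two 8-bit entries plus the
   original six.  */
#define TEST_ALL()       \
  TEST_TYPE (int8_t)     \
  TEST_TYPE (uint8_t)    \
  TEST_TYPE (int16_t)    \
  TEST_TYPE (uint16_t)   \
  TEST_TYPE (int32_t)    \
  TEST_TYPE (uint32_t)   \
  TEST_TYPE (int64_t)    \
  TEST_TYPE (uint64_t)

TEST_ALL ()
```

Each generated function (for example `vadd_int8_t`) can be called like any ordinary C function, which is handy for checking the scalar semantics on a host before looking at the vectorized output.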

[PATCH v5 03/10] RISC-V:autovec: Add auto-vectorization support functions

2023-04-26 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc
(riscv_vector_preferred_simd_mode): New function.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
---
 gcc/config/riscv/riscv-v.cc | 79 +
 1 file changed, 79 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 392f5d02e17..ecd98680d64 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 
 using namespace riscv_vector;
@@ -176,6 +178,46 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }
 
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode)
+{
+  if (!TARGET_VECTOR)
+return word_mode;
+
+  switch (mode)
+{
+case E_QImode:
+  return VNx8QImode;
+  break;
+case E_HImode:
+  return VNx4HImode;
+  break;
+case E_SImode:
+  return VNx2SImode;
+  break;
+case E_DImode:
+  if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+   return VNx1DImode;
+  break;
+case E_SFmode:
+  if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+   return VNx2SFmode;
+  break;
+case E_DFmode:
+  if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+   return VNx1DFmode;
+  break;
+default:
+  break;
+}
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -421,6 +463,43 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV mask mode.  */
+bool
+riscv_vector_mask_mode_p (machine_mode mode)
+{
+  return (mode == VNx1BImode || mode == VNx2BImode || mode == VNx4BImode
+ || mode == VNx8BImode || mode == VNx16BImode || mode == VNx32BImode
+ || mode == VNx64BImode);
+}
+
+/* Implement TARGET_VECTORIZE_GET_MASK_MODE for RVV.  */
+
+opt_machine_mode
+riscv_vector_get_mask_mode (machine_mode mode)
+{
+  machine_mode mask_mode;
+  int nf = 1;
+
+  FOR_EACH_MODE_IN_CLASS (mask_mode, MODE_VECTOR_BOOL)
+  if (GET_MODE_INNER (mask_mode) == BImode
+  && known_eq (GET_MODE_NUNITS (mask_mode) * nf, GET_MODE_NUNITS (mode))
+  && riscv_vector_mask_mode_p (mask_mode))
+return mask_mode;
+  return default_get_mask_mode (mode);
+}
+
 /* Return the RVV vector mode that has NUNITS elements of mode INNER_MODE.
This function is not only used by builtins, but also will be used by
auto-vectorization in the future.  */
-- 
2.34.1
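A rough model of the mapping in riscv_vector_preferred_simd_mode, ignoring the ELEN and floating-point conditions in the switch. The helper names are hypothetical and not part of the patch. It only records the minimum lane count N of each VNx<N> mode the function returns, and shows that every preferred mode covers the same 64-bit minimum chunk.

```c
/* Hypothetical helper: minimum lane count N of the preferred mode
   VNx<N>... for an element of ELEM_BITS bits.  Returns 0 when the
   patch would fall back to word_mode.  */
unsigned
preferred_min_lanes (unsigned elem_bits)
{
  switch (elem_bits)
    {
    case 8:
      return 8; /* VNx8QImode  */
    case 16:
      return 4; /* VNx4HImode  */
    case 32:
      return 2; /* VNx2SImode, VNx2SFmode  */
    case 64:
      return 1; /* VNx1DImode, VNx1DFmode  */
    default:
      return 0;
    }
}

/* Minimum size in bits of one vector of the preferred mode.  */
unsigned
preferred_min_bits (unsigned elem_bits)
{
  return preferred_min_lanes (elem_bits) * elem_bits;
}
```

Since riscv_vector_get_mask_mode looks for a BImode vector with the same GET_MODE_NUNITS as the data mode, the matching mask mode in this model would simply be VNx<N>BI with the same N.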



[PATCH v5 09/10] RISC-V: autovec: This patch adds a guard for VNx1 vectors that are present in ports like riscv.

2023-04-26 Thread Michael Collison
From: Kevin Lee 

Kevin Lee 
gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition
---
 gcc/tree-vect-data-refs.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 8daf7bd7dd3..df393ba723d 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned 
HOST_WIDE_INT count)
  poly_uint64 nelt = GET_MODE_NUNITS (mode);
 
  /* The encoding has 2 interleaved stepped patterns.  */
+if(!multiple_p (nelt, 2))
+  return false;
  vec_perm_builder sel (nelt, 2, 3);
  sel.quick_grow (6);
  for (i = 0; i < 3; i++)
-- 
2.34.1
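To see why the guard matters for VNx1 modes, here is a toy model of a two-coefficient poly_uint64. The struct and function names are hypothetical; GCC's real multiple_p works on poly_int and is not reproduced here. The element count is modelled as c0 + c1 * x with x only known at run time, so it is a multiple of 2 for every x only if both coefficients are even.

```c
#include <stdbool.h>

/* Toy stand-in for a two-coefficient poly_uint64:
   value = c0 + c1 * x, where x >= 0 is only known at run time.  */
struct toy_poly
{
  unsigned long c0;
  unsigned long c1;
};

/* True only if c0 + c1 * x is a multiple of M for every x.  Taking
   x = 0 and x = 1 shows this holds exactly when both coefficients
   are multiples of M.  M must be nonzero.  */
bool
toy_multiple_p (struct toy_poly p, unsigned long m)
{
  return p.c0 % m == 0 && p.c1 % m == 0;
}
```

With a VNx1 mode modelled as {1, 1}, the check fails and the interleave encoding is skipped; VNx2 ({2, 2}) and larger even modes pass.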



[PATCH v5 08/10] RISC-V:autovec: Add autovectorization tests for binary integer

2023-04-26 Thread Michael Collison
2023-04-05  Michael Collison  

* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: New
test to verify code generation of vector "and" on rv32.
* gcc.target/riscv/rvv/autovec/loop-and.c: New
test to verify code generation of vector "and" on rv64.
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: New
test to verify code generation of vector divide on rv32.
* gcc.target/riscv/rvv/autovec/loop-div.c: New
test to verify code generation of vector divide on rv64.
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: New
test to verify code generation of vector maximum on rv32.
* gcc.target/riscv/rvv/autovec/loop-max.c: New
test to verify code generation of vector maximum on rv64.
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: New
test to verify code generation of vector minimum on rv32.
* gcc.target/riscv/rvv/autovec/loop-min.c: New
test to verify code generation of vector minimum on rv64.
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: New
test to verify code generation of vector modulus on rv32.
* gcc.target/riscv/rvv/autovec/loop-mod.c: New
test to verify code generation of vector modulus on rv64.
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: New
test to verify code generation of vector multiply on rv32.
* gcc.target/riscv/rvv/autovec/loop-mul.c: New
test to verify code generation of vector multiply on rv64.
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: New
test to verify code generation of vector "or" on rv32.
* gcc.target/riscv/rvv/autovec/loop-or.c: New
test to verify code generation of vector "or" on rv64.
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: New
test to verify code generation of vector xor on rv32.
* gcc.target/riscv/rvv/autovec/loop-xor.c: New
test to verify code generation of vector xor on rv64.
---
 .../riscv/rvv/autovec/loop-and-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-and.c   | 24 ++
 .../riscv/rvv/autovec/loop-div-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-div.c   | 25 +++
 .../riscv/rvv/autovec/loop-max-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-max.c   | 25 +++
 .../riscv/rvv/autovec/loop-min-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-min.c   | 25 +++
 .../riscv/rvv/autovec/loop-mod-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   | 25 +++
 .../riscv/rvv/autovec/loop-mul-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   | 24 ++
 .../riscv/rvv/autovec/loop-or-rv32.c  | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-or.c| 24 ++
 .../riscv/rvv/autovec/loop-xor-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   | 24 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  3 +++
 17 files changed, 395 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
new file mode 100644
index 000..eb1ac5b44fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" 
} */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  vo

[PATCH v5 00/10] RISC-V: autovec: Add autovec support

2023-04-26 Thread Michael Collison
This series of patches adds foundational support for RISC-V 
auto-vectorization. These patches are based on the current upstream RVV vector 
intrinsic support and are not a new implementation. Most of the implementation 
consists of adding the new vector cost model, the autovectorization patterns 
themselves and target hooks. This implementation only provides support for 
integer addition and subtraction as a proof of concept. This patch set should 
not be construed to be feature complete. Based on conversations with the 
community, these patches are intended to lay the groundwork for feature 
completion and collaboration within the RISC-V community.

These patches are largely based off the work of Juzhe Zhong 
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch at 
https://github.com/riscv-collab/riscv-gcc.git is the foundation of this patch 
set. 

As discussed on this list, if these patches are approved they will be merged 
into an "auto-vectorization" branch once gcc-13 branches for release. There are 
two known issues related to crashes (assert failures) associated with tree 
vectorization; I have sent a patch for one of them and have received feedback. 

Changes in v5:

- Incorporated upstream comments, largely to delete unnecessary code

Changes in v4:

- Added support for binary integer operations and test cases
- Fixed bug to support 8-bit integer vectorization
- Fixed several assert errors related to non-multiple of two vector modes

Changes in v3:

- Removed the cost model and cost hooks based on feedback from Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2 

- Updated ChangeLog entry to include RiVAI contributions 
- Fixed ChangeLog email formatting 
- Fixed gnu formatting issues in the code 


Kevin Lee (2):
  This patch adds a guard for VNx1 vectors that are present in ports
like riscv.
  This patch supports 8 bit auto-vectorization in riscv.

Michael Collison (8):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V:autovec: Add auto-vectorization support functions
  RISC-V:autovec: Add target vectorization hooks
  RISC-V:autovec: Add autovectorization patterns for binary integer &
len_load/store
  RISC-V:autovec: Add autovectorization tests for add & sub
  vect: Verify that GET_MODE_NUNITS is a multiple of 2.
  RISC-V:autovec: Add autovectorization tests for binary integer

 gcc/config/riscv/predicates.md|  13 ++
 gcc/config/riscv/riscv-opts.h |  29 
 gcc/config/riscv/riscv-protos.h   |   9 ++
 gcc/config/riscv/riscv-v.cc   |  79 +++
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/riscv.cc | 130 ++
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/vector-auto.md   |  74 ++
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 
 .../riscv/rvv/autovec/loop-and-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 
 .../riscv/rvv/autovec/loop-div-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 
 .../riscv/rvv/autovec/loop-max-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 
 .../riscv/rvv/autovec/loop-min-rv32.c |  26 
 .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 
 .../riscv/rvv/autovec/loop-mod-rv32.c |  27 
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 
 .../riscv/rvv/autovec/loop-mul-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 
 .../riscv/rvv/autovec/loop-or-rv32.c  |  25 
 .../gcc.target/riscv/rvv/autovec/loop-or.c|  25 
 .../riscv/rvv/autovec/loop-sub-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 
 .../riscv/rvv/autovec/loop-xor-rv32.c |  25 
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   |  25 
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   3 +
 gcc/tree-vect-data-refs.cc|   2 +
 gcc/tree-vect-slp.cc  |   7 +-
 33 files changed, 864 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 

[PATCH v5 06/10] RISC-V:autovec: Add autovectorization tests for add & sub

2023-04-26 Thread Michael Collison
2023-03-02  Michael Collison  
Vineet Gupta 

* gcc.target/riscv/rvv/autovec: New directory
for autovectorization tests.
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
test to verify code generation of vector add on rv32.
* gcc.target/riscv/rvv/autovec/loop-add.c: New
test to verify code generation of vector add on rv64.
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
test to verify code generation of vector subtract on rv32.
* gcc.target/riscv/rvv/autovec/loop-sub.c: New
test to verify code generation of vector subtract on rv64.
---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" 
} */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } 
*/
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" 
} */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] - b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } 
*/
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)

[PATCH v5 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer & len_load/store

2023-04-26 Thread Michael Collison
2023-04-25  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md (riscv_vector_preferred_simd_mode): Include
vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
---
 gcc/config/riscv/riscv.md   |  1 +
 gcc/config/riscv/vector-auto.md | 74 +
 gcc/config/riscv/vector.md  |  4 +-
 3 files changed, 77 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index bc384d9aedf..7f8f3a6cb18 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -135,6 +135,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")
 
 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..83d2ab6957a
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,74 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; len_load/len_store is a sub-optimal pattern for RVV auto-vectorization 
support.
+;; We will replace them when len_maskload/len_maskstore is supported in loop 
vectorizer.
+(define_expand "len_load_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand:V 1 "memory_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+(define_expand "len_store_"
+  [(match_operand:V 0 "memory_operand")
+   (match_operand:V 1 "register_operand")
+   (match_operand 2 "vector_length_operand")
+   (match_operand 3 "const_0_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::emit_nonvlmax_op (code_for_pred_mov (mode), operands[0],
+ operands[1], operands[2], mode);
+  DONE;
+})
+
+;; -
+;;  [INT] Vector binary patterns
+;; -
+
+(define_expand "3"
+  [(set (match_operand:VI 0 "register_operand")
+   (any_int_binop:VI (match_operand:VI 1 "")
+ (match_operand:VI 2 "")))]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (mode);
+  rtx vl = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (mode, vl);
+  rtx mask_policy = get_mask_policy_no_pred ();
+  rtx tail_policy = get_tail_policy_no_pred ();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx (NONVLMAX);
+
+  emit_insn (gen_pred_ (operands[0], mask, merge, operands[1], 
operands[2],
+vl, tail_policy, mask_policy, 
vlmax_avl_p));
+
+  DONE;
+})
+
+
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0ecca98f20c..2ac5b744503 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
 ;; - Auto-vectorization (TBD)
 ;; - Combine optimization (TBD)
 
-(include "vector-iterators.md")
-
 (define_constants [
(INVALID_ATTRIBUTE255)
(X0_REGNUM  0)
@@ -351,6 +349,8 @@
   (symbol_ref "INTVAL (operands[4])")]
(const_int INVALID_ATTRIBUTE)))
 
+(include "vector-auto.md")
+
 ;; -
 ;;  Miscellaneous Operations
 ;; -
-- 
2.34.1



[PATCH v5 01/10] RISC-V: autovec: Add new predicates and function prototypes

2023-04-26 Thread Michael Collison
2023-04-24  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h
(riscv_vector_preferred_simd_mode): New.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/predicates.md (p_reg_or_const_csr_operand):
New predicate.
(vector_reg_or_const_dup_operand): Ditto.
* config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
(riscv_vector_lmul_enum): Ditto.
(vlmul_field_enum): Ditto.
---
 gcc/config/riscv/predicates.md  | 13 +
 gcc/config/riscv/riscv-opts.h   | 29 +
 gcc/config/riscv/riscv-protos.h |  9 +
 3 files changed, 51 insertions(+)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8654dbc5943..b3f2d622c7b 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })
 
 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
(match_operand 0 "const_csr_operand")))
@@ -291,6 +299,11 @@
   (and (match_code "const_vector")
(match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask 
(GET_MODE (op)))")))
 
+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+   && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
(match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index cf0cd669be4..af77df11430 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,35 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+/* RISC-V auto-vectorization preference.  */
+enum riscv_autovec_preference_enum {
+  NO_AUTOVEC,
+  RVV_SCALABLE,
+  RVV_FIXED_VLMAX
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1.  */
+  VLMUL_FIELD_001, /* LMUL = 2.  */
+  VLMUL_FIELD_010, /* LMUL = 4.  */
+  VLMUL_FIELD_011, /* LMUL = 8.  */
+  VLMUL_FIELD_100, /* RESERVED.  */
+  VLMUL_FIELD_101, /* LMUL = 1/8.  */
+  VLMUL_FIELD_110, /* LMUL = 1/4.  */
+  VLMUL_FIELD_111, /* LMUL = 1/2.  */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR(1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..55056222e57 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -237,4 +237,13 @@ extern const char*
 th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
 #endif
 
+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
-- 
2.34.1



Re: [PATCH v4 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer operations

2023-04-20 Thread Michael Collison

Hi Kito,

I will remove the unused UNSPECs, thank you for finding them.

I removed the include of "vector-iterators.md" because "riscv.md" 
already includes it and I was receiving multiple definition errors.


On 4/18/23 21:19, Kito Cheng wrote:

diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 70ad85b661b..7fae87968d7 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -34,6 +34,8 @@
UNSPEC_VMULHU
UNSPEC_VMULHSU

+  UNSPEC_VADD
+  UNSPEC_VSUB

Defined but unused?


UNSPEC_VADC
UNSPEC_VSBC
UNSPEC_VMADC
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0ecca98f20c..2ac5b744503 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
  ;; - Auto-vectorization (TBD)
  ;; - Combine optimization (TBD)

-(include "vector-iterators.md")
-

Why remove this?


Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-18 Thread Michael Collison

Juzhe and Kito,

Thank you for the clarification.

On 4/18/23 18:48, juzhe.zh...@rivai.ai wrote:

Yes, like kito said.
We won't enable VNx1DImode in auto-vectorization so it's meaningless 
to fix it here.
We dynamically adjust the minimum vector length for different '-march' 
settings according to the RVV ISA specification.

So we strongly suggest that we should drop this fix.

Thanks.

juzhe.zh...@rivai.ai

*From:* Kito Cheng <mailto:kito.ch...@gmail.com>
*Date:* 2023-04-19 02:21
*To:* Richard Biener <mailto:richard.guent...@gmail.com>; Jeff Law
<mailto:jeffreya...@gmail.com>; Palmer Dabbelt
<mailto:pal...@dabbelt.com>
    *CC:* Michael Collison <mailto:colli...@rivosinc.com>; gcc-patches
<mailto:gcc-patches@gcc.gnu.org>; 钟居哲 <mailto:juzhe.zh...@rivai.ai>
*Subject:* Re: [PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS
is a multiple of 2.
Some more background about RVV:
RISC-V provides different VLEN configurations through different ISA
extensions such as `zve32x`, `zve64x` and `v`:
zve32x only guarantees that the minimal VLEN is 32 bits,
zve64x guarantees that the minimal VLEN is 64 bits,
and v guarantees that the minimal VLEN is 128 bits.
Current status (without that patch):
Zve32x: the mode for one vector register is VNx1SImode, and VNx1DImode
is an invalid mode
- one vector register can hold 1 + 1x SImode where x is 0~n, so it
might hold just one SI
Zve64x: the mode for one vector register is VNx1DImode or VNx2SImode
- one vector register can hold 1 + 1x DImode where x is 0~n, so it
might hold just one DI
- one vector register can hold 2 + 2x SImode where x is 0~n, so it
might hold just two SI
So what I want to say here is that it is really NOT safe to assume
that VNx1DImode holds more than one DI in theory.
However, the `v` extension guarantees that the minimal VLEN is 128 bits.
We are trying to introduce another type/mode mapping for this
configuration:
v: the mode for one vector register is VNx2DImode or VNx4SImode
- one vector register can hold 2 + 2x DImode where x is 0~n, so it
will hold at least two DI
- one vector register can hold 4 + 4x SImode where x is 0~n, so it
will hold at least four SI
So GET_MODE_NUNITS for a single vector register with DI mode will
become 2 (VNx2DImode) if that is really possible, which is a more
precise way to model the vector extension for RISC-V.
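To make the element counts above concrete, here is a minimal sketch (not GCC code) of the guaranteed minimum number of elements per vector register. The helper name `min_units` and the three constants are assumptions for illustration only; GCC itself models this through poly_int-valued GET_MODE_NUNITS rather than plain integers.

```cpp
// Illustrative only: min_units is a hypothetical helper, not a GCC function.
// It returns how many elements of ELEM_BITS are guaranteed to fit in one
// vector register when the extension guarantees MIN_VLEN_BITS.
constexpr unsigned
min_units (unsigned min_vlen_bits, unsigned elem_bits)
{
  return min_vlen_bits / elem_bits;
}

// Minimum VLEN guaranteed by each extension, as listed in the message above.
constexpr unsigned ZVE32X_MIN_VLEN = 32;
constexpr unsigned ZVE64X_MIN_VLEN = 64;
constexpr unsigned V_MIN_VLEN = 128;
```

Under zve64x a DI vector is only guaranteed one element, so code that assumes an even element count cannot rely on it; under `v` the guaranteed minimum for DI is two.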
On Tue, Apr 18, 2023 at 10:28 PM Kito Cheng 
wrote:
>
> Wait, VNx1DImode can really evaluate to just one element with
> -march=rv64g_zve64x.
>
> I think this should just be fixed in the backend by this patch:
>
>

https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
>
> On Tue, Apr 18, 2023 at 2:12 PM Richard Biener via Gcc-patches
    >  wrote:
> >
> > On Mon, Apr 17, 2023 at 8:42 PM Michael Collison
 wrote:
> > >
> > > While working on autovectorizing for the RISC-V port I encountered an issue
> > > where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
> > > evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
> > > where GET_MODE_NUNITS is equal to one.
    > > >
> > > Tested on RISCV and x86_64-linux-gnu. Okay?
> >
> > OK.
> >
> > > 2023-03-09  Michael Collison 
> > >
> > > * tree-vect-slp.cc (can_duplicate_and_interleave_p):
> > > Check that GET_MODE_NUNITS is a multiple of 2.
> > > ---
> > >  gcc/tree-vect-slp.cc | 7 +--
> > >  1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> > > index d73deaecce0..a64fe454e19 100644
> > > --- a/gcc/tree-vect-slp.cc
> > > +++ b/gcc/tree-vect-slp.cc
> > > @@ -423,10 +423,13 @@ can_duplicate_and_interleave_p
(vec_info *vinfo, unsigned int count,
> > > (GET_MODE_BITSIZE (int_mode), 1);
> > >   tree vector_type
> > > = get_vectype_for_scalar_type (vinfo, int_type,
count);
> > > + poly_int64 half_nelts;
> > >   if (vector_type
> > >   && VECTOR_MODE_P (TYPE_MODE (vector_type))
> > >   && known_eq (GET_MODE_SIZE (TYPE_MODE
(vector_type)),
> > > -  GET_MODE_SIZE (base_vector_mode)))
> > > +  GET_MODE_SIZE (base_vector_mode))
> > > + &

Re: [PATCH v4 09/10] This patch adds a guard for VNx1 vectors that are present in ports like riscv.

2023-04-18 Thread Michael Collison

Thanks Kito, I will look into this.


On 4/18/23 10:26, Kito Cheng wrote:

I would prefer to drop this patch from this series, since I believe
https://patchwork.ozlabs.org/project/gcc/patch/20230414014518.15458-1-juzhe.zh...@rivai.ai/
is the right fix for this issue.

On Tue, Apr 18, 2023 at 2:40 AM Michael Collison  wrote:

From: Kevin Lee 

Kevin Lee 
gcc/ChangeLog:

 * tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition
---
  gcc/tree-vect-data-refs.cc | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 8daf7bd7dd3..df393ba723d 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned 
HOST_WIDE_INT count)
   poly_uint64 nelt = GET_MODE_NUNITS (mode);

   /* The encoding has 2 interleaved stepped patterns.  */
+  if (!multiple_p (nelt, 2))
+    return false;
   vec_perm_builder sel (nelt, 2, 3);
   sel.quick_grow (6);
   for (i = 0; i < 3; i++)
--
2.34.1



[PATCH v4 10/10] This patch supports 8 bit auto-vectorization in riscv.

2023-04-17 Thread Michael Collison
From: Kevin Lee 

2023-04-14 Kevin Lee 
gcc/testsuite/ChangeLog:

* config/riscv/riscv.cc (riscv_autovectorize_vector_modes): Add
new vector mode
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: Support 8bit
type
* gcc.target/riscv/rvv/autovec/loop-add.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-and.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-div.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-max.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-min.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mod.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-mul.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-or.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-sub.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: Ditto
* gcc.target/riscv/rvv/autovec/loop-xor.c: Ditto
---
 gcc/config/riscv/riscv.cc | 1 +
 .../gcc.target/riscv/rvv/autovec/loop-add-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-and-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-div-rv32.c  | 8 +---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/loop-max-rv32.c  | 7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c | 7 ---
 .../gcc.target/riscv/rvv/autovec/loop-min-rv32.c  | 7 ---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c | 7 ---
 .../gcc.target/riscv/rvv/autovec/loop-mod-rv32.c  | 8 +---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c | 8 +---
 .../gcc.target/riscv/rvv/autovec/loop-mul-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c  | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-sub-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c | 5 +++--
 .../gcc.target/riscv/rvv/autovec/loop-xor-rv32.c  | 5 +++--
 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c | 5 +++--
 21 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9af06d926cf..a2cb83e1916 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -7147,6 +7147,7 @@ riscv_autovectorize_vector_modes (vector_modes *modes, 
bool)
   modes->safe_push (VNx8QImode);
   modes->safe_push (VNx4QImode);
   modes->safe_push (VNx2QImode);
+  modes->safe_push (VNx1QImode);
 }
   else if (riscv_vectorization_factor == RVV_LMUL2)
 {
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
index bdc3b6892e9..76f5a3a3ff5 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
index d7f992c7d27..3d1e10bf4e1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -10,8 +10,9 @@
   dst[i] = a[i] + b[i];\
   }
 
-/* *int8_t not autovec currently. */
 #define TEST_ALL() \
+ TEST_TYPE(int8_t) \
+ TEST_TYPE(uint8_t)\
  TEST_TYPE(int16_t)\
  TEST_TYPE(uint16_t)   \
  TEST_TYPE(int32_t)\
@@ -21,4 +22,4 @@
 
 TEST_ALL()
 
-/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 8 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
index eb1ac5b44fd..a4c7abfb0ad 100644
--- 

[PATCH v4 08/10] RISC-V:autovec: Add autovectorization tests for binary integer

2023-04-17 Thread Michael Collison
2023-04-05  Michael Collison  

* gcc.target/riscv/rvv/autovec/loop-and-rv32.c: New
test to verify code generation of vector "and" on rv32.
* gcc.target/riscv/rvv/autovec/loop-and.c: New
test to verify code generation of vector "and" on rv64.
* gcc.target/riscv/rvv/autovec/loop-div-rv32.c: New
test to verify code generation of vector divide on rv32.
* gcc.target/riscv/rvv/autovec/loop-div.c: New
test to verify code generation of vector divide on rv64.
* gcc.target/riscv/rvv/autovec/loop-max-rv32.c: New
test to verify code generation of vector maximum on rv32.
* gcc.target/riscv/rvv/autovec/loop-max.c: New
test to verify code generation of vector maximum on rv64.
* gcc.target/riscv/rvv/autovec/loop-min-rv32.c: New
test to verify code generation of vector minimum on rv32.
* gcc.target/riscv/rvv/autovec/loop-min.c: New
test to verify code generation of vector minimum on rv64.
* gcc.target/riscv/rvv/autovec/loop-mod-rv32.c: New
test to verify code generation of vector modulus on rv32.
* gcc.target/riscv/rvv/autovec/loop-mod.c: New
test to verify code generation of vector modulus on rv64.
* gcc.target/riscv/rvv/autovec/loop-mul-rv32.c: New
test to verify code generation of vector multiply on rv32.
* gcc.target/riscv/rvv/autovec/loop-mul.c: New
test to verify code generation of vector multiply on rv64.
* gcc.target/riscv/rvv/autovec/loop-or-rv32.c: New
test to verify code generation of vector "or" on rv32.
* gcc.target/riscv/rvv/autovec/loop-or.c: New
test to verify code generation of vector "or" on rv64.
* gcc.target/riscv/rvv/autovec/loop-xor-rv32.c: New
test to verify code generation of vector xor on rv32.
* gcc.target/riscv/rvv/autovec/loop-xor.c: New
test to verify code generation of vector xor on rv64.
---
 .../riscv/rvv/autovec/loop-and-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-and.c   | 24 ++
 .../riscv/rvv/autovec/loop-div-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-div.c   | 25 +++
 .../riscv/rvv/autovec/loop-max-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-max.c   | 25 +++
 .../riscv/rvv/autovec/loop-min-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-min.c   | 25 +++
 .../riscv/rvv/autovec/loop-mod-rv32.c | 25 +++
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   | 25 +++
 .../riscv/rvv/autovec/loop-mul-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   | 24 ++
 .../riscv/rvv/autovec/loop-or-rv32.c  | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-or.c| 24 ++
 .../riscv/rvv/autovec/loop-xor-rv32.c | 24 ++
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   | 24 ++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  3 +++
 17 files changed, 395 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-max.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-min.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mod.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-mul.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-or.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-xor.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
new file mode 100644
index 000..eb1ac5b44fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" 
} */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  vo

[PATCH v4 07/10] vect: Verify that GET_MODE_NUNITS is a multiple of 2.

2023-04-17 Thread Michael Collison
While working on autovectorizing for the RISC-V port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode),
where GET_MODE_NUNITS is equal to one.

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a multiple of 2.
---
 gcc/tree-vect-slp.cc | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index d73deaecce0..a64fe454e19 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -423,10 +423,13 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
(GET_MODE_BITSIZE (int_mode), 1);
  tree vector_type
= get_vectype_for_scalar_type (vinfo, int_type, count);
+ poly_int64 half_nelts;
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && multiple_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)),
+2, &half_nelts))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
@@ -434,7 +437,7 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
  poly_uint64 nelts = GET_MODE_NUNITS (TYPE_MODE (vector_type));
  vec_perm_builder sel1 (nelts, 2, 3);
  vec_perm_builder sel2 (nelts, 2, 3);
- poly_int64 half_nelts = exact_div (nelts, 2);
+
  for (unsigned int i = 0; i < 3; ++i)
{
  sel1.quick_push (i);
-- 
2.34.1
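The check this patch adds can be pictured with a simplified model of a two-coefficient poly value. This is only a sketch under stated assumptions: `Poly2`, `multiple_of_p` and `exact_quotient` are hypothetical names, and GCC's real `poly_int` and `multiple_p` are more general. The point is that a value c0 + c1 * x is a multiple of 2 for every x only if both coefficients are, which fails for a 1 + 1x mode such as VNx1DImode.

```cpp
// Simplified stand-in for a two-coefficient poly value: c0 + c1 * x for a
// runtime-determined x >= 0.  This is not GCC's real poly_int type.
struct Poly2
{
  long c0;
  long c1;
};

// Hypothetical analogue of multiple_p: true if VALUE is a multiple of
// FACTOR for every possible x, i.e. if both coefficients divide evenly.
constexpr bool
multiple_of_p (Poly2 value, long factor)
{
  return value.c0 % factor == 0 && value.c1 % factor == 0;
}

// Hypothetical analogue of the quotient that the three-argument form of
// multiple_p stores; only meaningful when multiple_of_p returned true.
constexpr Poly2
exact_quotient (Poly2 value, long factor)
{
  return Poly2 {value.c0 / factor, value.c1 / factor};
}
```

In this model a 1 + 1x element count is rejected up front, while a 2 + 2x count passes and yields a half count of 1 + 1x, which mirrors how the patch folds the old exact_div into the guarding condition.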



[PATCH v4 06/10] RISC-V:autovec: Add autovectorization tests for add & sub

2023-04-17 Thread Michael Collison
2023-03-02  Michael Collison  
Vineet Gupta 

* gcc.target/riscv/rvv/autovec: New directory
for autovectorization tests.
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
test to verify code generation of vector add on rv32.
* gcc.target/riscv/rvv/autovec/loop-add.c: New
test to verify code generation of vector add on rv64.
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
test to verify code generation of vector subtract on rv32.
* gcc.target/riscv/rvv/autovec/loop-sub.c: New
test to verify code generation of vector subtract on rv64.
---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" 
} */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } 
*/
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" 
} */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] - b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } 
*/
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)

[PATCH v4 09/10] This patch adds a guard for VNx1 vectors that are present in ports like riscv.

2023-04-17 Thread Michael Collison
From: Kevin Lee 

Kevin Lee 
gcc/ChangeLog:

* tree-vect-data-refs.cc (vect_grouped_store_supported): Add new
condition
---
 gcc/tree-vect-data-refs.cc | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 8daf7bd7dd3..df393ba723d 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -5399,6 +5399,8 @@ vect_grouped_store_supported (tree vectype, unsigned 
HOST_WIDE_INT count)
  poly_uint64 nelt = GET_MODE_NUNITS (mode);
 
  /* The encoding has 2 interleaved stepped patterns.  */
+  if (!multiple_p (nelt, 2))
+    return false;
  vec_perm_builder sel (nelt, 2, 3);
  sel.quick_grow (6);
  for (i = 0; i < 3; i++)
-- 
2.34.1



[PATCH v4 03/10] RISC-V:autovec: Add auto-vectorization support functions

2023-04-17 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc (riscv_classify_vlmul_field):
New function.
(riscv_vector_preferred_simd_mode): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_tuple_mode_p): Ditto.
(riscv_classify_nf): Ditto.
(riscv_vlmul_regsize): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
---
 gcc/config/riscv/riscv-v.cc | 176 
 1 file changed, 176 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 392f5d02e17..9df86419caa 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 
 using namespace riscv_vector;
@@ -118,6 +120,41 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
  && IN_RANGE (INTVAL (elt), minval, maxval));
 }
 
+/* Return the vlmul field for a specific machine mode.  */
+unsigned int
+riscv_classify_vlmul_field (enum machine_mode mode)
+{
+  /* Make the decision based on the mode's enum value rather than its
+ properties, so that we keep the correct classification regardless
+ of -mriscv-vector-bits.  */
+  switch (mode)
+{
+case E_VNx8BImode:
+  return VLMUL_FIELD_111;
+
+case E_VNx4BImode:
+  return VLMUL_FIELD_110;
+
+case E_VNx2BImode:
+  return VLMUL_FIELD_101;
+
+case E_VNx16BImode:
+  return VLMUL_FIELD_000;
+
+case E_VNx32BImode:
+  return VLMUL_FIELD_001;
+
+case E_VNx64BImode:
+  return VLMUL_FIELD_010;
+
+default:
+  break;
+}
+
+  /* we don't care about VLMUL for Mask.  */
+  return VLMUL_FIELD_000;
+}
+
 /* Emit a vlmax vsetvl instruction.  This should only be used when
optimization is disabled or after vsetvl insertion pass.  */
 void
@@ -176,6 +213,64 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }
 
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf)
+{
+  if (!TARGET_VECTOR)
+return word_mode;
+
+  switch (mode)
+{
+case E_QImode:
+  return vf == 1   ? VNx8QImode
+: vf == 2 ? VNx16QImode
+: vf == 4 ? VNx32QImode
+  : VNx64QImode;
+  break;
+case E_HImode:
+  return vf == 1   ? VNx4HImode
+: vf == 2 ? VNx8HImode
+: vf == 4 ? VNx16HImode
+  : VNx32HImode;
+  break;
+case E_SImode:
+  return vf == 1   ? VNx2SImode
+: vf == 2 ? VNx4SImode
+: vf == 4 ? VNx8SImode
+  : VNx16SImode;
+  break;
+case E_DImode:
+  if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+   return vf == 1   ? VNx1DImode
+  : vf == 2 ? VNx2DImode
+  : vf == 4 ? VNx4DImode
+: VNx8DImode;
+  break;
+case E_SFmode:
+  if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+   return vf == 1   ? VNx2SFmode
+  : vf == 2 ? VNx4SFmode
+  : vf == 4 ? VNx8SFmode
+: VNx16SFmode;
+  break;
+case E_DFmode:
+  if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+   return vf == 1   ? VNx1DFmode
+  : vf == 2 ? VNx2DFmode
+  : vf == 4 ? VNx4DFmode
+: VNx8DFmode;
+  break;
+default:
+  break;
+}
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -421,6 +516,87 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV tuple mode.  */
+bool
+riscv_tuple_mode_p (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+/* Return nf for a machine mode.  */
+int
+riscv_classify_nf (machine_mode mode)
+{
+  switch (mode)
+{
+
+default:
+  break;
+}
+
+  return 1;
+}
+
+/* Return vlmul register size for a machine mode.  */
+int
+riscv_vlmul_regsize (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+return 1;
+  switch (riscv_classify_vl
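The mode table in riscv_vector_preferred_simd_mode above follows a regular pattern, which the following sketch tries to capture. It is an illustration only: `preferred_nunits` is a hypothetical helper, not part of the patch, and it returns just the N in VNx<N> rather than a machine_mode. It also assumes, as the nested conditionals in the patch do, that any vf other than 1, 2 or 4 is treated as 8.

```cpp
// Hypothetical helper: number of units N in the VNx<N> mode that the table
// above selects for an element of ELEM_BITS and vectorization factor VF.
// Any VF other than 1, 2 or 4 falls through to the largest mode (factor 8),
// matching the final arm of each conditional chain in the patch.
constexpr unsigned
preferred_nunits (unsigned elem_bits, unsigned vf)
{
  return (64 / elem_bits) * ((vf == 1 || vf == 2 || vf == 4) ? vf : 8);
}
```

For example, a 16-bit element with vf == 2 gives 8 units (VNx8HImode in the table), and a 64-bit element with vf == 1 gives a single unit (VNx1DImode).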

[PATCH v4 05/10] RISC-V:autovec: Add autovectorization patterns for binary integer operations

2023-04-17 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md (riscv_vector_preferred_simd_mode): Include
vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector-iterators.md (UNSPEC_VADD/UNSPEC_VSUB):
New unspecs for autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
---
 gcc/config/riscv/riscv.md|  1 +
 gcc/config/riscv/vector-auto.md  | 79 
 gcc/config/riscv/vector-iterators.md |  2 +
 gcc/config/riscv/vector.md   |  4 +-
 4 files changed, 84 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index bc384d9aedf..7f8f3a6cb18 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -135,6 +135,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")
 
 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..dc62f9af705
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,79 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; -
+;;  [INT] Addition
+;; -
+;; Includes:
+;; - vadd.vv
+;; - vadd.vx
+;; - vadd.vi
+;; -
+
+(define_expand "<optab><mode>3"
+  [(set (match_operand:VI 0 "register_operand")
+   (any_int_binop:VI (match_operand:VI 1 "register_operand")
+ (match_operand:VI 2 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (<MODE>mode);
+  rtx vl = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (<MODE>mode, vl);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(<VM>mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_<optab><mode>(operands[0], mask, merge, operands[1], operands[2],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+(define_expand "cond_<optab><mode>3"
+  [(set (match_operand:VI 0 "register_operand")
+   (if_then_else:VI
+(unspec:<VM>
+ [(match_operand:<VM> 1 "register_operand")] UNSPEC_VPREDICATE)
+(any_int_binop:VI
+ (match_operand:VI 2 "register_operand")
+ (match_operand:VI 3 "register_operand"))
+(match_operand:VI 4 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = operands[4];
+  rtx vl = gen_reg_rtx (Pmode);
+  emit_vlmax_vsetvl (<MODE>mode, vl);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = operands[1];
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_<optab><mode>(operands[0], mask, merge, operands[2], operands[3],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 70ad85b661b..7fae87968d7 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -34,6 +34,8 @@
   UNSPEC_VMULHU
   UNSPEC_VMULHSU
 
+  UNSPEC_VADD
+  UNSPEC_VSUB
   UNSPEC_VADC
   UNSPEC_VSBC
   UNSPEC_VMADC
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 0ecca98f20c..2ac5b744503 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -26,8 +26,6 @@
 ;; - Auto-vectorization (TBD)
 ;; - Combine optimization (TBD)
 
-(include "vector-iterators.md")
-
 (define_

[PATCH v4 04/10] RISC-V:autovec: Add target vectorization hooks

2023-04-17 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc (riscv_option_override):
Set riscv_vectorization_factor.
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_autovectorize_vector_modes): Implement
TARGET_AUTOVECTORIZE_VECTOR_MODES.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
---
 gcc/config/riscv/riscv.cc | 156 ++
 1 file changed, 156 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index dc47434fac4..9af06d926cf 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -275,6 +284,9 @@ poly_uint16 riscv_vector_chunks;
 /* The number of bytes in a vector chunk.  */
 unsigned riscv_bytes_per_vector_chunk;
 
+/* Prefer vf for auto-vectorizer.  */
+unsigned riscv_vectorization_factor;
+
 /* Index R is the smallest register class that contains register R.  */
 const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   GR_REGS, GR_REGS,GR_REGS,GR_REGS,
@@ -6363,6 +6375,10 @@ riscv_option_override (void)
 
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits ();
+
+  if (TARGET_VECTOR)
+riscv_vectorization_factor = riscv_vector_lmul;
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -7057,6 +7073,128 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, 
unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known RVV width all three estimates are the same.
+   For generic RVV tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with RVV when it is a win
+   even for 128-bit RVV.
+   When RVV width information is available VAL.coeffs[1] is multiplied by
+   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+   poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+: (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+ values are based on 128-bit vectors and the maximum is based on
+ the architectural maximum of 2048 bits.  */
+  if (width_source == RVV_SCALABLE)
+switch (kind)
+  {
+  case POLY_VALUE_MIN:
+  case POLY_VALUE_LIKELY:
+   return val.coeffs[0];
+
+  case POLY_VALUE_MAX:
+   return val.coeffs[0] + val.coeffs[1] * 15;
+  }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely.  This could be made more general if future -mtune
+ options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+width_source = 1 << floor_log2 (width_source);
+  else
+width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_128 = width_source - 128;
+  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+  machine_mode vmode =
+riscv_vector::riscv_vector_preferred_simd_mode (mode,
+   riscv_vectorization_factor);
+  if (VECTOR_MODE_P (
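
The estimate logic in riscv_estimated_poly_value can be tried outside of GCC with a small standalone model. This is only a sketch: poly2, estimate_kind, the helper names, and the use of width_bits == 0 to stand for RVV_SCALABLE are assumptions for illustration, not GCC APIs. The arithmetic follows the hunk as posted, including the factor of 15 for the maximum estimate.

```cpp
#include <cstdint>

// Hypothetical stand-ins for poly_value_estimate_kind and poly_int64.
enum estimate_kind { EST_MIN, EST_MAX, EST_LIKELY };

// value = coeffs[0] + coeffs[1] * x, for a runtime x >= 0.
struct poly2 { int64_t coeffs[2]; };

// Highest set bit of a non-zero W (plays the role of 1 << floor_log2 (W)).
inline unsigned highest_set_bit (unsigned w)
{
  unsigned r = 1;
  while (w >>= 1)
    r <<= 1;
  return r;
}

// Lowest set bit of W (plays the role of least_bit_hwi (W)).
inline unsigned lowest_set_bit (unsigned w)
{
  return w & (~w + 1u);
}

// width_bits == 0 models the scalable case; otherwise it is a vector
// width in bits, or a bitmask of several widths.
inline int64_t
estimated_poly_value_model (const poly2 &val, estimate_kind kind,
                            unsigned width_bits)
{
  if (width_bits == 0)
    return kind == EST_MAX ? val.coeffs[0] + val.coeffs[1] * 15
                           : val.coeffs[0];

  unsigned w = kind == EST_MAX ? highest_set_bit (width_bits)
                               : lowest_set_bit (width_bits);
  int64_t over_128 = (int64_t) w - 128;
  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
}
```

With val = 4 + 4x, the scalable case gives 4 for the minimum and likely estimates and 64 for the maximum, while a fixed 256-bit width gives 8 for all three kinds.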

[PATCH v4 01/10] RISC-V: Add new predicates and function prototypes

2023-04-17 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h (riscv_classify_vlmul_field):
New external declaration.
(riscv_vector_preferred_simd_mode): Ditto.
(riscv_tuple_mode_p): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_classify_nf): Ditto.
(riscv_vlmul_regsize): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
* config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
(riscv_vector_lmul_enum): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/riscv.opt (riscv_vector_lmul):
New option -mriscv_vector_lmul.
* config/riscv/predicates.md (p_reg_or_const_csr_operand):
New predicate.
(vector_reg_or_const_dup_operand): Ditto.
---
 gcc/config/riscv/predicates.md  | 13 +++
 gcc/config/riscv/riscv-opts.h   | 40 +
 gcc/config/riscv/riscv-protos.h | 14 
 gcc/config/riscv/riscv.opt  | 20 +
 4 files changed, 87 insertions(+)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 8654dbc5943..b3f2d622c7b 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })
 
 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
(match_operand 0 "const_csr_operand")))
@@ -291,6 +299,11 @@
   (and (match_code "const_vector")
(match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask 
(GET_MODE (op)))")))
 
+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+   && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
(match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index cf0cd669be4..70711310749 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,46 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+/* RVV vector register sizes.  */
+enum riscv_vector_bits_enum
+{
+  RVV_SCALABLE,
+  RVV_NOT_IMPLEMENTED = RVV_SCALABLE,
+  RVV_64 = 64,
+  RVV_128 = 128,
+  RVV_256 = 256,
+  RVV_512 = 512,
+  RVV_1024 = 1024,
+  RVV_2048 = 2048,
+  RVV_4096 = 4096,
+  RVV_8192 = 8192,
+  RVV_16384 = 16384,
+  RVV_32768 = 32768,
+  RVV_65536 = 65536
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1.  */
+  VLMUL_FIELD_001, /* LMUL = 2.  */
+  VLMUL_FIELD_010, /* LMUL = 4.  */
+  VLMUL_FIELD_011, /* LMUL = 8.  */
+  VLMUL_FIELD_100, /* RESERVED.  */
+  VLMUL_FIELD_101, /* LMUL = 1/8.  */
+  VLMUL_FIELD_110, /* LMUL = 1/4.  */
+  VLMUL_FIELD_111, /* LMUL = 1/2.  */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 5244e8dcbf0..41f60f82a55 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -237,4 +237,18 @@ extern const char*
 th_mempair_output_move (rtx[4], bool, machine_mode, RTX_CODE);
 #endif
 
+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern unsigned int riscv_classify_vlmul_field (enum machine_mode m);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode,
+ unsigned vf);
+extern bool riscv_tuple_mode_p (machine_mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern int riscv_classify_nf (machine_mode);
+extern int riscv_vlmul_regsize (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv.opt b/gcc/config/riscv/riscv.opt
index ff1dd4ddd4f..4db3b2cac55 100644
--- a/gcc/config/riscv/riscv.opt
+++ b/gcc/config/riscv/riscv.opt
@@ -70,6 +70,26 @@ Enum(abi_type) String(lp64f) Value(ABI_LP64F)
 E

[PATCH v4 00/10] RISC-V: Add autovec support

2023-04-17 Thread Michael Collison
This series of patches adds foundational support for RISC-V 
auto-vectorization. These patches are based on the current upstream rvv vector 
intrinsic support and are not a new implementation. Most of the implementation 
consists of adding the new vector cost model, the autovectorization patterns 
themselves and target hooks. This implementation only provides support for 
integer addition and subtraction as a proof of concept. This patch set should 
not be construed to be feature complete. Based on conversations with the 
community these patches are intended to lay the groundwork for feature 
completion and collaboration within the RISC-V community.

These patches are largely based off the work of Juzhe Zhong 
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch at 
https://github.com/riscv-collab/riscv-gcc.git is the foundation of this patch 
set.

As discussed on this list, if these patches are approved they will be merged 
into an "auto-vectorization" branch once gcc-13 branches for release. There are 
two known issues related to crashes (assert failures) associated with tree 
vectorization; I have sent a patch for one of them and have received feedback.

Changes in v4:

- Added support for binary integer operations and test cases
- Fixed bug to support 8-bit integer vectorization
- Fixed several assert errors related to non-multiple of two vector modes

Changes in v3:

- Removed the cost model and cost hooks based on feedback from Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions 
- Fixed ChangeLog email formatting 
- Fixed gnu formatting issues in the code 

Kevin Lee (2):
  This patch adds a guard for VNx1 vectors that are present in ports
like riscv.
  This patch supports 8 bit auto-vectorization in riscv.

Michael Collison (8):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V:autovec: Add auto-vectorization support functions
  RISC-V:autovec: Add target vectorization hooks
  RISC-V:autovec: Add autovectorization patterns for binary integer
operations
  RISC-V:autovec: Add autovectorization tests for add & sub
  vect: Verify that GET_MODE_NUNITS is a multiple of 2.
  RISC-V:autovec: Add autovectorization tests for binary integer

 gcc/config/riscv/predicates.md|  13 ++
 gcc/config/riscv/riscv-opts.h |  40 
 gcc/config/riscv/riscv-protos.h   |  14 ++
 gcc/config/riscv/riscv-v.cc   | 176 ++
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   3 +
 gcc/config/riscv/riscv.cc | 157 
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/riscv.opt|  20 ++
 gcc/config/riscv/vector-auto.md   |  79 
 gcc/config/riscv/vector-iterators.md  |   2 +
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  25 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  25 +++
 .../riscv/rvv/autovec/loop-and-rv32.c |  25 +++
 .../gcc.target/riscv/rvv/autovec/loop-and.c   |  25 +++
 .../riscv/rvv/autovec/loop-div-rv32.c |  27 +++
 .../gcc.target/riscv/rvv/autovec/loop-div.c   |  27 +++
 .../riscv/rvv/autovec/loop-max-rv32.c |  26 +++
 .../gcc.target/riscv/rvv/autovec/loop-max.c   |  26 +++
 .../riscv/rvv/autovec/loop-min-rv32.c |  26 +++
 .../gcc.target/riscv/rvv/autovec/loop-min.c   |  26 +++
 .../riscv/rvv/autovec/loop-mod-rv32.c |  27 +++
 .../gcc.target/riscv/rvv/autovec/loop-mod.c   |  27 +++
 .../riscv/rvv/autovec/loop-mul-rv32.c |  25 +++
 .../gcc.target/riscv/rvv/autovec/loop-mul.c   |  25 +++
 .../riscv/rvv/autovec/loop-or-rv32.c  |  25 +++
 .../gcc.target/riscv/rvv/autovec/loop-or.c|  25 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c |  25 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  25 +++
 .../riscv/rvv/autovec/loop-xor-rv32.c |  25 +++
 .../gcc.target/riscv/rvv/autovec/loop-xor.c   |  25 +++
 gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   3 +
 gcc/tree-vect-data-refs.cc|   2 +
 gcc/tree-vect-slp.cc  |   7 +-
 35 files changed, 1031 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-and.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-div-rv32.c
 create mode 10064

[PATCH v4 02/10] RISC-V: autovec: Export policy functions to global scope

2023-04-17 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 01cea23d3e6..1ed9e4acc40 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2493,7 +2493,7 @@ use_real_merge_p (enum predication_type_index pred)
 
 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2503,7 +2503,7 @@ get_tail_policy_for_pred (enum predication_type_index 
pred)
 
 /* Get MASK policy for predication. If predication indicates MU, return the MU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h 
b/gcc/config/riscv/riscv-vector-builtins.h
index 8ffb9d33e33..de3fd6ca290 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -483,6 +483,9 @@ extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 
1];
 extern function_instance get_read_vl_instance (void);
 extern tree get_read_vl_decl (void);
 
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);
+
 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
 {
-- 
2.34.1



[PATCH] vect: Verify that GET_MODE_NUNITS is greater than one.

2023-03-14 Thread Michael Collison
While working on autovectorizing for the RISC-V port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is
evenly divisible by two. The RISC-V target has vector modes (e.g. VNx1DImode)
where GET_MODE_NUNITS is equal to one.
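
To make the failure mode concrete, here is a minimal standalone sketch (not GCC code) of how a "known greater than one" test behaves on a length-agnostic unit count. The poly2 struct, known_gt_const, and the coefficient pairs chosen for VNx1DI and VNx2DI are assumptions for illustration; GCC's poly_int and known_gt are more general.

```cpp
#include <cstdint>

// Simplified stand-in for a two-coefficient poly_int:
// value = coeffs[0] + coeffs[1] * x, for some runtime x >= 0.
struct poly2
{
  int64_t coeffs[2];
};

// True only if A > B holds for every x >= 0, which is the intent of
// comparing a poly_int against a constant with known_gt.
inline bool
known_gt_const (const poly2 &a, int64_t b)
{
  return a.coeffs[1] >= 0 && a.coeffs[0] > b;
}

// Assumed unit counts: VNx1DI has 1 + 1x units, VNx2DI has 2 + 2x.
constexpr poly2 nunits_vnx1di = {{1, 1}};
constexpr poly2 nunits_vnx2di = {{2, 2}};
```

Under these assumptions known_gt_const (nunits_vnx1di, 1) is false, because x = 0 yields exactly one unit, so the new condition skips such modes; known_gt_const (nunits_vnx2di, 1) is true.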

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is greater than one.
---
 gcc/tree-vect-slp.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9a4e000925e..add58113fa8 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -426,7 +426,8 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && known_gt (GET_MODE_NUNITS (TYPE_MODE (vector_type)), 1))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
-- 
2.34.1



[PATCH] vect: Verify that GET_MODE_NUNITS is power-of-2

2023-03-10 Thread Michael Collison
While working on autovectorizing for the RISCV port I encountered an issue
where can_duplicate_and_interleave_p assumes that GET_MODE_NUNITS is a
power of two. The RISC-V target has vector modes (e.g. VNx1DImode) that
are not a power of two.
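
For reference, a standalone model of the coefficient-wise check that the exact_div_p helper in this patch performs. This is a sketch, not GCC code: poly2 and exact_div_p_model are hypothetical names, and the two-coefficient layout is an assumption. Note that the check as written tests that every coefficient is a multiple of B, so a count such as 6 + 6x passes with B = 2 even though it is not a power of two.

```cpp
#include <cstdint>

// Simplified stand-in for a two-coefficient poly_int:
// value = coeffs[0] + coeffs[1] * x, for some runtime x >= 0.
struct poly2
{
  int64_t coeffs[2];
};

// Return true if every coefficient of A is a multiple of B, which makes
// A a multiple of B for every possible x.
inline bool
exact_div_p_model (const poly2 &a, int64_t b)
{
  for (int i = 0; i < 2; i++)
    if (a.coeffs[i] % b != 0)
      return false;
  return true;
}
```

With an assumed unit count of 1 + 1x for VNx1DImode the model returns false for B = 2, which is the case the patch wants to reject.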

Tested on RISCV and x86_64-linux-gnu. Okay?

2023-03-09  Michael Collison  

* poly-int.h (exact_div_p): New function to
verify that argument is a power of 2 poly_int.
* tree-vect-slp.cc (can_duplicate_and_interleave_p):
Check that GET_MODE_NUNITS is a power of 2.
---
 gcc/poly-int.h   | 17 +
 gcc/tree-vect-slp.cc |  3 ++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/gcc/poly-int.h b/gcc/poly-int.h
index 12571455081..d09632f341f 100644
--- a/gcc/poly-int.h
+++ b/gcc/poly-int.h
@@ -2219,6 +2219,23 @@ multiple_p (const poly_int_pod , const 
poly_int_pod ,
   return constant_multiple_p (a, b, multiple);
 }
 
+/* Return true, if A is known to be a multiple of B.  */
+
+template<unsigned int N, typename Ca, typename Cb>
+inline bool
+exact_div_p (const poly_int_pod<N, Ca> &a, Cb b)
+{
+  typedef POLY_CONST_COEFF (Ca, Cb) C;
+  poly_int<N, C> r;
+  for (unsigned int i = 0; i < N; i++)
+{
+  if ((a.coeffs[i] % b) != 0)
+   return false;
+
+}
+  return true;
+}
+
 /* Return A / B, given that A is known to be a multiple of B.  */
 
 template
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 9a4e000925e..6be2036a13a 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -426,7 +426,8 @@ can_duplicate_and_interleave_p (vec_info *vinfo, unsigned 
int count,
  if (vector_type
  && VECTOR_MODE_P (TYPE_MODE (vector_type))
  && known_eq (GET_MODE_SIZE (TYPE_MODE (vector_type)),
-  GET_MODE_SIZE (base_vector_mode)))
+  GET_MODE_SIZE (base_vector_mode))
+ && exact_div_p (GET_MODE_NUNITS (TYPE_MODE (vector_type)), 2))
{
  /* Try fusing consecutive sequences of COUNT / NVECTORS elements
 together into elements of type INT_TYPE and using the result
-- 
2.34.1



[PATCH v2] vect: Check that vector factor is a compile-time constant

2023-03-08 Thread Michael Collison
2023-03-05  Michael Collison  

* tree-vect-loop-manip.cc (vect_do_peeling): Use
result of constant_lower_bound instead of vf in case
vf is not a compile time constant.
---
 gcc/tree-vect-loop-manip.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index d88edafa018..f60fa50e8f4 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2921,7 +2921,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, 
tree nitersm1,
   if (new_var_p)
{
  value_range vr (type,
- wi::to_wide (build_int_cst (type, vf)),
+ wi::to_wide (build_int_cst (type, lowest_vf)),
  wi::to_wide (TYPE_MAX_VALUE (type)));
  set_range_info (niters, vr);
}
-- 
2.34.1
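
A note on why the lower bound is the right value for the range above: when the vectorization factor is a poly_int such as 4 + 4x, it has no single compile-time value, but its smallest possible value is still a constant. The following is a minimal sketch of that idea, assuming a two-coefficient representation with non-negative coefficients; poly2u and the *_model helpers are hypothetical names, not GCC's constant_lower_bound itself.

```cpp
#include <cstdint>

// Simplified stand-in for a two-coefficient poly_uint64:
// value = coeffs[0] + coeffs[1] * x, for some runtime x >= 0.
struct poly2u
{
  uint64_t coeffs[2];
};

// The value is a compile-time constant only if the runtime term vanishes.
inline bool
is_constant_model (const poly2u &a)
{
  return a.coeffs[1] == 0;
}

// With non-negative coefficients the minimum is reached at x == 0, so the
// constant term is a valid lower bound for every x.
inline uint64_t
constant_lower_bound_model (const poly2u &a)
{
  return a.coeffs[0];
}
```

For vf = 4 + 4x the model reports "not constant" but still yields a lower bound of 4, which is a safe minimum for the recorded range of niters.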



[PATCH v3 6/6] RISC-V: autovec: Add autovectorization tests for add & sub

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Vineet Gupta 

* gcc.target/riscv/rvv/autovec: New directory
for autovectorization tests.
* gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
test to verify code generation of vector add on rv32.
* gcc.target/riscv/rvv/autovec/loop-add.c: New
test to verify code generation of vector add on rv64.
* gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
test to verify code generation of vector subtract on rv32.
* gcc.target/riscv/rvv/autovec/loop-sub.c: New
test to verify code generation of vector subtract on rv64.
---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] + b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)\
+  dst[i] = a[i] - b[i];\
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL() \
+ TEST_TYPE(int16_t)\
+ TEST_TYPE(uint16_t)   \
+ TEST_TYPE(int32_t)\
+ TEST_TYPE(uint32_t)   \
+ TEST_TYPE(int64_t)\
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include 
+
+#define TEST_TYPE(TYPE)\
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)\
+  {\
+for (int i = 0; i < n; i++)

[PATCH v3 5/6] RISC-V: autovec: Add autovectorization patterns for add & sub

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.md (riscv_vector_preferred_simd_mode): Include
vector-iterators.md.
* config/riscv/vector-auto.md: New file containing
autovectorization patterns.
* config/riscv/vector-iterators.md (UNSPEC_VADD/UNSPEC_VSUB):
New unspecs for autovectorization patterns.
* config/riscv/vector.md: Remove include of vector-iterators.md
and include vector-auto.md.
---
 gcc/config/riscv/riscv.md|   1 +
 gcc/config/riscv/vector-auto.md  | 172 +++
 gcc/config/riscv/vector-iterators.md |   2 +
 gcc/config/riscv/vector.md   |   4 +-
 4 files changed, 177 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6c3176042fb..a504ace72e5 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -131,6 +131,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")
 
 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..5227a73d96d
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,172 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; -
+;;  [INT] Addition
+;; -
+;; Includes:
+;; - vadd.vv
+;; - vadd.vx
+;; - vadd.vi
+;; -
+
+(define_expand "add<mode>3"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_arith_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (<MODE>mode);
+  rtx vl = emit_vlmax_vsetvl (<MODE>mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(<VM>mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add<mode>(operands[0], mask, merge, operands[1], operands[2],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+(define_expand "cond_add<mode>"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:<VM> 1 "register_operand")
+   (match_operand:VI 2 "register_operand")
+   (match_operand:VI 3 "vector_reg_or_const_dup_operand")
+   (match_operand:VI 4 "register_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = operands[4];
+  rtx vl = emit_vlmax_vsetvl (<MODE>mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = operands[1];
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add<mode>(operands[0], mask, merge, operands[2], operands[3],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
+(define_expand "len_add<mode>"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_reg_or_const_dup_operand")
+   (match_operand 3 "p_reg_or_const_csr_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = RVV_VUNDEF (<MODE>mode);
+  rtx vl = operands[3];
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(<VM>mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add<mode>(operands[0], mask, merge, operands[1], operands[2],
+   vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
+
+;; --

[PATCH v3 4/6] RISC-V: autovec: Add target vectorization hooks

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv.cc (riscv_option_override):
Set riscv_vectorization_factor.
(riscv_estimated_poly_value): Implement
TARGET_ESTIMATED_POLY_VALUE.
(riscv_preferred_simd_mode): Implement
TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
(riscv_autovectorize_vector_modes): Implement
TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.
(riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
(riscv_empty_mask_is_expensive): Implement
TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
(riscv_vectorize_create_costs): Implement
TARGET_VECTORIZE_CREATE_COSTS.
(TARGET_ESTIMATED_POLY_VALUE): Register target macro.
(TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
(TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
(TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
(TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
---
 gcc/config/riscv/riscv.cc | 156 ++
 1 file changed, 156 insertions(+)

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index befb9b498b7..1ca9f3c7ae4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,15 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -275,6 +284,9 @@ poly_uint16 riscv_vector_chunks;
 /* The number of bytes in a vector chunk.  */
 unsigned riscv_bytes_per_vector_chunk;
 
+/* Prefer vf for auto-vectorizer.  */
+unsigned riscv_vectorization_factor;
+
 /* Index R is the smallest register class that contains register R.  */
 const enum reg_class riscv_regno_to_class[FIRST_PSEUDO_REGISTER] = {
   GR_REGS, GR_REGS,GR_REGS,GR_REGS,
@@ -6199,6 +6211,10 @@ riscv_option_override (void)
 
   /* Convert -march to a chunks count.  */
   riscv_vector_chunks = riscv_convert_vector_bits ();
+
+  if (TARGET_VECTOR)
+riscv_vectorization_factor = riscv_vector_lmul;
+
 }
 
 /* Implement TARGET_CONDITIONAL_REGISTER_USAGE.  */
@@ -6893,6 +6909,128 @@ riscv_dwarf_poly_indeterminate_value (unsigned int i, 
unsigned int *factor,
   return RISCV_DWARF_VLENB;
 }
 
+/* Implement TARGET_ESTIMATED_POLY_VALUE.
+   Look into the tuning structure for an estimate.
+   KIND specifies the type of requested estimate: min, max or likely.
+   For cores with a known RVV width all three estimates are the same.
+   For generic RVV tuning we want to distinguish the maximum estimate from
+   the minimum and likely ones.
+   The likely estimate is the same as the minimum in that case to give a
+   conservative behavior of auto-vectorizing with RVV when it is a win
+   even for 128-bit RVV.
+   When RVV width information is available VAL.coeffs[1] is multiplied by
+   the number of VQ chunks over the initial Advanced SIMD 128 bits.  */
+
+static HOST_WIDE_INT
+riscv_estimated_poly_value (poly_int64 val,
+   poly_value_estimate_kind kind = POLY_VALUE_LIKELY)
+{
+  unsigned int width_source = BITS_PER_RISCV_VECTOR.is_constant ()
+? (unsigned int) BITS_PER_RISCV_VECTOR.to_constant ()
+: (unsigned int) RVV_SCALABLE;
+
+  /* If there is no core-specific information then the minimum and likely
+ values are based on 128-bit vectors and the maximum is based on
+ the architectural maximum of 2048 bits.  */
+  if (width_source == RVV_SCALABLE)
+switch (kind)
+  {
+  case POLY_VALUE_MIN:
+  case POLY_VALUE_LIKELY:
+   return val.coeffs[0];
+
+  case POLY_VALUE_MAX:
+   return val.coeffs[0] + val.coeffs[1] * 15;
+  }
+
+  /* Allow BITS_PER_RISCV_VECTOR to be a bitmask of different VL, treating the
+ lowest as likely.  This could be made more general if future -mtune
+ options need it to be.  */
+  if (kind == POLY_VALUE_MAX)
+width_source = 1 << floor_log2 (width_source);
+  else
+width_source = least_bit_hwi (width_source);
+
+  /* If the core provides width information, use that.  */
+  HOST_WIDE_INT over_128 = width_source - 128;
+  return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
+}
+
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.  */
+
+static machine_mode
+riscv_preferred_simd_mode (scalar_mode mode)
+{
+  machine_mode vmode =
+riscv_vector::riscv_vector_preferred_simd_mode (mode,
+   riscv_vectorization_factor);
+  if (VECTOR_MODE_P (

[PATCH v3 2/6] RISC-V: autovec: Export policy functions to global scope

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
Remove static declaration to make externally visible.
(get_mask_policy_for_pred): Ditto.
* config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
New external declaration.
(get_mask_policy_for_pred): Ditto.
---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 2d57086262b..352ffd8867d 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2448,7 +2448,7 @@ use_real_merge_p (enum predication_type_index pred)
 
 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2458,7 +2458,7 @@ get_tail_policy_for_pred (enum predication_type_index 
pred)
 
 /* Get MASK policy for predication. If predication indicates MU, return the MU.
Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index 8464aa9b7e9..d62d2bdab54 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -456,6 +456,8 @@ extern const char *const operand_suffixes[NUM_OP_TYPES];
 extern const rvv_builtin_suffixes type_suffixes[NUM_VECTOR_TYPES + 1];
 extern const char *const predication_suffixes[NUM_PRED_TYPES];
 extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);
 
 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
-- 
2.34.1

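The selection logic that the two exported functions perform can be sketched in isolation as follows. This is a simplified model under stated assumptions: Pred and Policy are hypothetical enums standing in for predication_type_index and the rtx policy values, and PreferredDefault stands for whatever "preferred default configuration" the real functions return.

```cpp
#include <cassert>

// Hypothetical mirror of the PRED_TYPE_* values tested in the hunks above.
enum class Pred { None, Tu, Tum, Tumu, Mu };

// Hypothetical policy result; the real functions return an rtx.
enum class Policy { Undisturbed, PreferredDefault };

// Declarations as they would appear in a shared header once the
// definitions are no longer static.
Policy tail_policy_for_pred(Pred pred);
Policy mask_policy_for_pred(Pred pred);

// TU, TUM and TUMU request tail-undisturbed; everything else falls
// back to the preferred default.
Policy tail_policy_for_pred(Pred pred) {
  if (pred == Pred::Tu || pred == Pred::Tum || pred == Pred::Tumu)
    return Policy::Undisturbed;
  return Policy::PreferredDefault;
}

// TUMU and MU request mask-undisturbed; everything else falls back
// to the preferred default.
Policy mask_policy_for_pred(Pred pred) {
  if (pred == Pred::Tumu || pred == Pred::Mu)
    return Policy::Undisturbed;
  return Policy::PreferredDefault;
}

// Convenience wrappers of the kind later patches in the series add.
Policy tail_policy_no_pred() { return tail_policy_for_pred(Pred::None); }
Policy mask_policy_no_pred() { return mask_policy_for_pred(Pred::None); }
```

Dropping the static qualifier is what lets code outside riscv-vector-builtins.cc, such as the autovec helpers, call these through the header declarations.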


[PATCH v3 3/6] RISC-V: autovec: Add auto-vectorization support functions

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-v.cc (riscv_classify_vlmul_field):
New function.
(riscv_vector_preferred_simd_mode): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
(riscv_tuple_mode_p): Ditto.
(riscv_classify_nf): Ditto.
(riscv_vlmul_regsize): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_vector_get_mask_mode): Ditto.
---
 gcc/config/riscv/riscv-v.cc | 176 
 1 file changed, 176 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 2d2de6e4a6c..d21bde1bda6 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"
 
 using namespace riscv_vector;
@@ -109,6 +111,41 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
  && IN_RANGE (INTVAL (elt), minval, maxval));
 }
 
+/* Return the vlmul field for a specific machine mode.  */
+unsigned int
+riscv_classify_vlmul_field (enum machine_mode mode)
+{
+  /* Make the decision based on the mode's enum value rather than its
+ properties, so that we keep the correct classification regardless
+ of -mriscv-vector-bits.  */
+  switch (mode)
+{
+case E_VNx8BImode:
+  return VLMUL_FIELD_111;
+
+case E_VNx4BImode:
+  return VLMUL_FIELD_110;
+
+case E_VNx2BImode:
+  return VLMUL_FIELD_101;
+
+case E_VNx16BImode:
+  return VLMUL_FIELD_000;
+
+case E_VNx32BImode:
+  return VLMUL_FIELD_001;
+
+case E_VNx64BImode:
+  return VLMUL_FIELD_010;
+
+default:
+  break;
+}
+
+  /* we don't care about VLMUL for Mask.  */
+  return VLMUL_FIELD_000;
+}
+
 rtx
 emit_vlmax_vsetvl (machine_mode vmode)
 {
@@ -163,6 +200,64 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }
 
+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf)
+{
+  if (!TARGET_VECTOR)
+return word_mode;
+
+  switch (mode)
+{
+case E_QImode:
+  return vf == 1   ? VNx8QImode
+: vf == 2 ? VNx16QImode
+: vf == 4 ? VNx32QImode
+  : VNx64QImode;
+  break;
+case E_HImode:
+  return vf == 1   ? VNx4HImode
+: vf == 2 ? VNx8HImode
+: vf == 4 ? VNx16HImode
+  : VNx32HImode;
+  break;
+case E_SImode:
+  return vf == 1   ? VNx2SImode
+: vf == 2 ? VNx4SImode
+: vf == 4 ? VNx8SImode
+  : VNx16SImode;
+  break;
+case E_DImode:
+  if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+   return vf == 1   ? VNx1DImode
+  : vf == 2 ? VNx2DImode
+  : vf == 4 ? VNx4DImode
+: VNx8DImode;
+  break;
+case E_SFmode:
+  if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+ && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+   return vf == 1   ? VNx2SFmode
+  : vf == 2 ? VNx4SFmode
+  : vf == 4 ? VNx8SFmode
+: VNx16SFmode;
+  break;
+case E_DFmode:
+  if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+   return vf == 1   ? VNx1DFmode
+  : vf == 2 ? VNx2DFmode
+  : vf == 4 ? VNx4DFmode
+: VNx8DFmode;
+  break;
+default:
+  break;
+}
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -375,6 +470,87 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }
 
+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV tuple mode.  */
+bool
+riscv_tuple_mode_p (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+/* Return nf for a machine mode.  */
+int
+riscv_classify_nf (machine_mode mode)
+{
+  switch (mode)
+{
+
+default:
+  break;
+}
+
+  return 1;
+}
+
+/* Return vlmul register size for a machine mode.  */
+int
+riscv_vlmul_regsize (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+return 1;
+  switch (riscv_classify_vlmul_field (mode))
+{
+case VLMUL_FIELD_001:
+  return 2;
+case VLMUL_FIELD_010:
+   

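The integer cases of riscv_vector_preferred_simd_mode above follow a regular pattern: the number of elements in the chosen VNx<N> mode is the vectorization factor times 64, divided by the element width. A minimal sketch of that arithmetic, assuming a hypothetical helper name and ignoring the ELEN and floating-point checks that make the real function fall back to word_mode:

```cpp
#include <cassert>

// Hypothetical helper: return N for the VNx<N> mode that the switch
// above would pick for an element of elem_bits bits.  As in the patch,
// any vf other than 1, 2 or 4 is treated as 8.
inline unsigned preferred_nunits(unsigned elem_bits, unsigned vf) {
  assert(elem_bits == 8 || elem_bits == 16 || elem_bits == 32 ||
         elem_bits == 64);
  unsigned factor = (vf == 1 || vf == 2 || vf == 4) ? vf : 8;
  return factor * 64 / elem_bits;
}
```

For example, QImode with vf == 1 gives 8 (VNx8QImode) and DImode with vf == 4 gives 4 (VNx4DImode), matching the table in the hunk.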
[PATCH v3 0/6] RISC-V: autovec: Add auto-vectorization support

2023-03-07 Thread Michael Collison
This series of patches adds foundational support for RISC-V
auto-vectorization. The patches are based on the current upstream RVV vector
intrinsic support and are not a new implementation. Most of the work consists
of adding the new vector cost model, the autovectorization patterns themselves
and target hooks. The implementation only supports integer addition and
subtraction as a proof of concept, so this patch set should not be construed
as feature complete. Based on conversations with the community, these patches
are intended to lay the groundwork for feature completion and collaboration
within the RISC-V community.

These patches are largely based on the work of Juzhe Zhong
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch at
https://github.com/riscv-collab/riscv-gcc.git is the foundation of this patch
set.

As discussed on this list, if these patches are approved they will be merged
into an "auto-vectorization" branch once gcc-13 branches for release. There are
two known issues related to crashes (assert failures) associated with tree
vectorization; I have sent a patch for one of them and have received feedback.

Changes in v3:

- Removed the cost model and cost hooks based on feedback from Richard Biener
- Used RVV_VUNDEF macro to fix failing patterns

Changes in v2:

- Updated ChangeLog entry to include RiVAI contributions 
- Fixed ChangeLog email formatting 
- Fixed gnu formatting issues in the code 

Michael Collison (6):
  RISC-V: Add new predicates and function prototypes
  RISC-V: autovec: Export policy functions to global scope
  RISC-V: autovec: Add auto-vectorization support functions
  RISC-V: autovec: Add target vectorization hooks
  RISC-V: autovec: Add autovectorization patterns for add & sub
  RISC-V: autovec: Add autovectorization tests for add & sub

 gcc/config/riscv/predicates.md|  13 ++
 gcc/config/riscv/riscv-opts.h |  40 
 gcc/config/riscv/riscv-protos.h   |  15 ++
 gcc/config/riscv/riscv-v.cc   | 178 +-
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   2 +
 gcc/config/riscv/riscv.cc | 156 +++
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/riscv.opt|  20 ++
 gcc/config/riscv/vector-auto.md   | 172 +
 gcc/config/riscv/vector-iterators.md  |   2 +
 gcc/config/riscv/vector.md|   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c |  24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  24 +++
 16 files changed, 698 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

-- 
2.34.1



[PATCH v3 1/6] RISC-V: autovec: Add new predicates and function prototypes

2023-03-07 Thread Michael Collison
2023-03-02  Michael Collison  
Juzhe Zhong  

* config/riscv/riscv-protos.h (riscv_classify_vlmul_field):
New external declaration.
(riscv_vector_preferred_simd_mode): Ditto.
(riscv_tuple_mode_p): Ditto.
(riscv_vector_mask_mode_p): Ditto.
(riscv_classify_nf): Ditto.
(riscv_vlmul_regsize): Ditto.
(riscv_vector_get_mask_mode): Ditto.
(emit_vlmax_vsetvl): Ditto.
(get_mask_policy_no_pred): Ditto.
(get_tail_policy_no_pred): Ditto.
* config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
(riscv_vector_lmul_enum): Ditto.
(vlmul_field_enum): Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
Remove static scope.
* config/riscv/riscv.opt (riscv_vector_lmul):
New option -mriscv_vector_lmul.
* config/riscv/predicates.md (p_reg_or_const_csr_operand):
New predicate.
(vector_reg_or_const_dup_operand): Ditto.
---
 gcc/config/riscv/predicates.md  | 13 +++
 gcc/config/riscv/riscv-opts.h   | 40 +
 gcc/config/riscv/riscv-protos.h | 15 +
 gcc/config/riscv/riscv-v.cc |  2 +-
 gcc/config/riscv/riscv.opt  | 20 +
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 0d9d7701c7e..19aa5e12920 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })
 
 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
(match_operand 0 "const_csr_operand")))
@@ -291,6 +299,11 @@
   (and (match_code "const_vector")
        (match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask (GET_MODE (op)))")))
 
+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+   && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
(match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff398c0a2ae..c6b6d84fce4 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,46 @@ enum stack_protector_guard {
   SSP_GLOBAL   /* global canary */
 };
 
+/* RVV vector register sizes.  */
+enum riscv_vector_bits_enum
+{
+  RVV_SCALABLE,
+  RVV_NOT_IMPLEMENTED = RVV_SCALABLE,
+  RVV_64 = 64,
+  RVV_128 = 128,
+  RVV_256 = 256,
+  RVV_512 = 512,
+  RVV_1024 = 1024,
+  RVV_2048 = 2048,
+  RVV_4096 = 4096,
+  RVV_8192 = 8192,
+  RVV_16384 = 16384,
+  RVV_32768 = 32768,
+  RVV_65536 = 65536
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1.  */
+  VLMUL_FIELD_001, /* LMUL = 2.  */
+  VLMUL_FIELD_010, /* LMUL = 4.  */
+  VLMUL_FIELD_011, /* LMUL = 8.  */
+  VLMUL_FIELD_100, /* RESERVED.  */
+  VLMUL_FIELD_101, /* LMUL = 1/8.  */
+  VLMUL_FIELD_110, /* LMUL = 1/4.  */
+  VLMUL_FIELD_111, /* LMUL = 1/2.  */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)
 
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 88a6bf5442f..6a486a1cd61 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -217,4 +217,19 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 /* Mask that selects the riscv_builtin_class part of a function code.  */
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;
 
+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern unsigned int riscv_classify_vlmul_field (enum machine_mode m);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode,
+ unsigned vf);
+extern bool riscv_tuple_mode_p (machine_mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern int riscv_classify_nf (machine_mode);
+extern int riscv_vlmul_regsize (machine_mode);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx emit_vlmax_vsetvl (machine_mode vmode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/con

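The comments on vlmul_field_enum above give the mapping from the 3-bit field to a register-group multiplier. A small decoder makes that table explicit. It is only a sketch: the Lmul struct and function name are hypothetical, and the field is passed as a plain unsigned value rather than the enum.

```cpp
#include <cassert>

// Hypothetical result type: LMUL as a fraction num/den, plus a flag
// for the reserved encoding.
struct Lmul {
  int num;
  int den;
  bool valid;
};

// Decode a 3-bit vlmul field following the enum comments above:
// 000..011 are LMUL 1, 2, 4, 8; 100 is reserved; 101..111 are the
// fractional LMULs 1/8, 1/4, 1/2.
inline Lmul decode_vlmul_field(unsigned field) {
  switch (field) {
    case 0: return Lmul{1, 1, true};
    case 1: return Lmul{2, 1, true};
    case 2: return Lmul{4, 1, true};
    case 3: return Lmul{8, 1, true};
    case 5: return Lmul{1, 8, true};
    case 6: return Lmul{1, 4, true};
    case 7: return Lmul{1, 2, true};
    default: return Lmul{0, 1, false};  // 100 (reserved) or out of range
  }
}
```

Because the enumerators are declared in encoding order, VLMUL_FIELD_000 through VLMUL_FIELD_111 take the values 0 through 7 and can be passed to such a decoder directly.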
Re: [PATCH v2 00/07] RISC-V: autovec: Add auto-vectorization support

2023-03-05 Thread Michael Collison
Thanks for the feedback, will try that next time.

Michael Collison


> On Mar 5, 2023, at 11:06 PM, Xi Ruoyao  wrote:
> 
> On Sun, 2023-03-05 at 22:13 -0500, Michael Collison wrote:
> 
> /* snip */
> 
>> - Fixed ChangeLog email formatting
> 
> Unfortunately it's not fixed.  We expect one tab, but now you have 16
> whitespaces.
> 
> To me it looks like your email client is being too smart and destroying
> the patch.  Try "git send-email", which is much easier to configure
> correctly.
> 
> -- 
> Xi Ruoyao 
> School of Aerospace Science and Technology, Xidian University


[PATCH v2 07/07] RISC-V: autovec: Add autovectorization patterns for add & sub

2023-03-05 Thread Michael Collison

This patch adds tests for autovectorization of integer add and subtract.

gcc/testsuite/ChangeLog:

2023-03-02  Michael Collison 
                Vineet Gupta 

                * gcc.target/riscv/rvv/autovec: New directory
            for autovectorization tests.
            * gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
            test to verify code generation of vector add on rv32.
            * gcc.target/riscv/rvv/autovec/loop-add.c: New
            test to verify code generation of vector add on rv64.
            * gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
            test to verify code generation of vector subtract on rv32.
            * gcc.target/riscv/rvv/autovec/loop-sub.c: New
            test to verify code generation of vector subtract on rv64.

---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] + b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] + b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] - b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] - b[i];                \
+  }
+
+/* *int8_t not autovec curren

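The tests above are compile-only: they count vadd.vv or vsub.vv instructions, one per TEST_TYPE instantiation, which is where the expected count of 6 comes from. A run-time companion is not part of this patch, but one could be sketched along the following lines. The macro mirrors the one in the tests; check_vadd_int32 and its data are hypothetical additions for illustration.

```cpp
#include <cassert>
#include <stdint.h>

// Same shape as the TEST_TYPE macro in the tests above: one scalar
// loop per element type, which the vectorizer is expected to turn
// into a single vector add.
#define TEST_TYPE(TYPE)                                             \
  void vadd_##TYPE(TYPE *dst, const TYPE *a, const TYPE *b, int n)  \
  {                                                                 \
    for (int i = 0; i < n; i++)                                     \
      dst[i] = a[i] + b[i];                                         \
  }

// Six instantiations, matching the count in the dg-final directive.
#define TEST_ALL()    \
  TEST_TYPE(int16_t)  \
  TEST_TYPE(uint16_t) \
  TEST_TYPE(int32_t)  \
  TEST_TYPE(uint32_t) \
  TEST_TYPE(int64_t)  \
  TEST_TYPE(uint64_t)

TEST_ALL()

// Hypothetical run-time check: compare the loop result against a
// per-element reference computation.
inline bool check_vadd_int32() {
  const int32_t a[4] = {1, 2, 3, 4};
  const int32_t b[4] = {10, 20, 30, 40};
  int32_t dst[4] = {0, 0, 0, 0};
  vadd_int32_t(dst, a, b, 4);
  for (int i = 0; i < 4; i++)
    if (dst[i] != a[i] + b[i])
      return false;
  return true;
}
```

A check of this kind verifies the result of the loop regardless of whether it was vectorized, so it complements rather than replaces the scan-assembler-times directive.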
[PATCH V2 06/07] RISC-V: autovec: Add autovectorization patterns for add & sub

2023-03-05 Thread Michael Collison
This patch adds patterns that provide basic autovectorization support 
for integer adds and subtracts.


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

                * config/riscv/riscv.md (riscv_vector_preferred_simd_mode):
                Include vector-iterators.md.
                * config/riscv/vector-auto.md: New file containing
                autovectorization patterns.
                * config/riscv/vector-iterators.md (UNSPEC_VADD/UNSPEC_VSUB):
                New unspecs for autovectorization patterns.
                * config/riscv/vector.md: Remove include of
                vector-iterators.md and include vector-auto.md.

---
 gcc/config/riscv/riscv.md    |   1 +
 gcc/config/riscv/vector-auto.md  | 172 +++
 gcc/config/riscv/vector-iterators.md |   2 +
 gcc/config/riscv/vector.md   |   4 +-
 4 files changed, 177 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 6c3176042fb..a504ace72e5 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -131,6 +131,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")

 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..e5a19663d18
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,172 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
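+;; -------------------------------------------------------------------------
+;;  [INT] Addition
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vadd.vv
+;; - vadd.vx
+;; - vadd.vi
+;; -------------------------------------------------------------------------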
+
+(define_expand "add<mode>3"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_arith_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = gen_rtx_UNSPEC (<MODE>mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
+  rtx vl = emit_vlmax_vsetvl (<MODE>mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(<VM>mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add<mode>(operands[0], mask, merge, operands[1], operands[2],
+                vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+(define_expand "cond_add<mode>"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:<VM> 1 "register_operand")
+   (match_operand:VI 2 "register_operand")
+   (match_operand:VI 3 "vector_reg_or_const_dup_operand")
+   (match_operand:VI 4 "register_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = operands[4];
+  rtx vl = emit_vlmax_vsetvl (<MODE>mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = operands[1];
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add<mode>(operands[0], mask, merge, operands[2], operands[3],
+                vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
+(define_expand "len_add<mode>"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_reg_or_const_dup_operand")
+   (match_operand 3 "p_reg_or_const_csr_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = gen_rtx_UNSPEC (<MODE>mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
+  rtx vl = operands[3];
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx 

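The three expanders above differ only in how they limit which elements are written: the plain add covers every element, the conditional form selects per element between the sum and a fallback operand, and the length form stops after a given element count. A scalar model of those semantics, as a sketch only: the function names are hypothetical, and leaving elements past the length untouched is an assumption, since the pattern passes an undefined merge operand.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;
using Mask = std::vector<bool>;

// Unpredicated add: every element is written.
Vec model_add(const Vec &a, const Vec &b) {
  assert(a.size() == b.size());
  Vec dst(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    dst[i] = a[i] + b[i];
  return dst;
}

// Conditional add: where the mask is set take the sum, otherwise
// take the fallback element (operand 4, used as the merge value).
Vec model_cond_add(const Mask &mask, const Vec &a, const Vec &b,
                   const Vec &fallback) {
  assert(mask.size() == a.size() && a.size() == b.size() &&
         b.size() == fallback.size());
  Vec dst(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    dst[i] = mask[i] ? a[i] + b[i] : fallback[i];
  return dst;
}

// Length-limited add: only the first len elements are written.
// Assumption: the remaining elements of dst are left as they were.
void model_len_add(Vec &dst, const Vec &a, const Vec &b, std::size_t len) {
  assert(len <= dst.size() && len <= a.size() && len <= b.size());
  for (std::size_t i = 0; i < len; ++i)
    dst[i] = a[i] + b[i];
}
```

In the patterns themselves the same pred_add instruction serves all three cases; only the mask, merge and vl operands passed to it change.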
[PATCH v2 05/07] RISC-V: autovec: Add tuning and target vectorization hooks

2023-03-05 Thread Michael Collison
This patch adds support for registering target hooks for basic 
autovectorization support as well as basic tuning information for the 
vector extension.


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * config/riscv/riscv-cores.def (RISCV_TUNE):
            Add VECTOR_TUNE_INFO parameter.
            * common/config/riscv/riscv-common.cc (RISCV_TUNE):
            Add VECTOR_TUNE_INFO parameter.
            * config/riscv/riscv.cc (riscv_vector_tune_param):
            New struct for vector tuning information.
            (riscv_tune_info): Add vector_tune_param.
            (vector_tune_param): New static variable.
            (riscv_vectorization_factor): New variable.
            (generic_rvv_insn_scale_table): New struct.
            (generic_rvv_stmt_scale_table): New struct.
            (generic_rvv_insn_cost_table): New vector insn cost table.
            (generic_rvv_stmt_cost_table): New vector statement
            cost table.
            (generic_rvv_tune_info): New rvv tuning table.
            (RISCV_TUNE): Add VECTOR_TUNE_INFO parameter.
            (riscv_rtx_costs): Return vector estimate if vector mode.
            (riscv_option_override): Set vector_tune_param.
            (riscv_option_override): Set riscv_vectorization_factor.
            (riscv_estimated_poly_value): Implement
            TARGET_ESTIMATED_POLY_VALUE.
            (riscv_preferred_simd_mode): Implement
            TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
        (riscv_autovectorize_vector_modes): Implement
        TARGET_AUTOVECTORIZE_VECTOR_MODES.
        (riscv_get_mask_mode): Implement
        TARGET_VECTORIZE_GET_MASK_MODE.
        (riscv_empty_mask_is_expensive): Implement
        TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
        (riscv_builtin_vectorization_cost): Implement
        TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.
        (riscv_vectorize_create_costs): Implement
        TARGET_VECTORIZE_CREATE_COSTS.
        (TARGET_ESTIMATED_POLY_VALUE): Register target macro.
        (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Ditto.
           (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
        (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
        (TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
        (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
        (TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
        (TARGET_VECTORIZE_CREATE_COSTS): Ditto.

---
 gcc/common/config/riscv/riscv-common.cc |   2 +-
 gcc/config/riscv/riscv-cores.def    |  14 +-
 gcc/config/riscv/riscv.cc   | 324 +++-
 3 files changed, 328 insertions(+), 12 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index ebc1ed7d7e4..6b8d92af986 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -246,7 +246,7 @@ static const riscv_cpu_info riscv_cpu_tables[] =

 static const char *riscv_tunes[] =
 {
-#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO)    \
 TUNE_NAME,
 #include "../../../config/riscv/riscv-cores.def"
 NULL
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 2a834cae21d..4feb0366222 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -30,15 +30,15 @@
    identifier, reference to riscv.cc.  */

 #ifndef RISCV_TUNE
-#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO)
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO)
 #endif

-RISCV_TUNE("rocket", generic, rocket_tune_info)
-RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
-RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
-RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
-RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
-RISCV_TUNE("size", generic, optimize_size_tune_info)
+RISCV_TUNE("rocket", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-3-series", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-5-series", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("thead-c906", generic, thead_c906_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("size", generic, optimize_size_tune_info, generic_rvv_tune_info)

 #undef RISCV_TUNE

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index befb9b498b7..44659062070 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,16 @@ along with GCC; see the file COPYING3.  If not see
 

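Adding a fourth parameter to RISCV_TUNE touches every user of riscv-cores.def because the file is consumed as an X-macro: each includer defines RISCV_TUNE to pick out the fields it needs. The mechanism can be sketched in a single file as below. TUNE_LIST is a hypothetical inline substitute for including the .def file, and only three of the entries are reproduced.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for riscv-cores.def: one X(...) per tuning
// entry, with the new fourth VECTOR_TUNE_INFO argument.
#define TUNE_LIST(X)                                                        \
  X("rocket", generic, rocket_tune_info, generic_rvv_tune_info)             \
  X("sifive-7-series", sifive_7, sifive_7_tune_info, generic_rvv_tune_info) \
  X("size", generic, optimize_size_tune_info, generic_rvv_tune_info)

// A consumer that only wants the names, as riscv-common.cc does.
// It must still accept all four arguments, which is why the macro
// definition there gains a parameter even though it ignores it.
static const char *const tune_names[] = {
#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO) \
  TUNE_NAME,
    TUNE_LIST(RISCV_TUNE)
#undef RISCV_TUNE
    nullptr};

// Count the entries before the terminating null pointer.
inline int tune_count() {
  int n = 0;
  while (tune_names[n] != nullptr)
    ++n;
  return n;
}
```

The unused arguments are never expanded, so identifiers such as rocket_tune_info need not be declared in a translation unit that only extracts the names.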
[PATCH v2 04/07] RISC-V: autovec: Add auto-vectorization support functions

2023-03-05 Thread Michael Collison
This patch adds support for functions used in implementing various 
portions of autovectorization support.


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * config/riscv/riscv-v.cc (riscv_classify_vlmul_field):
            New function.
            (riscv_vector_preferred_simd_mode): Ditto.
            (get_mask_policy_no_pred): Ditto.
            (get_tail_policy_no_pred): Ditto.
            (riscv_tuple_mode_p): Ditto.
            (riscv_classify_nf): Ditto.
            (riscv_vlmul_regsize): Ditto.
            (riscv_vector_mask_mode_p): Ditto.
            (riscv_vector_get_mask_mode): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 176 
 1 file changed, 176 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 2d2de6e4a6c..c9a0d6b4c06 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -38,10 +38,12 @@
 #include "memmodel.h"
 #include "emit-rtl.h"
 #include "tm_p.h"
+#include "targhooks.h"
 #include "target.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"
 #include "rtx-vector-builder.h"

 using namespace riscv_vector;
@@ -109,6 +111,41 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
   && IN_RANGE (INTVAL (elt), minval, maxval));
 }

+/* Return the vlmul field for a specific machine mode.  */
+unsigned int
+riscv_classify_vlmul_field (enum machine_mode mode)
+{
+  /* Make the decision based on the mode's enum value rather than its
+ properties, so that we keep the correct classification regardless
+ of -mriscv-vector-bits.  */
+  switch (mode)
+    {
+    case E_VNx8BImode:
+  return VLMUL_FIELD_111;
+
+    case E_VNx4BImode:
+  return VLMUL_FIELD_110;
+
+    case E_VNx2BImode:
+  return VLMUL_FIELD_101;
+
+    case E_VNx16BImode:
+  return VLMUL_FIELD_000;
+
+    case E_VNx32BImode:
+  return VLMUL_FIELD_001;
+
+    case E_VNx64BImode:
+  return VLMUL_FIELD_010;
+
+    default:
+  break;
+    }
+
+  /* we don't care about VLMUL for Mask.  */
+  return VLMUL_FIELD_000;
+}
+
 rtx
 emit_vlmax_vsetvl (machine_mode vmode)
 {
@@ -163,6 +200,64 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }

+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf)
+{
+  if (!TARGET_VECTOR)
+    return word_mode;
+
+  switch (mode)
+    {
+    case E_QImode:
+  return vf == 1   ? VNx8QImode
+     : vf == 2 ? VNx16QImode
+     : vf == 4 ? VNx32QImode
+           : VNx64QImode;
+  break;
+    case E_HImode:
+  return vf == 1   ? VNx4HImode
+     : vf == 2 ? VNx8HImode
+     : vf == 4 ? VNx16HImode
+           : VNx32HImode;
+  break;
+    case E_SImode:
+  return vf == 1   ? VNx2SImode
+     : vf == 2 ? VNx4SImode
+     : vf == 4 ? VNx8SImode
+           : VNx16SImode;
+  break;
+    case E_DImode:
+  if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+      && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+    return vf == 1     ? VNx1DImode
+       : vf == 2 ? VNx2DImode
+       : vf == 4 ? VNx4DImode
+             : VNx8DImode;
+  break;
+    case E_SFmode:
+  if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+      && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+    return vf == 1     ? VNx2SFmode
+       : vf == 2 ? VNx4SFmode
+       : vf == 4 ? VNx8SFmode
+             : VNx16SFmode;
+  break;
+    case E_DFmode:
+  if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+    return vf == 1     ? VNx1DFmode
+       : vf == 2 ? VNx2DFmode
+       : vf == 4 ? VNx4DFmode
+             : VNx8DFmode;
+  break;
+    default:
+  break;
+    }
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -375,6 +470,87 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }

+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred (PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred (PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV tuple mode.  */
+bool
+riscv_tuple_mode_p (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+/* Return nf for a machine mode.  */
+int
+riscv_classify_nf (machine_mode mode)
+{
+  switch (mode)
+    {
+
+    default:
+  break;
+    }
+
+  return 1;
+}
+
+/* Return vlmul register size for a machine mode.  */
+int
+riscv_vlmul_regsize (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) == MODE_VE

[PATCH v2 03/07] RISC-V: autovec: Add vector cost model

2023-03-05 Thread Michael Collison
This patch adds two new files to support the vector cost model and 
modifies the Makefile fragment to build the cost model C++ file. Due to 
its large size, this patch is provided as an attachment.


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * gcc/config.gcc (riscv-vector-cost.o): New object file to build.
            * config/riscv/riscv-vector-cost.cc: New file for riscv vector
            cost model.
            * config/riscv/riscv-vector-cost.h: New header file for riscv
            vector cost model.
            * config/riscv/t-riscv: Add make rule for riscv-vector-cost.o.
From c606f674114a362ba0299caf160b23a98f37c898 Mon Sep 17 00:00:00 2001
From: Michael Collison 
Date: Sun, 5 Mar 2023 17:53:42 -0500
Subject: [PATCH] RISC-V: Add vector cost model

---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-vector-cost.cc | 689 ++
 gcc/config/riscv/riscv-vector-cost.h  | 481 ++
 gcc/config/riscv/t-riscv  |   5 +
 4 files changed, 1176 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-vector-cost.cc
 create mode 100644 gcc/config/riscv/riscv-vector-cost.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index da3a6d3ba1f..4a260572a3d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -530,7 +530,7 @@ pru-*-*)
 riscv*)
 	cpu_type=riscv
 	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
-	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
+	extra_objs="${extra_objs} riscv-vector-cost.o riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	d_target_objs="riscv-d.o"
 	extra_headers="riscv_vector.h"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
diff --git a/gcc/config/riscv/riscv-vector-cost.cc b/gcc/config/riscv/riscv-vector-cost.cc
new file mode 100644
index 000..4abd0e54da0
--- /dev/null
+++ b/gcc/config/riscv/riscv-vector-cost.cc
@@ -0,0 +1,689 @@
+/* Cost model implementation for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_STRING
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "regs.h"
+#include "insn-config.h"
+#include "insn-attr.h"
+#include "recog.h"
+#include "rtlanal.h"
+#include "output.h"
+#include "alias.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "varasm.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "function.h"
+#include "explow.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "reload.h"
+#include "tm_p.h"
+#include "target.h"
+#include "basic-block.h"
+#include "expr.h"
+#include "optabs.h"
+#include "bitmap.h"
+#include "df.h"
+#include "diagnostic.h"
+#include "builtins.h"
+#include "predict.h"
+#include "tree-pass.h"
+#include "opts.h"
+#include "langhooks.h"
+#include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "tree-vectorizer.h"
+#include "tree-ssa-loop-niter.h"
+#include "riscv-vector-builtins.h"
+
+/* This file should be included last.  */
+#include "riscv-vector-cost.h"
+#include "target-def.h"
+
+bool
+vector_insn_cost_table::get_cost (rtx x, machine_mode mode, int *cost,
+  bool speed) const
+{
+  rtx op0, op1, op2;
+  enum rtx_code code = GET_CODE (x);
+  scalar_int_mode int_mode;
+
+  /* By default, assume that

[PATCH v2 02/07] RISC-V: autovec: Export policy functions to global scope

2023-03-05 Thread Michael Collison
This patch adds foundational support by making two functions that handle 
predication policies globally visible.
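
For readers who do not have riscv-vector-builtins.cc at hand, the selection
logic of the two exported functions can be modelled in plain C. This is only a
sketch: the enum names below are hypothetical stand-ins for
predication_type_index, and the real functions return an rtx constant rather
than an enum value. The conditions mirror the ones visible in the diff context
below.

```c
#include <assert.h>

/* Hypothetical stand-ins for GCC's predication_type_index values.  */
enum pred_type { PRED_none, PRED_m, PRED_tu, PRED_tum, PRED_tumu, PRED_mu };

/* Hypothetical result type; the real functions build an rtx.  */
enum policy { POLICY_UNDISTURBED, POLICY_DEFAULT };

/* Tail policy: tail-undisturbed only for tu, tum and tumu.  */
static enum policy
tail_policy_for_pred (enum pred_type pred)
{
  if (pred == PRED_tu || pred == PRED_tum || pred == PRED_tumu)
    return POLICY_UNDISTURBED;
  return POLICY_DEFAULT;
}

/* Mask policy: mask-undisturbed only for tumu and mu.  */
static enum policy
mask_policy_for_pred (enum pred_type pred)
{
  if (pred == PRED_tumu || pred == PRED_mu)
    return POLICY_UNDISTURBED;
  return POLICY_DEFAULT;
}
```

With no predication (PRED_none in this sketch) both functions fall through to
the default configuration, which is the case the autovectorization patterns
later in the series rely on.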


gcc/ChangeLog:

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * config/riscv/riscv-vector-builtins.cc
            (get_tail_policy_for_pred): Remove static declaration to make
            externally visible.
            (get_mask_policy_for_pred): Ditto.
            * config/riscv/riscv-vector-builtins.h
            (get_tail_policy_for_pred): New external declaration.
            (get_mask_policy_for_pred): Ditto.

---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 2d57086262b..352ffd8867d 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -2448,7 +2448,7 @@ use_real_merge_p (enum predication_type_index pred)

 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
    Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -2458,7 +2458,7 @@ get_tail_policy_for_pred (enum predication_type_index pred)

 /* Get MASK policy for predication. If predication indicates MU, return the MU.
    Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index 8464aa9b7e9..d62d2bdab54 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -456,6 +456,8 @@ extern const char *const operand_suffixes[NUM_OP_TYPES];
 extern const rvv_builtin_suffixes type_suffixes[NUM_VECTOR_TYPES + 1];
 extern const char *const predication_suffixes[NUM_PRED_TYPES];
 extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);

 inline tree
 rvv_arg_type_info::get_scalar_type (vector_type_index type_idx) const
--
2.34.1




[PATCH v2 01/07] RISC-V: autovec: Add new predicates and function prototypes

2023-03-05 Thread Michael Collison

This patch adds foundational support in the form of:

1. New predicates

2. New function prototypes

3. Exporting emit_vlmax_vsetvl to global scope

4. Adding a new command line option -mriscv_vector_lmul
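
As background for the vlmul_field_enum added in riscv-opts.h below, here is a
small C sketch of how the 3-bit vlmul field maps to an LMUL value, following
the comments on the enum in this patch. The struct and function names are
hypothetical and are not part of the patch.

```c
#include <assert.h>

/* LMUL expressed as a fraction num/den; {0, 0} marks the reserved
   encoding.  Hypothetical helper type, not part of the patch.  */
struct lmul
{
  int num;
  int den;
};

/* Decode a 3-bit vlmul field:
   000 -> 1, 001 -> 2, 010 -> 4, 011 -> 8,
   100 -> reserved,
   101 -> 1/8, 110 -> 1/4, 111 -> 1/2.  */
static struct lmul
decode_vlmul (unsigned field)
{
  struct lmul r = { 0, 0 };

  assert (field < 8);
  if (field < 4)
    {
      /* Integer LMUL: 1, 2, 4 or 8.  */
      r.num = 1 << field;
      r.den = 1;
    }
  else if (field > 4)
    {
      /* Fractional LMUL: 1/8, 1/4 or 1/2.  */
      r.num = 1;
      r.den = 1 << (8 - field);
    }
  return r;
}
```

For example, decode_vlmul (7) yields 1/2 and decode_vlmul (3) yields 8, which
matches the VLMUL_FIELD_111 and VLMUL_FIELD_011 comments.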

2023-03-02  Michael Collison 
                Juzhe Zhong 

            * config/riscv/riscv-protos.h (riscv_classify_vlmul_field):
            New external declaration.
            (riscv_vector_preferred_simd_mode): Ditto.
            (riscv_tuple_mode_p): Ditto.
            (riscv_vector_mask_mode_p): Ditto.
            (riscv_classify_nf): Ditto.
            (riscv_vlmul_regsize): Ditto.
            (riscv_vector_preferred_simd_mode): Ditto.
            (riscv_vector_get_mask_mode): Ditto.
            (emit_vlmax_vsetvl): Ditto.
            (get_mask_policy_no_pred): Ditto.
            (get_tail_policy_no_pred): Ditto.
            * config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
            (riscv_vector_lmul_enum): Ditto.
            (vlmul_field_enum): Ditto.
            * config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
            Remove static scope.
            * config/riscv/riscv.opt (riscv_vector_lmul):
            New option -mriscv_vector_lmul.
            * config/riscv/predicates.md (p_reg_or_const_csr_operand):
            New predicate.
            (vector_reg_or_const_dup_operand): Ditto.

---
 gcc/config/riscv/predicates.md  | 13 +++
 gcc/config/riscv/riscv-opts.h   | 40 +
 gcc/config/riscv/riscv-protos.h | 15 +
 gcc/config/riscv/riscv-v.cc |  2 +-
 gcc/config/riscv/riscv.opt  | 20 +
 5 files changed, 89 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 0d9d7701c7e..19aa5e12920 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })

 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+    return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
    (match_operand 0 "const_csr_operand")))
@@ -291,6 +299,11 @@
   (and (match_code "const_vector")
   (match_test "rtx_equal_p (op, riscv_vector::gen_scalar_move_mask (GET_MODE (op)))")))

+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+   && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
    (match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff398c0a2ae..c6b6d84fce4 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,46 @@ enum stack_protector_guard {
   SSP_GLOBAL            /* global canary */
 };

+/* RVV vector register sizes.  */
+enum riscv_vector_bits_enum
+{
+  RVV_SCALABLE,
+  RVV_NOT_IMPLEMENTED = RVV_SCALABLE,
+  RVV_64 = 64,
+  RVV_128 = 128,
+  RVV_256 = 256,
+  RVV_512 = 512,
+  RVV_1024 = 1024,
+  RVV_2048 = 2048,
+  RVV_4096 = 4096,
+  RVV_8192 = 8192,
+  RVV_16384 = 16384,
+  RVV_32768 = 32768,
+  RVV_65536 = 65536
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1.  */
+  VLMUL_FIELD_001, /* LMUL = 2.  */
+  VLMUL_FIELD_010, /* LMUL = 4.  */
+  VLMUL_FIELD_011, /* LMUL = 8.  */
+  VLMUL_FIELD_100, /* RESERVED.  */
+  VLMUL_FIELD_101, /* LMUL = 1/8.  */
+  VLMUL_FIELD_110, /* LMUL = 1/4.  */
+  VLMUL_FIELD_111, /* LMUL = 1/2.  */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 88a6bf5442f..6a486a1cd61 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -217,4 +217,19 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 /* Mask that selects the riscv_builtin_class part of a function code.  */
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;

+/* Routines implemented in riscv-v.cc.  */
+
+namespace riscv_vector {
+extern unsigned int riscv_classify_vlmul_field (enum machine_mode m);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode,
+                          unsigned vf);
+extern bool riscv_tuple_mode_p (machine_mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern 

[PATCH v2 00/07] RISC-V: autovec: Add auto-vectorization support

2023-03-05 Thread Michael Collison
This series of patches adds foundational support for RISC-V 
autovectorization. These patches are based on the current upstream rvv 
vector intrinsic support and are not a new implementation. Most of the 
implementation consists of adding the new vector cost model, the 
autovectorization patterns themselves and target hooks.

This implementation only provides support for integer addition and 
subtraction as a proof of concept. This patch set should not be 
construed to be feature complete. Based on conversations with the 
community, these patches are intended to lay the groundwork for feature 
completion and collaboration within the RISC-V community.

In version 1 of this patch submission I neglected to indicate that these 
patches are largely based on the work of Juzhe Zhong 
(juzhe.zh...@rivai.ai) of RiVAI. More specifically, the rvv-next branch 
at https://github.com/riscv-collab/riscv-gcc.git is the foundation of 
this patch set. I want to publicly apologize to Juzhe and RiVAI for not 
attributing their work visibly and publicly.

As discussed on this list, if these patches are approved they will be 
merged into an "auto-vectorization" branch once gcc-13 branches for 
release.

There are two known issues related to crashes (assert failures) 
associated with tree vectorization; for one of them I have sent a patch 
and received feedback.


Changes in v2

- Updated ChangeLog entry to include RiVAI contributions

- Fixed ChangeLog email formatting

- Fixed gnu formatting issues in the code





[PATCH 07/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison

This patch adds tests for autovectorization of integer add and subtract.
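
To make the shape of these tests concrete, the following self-contained C
sketch shows what one TEST_TYPE instantiation from the new files expands to,
together with a small run-time check. The check function is hypothetical and
not part of the patch; the actual tests are compile-only and scan the
generated assembly for vadd.vv and vsub.vv.

```c
#include <assert.h>
#include <stdint.h>

/* Same loop shape as the TEST_TYPE macro in the new test files.  */
#define TEST_TYPE(TYPE)                                     \
  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)     \
  {                                                         \
    for (int i = 0; i < n; i++)                             \
      dst[i] = a[i] + b[i];                                 \
  }

TEST_TYPE (int32_t)

/* Hypothetical run-time check: returns 1 if the element-wise sums are
   correct, 0 otherwise.  */
static int
check_vadd_int32 (void)
{
  int32_t a[16], b[16], dst[16];

  for (int i = 0; i < 16; i++)
    {
      a[i] = i;
      b[i] = 2 * i;
    }

  vadd_int32_t (dst, a, b, 16);

  for (int i = 0; i < 16; i++)
    if (dst[i] != 3 * i)
      return 0;
  return 1;
}
```

The loop has no cross-iteration dependence, which is why it is a natural
candidate for the vectorizer.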

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec: New directory
    for autovectorization tests.
    * gcc.target/riscv/rvv/autovec/loop-add-rv32.c: New
    test to verify code generation of vector add on rv32.
    * gcc.target/riscv/rvv/autovec/loop-add.c: New
    test to verify code generation of vector add on rv64.
    * gcc.target/riscv/rvv/autovec/loop-sub-rv32.c: New
    test to verify code generation of vector subtract on rv32.
    * gcc.target/riscv/rvv/autovec/loop-sub.c: New
    test to verify code generation of vector subtract on rv64.

---
 .../riscv/rvv/autovec/loop-add-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-add.c   | 24 +++
 .../riscv/rvv/autovec/loop-sub-rv32.c | 24 +++
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   | 24 +++
 4 files changed, 96 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
new file mode 100644
index 000..bdc3b6892e9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] + b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
new file mode 100644
index 000..d7f992c7d27
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] + b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvadd\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
new file mode 100644
index 000..7d0a40ec539
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv32gcv -mabi=ilp32d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] - b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { scan-assembler-times {\tvsub\.vv} 6 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
new file mode 100644
index 000..c8900884f83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -ftree-vectorize -march=rv64gcv -mabi=lp64d" } */
+
+#include <stdint.h>
+
+#define TEST_TYPE(TYPE)                 \
+  void vadd_##TYPE (TYPE *dst, TYPE *a, TYPE *b, int n)    \
+  {                            \
+    for (int i = 0; i < n; i++)                \
+  dst[i] = a[i] - b[i];                \
+  }
+
+/* *int8_t not autovec currently. */
+#define TEST_ALL()    \
+ TEST_TYPE(int16_t)    \
+ TEST_TYPE(uint16_t)    \
+ TEST_TYPE(int32_t)    \
+ TEST_TYPE(uint32_t)    \
+ TEST_TYPE(int64_t)    \
+ TEST_TYPE(uint64_t)
+
+TEST_ALL()
+
+/* { dg-final { 

[PATCH 06/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds patterns that provide basic autovectorization support 
for integer adds and subtracts.
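
As a reading aid for the cond_add and len_add expanders in vector-auto.md
below, here is a scalar C model of what such operations are generally
expected to compute. This is a sketch under assumptions: the function names
are hypothetical, the mask is modelled as one byte per element, and leaving
elements past the length untouched is a choice made for this model only (the
tail behaviour of the real patterns depends on the tail policy operand).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masked add: where the mask is set take a[i] + b[i], otherwise take
   the fallback value els[i] (the "merge" operand in the expander).  */
static void
cond_add_ref (int32_t *dst, const uint8_t *mask, const int32_t *a,
              const int32_t *b, const int32_t *els, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = mask[i] ? a[i] + b[i] : els[i];
}

/* Length-controlled add: only the first LEN of N elements are
   processed.  Elements from LEN onwards are left as they were, which
   is an assumption of this model.  */
static void
len_add_ref (int32_t *dst, const int32_t *a, const int32_t *b,
             size_t len, size_t n)
{
  assert (len <= n);
  for (size_t i = 0; i < len; i++)
    dst[i] = a[i] + b[i];
}
```

In the expanders, the mask and the length end up as the mask and vl operands
of the same predicated add pattern, so both variants share one underlying
instruction pattern.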


gcc/ChangeLog:

    * config/riscv/riscv.md (riscv_classify_vlmul_field):
    New external declaration.
    (riscv_vector_preferred_simd_mode): Include
    vector-iterators.md.
    * config/riscv/vector-auto.md: New file containing
    autovectorization patterns.
    * config/riscv/vector-iterators.md (UNSPEC_VADD/UNSPEC_VSUB):
    New unspecs for autovectorization patterns.
    * config/riscv/vector.md: Remove include of vector-iterators.md
    and include vector-auto.md.

---
 gcc/config/riscv/riscv.md    |   1 +
 gcc/config/riscv/vector-auto.md  | 172 +++
 gcc/config/riscv/vector-iterators.md |   2 +
 gcc/config/riscv/vector.md   |   4 +-
 4 files changed, 177 insertions(+), 2 deletions(-)
 create mode 100644 gcc/config/riscv/vector-auto.md

diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 05924e9bbf1..c34124095f7 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -131,6 +131,7 @@
 (include "predicates.md")
 (include "constraints.md")
 (include "iterators.md")
+(include "vector-iterators.md")

 ;; 
 ;;
diff --git a/gcc/config/riscv/vector-auto.md b/gcc/config/riscv/vector-auto.md
new file mode 100644
index 000..e5a19663d18
--- /dev/null
+++ b/gcc/config/riscv/vector-auto.md
@@ -0,0 +1,172 @@
+;; Machine description for RISC-V 'V' Extension for GNU compiler.
+;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+;; Contributed by Michael Collison (colli...@rivosinc.com), Rivos Inc.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; 
-

+;;  [INT] Addition
+;; 
-

+;; Includes:
+;; - vadd.vv
+;; - vadd.vx
+;; - vadd.vi
+;; 
-

+
+(define_expand "add3"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_arith_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = gen_rtx_UNSPEC (mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
+  rtx vl = emit_vlmax_vsetvl (mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(operands[0], mask, merge, operands[1], operands[2],
+                vl, tail_policy, mask_policy, vlmax_avl_p));
+
+  DONE;
+})
+
+(define_expand "cond_add"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand: 1 "register_operand")
+   (match_operand:VI 2 "register_operand")
+   (match_operand:VI 3 "vector_reg_or_const_dup_operand")
+   (match_operand:VI 4 "register_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = operands[4];
+  rtx vl = emit_vlmax_vsetvl (mode);
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = operands[1];
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(operands[0], mask, merge, operands[2], operands[3],
+                vl, tail_policy, mask_policy, vlmax_avl_p));
+  DONE;
+})
+
+(define_expand "len_add"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")
+   (match_operand:VI 2 "vector_reg_or_const_dup_operand")
+   (match_operand 3 "p_reg_or_const_csr_operand")]
+  "TARGET_VECTOR"
+{
+  using namespace riscv_vector;
+
+  rtx merge = gen_rtx_UNSPEC (mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
+  rtx vl = operands[3];
+  rtx mask_policy = get_mask_policy_no_pred();
+  rtx tail_policy = get_tail_policy_no_pred();
+  rtx mask = CONSTM1_RTX(mode);
+  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);
+
+  emit_insn(gen_pred_add(operands[0], 

[PATCH 05/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds support for registering target hooks for basic 
autovectorization support as well as basic tuning information for the 
vector extension.
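
The RISCV_TUNE change below relies on the usual X-macro technique: each user
of riscv-cores.def defines RISCV_TUNE to pick out the arguments it needs, so
adding a fourth VECTOR_TUNE_INFO argument means updating every definition of
the macro. The C sketch below illustrates the idea in a self-contained way.
The inline TUNE_LIST macro is a stand-in for including riscv-cores.def, and
tune_count is a hypothetical helper; only three of the tune entries are
reproduced.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the contents of riscv-cores.def (abbreviated).  */
#define TUNE_LIST(X)                                                        \
  X ("rocket", generic, rocket_tune_info, generic_rvv_tune_info)            \
  X ("sifive-7-series", sifive_7, sifive_7_tune_info, generic_rvv_tune_info) \
  X ("size", generic, optimize_size_tune_info, generic_rvv_tune_info)

/* This user only needs the name, so the other three arguments are
   dropped by the macro and never evaluated.  */
static const char *const tune_names[] = {
#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO) \
  TUNE_NAME,
  TUNE_LIST (RISCV_TUNE)
#undef RISCV_TUNE
  NULL
};

/* Count the entries before the terminating NULL.  */
static size_t
tune_count (void)
{
  size_t n = 0;
  while (tune_names[n] != NULL)
    n++;
  return n;
}
```

A second user, such as the tuning table in riscv.cc, would define RISCV_TUNE
differently and consume the TUNE_INFO and VECTOR_TUNE_INFO arguments instead.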


gcc/ChangeLog:

    * config/riscv/riscv-cores.def (RISCV_TUNE):
    Add VECTOR_TUNE_INFO parameter.
    * common/config/riscv/riscv-common.cc (RISCV_TUNE):
    Add VECTOR_TUNE_INFO parameter.
    * config/riscv/riscv.cc (riscv_vector_tune_param):
    New struct for vector tuning information.
    (riscv_tune_info): add vector_tune_param.
    (vector_tune_param): New static variable.
    (riscv_vectorization_factor): New variable.
    (generic_rvv_insn_scale_table): New struct.
    (generic_rvv_stmt_scale_table): New struct.
    (generic_rvv_insn_cost_table): New vector insn cost table.
    (generic_rvv_stmt_cost_table): New vector statement cost table.
    (generic_rvv_tune_info): New rvv tuning table.
    (RISCV_TUNE): Add VECTOR_TUNE_INFO parameter.
    (riscv_rtx_costs): Return vector estimate if vector mode.
    (riscv_option_override): Set vector_tune_param.
    (riscv_option_override): Set riscv_vectorization_factor.
    (riscv_estimated_poly_value): Implement
    TARGET_ESTIMATED_POLY_VALUE.
    (riscv_preferred_simd_mode): Implement
    TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
    (riscv_autovectorize_vector_modes): Implement
    TARGET_AUTOVECTORIZE_VECTOR_MODES.
    (riscv_get_mask_mode): Implement TARGET_VECTORIZE_GET_MASK_MODE.
    (riscv_empty_mask_is_expensive): Implement
    TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE.
    (riscv_builtin_vectorization_cost): Implement
    TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.
    (riscv_vectorize_create_costs): Implement
    TARGET_VECTORIZE_CREATE_COSTS.
    (TARGET_ESTIMATED_POLY_VALUE): Register target macro.
    (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): Ditto.
    (TARGET_VECTORIZE_PREFERRED_SIMD_MODE): Ditto.
    (TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Ditto.
    (TARGET_VECTORIZE_GET_MASK_MODE): Ditto.
    (TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE): Ditto.
    (TARGET_VECTORIZE_LOOP_LEN_OVERRIDE_MASK): Ditto.
    (TARGET_VECTORIZE_CREATE_COSTS): Ditto

---
 gcc/common/config/riscv/riscv-common.cc |   2 +-
 gcc/config/riscv/riscv-cores.def    |  14 +-
 gcc/config/riscv/riscv.cc   | 321 +++-
 3 files changed, 325 insertions(+), 12 deletions(-)

diff --git a/gcc/common/config/riscv/riscv-common.cc b/gcc/common/config/riscv/riscv-common.cc
index ebc1ed7d7e4..6b8d92af986 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -246,7 +246,7 @@ static const riscv_cpu_info riscv_cpu_tables[] =

 static const char *riscv_tunes[] =
 {
-#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO) \
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO)    \
 TUNE_NAME,
 #include "../../../config/riscv/riscv-cores.def"
 NULL
diff --git a/gcc/config/riscv/riscv-cores.def b/gcc/config/riscv/riscv-cores.def
index 2a834cae21d..4feb0366222 100644
--- a/gcc/config/riscv/riscv-cores.def
+++ b/gcc/config/riscv/riscv-cores.def
@@ -30,15 +30,15 @@
    identifier, reference to riscv.cc.  */

 #ifndef RISCV_TUNE
-#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO)
+#define RISCV_TUNE(TUNE_NAME, PIPELINE_MODEL, TUNE_INFO, VECTOR_TUNE_INFO)
 #endif

-RISCV_TUNE("rocket", generic, rocket_tune_info)
-RISCV_TUNE("sifive-3-series", generic, rocket_tune_info)
-RISCV_TUNE("sifive-5-series", generic, rocket_tune_info)
-RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info)
-RISCV_TUNE("thead-c906", generic, thead_c906_tune_info)
-RISCV_TUNE("size", generic, optimize_size_tune_info)
+RISCV_TUNE("rocket", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-3-series", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-5-series", generic, rocket_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("sifive-7-series", sifive_7, sifive_7_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("thead-c906", generic, thead_c906_tune_info, generic_rvv_tune_info)
+RISCV_TUNE("size", generic, optimize_size_tune_info, generic_rvv_tune_info)

 #undef RISCV_TUNE

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f11b7949a49..16b38ba4d76 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -60,6 +60,16 @@ along with GCC; see the file COPYING3.  If not see
 #include "opts.h"
 #include "tm-constrs.h"
 #include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "cfgrtl.h"
+#include "sel-sched.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "gimple-expr.h"
+#include "tree-vectorizer.h"
+#include "riscv-vector-cost.h"

 /* This file should be included last.  */
 #include "target-def.h"
@@ -238,6 +248,12 @@ struct riscv_tune_param
   bool slow_unaligned_access;
 };

+/* Cost for vector insn classes.  */
+struct riscv_vector_tune_param {
+    const vector_insn_cost_table* rvv_insn_costs_table;

[PATCH 04/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds support for functions used in implementing various 
portions of autovectorization support.
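
One of these functions, riscv_vector_preferred_simd_mode, picks a vector mode
from the scalar element mode and the vectorization factor. The pattern in its
switch statement can be summarised as "64 / element width, times the factor",
with any factor other than 1, 2 or 4 treated as 8. The C sketch below models
only that arithmetic; the function name is hypothetical, and the ELEN and
floating-point checks of the real function are deliberately left out.

```c
#include <assert.h>

/* Return the element count N of the VNx<N> mode chosen for an integer
   element of ELEM_BITS bits and vectorization factor VF, or 0 if the
   element width is not one of 8, 16, 32 or 64.  */
static unsigned
preferred_nunits (unsigned elem_bits, unsigned vf)
{
  if (elem_bits != 8 && elem_bits != 16 && elem_bits != 32
      && elem_bits != 64)
    return 0;

  /* The nested conditionals in the patch fall through to the largest
     mode for any factor other than 1, 2 or 4.  */
  unsigned factor = (vf == 1 || vf == 2 || vf == 4) ? vf : 8;

  return (64 / elem_bits) * factor;
}
```

For instance, preferred_nunits (16, 2) gives 8, matching VNx8HImode, and
preferred_nunits (64, 1) gives 1, matching VNx1DImode.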


gcc/ChangeLog:

    * config/riscv/riscv-v.cc (riscv_classify_vlmul_field):
    New function.
    (riscv_vector_preferred_simd_mode): Ditto.
    (get_mask_policy_no_pred): Ditto.
    (get_tail_policy_no_pred): Ditto.
    (riscv_tuple_mode_p): Ditto.
    (riscv_classify_nf): Ditto.
    (riscv_vlmul_regsize): Ditto.
    (riscv_vector_mask_mode_p): Ditto.
    (riscv_vector_get_mask_mode): Ditto.

---
 gcc/config/riscv/riscv-v.cc | 176 
 1 file changed, 176 insertions(+)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 58007cc16eb..58f69e259c0 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -39,9 +39,11 @@
 #include "emit-rtl.h"
 #include "tm_p.h"
 #include "target.h"
+#include "targhooks.h"
 #include "expr.h"
 #include "optabs.h"
 #include "tm-constrs.h"
+#include "riscv-vector-builtins.h"

 using namespace riscv_vector;

@@ -108,6 +110,41 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT minval,
   && IN_RANGE (INTVAL (elt), minval, maxval));
 }

+/* Return the vlmul field for a specific machine mode. */
+unsigned int
+riscv_classify_vlmul_field (enum machine_mode mode)
+{
+  /* Make the decision based on the mode's enum value rather than its
+ properties, so that we keep the correct classification regardless
+ of -mriscv-vector-bits.  */
+  switch (mode)
+    {
+    case E_VNx8BImode:
+  return VLMUL_FIELD_111;
+
+    case E_VNx4BImode:
+  return VLMUL_FIELD_110;
+
+    case E_VNx2BImode:
+  return VLMUL_FIELD_101;
+
+    case E_VNx16BImode:
+  return VLMUL_FIELD_000;
+
+    case E_VNx32BImode:
+  return VLMUL_FIELD_001;
+
+    case E_VNx64BImode:
+  return VLMUL_FIELD_010;
+
+    default:
+  break;
+    }
+
+  /* we don't care about VLMUL for Mask */
+  return VLMUL_FIELD_000;
+}
+
 rtx
 emit_vlmax_vsetvl (machine_mode vmode)
 {
@@ -162,6 +199,64 @@ calculate_ratio (unsigned int sew, enum vlmul_type vlmul)
   return ratio;
 }

+/* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE for RVV.  */
+
+machine_mode
+riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf)
+{
+  if (!TARGET_VECTOR)
+    return word_mode;
+
+  switch (mode)
+    {
+    case E_QImode:
+  return vf == 1   ? VNx8QImode
+     : vf == 2 ? VNx16QImode
+     : vf == 4 ? VNx32QImode
+           : VNx64QImode;
+  break;
+    case E_HImode:
+  return vf == 1   ? VNx4HImode
+     : vf == 2 ? VNx8HImode
+     : vf == 4 ? VNx16HImode
+           : VNx32HImode;
+  break;
+    case E_SImode:
+  return vf == 1   ? VNx2SImode
+     : vf == 2 ? VNx4SImode
+     : vf == 4 ? VNx8SImode
+           : VNx16SImode;
+  break;
+    case E_DImode:
+  if (riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+      && riscv_vector_elen_flags != MASK_VECTOR_ELEN_FP_32)
+    return vf == 1     ? VNx1DImode
+       : vf == 2 ? VNx2DImode
+       : vf == 4 ? VNx4DImode
+             : VNx8DImode;
+  break;
+    case E_SFmode:
+  if (TARGET_HARD_FLOAT && riscv_vector_elen_flags != MASK_VECTOR_ELEN_32
+      && riscv_vector_elen_flags != MASK_VECTOR_ELEN_64)
+    return vf == 1     ? VNx2SFmode
+       : vf == 2 ? VNx4SFmode
+       : vf == 4 ? VNx8SFmode
+             : VNx16SFmode;
+  break;
+    case E_DFmode:
+  if (TARGET_DOUBLE_FLOAT && TARGET_VECTOR_ELEN_FP_64)
+    return vf == 1     ? VNx1DFmode
+       : vf == 2 ? VNx2DFmode
+       : vf == 4 ? VNx4DFmode
+             : VNx8DFmode;
+  break;
+    default:
+  break;
+    }
+
+  return word_mode;
+}
+
 /* Emit an RVV unmask && vl mov from SRC to DEST.  */
 static void
 emit_pred_op (unsigned icode, rtx mask, rtx dest, rtx src, rtx len,
@@ -374,6 +469,87 @@ get_avl_type_rtx (enum avl_type type)
   return gen_int_mode (type, Pmode);
 }

+rtx
+get_mask_policy_no_pred ()
+{
+  return get_mask_policy_for_pred(PRED_TYPE_none);
+}
+
+rtx
+get_tail_policy_no_pred ()
+{
+  return get_tail_policy_for_pred(PRED_TYPE_none);
+}
+
+/* Return true if it is a RVV tuple mode. */
+bool
+riscv_tuple_mode_p (machine_mode mode ATTRIBUTE_UNUSED)
+{
+  return false;
+}
+
+/* Return nf for a machine mode. */
+int
+riscv_classify_nf (machine_mode mode)
+{
+  switch (mode)
+    {
+
+    default:
+  break;
+    }
+
+  return 1;
+}
+
+/* Return vlmul register size for a machine mode. */
+int
+riscv_vlmul_regsize (machine_mode mode)
+{
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_BOOL)
+    return 1;
+  switch (riscv_classify_vlmul_field (mode))
+    {
+    case VLMUL_FIELD_001:
+  return 2;
+    case VLMUL_FIELD_010:
+  return 4;
+    case VLMUL_FIELD_011:
+  return 8;
+    case VLMUL_FIELD_100:
+  gcc_unreachable ();
+    default:
+  return 1;
+    }
+}
+
+/* Return true if it is a RVV mask mode. */
+bool
+riscv_vector_mask_mode_p (machine_mode mode)

[PATCH 03/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds two new files to support the vector cost model and 
modifies the Makefile fragment to build the cost model C++ file. Due to 
its large size, this patch is provided as an attachment.


gcc/ChangeLog:

    * gcc/config.gcc (riscv-vector-cost.o): New object file to build.
    * config/riscv/riscv-vector-cost.cc: New file for riscv vector cost
    model
    * config/riscv/riscv-vector-cost.h: New header file for riscv vector
    cost model.
    * config/riscv/t-riscv: Add make rule for riscv-vector-cost.o.


From eb995818cd5f77f85e8df93b690b00ce1fd1aa35 Mon Sep 17 00:00:00 2001
From: Michael Collison 
Date: Thu, 2 Mar 2023 12:27:36 -0500
Subject: [PATCH] Autovectorization patch set 2

---
 gcc/config.gcc|   2 +-
 gcc/config/riscv/riscv-vector-cost.cc | 620 ++
 gcc/config/riscv/riscv-vector-cost.h  | 400 +
 gcc/config/riscv/t-riscv  |   5 +
 4 files changed, 1026 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/riscv-vector-cost.cc
 create mode 100644 gcc/config/riscv/riscv-vector-cost.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index c070e6ecd2e..a401187 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -530,7 +530,7 @@ pru-*-*)
 riscv*)
 	cpu_type=riscv
 	extra_objs="riscv-builtins.o riscv-c.o riscv-sr.o riscv-shorten-memrefs.o riscv-selftests.o riscv-v.o riscv-vsetvl.o"
-	extra_objs="${extra_objs} riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
+	extra_objs="${extra_objs} riscv-vector-cost.o riscv-vector-builtins.o riscv-vector-builtins-shapes.o riscv-vector-builtins-bases.o"
 	d_target_objs="riscv-d.o"
 	extra_headers="riscv_vector.h"
 	target_gtfiles="$target_gtfiles \$(srcdir)/config/riscv/riscv-vector-builtins.cc"
diff --git a/gcc/config/riscv/riscv-vector-cost.cc b/gcc/config/riscv/riscv-vector-cost.cc
new file mode 100644
index 000..5a33b20843a
--- /dev/null
+++ b/gcc/config/riscv/riscv-vector-cost.cc
@@ -0,0 +1,620 @@
+/* Cost model implementation for RISC-V 'V' Extension for GNU compiler.
+   Copyright (C) 2022-2023 Free Software Foundation, Inc.
+   Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_STRING
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "backend.h"
+#include "rtl.h"
+#include "regs.h"
+#include "insn-config.h"
+#include "insn-attr.h"
+#include "recog.h"
+#include "rtlanal.h"
+#include "output.h"
+#include "alias.h"
+#include "tree.h"
+#include "stringpool.h"
+#include "attribs.h"
+#include "varasm.h"
+#include "stor-layout.h"
+#include "calls.h"
+#include "function.h"
+#include "explow.h"
+#include "memmodel.h"
+#include "emit-rtl.h"
+#include "reload.h"
+#include "tm_p.h"
+#include "target.h"
+#include "basic-block.h"
+#include "expr.h"
+#include "optabs.h"
+#include "bitmap.h"
+#include "df.h"
+#include "diagnostic.h"
+#include "builtins.h"
+#include "predict.h"
+#include "tree-pass.h"
+#include "opts.h"
+#include "langhooks.h"
+#include "rtl-iter.h"
+#include "gimple.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "fold-const.h"
+#include "gimple-iterator.h"
+#include "tree-vectorizer.h"
+#include "tree-ssa-loop-niter.h"
+#include "riscv-vector-builtins.h"
+
+/* This file should be included last.  */
+#include "riscv-vector-cost.h"
+#include "target-def.h"
+
+bool vector_insn_cost_table::get_cost(rtx x, machine_mode mode, int *cost,
+  bool speed) const {
+  rtx op0, op1, op2;
+  enum rtx_code code = GET_CODE(x);
+  scalar_int_mode int_mode;
+
+  /* By default, assume that everything has equivalent cost to the
+ cheapest instruction.  Any additional costs are applie

[PATCH 02/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This patch adds foundational support by making two functions that handle 
predication policies globally visible.


gcc/ChangeLog:

    * config/riscv/riscv-vector-builtins.cc (get_tail_policy_for_pred):
    Remove static declaration to make externally visible.
    (get_mask_policy_for_pred): Ditto.
    * config/riscv/riscv-vector-builtins.h (get_tail_policy_for_pred):
    New external declaration.
    (get_mask_policy_for_pred): Ditto.

---
 gcc/config/riscv/riscv-vector-builtins.cc | 4 ++--
 gcc/config/riscv/riscv-vector-builtins.h  | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-vector-builtins.cc b/gcc/config/riscv/riscv-vector-builtins.cc
index 2e92ece3b64..90fc73a5bcf 100644
--- a/gcc/config/riscv/riscv-vector-builtins.cc
+++ b/gcc/config/riscv/riscv-vector-builtins.cc
@@ -1850,7 +1850,7 @@ use_real_merge_p (enum predication_type_index pred)

 /* Get TAIL policy for predication. If predication indicates TU, return the TU.
    Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_tail_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tu || pred == PRED_TYPE_tum || pred == PRED_TYPE_tumu)
@@ -1860,7 +1860,7 @@ get_tail_policy_for_pred (enum predication_type_index pred)

 /* Get MASK policy for predication. If predication indicates MU, return the MU.
    Otherwise, return the prefer default configuration.  */
-static rtx
+rtx
 get_mask_policy_for_pred (enum predication_type_index pred)
 {
   if (pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu)
diff --git a/gcc/config/riscv/riscv-vector-builtins.h b/gcc/config/riscv/riscv-vector-builtins.h
index ede08c6a480..135e2463b1e 100644
--- a/gcc/config/riscv/riscv-vector-builtins.h
+++ b/gcc/config/riscv/riscv-vector-builtins.h
@@ -433,6 +433,8 @@ extern const char *const operand_suffixes[NUM_OP_TYPES];
 extern const rvv_builtin_suffixes type_suffixes[NUM_VECTOR_TYPES + 1];
 extern const char *const predication_suffixes[NUM_PRED_TYPES];
 extern rvv_builtin_types_t builtin_types[NUM_VECTOR_TYPES + 1];
+extern rtx get_tail_policy_for_pred (enum predication_type_index pred);
+extern rtx get_mask_policy_for_pred (enum predication_type_index pred);

 inline bool
 function_instance::operator!= (const function_instance &other) const
--
2.34.1



[PATCH 01/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison

This patch adds foundational support in the form of:

1. New predicates

2. New function prototypes

3. Exporting emit_vlmax_vsetvl to global scope

4. Add a new command line option -mriscv_vector_lmul

gcc/ChangeLog:

    * config/riscv/riscv-protos.h (riscv_classify_vlmul_field):
    New external declaration.
    (riscv_vector_preferred_simd_mode): Ditto.
    (riscv_tuple_mode_p): Ditto.
    (riscv_vector_mask_mode_p): Ditto.
    (riscv_classify_nf): Ditto.
    (riscv_vlmul_regsize): Ditto.
    (riscv_vector_preferred_simd_mode): Ditto.
    (riscv_vector_get_mask_mode): Ditto.
    (emit_vlmax_vsetvl): Ditto.
    (get_mask_policy_no_pred): Ditto.
    (get_tail_policy_no_pred): Ditto.
    * config/riscv/riscv-opts.h (riscv_vector_bits_enum): New enum.
    (riscv_vector_lmul_enum): Ditto.
    (vlmul_field_enum): Ditto.
    * config/riscv/riscv-v.cc (emit_vlmax_vsetvl):
    Remove static scope.
    * config/riscv/riscv.opt (riscv_vector_lmul):
    New option -mriscv_vector_lmul.
    * config/riscv/predicates.md (p_reg_or_const_csr_operand):
    New predicate.
    (vector_reg_or_const_dup_operand): Ditto.

---
 gcc/config/riscv/predicates.md  | 13 +++
 gcc/config/riscv/riscv-opts.h   | 40 +
 gcc/config/riscv/riscv-protos.h | 16 +
 gcc/config/riscv/riscv-v.cc |  2 +-
 gcc/config/riscv/riscv.opt  | 20 +
 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 7bc7c0b4f4d..31517ae4606 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -264,6 +264,14 @@
 })

 ;; Predicates for the V extension.
+(define_special_predicate "p_reg_or_const_csr_operand"
+  (match_code "reg, subreg, const_int")
+{
+  if (CONST_INT_P (op))
+    return satisfies_constraint_K (op);
+  return GET_MODE (op) == Pmode;
+})
+
 (define_special_predicate "vector_length_operand"
   (ior (match_operand 0 "pmode_register_operand")
    (match_operand 0 "const_csr_operand")))
@@ -287,6 +295,11 @@
   (ior (match_operand 0 "register_operand")
    (match_test "op == CONSTM1_RTX (GET_MODE (op))")))

+(define_predicate "vector_reg_or_const_dup_operand"
+  (ior (match_operand 0 "register_operand")
+   (match_test "const_vec_duplicate_p (op)
+  && !CONST_POLY_INT_P (CONST_VECTOR_ELT (op, 0))")))
+
 (define_predicate "vector_mask_operand"
   (ior (match_operand 0 "register_operand")
    (match_operand 0 "vector_all_trues_mask_operand")))
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index ff398c0a2ae..2057a14e153 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -67,6 +67,46 @@ enum stack_protector_guard {
   SSP_GLOBAL            /* global canary */
 };

+/* RVV vector register sizes.  */
+enum riscv_vector_bits_enum
+{
+  RVV_SCALABLE,
+  RVV_NOT_IMPLEMENTED = RVV_SCALABLE,
+  RVV_64 = 64,
+  RVV_128 = 128,
+  RVV_256 = 256,
+  RVV_512 = 512,
+  RVV_1024 = 1024,
+  RVV_2048 = 2048,
+  RVV_4096 = 4096,
+  RVV_8192 = 8192,
+  RVV_16384 = 16384,
+  RVV_32768 = 32768,
+  RVV_65536 = 65536
+};
+
+/* vectorization factor.  */
+enum riscv_vector_lmul_enum
+{
+  RVV_LMUL1 = 1,
+  RVV_LMUL2 = 2,
+  RVV_LMUL4 = 4,
+  RVV_LMUL8 = 8
+};
+
+enum vlmul_field_enum
+{
+  VLMUL_FIELD_000, /* LMUL = 1 */
+  VLMUL_FIELD_001, /* LMUL = 2 */
+  VLMUL_FIELD_010, /* LMUL = 4 */
+  VLMUL_FIELD_011, /* LMUL = 8 */
+  VLMUL_FIELD_100, /* RESERVED */
+  VLMUL_FIELD_101, /* LMUL = 1/8 */
+  VLMUL_FIELD_110, /* LMUL = 1/4 */
+  VLMUL_FIELD_111, /* LMUL = 1/2 */
+  MAX_VLMUL_FIELD
+};
+
 #define MASK_ZICSR    (1 << 0)
 #define MASK_ZIFENCEI (1 << 1)

diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 37c634eca1d..70c8dc4ce69 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -200,4 +200,19 @@ const unsigned int RISCV_BUILTIN_SHIFT = 1;
 /* Mask that selects the riscv_builtin_class part of a function code.  */
 const unsigned int RISCV_BUILTIN_CLASS = (1 << RISCV_BUILTIN_SHIFT) - 1;

+/* Routines implemented in riscv-v.cc*/
+
+namespace riscv_vector {
+extern unsigned int riscv_classify_vlmul_field (enum machine_mode m);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf);
+extern bool riscv_tuple_mode_p (machine_mode);
+extern bool riscv_vector_mask_mode_p (machine_mode);
+extern int riscv_classify_nf (machine_mode);
+extern int riscv_vlmul_regsize(machine_mode);
+extern machine_mode riscv_vector_preferred_simd_mode (scalar_mode mode, unsigned vf);
+extern opt_machine_mode riscv_vector_get_mask_mode (machine_mode mode);
+extern rtx emit_vlmax_vsetvl (machine_mode vmode);
+extern rtx get_mask_policy_no_pred ();
+extern rtx get_tail_policy_no_pred ();
+}
 #endif /* ! GCC_RISCV_PROTOS_H */
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 59c25c65cd5..58007cc16eb 100644
--- 

[PATCH 00/07] RISC-V: Add auto-vectorization support

2023-03-02 Thread Michael Collison
This series of patches adds foundational support for RISC-V 
autovectorization. These patches are based on the current upstream RVV 
vector intrinsic support and are not a new implementation. Most of the 
implementation consists of adding the new vector cost model, the 
autovectorization patterns themselves, and target hooks.


This implementation only provides support for integer addition and 
subtraction as a proof of concept.
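
For illustration, the loops targeted by this proof of concept have roughly 
the following shape. This is a minimal sketch, not the contents of the 
loop-add.c or loop-sub.c tests in the series; the function names vec_add 
and vec_sub are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Element-wise integer addition: each iteration is independent, so the
// loop is a candidate for the vector add pattern.
void vec_add (int32_t *dst, const int32_t *a, const int32_t *b, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    dst[i] = a[i] + b[i];
}

// Element-wise integer subtraction, the second operation covered.
void vec_sub (int32_t *dst, const int32_t *a, const int32_t *b, std::size_t n)
{
  for (std::size_t i = 0; i < n; i++)
    dst[i] = a[i] - b[i];
}
```

Whether a given loop is actually vectorized depends on the options used 
and on the cost model, so the sketch shows only the source-level shape.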


As discussed on this list, if these patches are approved they will be 
merged into an "auto-vectorization" branch once gcc-13 branches for release.


There are two known issues related to crashes (assert failures) 
associated with tree vectorization; one of which I have sent a patch for 
and have received feedback. I will be sending a patch for the second 
issue tomorrow.



 gcc/common/config/riscv/riscv-common.cc   |   2 +-
 gcc/config.gcc    |   2 +-
 gcc/config/riscv/predicates.md    |  13 +
 gcc/config/riscv/riscv-cores.def  |  14 +-
 gcc/config/riscv/riscv-opts.h |  40 ++
 gcc/config/riscv/riscv-protos.h   |  15 +
 gcc/config/riscv/riscv-v.cc   | 178 -
 gcc/config/riscv/riscv-vector-builtins.cc |   4 +-
 gcc/config/riscv/riscv-vector-builtins.h  |   2 +
 gcc/config/riscv/riscv-vector-cost.cc | 620 ++
 gcc/config/riscv/riscv-vector-cost.h  | 400 +++
 gcc/config/riscv/riscv.cc | 321 -
 gcc/config/riscv/riscv.md |   1 +
 gcc/config/riscv/riscv.opt    |  20 +
 gcc/config/riscv/t-riscv  |   5 +
 gcc/config/riscv/vector-auto.md   | 172 +
 gcc/config/riscv/vector-iterators.md  |   2 +
 gcc/config/riscv/vector.md    |   4 +-
 .../riscv/rvv/autovec/loop-add-rv32.c |  24 +
 .../gcc.target/riscv/rvv/autovec/loop-add.c   |  24 +
 .../riscv/rvv/autovec/loop-sub-rv32.c |  24 +
 .../gcc.target/riscv/rvv/autovec/loop-sub.c   |  24 +
 22 files changed, 1893 insertions(+), 18 deletions(-)
 create mode 100644 gcc/config/riscv/riscv-vector-cost.cc
 create mode 100644 gcc/config/riscv/riscv-vector-cost.h
 create mode 100644 gcc/config/riscv/vector-auto.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-add.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub-rv32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/loop-sub.c



Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-03-01 Thread Michael Collison
Okay, there seems to be consensus on using constant_lower_bound (vf), but 
I don't understand how that is a replacement for "vf.is_constant ()". In 
one case we are checking whether "vf" is a constant; in the other we are 
asking for its lower bound. For the crash in question, 
"constant_lower_bound (vf)" returns the integer value two.
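
The two queries being compared can be modelled on a toy value of the form 
c0 + c1 * x, where x >= 0 is known only at run time. This is a sketch 
under stated assumptions: toy_poly is a hypothetical stand-in and not 
GCC's poly_int class, and the two helpers only mirror what is_constant () 
and constant_lower_bound () are understood to report.

```cpp
#include <cstdint>

// Hypothetical model of a value c0 + c1 * x with x >= 0 unknown at
// compile time.  NOT GCC's poly_int; for illustration only.
struct toy_poly
{
  uint64_t c0;  // constant part
  uint64_t c1;  // multiplier of the run-time unknown x

  // True only when the value does not depend on x at all.
  bool is_constant () const { return c1 == 0; }
};

// Smallest value the expression can take, reached at x == 0.
// Unlike is_constant (), this always yields a usable number.
uint64_t constant_lower_bound (const toy_poly &p)
{
  return p.c0;
}
```

With a vectorization factor modelled as toy_poly{2, 2}, is_constant () is 
false while constant_lower_bound () gives 2, matching the value reported 
for the crash above. The first query says whether a guarantee can be 
stated exactly; the second says what can still be guaranteed when it 
cannot.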


On 2/27/23 09:51, Richard Sandiford wrote:

FWIW, this patch looks good to me.  I'd argue it's a regression fix
of kinds, in that the current code was correct before variable VF and
became incorrect after variable VF.  It might be possible to trigger
the problem on SVE too, with a sufficiently convoluted test case.
(Haven't tried though.)

Richard Biener  writes:

On Wed, Feb 22, 2023 at 12:03 AM Michael Collison  wrote:

While working on autovectorizing for the RISCV port I encountered an
issue where vect_do_peeling assumes that the vectorization factor is a
compile-time constant. The vectorization factor is not a compile-time constant
on RISCV.

Tested on RISCV and x86_64-linux-gnu. Okay?

I wonder how you arrive at prologue peeling with a non-constant VF?

Not sure about the RVV case, but I think it makes sense in principle.
E.g. if some ISA takes the LOAD_LEN rather than fully-predicated
approach, it can't easily use the first iteration of the vector loop
to do peeling for alignment.  (At least, the IV steps would then
no longer match VF for all iterations.)  I guess it could use a
*different* vector loop, but we don't support that yet.

There are also some corner cases for which we still don't support
predicated loops and instead fall back on an unpredicated VLA loop
followed by a scalar epilogue.  Peeling for alignment would then
require a scalar prologue too.


In any case it would probably be better to use constant_lower_bound (vf)
here?  Also it looks wrong to apply this limit in case we are using
a fully masked main vector loop.  But as said, the specific case of
non-constant VF and prologue peeling probably wasn't supposed to happen,
instead the prologue usually is applied via an offset to a fully masked loop?

Hmm, yeah, agree constant_lower_bound should work too.

Thanks,
Richard


Richard?

Thanks,
Richard.


Michael

gcc/

  * tree-vect-loop-manip.cc (vect_do_peeling): Verify
  that vectorization factor is a compile-time constant.

---
   gcc/tree-vect-loop-manip.cc | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6aa3d2ed0bf..1ad1961c788 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
niters, tree nitersm1,
 niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
 /* It's guaranteed that vector loop bound before vectorization is at
least VF, so set range information for newly generated var. */
-  if (new_var_p)
+  if (new_var_p && vf.is_constant ())
   {
 value_range vr (type,
 wi::to_wide (build_int_cst (type, vf)),
--
2.34.1



Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread Michael Collison

Hi Jeff,

We do not have two independent implementations: my work is 100% based on 
the vector intrinsic foundation in upstream GCC. In fact I have only 
added two core patterns, vector add and subtract, that are based on the 
existing vector intrinsics implementation:


(define_expand "add<mode>3"
  [(match_operand:VI 0 "register_operand")
   (match_operand:VI 1 "register_operand")
   (match_operand:VI 2 "vector_arith_operand")]
  "TARGET_VECTOR"
{
  using namespace riscv_vector;

  rtx merge = gen_rtx_UNSPEC (<MODE>mode, gen_rtvec (1, const0_rtx), UNSPEC_VUNDEF);
  rtx vl = emit_vlmax_vsetvl (<MODE>mode);
  rtx mask_policy = get_mask_policy_no_pred();
  rtx tail_policy = get_tail_policy_no_pred();
  rtx mask = CONSTM1_RTX(<VM>mode);
  rtx vlmax_avl_p = get_avl_type_rtx(NONVLMAX);

  emit_insn(gen_pred_add<mode>(operands[0], mask, merge, operands[1], operands[2],
            vl, tail_policy, mask_policy, vlmax_avl_p));

  DONE;
})

This pattern leverages the existing vector intrinsics framework. The 
bulk of the changes are the cost model and target macros. The cost 
model is based on Juzhe's work.


The point I am making is that the auto-vectorization work is no more 
experimental than the intrinsics work, which is still being merged.


On 2/22/23 23:01, Jeff Law wrote:



On 2/22/23 10:54, Michael Collison wrote:

Juzhe,

I disagree with this comment. There are many stakeholders for 
autovectorization and waiting until GCC 14 is not a viable solution 
for us as well as other stakeholders ready to begin work on 
autovectorization.


As we discussed I have been moving forward with patches for 
autovectorization and am preparing to send them to gcc-patches. This 
assert is preventing code from compiling and needs to be addressed.


If you have a solution in either the RISCV backend or in this file 
can you please present it?
I don't necessarily think it means waiting for gcc-14, but it does 
mean waiting for gcc-13 to branch and gcc-14 development to open. I 
would object to anyone trying to push forward an autovec 
implementation into gcc-13.  We're well past that point IMHO, even if 
the changes only affected the RISC-V backend.


Given that it looks like we have two independent implementations we're 
almost certainly going to have to sit down with both, evaluate both 
from a quality of code viewpoint and benchmark them both and 
ultimately choose one implementation or the other, or maybe even some 
mixing and matching.


I would strongly suggest that both groups have implementations we can 
start evaluating from a design/implementation standpoint relatively 
soon.  Ideally both groups would actually have branches in the repo 
that are regularly updated with their current implementation.


While I have a great interest in seeing an autovec implementation move 
forward as soon as possible after gcc-14 development opens, I have no 
opinions at this point about either of the two existing implementations.


Jeff


Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread Michael Collison

Juzhe,

I disagree with this comment. There are many stakeholders for 
autovectorization and waiting until GCC 14 is not a viable solution for 
us as well as other stakeholders ready to begin work on autovectorization.


As we discussed I have been moving forward with patches for 
autovectorization and am preparing to send them to gcc-patches. This 
assert is preventing code from compiling and needs to be addressed.


If you have a solution in either the RISCV backend or in this file can 
you please present it?


On 2/22/23 10:27, juzhe.zh...@rivai.ai wrote:

> gcc/
>
> * tree-vect-loop-manip.cc (vect_do_peeling): Verify
> that vectorization factor is a compile-time constant.
>
> ---
> gcc/tree-vect-loop-manip.cc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index 6aa3d2ed0bf..1ad1961c788 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
> niters, tree nitersm1,
> niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
> /* It's guaranteed that vector loop bound before vectorization is at
> least VF, so set range information for newly generated var. */
> - if (new_var_p)
> + if (new_var_p && vf.is_constant ())
> {
> value_range vr (type,
> wi::to_wide (build_int_cst (type, vf)),

I don't think we need to apply this limit in the case of RVV 
auto-vectorization.
I have talked with Kito and I have a full solution for supporting RVV.


We are going to support RVV auto-vectorization in 3 configurations 
according to the RVV ISA spec:
1. -march=zve32* supports QI and HI auto-vectorization via VNx4QImode 
and VNx2HImode
2. -march=zve64* supports QI, HI and SI auto-vectorization via 
VNx8QImode, VNx4HImode and VNx2SImode
3. -march=v* supports QI, HI, SI and DI auto-vectorization via 
VNx16QImode, VNx8HImode, VNx4SImode and VNx2DImode
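
The mode names in the list above are consistent with an element count 
equal to the minimum VLEN divided by the element width, taking the 
minimum VLEN as 32 bits for zve32*, 64 bits for zve64* and 128 bits for 
v*. That reading is an inference from the list and the RVV spec, not 
something stated in the message. A minimal sketch, with base_mode_name 
as a hypothetical helper:

```cpp
#include <string>

// Hypothetical helper: build the mode name for a given minimum VLEN and
// element width, assuming element count = min VLEN / element width.
std::string base_mode_name (unsigned min_vlen_bits, unsigned elt_bits)
{
  const char *suffix = elt_bits == 8    ? "QI"
                       : elt_bits == 16 ? "HI"
                       : elt_bits == 32 ? "SI"
                                        : "DI";
  return "VNx" + std::to_string (min_vlen_bits / elt_bits) + suffix + "mode";
}
```

For example, base_mode_name (32, 8) gives the QI mode listed for zve32*, 
and base_mode_name (128, 64) gives the DI mode listed for v*.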


I will support them in GCC 14. The current loop vectorizer works well for 
us; there is no need to fix it.

Thanks.

juzhe.zh...@rivai.ai


Re: [PATCH] vect: Check that vector factor is a compile-time constant

2023-02-22 Thread Michael Collison

Richard, how would I check for a fully masked main vector loop?

On 2/22/23 03:20, Richard Biener wrote:

On Wed, Feb 22, 2023 at 12:03 AM Michael Collison  wrote:

While working on autovectorizing for the RISCV port I encountered an
issue where vect_do_peeling assumes that the vectorization factor is a
compile-time constant. The vectorization factor is not a compile-time constant
on RISCV.

Tested on RISCV and x86_64-linux-gnu. Okay?

I wonder how you arrive at prologue peeling with a non-constant VF?
In any case it would probably be better to use constant_lower_bound (vf)
here?  Also it looks wrong to apply this limit in case we are using
a fully masked main vector loop.  But as said, the specific case of
non-constant VF and prologue peeling probably wasn't supposed to happen,
instead the prologue usually is applied via an offset to a fully masked loop?

Richard?

Thanks,
Richard.


Michael

gcc/

  * tree-vect-loop-manip.cc (vect_do_peeling): Verify
  that vectorization factor is a compile-time constant.

---
   gcc/tree-vect-loop-manip.cc | 2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6aa3d2ed0bf..1ad1961c788 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree
niters, tree nitersm1,
 niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
 /* It's guaranteed that vector loop bound before vectorization is at
least VF, so set range information for newly generated var. */
-  if (new_var_p)
+  if (new_var_p && vf.is_constant ())
   {
 value_range vr (type,
 wi::to_wide (build_int_cst (type, vf)),
--
2.34.1



[PATCH] vect: Check that vector factor is a compile-time constant

2023-02-21 Thread Michael Collison
While working on autovectorizing for the RISCV port I encountered an 
issue where vect_do_peeling assumes that the vectorization factor is a 
compile-time constant. The vectorization factor is not a compile-time 
constant on RISCV.


Tested on RISCV and x86_64-linux-gnu. Okay?

Michael

gcc/

    * tree-vect-loop-manip.cc (vect_do_peeling): Verify
    that vectorization factor is a compile-time constant.

---
 gcc/tree-vect-loop-manip.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 6aa3d2ed0bf..1ad1961c788 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -2930,7 +2930,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   niters = vect_build_loop_niters (loop_vinfo, &new_var_p);
   /* It's guaranteed that vector loop bound before vectorization is at
  least VF, so set range information for newly generated var. */
-  if (new_var_p)
+  if (new_var_p && vf.is_constant ())
 {
   value_range vr (type,
           wi::to_wide (build_int_cst (type, vf)),
--
2.34.1



Re: [PATCH v2] match.pd: rewrite select to branchless expression

2022-12-01 Thread Michael Collison

Richard,

Can you submit this patch for me while I sort out git write access?

On 11/18/22 07:57, Richard Biener wrote:

On Fri, Nov 11, 2022 at 3:28 AM Michael Collison  wrote:

This patch transforms ((x & 0x1) == 0) ? y : z <op> y into
(-(typeof(y))(x & 0x1) & z) <op> y, where op is a '^' or a '|'. It also
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
0x1)) & z ) op y.

Matching these patterns allows GCC to generate branchless code for one of
the functions in coremark.

Bootstrapped and tested on x86 and RISC-V. Okay?

OK.

Thanks,
Richard.


Michael.

2022-11-10  Michael Collison  

  * match.pd ((x & 0x1) == 0) ? y : z <op> y
  -> (-(typeof(y))(x & 0x1) & z) <op> y.

2022-11-10  Michael Collison 

  * gcc.dg/tree-ssa/branchless-cond.c: New test.

---

Changes in v2:

- Rewrite comment to use C syntax

- Guard against 1-bit types

- Simplify pattern by using zero_one_valued_p

   gcc/match.pd  | 24 +
   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
   2 files changed, 50 insertions(+)
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..258531e9046 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
 (max @2 @1))

+/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq zero_one_valued_p@0
+            integer_zerop)
+        @1
+        (op:c @2 @1))
+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) > 1
+       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+       (op (bit_and (negate (convert:type @0)) @2) @1))))
+
+/* ((x & 0x1) != 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne zero_one_valued_p@0
+            integer_zerop)
+       (op:c @2 @1)
+        @1)
+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) > 1
+       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+       (op (bit_and (negate (convert:type @0)) @2) @1))))
+
   /* Simplifications of shift and rotates.  */

   (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1



Re: [PATCH v2] match.pd: rewrite select to branchless expression

2022-11-11 Thread Michael Collison

Hi Prathamesh,

It is my understanding that INTEGRAL_TYPE_P applies to the other integer 
types you mentioned (char, short, long). In fact, the test function that 
motivated this match has a mixture of char and short, and the pattern 
does not restrict matching on them.
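
As a sanity check at the source level, the identity can be exercised for 
narrower and wider integer types with a small template. This is a sketch 
only: select_xor and branchless_xor are hypothetical names, and the code 
checks the arithmetic identity, not whether GCC applies the match.pd 
pattern to these types.

```cpp
#include <type_traits>

// Select form: ((x & 1) == 0) ? y : z ^ y, for any integer type T.
template <typename T>
T select_xor (unsigned x, T y, T z)
{
  static_assert (std::is_integral<T>::value, "integer types only");
  return ((x & 1) == 0) ? y : static_cast<T> (z ^ y);
}

// Branchless form: (-(T)(x & 1) & z) ^ y.  The negation yields either
// zero or an all-ones mask in T.
template <typename T>
T branchless_xor (unsigned x, T y, T z)
{
  static_assert (std::is_integral<T>::value, "integer types only");
  T mask = static_cast<T> (-static_cast<T> (x & 1));
  return static_cast<T> ((mask & z) ^ y);
}
```

Instantiating both templates with signed char, short and long and 
comparing the results for odd and even x is enough to see that the two 
forms agree for each type.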


On 11/11/22 02:44, Prathamesh Kulkarni wrote:

On Fri, 11 Nov 2022 at 07:58, Michael Collison  wrote:

This patch transforms ((x & 0x1) == 0) ? y : z <op> y into
(-(typeof(y))(x & 0x1) & z) <op> y, where op is a '^' or a '|'. It also
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
0x1)) & z ) op y.

Matching these patterns allows GCC to generate branchless code for one of
the functions in coremark.

Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-10  Michael Collison  

  * match.pd ((x & 0x1) == 0) ? y : z <op> y
  -> (-(typeof(y))(x & 0x1) & z) <op> y.

2022-11-10  Michael Collison 

  * gcc.dg/tree-ssa/branchless-cond.c: New test.

---

Changes in v2:

- Rewrite comment to use C syntax

- Guard against 1-bit types

- Simplify pattern by using zero_one_valued_p

   gcc/match.pd  | 24 +
   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
   2 files changed, 50 insertions(+)
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..258531e9046 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
 (max @2 @1))

+/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq zero_one_valued_p@0
+            integer_zerop)
+        @1
+        (op:c @2 @1))
+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) > 1
+       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+       (op (bit_and (negate (convert:type @0)) @2) @1))))
+
+/* ((x & 0x1) != 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne zero_one_valued_p@0
+            integer_zerop)
+       (op:c @2 @1)
+        @1)
+  (if (INTEGRAL_TYPE_P (type)
+       && TYPE_PRECISION (type) > 1
+       && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+       (op (bit_and (negate (convert:type @0)) @2) @1))))
+
   /* Simplifications of shift and rotates.  */

   (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}

Sorry to nitpick -- Since the pattern gates on INTEGRAL_TYPE_P, would
it be a good idea
to have these tests for other integral types too besides int like
{char, short, long} ?

Thanks,
Prathamesh

+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1



[PATCH v2] match.pd: rewrite select to branchless expression

2022-11-10 Thread Michael Collison
This patch transforms ((x & 0x1) == 0) ? y : z <op> y into 
(-(typeof(y))(x & 0x1) & z) <op> y, where op is a '^' or a '|'. It also 
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , 
0x1)) & z ) op y.


Matching these patterns allows GCC to generate branchless code for one of 
the functions in coremark.
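
To make the rewrite concrete, here is a minimal C sketch of the two forms side by side. The function names are invented for this illustration; only the expressions come from the patch description. The point is that -(x & 1) is 0 when the low bit is clear and all-ones when it is set, so the mask either discards z or keeps it.

```c
#include <assert.h>

/* Select form from the description: y if bit 0 of x is clear, else z ^ y.  */
static unsigned int select_xor(unsigned int x, unsigned int y, unsigned int z)
{
  return ((x & 1) == 0) ? y : z ^ y;
}

/* Branchless form: the negated low bit acts as an all-or-nothing mask.  */
static unsigned int branchless_xor(unsigned int x, unsigned int y,
                                   unsigned int z)
{
  return (-(x & 1) & z) ^ y;
}

/* The same pair for '|'.  */
static unsigned int select_ior(unsigned int x, unsigned int y, unsigned int z)
{
  return ((x & 1) == 0) ? y : z | y;
}

static unsigned int branchless_ior(unsigned int x, unsigned int y,
                                   unsigned int z)
{
  return (-(x & 1) & z) | y;
}

/* Compare both forms over a small grid of sample values.  */
static void check_forms_agree(void)
{
  static const unsigned int samples[] = {
    0u, 1u, 2u, 3u, 0x55u, 0xffu, 0x80000000u, 0xffffffffu
  };
  const unsigned int n = sizeof samples / sizeof samples[0];

  for (unsigned int i = 0; i < n; i++)
    for (unsigned int j = 0; j < n; j++)
      for (unsigned int k = 0; k < n; k++)
        {
          unsigned int x = samples[i], y = samples[j], z = samples[k];
          assert(select_xor(x, y, z) == branchless_xor(x, y, z));
          assert(select_ior(x, y, z) == branchless_ior(x, y, z));
        }
}
```

Unsigned operands are used so that the negation is well defined at the C level; the 1-bit precision guard in the pattern addresses a separate concern inside the compiler.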


Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-10  Michael Collison  

    * match.pd ((x & 0x1) == 0) ? y : z <op> y
    -> (-(typeof(y))(x & 0x1) & z) <op> y.

2022-11-10  Michael Collison 

    * gcc.dg/tree-ssa/branchless-cond.c: New test.

---

Changes in v2:

- Rewrite comment to use C syntax

- Guard against 1-bit types

- Simplify pattern by using zero_one_valued_p

 gcc/match.pd  | 24 +
 .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
 2 files changed, 50 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..258531e9046 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,30 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))
 
+/* ((x & 0x1) == 0) ? y : z <op> y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq zero_one_valued_p@0
+integer_zerop)
+@1
+(op:c @2 @1))
+  (if (INTEGRAL_TYPE_P (type)
+   && TYPE_PRECISION (type) > 1
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type @0)) @2) @1))))
+
+/* ((x & 0x1) == 0) ? z <op> y : y -> (-(typeof(y))(x & 0x1) & z) <op> y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne zero_one_valued_p@0
+integer_zerop)
+   (op:c @2 @1)
+@1)
+  (if (INTEGRAL_TYPE_P (type)
+   && TYPE_PRECISION (type) > 1
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type @0)) @2) @1))))
+
 /* Simplifications of shift and rotates.  */
 
 (for rotate (lrotate rrotate)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1



Re: [PATCH] match.pd: rewrite select to branchless expression

2022-11-09 Thread Michael Collison

Richard,

Thanks for your feedback. I want to make sure I am following what you 
are recommending. Are you suggesting changing:


(for op (bit_xor bit_ior)
(simplify
(cond (eq (bit_and @0 integer_onep@1)
integer_zerop)
@2
(op:c @3 @2))
(if (INTEGRAL_TYPE_P (type)
&& (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
(op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))


to

(for op (bit_xor bit_ior)
 (simplify
  (cond (eq zero_one_valued_p@0
    integer_zerop)
    @1
    (op:c @2 @1))
  (if (INTEGRAL_TYPE_P (type)
   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
   (op (bit_and (negate (convert:type (bit_and @0 { build_one_cst (type); }))) @2) @1))))



On 11/9/22 02:41, Richard Biener wrote:

On Tue, Nov 8, 2022 at 9:02 PM Michael Collison  wrote:

This patch transforms (cond (and (x , 0x1) == 0), y, (z op y)) into
(-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x ,
0x1)) & z ) op y.

Matching these patterns allows GCC to generate branchless code for one of
the functions in coremark.

Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-08  Michael Collison  

  * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) )
  -> (-(and (x , 0x1)) & z ) op y)

2022-11-08  Michael Collison  

  * gcc.dg/tree-ssa/branchless-cond.c: New test.

---
   gcc/match.pd  | 22 
   .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
   2 files changed, 48 insertions(+)
   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..722f517ac6d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
 (max @2 @1))

+/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z )
^ y */

Please write the match as a C expression in the comment, as present
it's a weird mix.  So x & 0x1 == 0 ? y : z <op> y -> (-(typeof(y))(x &
0x1) & z) <op> y


+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq (bit_and @0 integer_onep@1)
+integer_zerop)
+@2
+(op:c @3 @2))
+  (if (INTEGRAL_TYPE_P (type)
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))

Since you are literally keeping (bit_and @0 @1) and not matching @0 with
anything I suspect you could instead use

  (simplify (cond (eq zero_one_valued_p@0 integer_zerop) ...

eventually extending that to cover bit_and with one.  Do you need to guard
this against 'type' being a signed/unsigned 1-bit precision integer?


+
+/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z )
^ y */
+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne (bit_and @0 integer_onep@1)
+integer_zerop)
+(op:c @3 @2)
+@2)
+  (if (INTEGRAL_TYPE_P (type)
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))
+
   /* Simplifications of shift and rotates.  */

   (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1






[PATCH] match.pd: rewrite select to branchless expression

2022-11-08 Thread Michael Collison
This patch transforms (cond (and (x , 0x1) == 0), y, (z op y)) into 
(-(and (x , 0x1)) & z ) op y, where op is a '^' or a '|'. It also 
transforms (cond (and (x , 0x1) != 0), (z op y), y ) into (-(and (x , 
0x1)) & z ) op y.


Matching these patterns allows GCC to generate branchless code for one of 
the functions in coremark.


Bootstrapped and tested on x86 and RISC-V. Okay?

Michael.

2022-11-08  Michael Collison  

    * match.pd ((cond (and (x , 0x1) == 0), y, (z op y) )
    -> (-(and (x , 0x1)) & z ) op y)

2022-11-08  Michael Collison  

    * gcc.dg/tree-ssa/branchless-cond.c: New test.

---
 gcc/match.pd  | 22 
 .../gcc.dg/tree-ssa/branchless-cond.c | 26 +++
 2 files changed, 48 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 194ba8f5188..722f517ac6d 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3486,6 +3486,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
   (max @2 @1))

+/* (cond (and (x , 0x1) == 0), y, (z ^ y) ) -> (-(and (x , 0x1)) & z ) 
^ y */

+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (eq (bit_and @0 integer_onep@1)
+    integer_zerop)
+    @2
+    (op:c @3 @2))
+  (if (INTEGRAL_TYPE_P (type)
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))
+
+/* (cond (and (x , 0x1) != 0), (z ^ y), y ) -> (-(and (x , 0x1)) & z ) 
^ y */

+(for op (bit_xor bit_ior)
+ (simplify
+  (cond (ne (bit_and @0 integer_onep@1)
+    integer_zerop)
+    (op:c @3 @2)
+    @2)
+  (if (INTEGRAL_TYPE_P (type)
+   && (INTEGRAL_TYPE_P (TREE_TYPE (@0))))
+   (op (bit_and (negate (convert:type (bit_and @0 @1))) @3) @2))))
+
 /* Simplifications of shift and rotates.  */

 (for rotate (lrotate rrotate)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c 
b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c

new file mode 100644
index 000..68087ae6568
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/branchless-cond.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int f1(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z ^ y;
+}
+
+int f2(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z ^ y : y;
+}
+
+int f3(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) == 0) ? y : z | y;
+}
+
+int f4(unsigned int x, unsigned int y, unsigned int z)
+{
+  return ((x & 1) != 0) ? z | y : y;
+}
+
+/* { dg-final { scan-tree-dump-times " -" 4 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " & " 8 "optimized" } } */
+/* { dg-final { scan-tree-dump-not "if" "optimized" } } */
--
2.34.1






Re: [PATCH 1/4] Clean up of new format of -falign-FOO.

2018-07-17 Thread Michael Collison
Hi Martin,

Your alignment patch breaks the arm port. In the file arm.c, function 
'get_label_padding' the code uses:

static HOST_WIDE_INT
get_label_padding (rtx label)
{
  HOST_WIDE_INT align, min_insn_size;

  align = 1 << label_to_alignment (label);
  min_insn_size = TARGET_THUMB ? 2 : 4;
  return align > min_insn_size ? align - min_insn_size : 0;
}

Which breaks with your current change. I think this needs to be modified to:

'align = 1 << label_to_alignment (label).levels[0].log'
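
For illustration only, the proposed change can be modelled outside of GCC as below. The two struct types are hypothetical stand-ins for whatever label_to_alignment returns after the patch; they exist solely to show the '.levels[0].log' access and the padding arithmetic, and the is_thumb parameter stands in for TARGET_THUMB.

```c
#include <assert.h>

/* Hypothetical stand-ins: an alignment record with per-level log2 values,
   shaped only to mirror the '.levels[0].log' access suggested above.  */
struct align_level { int log; };
struct align_info { struct align_level levels[2]; };

static struct align_info make_align(int log)
{
  struct align_info a = { { { log }, { 0 } } };
  return a;
}

/* Same arithmetic as get_label_padding: the alignment in bytes is
   1 << log, and the padding is whatever exceeds the smallest insn.  */
static long label_padding(struct align_info a, int is_thumb)
{
  long align = 1L << a.levels[0].log;
  long min_insn_size = is_thumb ? 2 : 4;
  return align > min_insn_size ? align - min_insn_size : 0;
}

static void check_label_padding(void)
{
  assert(label_padding(make_align(3), 0) == 4);  /* 8-byte align, 4-byte insn */
  assert(label_padding(make_align(3), 1) == 6);  /* 8-byte align, 2-byte insn */
  assert(label_padding(make_align(2), 0) == 0);  /* align equals insn size */
}
```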

Regards,

Michael Collison



[PING][PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

2018-07-11 Thread Michael Collison
Ping. Last patch here:

https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00735.html



RE: [PATCH][Aarch64] v2: Arithmetic overflow subv patterns [Patch 3/4]

2018-06-13 Thread Michael Collison
Updated previous patch:

https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00508.html

With coding style feedback from Richard Sandiford (which also applies to this 
patch):

 https://gcc.gnu.org/ml/gcc-patches/2018-06/msg00508.html

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2018-05-31  Michael Collison  
Richard Henderson 

* config/aarch64/aarch64.md (subv4, usubv4): New patterns.
(subti): Handle op1 zero.
(subvti4, usub4ti4): New.
(*sub3_compare1_imm): New.
(sub3_carryinCV): New.
(*sub3_carryinCV_z1_z2, *sub3_carryinCV_z1): New.
(*sub3_carryinCV_z2, *sub3_carryinCV): New.






gnutools-6308-pt3.patch
Description: gnutools-6308-pt3.patch


RE: [PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

2018-06-13 Thread Michael Collison
Updated with Richard's style and mismatched mode comments.

Okay for trunk?

-Original Message-
From: Richard Sandiford  
Sent: Monday, June 11, 2018 11:47 AM
To: Michael Collison 
Cc: James Greenhalgh ; GCC Patches 
; nd 
Subject: Re: [PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

Michael Collison  writes:
> +(define_expand "uaddv4"
> +  [(match_operand:GPI 0 "register_operand")
> +   (match_operand:GPI 1 "register_operand")
> +   (match_operand:GPI 2 "register_operand")
> +   (label_ref (match_operand 3 "" ""))]
> +  ""
> +{
> +  emit_insn (gen_add3_compareC (operands[0], operands[1], 
> +operands[2]));
> +  aarch64_gen_unlikely_cbranch (NE, CC_Cmode, operands[3]);
> +
> +  DONE;
> +})
> +
> +

Nit: stray extra line.

>  (define_expand "addti3"
>[(set (match_operand:TI 0 "register_operand" "")
>   (plus:TI (match_operand:TI 1 "register_operand" "")
> -  (match_operand:TI 2 "register_operand" "")))]
> +  (match_operand:TI 2 "aarch64_reg_or_imm" "")))]
>""
>  {
> -  rtx low = gen_reg_rtx (DImode);
> -  emit_insn (gen_adddi3_compareC (low, gen_lowpart (DImode, operands[1]),
> -   gen_lowpart (DImode, operands[2])));
> +  rtx low_dest,op1_low,op2_low,high_dest,op1_high,op2_high;

Spaces after commas (sorry)

[...]

> @@ -1837,10 +1946,70 @@
>[(set_attr "type" "alus_sreg")]
>  )
>  
> +;; Note that since we're sign-extending, match the immediate in GPI 
> +;; rather than in DWI.  Since CONST_INT is modeless, this works fine.
> +(define_insn "*add3_compareV_cconly_imm"
> +  [(set (reg:CC_V CC_REGNUM)
> + (compare:CC_V
> +   (plus:
> + (sign_extend: (match_operand:GPI 0 "register_operand" "r,r"))
> + (match_operand:GPI 1 "aarch64_plus_immediate" "I,J"))
> +   (sign_extend: (plus:GPI (match_dup 0) (match_dup 1)]

Real reason for replying is: this is a neat trick, but I think it's only an 
accident that genrecog doesn't reject the mode on operand 1.
PLUSes can't have mismatched modes since rtl doesn't have the sign 
information to do a conversion.

IMO we should use a similar structure to the zero_extends:

  (plus:
(zero_extend: (match_operand:GPI 0 "register_operand" "r,r"))
(match_operand: 1 "const_scalar_int_operand" ""))
  (sign_extend:
(plus:GPI
  (match_dup 0)
  (match_operand:GPI 2 "aarch64_plus_immediate" "I,J")

but with the simpler check:

  INTVAL (operands[1]) == INTVAL (operands[2])

Thanks,
Richard


gnutools-6308-pt2.patch
Description: gnutools-6308-pt2.patch


RE: [PATCH][Aarch64] v2: Arithmetic overflow subv patterns [Patch 3/4]

2018-06-08 Thread Michael Collison
All requested changes made:

- label_ref added as operand 3
- more descriptive variable names used

Okay for trunk?

-Original Message-
From: James Greenhalgh  
Sent: Thursday, June 7, 2018 5:30 PM
To: Michael Collison 
Cc: GCC Patches ; nd 
Subject: Re: [PATCH][Aarch64] v2: Arithmetic overflow subv patterns [Patch 3/4]

On Wed, Jun 06, 2018 at 12:19:52PM -0500, Michael Collison wrote:
> This is a respin of a AArch64 patch that adds support for builtin arithmetic 
> overflow operations. This update separates the patch into multiple pieces and 
> addresses comments made by Richard Earnshaw here:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html
> 
> Original patch and motivation for patch here:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html
> 
> This patch contains new patterns for subv overflow patterns.

>  
> +(define_expand "subv4"
> +  [(match_operand:GPI 0 "register_operand")
> +   (match_operand:GPI 1 "aarch64_reg_or_zero")
> +   (match_operand:GPI 2 "aarch64_reg_or_zero")
> +   (match_operand 3 "")]
> +

As in the previous patch I'd prefer to have the predicate showing this needs a 
label, even if it is not used for validation.

Likewise on the variable names.

Otherwise, this is OK, but again I'd appreciate more eyes on the patterns.

Thanks,
James

> 
> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?
> 
> 2018-05-31  Michael Collison  
>   Richard Henderson 
> 
>   * config/aarch64/aarch64.md (subv4, usubv4): New patterns.
>   (subti): Handle op1 zero.
>   (subvti4, usub4ti4): New.
>   (*sub3_compare1_imm): New.
>   (sub3_carryinCV): New.
>   (*sub3_carryinCV_z1_z2, *sub3_carryinCV_z1): New.
>   (*sub3_carryinCV_z2, *sub3_carryinCV): New.
> 




RE: [PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

2018-06-08 Thread Michael Collison
All requested changes made:

- label_ref added as operand 3
- more meaningful names given to variables

Okay for trunk?
-Original Message-
From: James Greenhalgh  
Sent: Thursday, June 7, 2018 5:29 PM
To: Michael Collison 
Cc: GCC Patches ; nd 
Subject: Re: [PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

On Wed, Jun 06, 2018 at 12:16:22PM -0500, Michael Collison wrote:
> This is a respin of a AArch64 patch that adds support for builtin arithmetic 
> overflow operations. This update separates the patch into multiple pieces and 
> addresses comments made by Richard Earnshaw here:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html
> 
> Original patch and motivation for patch here:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html
> 
> This patch contains new patterns for addv overflow patterns.
> 
> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

> +(define_expand "addv4"
> +  [(match_operand:GPI 0 "register_operand")
> +   (match_operand:GPI 1 "register_operand")
> +   (match_operand:GPI 2 "register_operand")
> +   (match_operand 3 "")]
> +  ""

It won't be validated; but I'd prefer us to add the constraint on the label so 
this code is self-documenting. It would have saved me a trip to the manual to 
understand operand 3.

>  (define_expand "addti3"
>[(set (match_operand:TI 0 "register_operand" "")
>   (plus:TI (match_operand:TI 1 "register_operand" "")
> -  (match_operand:TI 2 "register_operand" "")))]
> +  (match_operand:TI 2 "aarch64_reg_or_imm" "")))]
>""
>  {
> -  rtx low = gen_reg_rtx (DImode);
> -  emit_insn (gen_adddi3_compareC (low, gen_lowpart (DImode, operands[1]),
> -   gen_lowpart (DImode, operands[2])));
> +  rtx l0,l1,l2,h0,h1,h2;

Let's give these slightly meaningful names please. dest_high, dest_low, 
op1_high, etc.

Other than these two comments, I think this is OK.

There are some subtleties in here though that I've probably missed, so I 
wouldn't say no to a second pair of eyes.

Thanks,
James


> 
> 
> 2018-05-31  Michael Collison  
>   Richard Henderson 
> 
>   * config/aarch64/aarch64.md: (addv4, uaddv4): New.
>   (addti3): Create simpler code if low part is already known to be 0.
>   (addvti4, uaddvti4): New.
>   (*add3_compareC_cconly_imm): New.
>   (*add3_compareC_cconly): New.
>   (*add3_compareC_imm): New.
>   (*add3_compareC): Rename from add3_compare1; do not
>   handle constants within this pattern..
>   (*add3_compareV_cconly_imm): New.
>   (*add3_compareV_cconly): New.
>   (*add3_compareV_imm): New.
>   (add3_compareV): New.
>   (add3_carryinC, add3_carryinV): New.
>   (*add3_carryinC_zero, *add3_carryinV_zero): New.
>   (*add3_carryinC, *add3_carryinV): New.
>   ((*add3_compareC_cconly_imm): Replace 'ne' operator
>   with 'comparison' operator.
>   (*add3_compareV_cconly_imm): Ditto.
>   (*add3_compareV_cconly): Ditto.
>   (*add3_compareV_imm): Ditto.
>   (add3_compareV): Ditto.
>   (add3_carryinC): Ditto.
>   (*add3_carryinC_zero): Ditto.
>   (*add3_carryinC): Ditto.
>   (add3_carryinV): Ditto.
>   (*add3_carryinV_zero): Ditto.
>   (*add3_carryinV): Ditto.




gnutools-6308-pt2.patch
Description: gnutools-6308-pt2.patch


RE: [PATCH][Aarch64] v2: Arithmetic overflow common functions [Patch 1/4]

2018-06-08 Thread Michael Collison
Patch updated as requested:

- name changed from 'aarch64_add_128bit_scratch_regs' to 
'aarch64_addti_scratch_regs'
- name changed from 'aarch64_subv_128bit_scratch_regs' to 
'aarch64_subvti_scratch_regs'

I did not find any helper function to replace 'aarch64_gen_unlikely_cbranch'.

Okay for trunk?


-Original Message-
From: James Greenhalgh  
Sent: Thursday, June 7, 2018 5:19 PM
To: Michael Collison 
Cc: GCC Patches ; nd 
Subject: Re: [PATCH][Aarch64] v2: Arithmetic overflow common functions [Patch 
1/4]

On Wed, Jun 06, 2018 at 12:14:03PM -0500, Michael Collison wrote:
> This is a respin of a AArch64 patch that adds support for builtin arithmetic 
> overflow operations. This update separates the patch into multiple pieces and 
> addresses comments made by Richard Earnshaw here:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html
> 
> Original patch and motivation for patch here:
> 
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html
> 
> This patch primarily contains common functions in aarch64.c for 
> generating TImode scratch registers, and common rtl functions utilized by the 
> overflow patterns in aarch64.md. In addition a new mode representing overflow 
> CC_Vmode is introduced.
> 
> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

Normally it is preferred that each patch in a series stands independent of the 
others. So if I apply just 1/4 I should get a working toolchain. You have some 
dependencies here between 1/4 and 3/4.

Rather than ask you to rework these patches, I think I'll instead ask you to 
squash them all to a single commit after we're done with review. That will save 
you some rebase work and maintain the property that trunk can be built at most 
revisions.


> (aarch64_add_128bit_scratch_regs): Declare
> (aarch64_subv_128bit_scratch_regs): Declare.

Why use 128bit in the function name rather than call it 
aarch64_subvti_scratch_regs ?


> @@ -16337,6 +16353,131 @@ aarch64_split_dimode_const_store (rtx dst, rtx src)
>return true;
>  }
>  
> +/* Generate RTL for a conditional branch with rtx comparison CODE in
> +   mode CC_MODE.  The destination of the unlikely conditional branch
> +   is LABEL_REF.  */
> +
> +void
> +aarch64_gen_unlikely_cbranch (enum rtx_code code, machine_mode cc_mode,
> +   rtx label_ref)
> +{
> +  rtx x;
> +  x = gen_rtx_fmt_ee (code, VOIDmode,
> +   gen_rtx_REG (cc_mode, CC_REGNUM),
> +   const0_rtx);
> +
> +  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
> + gen_rtx_LABEL_REF (VOIDmode, label_ref),
> + pc_rtx);
> +  aarch64_emit_unlikely_jump (gen_rtx_SET (pc_rtx, x)); }
> +

I'm a bit surprised this is AArch64 specific and there are no helper functions 
to get you here. Not that it should block the patch, but if we can reuse 
something I'd prefer we did.

> +void
> +aarch64_expand_subvti (rtx op0, rtx low_dest, rtx low_in1,
> +rtx low_in2, rtx high_dest, rtx high_in1,
> +rtx high_in2)
> +{
> +  if (low_in2 == const0_rtx)
> +{
> +  low_dest = low_in1;
> +  emit_insn (gen_subdi3_compare1 (high_dest, high_in1,
> +   force_reg (DImode, high_in2)));
> +}
> +  else
> +{
> +  if (CONST_INT_P (low_in2))
> + {
> +   low_in2 = force_reg (DImode, GEN_INT (-UINTVAL (low_in2)));
> +   high_in2 = force_reg (DImode, high_in2);
> +   emit_insn (gen_adddi3_compareC (low_dest, low_in1, low_in2));
> + }
> +  else
> + emit_insn (gen_subdi3_compare1 (low_dest, low_in1, low_in2));
> +  emit_insn (gen_subdi3_carryinCV (high_dest,
> +force_reg (DImode, high_in1),
> +high_in2));

This is where we'd break the build. gen_subdi3_carryinCV isn't defined until 
3/4.

The above points are minor.

This patch is OK with them cleaned up, once I've reviewed the other 3 parts to 
this series.

James

> 
> 2018-05-31  Michael Collison  
> Richard Henderson 
> 
> * config/aarch64/aarch64-modes.def (CC_V): New.
> * config/aarch64/aarch64-protos.h
> (aarch64_add_128bit_scratch_regs): Declare
> (aarch64_subv_128bit_scratch_regs): Declare.
> (aarch64_expand_subvti): Declare.
> (aarch64_gen_unlikely_cbranch): Declare
> * config/aarch64/aarch64.c (aarch64_select_cc_mode): Test for signed 
> overflow using CC_Vmode.
> (aarch64_get_condition_code_1): Handle CC_Vmode.
> (aarch64_gen_unlikely_cbranch): New function.
> (aarch64_add_128bit_scratch_regs): New function.
> (aarch64_subv_128bit_scratch_regs): New function.
> (aarch64_expand_subvti): New function.




gnutools-6308-pt1.patch
Description: gnutools-6308-pt1.patch


[PATCH][Aarch64] v2: Arithmetic overflow tests [Patch 4/4]

2018-06-06 Thread Michael Collison
This is a respin of an AArch64 patch that adds support for builtin arithmetic 
overflow operations. This update separates the patch into multiple pieces and 
addresses comments made by Richard Earnshaw here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html

Original patch and motivation for patch here:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html

This patch contains new test cases to verify that the new overflow patterns are 
being utilized.
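
For readers unfamiliar with the builtins these tests exercise, the following is a small sketch using GCC's long-typed overflow builtins. It is not one of the posted testcases; the wrapper names are made up, and it only shows the source-level behaviour that the new patterns are meant to implement.

```c
#include <assert.h>
#include <limits.h>

/* Each wrapper returns nonzero when the operation overflows.  The wrapped
   result written by the builtin is discarded here for brevity.  */
static int saddl_overflows(long a, long b)
{
  long result;
  return __builtin_saddl_overflow(a, b, &result);
}

static int ssubl_overflows(long a, long b)
{
  long result;
  return __builtin_ssubl_overflow(a, b, &result);
}

static int uaddl_overflows(unsigned long a, unsigned long b)
{
  unsigned long result;
  return __builtin_uaddl_overflow(a, b, &result);
}

static int usubl_overflows(unsigned long a, unsigned long b)
{
  unsigned long result;
  return __builtin_usubl_overflow(a, b, &result);
}

static void check_overflow_builtins(void)
{
  assert(saddl_overflows(LONG_MAX, 1));
  assert(!saddl_overflows(1, 2));
  assert(ssubl_overflows(LONG_MIN, 1));
  assert(uaddl_overflows(ULONG_MAX, 1ul));
  assert(usubl_overflows(0ul, 1ul));
  assert(!usubl_overflows(1ul, 1ul));
}
```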

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2018-05-31  Michael Collison  
Richard Henderson 

* gcc.target/aarch64/builtin_sadd_128.c: New testcase.
* gcc.target/aarch64/builtin_saddl.c: New testcase.
* gcc.target/aarch64/builtin_saddll.c: New testcase.
* gcc.target/aarch64/builtin_uadd_128.c: New testcase.
* gcc.target/aarch64/builtin_uaddl.c: New testcase.
* gcc.target/aarch64/builtin_uaddll.c: New testcase.
* gcc.target/aarch64/builtin_ssub_128.c: New testcase.
* gcc.target/aarch64/builtin_ssubl.c: New testcase.
* gcc.target/aarch64/builtin_ssubll.c: New testcase.
* gcc.target/aarch64/builtin_usub_128.c: New testcase.
* gcc.target/aarch64/builtin_usubl.c: New testcase.
* gcc.target/aarch64/builtin_usubll.c: New testcase.


gnutools-6308-pt4.patch
Description: gnutools-6308-pt4.patch


[PATCH][Aarch64] v2: Arithmetic overflow subv patterns [Patch 3/4]

2018-06-06 Thread Michael Collison
This is a respin of an AArch64 patch that adds support for builtin arithmetic 
overflow operations. This update separates the patch into multiple pieces and 
addresses comments made by Richard Earnshaw here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html

Original patch and motivation for patch here:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html

This patch contains new patterns for subv overflow patterns.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2018-05-31  Michael Collison  
Richard Henderson 

* config/aarch64/aarch64.md (subv4, usubv4): New patterns.
(subti): Handle op1 zero.
(subvti4, usub4ti4): New.
(*sub3_compare1_imm): New.
(sub3_carryinCV): New.
(*sub3_carryinCV_z1_z2, *sub3_carryinCV_z1): New.
(*sub3_carryinCV_z2, *sub3_carryinCV): New.



gnutools-6308-pt3.patch
Description: gnutools-6308-pt3.patch


[PATCH][Aarch64] v2: Arithmetic overflow addv patterns [Patch 2/4]

2018-06-06 Thread Michael Collison
This is a respin of an AArch64 patch that adds support for builtin arithmetic 
overflow operations. This update separates the patch into multiple pieces and 
addresses comments made by Richard Earnshaw here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html

Original patch and motivation for patch here:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html

This patch contains new patterns for addv overflow patterns.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?


2018-05-31  Michael Collison  
Richard Henderson 

* config/aarch64/aarch64.md: (addv4, uaddv4): New.
(addti3): Create simpler code if low part is already known to be 0.
(addvti4, uaddvti4): New.
(*add3_compareC_cconly_imm): New.
(*add3_compareC_cconly): New.
(*add3_compareC_imm): New.
(*add3_compareC): Rename from add3_compare1; do not
handle constants within this pattern.
(*add3_compareV_cconly_imm): New.
(*add3_compareV_cconly): New.
(*add3_compareV_imm): New.
(add3_compareV): New.
(add3_carryinC, add3_carryinV): New.
(*add3_carryinC_zero, *add3_carryinV_zero): New.
(*add3_carryinC, *add3_carryinV): New.
(*add3_compareC_cconly_imm): Replace 'ne' operator
with 'comparison' operator.
(*add3_compareV_cconly_imm): Ditto.
(*add3_compareV_cconly): Ditto.
(*add3_compareV_imm): Ditto.
(add3_compareV): Ditto.
(add3_carryinC): Ditto.
(*add3_carryinC_zero): Ditto.
(*add3_carryinC): Ditto.
(add3_carryinV): Ditto.
(*add3_carryinV_zero): Ditto.
(*add3_carryinV): Ditto.


gnutools-6308-pt2.patch
Description: gnutools-6308-pt2.patch


[PATCH][Aarch64] v2: Arithmetic overflow common functions [Patch 1/4]

2018-06-06 Thread Michael Collison
This is a respin of an AArch64 patch that adds support for builtin arithmetic 
overflow operations. This update separates the patch into multiple pieces and 
addresses comments made by Richard Earnshaw here:

https://gcc.gnu.org/ml/gcc-patches/2017-07/msg00249.html

Original patch and motivation for patch here:

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01512.html

This patch primarily contains common functions in aarch64.c for generating 
TImode scratch registers,
and common rtl functions utilized by the overflow patterns in aarch64.md. In 
addition a new mode representing overflow CC_Vmode is introduced.
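
As background for why the TImode helpers work on separate low and high parts, the sketch below models a 128-bit subtraction as two 64-bit steps with a borrow, plus a signed overflow check. It is plain C written for this note, not the RTL that aarch64_expand_subvti emits, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves.  */
struct u128 { uint64_t hi; uint64_t lo; };

static struct u128 make_u128(uint64_t hi, uint64_t lo)
{
  struct u128 v = { hi, lo };
  return v;
}

/* a - b: subtract the low halves first, then the high halves minus the
   borrow produced by the low subtraction.  */
static struct u128 sub128(struct u128 a, struct u128 b)
{
  struct u128 r;
  uint64_t borrow = a.lo < b.lo;
  r.lo = a.lo - b.lo;
  r.hi = a.hi - b.hi - borrow;
  return r;
}

/* Signed overflow on subtraction: the operands have different signs and
   the result's sign differs from the minuend's.  */
static int sub128_signed_overflows(struct u128 a, struct u128 b)
{
  struct u128 r = sub128(a, b);
  unsigned int sign_a = (unsigned int) (a.hi >> 63);
  unsigned int sign_b = (unsigned int) (b.hi >> 63);
  unsigned int sign_r = (unsigned int) (r.hi >> 63);
  return sign_a != sign_b && sign_r != sign_a;
}

static void check_sub128(void)
{
  /* 0 - 1 borrows into the high half and does not overflow.  */
  assert(sub128(make_u128(0, 0), make_u128(0, 1)).hi == UINT64_MAX);
  assert(sub128(make_u128(0, 0), make_u128(0, 1)).lo == UINT64_MAX);
  assert(!sub128_signed_overflows(make_u128(0, 0), make_u128(0, 1)));
  /* Most negative value minus 1 overflows.  */
  assert(sub128_signed_overflows(make_u128(UINT64_C(0x8000000000000000), 0),
                                 make_u128(0, 1)));
}
```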

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2018-05-31  Michael Collison  
Richard Henderson 

* config/aarch64/aarch64-modes.def (CC_V): New.
* config/aarch64/aarch64-protos.h
(aarch64_add_128bit_scratch_regs): Declare
(aarch64_subv_128bit_scratch_regs): Declare.
(aarch64_expand_subvti): Declare.
(aarch64_gen_unlikely_cbranch): Declare
* config/aarch64/aarch64.c (aarch64_select_cc_mode): Test
for signed overflow using CC_Vmode.
(aarch64_get_condition_code_1): Handle CC_Vmode.
(aarch64_gen_unlikely_cbranch): New function.
(aarch64_add_128bit_scratch_regs): New function.
(aarch64_subv_128bit_scratch_regs): New function.
(aarch64_expand_subvti): New function.


gnutools-6308-pt1.patch
Description: gnutools-6308-pt1.patch


[PATCH 2/2][Aarch64] Improve FP to int conversions

2018-05-18 Thread Michael Collison
This patch improves additional cases of FP to integer conversions with 
-ffast-math enabled.

Example 1:

double
f5 (int x)
{
  return (double)(float) x;
}


At -O2 with -ffast-math

Trunk generates:

f5:
scvtf   s0, w0
fcvt    d0, s0
ret


With the patch we can merge the conversion to float and float-extend and reduce 
the sequence to one instruction at -O2 and -ffast-math

f5:
scvtf   d0, w0
ret

Example 2

int
f6 (double x)
{
  return (int)(float) x;
}


At -O2 (even with -ffast-math) trunk generates

f6:
fcvt    s0, d0
fcvtzs  w0, s0
ret

We can merge the float_truncate into the fix at the rtl level

With -ffast-math enabled and -O2 we can now generate:

f6:
fcvtzs  w0, d0
ret
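
A short C sketch of why these merges are tied to -ffast-math: going through float can change the value, so dropping the intermediate step is not exact in general. The function names below are made up for this note, and the checks assume the default IEEE round-to-nearest mode and a build without -ffast-math.

```c
#include <assert.h>

/* Same shape as f6: double -> float -> int.  */
static int to_int_via_float(double x)
{
  return (int) (float) x;
}

/* What the merged single conversion computes: double -> int.  */
static int to_int_direct(double x)
{
  return (int) x;
}

/* Same shape as f5: int -> float -> double.  */
static double to_double_via_float(int x)
{
  return (double) (float) x;
}

/* What the merged single conversion computes: int -> double.  */
static double to_double_direct(int x)
{
  return (double) x;
}

static void check_float_round_trip(void)
{
  /* 16777217 is 2^24 + 1, which float cannot represent; it rounds to 2^24.  */
  assert(to_int_direct(16777217.0) == 16777217);
  assert(to_int_via_float(16777217.0) == 16777216);
  assert(to_double_direct(16777217) == 16777217.0);
  assert(to_double_via_float(16777217) == 16777216.0);
}
```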

Bootstrapped and regression tested on aarch64-linux-gnu. Okay for trunk?

2018-05-15  Michael Collison  <michael.colli...@arm.com>

* config/aarch64/aarch64.md:
(*df2): New pattern.
(truncdfsf_2): New pattern.
(*fix_to_sign_extenddi2): Ditto.
* gcc.target/aarch64/float_int_conv.c: New testcase.


gnutools-6527-pt2.patch
Description: gnutools-6527-pt2.patch


[PATCH 1/2][Aarch64] Improve FP to int conversions

2018-05-18 Thread Michael Collison
This patch improves additional cases of FP to integer conversions.

Example 1:

unsigned long
f7 (double x)
{
  return (unsigned) x;
}


At -O2

Trunk generates:

f7:
fcvtzu  w0, d0
uxtw    x0, w0
ret

With the patch we can merge the zero-extend and reduce the sequence to one 
instruction at -O2

f7:
fcvtzu  x0, d0
ret
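
The equivalence behind this merge can be sketched in C. This is a model for illustration, not the patch's implementation: it assumes a 32-bit unsigned and a 64-bit unsigned long, spelled here as uint32_t and uint64_t, and the function names are hypothetical.

```c
#include <stdint.h>

/* Model of "fcvtzu w0, d0" followed by "uxtw x0, w0": convert to a
   32-bit unsigned value, then zero-extend to 64 bits.  */
static uint64_t
narrow_then_extend (double x)
{
  return (uint64_t) (uint32_t) x;
}

/* Model of the single "fcvtzu x0, d0": convert straight to 64 bits.  */
static uint64_t
convert_wide (double x)
{
  return (uint64_t) x;
}
```

The two agree whenever the truncated value fits in 32 bits. Values outside that range are undefined behaviour for the C cast in f7, so they are not compared here.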

Bootstrapped and regression tested on aarch64-linux-gnu. Okay for trunk?

2018-05-15  Michael Collison  <michael.colli...@arm.com>

* config/aarch64/aarch64.md:
(*fix_to_zero_extenddfdi2): New pattern.
* gcc.target/aarch64/fix_extend1.c: New testcase.


gnutools-6527-pt1.patch
Description: gnutools-6527-pt1.patch


[Arm] GCC crash in cprop_hardreg when targeting v8-A Thumb

2018-02-03 Thread Michael Collison
This patch fixes a bug affecting two patterns in arm/thumb2.md where the 
split condition was insufficient and allowed invalid RTL to be generated. The 
split conditions for the patterns "*thumb2_mov_negscc" and "*thumb_mov_notscc" 
allowed splitting that ignored "arm_restrict_it". This caused invalid RTL 
to be generated for IT blocks, which in turn caused an internal compiler error.

Bootstrapped and regression tested on arm-linux-gnueabihf. Okay for trunk?


2018-01-28  Michael Collison  <michael.colli...@arm.com>

* config/arm/thumb2.md:
(*thumb2_mov_negscc): Split only if TARGET_THUMB2 && !arm_restrict_it.
(*thumb_mov_notscc): Ditto.
* gcc.target/arm/pr7676.c: New testcase.



gnutools-7676.patch
Description: gnutools-7676.patch


RE: [PATCH 5/5][AArch64] fp16fml support

2018-01-10 Thread Michael Collison
Okay, I will put it on my to-do list for post GCC 8.

-Original Message-
From: James Greenhalgh [mailto:james.greenha...@arm.com] 
Sent: Wednesday, January 10, 2018 12:21 PM
To: Michael Collison <michael.colli...@arm.com>
Cc: Richard Sandiford <richard.sandif...@linaro.org>; GCC Patches 
<gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
Subject: Re: [PATCH 5/5][AArch64] fp16fml support

On Tue, Jan 09, 2018 at 06:28:09PM +, Michael Collison wrote:
> Patch updated per Richard's comments. Ok for trunk?

This patch adds a lot of code, much of which looks like it ought to be possible 
to common up using the iterators. I'm going to OK it as is, as I'd like to see 
this make GCC 8, and we've sat on it for long enough, but I would really 
appreciate future refactoring in this area. I'm worried about maintainability 
as it stands.

OK.

Thanks,
James

> 
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@linaro.org]
> Sent: Thursday, January 4, 2018 8:02 AM
> To: Michael Collison <michael.colli...@arm.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
> Subject: Re: [PATCH 5/5][AArch64] fp16fml support
> 
> Hi Michael,
> 
> Not a review of the full patch, just a comment about the patterns:
> 
> Michael Collison <michael.colli...@arm.com> writes:
> > +(define_expand "aarch64_fmll_lane_lowv2sf"
> > +  [(set (match_operand:V2SF 0 "register_operand" "")
> > +   (unspec:V2SF [(match_operand:V2SF 1 "register_operand" "")
> > +  (match_operand:V4HF 2 "register_operand" "")
> > +  (match_operand:V4HF 3 "register_operand" "")
> > +  (match_operand:SI 4 "aarch64_imm2" "")]
> > +VFMLA16_LOW))]
> > +  "TARGET_F16FML"
> > +{
> > +rtx p1 = aarch64_simd_vect_par_cnst_half (V4HFmode,
> > + GET_MODE_NUNITS (V4HFmode),
> > + false);
> > +rtx lane = GEN_INT (ENDIAN_LANE_N (GET_MODE_NUNITS (SImode), 
> > +INTVAL (operands[4])));
> 
> Please use the newly-introduced aarch64_endian_lane_rtx for this.
> 
> GET_MODE_NUNITS (SImode) doesn't seem right though, since that's always 1.
> Should it be using V4HFmode instead?
> 
> Same for the other patterns.
> 
> Thanks,
> Richard




RE: [PATCH 5/5][AArch64] fp16fml support

2018-01-09 Thread Michael Collison
Patch updated per Richard's comments. Ok for trunk?

-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@linaro.org] 
Sent: Thursday, January 4, 2018 8:02 AM
To: Michael Collison <michael.colli...@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
Subject: Re: [PATCH 5/5][AArch64] fp16fml support

Hi Michael,

Not a review of the full patch, just a comment about the patterns:

Michael Collison <michael.colli...@arm.com> writes:
> +(define_expand "aarch64_fmll_lane_lowv2sf"
> +  [(set (match_operand:V2SF 0 "register_operand" "")
> + (unspec:V2SF [(match_operand:V2SF 1 "register_operand" "")
> +(match_operand:V4HF 2 "register_operand" "")
> +(match_operand:V4HF 3 "register_operand" "")
> +(match_operand:SI 4 "aarch64_imm2" "")]
> +  VFMLA16_LOW))]
> +  "TARGET_F16FML"
> +{
> +rtx p1 = aarch64_simd_vect_par_cnst_half (V4HFmode,
> +   GET_MODE_NUNITS (V4HFmode),
> +   false);
> +rtx lane = GEN_INT (ENDIAN_LANE_N (GET_MODE_NUNITS (SImode),
> +                                   INTVAL (operands[4])));

Please use the newly-introduced aarch64_endian_lane_rtx for this.

GET_MODE_NUNITS (SImode) doesn't seem right though, since that's always 1.
Should it be using V4HFmode instead?

Same for the other patterns.

Thanks,
Richard
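
The concern about GET_MODE_NUNITS (SImode) can be illustrated with a small stand-alone model. This is a sketch, not GCC code: it assumes the lane remapping flips the index on big-endian targets as nunits - 1 - lane, and endian_lane is a hypothetical helper that takes the endianness as a parameter instead of reading BYTES_BIG_ENDIAN.

```c
/* Hypothetical model of the big-endian lane remapping: on big-endian
   the architectural lane index is mirrored within the vector.  */
static int
endian_lane (int nunits, int lane, int big_endian)
{
  return big_endian ? nunits - 1 - lane : lane;
}
```

With nunits = 4 (V4HFmode), lane 1 maps to lane 2 on big-endian and stays 1 on little-endian. With nunits = 1 (SImode), the same lane maps to -1 on big-endian, which is not a valid lane. That matches the suggestion to use the vector mode instead.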


fp16fml_up_v2.patch
Description: fp16fml_up_v2.patch


RE: [PATCH 1/5][AArch64] Crypto command line split

2018-01-09 Thread Michael Collison
I used a generic statement that applied to all five patches. The patch was 
bootstrapped and the test suite executed along with the other patches together. 
As you correctly point out, there are no new instructions, but backward 
compatibility was tested as existing patterns had their pattern conditional 
statement changed from TARGET_CRYPTO to TARGET_AES.

-Original Message-
From: James Greenhalgh [mailto:james.greenha...@arm.com] 
Sent: Tuesday, January 9, 2018 10:44 AM
To: Michael Collison <michael.colli...@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>; nd <n...@arm.com>
Subject: Re: [PATCH 1/5][AArch64] Crypto command line split

On Wed, Jan 03, 2018 at 05:21:27PM +, Michael Collison wrote:
> Hi all,
> 
> This patch adds two new command line options for the legacy 
> cryptographic extensions AES (+aes) and SHA-1/SHA-2 (+sha2). Backward 
> compatibility is retained by modifying the +crypto feature modifier to enable 
> +aes and +sha2.
> 
> Bootstrapped on aarch64-none-elf. Tested with new binutils and 
> verified all instructions assemble correctly.

I'm a bit confused by this testing statement. aarch64-none-elf is not a 
bootstrap target. Was this bootstrapped or only tested with a cross-compiler 
(or both)? Was the testsuite also run? I don't see any new instructions in this 
patch that would need a new binutils.

The patch is OK, but please clarify how it has been tested.

Thanks,
James



[PATCH 5/5][AArch64] fp16fml support

2018-01-03 Thread Michael Collison
Hi All,

This patch adds support for the FP16 multiply add/subtract instructions in 
Armv8.4-a.  Support for the new instructions is in the form of new ACLE 
intrinsics. A new command line feature modifier, +fp16fml, is added to enable 
the support. Enabling +fp16fml automatically enables +fp16.

Test cases were added to verify that the ACLE Intrinsics generate the 
appropriate FP16 multiply add/subtract assembly instructions.

Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all 
instructions assemble correctly.

Okay for trunk?

2017-11-10  Michael Collison  <michael.colli...@arm.com>

* config/aarch64/aarch64-modes.def (V2HF): New VECTOR_MODE.
* config/aarch64/aarch64-option-extension.def: Add
AARCH64_OPT_EXTENSION of 'fp16fml'.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
(__ARM_FEATURE_FP16_FML): Define if TARGET_F16FML is true.
* config/aarch64/predicates.md (aarch64_lane_imm3): New predicate.
* config/aarch64/constraints.md (Ui7): New constraint.
* config/aarch64/iterators.md (VFMLA_W): New mode iterator.
(VFMLA_SEL_W): Ditto.
(f16quad): Ditto.
(f16mac1): Ditto.
(VFMLA16_LOW): New int iterator.
(VFMLA16_HIGH): Ditto.
(UNSPEC_FMLAL): New unspec.
(UNSPEC_FMLSL): Ditto.
(UNSPEC_FMLAL2): Ditto.
(UNSPEC_FMLSL2): Ditto.
(f16mac): New code attribute.
* config/aarch64/aarch64-simd-builtins.def
(aarch64_fmlal_lowv2sf): Ditto.
(aarch64_fmlsl_lowv2sf): Ditto.
(aarch64_fmlalq_lowv4sf): Ditto.
(aarch64_fmlslq_lowv4sf): Ditto.
(aarch64_fmlal_highv2sf): Ditto.
(aarch64_fmlsl_highv2sf): Ditto.
(aarch64_fmlalq_highv4sf): Ditto.
(aarch64_fmlslq_highv4sf): Ditto.
(aarch64_fmlal_lane_lowv2sf): Ditto.
(aarch64_fmlsl_lane_lowv2sf): Ditto.
(aarch64_fmlal_laneq_lowv2sf): Ditto.
(aarch64_fmlsl_laneq_lowv2sf): Ditto.
(aarch64_fmlalq_lane_lowv4sf): Ditto.
(aarch64_fmlsl_lane_lowv4sf): Ditto.
(aarch64_fmlalq_laneq_lowv4sf): Ditto.
(aarch64_fmlsl_laneq_lowv4sf): Ditto.
(aarch64_fmlal_lane_highv2sf): Ditto.
(aarch64_fmlsl_lane_highv2sf): Ditto.
(aarch64_fmlal_laneq_highv2sf): Ditto.
(aarch64_fmlsl_laneq_highv2sf): Ditto.
(aarch64_fmlalq_lane_highv4sf): Ditto.
(aarch64_fmlsl_lane_highv4sf): Ditto.
(aarch64_fmlalq_laneq_highv4sf): Ditto.
(aarch64_fmlsl_laneq_highv4sf): Ditto.
* config/aarch64/aarch64-simd.md:
(aarch64_fmll_low): New pattern.
(aarch64_fmll_high): Ditto.
(aarch64_simd_fmll_low): Ditto.
(aarch64_simd_fmll_high): Ditto.
(aarch64_fmll_lane_lowv2sf): Ditto.
(aarch64_fmll_lane_highv2sf): Ditto.
(aarch64_simd_fmll_lane_lowv2sf): Ditto.
(aarch64_simd_fmll_lane_highv2sf): Ditto.
(aarch64_fmllq_laneq_lowv4sf): Ditto.
(aarch64_fmllq_laneq_highv4sf): Ditto.
(aarch64_simd_fmllq_laneq_lowv4sf): Ditto.
(aarch64_simd_fmllq_laneq_highv4sf): Ditto.
(aarch64_fmll_laneq_lowv2sf): Ditto.
(aarch64_fmll_laneq_highv2sf): Ditto.
(aarch64_simd_fmll_laneq_lowv2sf): Ditto.
(aarch64_simd_fmll_laneq_highv2sf): Ditto.
(aarch64_fmllq_lane_lowv4sf): Ditto.
(aarch64_fmllq_lane_highv4sf): Ditto.
(aarch64_simd_fmllq_lane_lowv4sf): Ditto.
(aarch64_simd_fmllq_lane_highv4sf): Ditto.
* config/aarch64/arm_neon.h (vfmlal_low_u32): New intrinsic.
(vfmlsl_low_u32): Ditto.
(vfmlalq_low_u32): Ditto.
(vfmlslq_low_u32): Ditto.
(vfmlal_high_u32): Ditto.
(vfmlsl_high_u32): Ditto.
(vfmlalq_high_u32): Ditto.
(vfmlslq_high_u32): Ditto.
(vfmlal_lane_low_u32): Ditto.
(vfmlsl_lane_low_u32): Ditto.
(vfmlal_laneq_low_u32): Ditto.
(vfmlsl_laneq_low_u32): Ditto.
(vfmlalq_lane_low_u32): Ditto.
(vfmlslq_lane_low_u32): Ditto.
(vfmlalq_laneq_low_u32): Ditto.
(vfmlslq_laneq_low_u32): Ditto.
(vfmlal_lane_high_u32): Ditto.
(vfmlsl_lane_high_u32): Ditto.
(vfmlal_laneq_high_u32): Ditto.
(vfmlsl_laneq_high_u32): Ditto.
(vfmlalq_lane_high_u32): Ditto.
(vfmlslq_lane_high_u32): Ditto.
(vfmlalq_laneq_high_u32): Ditto.
(vfmlslq_laneq_high_u32): Ditto.
* config/aarch64/aarch64.h (AARCH64_FL_F16FML): New flag.
(AARCH64_FL_FOR_ARCH8_4): New.
(AARCH64_ISA_F16FML): New ISA flag.
(TARGET_F16FML): New feature flag for fp16fml.
gcc.target/aarch64/fp16_fmul_high_1.c: New testcase.
gcc.target/aarch64/fp16_fmul_high_2.c: New testcase.
gcc.target/aarch64/fp16_fmul_high_3.c: New testcase.
gcc.target/aarch64/fp16_fmul_high.h: New shared testcase.
gcc.target/aarch64/fp16_fmul_lane_high_1.c: New te

[PATCH 4/5][AArch64] Crypto sha512 and sha3

2018-01-03 Thread Michael Collison
Hi All,

This patch adds support for the SHA-512 and SHA-3 instructions added in 
Armv8.4-a. Support for the new instructions is in the form of new ACLE 
intrinsics. A new command line feature modifier, +sha3, is added to enable the 
support.

Test cases were added to verify that the ACLE Intrinsics generate the 
appropriate SHA-512/SHA-3 assembly instructions.

Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all 
instructions assemble correctly.

Okay for trunk?

2017-11-10  Michael Collison  <michael.colli...@arm.com>

* config/aarch64/aarch64-builtins.c:
(aarch64_types_ternopu_imm_qualifiers, TYPES_TERNOPUI): New.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
(__ARM_FEATURE_SHA3): Define if TARGET_SHA3 is true.
* config/aarch64/aarch64.h (AARCH64_FL_SHA3): New flags.
(AARCH64_ISA_SHA3): New ISA flag.
(TARGET_SHA3): New feature flag for sha3.
* config/aarch64/iterators.md (sha512_op): New int attribute.
(CRYPTO_SHA512): New int iterator.
(UNSPEC_SHA512H): New unspec.
(UNSPEC_SHA512H2): Ditto.
(UNSPEC_SHA512SU0): Ditto.
(UNSPEC_SHA512SU1): Ditto.
* config/aarch64/aarch64-simd-builtins.def
(aarch64_crypto_sha512hqv2di): New builtin.
(aarch64_crypto_sha512h2qv2di): Ditto.
(aarch64_crypto_sha512su0qv2di): Ditto.
(aarch64_crypto_sha512su1qv2di): Ditto.
(aarch64_eor3qv8hi): Ditto.
(aarch64_rax1qv2di): Ditto.
(aarch64_xarqv2di): Ditto.
(aarch64_bcaxqv8hi): Ditto.
* config/aarch64/aarch64-simd.md:
(aarch64_crypto_sha512hqv2di): New pattern.
(aarch64_crypto_sha512su0qv2di): Ditto.
(aarch64_crypto_sha512su1qv2di): Ditto.
(aarch64_eor3qv8hi): Ditto.
(aarch64_rax1qv2di): Ditto.
(aarch64_xarqv2di): Ditto.
(aarch64_bcaxqv8hi): Ditto.
* config/aarch64/arm_neon.h (vsha512hq_u64): New intrinsic.
(vsha512h2q_u64): Ditto.
(vsha512su0q_u64): Ditto.
(vsha512su1q_u64): Ditto.
(veor3q_u16): Ditto.
(vrax1q_u64): Ditto.
(vxarq_u64): Ditto.
(vbcaxq_u16): Ditto.
* config/arm/types.md (crypto_sha512): New type attribute.
(crypto_sha3): Ditto.
(doc/invoke.texi): Document new sha3 option.
gcc.target/aarch64/sha2.h: New shared testcase.
gcc.target/aarch64/sha2_1.c: New testcase.
gcc.target/aarch64/sha2_2.c: New testcase.
gcc.target/aarch64/sha2_3.c: New testcase.
gcc.target/aarch64/sha3.h: New shared testcase.
gcc.target/aarch64/sha3_1.c: New testcase.
gcc.target/aarch64/sha3_2.c: New testcase.
gcc.target/aarch64/sha3_3.c: New testcase.


crypto_sha512.patch
Description: crypto_sha512.patch


[PATCH 3/5][AArch64] Crypto SM4 Support

2018-01-03 Thread Michael Collison
Hi All,

This patch adds support for the SM3/SM4 cryptographic instructions added in 
Armv8.4-a. Support for the new instructions is in the form of new ACLE 
intrinsics. A new command line feature modifier, +sm4, is added to enable the 
support.

Test cases were added to verify that the ACLE Intrinsics generate the 
appropriate SM3/SM4 assembly instructions.

Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all 
instructions assemble correctly.

Okay for trunk?

2017-11-10  Michael Collison  <michael.colli...@arm.com>

* config/aarch64/aarch64-builtins.c:
(aarch64_types_quadopu_imm_qualifiers, TYPES_QUADOPUI): New.
* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
(__ARM_FEATURE_SM3): Define if TARGET_SM4 is true.
(__ARM_FEATURE_SM4): Define if TARGET_SM4 is true.
* config/aarch64/aarch64.h (AARCH64_FL_SM4): New flags.
(AARCH64_ISA_SM4): New ISA flag.
(TARGET_SM4): New feature flag for sm4.
* config/aarch64/aarch64-simd-builtins.def
(aarch64_sm3ss1qv4si): Ditto.
(aarch64_sm3tt1aq4si): Ditto.
(aarch64_sm3tt1bq4si): Ditto.
(aarch64_sm3tt2aq4si): Ditto.
(aarch64_sm3tt2bq4si): Ditto.
(aarch64_sm3partw1qv4si): Ditto.
(aarch64_sm3partw2qv4si): Ditto.
(aarch64_sm4eqv4si): Ditto.
(aarch64_sm4ekeyqv4si): Ditto.
* config/aarch64/aarch64-simd.md:
(aarch64_sm3ss1qv4si): Ditto.
(aarch64_sm3ttqv4si): Ditto.
(aarch64_sm3partwqv4si): Ditto.
(aarch64_sm4eqv4si): Ditto.
(aarch64_sm4ekeyqv4si): Ditto.
* config/aarch64/iterators.md (sm3tt_op): New int iterator.
(sm3part_op): Ditto.
(CRYPTO_SM3TT): Ditto.
(CRYPTO_SM3PART): Ditto.
(UNSPEC_SM3SS1): New unspec.
(UNSPEC_SM3TT1A): Ditto.
(UNSPEC_SM3TT1B): Ditto.
(UNSPEC_SM3TT2A): Ditto.
(UNSPEC_SM3TT2B): Ditto.
(UNSPEC_SM3PARTW1): Ditto.
(UNSPEC_SM3PARTW2): Ditto.
(UNSPEC_SM4E): Ditto.
(UNSPEC_SM4EKEY): Ditto.
* config/aarch64/constraints.md (Ui2): New constraint.
* config/aarch64/predicates.md (aarch64_imm2): New predicate.
* config/arm/types.md (crypto_sm3): New type attribute.
(crypto_sm4): Ditto.
* config/aarch64/arm_neon.h (vsm3ss1q_u32): New intrinsic.
(vsm3tt1aq_u32): Ditto.
(vsm3tt1bq_u32): Ditto.
(vsm3tt2aq_u32): Ditto.
(vsm3tt2bq_u32): Ditto.
(vsm3partw1q_u32): Ditto.
(vsm3partw2q_u32): Ditto.
(vsm4eq_u32): Ditto.
(vsm4ekeyq_u32): Ditto.
(doc/invoke.texi): Document new sm4 option.
gcc.target/aarch64/sm3_sm4.c: New testcase.


crypto_sm4.patch
Description: crypto_sm4.patch


[PATCH 2/5][AArch64] Add v8.4 architecture

2018-01-03 Thread Michael Collison
Hi all,

This patch adds support for the Arm architecture v8.4. A new command line 
option, -march=armv8.4-a, is added as well as documentation.

Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all 
instructions assemble correctly.

2017-11-10  Michael Collison  <michael.colli...@arm.com>

* config/aarch64/aarch64-arches.def (armv8.4-a): New architecture.
* config/aarch64/aarch64.h (AARCH64_ISA_V8_4): New ISA flag.
(AARCH64_FL_FOR_ARCH8_4): New.
(AARCH64_FL_V8_4): New flag.
(doc/invoke.texi): Document new armv8.4-a option.



v8_4_architecture.patch
Description: v8_4_architecture.patch


[PATCH 1/5][AArch64] Crypto command line split

2018-01-03 Thread Michael Collison
Hi all,

This patch adds two new command line options for the legacy cryptographic 
extensions AES (+aes) and SHA-1/SHA-2 (+sha2). Backward compatibility is 
retained by modifying the +crypto feature modifier to enable +aes and +sha2.
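
From user code, the split can be observed through the feature macros named in the ChangeLog below. The following is a minimal sketch under that assumption; the enum values and crypto_feature_mask are hypothetical names introduced for the example.

```c
/* Hypothetical bit values used only by this example.  */
enum { HAVE_AES = 1, HAVE_SHA2 = 2 };

/* Report which of the two split extensions the compiler advertises
   for the current target options.  Built with +crypto, both macros
   are expected to be defined, since +crypto enables +aes and +sha2.  */
static int
crypto_feature_mask (void)
{
  int mask = 0;
#ifdef __ARM_FEATURE_AES
  mask |= HAVE_AES;
#endif
#ifdef __ARM_FEATURE_SHA2
  mask |= HAVE_SHA2;
#endif
  return mask;
}
```

On a target or compiler that defines neither macro, the function simply returns 0, so the check is safe to compile anywhere.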

Bootstrapped on aarch64-none-elf. Tested with new binutils and verified all 
instructions assemble correctly.

2017-11-10  Michael Collison  <michael.colli...@arm.com>

* config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins):
(__ARM_FEATURE_AES): Define if TARGET_AES is true.
(__ARM_FEATURE_SHA2): Define if TARGET_SHA2 is true.
* config/aarch64/aarch64-option-extension.def: Add
AARCH64_OPT_EXTENSION of 'sha2'.
(aes): Add AARCH64_OPT_EXTENSION of 'aes'.
(crypto): Disable sha2 and aes if crypto disabled.
(crypto): Enable aes and sha2 if enabled.
(simd): Disable sha2 and aes if simd disabled.
* config/aarch64/aarch64.h (AARCH64_FL_AES, AARCH64_FL_SHA2):
New flags.
(AARCH64_ISA_AES, AARCH64_ISA_SHA2): New ISA flags.
(TARGET_SHA2): New feature flag for sha2.
(TARGET_AES): New feature flag for aes.
* config/aarch64/aarch64-simd.md:
(aarch64_crypto_aesv16qi): Make pattern
conditional on TARGET_AES.
(aarch64_crypto_aesv16qi): Ditto.
(aarch64_crypto_sha1hsi): Make pattern conditional
on TARGET_SHA2.
(aarch64_crypto_sha1hv4si): Ditto.
(aarch64_be_crypto_sha1hv4si): Ditto.
(aarch64_crypto_sha1su1v4si): Ditto.
(aarch64_crypto_sha1v4si): Ditto.
(aarch64_crypto_sha1su0v4si): Ditto.
(aarch64_crypto_sha256hv4si): Ditto.
(aarch64_crypto_sha256su0v4si): Ditto.
(aarch64_crypto_sha256su1v4si): Ditto.
(doc/invoke.texi): Document new aes and sha2 options.


crypto_split.patch
Description: crypto_split.patch


[PATCH 0/5][AArch64] ARMv8.4-A support

2018-01-03 Thread Michael Collison
Hello,

The ARMv8.4-A architecture builds on ARMv8.3-A and includes optional 
cryptographic extensions supporting SHA512, SHA3, SM3 and SM4. New FP16 
multiply add/subtract instructions have been added that are mandatory in 
ARMv8.4-A and optional from ARMv8.2-A onward.  

Although the new cryptographic instructions are introduced in ARMv8.4-A, they 
may be optionally supported in any architecture implementation from ARMv8.2-A 
onward.

This patch set adds support to GCC for the ARMv8.4-A architecture and adds new 
command line options to individually select the existing cryptographic 
SHA-1/SHA-256 and AES extensions. The existing +crypto option is retained for 
backward compatibility. New command line options are added for SHA512/SHA3, 
SM3/SM4 and the FP16 multiply add/subtract extensions. The cryptographic and 
FP16 multiply add/subtract instructions are exposed as new ACLE intrinsics.

The patches in this series are:

- Add new command line options for SHA-1/SHA-256 and AES including documentation
- Add support for ARMv8.4-A
- Add support for the cryptographic SM3/SM4 extension
- Add support for the cryptographic SHA-512/SHA-3 extension
- Add support for the FP16 multiply add/subtract extension


