[PING][PATCH] arm: Remove unsigned variant of vcaddq_m

2023-08-19 Thread Stam Markianos-Wright via Gcc-patches


(Pinging since I realised that this is required for my later Low Overhead Loop 
patch series to work)

Ok for trunk with the updated changelog that Christophe mentioned?

Thanks,
Stamatis/Stam Markianos-Wright


From: Stam Markianos-Wright
Sent: Tuesday, August 1, 2023 6:21 PM
To: gcc-patches@gcc.gnu.org 
Cc: Richard Earnshaw ; Kyrylo Tkachov 

Subject: arm: Remove unsigned variant of vcaddq_m

Hi all,

The unsigned variants of the vcaddq_m operation are not needed within the
compiler, as the assembly output of the signed and unsigned versions of the
ops is identical: both use a `.i` suffix (as opposed to separate `.s` and `.u`
suffixes).

Tested with bare-metal arm-none-eabi on Arm's Fast Models.

Ok for trunk?

Thanks,
Stamatis Markianos-Wright

gcc/ChangeLog:

 * config/arm/arm-mve-builtins-base.cc (vcaddq_rot90, vcaddq_rot270):
   Use common insn for signed and unsigned front-end definitions.
 * config/arm/arm_mve_builtins.def
   (vcaddq_rot90_m_u, vcaddq_rot270_m_u): Make common.
   (vcaddq_rot90_m_s, vcaddq_rot270_m_s): Remove.
 * config/arm/iterators.md (mve_insn): Merge signed and unsigned defs.
   (isu): Likewise.
   (rot): Likewise.
   (mve_rot): Likewise.
   (supf): Likewise.
   (VxCADDQ_M): Likewise.
 * config/arm/unspecs.md (unspec): Likewise.
---
  gcc/config/arm/arm-mve-builtins-base.cc |  4 ++--
  gcc/config/arm/arm_mve_builtins.def |  6 ++---
  gcc/config/arm/iterators.md | 30 +++--
  gcc/config/arm/mve.md   |  4 ++--
  gcc/config/arm/unspecs.md   |  6 ++---
  5 files changed, 21 insertions(+), 29 deletions(-)

diff --git a/gcc/config/arm/arm-mve-builtins-base.cc
b/gcc/config/arm/arm-mve-builtins-base.cc
index e31095ae112..426a87e9852 100644
--- a/gcc/config/arm/arm-mve-builtins-base.cc
+++ b/gcc/config/arm/arm-mve-builtins-base.cc
@@ -260,8 +260,8 @@ FUNCTION_PRED_P_S_U (vaddvq, VADDVQ)
  FUNCTION_PRED_P_S_U (vaddvaq, VADDVAQ)
  FUNCTION_WITH_RTX_M (vandq, AND, VANDQ)
  FUNCTION_ONLY_N (vbrsrq, VBRSRQ)
-FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot,
(UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M_S,
VCADDQ_ROT90_M_U, VCADDQ_ROT90_M_F))
-FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot,
(UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M_S,
VCADDQ_ROT270_M_U, VCADDQ_ROT270_M_F))
+FUNCTION (vcaddq_rot90, unspec_mve_function_exact_insn_rot,
(UNSPEC_VCADD90, UNSPEC_VCADD90, UNSPEC_VCADD90, VCADDQ_ROT90_M,
VCADDQ_ROT90_M, VCADDQ_ROT90_M_F))
+FUNCTION (vcaddq_rot270, unspec_mve_function_exact_insn_rot,
(UNSPEC_VCADD270, UNSPEC_VCADD270, UNSPEC_VCADD270, VCADDQ_ROT270_M,
VCADDQ_ROT270_M, VCADDQ_ROT270_M_F))
  FUNCTION (vcmlaq, unspec_mve_function_exact_insn_rot, (-1, -1,
UNSPEC_VCMLA, -1, -1, VCMLAQ_M_F))
  FUNCTION (vcmlaq_rot90, unspec_mve_function_exact_insn_rot, (-1, -1,
UNSPEC_VCMLA90, -1, -1, VCMLAQ_ROT90_M_F))
  FUNCTION (vcmlaq_rot180, unspec_mve_function_exact_insn_rot, (-1, -1,
UNSPEC_VCMLA180, -1, -1, VCMLAQ_ROT180_M_F))
diff --git a/gcc/config/arm/arm_mve_builtins.def
b/gcc/config/arm/arm_mve_builtins.def
index 43dacc3dda1..6ac1812c697 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -523,8 +523,8 @@ VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED,
vhsubq_m_n_u, v16qi, v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_u, v16qi, v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vhaddq_m_n_u, v16qi, v8hi,
v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, veorq_m_u, v16qi, v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_u, v16qi,
v8hi, v4si)
-VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_u, v16qi,
v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot90_m_, v16qi,
v8hi, v4si)
+VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vcaddq_rot270_m_, v16qi,
v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vbicq_m_u, v16qi, v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vandq_m_u, v16qi, v8hi, v4si)
  VAR3 (QUADOP_UNONE_UNONE_UNONE_UNONE_PRED, vaddq_m_u, v16qi, v8hi, v4si)
@@ -587,8 +587,6 @@ VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED,
vhcaddq_rot270_m_s, v16qi, v8hi, v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_s, v16qi, v8hi, v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vhaddq_m_n_s, v16qi, v8hi, v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, veorq_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot90_m_s, v16qi, v8hi, v4si)
-VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vcaddq_rot270_m_s, v16qi, v8hi,
v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbrsrq_m_n_s, v16qi, v8hi, v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vbicq_m_s, v16qi, v8hi, v4si)
  VAR3 (QUADOP_NONE_NONE_NONE_NONE_PRED, vandq_m_s, v16qi, v8hi, v4si)
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index b13ff53d36f..2edd0b06370 100644
--- a/gcc/config/arm

[committed trunk 7/9] arm testsuite: Remove redundant tests

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Following Andrea's overhaul of the MVE testsuite, these tests are now
redundant, as equivalent checks have been added to each intrinsic's
.c test.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vaddq_m.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_s64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_s64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_u64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_s64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_u64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_s64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_f16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_f16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_f16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_f16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_f32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_f32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_u32.c: 
Removed.
* 

[committed trunk 9/9] arm testsuite: Shifts and get_FPSCR ACLE optimisation fixes

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
These newly updated tests were rewritten by Andrea. Some of them
needed further manual fixes, as follows:

* The #shift immediate value was not in the check-function-bodies as expected.
* The ACLE was specifying sub-optimal code: lsr+and instead of ubfx. In
  this case the test rewritten from the ACLE had the lsr+and pattern,
  but the compiler was able to optimise it to ubfx. Hence I've changed the
  test to now match on ubfx.
* Added a separate test to check that shifts on constants are optimised to
  movs.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/srshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/srshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshll.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/mve_const_shifts.c: New test.
---
 .../gcc.target/arm/mve/intrinsics/srshr.c |  2 +-
 .../gcc.target/arm/mve/intrinsics/srshrl.c|  2 +-
 .../gcc.target/arm/mve/intrinsics/uqshl.c | 14 +--
 .../gcc.target/arm/mve/intrinsics/uqshll.c| 14 +--
 .../gcc.target/arm/mve/intrinsics/urshr.c |  4 +-
 .../gcc.target/arm/mve/intrinsics/urshrl.c|  4 +-
 .../arm/mve/intrinsics/vadciq_m_s32.c |  8 +---
 .../arm/mve/intrinsics/vadciq_m_u32.c |  8 +---
 .../arm/mve/intrinsics/vadciq_s32.c   |  8 +---
 .../arm/mve/intrinsics/vadciq_u32.c   |  8 +---
 .../arm/mve/intrinsics/vadcq_m_s32.c  |  8 +---
 .../arm/mve/intrinsics/vadcq_m_u32.c  |  8 +---
 .../gcc.target/arm/mve/intrinsics/vadcq_s32.c |  8 +---
 .../gcc.target/arm/mve/intrinsics/vadcq_u32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_m_s32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_m_u32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_s32.c   |  8 +---
 .../arm/mve/intrinsics/vsbciq_u32.c   |  8 +---
 .../arm/mve/intrinsics/vsbcq_m_s32.c  |  8 +---
 .../arm/mve/intrinsics/vsbcq_m_u32.c  |  8 +---
 .../gcc.target/arm/mve/intrinsics/vsbcq_s32.c |  8 +---
 .../gcc.target/arm/mve/intrinsics/vsbcq_u32.c |  8 +---
 .../gcc.target/arm/mve/mve_const_shifts.c | 41 +++
 23 files changed, 81 insertions(+), 128 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/mve_const_shifts.c

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
index 94e3f42fd33..734375d58c0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshr   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** srshr   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 */
 int32_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
index 65f28ccbfde..a91943c38a0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #shift(?: @.*|)
+** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #1(?: @.*|)
 ** ...
 */
 int64_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
index b23c9d97ba6..462531cad54 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** uqshl   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** uqshl   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 

[committed trunk 2/9] arm: Fix vstrwq* backend + testsuite

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
From: Andrea Corallo 

Hi all,

this patch fixes the vstrwq* MVE intrinsics failing to emit the
correct sequence of instructions due to a missing predicate. Also, the
immediate range is fixed to multiples of 2 in the range [-252, 252].

Best Regards

  Andrea

gcc/ChangeLog:

* config/arm/constraints.md (mve_vldrd_immediate): Move it to
predicates.md.
(Ri): Move constraint definition from predicates.md.
(Rl): Define new constraint.
* config/arm/mve.md (mve_vstrwq_scatter_base_wb_p_v4si): Add
missing constraint.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Add missing Up constraint
for op 1, use mve_vstrw_immediate predicate and Rl constraint for
op 2. Fix asm output spacing.
(mve_vstrdq_scatter_base_wb_p_v2di): Add missing constraint.
* config/arm/predicates.md (Ri): Move constraint to constraints.md.
(mve_vldrd_immediate): Move it from constraints.md.
(mve_vstrw_immediate): New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Use
check-function-bodies instead of scan-assembler checks.  Use
extern "C" for C++ testing.
* gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c: 
Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise.
---
 gcc/config/arm/constraints.md | 20 --
 gcc/config/arm/mve.md | 10 ++---
 gcc/config/arm/predicates.md  | 14 +++
 .../arm/mve/intrinsics/vstrwq_f32.c   | 32 ---
 .../arm/mve/intrinsics/vstrwq_p_f32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_s32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_u32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_s32.c   | 32 ---
 .../mve/intrinsics/vstrwq_scatter_base_f32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_p_f32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_s32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_u32.c| 36 +++--
 .../mve/intrinsics/vstrwq_scatter_base_s32.c  | 28 +++--
 .../mve/intrinsics/vstrwq_scatter_base_u32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_wb_f32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_u32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_s32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_u32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_offset_f32.c| 32 ---
 .../intrinsics/vstrwq_scatter_offset_p_f32.c  | 40 ---
 

[committed trunk 8/9] arm testsuite: XFAIL or relax registers in some tests [PR109697]

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Hi all,

This is a simple testsuite tidy-up patch, addressing two types of errors:

* The vcmp vector-scalar tests failing due to the compiler's preference
for vector-vector comparisons over vector-scalar comparisons. This is
due to the lack of a cost model for MVE and the compiler not knowing that
the RTL vec_duplicate is free in those instructions. For now, we simply
XFAIL these checks.
* The tests for pr108177 had strict usage of the q0 and r0 registers,
meaning that they would FAIL with -mfloat-abi=softfp. The register checks
have now been relaxed. A couple of these run-tests also had inconsistent
use of integer MVE with floating-point vectors, so I've now changed these
to use FP MVE.

gcc/testsuite/ChangeLog:

PR target/109697
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/pr108177-1.c: Relax registers.
* gcc.target/arm/mve/pr108177-10.c: Relax registers.
* gcc.target/arm/mve/pr108177-11.c: Relax registers.
* gcc.target/arm/mve/pr108177-12.c: Relax registers.
* gcc.target/arm/mve/pr108177-13.c: Relax registers.
* gcc.target/arm/mve/pr108177-13-run.c: Use mve_fp.
* gcc.target/arm/mve/pr108177-14.c: Relax registers.
* gcc.target/arm/mve/pr108177-14-run.c: Use mve_fp.
* gcc.target/arm/mve/pr108177-2.c: Relax registers.
* gcc.target/arm/mve/pr108177-3.c: Relax registers.
* gcc.target/arm/mve/pr108177-4.c: Relax registers.
* gcc.target/arm/mve/pr108177-5.c: Relax registers.
* gcc.target/arm/mve/pr108177-6.c: Relax registers.
* gcc.target/arm/mve/pr108177-7.c: Relax registers.
* gcc.target/arm/mve/pr108177-8.c: Relax registers.
* gcc.target/arm/mve/pr108177-9.c: Relax registers.
---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c | 2 +-
 

[committed trunk 4/9] arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of
which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrsr3, FPSCR_nzcvqc
bic r3, r3, #536870912
orr r3, r3, r2, lsl #29
vmsrFPSCR_nzcvqc, r3
```

whereas the MVE ACLE instead gives a different instruction sequence:
```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```

The bic + orr pair is slower, and it's also wrong: if the
*carry input is greater than 1, we risk overwriting the top two
bits of the FPSCR register (the N and Z flags).

This turned out to be a problem in the header file, and the solution was
to simply add a `& 0x1u` to the `*carry` input: then the compiler knows
that we only care about the lowest bit and can optimise to a BFI.

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
(__arm_vadcq_u32): Likewise.
(__arm_vadcq_m_s32): Likewise.
(__arm_vadcq_m_u32): Likewise.
(__arm_vsbcq_s32): Likewise.
(__arm_vsbcq_u32): Likewise.
(__arm_vsbcq_m_s32): Likewise.
(__arm_vsbcq_m_u32): Likewise.
* config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.
---
 gcc/config/arm/arm_mve.h  | 16 ++---
 gcc/config/arm/mve.md |  2 +-
 .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 67 +++
 3 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 1774e6eca2b..4ad1c99c288 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -4098,7 +4098,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4108,7 +4108,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4118,7 +4118,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, 
unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4128,7 +4128,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, 
unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res =  __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -4174,7 +4174,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vsbcq_sv4si (__a, __b);
  

[committed trunk 5/9] arm: Fix overloading of MVE scalar constant parameters on vbicq

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
We found this as part of the wider testsuite updates.

The applicable tests were authored by Andrea earlier in this patch series.

Ok for trunk?

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vbicq): Change coerce on
scalar constant.
---
 gcc/config/arm/arm_mve.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 4ad1c99c288..30cec519791 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -10847,10 +10847,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 
(__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -11699,10 +11699,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 
(__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
-- 
2.25.1
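The `__ARM_mve_coerce3` changes above all feed the same two-level `_Generic` dispatch that arm_mve.h uses for overloading. A reduced, self-contained model of that dispatch scheme follows; all names here (`TYPEID`, `BICQ`, `bicq_*`) are invented for illustration, and the real header keys on `__ARM_mve_typeid` over the full set of MVE types:

```c
#include <stdint.h>
#include <assert.h>

/* Reduced model of the arm_mve.h overloading scheme used by __arm_vbicq:
   a two-argument _Generic dispatch keyed on a pointer-to-array type whose
   constant bounds encode both argument type IDs.  */

enum { TYPE_S32 = 1, TYPE_U32 = 2 };

#define TYPEID(x) _Generic ((x), int32_t: TYPE_S32, uint32_t: TYPE_U32)

static int32_t  bicq_s32 (int32_t a, int32_t b)   { return a & ~b; }
static uint32_t bicq_u32 (uint32_t a, uint32_t b) { return a & ~b; }

/* The (int (*)[TYPEID (a)][TYPEID (b)]) 0 cast builds a distinct pointer
   type per argument-type pair, which _Generic then matches against the
   associations below to pick the right backend function.  */
#define BICQ(a, b)                                        \
  _Generic ((int (*)[TYPEID (a)][TYPEID (b)]) 0,          \
	    int (*)[TYPE_S32][TYPE_S32]: bicq_s32,        \
	    int (*)[TYPE_U32][TYPE_U32]: bicq_u32) ((a), (b))
```

This relies on `_Generic` selections being integer constant expressions, so the array bounds are compile-time constants; that is the same property arm_mve.h depends on.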



[committed gcc12 backport] arm: Fix overloading of MVE scalar constant parameters on vbicq, vmvnq_m

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
We found this as part of the wider testsuite updates.

The applicable tests were authored by Andrea earlier in this patch series.

Ok for trunk?

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vbicq): Change coerce on
scalar constant.
(__arm_vmvnq_m): Likewise.
---
 gcc/config/arm/arm_mve.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 39b3446617d..0b35bd0eedd 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35906,10 +35906,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 
(__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -38825,10 +38825,10 @@ extern void *__ARM_undef;
 #define __arm_vbicq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
   _Generic( (int (*)[__ARM_mve_typeid(__p0)][__ARM_mve_typeid(__p1)])0, \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce1 (__p1, int)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce1 (__p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3 (p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vbicq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3 (p1, int)), \
   int (*)[__ARM_mve_type_int8x16_t][__ARM_mve_type_int8x16_t]: __arm_vbicq_s8 
(__ARM_mve_coerce(__p0, int8x16_t), __ARM_mve_coerce(__p1, int8x16_t)), \
   int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int16x8_t]: __arm_vbicq_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce(__p1, int16x8_t)), \
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vbicq_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
@@ -40962,10 +40962,10 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: 
__arm_vmvnq_m_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, 
uint8x16_t), p2), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: 
__arm_vmvnq_m_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, 
uint16x8_t), p2), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: 
__arm_vmvnq_m_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, 
uint32x4_t), p2), \
-  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vmvnq_m_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce1(__p1, 

[committed gcc12 backport] arm testsuite: Shifts and get_FPSCR ACLE optimisation fixes

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
These newly updated tests were rewritten by Andrea. Some of them
needed further manual fixing as follows:

* The #shift immediate value was not in the check-function-bodies as expected
* The ACLE was specifying sub-optimal code: lsr+and instead of ubfx. In
  this case the test rewritten from the ACLE had the lsr+and pattern,
  but the compiler was able to optimise to ubfx. Hence I've changed the
  test to now match on ubfx.
* Added a separate test to check shift on constants being optimised to
  movs.
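The lsr+and to ubfx optimisation mentioned above can be sketched in plain C; this is an illustrative stand-in for the test pattern, not the testsuite code itself:

```c
#include <stdint.h>
#include <assert.h>

/* Sketch of the pattern the updated tests match: a logical shift right
   followed by an AND with 1 is a one-bit unsigned bitfield extract,
   which GCC on Arm emits as a single ubfx instead of an lsr + and
   pair.  Bit 29 is the FPSCR carry bit used by these intrinsics.  */
static uint32_t
extract_fpscr_carry (uint32_t fpscr)
{
  return (fpscr >> 29) & 0x1u;   /* ubfx r0, r0, #29, #1 */
}
```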

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/srshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/srshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/uqshll.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshr.c: Update shift value.
* gcc.target/arm/mve/intrinsics/urshrl.c: Update shift value.
* gcc.target/arm/mve/intrinsics/vadciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vadcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbciq_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_s32.c: Update to ubfx.
* gcc.target/arm/mve/intrinsics/vsbcq_u32.c: Update to ubfx.
* gcc.target/arm/mve/mve_const_shifts.c: New test.
---
 .../gcc.target/arm/mve/intrinsics/srshr.c |  2 +-
 .../gcc.target/arm/mve/intrinsics/srshrl.c|  2 +-
 .../gcc.target/arm/mve/intrinsics/uqshl.c | 14 +--
 .../gcc.target/arm/mve/intrinsics/uqshll.c| 14 +--
 .../gcc.target/arm/mve/intrinsics/urshr.c |  4 +-
 .../gcc.target/arm/mve/intrinsics/urshrl.c|  4 +-
 .../arm/mve/intrinsics/vadciq_m_s32.c |  8 +---
 .../arm/mve/intrinsics/vadciq_m_u32.c |  8 +---
 .../arm/mve/intrinsics/vadciq_s32.c   |  8 +---
 .../arm/mve/intrinsics/vadciq_u32.c   |  8 +---
 .../arm/mve/intrinsics/vadcq_m_s32.c  |  8 +---
 .../arm/mve/intrinsics/vadcq_m_u32.c  |  8 +---
 .../gcc.target/arm/mve/intrinsics/vadcq_s32.c |  8 +---
 .../gcc.target/arm/mve/intrinsics/vadcq_u32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_m_s32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_m_u32.c |  8 +---
 .../arm/mve/intrinsics/vsbciq_s32.c   |  8 +---
 .../arm/mve/intrinsics/vsbciq_u32.c   |  8 +---
 .../arm/mve/intrinsics/vsbcq_m_s32.c  |  8 +---
 .../arm/mve/intrinsics/vsbcq_m_u32.c  |  8 +---
 .../gcc.target/arm/mve/intrinsics/vsbcq_s32.c |  8 +---
 .../gcc.target/arm/mve/intrinsics/vsbcq_u32.c |  8 +---
 .../gcc.target/arm/mve/mve_const_shifts.c | 41 +++
 23 files changed, 81 insertions(+), 128 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/mve/mve_const_shifts.c

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
index 94e3f42fd33..734375d58c0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshr.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshr   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** srshr   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 */
 int32_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
index 65f28ccbfde..a91943c38a0 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/srshrl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #shift(?: @.*|)
+** srshrl  (?:ip|fp|r[0-9]+), (?:ip|fp|r[0-9]+), #1(?: @.*|)
 ** ...
 */
 int64_t
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
index b23c9d97ba6..462531cad54 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/uqshl.c
@@ -12,7 +12,7 @@ extern "C" {
 /*
 **foo:
 ** ...
-** uqshl   (?:ip|fp|r[0-9]+), #shift(?:@.*|)
+** uqshl   (?:ip|fp|r[0-9]+), #1(?:@.*|)
 ** ...
 

[committed gcc12 backport] arm testsuite: Remove redundant tests

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Following Andrea's overhaul of the MVE testsuite, these tests are now
redundant, as equivalent checks have been added to each intrinsic's
.c test.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/mve_fp_vaddq_n.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vaddq_m.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vaddq_n.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_m_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vddupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vdwdupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_m_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vidupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_viwdupq_x_n_u8.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_s64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_s64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_offset_z_u64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_s64.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_u64.c: 
Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_s64.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrdq_gather_shifted_offset_z_u64.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_f16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u16.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_f16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_offset_z_u32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_f16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u16.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_u32.c: 
Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_f16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_s32.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u16.c: Removed.
* 
gcc.target/arm/mve/intrinsics/mve_vldrhq_gather_shifted_offset_z_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_f32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_s32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_u32.c: Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_f32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_s32.c: 
Removed.
* gcc.target/arm/mve/intrinsics/mve_vldrwq_gather_offset_z_u32.c: 
Removed.
* 

[committed gcc12 backport] arm testsuite: XFAIL or relax registers in some tests [PR109697]

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Hi all,

This is a simple testsuite tidy-up patch, addressing two types of errors:

* The vcmp vector-scalar tests failing due to the compiler's preference
for vector-vector comparisons over vector-scalar comparisons. This is
due to the lack of a cost model for MVE and the compiler not knowing that
the RTL vec_duplicate is free in those instructions. For now, we simply
XFAIL these checks.
* The tests for pr108177 had strict usage of the q0 and r0 registers,
meaning that they would FAIL with -mfloat-abi=softfp. The register checks
have now been relaxed. A couple of these run-tests also had inconsistent
use of integer MVE with floating-point vectors, so I've now changed
these to use FP MVE.

gcc/testsuite/ChangeLog:
PR target/109697
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c: XFAIL check.
* gcc.target/arm/mve/intrinsics/vcmpneq_n_u8.c: XFAIL check.
* gcc.target/arm/mve/pr108177-1.c: Relax registers.
* gcc.target/arm/mve/pr108177-10.c: Relax registers.
* gcc.target/arm/mve/pr108177-11.c: Relax registers.
* gcc.target/arm/mve/pr108177-12.c: Relax registers.
* gcc.target/arm/mve/pr108177-13.c: Relax registers.
* gcc.target/arm/mve/pr108177-13-run.c: Use mve_fp.
* gcc.target/arm/mve/pr108177-14.c: Relax registers.
* gcc.target/arm/mve/pr108177-14-run.c: Use mve_fp.
* gcc.target/arm/mve/pr108177-2.c: Relax registers.
* gcc.target/arm/mve/pr108177-3.c: Relax registers.
* gcc.target/arm/mve/pr108177-4.c: Relax registers.
* gcc.target/arm/mve/pr108177-5.c: Relax registers.
* gcc.target/arm/mve/pr108177-6.c: Relax registers.
* gcc.target/arm/mve/pr108177-7.c: Relax registers.
* gcc.target/arm/mve/pr108177-8.c: Relax registers.
* gcc.target/arm/mve/pr108177-9.c: Relax registers.
---
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpcsq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpeqq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgeq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpgtq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmphiq_n_u8.c  | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpleq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpltq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_f32.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u16.c | 2 +-
 gcc/testsuite/gcc.target/arm/mve/intrinsics/vcmpneq_n_u32.c | 2 +-
 

[committed gcc12 backport] arm: Stop vadcq, vsbcq intrinsics from overwriting the FPSCR NZ flags

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
Hi all,

We noticed that calls to the vadcq and vsbcq intrinsics, both of
which use __builtin_arm_set_fpscr_nzcvqc to set the Carry flag in
the FPSCR, would produce the following code:

```
< r2 is the *carry input >
vmrsr3, FPSCR_nzcvqc
bic r3, r3, #536870912
orr r3, r3, r2, lsl #29
vmsrFPSCR_nzcvqc, r3
```

when the MVE ACLE instead gives a different instruction sequence of:
```
< Rt is the *carry input >
VMRS Rs,FPSCR_nzcvqc
BFI Rs,Rt,#29,#1
VMSR FPSCR_nzcvqc,Rs
```

The bic + orr pair is slower, and it's also wrong: if the *carry input is
greater than 1, then we risk overwriting the top two bits of the FPSCR
register (the N and Z flags).

This turned out to be a problem in the header file, and the solution was
simply to add a `& 0x1u` to the `*carry` input: then the compiler knows
that we only care about the lowest bit and can optimise to a BFI.
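The fix can be sketched in plain C with a stand-in variable for the register; these are not the real `__builtin_arm_*` builtins, and the `0x20000000u` mask is bit 29, matching the `bic r3, r3, #536870912` in the quoted sequence:

```c
#include <stdint.h>
#include <assert.h>

/* Stand-in for FPSCR_nzcvqc; the real code reads/writes the register
   through builtins.  */
static uint32_t fpscr_nzcvqc;

/* Before the fix: a carry input of 2 shifted into bit 29 spills into
   bit 30 (the Z flag) instead of setting the carry bit.  */
static void
set_carry_buggy (unsigned carry)
{
  fpscr_nzcvqc = (fpscr_nzcvqc & ~0x20000000u) | (carry << 29);
}

/* After the fix: `& 0x1u` keeps only the low bit, so the N and Z flags
   are preserved and the compiler can emit a single BFI.  */
static void
set_carry_fixed (unsigned carry)
{
  fpscr_nzcvqc = (fpscr_nzcvqc & ~0x20000000u) | ((carry & 0x1u) << 29);
}
```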

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vadcq_s32): Fix arithmetic.
(__arm_vadcq_u32): Likewise.
(__arm_vadcq_m_s32): Likewise.
(__arm_vadcq_m_u32): Likewise.
(__arm_vsbcq_s32): Likewise.
(__arm_vsbcq_u32): Likewise.
(__arm_vsbcq_m_s32): Likewise.
(__arm_vsbcq_m_u32): Likewise.
* config/arm/mve.md (get_fpscr_nzcvqc): Make unspec_volatile.

gcc/testsuite/ChangeLog:
* gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c: New.

(cherry picked from commit f1417d051be094ffbce228e11951f3e12e8fca1c)
---
 gcc/config/arm/arm_mve.h  | 16 ++---
 gcc/config/arm/mve.md |  2 +-
 .../arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c | 67 +++
 3 files changed, 76 insertions(+), 9 deletions(-)
 create mode 100644 
gcc/testsuite/gcc.target/arm/mve/mve_vadcq_vsbcq_fpscr_overwrite.c

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 82ceec2bbfc..6bf1794d2ff 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -16055,7 +16055,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_sv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16065,7 +16065,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_u32 (uint32x4_t __a, uint32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res = __builtin_mve_vadcq_uv4si (__a, __b);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16075,7 +16075,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_s32 (int32x4_t __inactive, int32x4_t __a, int32x4_t __b, 
unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   int32x4_t __res = __builtin_mve_vadcq_m_sv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16085,7 +16085,7 @@ __extension__ extern __inline uint32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vadcq_m_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32x4_t __b, 
unsigned * __carry, mve_pred16_t __p)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry & 0x1u) << 29));
   uint32x4_t __res =  __builtin_mve_vadcq_m_uv4si (__inactive, __a, __b, __p);
   *__carry = (__builtin_arm_get_fpscr_nzcvqc () >> 29) & 0x1u;
   return __res;
@@ -16131,7 +16131,7 @@ __extension__ extern __inline int32x4_t
 __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vsbcq_s32 (int32x4_t __a, int32x4_t __b, unsigned * __carry)
 {
-  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | (*__carry << 29));
+  __builtin_arm_set_fpscr_nzcvqc((__builtin_arm_get_fpscr_nzcvqc () & 
~0x2000u) | ((*__carry &a

[committed gcc12 backport] [arm] complete vmsr/vmrs blank and case adjustments

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
From: Alexandre Oliva 

Back in September last year, some of the vmsr and vmrs patterns had an
extraneous blank removed, and the case of register names lowered, but
another instance remained, and so did a testcase.

for  gcc/ChangeLog

* config/arm/vfp.md (*thumb2_movsi_vfp): Drop blank after tab
after vmsr and vmrs, and lower the case of P0.

for  gcc/testsuite/ChangeLog

* gcc.target/arm/acle/cde-mve-full-assembly.c: Drop blank
after tab after vmsr, and lower the case of P0.
---
 gcc/config/arm/vfp.md |   4 +-
 .../arm/acle/cde-mve-full-assembly.c  | 264 +-
 2 files changed, 134 insertions(+), 134 deletions(-)

diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 932e4b7447e..7a430ef8d36 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -312,9 +312,9 @@ (define_insn "*thumb2_movsi_vfp"
 case 12: case 13:
   return output_move_vfp (operands);
 case 14:
-  return \"vmsr\\t P0, %1\";
+  return \"vmsr\\tp0, %1\";
 case 15:
-  return \"vmrs\\t %0, P0\";
+  return \"vmrs\\t%0, p0\";
 case 16:
   return \"mcr\\tp10, 7, %1, cr1, cr0, 0\\t @SET_FPSCR\";
 case 17:
diff --git a/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c 
b/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c
index 501cc84da10..e3e7f7ef3e5 100644
--- a/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c
+++ b/gcc/testsuite/gcc.target/arm/acle/cde-mve-full-assembly.c
@@ -567,80 +567,80 @@
contain back references).  */
 /*
 ** test_cde_vcx1q_mfloat16x8_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_mfloat32x4_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_muint8x16_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_muint16x8_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_muint32x4_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_muint64x2_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
 ** vpst
 ** vcx1t   p0, q0, #32
 ** bx  lr
 */
 /*
 ** test_cde_vcx1q_mint8x16_tintint:
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
-** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
 P0, r2 @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ movhi)
+** (?:vldr\.64 d0, \.L[0-9]*\n\tvldr\.64   d1, \.L[0-9]*\+8|vmsr   
p0, r2  @ 

[committed gcc12 backport] arm: Add vorrq_n overloading into vorrq _Generic

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
We found this as part of the wider testsuite updates.

The applicable tests were authored by Andrea earlier in this patch series.

Ok for trunk?

gcc/ChangeLog:

* config/arm/arm_mve.h (__arm_vorrq): Add _n variant.
---
 gcc/config/arm/arm_mve.h | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 6bf1794d2ff..39b3446617d 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35852,6 +35852,10 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: 
__arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, 
uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: 
__arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, 
uint16x8_t)), \
   int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: 
__arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, 
uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u16 
(__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u32 
(__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s16 
(__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s32 
(__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)), \
   int (*)[__ARM_mve_type_float16x8_t][__ARM_mve_type_float16x8_t]: 
__arm_vorrq_f16 (__ARM_mve_coerce(__p0, float16x8_t), __ARM_mve_coerce(__p1, 
float16x8_t)), \
   int (*)[__ARM_mve_type_float32x4_t][__ARM_mve_type_float32x4_t]: 
__arm_vorrq_f32 (__ARM_mve_coerce(__p0, float32x4_t), __ARM_mve_coerce(__p1, 
float32x4_t)));})
 
@@ -38637,7 +38641,11 @@ extern void *__ARM_undef;
   int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int32x4_t]: __arm_vorrq_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce(__p1, int32x4_t)), \
   int (*)[__ARM_mve_type_uint8x16_t][__ARM_mve_type_uint8x16_t]: __arm_vorrq_u8 (__ARM_mve_coerce(__p0, uint8x16_t), __ARM_mve_coerce(__p1, uint8x16_t)), \
   int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_uint16x8_t]: __arm_vorrq_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce(__p1, uint16x8_t)), \
-  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)));})
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_uint32x4_t]: __arm_vorrq_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce(__p1, uint32x4_t)), \
+  int (*)[__ARM_mve_type_uint16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u16 (__ARM_mve_coerce(__p0, uint16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_uint32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_u32 (__ARM_mve_coerce(__p0, uint32x4_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int16x8_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s16 (__ARM_mve_coerce(__p0, int16x8_t), __ARM_mve_coerce3(p1, int)), \
+  int (*)[__ARM_mve_type_int32x4_t][__ARM_mve_type_int_n]: __arm_vorrq_n_s32 (__ARM_mve_coerce(__p0, int32x4_t), __ARM_mve_coerce3(p1, int)));})
 
 #define __arm_vornq(p0,p1) ({ __typeof(p0) __p0 = (p0); \
   __typeof(p1) __p1 = (p1); \
-- 
2.25.1



[committed gcc12 backport] arm: Fix vstrwq* backend + testsuite

2023-05-18 Thread Stam Markianos-Wright via Gcc-patches
From: Andrea Corallo 

Hi all,

this patch fixes the vstrwq* MVE intrinsics failing to emit the
correct sequence of instructions due to a missing predicate. Also, the
immediate range is fixed to be multiples of 2 within [-252, 252].

Best Regards

  Andrea

gcc/ChangeLog:

* config/arm/constraints.md (mve_vldrd_immediate): Move it to
predicates.md.
(Ri): Move constraint definition from predicates.md.
(Rl): Define new constraint.
* config/arm/mve.md (mve_vstrwq_scatter_base_wb_p_v4si): Add
missing constraint.
(mve_vstrwq_scatter_base_wb_p_fv4sf): Add missing Up constraint
for op 1, use mve_vstrw_immediate predicate and Rl constraint for
op 2. Fix asm output spacing.
(mve_vstrdq_scatter_base_wb_p_v2di): Add missing constraint.
* config/arm/predicates.md (Ri): Move constraint to constraints.md.
(mve_vldrd_immediate): Move it from
constraints.md.
(mve_vstrw_immediate): New predicate.

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vstrwq_f32.c: Use
check-function-bodies instead of scan-assembler checks.  Use
extern "C" for C++ testing.
* gcc.target/arm/mve/intrinsics/vstrwq_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_base_wb_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_p_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_scatter_shifted_offset_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vstrwq_u32.c: Likewise.
---
 gcc/config/arm/constraints.md | 20 --
 gcc/config/arm/mve.md | 10 ++---
 gcc/config/arm/predicates.md  | 14 +++
 .../arm/mve/intrinsics/vstrwq_f32.c   | 32 ---
 .../arm/mve/intrinsics/vstrwq_p_f32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_s32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_p_u32.c | 40 ---
 .../arm/mve/intrinsics/vstrwq_s32.c   | 32 ---
 .../mve/intrinsics/vstrwq_scatter_base_f32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_p_f32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_s32.c| 36 +++--
 .../intrinsics/vstrwq_scatter_base_p_u32.c| 36 +++--
 .../mve/intrinsics/vstrwq_scatter_base_s32.c  | 28 +++--
 .../mve/intrinsics/vstrwq_scatter_base_u32.c  | 28 +++--
 .../intrinsics/vstrwq_scatter_base_wb_f32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_f32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_s32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_p_u32.c | 40 ---
 .../intrinsics/vstrwq_scatter_base_wb_s32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_base_wb_u32.c   | 32 ---
 .../intrinsics/vstrwq_scatter_offset_f32.c| 32 ---
 .../intrinsics/vstrwq_scatter_offset_p_f32.c  | 40 ---
 

[PATCH 2/2 v2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2023-01-11 Thread Stam Markianos-Wright via Gcc-patches

-  Respin of the below patch -

In this 2/2 patch, from v1 to v2 I have:

* Removed the modification the interface of the doloop_end target-insn
(so I no longer need to touch any other target backends)


* Added more modes to `arm_get_required_vpr_reg` to make it flexible
between searching: all operands/only input arguments/only outputs. Also
added helpers:
`arm_get_required_vpr_reg_ret_val`
`arm_get_required_vpr_reg_param`

* Added support for the use of other VPR predicate values within
a dlstp/letp loop, as long as they don't originate from the vctp-generated
VPR value. Also changed `arm_mve_get_loop_unique_vctp` to the simpler
`arm_mve_get_loop_vctp` since now we can support other VCTP insns
within the loop.

* Added support for loops of the form:
     int num_of_iters = (num_of_elem + num_of_lanes - 1) / num_of_lanes
     for (i = 0; i < num_of_iters; i++)
       {
     p = vctp (num_of_elem)
     num_of_elem -= num_of_lanes;
       }
   to be transformed into dlstp/letp loops.

* Changed the VCTP look-ahead for SIGN_EXTEND and SUBREG insns to
use df def/use chains instead of `next_nonnote_nondebug_insn_bb`.

* Added support for using unpredicated (but predicable) insns
within the dlstp/letp loop. These need to meet some specific conditions,
because they _will_ become implicitly tail predicated by the dlstp/letp
transformation.

* Added a df chain check to any other instructions to make sure that they
don't USE the VCTP-generated VPR value.

* Added testing of all these various edge cases.
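
The vctp loop form shown above can be modelled in plain scalar C (a sketch only, not the MVE intrinsics themselves; the helper name and lane count below are illustrative assumptions): the loop iterates a precomputed number of times, while the vctp-style predicate deactivates the excess lanes on the final iteration.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };  /* e.g. vctp32: 4 x 32-bit lanes in a 128-bit MVE vector */

/* Scalar model of a tail-predicated loop: the element count decrements
   by the number of lanes each iteration, and a lane is active only
   while its index is below the remaining element count (the effect of
   p = vctp (remaining)).  */
static void tail_predicated_add (int32_t *dst, const int32_t *src, int n)
{
  for (int remaining = n; remaining > 0; remaining -= LANES)
    for (int lane = 0; lane < LANES; lane++)
      if (lane < remaining)                    /* p = vctp (remaining) */
        dst[n - remaining + lane] += src[n - remaining + lane];
}
```

With n = 6 and 4 lanes, the second iteration has only two active lanes, so exactly six elements are updated — the behaviour a dlstp/letp loop provides in hardware.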


Original email with updated Changelog at the end:



Hi all,

This is the 2/2 patch that contains the functional changes needed
for MVE Tail Predicated Low Overhead Loops.  See my previous email
for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop
mechanisms that are used for non-MVE dls/le looping.

Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other than -1 and for `count` to be an
   rtx containing the number of elements to be processed, rather
   than an expression for calculating the number of iterations.
2) Add an `allow_elementwise_doloop` target hook. This allows the
   target backend to manipulate the iteration count as it needs:
   in our case to change it from a pre-calculation of the number
   of iterations to the number of elements to be processed.
3) The doloop_end target-insn now has an additional parameter:
   the `count` (note: this is before it gets modified to just be
   the number of elements), so that the decrement value is
   extracted from that parameter.

And many things in the backend to implement the above optimisation:

4)  Appropriate changes to the define_expand of doloop_end and new
    patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of
    doloop_end) this function checks for the loop's suitability for
    dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through
    the loop contents and returns the vctp VPR-generating operation
    within the loop, if it is unique and there is exclusively one
    vctp within the loop.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.
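
Since MVE vectors are always 128 bits wide, the lane counts that a vctp-to-lanes mapping produces follow directly from the element size: vctp8/vctp16/vctp32/vctp64 yield 16/8/4/2 active-lane predicates. A hypothetical sketch (the real `arm_mve_get_vctp_lanes` maps unspec codes inside insn patterns, not bit widths):

```c
#include <assert.h>

/* Illustrative mapping from VCTP element width to predicate lane
   count, derived from the fixed 128-bit MVE vector length.  */
static int vctp_lanes_from_element_bits (int element_bits)
{
  switch (element_bits)
    {
    case 8: case 16: case 32: case 64:
      return 128 / element_bits;   /* vctp8 -> 16, ... vctp64 -> 2 */
    default:
      return -1;                   /* not a VCTP element size */
    }
}
```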

No regressions on arm-none-eabi with various targets and on
aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

    * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New.
    * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New.
    (arm_mve_get_vctp_lanes): New.
    (arm_get_required_vpr_reg): New.
    (arm_get_required_vpr_reg_ret_val): New.
    (arm_get_required_vpr_reg_param): New.
    (arm_mve_get_loop_vctp): New.
    (arm_attempt_dlstp_transform): New.
    (arm_allow_elementwise_doloop): New.
    * config/arm/iterators.md (DLSTP): New.
    (mode1): Add DLSTP mappings.
    * config/arm/mve.md (*predicated_doloop_end_internal): New.
    (dlstp_insn): New.
    * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
    * config/arm/unspecs.md: New unspecs.
    * tm.texi: Document new hook.
    * tm.texi.in: Likewise.
    * loop-doloop.cc (doloop_condition_get): Relax conditions.
    (doloop_optimize): Add support for elementwise LoLs.
    * target.def (allow_elementwise_doloop): New hook.
    * targhooks.cc (default_allow_elementwise_doloop): New.
    * targhooks.h (default_allow_elementwise_doloop): New.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/lob.h: Update framework.
    * gcc.target/arm/lob1.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.
    * gcc.target/arm/dlstp-int16x8.c: New test.
    * gcc.target/arm/dlstp-int32x4.c: New test.
    * gcc.target/arm/dl

[PING][PATCH] arm: Split up MVE _Generic associations to prevent type clashes [PR107515]

2023-01-10 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

With these previous patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606586.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606587.html
we enabled the MVE overloaded _Generic associations to handle more
scalar types, however at PR 107515 we found a new regression that
wasn't detected in our testing:

With glibc's `posix/types.h`:
```
typedef signed int __int32_t;
...
typedef __int32_t int32_t;
```
We would get a `error: '_Generic' specifies two compatible types`
from `__ARM_mve_coerce3` because of `type: param`, when `type` is
`int` and `int32_t: param` both being the same under the hood.

The same did not happen with Newlib's header `sys/_stdint.h`:
```
typedef long int __int32_t;
...
typedef __int32_t int32_t ;
```
which worked fine, because it uses `long int`.

The same could feasibly happen in `__ARM_mve_coerce2` between
`__fp16` and `float16_t`.

The solution here is to break the _Generic down, so that the similar
types don't appear at the same level, as is done in `__ARM_mve_typeid`.
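
A minimal standalone illustration of why the nesting helps (heavily simplified from the real coercion macros — the selector values here are arbitrary placeholders, not what `__ARM_mve_coerce3` returns): a flat `_Generic (x, int: ..., int32_t: ...)` is a constraint violation wherever `int32_t` is a typedef for `int`, while pushing the fixed-width names down into an inner `_Generic` reached through `default` means two compatible types never share one association list.

```c
#include <assert.h>
#include <stdint.h>

/* Nested generic selection: 'int' is handled at the outer level and
   'int32_t' only at the inner level, so the macro compiles whether or
   not int32_t happens to be a typedef for int on this platform.  */
#define SELECT(x) \
  _Generic ((x), int: 1, \
            default: _Generic ((x), int32_t: 2, default: 0))
```

A plain `int` argument always takes the outer branch; a type matching neither name falls through both defaults.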

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:
 PR target/96795
 PR target/107515
 * config/arm/arm_mve.h (__ARM_mve_coerce2): Split types.
 (__ARM_mve_coerce3): Likewise.

gcc/testsuite/ChangeLog:
 PR target/96795
 PR target/107515
 * gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c: New test.
 * gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c: New test.


=== Inline Ctrl+C, Ctrl+V or patch ===

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 09167ec118ed3310c5077145e119196f29d83cac..70003653db65736fcfd019e83d9f18153be650dc 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35659,9 +35659,9 @@ extern void *__ARM_undef;
  #define __ARM_mve_coerce1(param, type) \
  _Generic(param, type: param, const type: param, default: *(type *)__ARM_undef)
  #define __ARM_mve_coerce2(param, type) \
-_Generic(param, type: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef)
+_Generic(param, type: param, __fp16: param, default: _Generic (param, _Float16: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef))
  #define __ARM_mve_coerce3(param, type) \
-_Generic(param, type: param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef)
+_Generic(param, type: param, default: _Generic (param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef))

  #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */

diff --git
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
new file mode 100644
index
..427dcacb5ff59b53d5eab1f1582ef6460da3f2f3
--- /dev/null
+++
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c
@@ -0,0 +1,65 @@
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O2 -Wno-pedantic -Wno-long-long" } */
+#include "arm_mve.h"
+
+float f1;
+double f2;
+float16_t f3;
+float32_t f4;
+__fp16 f5;
+_Float16 f6;
+
+int i1;
+short i2;
+long i3;
+long long i4;
+int8_t i5;
+int16_t i6;
+int32_t i7;
+int64_t i8;
+
+const int ci1;
+const short ci2;
+const long ci3;
+const long long ci4;
+const int8_t ci5;
+const int16_t ci6;
+const int32_t ci7;
+const int64_t ci8;
+
+float16x8_t floatvec;
+int16x8_t intvec;
+
+void test(void)
+{
+/* Test a few different supported ways of passing an int value.  The
+intrinsic vmulq was chosen arbitrarily, but it is representative of
+all intrinsics that take a non-const scalar value.  */
+intvec = vmulq(intvec, 2);
+intvec = vmulq(intvec, (int32_t) 2);
+intvec = vmulq(intvec, (short) 2);
+intvec = vmulq(intvec, i1);
+intvec = vmulq(intvec, i2);
+intvec = vmulq(intvec, i3);
+intvec = vmulq(intvec, i4);
+intvec = vmulq(intvec, i5);
+intvec = vmulq(intvec, i6);
+intvec = vmulq(intvec, i7);
+intvec = vmulq(intvec, i8);
+
+/* Test a few different supported ways of passing a float value.  */
+floatvec = vmulq(floatvec, 0.5);
+floatvec = vmulq(floatvec, 0.5f);
+floatvec = vmulq(floatvec, (__fp16) 0.5);
+floatvec = vmulq(floatvec, f1);
+floatvec = vmulq(floatvec, f2);
+floatvec = vmulq(floatvec, f3);
+floatvec = vmulq(floatvec, f4);
+floatvec = vmulq(floatvec, f5);
+floatvec = vmulq(floatvec, f6);
+floatvec = vmulq(floatvec, 0.15f16);
+floatvec = vmulq(floatvec, (_Float16) 0.15);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file

Re: [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]

2023-01-10 Thread Stam Markianos-Wright via Gcc-patches



On 12/12/2022 13:42, Kyrylo Tkachov wrote:

Hi Stam,


-Original Message-
From: Stam Markianos-Wright 
Sent: Friday, December 9, 2022 1:32 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Ramana Radhakrishnan
; ni...@redhat.com
Subject: [PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions
[PR107714]

Hi all,

In the M-Class Arm-ARM:

https://developer.arm.com/documentation/ddi0553/bu/?lang=en

these MVE instructions only have '!' writeback variant and at:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

we found that the Um constraint would also allow through a
register offset writeback, resulting in an assembler error.

Here I have added a new constraint and predicate for these
instructions, which (uniquely, AFAICT), only support a `!` writeback
increment by the data size (inside the compiler this is a POST_INC).

No regressions in arm-none-eabi with MVE and MVE.FP.

Ok for trunk, and backport to GCC11 and GCC12 (testing pending)?

Thanks,
Stam

gcc/ChangeLog:
      PR target/107714
      * config/arm/arm-protos.h (mve_struct_mem_operand): New prototype.
      * config/arm/arm.cc (mve_struct_mem_operand): New function.
      * config/arm/constraints.md (Ug): New constraint.
      * config/arm/mve.md (mve_vst4q): Change constraint.
      (mve_vst2q): Likewise.
      (mve_vld4q): Likewise.
      (mve_vld2q): Likewise.
      * config/arm/predicates.md (mve_struct_operand): New predicate.

gcc/testsuite/ChangeLog:
      PR target/107714
      * gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.


diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index e5a36d29c7135943b9bb5ea396f70e2e4beb1e4a..8908b7f5b15ce150685868e78e75280bf32053f1 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -474,6 +474,12 @@
   (and (match_code "mem")
(match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))
  
+(define_memory_constraint "Ug"
+ "@internal
+  In Thumb-2 state a valid MVE struct load/store address."
+ (and (match_code "mem")
+  (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)")))
+

I think you can define the constraints in terms of the new mve_struct_operand predicate 
directly (see how we define the "Ua" constraint, for example).
Ok if that works (and testing passes of course).


Done as discussed and re-tested on all branches. Pushed as:

4269a6567eb991e6838f40bda5be9e3a7972530c to trunk

25edc76f2afba0b4eaf22174d42de042a6969dbe to gcc-12

08842ad274f5e2630994f7c6e70b2d31768107ea to gcc-11

Thank you!
Stam



Thanks,
Kyrill



[PATCH] Fix memory constraint on MVE v[ld/st][2/4] instructions [PR107714]

2022-12-09 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

In the M-Class Arm-ARM:

https://developer.arm.com/documentation/ddi0553/bu/?lang=en

these MVE instructions only have '!' writeback variant and at:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107714

we found that the Um constraint would also allow through a
register offset writeback, resulting in an assembler error.

Here I have added a new constraint and predicate for these
instructions, which (uniquely, AFAICT), only support a `!` writeback
increment by the data size (inside the compiler this is a POST_INC).

No regressions in arm-none-eabi with MVE and MVE.FP.

Ok for trunk, and backport to GCC11 and GCC12 (testing pending)?

Thanks,
Stam

gcc/ChangeLog:
    PR target/107714
    * config/arm/arm-protos.h (mve_struct_mem_operand): New prototype.
    * config/arm/arm.cc (mve_struct_mem_operand): New function.
    * config/arm/constraints.md (Ug): New constraint.
    * config/arm/mve.md (mve_vst4q): Change constraint.
    (mve_vst2q): Likewise.
    (mve_vld4q): Likewise.
    (mve_vld2q): Likewise.
    * config/arm/predicates.md (mve_struct_operand): New predicate.

gcc/testsuite/ChangeLog:
    PR target/107714
    * gcc.target/arm/mve/intrinsics/vldst24q_reg_offset.c: New test.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 550272facd12e60a49bf8a3b20f811cc13765b3a..8ea38118b05769bd6fcb1d22d902a50979cfd953 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -122,6 +122,7 @@ extern int arm_coproc_mem_operand_wb (rtx, int);
 extern int neon_vector_mem_operand (rtx, int, bool);
 extern int mve_vector_mem_operand (machine_mode, rtx, bool);
 extern int neon_struct_mem_operand (rtx);
+extern int mve_struct_mem_operand (rtx);
 
 extern rtx *neon_vcmla_lane_prepare_operands (rtx *);
 
diff --git a/gcc/config/arm/arm.cc b/gcc/config/arm/arm.cc
index b587561eebea921bdc68016922d37948e2870ce2..31f2a7b9d4688dde69d1435e24cf885e8544be71 100644
--- a/gcc/config/arm/arm.cc
+++ b/gcc/config/arm/arm.cc
@@ -13737,6 +13737,24 @@ neon_vector_mem_operand (rtx op, int type, bool strict)
   return FALSE;
 }
 
+/* Return TRUE if OP is a mem suitable for loading/storing an MVE struct
+   type.  */
+int
+mve_struct_mem_operand (rtx op)
+{
+  rtx ind = XEXP (op, 0);
+
+  /* Match: (mem (reg)).  */
+  if (REG_P (ind))
+return arm_address_register_rtx_p (ind, 0);
+
+  /* Allow only post-increment by the mode size.  */
+  if (GET_CODE (ind) == POST_INC)
+return arm_address_register_rtx_p (XEXP (ind, 0), 0);
+
+  return FALSE;
+}
+
 /* Return TRUE if OP is a mem suitable for loading/storing a Neon struct
type.  */
 int
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index e5a36d29c7135943b9bb5ea396f70e2e4beb1e4a..8908b7f5b15ce150685868e78e75280bf32053f1 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -474,6 +474,12 @@
  (and (match_code "mem")
   (match_test "TARGET_32BIT && arm_coproc_mem_operand (op, FALSE)")))
 
+(define_memory_constraint "Ug"
+ "@internal
+  In Thumb-2 state a valid MVE struct load/store address."
+ (and (match_code "mem")
+  (match_test "TARGET_HAVE_MVE && mve_struct_mem_operand (op)")))
+
 (define_memory_constraint "Uj"
  "@internal
   In ARM/Thumb-2 state a VFP load/store address that supports writeback
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index b5e6da4b1335818a3e8815de59850e845a2d0400..847bc032afa2c3977c05725562a14940beb282d4 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -99,7 +99,7 @@
 ;; [vst4q])
 ;;
 (define_insn "mve_vst4q"
-  [(set (match_operand:XI 0 "neon_struct_operand" "=Um")
+  [(set (match_operand:XI 0 "mve_struct_operand" "=Ug")
 	(unspec:XI [(match_operand:XI 1 "s_register_operand" "w")
 		(unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 	 VST4Q))
@@ -9959,7 +9959,7 @@
 ;; [vst2q])
 ;;
 (define_insn "mve_vst2q"
-  [(set (match_operand:OI 0 "neon_struct_operand" "=Um")
+  [(set (match_operand:OI 0 "mve_struct_operand" "=Ug")
 	(unspec:OI [(match_operand:OI 1 "s_register_operand" "w")
 		(unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 	 VST2Q))
@@ -9988,7 +9988,7 @@
 ;;
 (define_insn "mve_vld2q"
   [(set (match_operand:OI 0 "s_register_operand" "=w")
-	(unspec:OI [(match_operand:OI 1 "neon_struct_operand" "Um")
+	(unspec:OI [(match_operand:OI 1 "mve_struct_operand" "Ug")
 		(unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 	 VLD2Q))
   ]
@@ -10016,7 +10016,7 @@
 ;;
 (define_insn "mve_vld4q"
   [(set (match_operand:XI 0 "s_register_operand" "=w")
-	(unspec:XI [(match_operand:XI 1 "neon_struct_operand" "Um")
+	(unspec:XI [(match_operand:XI 1 "mve_struct_operand" "Ug")
 		(unspec:MVE_VLD_ST [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
 	 VLD4Q))
   ]
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index aab5a91ad4ddc6a7a02611d05442d6de63841a7c..67f2fdb4f8f607ceb50871e1bc17dbdb9b987c2c 100644
--- 

[PATCH] arm: Split up MVE _Generic associations to prevent type clashes [PR107515]

2022-12-01 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

With these previous patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606586.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606587.html
we enabled the MVE overloaded _Generic associations to handle more
scalar types, however at PR 107515 we found a new regression that
wasn't detected in our testing:

With glibc's `posix/types.h`:
```
typedef signed int __int32_t;
...
typedef __int32_t int32_t;
```
We would get a `error: '_Generic' specifies two compatible types`
from `__ARM_mve_coerce3` because of `type: param`, when `type` is
`int` and `int32_t: param` both being the same under the hood.

The same did not happen with Newlib's header `sys/_stdint.h`:
```
typedef long int __int32_t;
...
typedef __int32_t int32_t ;
```
which worked fine, because it uses `long int`.

The same could feasibly happen in `__ARM_mve_coerce2` between
`__fp16` and `float16_t`.

The solution here is to break the _Generic down, so that the similar
types don't appear at the same level, as is done in `__ARM_mve_typeid`.

Ok for trunk?

Thanks,
Stam Markianos-Wright

gcc/ChangeLog:
    PR target/96795
    PR target/107515
    * config/arm/arm_mve.h (__ARM_mve_coerce2): Split types.
    (__ARM_mve_coerce3): Likewise.

gcc/testsuite/ChangeLog:
    PR target/96795
    PR target/107515
    * gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c: New test.
    * gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-int.c: New test.



=== Inline Ctrl+C, Ctrl+V or patch ===

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 09167ec118ed3310c5077145e119196f29d83cac..70003653db65736fcfd019e83d9f18153be650dc 100644

--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35659,9 +35659,9 @@ extern void *__ARM_undef;
 #define __ARM_mve_coerce1(param, type) \
 _Generic(param, type: param, const type: param, default: *(type 
*)__ARM_undef)

 #define __ARM_mve_coerce2(param, type) \
-    _Generic(param, type: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef)
+    _Generic(param, type: param, __fp16: param, default: _Generic (param, _Float16: param, float16_t: param, float32_t: param, default: *(type *)__ARM_undef))

 #define __ARM_mve_coerce3(param, type) \
-    _Generic(param, type: param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef)
+    _Generic(param, type: param, default: _Generic (param, int8_t: param, int16_t: param, int32_t: param, int64_t: param, uint8_t: param, uint16_t: param, uint32_t: param, uint64_t: param, default: *(type *)__ARM_undef))


 #if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */

diff --git 
a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c

new file mode 100644
index 
..427dcacb5ff59b53d5eab1f1582ef6460da3f2f3

--- /dev/null
+++ 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_intrinsic_type_overloads-fp.c

@@ -0,0 +1,65 @@
+/* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
+/* { dg-add-options arm_v8_1m_mve_fp } */
+/* { dg-additional-options "-O2 -Wno-pedantic -Wno-long-long" } */
+#include "arm_mve.h"
+
+float f1;
+double f2;
+float16_t f3;
+float32_t f4;
+__fp16 f5;
+_Float16 f6;
+
+int i1;
+short i2;
+long i3;
+long long i4;
+int8_t i5;
+int16_t i6;
+int32_t i7;
+int64_t i8;
+
+const int ci1;
+const short ci2;
+const long ci3;
+const long long ci4;
+const int8_t ci5;
+const int16_t ci6;
+const int32_t ci7;
+const int64_t ci8;
+
+float16x8_t floatvec;
+int16x8_t intvec;
+
+void test(void)
+{
+    /* Test a few different supported ways of passing an int value.  The
+    intrinsic vmulq was chosen arbitrarily, but it is representative of
+    all intrinsics that take a non-const scalar value.  */
+    intvec = vmulq(intvec, 2);
+    intvec = vmulq(intvec, (int32_t) 2);
+    intvec = vmulq(intvec, (short) 2);
+    intvec = vmulq(intvec, i1);
+    intvec = vmulq(intvec, i2);
+    intvec = vmulq(intvec, i3);
+    intvec = vmulq(intvec, i4);
+    intvec = vmulq(intvec, i5);
+    intvec = vmulq(intvec, i6);
+    intvec = vmulq(intvec, i7);
+    intvec = vmulq(intvec, i8);
+
+    /* Test a few different supported ways of passing a float value.  */
+    floatvec = vmulq(floatvec, 0.5);
+    floatvec = vmulq(floatvec, 0.5f);
+    floatvec = vmulq(floatvec, (__fp16) 0.5);
+    floatvec = vmulq(floatvec, f1);
+    floatvec = vmulq(floatvec, f2);
+    floatvec = vmulq(floatvec, f3);
+    floatvec = vmulq(floatvec, f4);
+    floatvec = vmulq(floatvec, f5);
+    floatvec = vmulq(floatvec, f6);
+    floatvec = vmulq(floatvec, 0.15f16);
+    floatvec = vmulq(floatvec, (_Float16) 0.15);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file

[PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2022-11-28 Thread Stam Markianos-Wright via Gcc-patches


On 11/15/22 15:51, Andre Vieira (lists) wrote:


On 11/11/2022 17:40, Stam Markianos-Wright via Gcc-patches wrote:

Hi all,

This is the 2/2 patch that contains the functional changes needed
for MVE Tail Predicated Low Overhead Loops.  See my previous email
for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop
mechanisms that are used for non-MVE dls/le looping.

Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other than -1 and for `count` to be an
   rtx containing the number of elements to be processed, rather
   than an expression for calculating the number of iterations.
2) Add an `allow_elementwise_doloop` target hook. This allows the
   target backend to manipulate the iteration count as it needs:
   in our case to change it from a pre-calculation of the number
   of iterations to the number of elements to be processed.
3) The doloop_end target-insn now has an additional parameter:
   the `count` (note: this is before it gets modified to just be
   the number of elements), so that the decrement value is
   extracted from that parameter.

And many things in the backend to implement the above optimisation:

4)  Appropriate changes to the define_expand of doloop_end and new
    patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of
    doloop_end) this function checks for the loop's suitability for
    dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through
    the loop contents and returns the vctp VPR-generating operation
    within the loop, if it is unique and there is exclusively one
    vctp within the loop.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.

No regressions on arm-none-eabi with various targets and on
aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

    * config/aarch64/aarch64.md: Add extra doloop_end arg.
    * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New.
    * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New.
    (arm_mve_get_vctp_lanes): New.
    (arm_get_required_vpr_reg): New.
    (arm_mve_get_loop_unique_vctp): New.
    (arm_attempt_dlstp_transform): New.
    (arm_allow_elementwise_doloop): New.
    * config/arm/iterators.md:
    * config/arm/mve.md (*predicated_doloop_end_internal): New.
    (dlstp_insn): New.
    * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
    * config/arm/unspecs.md: New unspecs.
    * config/ia64/ia64.md: Add extra doloop_end arg.
    * config/pru/pru.md: Add extra doloop_end arg.
    * config/rs6000/rs6000.md: Add extra doloop_end arg.
    * config/s390/s390.md: Add extra doloop_end arg.
    * config/v850/v850.md: Add extra doloop_end arg.
    * doc/tm.texi: Document new hook.
    * doc/tm.texi.in: Likewise.
    * loop-doloop.cc (doloop_condition_get): Relax conditions.
    (doloop_optimize): Add support for elementwise LoLs.
    * target-insns.def (doloop_end): Add extra arg.
    * target.def (allow_elementwise_doloop): New hook.
    * targhooks.cc (default_allow_elementwise_doloop): New.
    * targhooks.h (default_allow_elementwise_doloop): New.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/lob.h: Update framework.
    * gcc.target/arm/lob1.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.
    * gcc.target/arm/dlstp-int16x8.c: New test.
    * gcc.target/arm/dlstp-int32x4.c: New test.
    * gcc.target/arm/dlstp-int64x2.c: New test.
    * gcc.target/arm/dlstp-int8x16.c: New test.


### Inline copy of patch ###

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f2e3d905dbbeb2949f2947f5cfd68208c94c9272..7a6d24a80060b4a704a481ccd1a32d96e7b0f369 100644

--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7366,7 +7366,8 @@
 ;; knows what to generate.
 (define_expand "doloop_end"
   [(use (match_operand 0 "" ""))  ; loop pseudo
-   (use (match_operand 1 "" ""))] ; label
+   (use (match_operand 1 "" ""))  ; label
+   (use (match_operand 2 "" ""))] ; decrement constant
   "optimize > 0 && flag_modulo_sched"
 {
   rtx s0;
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 550272facd12e60a49bf8a3b20f811cc13765b3a..7684620f0f4d161dd9e9ad2d70308021ec3d3d34 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -63,7 +63,7 @@ extern void arm_decompose_di_binop (rtx, rtx, rtx *, rtx *, rtx *, rtx *);

 extern bool arm_q_bit_acce

[PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-21 Thread Stam Markianos-Wright via Gcc-patches



On 11/20/22 22:49, Ramana Radhakrishnan wrote:

On Fri, Nov 18, 2022 at 4:59 PM Kyrylo Tkachov via Gcc-patches wrote:




-Original Message-
From: Andrea Corallo 
Sent: Thursday, November 17, 2022 4:38 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Stam Markianos-Wright 
Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
overloading [PR107515]

From: Stam Markianos-Wright 

This patch adds explicit references to other float types
to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

gcc/ChangeLog:
 PR 107515
 * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.

Argh, I'm looking forward to when we move away from this _Generic business, but 
for now ok.
The ChangeLog should say "PR target/107515" for the git hook to recognize it 
IIRC.

and the PR is against 11.x - is there a plan to back port this and
dependent patches to relevant branches ?


Hi Ramana!


Assuming maintainer approval, we do hope to backport.

And yes, it would have to be the whole patch series, so that we carry
over all the improved testing as well (and we'll have to run it, of course).


Does that sound Ok?

Thank you,

Stam




Ramana


Thanks,
Kyrill


---
  gcc/config/arm/arm_mve.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index fd1876b57a0..f6b42dc3fab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35582,6 +35582,9 @@ enum {
   short: __ARM_mve_type_int_n, \
   int: __ARM_mve_type_int_n, \
   long: __ARM_mve_type_int_n, \
+ _Float16: __ARM_mve_type_fp_n, \
+ __fp16: __ARM_mve_type_fp_n, \
+ float: __ARM_mve_type_fp_n, \
   double: __ARM_mve_type_fp_n, \
   long long: __ARM_mve_type_int_n, \
   unsigned char: __ARM_mve_type_int_n, \
--
2.25.1


Re: [PATCH 15/35] arm: Explicitly specify other float types for _Generic overloading [PR107515]

2022-11-21 Thread Stam Markianos-Wright via Gcc-patches



On 11/18/22 16:58, Kyrylo Tkachov wrote:



-Original Message-
From: Andrea Corallo 
Sent: Thursday, November 17, 2022 4:38 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Stam Markianos-Wright 
Subject: [PATCH 15/35] arm: Explicitly specify other float types for _Generic
overloading [PR107515]

From: Stam Markianos-Wright 

This patch adds explicit references to other float types
to __ARM_mve_typeid in arm_mve.h.  Resolves PR 107515:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107515

gcc/ChangeLog:
 PR 107515
 * config/arm/arm_mve.h (__ARM_mve_typeid): Add float types.

Argh, I'm looking forward to when we move away from this _Generic business, but 
for now ok.

Oh we all are ;)

The ChangeLog should say "PR target/107515" for the git hook to recognize it 
IIRC.


Agh, thanks for spotting this! Will change and push it with the rest of 
the patch series when ready.


Thank you,

Stam



Thanks,
Kyrill


---
  gcc/config/arm/arm_mve.h | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index fd1876b57a0..f6b42dc3fab 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -35582,6 +35582,9 @@ enum {
   short: __ARM_mve_type_int_n, \
   int: __ARM_mve_type_int_n, \
   long: __ARM_mve_type_int_n, \
+ _Float16: __ARM_mve_type_fp_n, \
+ __fp16: __ARM_mve_type_fp_n, \
+ float: __ARM_mve_type_fp_n, \
   double: __ARM_mve_type_fp_n, \
   long long: __ARM_mve_type_int_n, \
   unsigned char: __ARM_mve_type_int_n, \
--
2.25.1


Re: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n intrinsic

2022-11-21 Thread Stam Markianos-Wright via Gcc-patches



On 11/18/22 16:49, Kyrylo Tkachov wrote:



-Original Message-
From: Andrea Corallo 
Sent: Thursday, November 17, 2022 4:38 PM
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov ; Richard Earnshaw
; Stam Markianos-Wright 
Subject: [PATCH 13/35] arm: further fix overloading of MVE vaddq[_m]_n
intrinsic

From: Stam Markianos-Wright 

It was observed that in tests `vaddq_m_n_[s/u][8/16/32].c`, the _Generic
resolution would fall back to the `__ARM_undef` failure state.

This is a regression since `dc39db873670bea8d8e655444387ceaa53a01a79`
and
`6bd4ce64eb48a72eca300cb52773e6101d646004`, but it previously wasn't
identified, because the tests were not checking for this kind of failure.

The above commits changed the definitions of the intrinsics from using
`[u]int[8/16/32]_t` types for the scalar argument to using `int`. This
allowed `int` to be supported in user code through the overloaded
`#defines`, but seems to have broken the `[u]int[8/16/32]_t` types.

The solution implemented by this patch is to explicitly add a new
_Generic mapping from all the `[u]int[8/16/32]_t` types, alongside `int`.
With this change, both `int` and `[u]int[8/16/32]_t` parameters are
supported from user code and are handled by the overloading mechanism
correctly.

gcc/ChangeLog:

 * config/arm/arm_mve.h (__arm_vaddq_m_n_s8): Change types.
 (__arm_vaddq_m_n_s32): Likewise.
 (__arm_vaddq_m_n_s16): Likewise.
 (__arm_vaddq_m_n_u8): Likewise.
 (__arm_vaddq_m_n_u32): Likewise.
 (__arm_vaddq_m_n_u16): Likewise.
 (__arm_vaddq_m): Fix overloading.
 (__ARM_mve_coerce3): New.

Ok. Wasn't there a PR in Bugzilla about this that we can cite in the commit 
message?
Thanks,
Kyrill


Thanks for the review! Ah yes, there was this one:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96795

which was closed last time around.
It does make sense to add it, though, so we'll do that.

Thanks!




---
  gcc/config/arm/arm_mve.h | 78 
  1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index 684f997520f..951dc25374b 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -9675,42 +9675,42 @@ __arm_vabdq_m_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pr

  __extension__ extern __inline int8x16_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s8 (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_sv16qi (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline int32x4_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s32 (int32x4_t __inactive, int32x4_t __a, int32_t __b, mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_sv4si (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline int16x8_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_s16 (int16x8_t __inactive, int16x8_t __a, int16_t __b, mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_sv8hi (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline uint8x16_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u8 (uint8x16_t __inactive, uint8x16_t __a, uint8_t __b, mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_uv16qi (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline uint32x4_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u32 (uint32x4_t __inactive, uint32x4_t __a, uint32_t __b, mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_uv4si (__inactive, __a, __b, __p);
  }

  __extension__ extern __inline uint16x8_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m_n_u16 (uint16x8_t __inactive, uint16x8_t __a, uint16_t __b, mve_pred16_t __p)
  {
return __builtin_mve_vaddq_m_n_uv8hi (__inactive, __a, __b, __p);
  }
@@ -26417,42 +26417,42 @@ __arm_vabdq_m (uint16x8_t __inactive, uint16x8_t __a, uint16x8_t __b, mve_pred16

  __extension__ extern __inline int8x16_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
-__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int __b, mve_pred16_t __p)
+__arm_vaddq_m (int8x16_t __inactive, int8x16_t __a, int8_t __b, mve_pred16_t __p)
  {
   return __arm_vaddq_m_n_s8 (__inactive, __a, __b, __p

[PATCH 2/2] arm: Add support for MVE Tail-Predicated Low Overhead Loops

2022-11-11 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

This is the 2/2 patch that contains the functional changes needed
for MVE Tail Predicated Low Overhead Loops.  See my previous email
for a general introduction of MVE LOLs.

This support is added through the already existing loop-doloop
mechanisms that are used for non-MVE dls/le looping.

Changes are:

1) Relax the loop-doloop mechanism in the mid-end to allow for
   decrement numbers other than -1 and for `count` to be an
   rtx containing the number of elements to be processed, rather
   than an expression for calculating the number of iterations.
2) Add a `allow_elementwise_doloop` target hook. This allows the
   target backend to manipulate the iteration count as it needs:
   in our case to change it from a pre-calculation of the number
   of iterations to the number of elements to be processed.
3) The doloop_end target-insn now has an additional parameter:
   the `count` (note: this is before it gets modified to just be
   the number of elements), so that the decrement value is
   extracted from that parameter.

And many things in the backend to implement the above optimisation:

4)  Appropriate changes to the define_expand of doloop_end and new
    patterns for dlstp and letp.
5) `arm_attempt_dlstp_transform`: (called from the define_expand of
    doloop_end) this function checks for the loop's suitability for
    dlstp/letp transformation and then implements it, if possible.
6) `arm_mve_get_loop_unique_vctp`: A function that loops through
    the loop contents and returns the vctp VPR-generating operation
    within the loop, if it is unique and there is exclusively one
    vctp within the loop.
7) A couple of utility functions: `arm_mve_get_vctp_lanes` to map
   from vctp unspecs to number of lanes, and `arm_get_required_vpr_reg`
   to check an insn to see if it requires the VPR or not.

No regressions on arm-none-eabi with various targets and on
aarch64-none-elf. Thoughts on getting this into trunk?

Thank you,
Stam Markianos-Wright

gcc/ChangeLog:

    * config/aarch64/aarch64.md: Add extra doloop_end arg.
    * config/arm/arm-protos.h (arm_attempt_dlstp_transform): New.
    * config/arm/arm.cc (TARGET_ALLOW_ELEMENTWISE_DOLOOP): New.
    (arm_mve_get_vctp_lanes): New.
    (arm_get_required_vpr_reg): New.
    (arm_mve_get_loop_unique_vctp): New.
    (arm_attempt_dlstp_transform): New.
    (arm_allow_elementwise_doloop): New.
    * config/arm/iterators.md:
    * config/arm/mve.md (*predicated_doloop_end_internal): New.
    (dlstp_insn): New.
    * config/arm/thumb2.md (doloop_end): Update for MVE LOLs.
    * config/arm/unspecs.md: New unspecs.
    * config/ia64/ia64.md: Add extra doloop_end arg.
    * config/pru/pru.md: Add extra doloop_end arg.
    * config/rs6000/rs6000.md: Add extra doloop_end arg.
    * config/s390/s390.md: Add extra doloop_end arg.
    * config/v850/v850.md: Add extra doloop_end arg.
    * doc/tm.texi: Document new hook.
    * doc/tm.texi.in: Likewise.
    * loop-doloop.cc (doloop_condition_get): Relax conditions.
    (doloop_optimize): Add support for elementwise LoLs.
    * target-insns.def (doloop_end): Add extra arg.
    * target.def (allow_elementwise_doloop): New hook.
    * targhooks.cc (default_allow_elementwise_doloop): New.
    * targhooks.h (default_allow_elementwise_doloop): New.

gcc/testsuite/ChangeLog:

    * gcc.target/arm/lob.h: Update framework.
    * gcc.target/arm/lob1.c: Likewise.
    * gcc.target/arm/lob6.c: Likewise.
    * gcc.target/arm/dlstp-int16x8.c: New test.
    * gcc.target/arm/dlstp-int32x4.c: New test.
    * gcc.target/arm/dlstp-int64x2.c: New test.
    * gcc.target/arm/dlstp-int8x16.c: New test.


### Inline copy of patch ###

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f2e3d905dbbeb2949f2947f5cfd68208c94c9272..7a6d24a80060b4a704a481ccd1a32d96e7b0f369 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -7366,7 +7366,8 @@
 ;; knows what to generate.
 (define_expand "doloop_end"
   [(use (match_operand 0 "" ""))  ; loop pseudo
-   (use (match_operand 1 "" ""))] ; label
+   (use (match_operand 1 "" ""))  ; label
+   (use (match_operand 2 "" ""))] ; decrement constant
   "optimize > 0 && flag_modulo_sched"
 {
   rtx s0;
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 550272facd12e60a49bf8a3b20f811cc13765b3a..7684620f0f4d161dd9e9ad2d70308021ec3d3d34 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -63,7 +63,7 @@ extern void arm_decompose_di_binop (rtx, rtx, rtx *, rtx *, rtx *, rtx *);

 extern bool arm_q_bit_access (void);
 extern bool arm_ge_bits_access (void);
 extern bool arm_target_insn_ok_for_lob (rtx);
-
+extern rtx arm_attempt_dlstp_transform (r

[PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]

2021-03-31 Thread Stam Markianos-Wright via Gcc-patches

On 29/03/2021 10:20, Richard Biener wrote:

On Fri, 26 Mar 2021, Richard Sandiford wrote:


Richard Biener  writes:

On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:


Hi all,

This patch resolves bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

This is achieved by forcing a re-calculation of *stmt_vectype_out if an
incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
extra introduced max_nunits ceiling.

I am not 100% sure if this is the best way to go about fixing this, because
this is my first look at the vectorizer and I lack knowledge of the wider
context, so do let me know if you see a better way to do this!

I have added the previously ICE-ing reproducer as a new test.

This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for
GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.

(the non-fdisable-tree-fre4 version has gone latent on GCC11)

Bootstrapped and reg-tested on aarch64-linux-gnu.
Also reg-tested on aarch64-none-elf.


I don't think this is going to work well given uses will expect
a vector type that's consistent here.

I think giving up is for the moment the best choice, thus replacing
the assert with vectorization failure.

In the end we shouldn't require those nunits vectypes to be
separately computed - we compute the vector type of the defs
anyway and in case they're invariant the vectorizable_* function
either can deal with the type mix or not anyway.


I agree this area needs simplification, but I think the direction of
travel should be to make the assert valid.  I agree this is probably
the pragmatic fix for GCC 11 and earlier though.


The issue is that we compute a vector type for a use that may differ
from what we'd compute for it in the context of its definition (or
in the context of another use).  Any such "local" decision is likely
flawed and I'd rather simplify further doing the only decision on
the definition side - if there's a disconnect between the number
of lanes (and thus altering the VF won't help) then we have to give
up anyway.

Richard.



Thank you both for the further info! Would it be fair to close the 
initial PR regarding the ICE 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974) and then open a 
second one at a lower priority level to address these further improvements?


Also Christophe has kindly found out that the test FAILs in ILP32, so it 
would be great to get that one in asap, too! 
https://gcc.gnu.org/pipermail/gcc-patches/2021-March/567431.html


Cheers,
Stam



Re: [PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]

2021-03-25 Thread Stam Markianos-Wright via Gcc-patches

On 24/03/2021 13:46, Richard Biener wrote:

On Wed, 24 Mar 2021, Stam Markianos-Wright wrote:


Hi all,

This patch resolves bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

This is achieved by forcing a re-calculation of *stmt_vectype_out if an
incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with an
extra introduced max_nunits ceiling.

I am not 100% sure if this is the best way to go about fixing this, because
this is my first look at the vectorizer and I lack knowledge of the wider
context, so do let me know if you see a better way to do this!

I have added the previously ICE-ing reproducer as a new test.

This is compiled as "g++ -Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" for
GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for GCC10.

(the non-fdisable-tree-fre4 version has gone latent on GCC11)

Bootstrapped and reg-tested on aarch64-linux-gnu.
Also reg-tested on aarch64-none-elf.


I don't think this is going to work well given uses will expect
a vector type that's consistent here.

I think giving up is for the moment the best choice, thus replacing
the assert with vectorization failure.

In the end we shouldn't require those nunits vectypes to be
separately computed - we compute the vector type of the defs
anyway and in case they're invariant the vectorizable_* function
either can deal with the type mix or not anyway.



Yea good point! I agree and after all we are very close to releases now ;)

I've attached the patch that just does the graceful vectorization
failure and adds a slightly better test now. Re-tested as previously
with no issues, of course.


gcc-10.patch is what I'd backport to GCC10 (the only difference between 
that and gcc-11.patch is that one compiles with `-fdisable-tree-fre4` 
and the other without it).


Ok to push this to the GCC11 branch and backport to the GCC10 branch?

Cheers :D
Stam


That said, the goal should be to simplify things here.

Richard.



gcc/ChangeLog:

 * tree-vect-stmts.c (get_vectype_for_scalar_type): Add new
 parameter to core function and add new function overload.
 (vect_get_vector_types_for_stmt): Add re-calculation logic.

gcc/testsuite/ChangeLog:

 * g++.target/aarch64/sve/pr96974.C: New test.





diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
new file mode 100644
index 000..363241d18df
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4 -fdump-tree-slp-details" } */
+
+float a;
+int
+b ()
+{ return __builtin_lrintf(a); }
+
+struct c {
+  float d;
+c() {
+  for (int e = 0; e < 9; e++)
+	coeffs[e] = d ? b() : 0;
+}
+int coeffs[10];
+} f;
+
+/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index d791d3a4720..4c01e82ff39 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -12148,8 +12148,12 @@ vect_get_vector_types_for_stmt (vec_info *vinfo, stmt_vec_info stmt_info,
 	}
 }
 
-  gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
-			  TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
+  if (!multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
+		   TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)))
+return opt_result::failure_at (stmt,
+   "Not vectorized: Incompatible number "
+   "of vector subparts between %T and %T\n",
+   nunits_vectype, *stmt_vectype_out);
 
   if (dump_enabled_p ())
 {
diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
new file mode 100644
index 000..2023c55e3e6
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8.2-a+sve -fdump-tree-slp-details" } */
+
+float a;
+int
+b ()
+{ return __builtin_lrintf(a); }
+
+struct c {
+  float d;
+c() {
+  for (int e = 0; e < 9; e++)
+	coeffs[e] = d ? b() : 0;
+}
+int coeffs[10];
+} f;
+
+/* { dg-final { scan-tree-dump "Not vectorized: Incompatible number of vector subparts between" "slp1" } } */
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index c2d1f39fe0f..6418edb5204 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -12249,8 +12249,12 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 	}
 }
 
-  gcc_assert (multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
-			  TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)));
+  if (!multiple_p (TYPE_VECTOR_SUBPARTS (nunits_vectype),
+		   TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)))
+return opt_result::failure_at (stmt,
+   "Not vectorized: Incompatible number "
+   "of vector subparts between %T and %T\n",
+   nunits_vectype, *stmt_vectype_out);
 
   if (dump_enabled_p ())
 {


[PATCH] slp tree vectorizer: Re-calculate vectorization factor in the case of invalid choices [PR96974]

2021-03-24 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

This patch resolves bug:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96974

This is achieved by forcing a re-calculation of *stmt_vectype_out if an
incompatible combination of TYPE_VECTOR_SUBPARTS is detected, but with 
an extra introduced max_nunits ceiling.


I am not 100% sure if this is the best way to go about fixing this, 
because this is my first look at the vectorizer and I lack knowledge of 
the wider context, so do let me know if you see a better way to do this!


I have added the previously ICE-ing reproducer as a new test.

This is compiled as "g++ -Ofast -march=armv8.2-a+sve 
-fdisable-tree-fre4" for GCC11 and "g++ -Ofast -march=armv8.2-a+sve" for 
GCC10.


(the non-fdisable-tree-fre4 version has gone latent on GCC11)

Bootstrapped and reg-tested on aarch64-linux-gnu.
Also reg-tested on aarch64-none-elf.


gcc/ChangeLog:

* tree-vect-stmts.c (get_vectype_for_scalar_type): Add new
parameter to core function and add new function overload.
(vect_get_vector_types_for_stmt): Add re-calculation logic.

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/pr96974.C: New test.
diff --git a/gcc/testsuite/g++.target/aarch64/sve/pr96974.C b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
new file mode 100644
index 0000000000000000000000000000000000000000..2f6ebd6ce3dd8626f5e666edba77d2c925739b7d
--- /dev/null
+++ b/gcc/testsuite/g++.target/aarch64/sve/pr96974.C
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast -march=armv8.2-a+sve -fdisable-tree-fre4" } */
+
+float a;
+int
+b ()
+{ return __builtin_lrintf(a); }
+
+struct c {
+  float d;
+c() {
+  for (int e = 0; e < 9; e++)
+	coeffs[e] = d ? b() : 0;
+}
+int coeffs[10];
+} f;
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index c2d1f39fe0f4bbc90ffa079cb6a8fcf87b76b3af..f8d3eac38718e18bf957b85109cccbc03e21c041 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -11342,7 +11342,7 @@ get_related_vectype_for_scalar_type (machine_mode prevailing_mode,
 
 tree
 get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
-			 unsigned int group_size)
+			 unsigned int group_size, unsigned int max_nunits)
 {
   /* For BB vectorization, we should always have a group size once we've
  constructed the SLP tree; the only valid uses of zero GROUP_SIZEs
@@ -11375,13 +11375,16 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
 	 fail (in the latter case because GROUP_SIZE is too small
 	 for the target), but it's possible that a target could have
 	 a hole between supported vector types.
+	 There is also the option to artificially pass a max_nunits,
+	 which is smaller than GROUP_SIZE, if the use of GROUP_SIZE
+	 would result in an incompatible mode for the target.
 
 	 If GROUP_SIZE is not a power of 2, this has the effect of
 	 trying the largest power of 2 that fits within the group,
 	 even though the group is not a multiple of that vector size.
 	 The BB vectorizer will then try to carve up the group into
 	 smaller pieces.  */
-  unsigned int nunits = 1 << floor_log2 (group_size);
+  unsigned int nunits = 1 << floor_log2 (max_nunits);
   do
 	{
 	  vectype = get_related_vectype_for_scalar_type (vinfo->vector_mode,
@@ -11394,6 +11397,14 @@ get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
   return vectype;
 }
 
+tree
+get_vectype_for_scalar_type (vec_info *vinfo, tree scalar_type,
+			 unsigned int group_size)
+{
+  return get_vectype_for_scalar_type (vinfo, scalar_type,
+ group_size, group_size);
+}
+
 /* Return the vector type corresponding to SCALAR_TYPE as supported
by the target.  NODE, if nonnull, is the SLP tree node that will
use the returned vector type.  */
@@ -12172,6 +12183,8 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 
   tree vectype;
   tree scalar_type = NULL_TREE;
+  tree scalar_type_orig = NULL_TREE;
+
   if (group_size == 0 && STMT_VINFO_VECTYPE (stmt_info))
 {
   vectype = STMT_VINFO_VECTYPE (stmt_info);
@@ -12210,6 +12223,7 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 			 "get vectype for scalar type: %T\n", scalar_type);
 	}
   vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
+  scalar_type_orig = scalar_type;
   if (!vectype)
 	return opt_result::failure_at (stmt,
    "not vectorized:"
@@ -12249,6 +12263,36 @@ vect_get_vector_types_for_stmt (stmt_vec_info stmt_info,
 	}
 }
 
+  /* In rare cases with different types and sizes we may reach an invalid
+ combination where nunits_vectype has fewer TYPE_VECTOR_SUBPARTS than
+ *stmt_vectype_out.  In that case attempt to re-calculate
+ *stmt_vectype_out with an imposed max taken from nunits_vectype.  */
+  unsigned int max_nunits;
+  if (known_lt (TYPE_VECTOR_SUBPARTS (nunits_vectype),
+		TYPE_VECTOR_SUBPARTS (*stmt_vectype_out)))
+{
+  if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+	   

Re: [committed obvious][arm] Add test that was missing from old commit [PR91816]

2020-11-26 Thread Stam Markianos-Wright via Gcc-patches

On 26/11/2020 09:01, Christophe Lyon wrote:

On Wed, 25 Nov 2020 at 14:24, Stam Markianos-Wright via Gcc-patches wrote:


Hi all,

A while back I submitted GCC10 commit:

   44f77a6dea2f312ee1743f3dde465c1b8453ee13

for PR91816.

Turns out I was an idiot and forgot to include the test in the actual
git commit, even though my entire patch had been approved.

Tested that the test still passes on a cross arm-none-eabi and also in a
Cortex A-15 bootstrap with no regressions.

Submitting this as Obvious to gcc-11 and backporting to gcc-10.



Hi,

This new test fails when forcing -mcpu=cortex-m3/4/5/7/33:
FAIL: gcc.target/arm/pr91816.c scan-assembler-times beq\\t.L[0-9] 2
FAIL: gcc.target/arm/pr91816.c scan-assembler-times beq\\t.Lbcond[0-9] 1
FAIL: gcc.target/arm/pr91816.c scan-assembler-times bne\\t.L[0-9] 2
FAIL: gcc.target/arm/pr91816.c scan-assembler-times bne\\t.Lbcond[0-9] 1

I didn't check manually what is generated, can you have a look?



Oh wow thank you for spotting this!

It looks like the A class target that I had tested had a tendency to 
emit a movw/movt pair, whereas these M class targets would emit a single 
ldr. This resulted in an overall shorter jump for these targets that did 
not trigger the new far-branch code.


The test passes after... doubling its own size:



 #define HW3 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2
 #define HW4 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3
 #define HW5 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4
+#define HW6 HW5 HW5

 __attribute__((noinline,noclone)) void f1 (int a)
 {
@@ -25,7 +26,7 @@ __attribute__((noinline,noclone)) void f2 (int a)

 __attribute__((noinline,noclone)) void f3 (int a)
 {
-  if (a) { HW5 }
+  if (a) { HW6 }
 }

 __attribute__((noinline,noclone)) void f4 (int a)
@@ -41,7 +42,7 @@ __attribute__((noinline,noclone)) void f5 (int a)

 __attribute__((noinline,noclone)) void f6 (int a)
 {
-  if (a == 1) { HW5 }
+  if (a == 1) { HW6 }
 }

But this does effectively double the compilation time of an already 
quite large test. Would that be ok?


Overall this is edge-case testing that the compiler behaves correctly
with a branch in a huge compilation unit, so it would be nice to have
test coverage of it on as many targets as possible... but it is also a
fairly rare case.


Hope this helps!

Cheers,
Stam




Thanks,

Christophe





Thanks,
Stam Markianos-Wright

gcc/testsuite/ChangeLog:
 PR target/91816
 * gcc.target/arm/pr91816.c: New test.




[backport gcc-8,9][arm] Thumb2 out of range conditional branch fix [PR91816]

2020-11-25 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

Now that I have pushed the entirety of this patch to gcc-10 and gcc-11, 
I would like to backport it to gcc-8 and gcc-9.


PR link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91816

This patch had originally been approved here:

https://gcc.gnu.org/legacy-ml/gcc-patches/2020-01/msg02010.html

See the attached diffs that have been rebased and apply cleanly.

Tested on a cross arm-none-eabi and also in a Cortex A-15 bootstrap with 
no regressions.


Ok to backport?

Thanks,
Stam Markianos-Wright
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9d0acde7a39..87e01e35221 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -553,4 +553,6 @@ void arm_parse_option_features (sbitmap, const cpu_arch_option *,
 
 void arm_initialize_isa (sbitmap, const enum isa_feature *);
 
+const char * arm_gen_far_branch (rtx *, int, const char * , const char *);
+
 #endif /* ! GCC_ARM_PROTOS_H */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f990ca11bcb..eefe3d99548 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -31629,6 +31629,39 @@ arm_constant_alignment (const_tree exp, HOST_WIDE_INT align)
   return align;
 }
 
+/* Generate code to enable conditional branches in functions over 1 MiB.
+   Parameters are:
+ operands: is the operands list of the asm insn (see arm_cond_branch or
+   arm_cond_branch_reversed).
+ pos_label: is an index into the operands array where operands[pos_label] is
+   the asm label of the final jump destination.
+ dest: is a string which is used to generate the asm label of the intermediate
+   destination
+   branch_format: is a string denoting the intermediate branch format, e.g.
+ "beq", "bne", etc.  */
+
+const char *
+arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
+   const char * branch_format)
+{
+  rtx_code_label * tmp_label = gen_label_rtx ();
+  char label_buf[256];
+  char buffer[128];
+  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
+   CODE_LABEL_NUMBER (tmp_label));
+  const char *label_ptr = arm_strip_name_encoding (label_buf);
+  rtx dest_label = operands[pos_label];
+  operands[pos_label] = tmp_label;
+
+  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
+  output_asm_insn (buffer, operands);
+
+  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr);
+  operands[pos_label] = dest_label;
+  output_asm_insn (buffer, operands);
+  return "";
+}
+
 #if CHECKING_P
 namespace selftest {
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 6d6b37719e0..81c96658d95 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -7187,9 +7187,15 @@
 ;; And for backward branches we have 
 ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4).
 ;;
+;; In 16-bit Thumb these ranges are:
 ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving (-2040->2048).
 ;; For a 'b' pos_range = 254,  neg_range = -256  giving (-250 ->256).
 
+;; In 32-bit Thumb these ranges are:
+;; For a 'b'   +/- 16MB is not checked for.
+;; For a 'b' pos_range = 1048574,  neg_range = -1048576  giving
+;; (-1048568 -> 1048576).
+
 (define_expand "cbranchsi4"
   [(set (pc) (if_then_else
  (match_operator 0 "expandable_comparison_operator"
@@ -7444,23 +7450,50 @@
  (label_ref (match_operand 0 "" ""))
  (pc)))]
   "TARGET_32BIT"
-  "*
-  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
+  {
+if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
 {
   arm_ccfsm_state += 2;
-  return \"\";
+  return "";
 }
-  return \"b%d1\\t%l0\";
-  "
+switch (get_attr_length (insn))
+  {
+   case 2: /* Thumb2 16-bit b{cond}.  */
+   case 4: /* Thumb2 32-bit b{cond} or A32 b{cond}.  */
+ return "b%d1\t%l0";
+ break;
+
+   /* Thumb2 b{cond} out of range.  Use 16-bit b{cond} and
+  unconditional branch b.  */
+   default: return arm_gen_far_branch (operands, 0, "Lbcond", "b%D1\t");
+  }
+  }
   [(set_attr "conds" "use")
(set_attr "type" "branch")
(set (attr "length")
-   (if_then_else
-  (and (match_test "TARGET_THUMB2")
-   (and (ge (minus (match_dup 0) (pc)) (const_int -250))
-(le (minus (match_dup 0) (pc)) (const_int 256
-  (const_int 2)
-  (const_int 4)))]
+(if_then_else (match_test "!TARGET_THUMB2")
+
+  ;;Target is not Thumb2, therefore is A32.  Generate b{cond}.
+  (const_int 4)
+
+  ;; Check if target is within 16-bit Thumb2 b{cond} range.
+  (if_then_else (and (ge (minus (match_dup 0) (pc)) (const_int -
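[Editor's note: the branch ranges quoted in the arm.md comments above can be modeled as a small standalone sketch. The 2- and 4-byte lengths and their ranges come from the comments; the 8-byte far-branch length is an illustrative assumption, since the tail of this hunk is truncated in the archive.]

```c
#include <assert.h>

/* Standalone model of the b<cond> "length" selection described above:
   a 16-bit Thumb2 b<cond> reaches roughly -250..256 bytes, a 32-bit
   b<cond> reaches -1048568..1048576; anything further needs the
   far-branch sequence (inverted short branch plus unconditional b).  */
static int
bcond_length (long offset)
{
  if (offset >= -250 && offset <= 256)
    return 2;   /* 16-bit Thumb2 b<cond>.  */
  if (offset >= -1048568 && offset <= 1048576)
    return 4;   /* 32-bit Thumb2 b<cond>.  */
  return 8;     /* Far branch: b<inverted cond> .Lbcond, then b target
                   (length value is an assumption, not from the patch).  */
}
```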

[committed obvious][arm] Add test that was missing from old commit [PR91816]

2020-11-25 Thread Stam Markianos-Wright via Gcc-patches

Hi all,

A while back I submitted GCC10 commit:

 44f77a6dea2f312ee1743f3dde465c1b8453ee13

for PR91816.

Turns out I was an idiot and forgot to include the test in the actual
git commit, even though my entire patch had been approved.


Tested that the test still passes on a cross arm-none-eabi and also in a
Cortex A-15 bootstrap with no regressions.

Submitting this as Obvious to gcc-11 and backporting to gcc-10.

Thanks,
Stam Markianos-Wright

gcc/testsuite/ChangeLog:
PR target/91816
* gcc.target/arm/pr91816.c: New test.
diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c b/gcc/testsuite/gcc.target/arm/pr91816.c
new file mode 100644
index 000..75b938a6aad
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr91816.c
@@ -0,0 +1,63 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+/* { dg-additional-options "-mthumb" }  */
+/* { dg-timeout-factor 4.0 } */
+
+int printf(const char *, ...);
+
+#define HW0	printf("Hello World!\n");
+#define HW1	HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0 HW0
+#define HW2	HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1 HW1
+#define HW3	HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2 HW2
+#define HW4	HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3 HW3
+#define HW5	HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4 HW4
+
+__attribute__((noinline,noclone)) void f1 (int a)
+{
+  if (a) { HW0 }
+}
+
+__attribute__((noinline,noclone)) void f2 (int a)
+{
+  if (a) { HW3 }
+}
+
+
+__attribute__((noinline,noclone)) void f3 (int a)
+{
+  if (a) { HW5 }
+}
+
+__attribute__((noinline,noclone)) void f4 (int a)
+{
+  if (a == 1) { HW0 }
+}
+
+__attribute__((noinline,noclone)) void f5 (int a)
+{
+  if (a == 1) { HW3 }
+}
+
+
+__attribute__((noinline,noclone)) void f6 (int a)
+{
+  if (a == 1) { HW5 }
+}
+
+
+int main(void)
+{
+	f1(0);
+	f2(0);
+	f3(0);
+	f4(0);
+	f5(0);
+	f6(0);
+	return 0;
+}
+
+
+/* { dg-final { scan-assembler-times "beq\\t.L\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "beq\\t.Lbcond\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "bne\\t.L\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "bne\\t.Lbcond\[0-9\]" 1 } } */
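[Editor's note: the test above relies on macro pasting to inflate function bodies. Each HWn level pastes ten copies of the level below, so HW5 expands to 10^5 printf calls, pushing the conditional branches in f3/f6 out of b<cond> range. A minimal sketch of the same growth, taken only two levels deep so the count stays small:]

```c
#include <assert.h>

/* Each COUNTn level pastes ten copies of the level below, mirroring the
   HW0..HW5 macros in pr91816.c: COUNT2 expands to 10*10 = 100 statements,
   just as HW5 expands to 10^5 printf calls.  */
#define COUNT0 n++;
#define COUNT1 COUNT0 COUNT0 COUNT0 COUNT0 COUNT0 COUNT0 COUNT0 COUNT0 COUNT0 COUNT0
#define COUNT2 COUNT1 COUNT1 COUNT1 COUNT1 COUNT1 COUNT1 COUNT1 COUNT1 COUNT1 COUNT1

static long
expanded_count_hw2 (void)
{
  long n = 0;
  COUNT2
  return n;
}
```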


Re: [Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2020-02-11 Thread Stam Markianos-Wright




On 2/11/20 10:25 AM, Kyrill Tkachov wrote:

Hi Stam,

On 2/10/20 1:35 PM, Stam Markianos-Wright wrote:



On 2/3/20 11:20 AM, Stam Markianos-Wright wrote:
>
>
> On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:
>>
>> On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
>>>
>>>
>>> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
>>>>
>>>>
>>>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>>>>>
>>>>>
>>>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>>>>>> operations (vector/by element) to the ARM back-end.
>>>>>>
>>>>>> These are:
>>>>>> usdot (vector), dot (by element).
>>>>>>
>>>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>>>>>> for ARM they remain optional as of ARMv8.6-a.
>>>>>>
>>>>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>>>>>> generate assembler and tests are added to verify and perform adequate checks.

>>>>>>
>>>>>> Regression testing on arm-none-eabi passed successfully.
>>>>>>
>>>>>> This patch depends on:
>>>>>>
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>>>>
>>>>>> for ARM CLI updates, and on:
>>>>>>
>>>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>>>>
>>>>>> for testsuite effective_target update.
>>>>>>
>>>>>> Ok for trunk?
>>>>>
>>>>
>>>> New diff addressing review comments from Aarch64 version of the patch.
>>>>
>>>> _Change of order of operands in RTL patterns.
>>>> _Change tests to use check-function-bodies, compile with optimisation and
>>>> check for exact registers.
>>>> _Rename tests to remove "-compile-" in filename.
>>>>
>>>
> .Ping!

Ping :)

Diff re-attached in this ping email is same as the one posted on 10/01

Thank you!



Sorry for the delay.

This is ok.


No worries, thank you!

Committed as r10-6575.

Cheers,
Stam


Thanks,

Kyrill



> .
>>>
>>> Cheers,
>>> Stam
>>>
>>>>>>
>>>>>>
>>>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>>>>
>>>>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>>>>>> that would be great :)
>>>>>>
>>>>>>
>>>>>> gcc/ChangeLog:
>>>>>>
>>>>>> 2019-11-28  Stam Markianos-Wright 
>>>>>>
>>>>>>  * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>>>>>  (USTERNOP_QUALIFIERS): New define.
>>>>>>  (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>>>  (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>>>  (arm_expand_builtin_args):
>>>>>>  Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>>>>>  (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>>>>>  * config/arm/arm_neon.h (vusdot_s32): New.
>>>>>>  (vusdot_lane_s32): New.
>>>>>>  (vusdotq_lane_s32): New.
>>>>>>  (vsudot_lane_s32): New.
>>>>>>  (vsudotq_lane_s32): New.
>>>>>>  * config/arm/arm_neon_builtins.def
>>>>>> (usdot,usdot_lane,sudot_lane): New.
>>>>>>  * config/arm/iterators.md (DOTPROD_I8MM): New.
>>>>>>  (sup, opsuffix): Add .
>>>>>>     * config/arm/neon.md (neon_usdot, dot_lane: New.
>>>>>>  * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>>>>>
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>>
>>>>>> 2019-12-12  Stam Markianos-Wright 
>>>>>>
>>>>>>  * gcc.target/arm/simd/vdot-2-1.c: New test.
>>>>>>  * gcc.target/arm/simd/vdot-2-2.c: New test.
>>>>>>  * gcc.target/arm/simd/vdot-2-3.c: New test.
>>>>>>  * gcc.target/arm/simd/vdot-2-4.c: New test.
>>>>>>
>>>>>>
>>>>


[Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2020-02-10 Thread Stam Markianos-Wright



On 2/3/20 11:20 AM, Stam Markianos-Wright wrote:



On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:


On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:



On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:



On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:



On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:

Hi all,

This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
operations (vector/by element) to the ARM back-end.

These are:
usdot (vector), dot (by element).

The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
for ARM they remain optional as of ARMv8.6-a.

The functions are declared in arm_neon.h, RTL patterns are defined to
generate assembler and tests are added to verify and perform adequate checks.
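[Editor's note: for readers unfamiliar with the semantics, usdot multiplies unsigned 8-bit elements by signed 8-bit elements and accumulates four such products into each signed 32-bit lane. A scalar sketch of one lane follows; the function name is ours for illustration, not an ACLE name.]

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for one 32-bit lane of a usdot operation: four
   unsigned bytes times four signed bytes, products accumulated into a
   signed 32-bit accumulator.  */
static int32_t
usdot_lane_ref (int32_t acc, const uint8_t a[4], const int8_t b[4])
{
  for (int j = 0; j < 4; j++)
    acc += (int32_t) a[j] * (int32_t) b[j];
  return acc;
}
```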

Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html

for ARM CLI updates, and on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for testsuite effective_target update.

Ok for trunk?




New diff addressing review comments from Aarch64 version of the patch.

_Change of order of operands in RTL patterns.
_Change tests to use check-function-bodies, compile with optimisation and 
check for exact registers.

_Rename tests to remove "-compile-" in filename.




.Ping!


Ping :)

Diff re-attached in this ping email is same as the one posted on 10/01

Thank you!

.


Cheers,
Stam




ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

PS. I don't have commit rights, so if someone could commit on my behalf,
that would be great :)


gcc/ChangeLog:

2019-11-28  Stam Markianos-Wright  

 * config/arm/arm-builtins.c (enum arm_type_qualifiers):
 (USTERNOP_QUALIFIERS): New define.
 (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
 (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
 (arm_expand_builtin_args):
 Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
 (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
 * config/arm/arm_neon.h (vusdot_s32): New.
 (vusdot_lane_s32): New.
 (vusdotq_lane_s32): New.
 (vsudot_lane_s32): New.
 (vsudotq_lane_s32): New.
 * config/arm/arm_neon_builtins.def
 (usdot,usdot_lane,sudot_lane): New.
 * config/arm/iterators.md (DOTPROD_I8MM): New.
 (sup, opsuffix): Add .
    * config/arm/neon.md (neon_usdot, dot_lane: New.
 * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.


gcc/testsuite/ChangeLog:

2019-12-12  Stam Markianos-Wright  

 * gcc.target/arm/simd/vdot-2-1.c: New test.
 * gcc.target/arm/simd/vdot-2-2.c: New test.
 * gcc.target/arm/simd/vdot-2-3.c: New test.
 * gcc.target/arm/simd/vdot-2-4.c: New test.




diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index df84560588a..1b4316d0e93 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -86,7 +86,10 @@ enum arm_type_qualifiers
   qualifier_const_void_pointer = 0x802,
   /* Lane indices selected in pairs - must be within range of previous
  argument = a vector.  */
-  qualifier_lane_pair_index = 0x1000
+  qualifier_lane_pair_index = 0x1000,
+  /* Lane indices selected in quadtuplets - must be within range of previous
+ argument = a vector.  */
+  qualifier_lane_quadtup_index = 0x2000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned };
 #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers)
 
+/* T (T, unsigned T, T).  */
+static enum arm_type_qualifiers
+arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+  qualifier_none };
+#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers)
+
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
 arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned, qualifier_lane_index };
 #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers)
 
+/* T (T, unsigned T, T, lane index).  */
+static enum arm_type_qualifiers
+arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+  qualifier_none, qualifier_lane_quadtup_index };
+#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers)
+
+/* T (T, T, unsigned T, lane index).  */
+static enum arm_type_qualifiers
+arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+  qualifier_unsigned, qualifier_lane_quadtup_index };
+#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers)
+
 /* T (T, T, immediate).  */
 static enum arm_type_qualifiers
 arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2148,6 +2172,7 @@ ty

Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-02-04 Thread Stam Markianos-Wright



On 2/4/20 12:02 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/31/20 1:45 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/30/20 10:01 AM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/29/20 12:42 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

Hi all,

This fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Genmodes.c was generating the "wider_mode" chain as follows:

HF -> BF -> SF -> DF -> TF -> VOID

This caused issues in some rare cases where conversion between modes
was needed, such as the above PR93300 where BFmode was being picked up
as a valid mode for:

optabs.c:prepare_float_lib_cmp

which then led to the ICE at expr.c:convert_mode_scalar.


Hi Richard,



Can you go into more details about why this chain was a problem?
Naively, it's the one I'd have expected: HF should certainly have
priority over BF,


Is that because functionally it looks like genmodes puts things in reverse
alphabetical order if all else is equal? (If I'm reading the comment about
MODE_RANDOM, MODE_CC correctly)


but BF coming before SF doesn't seem unusual in itself.

I'm not saying the patch is wrong.  It just wasn't clear why it was
right either.


Yes, I see what you mean. I'll go through my thought process here:

In investigating the ICE PR93300 I found that the diversion from pre-bf16
behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a
`FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate
library calls for conversions.

This was then being caught further down by the gcc_assert at expr.c:325 where
GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because
it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which
is what happened if i removed the gcc_assert at expr.c:325)

With BFmode being a target-defined mode, I didn't want to add something like `if
(mode != BFmode)` to specifically exclude BFmode from being selected for this.
(and there's nothing different between HFmode and BFmode here to allow me to
make this distinction?)

Also I couldn't find anywhere where the target back-end is not consulted for a
"is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the
libcall being created later on as __extendhfbf2.
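[Editor's note: the problem can be seen with a toy model of the mode chain. Walking "wider" modes from HF reaches BF first, and BF has the same 16-bit precision as HF, so a conversion between the two is precision-equal, which is exactly what the gcc_assert in convert_mode_scalar rejects. A sketch, with precisions per IEEE half and bfloat16:]

```c
#include <assert.h>
#include <string.h>

/* Toy model of the wider_mode chain HF -> BF -> SF -> DF with each
   mode's bit precision.  BF sits right after HF yet is no wider, which
   is why naively iterating the chain from HF can pick it up as a
   conversion target.  */
static const char *modes[] = { "HF", "BF", "SF", "DF" };
static const int  prec[]  = { 16, 16, 32, 64 };

/* First mode after 'from' with strictly greater precision; a libcall
   conversion only makes sense to such a mode.  */
static const char *
first_strictly_wider (int from)
{
  for (int i = from + 1; i < 4; i++)
    if (prec[i] > prec[from])
      return modes[i];
  return "VOID";
}
```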


Yeah, prepare_float_lib_cmp just checks for libfuncs rather than
calling target hooks directly.  The libfuncs themselves are under
the control of the target though.

By default we assume all float modes have associated libfuncs.
It's then up to the target to remove functions that don't exist
(or redirect them to other functions).  So I think we need to remove
BFmode libfuncs in arm_init_libfuncs in the same way as we currently
do for HFmode.

I guess we should also nullify the conversion libfuncs for BFmode,
not just the arithmetic and comparison ones.


Ahhh now this works, thank you for the suggestion!

I was aware of arm_init_libfuncs, but I had not realised that returning NULL
would have the desired effect for us, in this case. So I have essentially rolled
back the whole previous version of the patch and done this in the new diff.
It seems to have fixed the ICE and I am currently in the process of regression
testing!


LGTM behaviourally, just a couple of requests about how it's written:



Thank you!
Stam



Thanks,
Richard


Finally, because we really don't want __bf16 to be in the same "conversion rank"
as standard float modes for things like automatic promotion, this seemed like a
reasonable solution to that problem :)

Let me know of your thoughts!

Cheers,
Stam


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c47fc232f39..18055d4a75e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2643,6 +2643,30 @@ arm_init_libfuncs (void)
   default:
 break;
   }
+
+  /* For all possible libcalls in BFmode, return NULL.  */
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL));
+  set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL));
+  set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL));
+  set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL));


It might be slightly safer to do:

FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)

to iterate over all float modes on the non-BF side.


Done :)



+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, BFmode, NULL);
+  set_optab_libfunc (sdiv_optab, BFmode, NULL);
+  set_optab_libfunc (smul_optab, BFmode, NULL);
+  set_optab_libfunc (neg_optab, BFmode, NULL);
+  set_optab_libfunc (sub_optab, BFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, BFmode, NULL);
+  set_optab_libfunc (ne_optab, BFmode, NULL);
+  set_optab_libfunc (lt_optab, BFmode, NULL);
+  set_optab_libfunc (le_optab, BFmode, NULL);
+  set_optab_libfunc (ge_opta

Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-02-04 Thread Stam Markianos-Wright



On 1/31/20 1:45 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/30/20 10:01 AM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/29/20 12:42 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

Hi all,

This fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Genmodes.c was generating the "wider_mode" chain as follows:

HF -> BF -> SF -> DF -> TF -> VOID

This caused issues in some rare cases where conversion between modes
was needed, such as the above PR93300 where BFmode was being picked up
as a valid mode for:

optabs.c:prepare_float_lib_cmp

which then led to the ICE at expr.c:convert_mode_scalar.


Hi Richard,



Can you go into more details about why this chain was a problem?
Naively, it's the one I'd have expected: HF should certainly have
priority over BF,


Is that because functionally it looks like genmodes puts things in reverse
alphabetical order if all else is equal? (If I'm reading the comment about
MODE_RANDOM, MODE_CC correctly)


but BF coming before SF doesn't seem unusual in itself.

I'm not saying the patch is wrong.  It just wasn't clear why it was
right either.


Yes, I see what you mean. I'll go through my thought process here:

In investigating the ICE PR93300 I found that the diversion from pre-bf16
behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a
`FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate
library calls for conversions.

This was then being caught further down by the gcc_assert at expr.c:325 where
GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because
it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which
is what happened if i removed the gcc_assert at expr.c:325)

With BFmode being a target-defined mode, I didn't want to add something like `if
(mode != BFmode)` to specifically exclude BFmode from being selected for this.
(and there's nothing different between HFmode and BFmode here to allow me to
make this distinction?)

Also I couldn't find anywhere where the target back-end is not consulted for a
"is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the
libcall being created later on as __extendhfbf2.


Yeah, prepare_float_lib_cmp just checks for libfuncs rather than
calling target hooks directly.  The libfuncs themselves are under
the control of the target though.

By default we assume all float modes have associated libfuncs.
It's then up to the target to remove functions that don't exist
(or redirect them to other functions).  So I think we need to remove
BFmode libfuncs in arm_init_libfuncs in the same way as we currently
do for HFmode.

I guess we should also nullify the conversion libfuncs for BFmode,
not just the arithmetic and comparison ones.


Ahhh now this works, thank you for the suggestion!

I was aware of arm_init_libfuncs, but I had not realised that returning NULL
would have the desired effect for us, in this case. So I have essentially rolled
back the whole previous version of the patch and done this in the new diff.
It seems to have fixed the ICE and I am currently in the process of regression
testing!


LGTM behaviourally, just a couple of requests about how it's written:



Thank you!
Stam



Thanks,
Richard


Finally, because we really don't want __bf16 to be in the same "conversion rank"
as standard float modes for things like automatic promotion, this seemed like a
reasonable solution to that problem :)

Let me know of your thoughts!

Cheers,
Stam


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c47fc232f39..18055d4a75e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2643,6 +2643,30 @@ arm_init_libfuncs (void)
  default:
break;
  }
+
+  /* For all possible libcalls in BFmode, return NULL.  */
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL));
+  set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL));
+  set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL));
+  set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL));


It might be slightly safer to do:

   FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)

to iterate over all float modes on the non-BF side.


Done :)



+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, BFmode, NULL);
+  set_optab_libfunc (sdiv_optab, BFmode, NULL);
+  set_optab_libfunc (smul_optab, BFmode, NULL);
+  set_optab_libfunc (neg_optab, BFmode, NULL);
+  set_optab_libfunc (sub_optab, BFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, BFmode, NULL);
+  set_optab_libfunc (ne_optab, BFmode, NULL);
+  set_optab_libfunc (lt_optab, BFmode, NULL);
+  set_optab_libfunc (le_optab, BFmode, NULL);
+  set_optab_libfunc (ge_optab, BFmode, NULL);
+  set_optab_libfunc (gt_optab, BFmode, NULL);
+  set_optab_libfun

[Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2020-02-03 Thread Stam Markianos-Wright




On 1/27/20 3:54 PM, Stam Markianos-Wright wrote:


On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:



On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:



On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:



On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:

Hi all,

This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
operations (vector/by element) to the ARM back-end.

These are:
usdot (vector), dot (by element).

The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
for ARM they remain optional as of ARMv8.6-a.

The functions are declared in arm_neon.h, RTL patterns are defined to
generate assembler and tests are added to verify and perform adequate checks.

Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html

for ARM CLI updates, and on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for testsuite effective_target update.

Ok for trunk?




New diff addressing review comments from Aarch64 version of the patch.

_Change of order of operands in RTL patterns.
_Change tests to use check-function-bodies, compile with optimisation and 
check for exact registers.

_Rename tests to remove "-compile-" in filename.




.Ping!
.


Cheers,
Stam




ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

PS. I don't have commit rights, so if someone could commit on my behalf,
that would be great :)


gcc/ChangeLog:

2019-11-28  Stam Markianos-Wright  

 * config/arm/arm-builtins.c (enum arm_type_qualifiers):
 (USTERNOP_QUALIFIERS): New define.
 (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
 (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
 (arm_expand_builtin_args):
 Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
 (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
 * config/arm/arm_neon.h (vusdot_s32): New.
 (vusdot_lane_s32): New.
 (vusdotq_lane_s32): New.
 (vsudot_lane_s32): New.
 (vsudotq_lane_s32): New.
 * config/arm/arm_neon_builtins.def
 (usdot,usdot_lane,sudot_lane): New.
 * config/arm/iterators.md (DOTPROD_I8MM): New.
 (sup, opsuffix): Add .
    * config/arm/neon.md (neon_usdot, dot_lane: New.
 * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.


gcc/testsuite/ChangeLog:

2019-12-12  Stam Markianos-Wright  

 * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
 * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
 * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
 * gcc.target/arm/simd/vdot-compile-2-4.c: New test.






Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-01-30 Thread Stam Markianos-Wright



On 1/30/20 10:01 AM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

On 1/29/20 12:42 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

Hi all,

This fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Genmodes.c was generating the "wider_mode" chain as follows:

HF -> BF -> SF -> DF -> TF -> VOID

This caused issues in some rare cases where conversion between modes
was needed, such as the above PR93300 where BFmode was being picked up
as a valid mode for:

optabs.c:prepare_float_lib_cmp

which then led to the ICE at expr.c:convert_mode_scalar.


Hi Richard,



Can you go into more details about why this chain was a problem?
Naively, it's the one I'd have expected: HF should certainly have
priority over BF,


Is that because functionally it looks like genmodes puts things in reverse
alphabetical order if all else is equal? (If I'm reading the comment about
MODE_RANDOM, MODE_CC correctly)


but BF coming before SF doesn't seem unusual in itself.

I'm not saying the patch is wrong.  It just wasn't clear why it was
right either.


Yes, I see what you mean. I'll go through my thought process here:

In investigating the ICE PR93300 I found that the diversion from pre-bf16
behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a
`FOR_EACH_MODE_FROM (mode, orig_mode)` is used to then go off and generate
library calls for conversions.

This was then being caught further down by the gcc_assert at expr.c:325 where
GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because
it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which
is what happened if i removed the gcc_assert at expr.c:325)

With BFmode being a target-defined mode, I didn't want to add something like `if
(mode != BFmode)` to specifically exclude BFmode from being selected for this.
(and there's nothing different between HFmode and BFmode here to allow me to
make this distinction?)

Also I couldn't find anywhere where the target back-end is not consulted for a
"is this supported: yes/no" between the `FOR_EACH_MODE_FROM` loop and the
libcall being created later on as __extendhfbf2.


Yeah, prepare_float_lib_cmp just checks for libfuncs rather than
calling target hooks directly.  The libfuncs themselves are under
the control of the target though.

By default we assume all float modes have associated libfuncs.
It's then up to the target to remove functions that don't exist
(or redirect them to other functions).  So I think we need to remove
BFmode libfuncs in arm_init_libfuncs in the same way as we currently
do for HFmode.

I guess we should also nullify the conversion libfuncs for BFmode,
not just the arithmetic and comparison ones.


Ahhh now this works, thank you for the suggestion!

I was aware of arm_init_libfuncs, but I had not realised that returning NULL 
would have the desired effect for us, in this case. So I have essentially rolled 
back the whole previous version of the patch and done this in the new diff.
It seems to have fixed the ICE and I am currently in the process of regression 
testing!


Thank you!
Stam



Thanks,
Richard


Finally, because we really don't want __bf16 to be in the same "conversion rank"
as standard float modes for things like automatic promotion, this seemed like a
reasonable solution to that problem :)

Let me know of your thoughts!

Cheers,
Stam


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index c47fc232f39..18055d4a75e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2643,6 +2643,30 @@ arm_init_libfuncs (void)
 default:
   break;
 }
+
+  /* For all possible libcalls in BFmode, return NULL.  */
+  /* Conversions.  */
+  set_conv_libfunc (trunc_optab, BFmode, HFmode, (NULL));
+  set_conv_libfunc (sext_optab, HFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, SFmode, (NULL));
+  set_conv_libfunc (sext_optab, SFmode, BFmode, (NULL));
+  set_conv_libfunc (trunc_optab, BFmode, DFmode, (NULL));
+  set_conv_libfunc (sext_optab, DFmode, BFmode, (NULL));
+
+  /* Arithmetic.  */
+  set_optab_libfunc (add_optab, BFmode, NULL);
+  set_optab_libfunc (sdiv_optab, BFmode, NULL);
+  set_optab_libfunc (smul_optab, BFmode, NULL);
+  set_optab_libfunc (neg_optab, BFmode, NULL);
+  set_optab_libfunc (sub_optab, BFmode, NULL);
+
+  /* Comparisons.  */
+  set_optab_libfunc (eq_optab, BFmode, NULL);
+  set_optab_libfunc (ne_optab, BFmode, NULL);
+  set_optab_libfunc (lt_optab, BFmode, NULL);
+  set_optab_libfunc (le_optab, BFmode, NULL);
+  set_optab_libfunc (ge_optab, BFmode, NULL);
+  set_optab_libfunc (gt_optab, BFmode, NULL);
+  set_optab_libfunc (unord_optab, BFmode, NULL);
 
   /* Use names prefixed with __gnu_ for fixed-point helper functions.  */
   {


Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-01-30 Thread Stam Markianos-Wright



On 1/28/20 10:35 AM, Kyrill Tkachov wrote:

Hi Stam,

On 1/8/20 3:18 PM, Stam Markianos-Wright wrote:


On 12/10/19 5:03 PM, Kyrill Tkachov wrote:

Hi Stam,

On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:

Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this approved
first!


Sorry for the delay.

Same here now! Sorry totally forget about this in the lead up to Xmas!

Done the changes marked below and also removed the unnecessary extra #defines
from the test.



This is ok with a nit on the testcase...


diff --git a/gcc/testsuite/gcc.target/arm/pr91816.c b/gcc/testsuite/gcc.target/arm/pr91816.c

new file mode 100644
index ..757c897e9c0db32709227b3fdf1b4a8033428232

--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr91816.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-options "-march=armv7-a -mthumb -mfpu=vfpv3-d16" }  */
+int printf(const char *, ...);
+

I think this needs a couple of effective target checks like arm_hard_vfp_ok and 
arm_thumb2_ok. See other tests in gcc.target/arm that add -mthumb to the options.


Hmm, looking back at this now, is there any reason why it can't just be:

/* { dg-do compile } */
/* { dg-require-effective-target arm_thumb2_ok } */
/* { dg-additional-options "-mthumb" }  */

where we don't override the -march or -mfpu options at all, but just use 
`require-effective-target arm_thumb2_ok` to make sure that Thumb2 is supported?


The attached new diff does just that.

Cheers :)

Stam.



Thanks,
Kyrill



diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 7c4b1003844..8895becc639 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -576,4 +576,6 @@ void arm_parse_option_features (sbitmap, const cpu_arch_option *,
 
 void arm_initialize_isa (sbitmap, const enum isa_feature *);
 
+const char * arm_gen_far_branch (rtx *, int, const char * , const char *);
+
 #endif /* ! GCC_ARM_PROTOS_H */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 07231d722b9..ee5de169f3e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -32626,6 +32626,40 @@ arm_run_selftests (void)
 }
 } /* Namespace selftest.  */
 
+
+/* Generate code to enable conditional branches in functions over 1 MiB.
+   Parameters are:
+ operands: is the operands list of the asm insn (see arm_cond_branch or
+   arm_cond_branch_reversed).
+ pos_label: is an index into the operands array where operands[pos_label] is
+   the asm label of the final jump destination.
+ dest: is a string which is used to generate the asm label of the intermediate
+   destination
+   branch_format: is a string denoting the intermediate branch format, e.g.
+ "beq", "bne", etc.  */
+
+const char *
+arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
+   const char * branch_format)
+{
+  rtx_code_label * tmp_label = gen_label_rtx ();
+  char label_buf[256];
+  char buffer[128];
+  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
+   CODE_LABEL_NUMBER (tmp_label));
+  const char *label_ptr = arm_strip_name_encoding (label_buf);
+  rtx dest_label = operands[pos_label];
+  operands[pos_label] = tmp_label;
+
+  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
+  output_asm_insn (buffer, operands);
+
+  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr);
+  operands[pos_label] = dest_label;
+  output_asm_insn (buffer, operands);
+  return "";
+}
+
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
 #endif /* CHECKING_P */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f89a2d412df..fb1d4547e5c 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -7546,9 +7546,15 @@
 ;; And for backward branches we have 
 ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4).
 ;;
+;; In 16-bit Thumb these ranges are:
 ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving (-2040->2048).
 ;; For a 'b' pos_range = 254,  neg_range = -256  giving (-250 ->256).
 
+;; In 32-bit Thumb these ranges are:
+;; For a 'b'   +/- 16MB is not checked for.
+;; For a 'b' pos_range = 1048574,  neg_range = -1048576  giving
+;; (-1048568 -> 1048576).
+
 (define_expand "cbranchsi4"
   [(set (pc) (if_then_else
  (match_operator 0 "expandable_comparison_operator"
@@ -7721,23 +7727,50 @@
  (label_ref (match_operand 0 "" ""))
  (pc)))]
   "TARGET_32BIT"
-  "*
-  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
+  {
+if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
 {
   arm_ccfsm_state += 2;
-  return \"\";
+  return "";
 }
-  return \"b%d1\\
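The two-instruction sequence that arm_gen_far_branch emits can be modelled outside of GCC. The sketch below is hypothetical Python, not compiler code: the `.LCB0` label name is illustrative, and the condition inversion is shown inline here, whereas in the patch the already-reversed condition arrives pre-baked in `branch_format` via the reversed-branch insn pattern.

```python
def arm_far_branch_sketch(cond, far_label, label_num=0):
    """Rough model of the sequence arm_gen_far_branch emits when a
    conditional branch target is outside Thumb2's roughly +/-1 MiB
    b<cond> range: branch on the inverted condition over an
    unconditional b, whose range is +/-16 MiB."""
    tmp = f".LCB{label_num}"  # intermediate internal label (name is illustrative)
    inverted = {"eq": "ne", "ne": "eq", "lt": "ge",
                "ge": "lt", "gt": "le", "le": "gt"}[cond]
    return [
        f"b{inverted}\t{tmp}",  # short-range hop over the far branch
        f"b\t{far_label}",      # unconditional branch: +/-16 MiB reach
        f"{tmp}:",
    ]

print("\n".join(arm_far_branch_sketch("eq", ".Lfar_dest")))
```

So a `beq .Lfar_dest` that is out of range becomes `bne` over an unconditional `b`, at the cost of one extra instruction on the fall-through path.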

Re: [GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-01-29 Thread Stam Markianos-Wright



On 1/29/20 12:42 PM, Richard Sandiford wrote:

Stam Markianos-Wright  writes:

Hi all,

This fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Genmodes.c was generating the "wider_mode" chain as follows:

HF -> BF -> SF -> DF -> TF -> VOID

This caused issues in some rare cases where conversion between modes
was needed, such as the above PR93300 where BFmode was being picked up
as a valid mode for:

optabs.c:prepare_float_lib_cmp

which then led to the ICE at expr.c:convert_mode_scalar.


Hi Richard,



Can you go into more details about why this chain was a problem?
Naively, it's the one I'd have expected: HF should certainly have
priority over BF,


Is that because functionally it looks like genmodes puts things in reverse 
alphabetical order if all else is equal? (If I'm reading the comment about 
MODE_RANDOM, MODE_CC correctly)



but BF coming before SF doesn't seem unusual in itself.

I'm not saying the patch is wrong.  It just wasn't clear why it was
right either.


Yes, I see what you mean. I'll go through my thought process here:

In investigating the ICE in PR93300 I found that the divergence from pre-bf16 
behaviour was specifically at `optabs.c:prepare_float_lib_cmp`, where a 
`FOR_EACH_MODE_FROM (mode, orig_mode)` loop is used to generate library calls 
for conversions.


This was then being caught further down by the gcc_assert at expr.c:325, where 
GET_MODE_PRECISION (from_mode) was equal to GET_MODE_PRECISION (to_mode) because 
it was trying to emit a HF->BF conversion libcall as `bl __extendhfbf2` (which 
is what happened if I removed the gcc_assert at expr.c:325).


With BFmode being a target-defined mode, I didn't want to add something like `if 
(mode != BFmode)` to specifically exclude BFmode from being selected for this. 
(and there's nothing different between HFmode and BFmode here to allow me to 
make this distinction?)


Also, I couldn't find anywhere that the target back-end is consulted with an 
"is this supported: yes/no" query between the `FOR_EACH_MODE_FROM` loop and the 
libcall being created later on as __extendhfbf2.


Finally, because we really don't want __bf16 to be in the same "conversion rank" 
as standard float modes for things like automatic promotion, this seemed like a 
reasonable solution to that problem :)
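To illustrate the ordering idea, here is a hypothetical simplification (not the actual genmodes.c sort logic, and the mode tuples are illustrative): treating the new ORDER value as the primary sort key and size as the secondary key is enough to push BFmode past every order-0 float mode despite its 2-byte size.

```python
# Hypothetical model of ordering modes within MODE_FLOAT: sort primarily
# by the new "order" field, then by byte size, so BF (order 1) falls
# after every order-0 mode even though it is only 2 bytes wide.
modes = [("TF", 0, 16), ("BF", 1, 2), ("HF", 0, 2), ("DF", 0, 8), ("SF", 0, 4)]
chain = [name for name, order, size in sorted(modes, key=lambda m: (m[1], m[2]))]
print(" -> ".join(chain))  # HF -> SF -> DF -> TF -> BF
```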


Let me know of your thoughts!

Cheers,
Stam


Thanks,
Richard



This patch adds a new FLOAT_MODE_UNRANKED macro which uses the existing "order"
attribute of mode_data to place BFmode as:

HF -> SF -> DF -> TF -> BF -> VOID

This fixes the existing ICE seen by PR93300 (hence providing this with no
explicit test) and causes no further regressions.
Reg-tested on arm-none-eabi, aarch64-none-elf and bootstrapped on a Cortex-A15.

Ok for trunk?

Cheers,
Stam

gcc/ChangeLog:

2020-01-28  Stam Markianos-Wright  

* config/aarch64/aarch64-modes.def: Update BFmode to use 
FLOAT_MODE_UNRANKED.
* config/arm/arm-modes.def: Update BFmode to use FLOAT_MODE_UNRANKED.
* genmodes.c (FLOAT_MODE_UNRANKED): New macro.
   (make_float_mode): Add ORDER parameter.

The whole diff for reference:

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 1eeb8d88452..0b36da942b4 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -69,10 +69,10 @@ VECTOR_MODES (FLOAT, 16); /*V4SF V2DF.  */
   VECTOR_MODE (FLOAT, DF, 1);   /* V1DF.  */
   VECTOR_MODE (FLOAT, HF, 2);   /* V2HF.  */

-/* Bfloat16 modes.  */
-FLOAT_MODE (BF, 2, 0);
+/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is
+   placed after normal floating point modes in the GET_MODES_WIDER chain.  */
+FLOAT_MODE_UNRANKED (BF, 2, 0, 1);
   ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format);
-
   VECTOR_MODE (FLOAT, BF, 4);   /*  V4BF.  */
   VECTOR_MODE (FLOAT, BF, 8);   /*  V8BF.  */

diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index ea92ef35723..86551be8e3b 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -78,7 +78,9 @@ VECTOR_MODES (FLOAT, 8);  /*V4HF V2SF */
   VECTOR_MODES (FLOAT, 16); /*   V8HF V4SF V2DF */
   VECTOR_MODE (FLOAT, HF, 2);   /* V2HF */

-FLOAT_MODE (BF, 2, 0);
+/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is
+   placed after normal floating point modes in the GET_MODES_WIDER chain.  */
+FLOAT_MODE_UNRANKED (BF, 2, 0, 1);
   ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format);
   VECTOR_MODE (FLOAT, BF, 4);   /*  V4BF.  */
   VECTOR_MODE (FLOAT, BF, 8);   /*  V8BF.  */
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index bd78310ea24..c4e3dd1150d 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -617,20 +617,23 @@ make_fixed_point_mode (enum mode_class cl,
 m->fbit = fbit;
   }


[GCC][BUG][Aarch64][ARM] (PR93300) Fix ICE due to BFmode placement in GET_MODES_WIDER chain.

2020-01-29 Thread Stam Markianos-Wright

Hi all,

This fixes:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300

Genmodes.c was generating the "wider_mode" chain as follows:

HF -> BF -> SF -> DF -> TF -> VOID

This caused issues in some rare cases where conversion between modes was needed,
such as the above PR93300 where BFmode was being picked up as a valid mode for:

optabs.c:prepare_float_lib_cmp

which then led to the ICE at expr.c:convert_mode_scalar.

This patch adds a new FLOAT_MODE_UNRANKED macro which uses the existing "order"
attribute of mode_data to place BFmode as:

HF -> SF -> DF -> TF -> BF -> VOID

This fixes the existing ICE seen by PR93300 (hence providing this with no 
explicit test) and causes no further regressions.

Reg-tested on arm-none-eabi, aarch64-none-elf and bootstrapped on a Cortex-A15.

Ok for trunk?

Cheers,
Stam

gcc/ChangeLog:

2020-01-28  Stam Markianos-Wright  

* config/aarch64/aarch64-modes.def: Update BFmode to use 
FLOAT_MODE_UNRANKED.
* config/arm/arm-modes.def: Update BFmode to use FLOAT_MODE_UNRANKED.
* genmodes.c (FLOAT_MODE_UNRANKED): New macro.
 (make_float_mode): Add ORDER parameter.

The whole diff for reference:

diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def

index 1eeb8d88452..0b36da942b4 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -69,10 +69,10 @@ VECTOR_MODES (FLOAT, 16); /*V4SF V2DF.  */
 VECTOR_MODE (FLOAT, DF, 1);   /* V1DF.  */
 VECTOR_MODE (FLOAT, HF, 2);   /* V2HF.  */

-/* Bfloat16 modes.  */
-FLOAT_MODE (BF, 2, 0);
+/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is
+   placed after normal floating point modes in the GET_MODES_WIDER chain.  */
+FLOAT_MODE_UNRANKED (BF, 2, 0, 1);
 ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format);
-
 VECTOR_MODE (FLOAT, BF, 4);   /*V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*V8BF.  */

diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index ea92ef35723..86551be8e3b 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -78,7 +78,9 @@ VECTOR_MODES (FLOAT, 8);  /*V4HF V2SF */
 VECTOR_MODES (FLOAT, 16); /*   V8HF V4SF V2DF */
 VECTOR_MODE (FLOAT, HF, 2);   /* V2HF */

-FLOAT_MODE (BF, 2, 0);
+/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is
+   placed after normal floating point modes in the GET_MODES_WIDER chain.  */
+FLOAT_MODE_UNRANKED (BF, 2, 0, 1);
 ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format);
 VECTOR_MODE (FLOAT, BF, 4);   /*V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*V8BF.  */
diff --git a/gcc/genmodes.c b/gcc/genmodes.c
index bd78310ea24..c4e3dd1150d 100644
--- a/gcc/genmodes.c
+++ b/gcc/genmodes.c
@@ -617,20 +617,23 @@ make_fixed_point_mode (enum mode_class cl,
   m->fbit = fbit;
 }

-#define FLOAT_MODE(N, Y, F) FRACTIONAL_FLOAT_MODE (N, -1U, Y, F)
-#define FRACTIONAL_FLOAT_MODE(N, B, Y, F) \
-  make_float_mode (#N, B, Y, #F, __FILE__, __LINE__)
+#define FLOAT_MODE_UNRANKED(N, Y, F, ORDER)   \
+   FRACTIONAL_FLOAT_MODE (N, -1U, Y, F, ORDER)
+#define FLOAT_MODE(N, Y, F) FRACTIONAL_FLOAT_MODE (N, -1U, Y, F, 0)
+#define FRACTIONAL_FLOAT_MODE(N, B, Y, F, ORDER) \
+  make_float_mode (#N, B, Y, #F, ORDER, __FILE__, __LINE__)

 static void
 make_float_mode (const char *name,
 unsigned int precision, unsigned int bytesize,
-const char *format,
+const char *format, unsigned int order,
 const char *file, unsigned int line)
 {
   struct mode_data *m = new_mode (MODE_FLOAT, name, file, line);
   m->bytesize = bytesize;
   m->precision = precision;
   m->format = format;
+  m->order = order;
 }

 #define DECIMAL_FLOAT_MODE(N, Y, F)\
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 1eeb8d88452..0b36da942b4 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -69,10 +69,10 @@ VECTOR_MODES (FLOAT, 16); /*V4SF V2DF.  */
 VECTOR_MODE (FLOAT, DF, 1);   /* V1DF.  */
 VECTOR_MODE (FLOAT, HF, 2);   /* V2HF.  */
 
-/* Bfloat16 modes.  */
-FLOAT_MODE (BF, 2, 0);
+/* Bfloat16 modes. Using 1 as the ORDER argument ensures that this is
+   placed after normal floating point modes in the GET_MODES_WIDER chain.  */
+FLOAT_MODE_UNRANKED (BF, 2, 0, 1);
 ADJUST_FLOAT_FORMAT (BF, _bfloat_half_format);
-
 VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
 VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
 
diff --git a/gcc/config/arm/arm-modes.def b/gcc/config/arm/arm-modes.def
index ea92ef35723..86551be8e3b 100644
--- a/gcc/config/arm/arm-modes.def
+++ b/gcc/config/arm/arm-modes.def
@@ -78,7 +78,9 @@ VECTOR_MODES (FLOAT, 8);  /*

[committed][GCC][ARM] Update __fp16 test to fix regression caused by Bfloat optimisation.

2020-01-27 Thread Stam Markianos-Wright
Hi all,

This was committed following offline approval by Kyryl.

One minor optimisation introduced by:

https://gcc.gnu.org/ml/gcc-patches/2020-01/msg01237.html

was to set a preference for both __fp16 types and __bf16 types to be
loaded/stored directly into/from the FP/NEON registers (if they are available
and if the vld1.16 is compatible), rather than be passed through the regular
r-registers.

This would convert many observed instances of:

**  ldrhr3, [r3]@ __fp16
**  vmov.f16s15, r3 @ __fp16

Into a single:

**  vld1.16 {d7[2]}, [r3]

This resulted in a regression of a dg-scan-assembler in a __fp16 test.

This patch updates the test to the same testing standard used by the BFloat
tests (use check-function-bodies to explicitly check for correct assembler
generated by each function) and updates it for the latest optimisation.

Cheers,
Stam

gcc/testsuite/ChangeLog:

2020-01-27  Stam Markianos-Wright  

* gcc.target/arm/armv8_2-fp16-move-1.c: Update following load/store
 optimisation.
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index 2321dd38cc6..009bb8d1575 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -3,39 +3,78 @@
 /* { dg-options "-O2" }  */
 /* { dg-add-options arm_v8_2a_fp16_scalar }  */
 /* { dg-additional-options "-mfloat-abi=hard" } */
-
+/* { dg-final { check-function-bodies "**" "" } } */
+
+/*
+**test_load_1:
+**	...
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+*/
 __fp16
 test_load_1 (__fp16* a)
 {
   return *a;
 }
 
+/*
+**test_load_2:
+**	...
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+*/
 __fp16
 test_load_2 (__fp16* a, int i)
 {
   return a[i];
 }
 
-
+/*
+**test_store_1:
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+*/
 void
 test_store_1 (__fp16* a, __fp16 b)
 {
   *a = b;
 }
 
+/*
+**test_store_2:
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+*/
 void
 test_store_2 (__fp16* a, int i, __fp16 b)
 {
   a[i] = b;
 }
 
-
+/*
+**test_load_store_1:
+**	...
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+*/
 __fp16
 test_load_store_1 (__fp16* a, int i, __fp16* b)
 {
   a[i] = b[i];
 }
 
+/*
+**test_load_store_2:
+**	...
+**	vld1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+**	vst1.16	{d[0-9]+\[[0-9]+\]}, \[r[0-9]+\]
+**	...
+*/
 __fp16
 test_load_store_2 (__fp16* a, int i, __fp16* b)
 {
@@ -43,9 +82,6 @@ test_load_store_2 (__fp16* a, int i, __fp16* b)
   return a[i];
 }
 
-/* { dg-final { scan-assembler-times {vst1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 3 } }  */
-/* { dg-final { scan-assembler-times {vld1\.16\t\{d[0-9]+\[[0-9]+\]\}, \[r[0-9]+\]} 3 } }  */
-
 __fp16
 test_select_1 (int sel, __fp16 a, __fp16 b)
 {


[PINGx2][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-01-27 Thread Stam Markianos-Wright


On 1/16/20 4:06 PM, Stam Markianos-Wright wrote:
> 
> 
> On 1/8/20 3:18 PM, Stam Markianos-Wright wrote:
>>
>>
>> On 12/10/19 5:03 PM, Kyrill Tkachov wrote:
>>> Hi Stam,
>>>
>>> On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:
>>>> Pinging with more correct maintainers this time :)
>>>>
>>>> Also would need to backport to gcc7,8,9, but need to get this approved
>>>> first!
>>>>
>>>
>>> Sorry for the delay.
>>
>> Same here now! Sorry totally forget about this in the lead up to Xmas!
>>
>> Done the changes marked below and also removed the unnecessary extra 
>> #defines 
>> from the test.
> 
> Ping :)
> 
> Cheers,
> Stam
> 
>>
>>>
>>>
>>>> Thank you,
>>>> Stam
>>>>
>>>>
>>>>  Forwarded Message 
>>>> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional
>>>> branches in Thumb2 (PR91816)
>>>> Date: Mon, 21 Oct 2019 10:37:09 +0100
>>>> From: Stam Markianos-Wright 
>>>> To: Ramana Radhakrishnan 
>>>> CC: gcc-patches@gcc.gnu.org , nd ,
>>>> James Greenhalgh , Richard Earnshaw
>>>> 
>>>>
>>>>
>>>>
>>>> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>>> >>
>>>> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>>>> >> however, on my native Aarch32 setup the test times out when run as part
>>>> >> of a big "make check-gcc" regression, but not when run individually.
>>>> >>
>>>> >> 2019-10-11  Stamatis Markianos-Wright 
>>>> >>
>>>> >>   * config/arm/arm.md: Update b for Thumb2 range checks.
>>>> >>   * config/arm/arm.c: New function arm_gen_far_branch.
>>>> >>   * config/arm/arm-protos.h: New function arm_gen_far_branch
>>>> >>   prototype.
>>>> >>
>>>> >> gcc/testsuite/ChangeLog:
>>>> >>
>>>> >> 2019-10-11  Stamatis Markianos-Wright 
>>>> >>
>>>> >>   * testsuite/gcc.target/arm/pr91816.c: New test.
>>>> >
>>>> >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>>>> >> index f995974f9bb..1dce333d1c3 100644
>>>> >> --- a/gcc/config/arm/arm-protos.h
>>>> >> +++ b/gcc/config/arm/arm-protos.h
>>>> >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>>>> cpu_arch_option *,
>>>> >>
>>>> >>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>>> >>
>>>> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char 
>>>> >> *);
>>>> >> +
>>>> >> +
>>>> >
>>>> > Lets get the nits out of the way.
>>>> >
>>>> > Unnecessary extra new line, need a space between int and const above.
>>>> >
>>>> >
>>>>
>>>> .Fixed!
>>>>
>>>> >>   #endif /* ! GCC_ARM_PROTOS_H */
>>>> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>>> >> index 39e1a1ef9a2..1a693d2ddca 100644
>>>> >> --- a/gcc/config/arm/arm.c
>>>> >> +++ b/gcc/config/arm/arm.c
>>>> >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>>> >>   }
>>>> >>   } /* Namespace selftest.  */
>>>> >>
>>>> >> +
>>>> >> +/* Generate code to enable conditional branches in functions over 1 
>>>> MiB.  */
>>>> >> +const char *
>>>> >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>>>> >> +    const char * branch_format)
>>>> >
>>>> > Not sure if this is some munging from the attachment but check
>>>> > vertical alignment of parameters.
>>>> >
>>>>
>>>> .Fixed!
>>>>
>>>> >> +{
>>>> >> +  rtx_code_label * tmp_label = gen_label_rtx ();
>>>> >> +  char label_buf[256];
>>>> >> +  char buffer[128];
>>>> >> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>>>> >&g

[Pingx3][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2020-01-27 Thread Stam Markianos-Wright

On 1/16/20 4:05 PM, Stam Markianos-Wright wrote:
> 
> 
> On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
>>
>>
>> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>>>
>>>
>>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>>>> Hi all,
>>>>
>>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>>>> operations (vector/by element) to the ARM back-end.
>>>>
>>>> These are:
>>>> usdot (vector), dot (by element).
>>>>
>>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>>>> for ARM they remain optional as of ARMv8.6-a.
>>>>
>>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>>>> generate assembler and tests are added to verify and perform adequate 
>>>> checks.
>>>>
>>>> Regression testing on arm-none-eabi passed successfully.
>>>>
>>>> This patch depends on:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>>
>>>> for ARM CLI updates, and on:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>>
>>>> for testsuite effective_target update.
>>>>
>>>> Ok for trunk?
>>>
>>> .Ping :)
>>>
>> Ping :)
>>
>> New diff addressing review comments from Aarch64 version of the patch.
>>
>> _Change of order of operands in RTL patterns.
>> _Change tests to use check-function-bodies, compile with optimisation and 
>> check for exact registers.
>> _Rename tests to remove "-compile-" in filename.
>>
> 
> Ping!
> 
> Cheers,
> Stam
> 
>>>>
>>>> Cheers,
>>>> Stam
>>>>
>>>>
>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>>
>>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>>>> that would be great :)
>>>>
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2019-11-28  Stam Markianos-Wright  
>>>>
>>>>  * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>>>  (USTERNOP_QUALIFIERS): New define.
>>>>  (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>  (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>>  (arm_expand_builtin_args):
>>>>  Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>>>  (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>>>  * config/arm/arm_neon.h (vusdot_s32): New.
>>>>  (vusdot_lane_s32): New.
>>>>  (vusdotq_lane_s32): New.
>>>>  (vsudot_lane_s32): New.
>>>>  (vsudotq_lane_s32): New.
>>>>  * config/arm/arm_neon_builtins.def
>>>>  (usdot,usdot_lane,sudot_lane): New.
>>>>  * config/arm/iterators.md (DOTPROD_I8MM): New.
>>>>  (sup, opsuffix): Add .
>>>>     * config/arm/neon.md (neon_usdot, dot_lane: New.
>>>>  * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>>>
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-12-12  Stam Markianos-Wright  
>>>>
>>>>  * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
>>>>  * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
>>>>  * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
>>>>  * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
>>>>
>>>>
>>


Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]

2020-01-20 Thread Stam Markianos-Wright


On 1/20/20 1:07 PM, Christophe Lyon wrote:
> Hi,
> 
> 
> On Thu, 16 Jan 2020 at 16:59, Stam Markianos-Wright
>  wrote:
>>
>>
>>
>> On 1/13/20 10:05 AM, Kyrill Tkachov wrote:
>>> Hi Stam,
>>>
>>> On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:
>>>> Hi all,
>>>>
>>>> This is a respin of patch:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>>>>
>>>> which has now been split into two (similar to the Aarch64 version).
>>>>
>>>> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
>>>> It also adds a new machine_mode (BFmode) for this type and accompanying 
>>>> Vector
>>>> modes V4BFmode and V8BFmode.
>>>>
>>>> The second patch in this series uses existing target hooks to restrict 
>>>> type use.
>>>>
>>>> Regression testing on arm-none-eabi passed successfully.
>>>>
>>>> This patch depends on:
>>>>
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>>
>>>> for test suite effective_target update.
>>>>
>>>> Ok for trunk?
>>>
>>> This is ok, thanks.
>>>
>>> You can commit it once the git conversion goes through :)
>>
>> Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b
>>
> 
> Since this commit, I've noticed many ICEs like:
> Executing on host:
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never  -fdiagnostics-urls=never  -O0
> -mfp16-format=ieee   -lm  -o ./arm-fp16-ops-1.exe  (timeout =
> 800)
> spawn -ignore SIGHUP
> /aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/xgcc
> -B/aci-gcc-fsf/builds/gcc-fsf-gccsrc-thumb/obj-arm-none-eabi/gcc3/gcc/
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c
> -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers
> -fdiagnostics-color=never -fdiagnostics-urls=never -O0
> -mfp16-format=ieee -lm -o ./arm-fp16-ops-1.exe
> during RTL pass: expand
> In file included from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:3,
>   from /gcc/testsuite/gcc.dg/torture/arm-fp16-ops-1.c:5:
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h: In function 'main':
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:12: internal compiler
> error: in convert_mode_scalar, at expr.c:328
> /gcc/testsuite/gcc.dg/torture/arm-fp16-ops.h:31:3: note: in expansion
> of macro 'CHECK'
> 0x8cb089 convert_mode_scalar
>  /gcc/expr.c:325
> 0x8cb089 convert_move(rtx_def*, rtx_def*, int)
>  /gcc/expr.c:297
> 0x8cb32f convert_modes(machine_mode, machine_mode, rtx_def*, int)
>  /gcc/expr.c:737
> 0xb8b2a0 expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*,
> rtx_def*, int, optab_methods)
>  /gcc/optabs.c:1895
> 0x8bdebc expand_expr_real_2(separate_ops*, rtx_def*, machine_mode,
> expand_modifier)
>  /gcc/expr.c:9847
> 0x77e52a expand_gimple_stmt_1
>  /gcc/cfgexpand.c:3784
> 0x77e52a expand_gimple_stmt
>  /gcc/cfgexpand.c:3844
> 0x78068d expand_gimple_basic_block
>  /gcc/cfgexpand.c:5884
> 0x78279c execute
>  /gcc/cfgexpand.c:6539
> 
> This example is for gcc.dg/torture/arm-fp16-ops-1.c target arm-none-eabi.
> 
> You said you saw no regressions, am I missing something?
> (this is still true as of todays' daily-bump
> bec238768255acf0fe5b0993d05cf99f6331b79e)
> 
> Thanks,
> 
> Christophe

Hi Christophe!

Yes, I think this is a duplicate of 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93300, which Martin raised last 
Friday.

I'm working on this! I made the rookie mistake of doing my reg-testing on a 
non-final version of the patch rather than the _final_ final version - hence 
not picking this up until it was too late... Sorry about that!

I'm working on the fix now :)

Cheers,
Stam


> 
> 
> 
>> Thank you!
>> Stam
>>>
>>> Kyrill
>>>
>>>
>>>>
>>>> Cheers,
>>>> Stam
>>>>
>>>>
>>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>>
>>>> Details on ARM Bfloat can be found here:
>>>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bflo

Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-01-16 Thread Stam Markianos-Wright


On 1/8/20 3:18 PM, Stam Markianos-Wright wrote:
> 
> 
> On 12/10/19 5:03 PM, Kyrill Tkachov wrote:
>> Hi Stam,
>>
>> On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:
>>> Pinging with more correct maintainers this time :)
>>>
>>> Also would need to backport to gcc7,8,9, but need to get this approved
>>> first!
>>>
>>
>> Sorry for the delay.
> 
> Same here now! Sorry totally forget about this in the lead up to Xmas!
> 
> Done the changes marked below and also removed the unnecessary extra #defines 
> from the test.

Ping :)

Cheers,
Stam

> 
>>
>>
>>> Thank you,
>>> Stam
>>>
>>>
>>>  Forwarded Message 
>>> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional
>>> branches in Thumb2 (PR91816)
>>> Date: Mon, 21 Oct 2019 10:37:09 +0100
>>> From: Stam Markianos-Wright 
>>> To: Ramana Radhakrishnan 
>>> CC: gcc-patches@gcc.gnu.org , nd ,
>>> James Greenhalgh , Richard Earnshaw
>>> 
>>>
>>>
>>>
>>> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>> >>
>>> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>>> >> however, on my native Aarch32 setup the test times out when run as part
>>> >> of a big "make check-gcc" regression, but not when run individually.
>>> >>
>>> >> 2019-10-11  Stamatis Markianos-Wright 
>>> >>
>>> >>   * config/arm/arm.md: Update b for Thumb2 range checks.
>>> >>   * config/arm/arm.c: New function arm_gen_far_branch.
>>> >>   * config/arm/arm-protos.h: New function arm_gen_far_branch
>>> >>   prototype.
>>> >>
>>> >> gcc/testsuite/ChangeLog:
>>> >>
>>> >> 2019-10-11  Stamatis Markianos-Wright 
>>> >>
>>> >>   * testsuite/gcc.target/arm/pr91816.c: New test.
>>> >
>>> >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>>> >> index f995974f9bb..1dce333d1c3 100644
>>> >> --- a/gcc/config/arm/arm-protos.h
>>> >> +++ b/gcc/config/arm/arm-protos.h
>>> >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>>> cpu_arch_option *,
>>> >>
>>> >>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>> >>
>>> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char 
>>> >> *);
>>> >> +
>>> >> +
>>> >
>>> > Lets get the nits out of the way.
>>> >
>>> > Unnecessary extra new line, need a space between int and const above.
>>> >
>>> >
>>>
>>> .Fixed!
>>>
>>> >>   #endif /* ! GCC_ARM_PROTOS_H */
>>> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> >> index 39e1a1ef9a2..1a693d2ddca 100644
>>> >> --- a/gcc/config/arm/arm.c
>>> >> +++ b/gcc/config/arm/arm.c
>>> >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>> >>   }
>>> >>   } /* Namespace selftest.  */
>>> >>
>>> >> +
>>> >> +/* Generate code to enable conditional branches in functions over 1 
>>> >> MiB.  */
>>> >> +const char *
>>> >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>>> >> +    const char * branch_format)
>>> >
>>> > Not sure if this is some munging from the attachment but check
>>> > vertical alignment of parameters.
>>> >
>>>
>>> .Fixed!
>>>
>>> >> +{
>>> >> +  rtx_code_label * tmp_label = gen_label_rtx ();
>>> >> +  char label_buf[256];
>>> >> +  char buffer[128];
>>> >> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>>> >> +    CODE_LABEL_NUMBER (tmp_label));
>>> >> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>>> >> +  rtx dest_label = operands[pos_label];
>>> >> +  operands[pos_label] = tmp_label;
>>> >> +
>>> >> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
>>> >> +  output_asm_insn (buffer, operands);
>>> >> +
>>> >> +  snprintf (buffer,

[Pingx2][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2020-01-16 Thread Stam Markianos-Wright


On 1/10/20 6:48 PM, Stam Markianos-Wright wrote:
> 
> 
> On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
>>
>>
>> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>>> Hi all,
>>>
>>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>>> operations (vector/by element) to the ARM back-end.
>>>
>>> These are:
>>> usdot (vector), dot (by element).
>>>
>>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>>> for ARM they remain optional as of ARMv8.6-a.
>>>
>>> The functions are declared in arm_neon.h, RTL patterns are defined to
>>> generate assembler and tests are added to verify and perform adequate 
>>> checks.
>>>
>>> Regression testing on arm-none-eabi passed successfully.
>>>
>>> This patch depends on:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>
>>> for ARM CLI updates, and on:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>>
>>> for testsuite effective_target update.
>>>
>>> Ok for trunk?
>>
>> .Ping :)
>>
> Ping :)
> 
> New diff addressing review comments from Aarch64 version of the patch.
> 
> _Change of order of operands in RTL patterns.
> _Change tests to use check-function-bodies, compile with optimisation and check
> for exact registers.
> _Rename tests to remove "-compile-" in filename.
> 

Ping!

Cheers,
Stam

>>>
>>> Cheers,
>>> Stam
>>>
>>>
>>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>>
>>> PS. I don't have commit rights, so if someone could commit on my behalf,
>>> that would be great :)
>>>
>>>
>>> gcc/ChangeLog:
>>>
>>> 2019-11-28  Stam Markianos-Wright  
>>>
>>>  * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>>  (USTERNOP_QUALIFIERS): New define.
>>>  (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>  (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>>  (arm_expand_builtin_args):
>>>  Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>>  (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>>      * config/arm/arm_neon.h (vusdot_s32): New.
>>>  (vusdot_lane_s32): New.
>>>  (vusdotq_lane_s32): New.
>>>  (vsudot_lane_s32): New.
>>>  (vsudotq_lane_s32): New.
>>>  * config/arm/arm_neon_builtins.def
>>>  (usdot,usdot_lane,sudot_lane): New.
>>>  * config/arm/iterators.md (DOTPROD_I8MM): New.
>>>  (sup, opsuffix): Add .
>>>     * config/arm/neon.md (neon_usdot, dot_lane: New.
>>>  * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>>
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-12-12  Stam Markianos-Wright  
>>>
>>>  * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
>>>  * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
>>>  * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
>>>  * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
>>>
>>>
> 


Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension

2020-01-16 Thread Stam Markianos-Wright


On 1/9/20 3:48 PM, Richard Sandiford wrote:
> OK, thanks.
> 

Committed as r10-6004-g8c197c851e7528baba7cb837f34c05ba2242f705

Thank you!
Stam
> Richard
> 
> Stam Markianos-Wright  writes:
>> On 12/30/19 10:21 AM, Richard Sandiford wrote:
>>> Stam Markianos-Wright  writes:
>>>> On 12/20/19 2:13 PM, Richard Sandiford wrote:
>>>>> Stam Markianos-Wright  writes:
>>>>>> +**...
>>>>>> +**ret
>>>>>> +*/
>>>>>> +int32x2_t ufoo (int32x2_t r, uint8x8_t x, int8x8_t y)
>>>>>> +{
>>>>>> +  return vusdot_s32 (r, x, y);
>>>>>> +}
>>>>>> +
>>>>>
>>>>> If we're using check-function-bodies anyway, it might be slightly more
>>>>> robust to compile at -O and check for the exact RA.  E.g.:
>>>>>
>>>>> /*
>>>>> **ufoo:
>>>>> **usdot	v0\.2s, (v1\.8b, v2\.8b|v2\.8b, v1\.8b)
>>>>> **ret
>>>>> */
>>>>>
>>>>> Just a suggestion though -- either way is fine.
>>>>
>>>> done this too and as per our internal discussion also added one
>>>> xx_untied tests for usdot and one for usdot_lane
>>>>
>>>> That's one xx_untied test for each of the RTL pattern types added in
>>>> aarch64-simd.md. Lmk if this is ok!
>>>>
>>>> Also I found that the way we were using check-function-bodies wasn't
>>>> actually checking the assembler correctly, so I've changed that to:
>>>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>>>> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
>>>> which seems to perform more checks
>>>
>>> Ah, OK, hadn't realised that we were cycling through optimisation
>>> options already.  In that case, it might be better to leave out the
>>> -O from the dg-options and instead use:
>>>
>>> /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } { "-O0" } } } */
>>>
>>> (untested).
>>>
>>> It's unfortunate that we're skipping this for -O0 though.  Ideally we'd
>>> still compile the code and just skip the dg-final.  Does it work if you do:
>>>
>>> /* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
>>> /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } } } */
>>>
>>> ?  Make sure that we actually still run the check-function-bodies when
>>> optimisation is enabled. :-)
>>
>> This works!
>> Now we are only doing the following for O0:
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O0  (test for
>> excess errors)
>>
>> whereas for other optimisation levels do all the checks:
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1  (test for
>> excess errors)
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies ufoo
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies ufooq
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies ufoo_lane
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies ufoo_laneq
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies ufooq_lane
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies ufooq_laneq
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies sfoo_lane
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies sfoo_laneq
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies sfooq_lane
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies sfooq_laneq
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies ufoo_untied
>> PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1
>> check-function-bodies ufooq_laneq_untied
>>
>>>
>>> Also, I'm an idiot.  The reason I'd used (...|...) in the regexps was
>>> that "dot product is commutative".  But of course that's not true for
>>> these mixed-sign ops, so the string must be:

Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension

2020-01-16 Thread Stam Markianos-Wright


On 1/9/20 3:54 PM, Richard Sandiford wrote:
> Please update the names of the testsuite files to match the ones
> in the bfloat16_t patch.  (Same for the usdot/sudot patch -- sorry
> for forgetting there.)
> 
> OK with that change, thanks.
> 

Done and committed as r10-6006-gf275d73a57f1e5a07fbd4978f4b4457a5eaa1e39

Thank you!
Stam

> Richard
> 
> Stam Markianos-Wright  writes:
>> On 12/30/19 10:29 AM, Richard Sandiford wrote:
>>> Stam Markianos-Wright  writes:
>>>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>>>> b/gcc/config/aarch64/aarch64-simd.md
>>>> index 
>>>> adfda96f077075ad53d4bea2919c4d3b326e49f5..7587bc46ba1c80389ea49fa83a0e6f8a489711e9
>>>>  100644
>>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>>> @@ -7028,3 +7028,36 @@
>>>>  "xtn\t%0., %1."
>>>>  [(set_attr "type" "neon_shift_imm_narrow_q")]
>>>>)
>>>> +
>>>> +(define_insn "aarch64_bfdot"
>>>> +  [(set (match_operand:VDQSF 0 "register_operand" "=w")
>>>> +  (plus:VDQSF
>>>> +(unspec:VDQSF
>>>> + [(match_operand: 2 "register_operand" "w")
>>>> +  (match_operand: 3 "register_operand" "w")]
>>>> +  UNSPEC_BFDOT)
>>>> +(match_operand:VDQSF 1 "register_operand" "0")))]
>>>> +  "TARGET_BF16_SIMD"
>>>> +  "bfdot\t%0., %2., %3."
>>>> +  [(set_attr "type" "neon_dot")]
>>>> +)
>>>> +
>>>> +
>>>> +(define_insn "aarch64_bfdot_lane"
>>>
>>> Too many blank lines.
>>
>> Fixed, sorry I hadn't noticed!
>>
>>>
>>>> +  [(set (match_operand:VDQSF 0 "register_operand" "=w")
>>>> +  (plus:VDQSF
>>>> +(unspec:VDQSF
>>>> + [(match_operand: 2 "register_operand" "w")
>>>> +  (match_operand:VBF 3 "register_operand" "w")
>>>> +  (match_operand:SI 4 "const_int_operand" "n")]
>>>> +  UNSPEC_BFDOT)
>>>> +(match_operand:VDQSF 1 "register_operand" "0")))]
>>>> +  "TARGET_BF16_SIMD"
>>>> +{
>>>> +  int nunits = GET_MODE_NUNITS (mode).to_constant ();
>>>> +  int lane = INTVAL (operands[4]);
>>>> +  operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode);
>>>> +  return "bfdot\t%0., %2., %3.2h[%4]";
>>>> +}
>>>> +  [(set_attr "type" "neon_dot")]
>>>> +)
>>>> [...]
>>>> diff --git 
>>>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c 
>>>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c
>>>> new file mode 100644
>>>> index 
>>>> ..c575dcd3901172a52fa9403c9179d58eea44eb72
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c
>>>> @@ -0,0 +1,91 @@
>>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>>>> +/* { dg-additional-options "-O -save-temps" } */
>>>> +/* { dg-final { check-function-bodies "**" "" } } */
>>>> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
>>>
>>> Same comment as for USDOT/SUDOT regarding the dg- markup.
>>
>> Done!
>>>
>>>> +
>>>> +#include 
>>>> +
>>>> +/*
>>>> +**ufoo:
>>>> +**bfdot   v0.2s, (v1.4h, v2.4h|v2.4h, v1.4h)
>>>> +**ret
>>>> +*/
>>>> +float32x2_t ufoo(float32x2_t r, bfloat16x4_t x, bfloat16x4_t y)
>>>> +{
>>>> +  return vbfdot_f32 (r, x, y);
>>>> +}
>>>> +
>>>> +/*
>>>> +**ufooq:
>>>> +**bfdot   v0.4s, (v1.8h, v2.8h|v2.8h, v1.8h)
>>>> +**ret
>>>> +*/
>>>> +float32x4_t ufooq(float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
>>>> +{
>>>> +  return vbfdotq_f32 (r, x, y);
>>>> +}
>

Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [2/2]

2020-01-16 Thread Stam Markianos-Wright


On 1/13/20 10:43 AM, Kyrill Tkachov wrote:
> Hi Stam,
> 
> On 1/10/20 6:47 PM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This patch is part 2 of Bfloat16_t enablement in the ARM back-end.
>>
>> This new type is constrained using target hooks TARGET_INVALID_CONVERSION,
>> TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP so that it may only be used
>> through ACLE intrinsics (will be provided in later patches).
>>
>> Regression testing on arm-none-eabi passed successfully.
>>
>> Ok for trunk?
> 
> 
> Ok.
> 
> Thanks,
> 
> Kyrill

Committed as r10-6021-g3ea9140170b8a511822b1a873dea1227093f3ccf

Thank you!
Stam
> 
> 
>>
>> Cheers,
>> Stam
>>
>>
>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>
>> Details on ARM Bfloat can be found here:
>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
>>  
>>
>>
>>
>>
>> gcc/ChangeLog:
>>
>> 2020-01-10  Stam Markianos-Wright 
>>
>>     * config/arm/arm.c
>>     (arm_invalid_conversion): New function for target hook.
>>     (arm_invalid_unary_op): New function for target hook.
>>     (arm_invalid_binary_op): New function for target hook.
>>
>> 2020-01-10  Stam Markianos-Wright 
>>
>>     * gcc.target/arm/bfloat16_scalar_typecheck.c: New test.
>>     * gcc.target/arm/bfloat16_vector_typecheck_1.c: New test.
>>     * gcc.target/arm/bfloat16_vector_typecheck_2.c: New test.
>>
>>


Re: [GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]

2020-01-16 Thread Stam Markianos-Wright


On 1/13/20 10:05 AM, Kyrill Tkachov wrote:
> Hi Stam,
> 
> On 1/10/20 6:45 PM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This is a respin of patch:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html
>>
>> which has now been split into two (similar to the Aarch64 version).
>>
>> This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
>> It also adds a new machine_mode (BFmode) for this type and accompanying 
>> Vector
>> modes V4BFmode and V8BFmode.
>>
>> The second patch in this series uses existing target hooks to restrict type 
>> use.
>>
>> Regression testing on arm-none-eabi passed successfully.
>>
>> This patch depends on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>
>> for test suite effective_target update.
>>
>> Ok for trunk?
> 
> This is ok, thanks.
> 
> You can commit it once the git conversion goes through :)

Committed as r10-6020-g2e87b2f4121fe1d39edb76f4e492dfe327be6a1b

Thank you!
Stam
> 
> Kyrill
> 
> 
>>
>> Cheers,
>> Stam
>>
>>
>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>
>> Details on ARM Bfloat can be found here:
>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
>>  
>>
>>
>>
>>
>> gcc/ChangeLog:
>>
>> 2020-01-10  Stam Markianos-Wright 
>>
>>     * config.gcc: Add arm_bf16.h.
>>     * config/arm/arm-builtins.c (arm_mangle_builtin_type):  Fix comment.
>>     (arm_simd_builtin_std_type): Add BFmode.
>>     (arm_init_simd_builtin_types): Define element types for vector types.
>>     (arm_init_bf16_types):  New function.
>>     (arm_init_builtins): Add arm_init_bf16_types function call.
>>     * config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
>>     * config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
>>     * config/arm/arm.c (aapcs_vfp_sub_candidate):  Add BFmode.
>>     (arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
>>     (arm_vector_mode_supported_p): Add V4BF, V8BF.
>>     (arm_mangle_type):
>>     * config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
>>   VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
>>   arm_bf16_ptr_type_node.
>>     * config/arm/arm.md: New enabled_for_bfmode_scalar,
>>   enabled_for_bfmode_vector attributes. Add BFmode to movhf expand.
>>   pattern and define_split between ARM registers.
>>     * config/arm/arm_bf16.h: New file.
>>     * config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
>>     * config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): 
>> New.
>>   (VQXMOV): Add V8BF.
>>     * config/arm/neon.md: Add BF vector types to NEON move patterns.
>>     * config/arm/vfp.md: Add BFmode to movhf patterns.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2020-01-10  Stam Markianos-Wright 
>>
>>     * g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
>>     * g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
>>     * gcc.target/arm/bfloat16_scalar_1_1.c: New test.
>>     * gcc.target/arm/bfloat16_scalar_1_2.c: New test.
>>     * gcc.target/arm/bfloat16_scalar_2_1.c: New test.
>>     * gcc.target/arm/bfloat16_scalar_2_2.c: New test.
>>     * gcc.target/arm/bfloat16_scalar_3_1.c: New test.
>>     * gcc.target/arm/bfloat16_scalar_3_2.c: New test.
>>     * gcc.target/arm/bfloat16_scalar_4.c: New test.
>>     * gcc.target/arm/bfloat16_simd_1_1.c: New test.
>>     * gcc.target/arm/bfloat16_simd_1_2.c: New test.
>>     * gcc.target/arm/bfloat16_simd_2_1.c: New test.
>>     * gcc.target/arm/bfloat16_simd_2_2.c: New test.
>>     * gcc.target/arm/bfloat16_simd_3_1.c: New test.
>>     * gcc.target/arm/bfloat16_simd_3_2.c: New test.
>>
>>
>>


Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]

2020-01-10 Thread Stam Markianos-Wright


On 1/10/20 4:29 PM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> On 1/9/20 4:13 PM, Stam Markianos-Wright wrote:
>>> On 1/9/20 4:07 PM, Richard Sandiford wrote:
>>>> Stam Markianos-Wright  writes:
>>>>> diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>>>>> b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>>>>> new file mode 100644
>>>>> index 000..55cbb0b0ef7
>>>>> --- /dev/null
>>>>> +++ b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>>>>> @@ -0,0 +1,14 @@
>>>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>>>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>>>>> +/* { dg-additional-options "-O3 --save-temps" } */
>>>>> +
>>>>> +#include 
>>>>> +
>>>>> +void foo (void)
>>>>> +{
>>>>> +  bfloat16_t (); /* { dg-error {invalid conversion to type 'bfloat16_t'} 
>>>>> ""
>>>>> {target *-*-*} } */
>>>>
>>>> The "" {target *-*-*} stuff isn't needed: that's just for when the test
>>>> depends on a target selector or if you need to specify a line number
>>>> (which comes after the target).
>>
>> Removed them.
>>
>>>
>>> Ah ok cool. I just had something that worked and was just doing ctrl+c 
>>> ctrl+v
>>> everywhere!
>>>
>>>>
>>>> Same for the rest of the patch.
>>>>
>>>>> +  bfloat16_t a = bfloat16_t(); /* { dg-error {invalid conversion to type
>>>>> 'bfloat16_t'} "" {target *-*-*} } */
>>>>
>>>> Why's this one an error?  Looks like it should be OK.  Do we build
>>>> bfloat16_t() as a conversion from a zero integer?
>>>>
>>> Yea that's exactly what it looked like when I went into the debugging! But 
>>> will
>>> investigate a bit further and see if I can fix it for the next revision.
>>>
>>
>> Changed this to dg-bogus with an XFAIL for the purposes of this patch in 
>> Stage 3 :)
> 
> Yeah.  Like we discussed off-list, we'd need to change the target hook
> to do this properly.  (And if we do change the target hook, it would be
> good to make it output the errors itself, like we discussed upthread.)
> Something for GCC 11 perhaps...

Agreed!

> 
>> diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C 
>> b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>> new file mode 100644
>> index 
>> ..0a04cfb18e567ae0eec88da8ea37922434c60080
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>> @@ -0,0 +1,14 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>> +/* { dg-additional-options "-O3 --save-temps" } */
>> +
>> +#include 
>> +
>> +void foo (void)
>> +{
>> +  bfloat16_t (); /* { dg-bogus {invalid conversion to type 'bfloat16_t'} "" 
>> { xfail *-*-* } } */
>> +  bfloat16_t a = bfloat16_t(); /* { dg-error {invalid conversion to type 
>> 'bfloat16_t'} } */
> 
> This should be a dg-bogus too.

Done and committed as 280130.

Diff attached for reference.

Cheers,
Stam

> 
> OK with that change, thanks.
> 
> Richard
> 

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ebd3f6cf45b..ce410ddf551 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21760,6 +21760,55 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
+/* Return the diagnostic message string if conversion from FROMTYPE to
+   TOTYPE is not allowed, NULL otherwise.  */
+
+static const char *
+aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
+{
+  if (element_mode (fromtype) != element_mode (totype))
+    {
+      /* Do not allow conversions to/from BFmode scalar types.  */
+      if (TYPE_MODE (fromtype) == BFmode)
+	return N_("invalid conversion from type %<bfloat16_t%>");
+      if (TYPE_MODE (totype) == BFmode)
+	return N_("invalid conversion to type %<bfloat16_t%>");
+    }
+
+  /* Conversion allowed.  */
+  return NULL;
+}
+
+/* Return the diagnostic message string if the unary operation OP is
+   not permitted on TYPE, NULL otherwise.  */
+
+static const char *
+aarch64_invalid_unary_op (int op, const_tree typ

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2020-01-10 Thread Stam Markianos-Wright


On 1/9/20 3:42 PM, Richard Sandiford wrote:
> Thanks for the update, looks great.
> 
> Stam Markianos-Wright  writes:
>> diff --git a/gcc/config/aarch64/arm_bf16.h b/gcc/config/aarch64/arm_bf16.h
>> new file mode 100644
>> index 
>> ..884b6f3bc7a28c516e54c26a71b1b769f55867a7
>> --- /dev/null
>> +++ b/gcc/config/aarch64/arm_bf16.h
>> @@ -0,0 +1,32 @@
>> +/* Arm BF16 instrinsics include file.
>> +
>> +   Copyright (C) 2019 Free Software Foundation, Inc.
>> +   Contributed by Arm.
> 
> Needs to include 2020 now :-)  Maybe 2019-2020 since it was posted
> in 2019 and would have been changed to 2019-2020 in the automatic update.
> 
> Which reminds me to update my patches too...
> 
> OK for trunk with that change, thanks.

Done and committed as 280129.

Diff attached for reference (and as an attempt to try and keep myself sane and 
not mix it all up!)

Cheers,
Stam

> 
> Richard
> 


diff --git a/gcc/config.gcc b/gcc/config.gcc
index c3d6464f3e6adaa1db818a61de00cff8e00ae08e..075e46072d1643302b9587d4e3f14f2e29b4ec8d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -315,7 +315,7 @@ m32c*-*-*)
 ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_fp16.h arm_neon.h arm_acle.h arm_sve.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 1bd2640a1ced352de232fed1cf134b46c69b80f7..b2d6b761489183c262320d62293bec343b315c11 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -68,6 +68,9 @@
 #define hi_UP  E_HImode
 #define hf_UP  E_HFmode
 #define qi_UP  E_QImode
+#define bf_UP  E_BFmode
+#define v4bf_UP  E_V4BFmode
+#define v8bf_UP  E_V8BFmode
 #define UP(X) X##_UP
 
 #define SIMD_MAX_BUILTIN_ARGS 5
@@ -568,6 +571,10 @@ static tree aarch64_simd_intXI_type_node = NULL_TREE;
 tree aarch64_fp16_type_node = NULL_TREE;
 tree aarch64_fp16_ptr_type_node = NULL_TREE;
 
+/* Back-end node type for brain float (bfloat) types.  */
+tree aarch64_bf16_type_node = NULL_TREE;
+tree aarch64_bf16_ptr_type_node = NULL_TREE;
+
 /* Wrapper around add_builtin_function.  NAME is the name of the built-in
function, TYPE is the function type, and CODE is the function subcode
(relative to AARCH64_BUILTIN_GENERAL).  */
@@ -659,6 +666,8 @@ aarch64_simd_builtin_std_type (machine_mode mode,
   return float_type_node;
 case E_DFmode:
   return double_type_node;
+case E_BFmode:
+  return aarch64_bf16_type_node;
 default:
   gcc_unreachable ();
 }
@@ -750,6 +759,10 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Float64x1_t].eltype = double_type_node;
   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
 
+  /* Init Bfloat vector types with underlying __bf16 type.  */
+  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
+  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
+
   for (i = 0; i < nelts; i++)
 {
   tree eltype = aarch64_simd_types[i].eltype;
@@ -1059,6 +1072,19 @@ aarch64_init_fp16_types (void)
   aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node);
 }
 
+/* Initialize the backend REAL_TYPE type supporting bfloat types.  */
+static void
+aarch64_init_bf16_types (void)
+{
+  aarch64_bf16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
+  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
+  layout_type (aarch64_bf16_type_node);
+
+  lang_hooks.types.register_builtin_type (aarch64_bf16_type_node, "__bf16");
+  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
+}
+
 /* Pointer authentication builtins that will become NOP on legacy platform.
Currently, these builtins are for internal use only (libgcc EH unwinder).  */
 
@@ -1214,6 +1240,8 @@ aarch64_general_init_builtins (void)
 
   aarch64_init_fp16_types ();
 
+  aarch64_init_bf16_types ();
+
   if (TARGET_SIMD)
 aarch64_init_simd_builtins ();
 
diff --git a/gcc/config/aarch64/aarch64-modes.def b/gcc/config/aarch64/aarch64-modes.def
index 6cd8ed0972ad7029e0319aad71d3afbda5684a4f..1eeb8d884520b1a53b8a580f165d42858c03228c 100644
--- a/gcc/config/aarch64/aarch64-modes.def
+++ b/gcc/config/aarch64/aarch64-modes.def
@@ -69,6 +69,13 @@ VECTOR_MODES (FLOAT, 16); /*V4SF V2DF.  */
 VECTOR_MODE (FLOAT, DF, 1);   /* V1DF.  */
 VECTOR_MODE (FLOAT, HF, 2);   /* V2HF.  */
 
+/* Bfloat16 modes.  */
+FLOAT_MODE (BF, 2, 0);
+ADJUST_FLOAT_FORMAT (BF, &arm_bfloat_half_format);
+
+VECTOR_MODE (FLOAT, BF, 4);   /*		 V4BF.  */
+VECTOR_MODE (FLOAT, BF, 8);   /*		 V8BF.  */
+
 /* Oct Int: 256-bit inte

Re: [Ping][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2020-01-10 Thread Stam Markianos-Wright


On 12/18/19 1:25 PM, Stam Markianos-Wright wrote:
> 
> 
> On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
>> operations (vector/by element) to the ARM back-end.
>>
>> These are:
>> usdot (vector), dot (by element).
>>
>> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
>> for ARM they remain optional as of ARMv8.6-a.
>>
>> The functions are declared in arm_neon.h, RTL patterns are defined to
>> generate assembler and tests are added to verify and perform adequate checks.
>>
>> Regression testing on arm-none-eabi passed successfully.
>>
>> This patch depends on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>
>> for ARM CLI updates, and on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>
>> for testsuite effective_target update.
>>
>> Ok for trunk?
> 
> .Ping :)
> 
Ping :)

New diff addressing review comments from Aarch64 version of the patch.

_Change of order of operands in RTL patterns.
_Change tests to use check-function-bodies, compile with optimisation and check 
for exact registers.
_Rename tests to remove "-compile-" in filename.

>>
>> Cheers,
>> Stam
>>
>>
>> ACLE documents are at https://developer.arm.com/docs/101028/latest
>> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
>>
>> PS. I don't have commit rights, so if someone could commit on my behalf,
>> that would be great :)
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-11-28  Stam Markianos-Wright  
>>
>>  * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>>  (USTERNOP_QUALIFIERS): New define.
>>  (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>  (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>>  (arm_expand_builtin_args):
>>  Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>>  (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>>  * config/arm/arm_neon.h (vusdot_s32): New.
>>  (vusdot_lane_s32): New.
>>  (vusdotq_lane_s32): New.
>>  (vsudot_lane_s32): New.
>>  (vsudotq_lane_s32): New.
>>  * config/arm/arm_neon_builtins.def
>>  (usdot,usdot_lane,sudot_lane): New.
>>  * config/arm/iterators.md (DOTPROD_I8MM): New.
>>  (sup, opsuffix): Add .
>>     * config/arm/neon.md (neon_usdot, dot_lane: New.
>>  * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-12  Stam Markianos-Wright  
>>
>>  * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
>>  * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
>>  * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
>>  * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
>>
>>

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index df84560588a..1b4316d0e93 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -86,7 +86,10 @@ enum arm_type_qualifiers
   qualifier_const_void_pointer = 0x802,
   /* Lane indices selected in pairs - must be within range of previous
  argument = a vector.  */
-  qualifier_lane_pair_index = 0x1000
+  qualifier_lane_pair_index = 0x1000,
+  /* Lane indices selected in quadtuplets - must be within range of previous
+ argument = a vector.  */
+  qualifier_lane_quadtup_index = 0x2000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned };
 #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers)
 
+/* T (T, unsigned T, T).  */
+static enum arm_type_qualifiers
+arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+  qualifier_none };
+#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers)
+
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
 arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned, qualifier_lane_index };
 #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers)
 
+/* T (T, unsigned T, T, lane index).  */
+static enum arm_type_qualifiers
+arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+  qualifier_none, qualifier_lane_quadtup_index };
+#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers)
+
+/* T (T, T, unsigned T, lane index).  */
+static enum arm_type_qualifiers
+a

[GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [2/2]

2020-01-10 Thread Stam Markianos-Wright
Hi all,

This patch is part 2 of Bfloat16_t enablement in the ARM back-end.

This new type is constrained using target hooks TARGET_INVALID_CONVERSION,
TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP so that it may only be used
through ACLE intrinsics (will be provided in later patches).

Regression testing on arm-none-eabi passed successfully.

Ok for trunk?

Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
 



gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright  

* config/arm/arm.c
(arm_invalid_conversion): New function for target hook.
(arm_invalid_unary_op): New function for target hook.
(arm_invalid_binary_op): New function for target hook.

2020-01-10  Stam Markianos-Wright  

* gcc.target/arm/bfloat16_scalar_typecheck.c: New test.
* gcc.target/arm/bfloat16_vector_typecheck_1.c: New test.
* gcc.target/arm/bfloat16_vector_typecheck_2.c: New test.


diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 9bd228b5433..d4180d4166c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -688,6 +688,15 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE arm_mangle_type
 
+#undef TARGET_INVALID_CONVERSION
+#define TARGET_INVALID_CONVERSION arm_invalid_conversion
+
+#undef TARGET_INVALID_UNARY_OP
+#define TARGET_INVALID_UNARY_OP arm_invalid_unary_op
+
+#undef TARGET_INVALID_BINARY_OP
+#define TARGET_INVALID_BINARY_OP arm_invalid_binary_op
+
 #undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
 #define TARGET_ATOMIC_ASSIGN_EXPAND_FENV arm_atomic_assign_expand_fenv
 
@@ -32432,6 +32441,55 @@ arm_coproc_ldc_stc_legitimate_address (rtx op)
   return false;
 }
 
+/* Return the diagnostic message string if conversion from FROMTYPE to
+   TOTYPE is not allowed, NULL otherwise.  */
+
+static const char *
+arm_invalid_conversion (const_tree fromtype, const_tree totype)
+{
+  if (element_mode (fromtype) != element_mode (totype))
+    {
+      /* Do not allow conversions to/from BFmode scalar types.  */
+      if (TYPE_MODE (fromtype) == BFmode)
+	return N_("invalid conversion from type %<bfloat16_t%>");
+      if (TYPE_MODE (totype) == BFmode)
+	return N_("invalid conversion to type %<bfloat16_t%>");
+    }
+
+  /* Conversion allowed.  */
+  return NULL;
+}
+
+/* Return the diagnostic message string if the unary operation OP is
+   not permitted on TYPE, NULL otherwise.  */
+
+static const char *
+arm_invalid_unary_op (int op, const_tree type)
+{
+  /* Reject all single-operand operations on BFmode except for &.  */
+  if (element_mode (type) == BFmode && op != ADDR_EXPR)
+    return N_("operation not permitted on type %<bfloat16_t%>");
+
+  /* Operation allowed.  */
+  return NULL;
+}
+
+/* Return the diagnostic message string if the binary operation OP is
+   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
+
+static const char *
+arm_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
+			   const_tree type2)
+{
+  /* Reject all 2-operand operations on BFmode.  */
+  if (element_mode (type1) == BFmode
+      || element_mode (type2) == BFmode)
+    return N_("operation not permitted on type %<bfloat16_t%>");
+
+  /* Operation allowed.  */
+  return NULL;
+}
+
 /* Implement TARGET_CAN_CHANGE_MODE_CLASS.
 
In VFPv1, VFP registers could only be accessed in the mode they were
diff --git a/gcc/testsuite/g++.target/arm/bfloat_cpp_typecheck.C b/gcc/testsuite/g++.target/arm/bfloat_cpp_typecheck.C
new file mode 100644
index 000..3e6f7d83752
--- /dev/null
+++ b/gcc/testsuite/g++.target/arm/bfloat_cpp_typecheck.C
@@ -0,0 +1,14 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
+/* { dg-add-options arm_v8_2a_bf16_neon }  */
+/* { dg-additional-options "-O3 --save-temps" } */
+
+#include 
+
+void foo (void)
+{
+  bfloat16_t (); /* { dg-bogus {invalid conversion to type 'bfloat16_t'} "" { xfail *-*-* } } */
+  bfloat16_t a = bfloat16_t(); /* { dg-bogus {invalid conversion to type 'bfloat16_t'} "" { xfail *-*-* } } */
+  bfloat16_t (0x1234); /* { dg-error {invalid conversion to type 'bfloat16_t'} } */
+  bfloat16_t (0.1); /* { dg-error {invalid conversion to type 'bfloat16_t'} } */
+}
diff --git a/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c
new file mode 100644
index 000..672641e6630
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/bfloat16_scalar_typecheck.c
@@ -0,0 +1,219 @@
+/* { dg-do assemble { target { arm*-*-* } } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-require-effective-tar

[GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end [1/2]

2020-01-10 Thread Stam Markianos-Wright
Hi all,

This is a respin of patch:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01448.html

which has now been split into two (similar to the Aarch64 version).

This is patch 1 of 2 and adds Bfloat type support to the ARM back-end.
It also adds a new machine_mode (BFmode) for this type and accompanying Vector
modes V4BFmode and V8BFmode.

The second patch in this series uses existing target hooks to restrict type use.

Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for test suite effective_target update.

Ok for trunk?

Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
 



gcc/ChangeLog:

2020-01-10  Stam Markianos-Wright  

* config.gcc: Add arm_bf16.h.
* config/arm/arm-builtins.c (arm_mangle_builtin_type):  Fix comment.
(arm_simd_builtin_std_type): Add BFmode.
(arm_init_simd_builtin_types): Define element types for vector types.
(arm_init_bf16_types):  New function.
(arm_init_builtins): Add arm_init_bf16_types function call.
* config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
* config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
* config/arm/arm.c (aapcs_vfp_sub_candidate):  Add BFmode.
(arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
(arm_vector_mode_supported_p): Add V4BF, V8BF.
(arm_mangle_type):
* config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
  VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
  arm_bf16_ptr_type_node.
* config/arm/arm.md: New enabled_for_bfmode_scalar,
  enabled_for_bfmode_vector attributes. Add BFmode to movhf expand.
  pattern and define_split between ARM registers.
* config/arm/arm_bf16.h: New file.
* config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
* config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
  (VQXMOV): Add V8BF.
* config/arm/neon.md: Add BF vector types to NEON move patterns.
* config/arm/vfp.md: Add BFmode to movhf patterns.

gcc/testsuite/ChangeLog:

2020-01-10  Stam Markianos-Wright  

* g++.dg/abi/mangle-neon.C: Add Bfloat vector types.
* g++.dg/ext/arm-bf16/bf16-mangle-1.C: New test.
* gcc.target/arm/bfloat16_scalar_1_1.c: New test.
* gcc.target/arm/bfloat16_scalar_1_2.c: New test.
* gcc.target/arm/bfloat16_scalar_2_1.c: New test.
* gcc.target/arm/bfloat16_scalar_2_2.c: New test.
* gcc.target/arm/bfloat16_scalar_3_1.c: New test.
* gcc.target/arm/bfloat16_scalar_3_2.c: New test.
* gcc.target/arm/bfloat16_scalar_4.c: New test.
* gcc.target/arm/bfloat16_simd_1_1.c: New test.
* gcc.target/arm/bfloat16_simd_1_2.c: New test.
* gcc.target/arm/bfloat16_simd_2_1.c: New test.
* gcc.target/arm/bfloat16_simd_2_2.c: New test.
* gcc.target/arm/bfloat16_simd_3_1.c: New test.
* gcc.target/arm/bfloat16_simd_3_2.c: New test.



diff --git a/gcc/config.gcc b/gcc/config.gcc
index c3d6464f3e6adaa1db818a61de00cff8e00ae08e..6a7a4725fe5e99fba16b40b18cfebb84984d06b8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -344,7 +344,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index df84560588a842ce3c69c589367625f6098cb5bb..7f279cca6688c6f11948159666ee647ae533c61d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -315,12 +315,14 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
 #define v4hf_UP  E_V4HFmode
+#define v4bf_UP  E_V4BFmode
 #define v2si_UP  E_V2SImode
 #define v2sf_UP  E_V2SFmode
#define di_UP	 E_DImode
 #define v16qi_UP E_V16QImode
 #define v8hi_UP  E_V8HImode
 #define v8hf_UP  E_V8HFmode
+#define v8bf_UP  E_V8BFmode
 #define v4si_UP  E_V4SImode
 #define v4sf_UP  E_V4SFmode
 #define v2di_UP  E_V2DImode
@@ -328,9 +330,10 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ei_UP	 E_EImode
 #define oi_UP	 E_OImode
 #define hf_UP	 E_HFmode
+#define bf_UP	 E_BFmode
 #define si_UP	 E_SImode
 #define void_UP	 E_VOIDmode
-
+#define sf_UP	 E_SFmode
 #define UP(X) X##_UP
 
 typedef struct {
@@ -806,6 +809,11 @@ static struct arm_s

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]

2020-01-10 Thread Stam Markianos-Wright


On 1/9/20 4:13 PM, Stam Markianos-Wright wrote:
> 
> 
> On 1/9/20 4:07 PM, Richard Sandiford wrote:
>> Stam Markianos-Wright  writes:
>>> diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C 
>>> b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>>> new file mode 100644
>>> index 000..55cbb0b0ef7
>>> --- /dev/null
>>> +++ b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>>> @@ -0,0 +1,14 @@
>>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>>> +/* { dg-additional-options "-O3 --save-temps" } */
>>> +
>>> +#include 
>>> +
>>> +void foo (void)
>>> +{
>>> +  bfloat16_t (); /* { dg-error {invalid conversion to type 'bfloat16_t'} 
>>> "" 
>>> {target *-*-*} } */
>>
>> The "" {target *-*-*} stuff isn't needed: that's just for when the test
>> depends on a target selector or if you need to specify a line number
>> (which comes after the target).

Removed them.

> 
> Ah ok cool. I just had something that worked and was just doing ctrl+c ctrl+v 
> everywhere!
> 
>>
>> Same for the rest of the patch.
>>
>>> +  bfloat16_t a = bfloat16_t(); /* { dg-error {invalid conversion to type 
>>> 'bfloat16_t'} "" {target *-*-*} } */
>>
>> Why's this one an error?  Looks like it should be OK.  Do we build
>> bfloat16_t() as a conversion from a zero integer?
>>
> Yea that's exactly what it looked like when I went into the debugging! But 
> will 
> investigate a bit further and see if I can fix it for the next revision.
> 

Changed this to dg-bogus with an XFAIL for the purposes of this patch in Stage 
3 :)

> Thank you so much for the help in getting these fixed :D
> 
> Cheers,
> Stam
> 
>> Looks good otherwise, thanks, but I think we should try to support
>> the line above if we can.
>>
>> Richard
>>
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ebd3f6cf45bc0b5118c4c39e323e6380d64c885e..ce410ddf5515407a4680e186b04c6b6a40ae2562 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21760,6 +21760,55 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
+/* Return the diagnostic message string if conversion from FROMTYPE to
+   TOTYPE is not allowed, NULL otherwise.  */
+
+static const char *
+aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
+{
+  if (element_mode (fromtype) != element_mode (totype))
+{
+  /* Do not allow conversions to/from BFmode scalar types.  */
+  if (TYPE_MODE (fromtype) == BFmode)
+	return N_("invalid conversion from type %<bfloat16_t%>");
+  if (TYPE_MODE (totype) == BFmode)
+	return N_("invalid conversion to type %<bfloat16_t%>");
+}
+
+  /* Conversion allowed.  */
+  return NULL;
+}
+
+/* Return the diagnostic message string if the unary operation OP is
+   not permitted on TYPE, NULL otherwise.  */
+
+static const char *
+aarch64_invalid_unary_op (int op, const_tree type)
+{
+  /* Reject all single-operand operations on BFmode except for &.  */
+  if (element_mode (type) == BFmode && op != ADDR_EXPR)
+return N_("operation not permitted on type %<bfloat16_t%>");
+
+  /* Operation allowed.  */
+  return NULL;
+}
+
+/* Return the diagnostic message string if the binary operation OP is
+   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
+
+static const char *
+aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
+			   const_tree type2)
+{
+  /* Reject all 2-operand operations on BFmode.  */
+  if (element_mode (type1) == BFmode
+  || element_mode (type2) == BFmode)
+return N_("operation not permitted on type %<bfloat16_t%>");
+
+  /* Operation allowed.  */
+  return NULL;
+}
+
 /* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
section at the end if needed.  */
 #define GNU_PROPERTY_AARCH64_FEATURE_1_AND	0xc000
@@ -22010,6 +22059,15 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
+#undef TARGET_INVALID_CONVERSION
+#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion
+
+#undef TARGET_INVALID_UNARY_OP
+#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op
+
+#undef TARGET_INVALID_BINARY_OP
+#define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op
+
 #undef TARGET_VERIFY_TYPE_CONTEXT
 #define TARGET_VERIFY_TYPE_CONTEXT aarch64_verify_type_context
 
diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
new file mode 100644
index 

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]

2020-01-09 Thread Stam Markianos-Wright


On 1/9/20 4:07 PM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> diff --git a/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C 
>> b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>> new file mode 100644
>> index 000..55cbb0b0ef7
>> --- /dev/null
>> +++ b/gcc/testsuite/g++.target/aarch64/bfloat_cpp_typecheck.C
>> @@ -0,0 +1,14 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>> +/* { dg-additional-options "-O3 --save-temps" } */
>> +
>> +#include 
>> +
>> +void foo (void)
>> +{
>> +  bfloat16_t (); /* { dg-error {invalid conversion to type 'bfloat16_t'} "" 
>> {target *-*-*} } */
> 
> The "" {target *-*-*} stuff isn't needed: that's just for when the test
> depends on a target selector or if you need to specify a line number
> (which comes after the target).

Ah ok cool. I just had something that worked and was just doing ctrl+c ctrl+v 
everywhere!

> 
> Same for the rest of the patch.
> 
>> +  bfloat16_t a = bfloat16_t(); /* { dg-error {invalid conversion to type 
>> 'bfloat16_t'} "" {target *-*-*} } */
> 
> Why's this one an error?  Looks like it should be OK.  Do we build
> bfloat16_t() as a conversion from a zero integer?
> 
Yea that's exactly what it looked like when I went into the debugging! But will 
investigate a bit further and see if I can fix it for the next revision.

Thank you so much for the help in getting these fixed :D

Cheers,
Stam

> Looks good otherwise, thanks, but I think we should try to support
> the line above if we can.
> 
> Richard
> 


Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2020-01-09 Thread Stam Markianos-Wright


On 1/7/20 5:14 PM, Richard Sandiford wrote:
> Thanks for the update.  The new patch looks really good, just some
> minor comments.
> 
> Stam Markianos-Wright  writes:
>> [...]
>> Also I've update the filenames of all our tests to make them a bit clearer:
>>
>> C tests:
>>
>> __ bfloat16_scalar_compile_1.c to bfloat16_scalar_compile_3.c: Compilation of
>> scalar moves/loads/stores with "-march=armv8.2-a+bf16", "-march=armv8.2-a and +bf16 
>> target
>> pragma", "-march=armv8.2-a" (now does not error out at all). These now include
>> register asms to check more MOV alternatives.
>>
>> __ bfloat16_scalar_compile_4.c: The _Complex error test.
>>
>> __ bfloat16_simd_compile_1.c to bfloat16_simd_compile_3.c: Likewise to
>> x_scalar_x, but also include (vector) 0x1234.. compilation (no assembler 
>> scan).
> 
> Sounds good to me, although TBH the "_compile" feels a bit redundant.

Yes, true that! Removed it.

> 
>> I had also done a small c++ test, but have chosen to shift that to the [2/2]
>> patch because it is currently being blocked by target_invalid_conversion.
> 
> OK.  Does that include the mangling test?

Aaah no, this is the test checking for bfloat16_t(), bfloat16_t (0x1234), 
bfloat16_t(0.25), etc. (which are more of language-level checks)

Oh! I had forgotten about the mangling, so I've added it in this revision.

> 
>> [...]
>>>>> - a test that involves moving constants, for both scalars and vectors.
>>>>>  You can create zero scalar constants in C++ using bfloat16_t() etc.
>>>>>  For vectors it's possible to do things like:
>>>>>
>>>>>typedef short v2hi __attribute__((vector_size(4)));
>>>>>v2hi foo (void) { return (v2hi) 0x12345678; }
>>>>>
>>>>>  The same sort of things should work for bfloat16x4_t and 
>>>>> bfloat16x8_t.
>>>>
>>>> Leaving this as an open issue for now because I'm not 100% sure what we
>>>> should/shouldn't be allowing past the tree-level target hooks.
>>>>
>>>> If we do want to block this we would do this in the [2/2] patch.
>>>> I will come back to it and create a scan-assembler test when I'm more 
>>>> clear on
>>>> what we should and shouldn't allow at the higher level :)
>>>
>>> FWIW, I'm not sure we should go out of our way to disallow this.
>>> Preventing bfloat16_t() in C++ would IMO be unnatural.  And the
>>> "(vector) vector-sized-integer" syntax specifically treats the vector
>>> as a bundle of bits without really caring what the element type is.
>>> Even if we did manage to forbid the conversion in that context,
>>> it would still be possible to achieve the same thing using:
>>>
>>>  v2hi
>>>  foo (void)
>>>  {
>>>union { v2hi v; unsigned int i; } u;
>>>u.i = 0x12345678;
>>>return u.v;
>>>  }
>>>
>> Added the compilation of "(vector) vector-sized-integer" in the vector tests.
>>
>> But target_invalid_conversion in the [2/2] patch is a complication to this (as
>> with bfloat16_t() in C++).
>>
>> I was under the impression that the original intent of bfloat was for it to 
>> be
>> storage only, with any initialisation happening through the float32 convert
>> intrinsic.
>>
>> Either I'd be happy to allow it, but it does feel like we'd slightly be going
>> against what's the ACLE currently.
>> However, looking back at it now, it only mentions using ACLE intrinsics over 
>> C
>> operators, so I'd be happy to allow this for vectors.
>>
>> For scalars though, if we e.g. were to allow:
>>
>> bfloat16_t (0x1234);
>>
>> on a single bfloat, I don't see how we could still block conversions like:
>>
>> bfloat16_t scalar1 = 0.1;
>> bfloat16_t scalar2 = 0;
>> bfloat16_t scalar3 = is_a_float;
>>
>> Agreed that the union {} would still always slip through, though.
> 
> It wasn't clear sorry, but I meant literally "bfloat16_t()", i.e.
> construction with zero initialisation.  I agree we don't want to
> support "bfloat16_t(0.25)" etc.

Added to [2/2] as mentioned above.

> 
>> [...]
>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c 
>>>> b/gcc/testsuite/gcc.target/aarch64/bfloat16_compile_1.c
>>>> new file mode 100644
>>>> index 000..f2bef671deb
>>>> --- /dev/null

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]

2020-01-09 Thread Stam Markianos-Wright


On 1/7/20 3:26 PM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> On 12/19/19 10:08 AM, Richard Sandiford wrote:
>>> Stam Markianos-Wright  writes:
>>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>>> index f57469b6e23..f40f6432fd4 100644
>>>> --- a/gcc/config/aarch64/aarch64.c
>>>> +++ b/gcc/config/aarch64/aarch64.c
>>>> @@ -21661,6 +21661,68 @@ aarch64_stack_protect_guard (void)
>>>>  return NULL_TREE;
>>>>}
>>>>
>>>> +/* Return the diagnostic message string if conversion from FROMTYPE to
>>>> +   TOTYPE is not allowed, NULL otherwise.  */
>>>> +
>>>> +static const char *
>>>> +aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
>>>> +{
>>>> +  static char templ[100];
>>>> +  if ((GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode
>>>> +   || GET_MODE_INNER (TYPE_MODE (totype)) == BFmode)
>>>> +   && TYPE_MODE (fromtype) != TYPE_MODE (totype))
>>>> +  {
>>>> +snprintf (templ, sizeof (templ), \
>>>> +  "incompatible types when assigning to type '%s' from type '%s'",
>>>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
>>>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
>>>> +return N_(templ);
>>>> +  }
>>>> +  /* Conversion allowed.  */
>>>> +  return NULL;
>>>> +}
>>>> +
>>>
>>> This won't handle translation properly.  We also have no guarantee that
>>> the formatted string will fit in 100 characters since at least one of
>>> the type names is unconstrained.  (Also, not all types have names.)
>>>
>>
>> Hi Richard. I'm sending an email here to show you what I have done here, too 
>> :)
>>
>> Currently I have the following:
>>
>> static const char *
>> aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
>> {
>> static char templ[100];
>> if (TYPE_MODE (fromtype) != TYPE_MODE (totype)
>> && ((TYPE_MODE (fromtype) == BFmode && !VECTOR_TYPE_P (fromtype))
>>|| (TYPE_MODE (totype) == BFmode && !VECTOR_TYPE_P (totype))))
> 
> Just:
> 
>  if (TYPE_MODE (fromtype) != TYPE_MODE (totype)
>  && (TYPE_MODE (fromtype) == BFmode || TYPE_MODE (totype) == 
> BFmode))
> 
> should be enough.  Types that have BFmode can't also be vectors.

Yep, agreed.

> 
>>   {
>> if (TYPE_NAME (fromtype) != NULL && TYPE_NAME (totype) != NULL)
>>  {
>>snprintf (templ, sizeof (templ),
>>  "incompatible types when assigning to type '%s' from type '%s'",
>>  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
>>  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
>>return N_(templ);
>>  }
>> else
>>  {
>>snprintf (templ, sizeof (templ),
>>  "incompatible types for assignment");
>>return N_(templ);
>>  }
> 
> This still has the problem I mentioned above though: DECL_NAMEs are
> supplied by the user and can be arbitrary lengths, so there's no
> guarantee that the error message fits in the 100-character buffer.
> We would get a truncated message if the buffer isn't big enough.
> 
> As far as translation goes: the arguments to diagnostic functions
> like "error" are untranslated strings, which the diagnostic functions
> then translate internally.  po/exgettext scans the source tree looking
> for strings that need to be translatable and collects them all in po/gcc.pot.
> Constant format strings in calls to known diagnostic functions get picked
> up automatically (see ABOUT-GCC-NLS), but others need to be marked with
> N_().  This N_() is simply a no-op wrapper macro that marks the argument
> as needing translation.  It has no effect if the argument isn't a
> constant string.
> 
> The interface of this hook is to return an untranslated diagnostic string
> that gets passed to error.  A better interface would be to let the hook
> raise its own error and return a boolean result, but that isn't what
> we have.
> 
> So in the above, it's "incompatible types for assignment" that needs to
> be wrapped in N_().  Wrapping templ has no effect.
> 
> This is also why the first arm doesn't work for translation.  It constructs
> and returns an arbitrary new string that won't have been entered into
> gcc.pot (an

Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension

2020-01-09 Thread Stam Markianos-Wright


On 12/30/19 10:29 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>> b/gcc/config/aarch64/aarch64-simd.md
>> index 
>> adfda96f077075ad53d4bea2919c4d3b326e49f5..7587bc46ba1c80389ea49fa83a0e6f8a489711e9
>>  100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -7028,3 +7028,36 @@
>> "xtn\t%0., %1."
>> [(set_attr "type" "neon_shift_imm_narrow_q")]
>>   )
>> +
>> +(define_insn "aarch64_bfdot"
>> +  [(set (match_operand:VDQSF 0 "register_operand" "=w")
>> +(plus:VDQSF
>> +  (unspec:VDQSF
>> +   [(match_operand: 2 "register_operand" "w")
>> +(match_operand: 3 "register_operand" "w")]
>> +UNSPEC_BFDOT)
>> +  (match_operand:VDQSF 1 "register_operand" "0")))]
>> +  "TARGET_BF16_SIMD"
>> +  "bfdot\t%0., %2., %3."
>> +  [(set_attr "type" "neon_dot")]
>> +)
>> +
>> +
>> +(define_insn "aarch64_bfdot_lane"
> 
> Too many blank lines.

Fixed, sorry I hadn't noticed!

> 
>> +  [(set (match_operand:VDQSF 0 "register_operand" "=w")
>> +(plus:VDQSF
>> +  (unspec:VDQSF
>> +   [(match_operand: 2 "register_operand" "w")
>> +(match_operand:VBF 3 "register_operand" "w")
>> +(match_operand:SI 4 "const_int_operand" "n")]
>> +UNSPEC_BFDOT)
>> +  (match_operand:VDQSF 1 "register_operand" "0")))]
>> +  "TARGET_BF16_SIMD"
>> +{
>> +  int nunits = GET_MODE_NUNITS (mode).to_constant ();
>> +  int lane = INTVAL (operands[4]);
>> +  operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode);
>> +  return "bfdot\t%0., %2., %3.2h[%4]";
>> +}
>> +  [(set_attr "type" "neon_dot")]
>> +)
>> [...]
>> diff --git 
>> a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c 
>> b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c
>> new file mode 100644
>> index 
>> ..c575dcd3901172a52fa9403c9179d58eea44eb72
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c
>> @@ -0,0 +1,91 @@
>> +/* { dg-do assemble { target { aarch64*-*-* } } } */
>> +/* { dg-require-effective-target arm_v8_2a_bf16_neon_ok } */
>> +/* { dg-add-options arm_v8_2a_bf16_neon }  */
>> +/* { dg-additional-options "-O -save-temps" } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
> 
> Same comment as for USDOT/SUDOT regarding the dg- markup.

Done!
> 
>> +
>> +#include 
>> +
>> +/*
>> +**ufoo:
>> +**  bfdot   v0.2s, (v1.4h, v2.4h|v2.4h, v1.4h)
>> +**  ret
>> +*/
>> +float32x2_t ufoo(float32x2_t r, bfloat16x4_t x, bfloat16x4_t y)
>> +{
>> +  return vbfdot_f32 (r, x, y);
>> +}
>> +
>> +/*
>> +**ufooq:
>> +**  bfdot   v0.4s, (v1.8h, v2.8h|v2.8h, v1.8h)
>> +**  ret
>> +*/
>> +float32x4_t ufooq(float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
>> +{
>> +  return vbfdotq_f32 (r, x, y);
>> +}
> 
> The (...|...)s here are correct.
Yep.
> 
>> +
>> +/*
>> +**ufoo_lane:
>> +**  bfdot   v0.2s, (v1.4h, v2.2h\[0\]|v2.4h, v1.2h\[0\])
>> +**  ret
>> +*/
>> +float32x2_t ufoo_lane(float32x2_t r, bfloat16x4_t x, bfloat16x4_t y)
>> +{
>> +  return vbfdot_lane_f32 (r, x, y, 0);
>> +}
>> +
>> +/*
>> +**ufooq_laneq:
>> +**  bfdot   v0.4s, (v1.8h, v2.2h\[2\]|v2.8h, v1.2h\[2\])
>> +**  ret
>> +*/
>> +float32x4_t ufooq_laneq(float32x4_t r, bfloat16x8_t x, bfloat16x8_t y)
>> +{
>> +  return vbfdotq_laneq_f32 (r, x, y, 2);
>> +}
>> +
>> +/*
>> +**ufoo_laneq:
>> +**  bfdot   v0.2s, (v1.4h, v2.2h\[3\]|v2.4h, v1.2h\[3\])
>> +**  ret
>> +*/
>> +float32x2_t ufoo_laneq(float32x2_t r, bfloat16x4_t x, bfloat16x8_t y)
>> +{
>> +  return vbfdot_laneq_f32 (r, x, y, 3);
>> +}
>> +
>> +/*
>> +**ufooq_lane:
>> +**  bfdot   v0.4s, (v1.8h, v2.2h\[1\]|v2.8h, v1.2h\[1\])
>> +**  ret
>> +*/
>> +float32x4_t ufooq_lane(float32x4_t r, bf

Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension

2020-01-09 Thread Stam Markianos-Wright


On 12/30/19 10:21 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> On 12/20/19 2:13 PM, Richard Sandiford wrote:
>>> Stam Markianos-Wright  writes:
>>>> +**...
>>>> +**ret
>>>> +*/
>>>> +int32x2_t ufoo (int32x2_t r, uint8x8_t x, int8x8_t y)
>>>> +{
>>>> +  return vusdot_s32 (r, x, y);
>>>> +}
>>>> +
>>>
>>> If we're using check-function-bodies anyway, it might be slightly more
>>> robust to compile at -O and check for the exact RA.  E.g.:
>>>
>>> /*
>>> **ufoo:
> **	usdot	v0\.2s, (v1\.8b, v2\.8b|v2\.8b, v1\.8b)
>>> **ret
>>> */
>>>
>>> Just a suggestion though -- either way is fine.
>>
>> done this too and as per our internal discussion also added one
>> xx_untied tests for usdot and one for usdot_lane
>>
>> That's one xx_untied test for each of the RTL pattern types added in
>> aarch64-simd.md. Lmk if this is ok!
>>
>> Also I found that the way we were using check-function-bodies wasn't
>> actually checking the assembler correctly, so I've changed that to:
>> +/* { dg-final { check-function-bodies "**" "" "" } } */
>> +/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
>> which seems to perform more checks
> 
> Ah, OK, hadn't realised that we were cycling through optimisation
> options already.  In that case, it might be better to leave out the
> -O from the dg-options and instead use:
> 
> /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } { "-O0" } } } */
> 
> (untested).
> 
> It's unfortunate that we're skipping this for -O0 though.  Ideally we'd
> still compile the code and just skip the dg-final.  Does it work if you do:
> 
> /* { dg-final { check-function-bodies "**" "" {-O[^0]} } } */
> /* { dg-skip-if "" { *-*-* } { { "-fno-fat-lto-objects" } } } */
> 
> ?  Make sure that we actually still run the check-function-bodies when
> optimisation is enabled. :-)

This works!
Now we are only doing the following for O0:
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O0  (test for 
excess errors)

whereas for other optimisation levels do all the checks:
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1  (test for 
excess errors)
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies ufoo
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies ufooq
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies ufoo_lane
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies ufoo_laneq
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies ufooq_lane
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies ufooq_laneq
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies sfoo_lane
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies sfoo_laneq
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies sfooq_lane
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies sfooq_laneq
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies ufoo_untied
PASS: gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c   -O1 
check-function-bodies ufooq_laneq_untied

> 
> Also, I'm an idiot.  The reason I'd used (...|...) in the regexps was
> that "dot product is commutative".  But of course that's not true for
> these mixed-sign ops, so the string must be:
> 
>   usdot  v0\.2s, v1\.8b, v2\.8b
> 
> The patch copied the (...|...) regexps above to the lane tests, but those
> wouldn't be commutative even if the operands had the same type.

Ahh, makes sense now. Done :)

Cheers,
Stam

> 
> Thanks,
> Richard
> 


diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 1bd2640a1ced352de232fed1cf134b46c69b80f7..702b317d94d2fc6ebe59609727ad853f3f5cc652 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -107,6 +107,9 @@ enum aarch64_type_qualifiers
   /* Lane indices selected in pairs. - must be in range, and flipped for
  bigendian.  */
   qualifier_lane_pair_index = 0x800,
+  /* Lane indices selected in quadtuplets. - must be in range, and flipped for
+ bigendian.  */
+  qualifier_lane_quadtup_index = 0x1000,
 };
 
 typedef struct
@@ -173,6 +176,10 @@ aarch64_types_

Re: [PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2020-01-08 Thread Stam Markianos-Wright


On 12/10/19 5:03 PM, Kyrill Tkachov wrote:
> Hi Stam,
> 
> On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:
>> Pinging with more correct maintainers this time :)
>>
>> Also would need to backport to gcc7,8,9, but need to get this approved
>> first!
>>
> 
> Sorry for the delay.

Same here now! Sorry totally forget about this in the lead up to Xmas!

Done the changes marked below and also removed the unnecessary extra #defines 
from the test.

> 
> 
>> Thank you,
>> Stam
>>
>>
>>  Forwarded Message 
>> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional
>> branches in Thumb2 (PR91816)
>> Date: Mon, 21 Oct 2019 10:37:09 +0100
>> From: Stam Markianos-Wright 
>> To: Ramana Radhakrishnan 
>> CC: gcc-patches@gcc.gnu.org , nd ,
>> James Greenhalgh , Richard Earnshaw
>> 
>>
>>
>>
>> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>> >>
>> >> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>> >> however, on my native Aarch32 setup the test times out when run as part
>> >> of a big "make check-gcc" regression, but not when run individually.
>> >>
>> >> 2019-10-11  Stamatis Markianos-Wright 
>> >>
>> >>   * config/arm/arm.md: Update b for Thumb2 range checks.
>> >>   * config/arm/arm.c: New function arm_gen_far_branch.
>> >>   * config/arm/arm-protos.h: New function arm_gen_far_branch
>> >>   prototype.
>> >>
>> >> gcc/testsuite/ChangeLog:
>> >>
>> >> 2019-10-11  Stamatis Markianos-Wright 
>> >>
>> >>   * testsuite/gcc.target/arm/pr91816.c: New test.
>> >
>> >> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> >> index f995974f9bb..1dce333d1c3 100644
>> >> --- a/gcc/config/arm/arm-protos.h
>> >> +++ b/gcc/config/arm/arm-protos.h
>> >> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>> cpu_arch_option *,
>> >>
>> >>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>> >>
>> >> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *);
>> >> +
>> >> +
>> >
>> > Lets get the nits out of the way.
>> >
>> > Unnecessary extra new line, need a space between int and const above.
>> >
>> >
>>
>> .Fixed!
>>
>> >>   #endif /* ! GCC_ARM_PROTOS_H */
>> >> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> >> index 39e1a1ef9a2..1a693d2ddca 100644
>> >> --- a/gcc/config/arm/arm.c
>> >> +++ b/gcc/config/arm/arm.c
>> >> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>> >>   }
>> >>   } /* Namespace selftest.  */
>> >>
>> >> +
>> >> +/* Generate code to enable conditional branches in functions over 1 MiB.  */
>> >> +const char *
>> >> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>> >> +    const char * branch_format)
>> >
>> > Not sure if this is some munging from the attachment but check
>> > vertical alignment of parameters.
>> >
>>
>> Fixed!
>>
>> >> +{
>> >> +  rtx_code_label * tmp_label = gen_label_rtx ();
>> >> +  char label_buf[256];
>> >> +  char buffer[128];
>> >> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>> >> +    CODE_LABEL_NUMBER (tmp_label));
>> >> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>> >> +  rtx dest_label = operands[pos_label];
>> >> +  operands[pos_label] = tmp_label;
>> >> +
>> >> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
>> >> +  output_asm_insn (buffer, operands);
>> >> +
>> >> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr);
>> >> +  operands[pos_label] = dest_label;
>> >> +  output_asm_insn (buffer, operands);
>> >> +  return "";
>> >> +}
>> >> +
>> >> +
>> >
>> > Unnecessary extra newline.
>> >
>>
>> Fixed!
>>
>> >>   #undef TARGET_RUN_TARGET_SELFTESTS
>> >>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]

2020-01-07 Thread Stam Markianos-Wright


On 12/19/19 10:08 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index f57469b6e23..f40f6432fd4 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -21661,6 +21661,68 @@ aarch64_stack_protect_guard (void)
>> return NULL_TREE;
>>   }
>>   
>> +/* Return the diagnostic message string if conversion from FROMTYPE to
>> +   TOTYPE is not allowed, NULL otherwise.  */
>> +
>> +static const char *
>> +aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
>> +{
>> +  static char templ[100];
>> +  if ((GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode
>> +   || GET_MODE_INNER (TYPE_MODE (totype)) == BFmode)
>> +   && TYPE_MODE (fromtype) != TYPE_MODE (totype))
>> +  {
>> +snprintf (templ, sizeof (templ), \
>> +  "incompatible types when assigning to type '%s' from type '%s'",
>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
>> >> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
>> +return N_(templ);
>> +  }
>> +  /* Conversion allowed.  */
>> +  return NULL;
>> +}
>> +
> 
> This won't handle translation properly.  We also have no guarantee that
> the formatted string will fit in 100 characters since at least one of
> the type names is unconstrained.  (Also, not all types have names.)
> 

Hi Richard. I'm sending an email here to show you what I have done, too :)

Currently I have the following:

static const char *
aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
{
   static char templ[100];
   if (TYPE_MODE (fromtype) != TYPE_MODE (totype)
   && ((TYPE_MODE (fromtype) == BFmode && !VECTOR_TYPE_P (fromtype))
  || (TYPE_MODE (totype) == BFmode && !VECTOR_TYPE_P (totype))))
 {
   if (TYPE_NAME (fromtype) != NULL && TYPE_NAME (totype) != NULL)
{
  snprintf (templ, sizeof (templ),
"incompatible types when assigning to type '%s' from type '%s'",
IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
  return N_(templ);
}
   else
{
  snprintf (templ, sizeof (templ),
"incompatible types for assignment");
  return N_(templ);
}
 }
   /* Conversion allowed.  */
   return NULL;
}

This blocks the conversion only if the two types are of different modes and one 
of them is a BFmode scalar.
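
Richard's earlier point about the fixed-size buffer is worth spelling out: snprintf never overruns, but with a static char templ[100] and an unconstrained type name it silently truncates the diagnostic. A small portable sketch (the type names below are made up for illustration):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* snprintf writes at most len-1 characters plus a NUL and returns the
   length the full message *would* have had, so a return value >= len
   signals that the diagnostic was clipped.  */
static int
format_diag (char *buf, size_t len, const char *to, const char *from)
{
  return snprintf (buf, len,
                   "incompatible types when assigning to type '%s' from type '%s'",
                   to, from);
}
```

A user-defined type name longer than the slack left in the buffer is enough to trigger truncation, which is why a fixed 100-byte buffer cannot be guaranteed to hold the formatted message.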

Doing it like this seems to block all scalar-sized assignments:

C:

typedef bfloat16_t vbf __attribute__((vector_size(2)));
vbf foo3 (void) { return (vbf) 0x1234; }

bfloat16_t foo1 (void) { return (bfloat16_t) 0x1234; }

bfloat16_t scalar1_3 = 0;
bfloat16_t scalar1_4 = 0.1;
bfloat16_t scalar1_5 = is_a_float;

bfloat16x4_t vector2_8 = { 0.0, 0, n2, is_a_float }; // (blocked on each element assignment)


C++:

bfloat16_t c1 (void) { return bfloat16_t (0x1234); }

bfloat16_t c2 (void) { return bfloat16_t (0.1); }


But then it allows vector initialisation from binary:

C:
bfloat16x4_t foo1 (void) { return (bfloat16x4_t) 0x1234567812345678; }

C++:
bfloat16x4_t foo1 (void) { return bfloat16x4_t (0x1234567812345678); }
typedef bfloat16_t v2bf __attribute__((vector_size(4)));
v2bf foo3 (void) { return v2bf (0x12345678); }

I also need to check with a colleague who is on holiday if any of this impacts 
the vector-reinterpret intrinsics that he was working on...

Let me know of your thoughts!

Cheers,
Stam

> Unfortunately the interface of the current hook doesn't allow for good
> diagnostics.  We'll just have to return a fixed string.
>
> Formatting nit: braced block should be indented two spaces more
> than the "if (...)".
> 
> Same comment for the other hooks.

Done. Will be in the next revision.

> 
>> +/* Return the diagnostic message string if the unary operation OP is
>> +   not permitted on TYPE, NULL otherwise.  */
>> +
>> +static const char *
>> +aarch64_invalid_unary_op (int op, const_tree type)
>> +{
>> +  static char templ[100];
>> +  /* Reject all single-operand operations on BFmode except for &.  */
>> +  if (GET_MODE_INNER (TYPE_MODE (type)) == BFmode && op != ADDR_EXPR)
>> +  {
>> +snprintf (templ, sizeof (templ),
>> +  "operation not permitted on type '%s'",
>> +  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
>> +return N_(templ);
>> +  }
>> +  /* Operation allowed.  */
>> +  return NULL;
>> +}
> 
> The problem with testing TYPE_MODE is that we'll then miss things
> that don

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2020-01-07 Thread Stam Markianos-Wright
On 23/12/2019 16:57, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> On 12/19/19 10:01 AM, Richard Sandiford wrote:
>>>> +
>>>> +#pragma GCC push_options
>>>> +#pragma GCC target ("arch=armv8.2-a+bf16")
>>>> +#ifdef __ARM_FEATURE_BF16_SCALAR_ARITHMETIC
>>>> +
>>>> +typedef __bf16 bfloat16_t;
>>>> +
>>>> +
>>>> +#endif
>>>> +#pragma GCC pop_options
>>>> +
>>>> +#endif
>>>
>>> Are you sure we need the #ifdef?  The target pragma should guarantee
>>> that the macro's defined.
>>>
>>> But the validity of the typedef shouldn't depend on target options,
>>> so AFAICT this should just be:
>>>
>>> typedef __bf16 bfloat16_t;
>>
>> Ok so it's a case of "what do we want to happen if the user tries to use 
>> bfloats
>> without +bf16 enabled".
>>
>> So the intent of the ifdef was to not have bfloat16_t be visible if the macro
>> wasn't defined (i.e. not having any bf16 support), but I see now that this 
>> was
>> being negated by the target macro, anyway! Oops, my bad for not really
>> understanding that, sorry!
>>
>> If we have the types always visible, then the user may use them, resulting 
>> in an
>> ICE.
>>
>> But even if the #ifdef worked this still doesn't stop the user from trying to
>> use __bf16 or __Bfloat16x4_t, __Bfloat16x8_t, which would still produce an
>> ICE, so it's not a perfect solution anyway...
> 
> Right.  Or they could use #pragma GCC target to switch to a different
> non-bf16 target after including arm_bf16.h.
> 
>> One other thing I tried was the below change to aarch64-builtins.c which 
>> stops
>> __bf16 or the vector types from being registered at all:
>>
>> --- a/gcc/config/aarch64/aarch64-builtins.c
>> +++ b/gcc/config/aarch64/aarch64-builtins.c
>> @@ -759,26 +759,32 @@ aarch64_init_simd_builtin_types (void)
>>   aarch64_simd_types[Float64x1_t].eltype = double_type_node;
>>   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
>>
>> -  /* Init Bfloat vector types with underlying __bf16 type.  */
>> -  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
>> -  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
>> +  if (TARGET_BF16_SIMD)
>> +{
>> +  /* Init Bfloat vector types with underlying __bf16 type.  */
>> +  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
>> +  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
>> +}
>>
>>   for (i = 0; i < nelts; i++)
>> {
>>   tree eltype = aarch64_simd_types[i].eltype;
>>   machine_mode mode = aarch64_simd_types[i].mode;
>>
>> -  if (aarch64_simd_types[i].itype == NULL)
>> +  if (eltype != NULL)
>>{
>> - aarch64_simd_types[i].itype
>> -   = build_distinct_type_copy
>> - (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
>> - SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
>> -   }
>> + if (aarch64_simd_types[i].itype == NULL)
>> +   {
>> + aarch64_simd_types[i].itype
>> +   = build_distinct_type_copy
>> +   (build_vector_type (eltype, GET_MODE_NUNITS (mode)));
>> + SET_TYPE_STRUCTURAL_EQUALITY (aarch64_simd_types[i].itype);
>> +   }
>>
>> -  tdecl = add_builtin_type (aarch64_simd_types[i].name,
>> -   aarch64_simd_types[i].itype);
>> -  TYPE_NAME (aarch64_simd_types[i].itype) = tdecl;
>> + tdecl = add_builtin_type (aarch64_simd_types[i].name,
>> +   aarch64_simd_types[i].itype);
>> + TYPE_NAME (aarch64_simd_types[i].itype) = tdecl;
>> +   }
>> }
>>
>> #define AARCH64_BUILD_SIGNED_TYPE(mode)  \
>> @@ -1240,7 +1246,8 @@ aarch64_general_init_builtins (void)
>>
>>   aarch64_init_fp16_types ();
>>
>> -  aarch64_init_bf16_types ();
>> +  if (TARGET_BF16_FP)
>> +aarch64_init_bf16_types ();
>>
>>   if (TARGET_SIMD)
>> aarch64_init_simd_builtins ();
>>
>>
>>
>> But the problem in that case was that the types could not be re-enabled 
>> using
>> a target pragma like:
>>
>> #pragma GCC push_options
>> #pragma GCC target ("+bf16")
>>
>> Inside the test
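
For reference, the storage format under discussion: __bf16 holds the top half of an IEEE binary32 value, so a scalar can be modelled portably as a uint16_t. This is only an illustration of the bit layout (with round-toward-zero truncation), not of GCC's implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model a bfloat16 value as the high 16 bits of a float's bit pattern.
   Real hardware conversions round; plain truncation is used here only to
   show the layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.  */
static uint16_t
float_to_bf16_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);   /* type-pun safely via memcpy */
  return (uint16_t) (bits >> 16);
}

static float
bf16_bits_to_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}
```

Because the exponent field is the same width as binary32's, every bfloat16 value widens to float exactly, which is why the intrinsics below accumulate into float32 lanes.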

Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension

2019-12-30 Thread Stam Markianos-Wright


On 12/20/19 2:36 PM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> Hi all,
>>
>> This patch adds the ARMv8.6 Extension ACLE intrinsics for the bfloat bfdot
>> operation.
>>
>> The functions are declared in arm_neon.h with the armv8.2-a+bf16 target 
>> option
>> as required.
>>
>> RTL patterns are defined to generate assembler.
>>
>> Tests added to verify expected assembly and perform adequate lane checks.
>>
>> This patch depends on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
>>
for testsuite effective_target update and on:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html
>> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html
>>
>> for back-end Bfloat enablement.
>>
>> Cheers,
>> Stam
>>
>>
>> gcc/ChangeLog:
>>
>> 2019-11-04  Stam Markianos-Wright  
>>
>>  * config/aarch64/aarch64-simd-builtins.def (aarch64_bfdot,
>> aarch64_bfdot_lane, aarch64_bfdot_laneq): New.
>>  * config/aarch64/aarch64-simd.md
>> (aarch64_bfdot, aarch64_bfdot_lane): New.
>>  * config/aarch64/arm_neon.h (vbfdot_f32, vbfdotq_f32, vbfdot_lane_f32,
>> vbfdotq_lane_f32, vbfdot_laneq_f32, vbfdotq_laneq_f32): New.
>>  * config/aarch64/iterators.md (UNSPEC_BFDOT, VBF, isquadop, Vbfdottype,
>> VBFMLA_W): New.
> 
> Changelog nit: the continuation lines should be indented by a tab only.

Yes, sorry, that's my email client messing things up again! Fixed 
locally and will carry over when I do the commit.

> 
>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>> b/gcc/config/aarch64/aarch64-simd.md
>> index 
>> c4858ab7cffd786066646a5cd95a168311990b76..bdc26c190610580e57e9749804b7729ee4e34793
>>  100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -7027,3 +7027,37 @@
>> "xtn\t%0., %1."
>> [(set_attr "type" "neon_shift_imm_narrow_q")]
>>   )
>> +
>> +(define_insn "aarch64_bfdot"
>> +  [(set (match_operand:VDQSF 0 "register_operand" "=w")
>> +(plus:VDQSF (match_operand:VDQSF 1 "register_operand" "0")
>> +(unspec:VDQSF [(match_operand: 2
>> +"register_operand" "w")
>> +   (match_operand: 3
>> +"register_operand" "w")]
>> +   UNSPEC_BFDOT)))]
> 
> The operands to the plus should be the other way around, so that
> the more complicated operand comes first,
> 

Done

>> +  "TARGET_BF16_SIMD"
>> +  "bfdot\t%0., %2., %3."
>> +  [(set_attr "type" "neon_dot")]
>> +)
>> +
>> +
>> +(define_insn "aarch64_bfdot_lane"
>> +  [(set (match_operand:VDQSF 0 "register_operand" "=w")
>> +(plus:VDQSF (match_operand:VDQSF 1 "register_operand" "0")
>> +(unspec:VDQSF [(match_operand: 2
>> +"register_operand" "w")
>> +   (match_operand: VBF 3
> 
> Nit: should be no space before "VBF".

Done

> 
>> +"register_operand" "w")
>> +   (match_operand:SI 4
>> +"const_int_operand" "n")]
>> +   UNSPEC_BFDOT)))]
>> +  "TARGET_BF16_SIMD"
>> +{
>> +  int nunits = GET_MODE_NUNITS (mode).to_constant ();
>> +  int lane = INTVAL (operands[4]);
>> +  operands[4] =  gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode);
> 
> Should only be one space after "=".

Done

> 
>> +  return "bfdot\t%0., %2., %3.2h[%4]";
>> +}
>> +  [(set_attr "type" "neon_dot")]
>> +)
>> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
>> index 
>> 5996df0a612caff3c881fc15b0aa12b8f91a193b..0357d97cc4143c3a9c56260d9a9cc24138afc049
>>  100644
>> --- a/gcc/config/aarch64/arm_neon.h
>> +++ b/gcc/config/aarch64/arm_neon.h
>> @@ -34612,6 +34612,57 @@ vrnd64xq_f64 (float64x2_t __a)
>>   
>>   #include "arm_bf16.h"
>>   
>> +#pragma GCC push_options
>> +#pragma GCC target ("a

Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension

2019-12-30 Thread Stam Markianos-Wright


On 12/20/19 2:13 PM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> diff --git a/gcc/config/aarch64/aarch64-simd.md 
>> b/gcc/config/aarch64/aarch64-simd.md
>> index 
>> ad4676bc167f08951e693916c7ef796e3501762a..eba71f004ef67af654f9c512b720aa6cfdd1d7fc
>>  100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -506,6 +506,19 @@
>> [(set_attr "type" "neon_dot")]
>>   )
>>   
>> +;; These instructions map to the __builtins for the armv8.6a I8MM usdot
>> +;; (vector) Dot Product operation.
>> +(define_insn "aarch64_usdot"
>> +  [(set (match_operand:VS 0 "register_operand" "=w")
>> +(plus:VS (match_operand:VS 1 "register_operand" "0")
>> +(unspec:VS [(match_operand: 2 "register_operand" "w")
>> +(match_operand: 3 "register_operand" "w")]
>> +UNSPEC_USDOT)))]
>> +  "TARGET_SIMD && TARGET_I8MM"
>> +  "usdot\\t%0., %2., %3."
>> +  [(set_attr "type" "neon_dot")]
>> +)
>> +
>>   ;; These expands map to the Dot Product optab the vectorizer checks for.
>>   ;; The auto-vectorizer expects a dot product builtin that also does an
>>   ;; accumulation into the provided register.
> 
> Sorry for not raising it last time, but this should just be "TARGET_I8MM".
> TARGET_SIMD is always true when TARGET_I8MM is.

Oh no worries! Thank you so much for the detailed feedback, every time :D
Fixed!

> 
>> @@ -573,6 +586,25 @@
>> [(set_attr "type" "neon_dot")]
>>   )
>>   
>> +;; These instructions map to the __builtins for the armv8.6a I8MM usdot, 
>> sudot
>> +;; (by element) Dot Product operations.
>> +(define_insn "aarch64_dot_lane"
>> +  [(set (match_operand:VS 0 "register_operand" "=w")
>> +(plus:VS (match_operand:VS 1 "register_operand" "0")
>> +(unspec:VS [(match_operand: 2 "register_operand" "w")
>> +(match_operand:VB 3 "register_operand" "w")
>> +(match_operand:SI 4 "immediate_operand" "i")]
>> +DOTPROD_I8MM)))]
>> +  "TARGET_SIMD && TARGET_I8MM"
>> +  {
>> +int nunits = GET_MODE_NUNITS (mode).to_constant ();
>> +int lane = INTVAL (operands[4]);
>> +operands[4] = gen_int_mode (ENDIAN_LANE_N (nunits / 4, lane), SImode);
>> +return "dot\\t%0., %2., %3.4b[%4]";
>> +  }
>> +  [(set_attr "type" "neon_dot")]
>> +)
>> +
>>   (define_expand "copysign3"
>> [(match_operand:VHSDF 0 "register_operand")
>>  (match_operand:VHSDF 1 "register_operand")
> 
> Same here.  Another thing I should have noticed last time is that the
> canonical order for (plus ...) is to have the more complicated expression
> first.  Operand 1 and the (unpec ...) should therefore be the other
> way around in the expression above.  (Having operand 1 "later" than
> operands 2, 3 and 4 is OK.)
Done.
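
The lane remapping done by ENDIAN_LANE_N in these patterns can be sketched portably. This is an illustrative model of the arithmetic (GCC's macro takes the element count and flips the index on big-endian targets), not a copy of GCC internals:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of ENDIAN_LANE_N: on big-endian targets the architectural lane
   numbering is reversed relative to GCC's memory-layout numbering, so the
   lane index is flipped; on little-endian it passes through unchanged.
   The nunits / 4 in the dot-product pattern reflects that each 32-bit
   accumulator lane consumes four 8-bit elements.  */
static int
endian_lane_n (int nunits, int n, bool big_endian)
{
  return big_endian ? nunits - 1 - n : n;
}
```

For V16QI (16 byte elements) there are 16 / 4 = 4 dot-product lanes, so lane 1 maps to lane 2 on big-endian.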

> 
>> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
>> index 
>> 8b861601a48b2150aa5768d717c61e0d1416747f..95b92dff69343e2b6c74174b39f3cd9d9838ddab
>>  100644
>> --- a/gcc/config/aarch64/arm_neon.h
>> +++ b/gcc/config/aarch64/arm_neon.h
>> @@ -34606,6 +34606,89 @@ vrnd64xq_f64 (float64x2_t __a)
>>   
>>   #pragma GCC pop_options
>>   
>> +/* AdvSIMD 8-bit Integer Matrix Multiply (I8MM) intrinsics.  */
>> +
>> +#pragma GCC push_options
>> +#pragma GCC target ("arch=armv8.2-a+i8mm")
>> +
>> +__extension__ extern __inline int32x2_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vusdot_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b)
>> +{
>> +  return __builtin_aarch64_usdotv8qi_ssus (__r, __a, __b);
>> +}
>> +
>> +__extension__ extern __inline int32x4_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b)
>> +{
>> +  return __builtin_aarch64_usdotv16qi_ssus (__r, __a, __b);
>> +}
>> +
>> +__extension__ extern __inline int32x2_t
>> +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
>> +vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a, int8x8_t __b, const int 
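
As a reference for what these intrinsics compute, USDOT's semantics can be modelled with a portable scalar loop. This is an illustrative reference model only, not GCC or ACLE code:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference model for USDOT: each 32-bit accumulator lane adds the
   dot product of four unsigned 8-bit elements from A with four signed
   8-bit elements from B.  s32_lanes is 2 for the 64-bit form (vusdot_s32)
   and 4 for the 128-bit form (vusdotq_s32).  */
static void
usdot_ref (int32_t *r, const uint8_t *a, const int8_t *b, int s32_lanes)
{
  for (int i = 0; i < s32_lanes; i++)
    for (int j = 0; j < 4; j++)
      r[i] += (int32_t) a[4 * i + j] * (int32_t) b[4 * i + j];
}
```

The mixed signedness is the whole point of the instruction: 255 * -1 contributes -255, which a same-signedness dot product cannot express.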

Re: [GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2019-12-23 Thread Stam Markianos-Wright


On 12/19/19 10:01 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> [...]
>> @@ -659,6 +666,8 @@ aarch64_simd_builtin_std_type (machine_mode mode,
>> return float_type_node;
>>   case E_DFmode:
>> return double_type_node;
>> +case E_BFmode:
>> +  return aarch64_bf16_type_node;
>>   default:
>> gcc_unreachable ();
>>   }
>> @@ -750,6 +759,11 @@ aarch64_init_simd_builtin_types (void)
>> aarch64_simd_types[Float64x1_t].eltype = double_type_node;
>> aarch64_simd_types[Float64x2_t].eltype = double_type_node;
>>   
>> +
>> +/* Init Bfloat vector types with underlying uint types.  */
>> +  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
>> +  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
> 
> Formatting nits: too many blank lines, comment should be indented
> to match the code.

Done :)

> 
>> +
>> for (i = 0; i < nelts; i++)
>>   {
>> tree eltype = aarch64_simd_types[i].eltype;
>> @@ -1059,6 +1073,19 @@ aarch64_init_fp16_types (void)
>> aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node);
>>   }
>>   
>> +/* Initialize the backend REAL_TYPE type supporting bfloat types.  */
>> +static void
>> +aarch64_init_bf16_types (void)
>> +{
>> +  aarch64_bf16_type_node = make_node (REAL_TYPE);
>> +  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
>> +  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
>> +  layout_type (aarch64_bf16_type_node);
>> +
>> +  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node, 
>> "__bf16");
> 
> This style is mostly a carry-over from pre-ANSI days.  New code
> can just use "lang_hooks.types.register_builtin_type (...)".

Ahh good to know, thanks! Done

> 
>> +  aarch64_bf16_ptr_type_node = build_pointer_type (aarch64_bf16_type_node);
>> +}
>> +
>>   /* Pointer authentication builtins that will become NOP on legacy platform.
>>  Currently, these builtins are for internal use only (libgcc EH 
>> unwinder).  */
>>   
>> [...]
>> diff --git a/gcc/config/aarch64/aarch64-simd-builtin-types.def 
>> b/gcc/config/aarch64/aarch64-simd-builtin-types.def
>> index b015694293c..3b387377f38 100644
>> --- a/gcc/config/aarch64/aarch64-simd-builtin-types.def
>> +++ b/gcc/config/aarch64/aarch64-simd-builtin-types.def
>> @@ -50,3 +50,5 @@
>> ENTRY (Float32x4_t, V4SF, none, 13)
>> ENTRY (Float64x1_t, V1DF, none, 13)
>> ENTRY (Float64x2_t, V2DF, none, 13)
>> +  ENTRY (Bfloat16x4_t, V4BF, none, 15)
>> +  ENTRY (Bfloat16x8_t, V8BF, none, 15)
> 
> Should be 14 (number of characters + 2 for "__").  Would be good to have
> a test for correct C++ mangling.

Done, thank you for pointing it out!!

> 
>> [...]
>> @@ -101,10 +101,10 @@
>> [(set_attr "type" "neon_dup")]
>>   )
>>   
>> -(define_insn "*aarch64_simd_mov"
>> -  [(set (match_operand:VD 0 "nonimmediate_operand"
>> +(define_insn "*aarch64_simd_mov"
>> +  [(set (match_operand:VDMOV 0 "nonimmediate_operand"
>>  "=w, m,  m,  w, ?r, ?w, ?r, w")
>> -(match_operand:VD 1 "general_operand"
>> +(match_operand:VDMOV 1 "general_operand"
>>  "m,  Dz, w,  w,  w,  r,  r, Dn"))]
>> "TARGET_SIMD
>>  && (register_operand (operands[0], mode)
>> @@ -126,13 +126,14 @@
>>   }
>> [(set_attr "type" "neon_load1_1reg, store_8, neon_store1_1reg,\
>>   neon_logic, neon_to_gp, f_mcr,\
>> - mov_reg, neon_move")]
>> + mov_reg, neon_move")
>> +(set_attr "arch" "*,notbf16,*,*,*,*,*,notbf16")]
>>   )
> 
> Together with the changes to the arch attribute:
> 
>> @@ -378,6 +378,12 @@
>>  (and (eq_attr "arch" "fp16")
>>   (match_test "TARGET_FP_F16INST"))
>>   
>> +(and (eq_attr "arch" "fp16_notbf16")
>> + (match_test "TARGET_FP_F16INST && !TARGET_BF16_FP"))
>> +
>> +(and (eq_attr "arch" "notbf16")
>> + (match_test "!TARGET_BF16_SIMD"))
>> +
>>  (and (eq_attr "arch" "sve")
>>   (match_test "TARGET_SVE")))
>>   (const_string "yes")
> 
> this will dis

[GCC][PATCH][ARM] Add Bfloat16_t scalar type, vector types and machine modes to ARM back-end

2019-12-20 Thread Stam Markianos-Wright
Hi all,

This patch was developed at the same time as the aarch64 version. Richards' 
feedback on that one also applies here and we'll be addressing them in a respin.

However, it's still useful to get this up for everyone (including ARM 
maintainers) to look and and comment, too.

For reference, the latest emails in the Aarch64 thread are at:
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01364.html
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01362.html

(The respin will also be split into two in a similar fashion to the Aarch64 
version)

Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for test suite effective_target update.

Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
 



gcc/ChangeLog:

2019-12-16  Stam Markianos-Wright  

* config.gcc: Add arm_bf16.h.
* config/arm/arm-builtins.c (arm_mangle_builtin_type): Fix comment.
(arm_simd_builtin_std_type): Add BFmode.
(arm_init_simd_builtin_types): Define element types for vector types.
(arm_init_bf16_types): New function.
(arm_init_builtins): Add arm_init_bf16_types function call.
* config/arm/arm-modes.def: Add BFmode and V4BF, V8BF vector modes.
* config/arm/arm-simd-builtin-types.def: Add V4BF, V8BF.
* config/arm/arm.c (aapcs_vfp_sub_candidate): Add BFmode.
(arm_hard_regno_mode_ok): Add BFmode and tidy up statements.
(arm_vector_mode_supported_p): Add V4BF, V8BF.
(arm_invalid_conversion): New function for target hook.
(arm_invalid_unary_op): New function for target hook.
(arm_invalid_binary_op): New function for target hook.
* config/arm/arm.h: Add V4BF, V8BF to VALID_NEON_DREG_MODE,
  VALID_NEON_QREG_MODE respectively. Add export arm_bf16_type_node,
  arm_bf16_ptr_type_node.
* config/arm/arm.md: New enabled_for_bfmode_scalar,
  enabled_for_bfmode_vector attributes. Add BFmode to movhf expand.
  pattern and define_split between ARM registers.
* config/arm/arm_bf16.h: New file.
* config/arm/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
* config/arm/iterators.md (ANY64_BF, VDXMOV, VHFBF, HFBF, fporbf): New.
  (VQXMOV): Add V8BF.
* config/arm/neon.md: Add BF vector types to NEON move patterns.
* config/arm/vfp.md: Add BFmode to movhf_vfp pattern.

2019-12-16  Stam Markianos-Wright  

* gcc.target/arm/bfloat16_compile-1.c: New test.
* gcc.target/arm/bfloat16_compile-2.c: New test.
* gcc.target/arm/bfloat16_compile-3.c: New test.
* gcc.target/arm/bfloat16_compile-4.c: New test.
* gcc.target/arm/bfloat16_scalar_typecheck.c: New test.
* gcc.target/arm/bfloat16_vector_typecheck1.c: New test.
* gcc.target/arm/bfloat16_vector_typecheck2.c: New test.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 5aa0130135fa3ce95df502b3f84e78832b368375..bf1b6319643cf21970495f846392983255bd 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -344,7 +344,7 @@ arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_bf16.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2d902d0b325bc1fe5e22831ef8a59a2bb37c1225..b998a4b935d522ca9ec7b5a928fc6bcc6649d5a3 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -315,12 +315,14 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define v8qi_UP  E_V8QImode
 #define v4hi_UP  E_V4HImode
 #define v4hf_UP  E_V4HFmode
+#define v4bf_UP  E_V4BFmode
 #define v2si_UP  E_V2SImode
 #define v2sf_UP  E_V2SFmode
#define di_UP	 E_DImode
 #define v16qi_UP E_V16QImode
 #define v8hi_UP  E_V8HImode
 #define v8hf_UP  E_V8HFmode
+#define v8bf_UP  E_V8BFmode
 #define v4si_UP  E_V4SImode
 #define v4sf_UP  E_V4SFmode
 #define v2di_UP  E_V2DImode
@@ -328,9 +330,10 @@ arm_set_sat_qualifiers[SIMD_MAX_BUILTIN_ARGS]
 #define ei_UP	 E_EImode
 #define oi_UP	 E_OImode
 #define hf_UP	 E_HFmode
#define bf_UP	 E_BFmode
 #define si_UP	 E_SImode
 #define void_UP	 E_VOIDmode
-
+#define sf_UP	 E_SFmode
 #define UP(X) X##_UP
 
 typedef struct {
@@ -806,6 +809,11 @@ static struct arm_simd_type_info arm_simd_types [] = {
 
 /* The user-visible __fp16 type.  */
 tree arm_fp16_type_node = NULL_TREE;
+
+/* Back-end node type for brain float (bfloat) types.  */
+tree arm_bf16_t

[GCC][PATCH][AArch64]Add ACLE intrinsics for bfdot for ARMv8.6 Extension

2019-12-20 Thread Stam Markianos-Wright
Hi all,

This patch adds the ARMv8.6 Extension ACLE intrinsics for the bfloat bfdot 
operation.

The functions are declared in arm_neon.h with the armv8.2-a+bf16 target option 
as required.

RTL patterns are defined to generate assembler.

Tests added to verify expected assembly and perform adequate lane checks.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for testsuite effective_target update and on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01323.html
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01324.html

for back-end Bfloat enablement.

Cheers,
Stam


gcc/ChangeLog:

2019-11-04  Stam Markianos-Wright  

* config/aarch64/aarch64-simd-builtins.def (aarch64_bfdot,
   aarch64_bfdot_lane, aarch64_bfdot_laneq): New.
* config/aarch64/aarch64-simd.md
   (aarch64_bfdot, aarch64_bfdot_lane): New.
* config/aarch64/arm_neon.h (vbfdot_f32, vbfdotq_f32, vbfdot_lane_f32,
   vbfdotq_lane_f32, vbfdot_laneq_f32, vbfdotq_laneq_f32): New.
* config/aarch64/iterators.md (UNSPEC_BFDOT, VBF, isquadop, Vbfdottype,
   VBFMLA_W): New.

gcc/testsuite/ChangeLog:

2019-11-04  Stam Markianos-Wright  

* gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-1.c: New.
* gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-2.c: New.
* gcc.target/aarch64/advsimd-intrinsics/bfdot-compile-3.c: New.
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index f4ca35a59704c761fe2ac2b6d401fff7c8aba80d..6c5b61c37bcb340f963861723c6e365e32f6ca95 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -682,3 +682,8 @@
   BUILTIN_VSFDF (UNOP, frint32x, 0)
   BUILTIN_VSFDF (UNOP, frint64z, 0)
   BUILTIN_VSFDF (UNOP, frint64x, 0)
+
+  /* Implemented by aarch64_bfdot{_lane}{q}.  */
+  VAR2 (TERNOP, bfdot, 0, v2sf, v4sf)
+  VAR2 (QUADOP_LANE_PAIR, bfdot_lane, 0, v2sf, v4sf)
+  VAR2 (QUADOP_LANE_PAIR, bfdot_laneq, 0, v2sf, v4sf)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c4858ab7cffd786066646a5cd95a168311990b76..bdc26c190610580e57e9749804b7729ee4e34793 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -7027,3 +7027,37 @@
   "xtn\t%0., %1."
   [(set_attr "type" "neon_shift_imm_narrow_q")]
 )
+
+(define_insn "aarch64_bfdot"
+  [(set (match_operand:VDQSF 0 "register_operand" "=w")
+	(plus:VDQSF (match_operand:VDQSF 1 "register_operand" "0")
+		(unspec:VDQSF [(match_operand: 2
+		"register_operand" "w")
+   (match_operand: 3
+		"register_operand" "w")]
+   UNSPEC_BFDOT)))]
+  "TARGET_BF16_SIMD"
+  "bfdot\t%0., %2., %3."
+  [(set_attr "type" "neon_dot")]
+)
+
+
+(define_insn "aarch64_bfdot_lane"
+  [(set (match_operand:VDQSF 0 "register_operand" "=w")
+	(plus:VDQSF (match_operand:VDQSF 1 "register_operand" "0")
+		(unspec:VDQSF [(match_operand: 2
+		"register_operand" "w")
+   (match_operand: VBF 3
+		"register_operand" "w")
+   (match_operand:SI 4
+		"const_int_operand" "n")]
+   UNSPEC_BFDOT)))]
+  "TARGET_BF16_SIMD"
+{
+  int nunits = GET_MODE_NUNITS (mode).to_constant ();
+  int lane = INTVAL (operands[4]);
+  operands[4] =  gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane), SImode);
+  return "bfdot\t%0., %2., %3.2h[%4]";
+}
+  [(set_attr "type" "neon_dot")]
+)
diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 5996df0a612caff3c881fc15b0aa12b8f91a193b..0357d97cc4143c3a9c56260d9a9cc24138afc049 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -34612,6 +34612,57 @@ vrnd64xq_f64 (float64x2_t __a)
 
 #include "arm_bf16.h"
 
+#pragma GCC push_options
+#pragma GCC target ("arch=armv8.2-a+bf16")
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_f32 (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b)
+{
+  return __builtin_aarch64_bfdotv2sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x4_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdotq_f32 (float32x4_t __r, bfloat16x8_t __a, bfloat16x8_t __b)
+{
+  return __builtin_aarch64_bfdotv4sf (__r, __a, __b);
+}
+
+__extension__ extern __inline float32x2_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vbfdot_lane_f32 \
+  (float32x2_t __r, bfloat16x4_t __a, bfloat16x4_t __b, const int __index)
+{
+  return __builtin_aarch64_bfdot_lanev2sf (__r, __a, __b, __index);
+}
+
+__extension__ extern __inline float32x4_t
+__attrib
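
As a reference for what BFDOT computes, here is a portable scalar model: each 32-bit float lane of the accumulator receives the dot product of a pair of adjacent bfloat16 elements from each source. This is only an illustrative reference model (bfloat16 values are represented as the high 16 bits of a binary32), not GCC or ACLE code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern to float: the bf16 bits are exactly the
   high half of the corresponding binary32, so widening is lossless.  */
static float
bf16_as_float (uint16_t h)
{
  uint32_t bits = (uint32_t) h << 16;
  float f;
  memcpy (&f, &bits, sizeof f);
  return f;
}

/* Scalar reference model for BFDOT: lane i of the float32 accumulator
   gains a[2i]*b[2i] + a[2i+1]*b[2i+1].  f32_lanes is 2 for vbfdot_f32
   and 4 for vbfdotq_f32.  */
static void
bfdot_ref (float *r, const uint16_t *a, const uint16_t *b, int f32_lanes)
{
  for (int i = 0; i < f32_lanes; i++)
    r[i] += bf16_as_float (a[2 * i]) * bf16_as_float (b[2 * i])
            + bf16_as_float (a[2 * i + 1]) * bf16_as_float (b[2 * i + 1]);
}
```

The pairing of adjacent bf16 elements into one f32 lane is also why the lane-indexed form counts lanes in units of nunits / 2 in the RTL pattern above.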

Re: [GCC][testsuite][ARM][AArch64] Add ARM v8.6 effective target checks to target-supports.exp

2019-12-20 Thread Stam Markianos-Wright


On 12/18/19 4:47 PM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> On 12/13/19 11:15 AM, Richard Sandiford wrote:
>>> Stam Markianos-Wright  writes:
>>>> Hi all,
>>>>
>>>> This small patch adds support for the ARM v8.6 extensions +bf16 and
>>>> +i8mm to the testsuite. This will be tested through other upcoming
>>>> patches, which is why we are not providing any explicit tests here.
>>>>
>>>> Ok for trunk?
>>>>
>>>> Also I don't have commit rights, so if someone could commit on my
>>>> behalf, that would be great :)
>>>>
>>>> The functionality here depends on CLI patches:
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>>>
>>>> but this patch applies cleanly without them, too.
>>>>
>>>> Cheers,
>>>> Stam
>>>>
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-12-11  Stam Markianos-Wright  
>>>>
>>>>* lib/target-supports.exp
>>>>(check_effective_target_arm_v8_2a_i8mm_ok_nocache): New.
>>>>(check_effective_target_arm_v8_2a_i8mm_ok): New.
>>>>(add_options_for_arm_v8_2a_i8mm): New.
>>>>(check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): New.
>>>>(check_effective_target_arm_v8_2a_bf16_neon_ok): New.
>>>>(add_options_for_arm_v8_2a_bf16_neon): New.
>>>
>>> The new effective-target keywords need to be documented in
>>> doc/sourcebuild.texi.
>>
>> Added in new diff :)
>>
>>>
>>> LGTM otherwise.  For:
>>>
>>>> diff --git a/gcc/testsuite/lib/target-supports.exp 
>>>> b/gcc/testsuite/lib/target-supports.exp
>>>> index 5b4cc02f921..36fb63e9929 100644
>>>> --- a/gcc/testsuite/lib/target-supports.exp
>>>> +++ b/gcc/testsuite/lib/target-supports.exp
>>>> @@ -4781,6 +4781,49 @@ proc add_options_for_arm_v8_2a_dotprod_neon { flags 
>>>> } {
>>>>return "$flags $et_arm_v8_2a_dotprod_neon_flags"
>>>>}
>>>>
>>>> +# Return 1 if the target supports ARMv8.2+i8mm Adv.SIMD Dot Product
>>>> +# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
>>>> +# Record the command line options needed.
>>>> +
>>>> +proc check_effective_target_arm_v8_2a_i8mm_ok_nocache { } {
>>>> +global et_arm_v8_2a_i8mm_flags
>>>> +set et_arm_v8_2a_i8mm_flags ""
>>>> +
>>>> +if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
>>>> +return 0;
>>>> +}
>>>> +
>>>> +# Iterate through sets of options to find the compiler flags that
>>>> +# need to be added to the -march option.
>>>> +foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" 
>>>> "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } {
>>>> +if { [check_no_compiler_messages_nocache \
>>>> +  arm_v8_2a_i8mm_ok object {
>>>> +#include 
>>>> +#if !defined (__ARM_FEATURE_MATMUL_INT8)
>>>> +#error "__ARM_FEATURE_MATMUL_INT8 not defined"
>>>> +#endif
>>>> +} "$flags -march=armv8.2-a+i8mm"] } {
>>>> +set et_arm_v8_2a_i8mm_flags "$flags -march=armv8.2-a+i8mm"
>>>> +return 1
>>>> +}
>>>> +}
>>>
>>> I wondered whether it would be better to add no options if testing
>>> with something that already supports i8mm (e.g. -march=armv8.6).
>>> That would mean trying:
>>>
>>> "" "-march=armv8.2-a+i8mm" "-march=armv8.2-a+i8mm -mfloat-abi..." ...
>>>
>>> instead.  But there are arguments both ways, and the above follows
>>> existing style, so OK.
>>
>> Not quite sure if I'm understanding this right, but I think that's what
>> the "" option in foreach flags{} is for?
>>
>> i.e. currently what I'm seeing is:
>>
>> +/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
>> +/* { dg-add-options arm_v8_2a_i8mm }  */
>>
>> will pull through the first option that compiles to object file with no
>> errors (check_no_compiler_messages_nocache arm_v8_2a

Re: [GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension

2019-12-20 Thread Stam Markianos-Wright


On 12/13/19 11:02 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> @@ -573,6 +586,44 @@
>> [(set_attr "type" "neon_dot")]
>>   )
>>   
>> +;; These instructions map to the __builtins for the armv8.6a I8MM usdot, 
>> sudot
>> +;; (by element) Dot Product operations.
>> +(define_insn "aarch64_dot_lane"
>> +  [(set (match_operand:VS 0 "register_operand" "=w")
>> +(plus:VS (match_operand:VS 1 "register_operand" "0")
>> +(unspec:VS [(match_operand: 2 "register_operand" "w")
>> +(match_operand:V8QI 3 "register_operand" "")
>> +(match_operand:SI 4 "immediate_operand" "i")]
>> +DOTPROD_I8MM)))]
>> +  "TARGET_SIMD && TARGET_I8MM"
>> +  {
>> +int nunits = GET_MODE_NUNITS (V8QImode).to_constant ();
>> +int lane = INTVAL (operands[4]);
>> +operands[4]
>> +=  gen_int_mode (ENDIAN_LANE_N (nunits / 4, lane), SImode);
>> +return "dot\\t%0., %2., %3.4b[%4]";
>> +  }
>> +  [(set_attr "type" "neon_dot")]
>> +)
>> +
>> +(define_insn "aarch64_dot_laneq"
>> +  [(set (match_operand:VS 0 "register_operand" "=w")
>> +(plus:VS (match_operand:VS 1 "register_operand" "0")
>> +(unspec:VS [(match_operand: 2 "register_operand" "w")
>> +(match_operand:V16QI 3 "register_operand" "")
> 
> Using  seems a bit redundant when it's always "w" in this context,
> but either's fine.

Done!

> 
>> +(match_operand:SI 4 "immediate_operand" "i")]
>> +DOTPROD_I8MM)))]
>> +  "TARGET_SIMD && TARGET_I8MM"
>> +  {
>> +int nunits = GET_MODE_NUNITS (V16QImode).to_constant ();
>> +int lane = INTVAL (operands[4]);
>> +operands[4]
>> +=  gen_int_mode (ENDIAN_LANE_N (nunits / 4, lane), SImode);
> 
> Nit: = should be indented two spaces more, and there should be only
> one space afterwards.  But the statement fits on one line, so probably
> better not to have the line break at all.

I put it all onto one line.
> 
>> +return "dot\\t%0., %2., %3.4b[%4]";
>> +  }
>> +  [(set_attr "type" "neon_dot")]
>> +)
> 
> These two patterns can be merged using :VB for operand 3.

Merged them.

I also changed the tests to use the new check-function-bodies according to 
downstream comments.
This helps check that the assembler scans are done in the right order and 
ensures that the correct assembler was generated from the right function call 
(as opposed to "somewhere in the output file").

Hope this looks better :D

Cheers,
Stam
> 
> LGTM otherwise, thanks.
> 
> Richard
> 

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index c35a1b1f0299ce5af8ca1a3df0209614f7bd0f25..6bd26889f2f26a9f82dd6d40f50125eaeee41740 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -107,6 +107,9 @@ enum aarch64_type_qualifiers
   /* Lane indices selected in pairs. - must be in range, and flipped for
  bigendian.  */
   qualifier_lane_pair_index = 0x800,
+  /* Lane indices selected in quadtuplets. - must be in range, and flipped for
+ bigendian.  */
+  qualifier_lane_quadtup_index = 0x1000,
 };
 
 typedef struct
@@ -173,6 +176,10 @@ aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned,
   qualifier_unsigned, qualifier_immediate };
 #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
+#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
 
 
 static enum aarch64_type_qualifiers
@@ -191,6 +198,19 @@ aarch64_types_quadopu_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned, qualifier_lane_index };
 #define TYPES_QUADOPU_LANE (aarch64_types_quadopu_lane_qualifiers)
 
+static enum aarch64_type_qualifiers
+aarch64_types_quadopssus_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+  qualifier_none, qualifier_lane_quadtup_index };
+#define TYPES_QUADOPSSUS_LANE_QUADTUP \
+	(aarch64_types_quadopssus_lane_quadtup_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_quadopsssu_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = 

[PATCH, committed] Add myself to MAINTAINERS.

2019-12-19 Thread Stam Markianos-Wright
Hi all,

I have committed the attached patch adding myself to the Write After
Approval section of the MAINTAINERS file.

Cheers,
Stam

(commits r279573, r279575)

2019-12-19  Stam Markianos-Wright  

* MAINTAINERS (write_after_approval): Add myself.

diff --git a/MAINTAINERS b/MAINTAINERS
index e31fb19760e..3d78697e191 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -499,6 +499,7 @@ Luis Machado	
 Ziga Mahkovec	
 Matthew Malcomson
 Mikhail Maltsev	
+Stamatis Markianos-Wright			
 Patrick Marlier	
 Simon Martin	
 Alejandro Martinez


[GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [2/2]

2019-12-18 Thread Stam Markianos-Wright
Hi all,

This patch is part 2 of Bfloat16_t enablement in the Aarch64 back-end.

This new type is constrained using target hooks TARGET_INVALID_CONVERSION, 
TARGET_INVALID_UNARY_OP, TARGET_INVALID_BINARY_OP so that it may only be used 
through ACLE intrinsics (will be provided in later patches).

Regression testing on aarch64-none-elf passed successfully.

Ok for trunk?

Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
 


PS. I don't have commit rights, so if someone could commit on my behalf,
that would be great :)


gcc/ChangeLog:

2019-12-16  Stam Markianos-Wright  

* config/aarch64/aarch64.c
(aarch64_invalid_conversion): New function for target hook.
(aarch64_invalid_unary_op): Likewise.
(aarch64_invalid_binary_op): Likewise.
(TARGET_INVALID_CONVERSION): Add back-end define for target hook.
(TARGET_INVALID_UNARY_OP): Likewise.
(TARGET_INVALID_BINARY_OP): Likewise.


gcc/testsuite/ChangeLog:

2019-12-16  Stam Markianos-Wright  

* gcc.target/aarch64/bfloat16_scalar_typecheck.c: New test.
* gcc.target/aarch64/bfloat16_vector_typecheck1.c: New test.
* gcc.target/aarch64/bfloat16_vector_typecheck2.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f57469b6e23..f40f6432fd4 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21661,6 +21661,68 @@ aarch64_stack_protect_guard (void)
   return NULL_TREE;
 }
 
+/* Return the diagnostic message string if conversion from FROMTYPE to
+   TOTYPE is not allowed, NULL otherwise.  */
+
+static const char *
+aarch64_invalid_conversion (const_tree fromtype, const_tree totype)
+{
+  static char templ[100];
+  if ((GET_MODE_INNER (TYPE_MODE (fromtype)) == BFmode
+   || GET_MODE_INNER (TYPE_MODE (totype)) == BFmode)
+   && TYPE_MODE (fromtype) != TYPE_MODE (totype))
+  {
+snprintf (templ, sizeof (templ),
+  "incompatible types when assigning to type '%s' from type '%s'",
+  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (totype))),
+  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (fromtype))));
+return N_(templ);
+  }
+  /* Conversion allowed.  */
+  return NULL;
+}
+
+/* Return the diagnostic message string if the unary operation OP is
+   not permitted on TYPE, NULL otherwise.  */
+
+static const char *
+aarch64_invalid_unary_op (int op, const_tree type)
+{
+  static char templ[100];
+  /* Reject all single-operand operations on BFmode except for &.  */
+  if (GET_MODE_INNER (TYPE_MODE (type)) == BFmode && op != ADDR_EXPR)
+  {
+snprintf (templ, sizeof (templ),
+  "operation not permitted on type '%s'",
+  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type))));
+return N_(templ);
+  }
+  /* Operation allowed.  */
+  return NULL;
+}
+
+/* Return the diagnostic message string if the binary operation OP is
+   not permitted on TYPE1 and TYPE2, NULL otherwise.  */
+
+static const char *
+aarch64_invalid_binary_op (int op ATTRIBUTE_UNUSED, const_tree type1,
+			   const_tree type2)
+{
+  static char templ[100];
+  /* Reject all 2-operand operations on BFmode.  */
+  if (GET_MODE_INNER (TYPE_MODE (type1)) == BFmode
+  || GET_MODE_INNER (TYPE_MODE (type2)) == BFmode)
+  {
+snprintf (templ, sizeof (templ),
+  "operation not permitted on types '%s', '%s'",
+  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type1))),
+  IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type2))));
+return N_(templ);
+  }
+  /* Operation allowed.  */
+  return NULL;
+}
+
 /* Implement TARGET_ASM_FILE_END for AArch64.  This adds the AArch64 GNU NOTE
section at the end if needed.  */
 #define GNU_PROPERTY_AARCH64_FEATURE_1_AND	0xc000
@@ -21911,6 +21973,15 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MANGLE_TYPE
 #define TARGET_MANGLE_TYPE aarch64_mangle_type
 
+#undef TARGET_INVALID_CONVERSION
+#define TARGET_INVALID_CONVERSION aarch64_invalid_conversion
+
+#undef TARGET_INVALID_UNARY_OP
+#define TARGET_INVALID_UNARY_OP aarch64_invalid_unary_op
+
+#undef TARGET_INVALID_BINARY_OP
+#define TARGET_INVALID_BINARY_OP aarch64_invalid_binary_op
+
 #undef TARGET_VERIFY_TYPE_CONTEXT
 #define TARGET_VERIFY_TYPE_CONTEXT aarch64_verify_type_context
 
diff --git a/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_typecheck.c b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_typecheck.c
new file mode 100644
index 000..6f6a6af9587
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/bfloat16_scalar_typecheck.c
@@ -0,0 +1,83 @@
+/* { dg-do compile { target { aarch64*-*-* } } } */
+/* { dg-skip-if "" { *-*-* } { "-fno-fat-lto-objects" } } */
+/* { dg-option

[GCC][PATCH][Aarch64] Add Bfloat16_t scalar type, vector types and machine modes to Aarch64 back-end [1/2]

2019-12-18 Thread Stam Markianos-Wright
Hi all,

This patch adds Bfloat type support to the ARM back-end.
It also adds a new machine_mode (BFmode) for this type and accompanying Vector 
modes V4BFmode and V8BFmode.

The second patch in this series uses existing target hooks to restrict type use.

Regression testing on aarch64-none-elf passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for test suite effective_target update.

Ok for trunk?

Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

Details on ARM Bfloat can be found here:
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
 


PS. I don't have commit rights, so if someone could commit on my behalf,
that would be great :)



gcc/ChangeLog:

2019-12-16  Stam Markianos-Wright  

* config.gcc: Add arm_bf16.h.
* config/aarch64/aarch64-builtins.c
 (aarch64_simd_builtin_std_type): Add BFmode.
 (aarch64_init_simd_builtin_types): Add element types for vector types.
(aarch64_init_bf16_types): New function.
(aarch64_general_init_builtins): Add arm_init_bf16_types function call.
* config/aarch64/aarch64-modes.def: Add BFmode and vector modes.
* config/aarch64/aarch64-simd-builtin-types.def:
* config/aarch64/aarch64-simd.md: Add BF types to NEON move patterns.
* config/aarch64/aarch64.c (aarch64_classify_vector_mode): Add BF modes.
(aarch64_gimplify_va_arg_expr): Add BFmode.
* config/aarch64/aarch64.h (AARCH64_VALID_SIMD_DREG_MODE): Add V4BF.
(AARCH64_VALID_SIMD_QREG_MODE): Add V8BF.
* config/aarch64/aarch64.md: New enabled_for_bfmode_scalar,
  enabled_for_bfmode_vector attributes. Add BFmode to movhf pattern.
* config/aarch64/arm_bf16.h: New file.
* config/aarch64/arm_neon.h: Add arm_bf16.h and Bfloat vector types.
* config/aarch64/iterators.md
  (HFBF, GPF_TF_F16_MOV, VDMOV, VQMOV, VALL_F16MOV): New.



gcc/testsuite/ChangeLog:

2019-12-16  Stam Markianos-Wright  

* gcc.target/aarch64/bfloat16_compile.c: New test.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 9802f436e06..b49c110ccaf 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -315,7 +315,7 @@ m32c*-*-*)
 ;;
 aarch64*-*-*)
 	cpu_type=aarch64
-	extra_headers="arm_fp16.h arm_neon.h arm_acle.h arm_sve.h"
+	extra_headers="arm_fp16.h arm_neon.h arm_bf16.h arm_acle.h arm_sve.h"
 	c_target_objs="aarch64-c.o"
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index c35a1b1f029..3ba2f12166f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -68,6 +68,9 @@
 #define hi_UP  E_HImode
 #define hf_UP  E_HFmode
 #define qi_UP  E_QImode
+#define bf_UP  E_BFmode
+#define v4bf_UP  E_V4BFmode
+#define v8bf_UP  E_V8BFmode
 #define UP(X) X##_UP
 
 #define SIMD_MAX_BUILTIN_ARGS 5
@@ -568,6 +571,10 @@ static tree aarch64_simd_intXI_type_node = NULL_TREE;
 tree aarch64_fp16_type_node = NULL_TREE;
 tree aarch64_fp16_ptr_type_node = NULL_TREE;
 
+/* Back-end node type for brain float (bfloat) types.  */
+tree aarch64_bf16_type_node = NULL_TREE;
+tree aarch64_bf16_ptr_type_node = NULL_TREE;
+
 /* Wrapper around add_builtin_function.  NAME is the name of the built-in
function, TYPE is the function type, and CODE is the function subcode
(relative to AARCH64_BUILTIN_GENERAL).  */
@@ -659,6 +666,8 @@ aarch64_simd_builtin_std_type (machine_mode mode,
   return float_type_node;
 case E_DFmode:
   return double_type_node;
+case E_BFmode:
+  return aarch64_bf16_type_node;
 default:
   gcc_unreachable ();
 }
@@ -750,6 +759,11 @@ aarch64_init_simd_builtin_types (void)
   aarch64_simd_types[Float64x1_t].eltype = double_type_node;
   aarch64_simd_types[Float64x2_t].eltype = double_type_node;
 
+
+  /* Init Bfloat vector types with the underlying __bf16 element type.  */
+  aarch64_simd_types[Bfloat16x4_t].eltype = aarch64_bf16_type_node;
+  aarch64_simd_types[Bfloat16x8_t].eltype = aarch64_bf16_type_node;
+
   for (i = 0; i < nelts; i++)
 {
   tree eltype = aarch64_simd_types[i].eltype;
@@ -1059,6 +1073,19 @@ aarch64_init_fp16_types (void)
   aarch64_fp16_ptr_type_node = build_pointer_type (aarch64_fp16_type_node);
 }
 
+/* Initialize the backend REAL_TYPE type supporting bfloat types.  */
+static void
+aarch64_init_bf16_types (void)
+{
+  aarch64_bf16_type_node = make_node (REAL_TYPE);
+  TYPE_PRECISION (aarch64_bf16_type_node) = 16;
+  SET_TYPE_MODE (aarch64_bf16_type_node, BFmode);
+  layout_type (aarch64_bf16_type_node);
+
+  (*lang_hooks.types.register_builtin_type) (aarch64_bf16_type_node, "__bf16");
+  aarch64_

[Ping][GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2019-12-18 Thread Stam Markianos-Wright


On 12/13/19 10:22 AM, Stam Markianos-Wright wrote:
> Hi all,
> 
> This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
> operations (vector/by element) to the ARM back-end.
> 
> These are:
> usdot (vector), dot (by element).
> 
> The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
> for ARM they remain optional as of ARMv8.6-a.
> 
> The functions are declared in arm_neon.h, RTL patterns are defined to
> generate assembler and tests are added to verify and perform adequate 
> checks.
> 
> Regression testing on arm-none-eabi passed successfully.
> 
> This patch depends on:
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
> 
> for ARM CLI updates, and on:
> 
> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html
> 
> for testsuite effective_target update.
> 
> Ok for trunk?

Ping :)

> 
> Cheers,
> Stam
> 
> 
> ACLE documents are at https://developer.arm.com/docs/101028/latest
> ISA documents are at https://developer.arm.com/docs/ddi0596/latest
> 
> PS. I don't have commit rights, so if someone could commit on my behalf,
> that would be great :)
> 
> 
> gcc/ChangeLog:
> 
> 2019-11-28  Stam Markianos-Wright  
> 
>  * config/arm/arm-builtins.c (enum arm_type_qualifiers):
>  (USTERNOP_QUALIFIERS): New define.
>  (USMAC_LANE_QUADTUP_QUALIFIERS): New define.
>  (SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
>  (arm_expand_builtin_args):
>      Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
>  (arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
>  * config/arm/arm_neon.h (vusdot_s32): New.
>  (vusdot_lane_s32): New.
>  (vusdotq_lane_s32): New.
>  (vsudot_lane_s32): New.
>  (vsudotq_lane_s32): New.
>  * config/arm/arm_neon_builtins.def
>      (usdot,usdot_lane,sudot_lane): New.
>  * config/arm/iterators.md (DOTPROD_I8MM): New.
>      (sup, opsuffix): Add .
>     * config/arm/neon.md (neon_usdot, dot_lane): New.
>  * config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 2019-12-12  Stam Markianos-Wright  
> 
>  * gcc.target/arm/simd/vdot-compile-2-1.c: New test.
>  * gcc.target/arm/simd/vdot-compile-2-2.c: New test.
>  * gcc.target/arm/simd/vdot-compile-2-3.c: New test.
>  * gcc.target/arm/simd/vdot-compile-2-4.c: New test.
> 
> 


Re: [GCC][testsuite][ARM][AArch64] Add ARM v8.6 effective target checks to target-supports.exp

2019-12-17 Thread Stam Markianos-Wright


On 12/13/19 11:15 AM, Richard Sandiford wrote:
> Stam Markianos-Wright  writes:
>> Hi all,
>>
>> This small patch adds support for the ARM v8.6 extensions +bf16 and
>> +i8mm to the testsuite. This will be tested through other upcoming
>> patches, which is why we are not providing any explicit tests here.
>>
>> Ok for trunk?
>>
>> Also I don't have commit rights, so if someone could commit on my
>> behalf, that would be great :)
>>
>> The functionality here depends on CLI patches:
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html
>> https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html
>>
>> but this patch applies cleanly without them, too.
>>
>> Cheers,
>> Stam
>>
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-12-11  Stam Markianos-Wright  
>>
>>  * lib/target-supports.exp
>>  (check_effective_target_arm_v8_2a_i8mm_ok_nocache): New.
>>  (check_effective_target_arm_v8_2a_i8mm_ok): New.
>>  (add_options_for_arm_v8_2a_i8mm): New.
>>  (check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): New.
>>  (check_effective_target_arm_v8_2a_bf16_neon_ok): New.
>>  (add_options_for_arm_v8_2a_bf16_neon): New.
> 
> The new effective-target keywords need to be documented in
> doc/sourcebuild.texi.

Added in new diff :)

> 
> LGTM otherwise.  For:
> 
>> diff --git a/gcc/testsuite/lib/target-supports.exp 
>> b/gcc/testsuite/lib/target-supports.exp
>> index 5b4cc02f921..36fb63e9929 100644
>> --- a/gcc/testsuite/lib/target-supports.exp
>> +++ b/gcc/testsuite/lib/target-supports.exp
>> @@ -4781,6 +4781,49 @@ proc add_options_for_arm_v8_2a_dotprod_neon { flags } 
>> {
>>   return "$flags $et_arm_v8_2a_dotprod_neon_flags"
>>   }
>>   
>> +# Return 1 if the target supports ARMv8.2+i8mm Adv.SIMD Dot Product
>> +# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
>> +# Record the command line options needed.
>> +
>> +proc check_effective_target_arm_v8_2a_i8mm_ok_nocache { } {
>> +global et_arm_v8_2a_i8mm_flags
>> +set et_arm_v8_2a_i8mm_flags ""
>> +
>> +if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
>> +return 0;
>> +}
>> +
>> +# Iterate through sets of options to find the compiler flags that
>> +# need to be added to the -march option.
>> +foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" 
>> "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } {
>> +if { [check_no_compiler_messages_nocache \
>> +  arm_v8_2a_i8mm_ok object {
>> +#include 
>> +#if !defined (__ARM_FEATURE_MATMUL_INT8)
>> +#error "__ARM_FEATURE_MATMUL_INT8 not defined"
>> +#endif
>> +} "$flags -march=armv8.2-a+i8mm"] } {
>> +set et_arm_v8_2a_i8mm_flags "$flags -march=armv8.2-a+i8mm"
>> +return 1
>> +}
>> +}
> 
> I wondered whether it would be better to add no options if testing
> with something that already supports i8mm (e.g. -march=armv8.6).
> That would mean trying:
> 
>"" "-march=armv8.2-a+i8mm" "-march=armv8.2-a+i8mm -mfloat-abi..." ...
> 
> instead.  But there are arguments both ways, and the above follows
> existing style, so OK.

Not quite sure if I'm understanding this right, but I think that's what 
the "" option in foreach flags{} is for?

i.e. currently what I'm seeing is:

+/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
+/* { dg-add-options arm_v8_2a_i8mm }  */

will pull through the first option that compiles to object file with no 
errors (check_no_compiler_messages_nocache arm_v8_2a_i8mm_ok object).

So in a lot of cases it should just be fine for "" and only pull in 
-march=armv8.2-a+i8mm.

I think that's right? Lmk if I'm not reading it properly!

Cheers,
Stam
> 
> Thanks,
> Richard
> 
diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 85573a49a2b..73408d12cbe 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1877,6 +1877,18 @@ ARM target supports extensions to generate the @code{VFMAL} and @code{VFMLS}
 half-precision floating-point instructions available from ARMv8.2-A and
 onwards.  Some multilibs may be incompatible with these options.
 
+@item arm_v8_2a_bf16_neon_ok
+@anchor{arm_v8_2a_bf16_neon_ok}
+ARM target supports options to generate instructions from ARMv8.2-A with
+the BFloat16 extension (bf16). Some multilibs may be incompatible with these
+options.
+
+@ite

[GCC][PATCH][ARM]Add ACLE intrinsics for dot product (vusdot - vector, vdot - by element) for AArch32 AdvSIMD ARMv8.6 Extension

2019-12-13 Thread Stam Markianos-Wright
Hi all,

This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
operations (vector/by element) to the ARM back-end.

These are:
usdot (vector), dot (by element).

The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm and
for ARM they remain optional as of ARMv8.6-a.

The functions are declared in arm_neon.h, RTL patterns are defined to
generate assembler and tests are added to verify them and perform adequate 
checks.

Regression testing on arm-none-eabi passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html

for ARM CLI updates, and on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for testsuite effective_target update.

Ok for trunk?

Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

PS. I don't have commit rights, so if someone could commit on my behalf,
that would be great :)


gcc/ChangeLog:

2019-11-28  Stam Markianos-Wright  

* config/arm/arm-builtins.c (enum arm_type_qualifiers):
(USTERNOP_QUALIFIERS): New define.
(USMAC_LANE_QUADTUP_QUALIFIERS): New define.
(SUMAC_LANE_QUADTUP_QUALIFIERS): New define.
(arm_expand_builtin_args):
 Add case ARG_BUILTIN_LANE_QUADTUP_INDEX.
(arm_expand_builtin_1): Add qualifier_lane_quadtup_index.
* config/arm/arm_neon.h (vusdot_s32): New.
(vusdot_lane_s32): New.
(vusdotq_lane_s32): New.
(vsudot_lane_s32): New.
(vsudotq_lane_s32): New.
* config/arm/arm_neon_builtins.def
 (usdot,usdot_lane,sudot_lane): New.
* config/arm/iterators.md (DOTPROD_I8MM): New.
 (sup, opsuffix): Add .
* config/arm/neon.md (neon_usdot, dot_lane): New.
* config/arm/unspecs.md (UNSPEC_DOT_US, UNSPEC_DOT_SU): New.


gcc/testsuite/ChangeLog:

2019-12-12  Stam Markianos-Wright  

* gcc.target/arm/simd/vdot-compile-2-1.c: New test.
* gcc.target/arm/simd/vdot-compile-2-2.c: New test.
* gcc.target/arm/simd/vdot-compile-2-3.c: New test.
* gcc.target/arm/simd/vdot-compile-2-4.c: New test.


diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 2d902d0b325bc1fe5e22831ef8a59a2bb37c1225..a63c1a978fb1d436065ce9f5f082249c4ebf5ade 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -86,7 +86,10 @@ enum arm_type_qualifiers
   qualifier_const_void_pointer = 0x802,
   /* Lane indices selected in pairs - must be within range of previous
  argument = a vector.  */
-  qualifier_lane_pair_index = 0x1000
+  qualifier_lane_pair_index = 0x1000,
+  /* Lane indices selected in quadtuplets - must be within range of previous
+ argument = a vector.  */
+  qualifier_lane_quadtup_index = 0x2000
 };
 
 /*  The qualifier_internal allows generation of a unary builtin from
@@ -122,6 +125,13 @@ arm_unsigned_uternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned };
 #define UTERNOP_QUALIFIERS (arm_unsigned_uternop_qualifiers)
 
+/* T (T, unsigned T, T).  */
+static enum arm_type_qualifiers
+arm_usternop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+  qualifier_none };
+#define USTERNOP_QUALIFIERS (arm_usternop_qualifiers)
+
 /* T (T, immediate).  */
 static enum arm_type_qualifiers
 arm_binop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -176,6 +186,20 @@ arm_umac_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned, qualifier_lane_index };
 #define UMAC_LANE_QUALIFIERS (arm_umac_lane_qualifiers)
 
+/* T (T, unsigned T, T, lane index).  */
+static enum arm_type_qualifiers
+arm_usmac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+  qualifier_none, qualifier_lane_quadtup_index };
+#define USMAC_LANE_QUADTUP_QUALIFIERS (arm_usmac_lane_quadtup_qualifiers)
+
+/* T (T, T, unsigned T, lane index).  */
+static enum arm_type_qualifiers
+arm_sumac_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+  qualifier_unsigned, qualifier_lane_quadtup_index };
+#define SUMAC_LANE_QUADTUP_QUALIFIERS (arm_sumac_lane_quadtup_qualifiers)
+
 /* T (T, T, immediate).  */
 static enum arm_type_qualifiers
 arm_ternop_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
@@ -2148,6 +2172,7 @@ typedef enum {
   ARG_BUILTIN_LANE_INDEX,
   ARG_BUILTIN_STRUCT_LOAD_STORE_LANE_INDEX,
   ARG_BUILTIN_LANE_PAIR_INDEX,
+  ARG_BUILTIN_LANE_QUADTUP_INDEX,
   ARG_BUILTIN_NEON_MEMORY,
   ARG_BUILTIN_MEMORY,
   ARG_BUILTIN_STOP
@@ -2296,11 +2321,24 @@ arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
 	  if (CONST_INT_P (op[argc]))
 		{
 		  machine_mode vmode = mode[argc - 1];
-		  neon_lane_bounds (op[argc], 0, GET_MODE_NUNITS (vmode) / 2, exp);
+		  neon_lane_bounds (op[argc], 0,
+				    GET_MODE_NUNITS (vmode) / 2, exp);
+		}
+	  /* If the lane index

[GCC][PATCH][AArch64]Add ACLE intrinsics for dot product (usdot - vector, dot - by element) for AArch64 AdvSIMD ARMv8.6 Extension

2019-12-13 Thread Stam Markianos-Wright
Hi all,

This patch adds the ARMv8.6 Extension ACLE intrinsics for dot product
operations (vector/by element) to AArch64.

These are:
usdot (vector), dot (by element).

The functions are optional from ARMv8.2-a as -march=armv8.2-a+i8mm
and are then enabled by default from ARMv8.6a.

The functions are declared in arm_neon.h, RTL patterns are defined to
generate assembler and tests are added to verify them and perform 
adequate checks.

Regression testing on aarch64-none-elf passed successfully.

This patch depends on:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html

for Aaarch64 CLI updates, and on:

https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00857.html

for the testsuite effective_target update.

Ok for trunk?

Cheers,
Stam


ACLE documents are at https://developer.arm.com/docs/101028/latest
ISA documents are at https://developer.arm.com/docs/ddi0596/latest

PS. I don't have commit rights, so if someone could commit on my behalf,
that would be great :)



gcc/ChangeLog:

2019-11-28  Stam Markianos-Wright  

* config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers):
  New qualifier_lane_quadtup_index, TYPES_TERNOP_SSUS,
  TYPES_QUADOPSSUS_LANE_QUADTUP, TYPES_QUADOPSSSU_LANE_QUADTUP.
(aarch64_simd_expand_args): Add case SIMD_ARG_LANE_QUADTUP_INDEX.
(aarch64_simd_expand_builtin): Add qualifier_lane_quadtup_index.
* config/aarch64/aarch64-simd-builtins.def (usdot, usdot_lane,
  usdot_laneq, sudot_lane,sudot_laneq): New.
* config/aarch64/aarch64-simd.md (aarch64_usdot): New .
  (aarch64_dot_lane): New.
  (aarch64_dot_laneq): New.
* config/aarch64/arm_neon.h (vusdot_s32): New.
(vusdotq_s32): New.
(vusdot_lane_s32): New.
(vsudot_lane_s32): New.
* config/aarch64/iterators.md (DOTPROD_I8MM): New iterator.
  (UNSPEC_USDOT, UNSPEC_SUDOT): New unspecs.

gcc/testsuite/ChangeLog:

2019-11-28  Stam Markianos-Wright  

* gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-1.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-2.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-3.c: New test.
* gcc.target/aarch64/advsimd-intrinsics/vdot-compile-3-4.c: New test.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index c35a1b1f0299ce5af8ca1a3df0209614f7bd0f25..6bd26889f2f26a9f82dd6d40f50125eaeee41740 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -107,6 +107,9 @@ enum aarch64_type_qualifiers
   /* Lane indices selected in pairs. - must be in range, and flipped for
  bigendian.  */
   qualifier_lane_pair_index = 0x800,
+  /* Lane indices selected in quadtuplets. - must be in range, and flipped for
+ bigendian.  */
+  qualifier_lane_quadtup_index = 0x1000,
 };
 
 typedef struct
@@ -173,6 +176,10 @@ aarch64_types_ternopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned,
   qualifier_unsigned, qualifier_immediate };
 #define TYPES_TERNOPUI (aarch64_types_ternopu_imm_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_ternop_ssus_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned, qualifier_none };
+#define TYPES_TERNOP_SSUS (aarch64_types_ternop_ssus_qualifiers)
 
 
 static enum aarch64_type_qualifiers
@@ -191,6 +198,19 @@ aarch64_types_quadopu_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   qualifier_unsigned, qualifier_lane_index };
 #define TYPES_QUADOPU_LANE (aarch64_types_quadopu_lane_qualifiers)
 
+static enum aarch64_type_qualifiers
+aarch64_types_quadopssus_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_unsigned,
+  qualifier_none, qualifier_lane_quadtup_index };
+#define TYPES_QUADOPSSUS_LANE_QUADTUP \
+	(aarch64_types_quadopssus_lane_quadtup_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_quadopsssu_lane_quadtup_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_none, qualifier_none,
+  qualifier_unsigned, qualifier_lane_quadtup_index };
+#define TYPES_QUADOPSSSU_LANE_QUADTUP \
+	(aarch64_types_quadopsssu_lane_quadtup_qualifiers)
+
 static enum aarch64_type_qualifiers
 aarch64_types_quadopu_imm_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned,
@@ -1260,6 +1280,7 @@ typedef enum
   SIMD_ARG_LANE_INDEX,
   SIMD_ARG_STRUCT_LOAD_STORE_LANE_INDEX,
   SIMD_ARG_LANE_PAIR_INDEX,
+  SIMD_ARG_LANE_QUADTUP_INDEX,
   SIMD_ARG_STOP
 } builtin_simd_arg;
 
@@ -1349,9 +1370,25 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval,
 		  op[opc] = gen_int_mode (ENDIAN_LANE_N (nunits / 2, lane),
 	  SImode);
 		}
-	  /* Fall through - if the lane index isn't a constant then
-		 the next case will error.  */
-	  /* FALLTHRU */
+	  /* If the lane index isn't a constant

[GCC][testsuite][ARM][AArch64] Add ARM v8.6 effective target checks to target-supports.exp

2019-12-12 Thread Stam Markianos-Wright
Hi all,

This small patch adds support for the ARM v8.6 extensions +bf16 and 
+i8mm to the testsuite. This will be tested through other upcoming 
patches, which is why we are not providing any explicit tests here.

Ok for trunk?

Also I don't have commit rights, so if someone could commit on my 
behalf, that would be great :)

The functionality here depends on CLI patches:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02415.html
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02195.html

but this patch applies cleanly without them, too.

Cheers,
Stam


gcc/testsuite/ChangeLog:

2019-12-11  Stam Markianos-Wright  

* lib/target-supports.exp
(check_effective_target_arm_v8_2a_i8mm_ok_nocache): New.
(check_effective_target_arm_v8_2a_i8mm_ok): New.
(add_options_for_arm_v8_2a_i8mm): New.
(check_effective_target_arm_v8_2a_bf16_neon_ok_nocache): New.
(check_effective_target_arm_v8_2a_bf16_neon_ok): New.
(add_options_for_arm_v8_2a_bf16_neon): New.
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 5b4cc02f921..36fb63e9929 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4781,6 +4781,49 @@ proc add_options_for_arm_v8_2a_dotprod_neon { flags } {
 return "$flags $et_arm_v8_2a_dotprod_neon_flags"
 }
 
+# Return 1 if the target supports ARMv8.2+i8mm Adv.SIMD Dot Product
+# instructions, 0 otherwise.  The test is valid for ARM and for AArch64.
+# Record the command line options needed.
+
+proc check_effective_target_arm_v8_2a_i8mm_ok_nocache { } {
+global et_arm_v8_2a_i8mm_flags
+set et_arm_v8_2a_i8mm_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+return 0;
+}
+
+# Iterate through sets of options to find the compiler flags that
+# need to be added to the -march option.
+foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } {
+if { [check_no_compiler_messages_nocache \
+  arm_v8_2a_i8mm_ok object {
+#include 
+#if !defined (__ARM_FEATURE_MATMUL_INT8)
+#error "__ARM_FEATURE_MATMUL_INT8 not defined"
+#endif
+} "$flags -march=armv8.2-a+i8mm"] } {
+set et_arm_v8_2a_i8mm_flags "$flags -march=armv8.2-a+i8mm"
+return 1
+}
+}
+
+return 0;
+}
+
+proc check_effective_target_arm_v8_2a_i8mm_ok { } {
+return [check_cached_effective_target arm_v8_2a_i8mm_ok \
+check_effective_target_arm_v8_2a_i8mm_ok_nocache]
+}
+
+proc add_options_for_arm_v8_2a_i8mm { flags } {
+if { ! [check_effective_target_arm_v8_2a_i8mm_ok] } {
+return "$flags"
+}
+global et_arm_v8_2a_i8mm_flags
+return "$flags $et_arm_v8_2a_i8mm_flags"
+}
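For illustration, a testcase consuming the new effective target might look like the sketch below. The intrinsic name and `dg-add-options` directive are assumptions based on the companion i8mm intrinsics patch, not something this testsuite patch itself adds:

```c
/* Hypothetical dg test sketch using the new effective-target check.  */
/* { dg-do compile } */
/* { dg-require-effective-target arm_v8_2a_i8mm_ok } */
/* { dg-add-options arm_v8_2a_i8mm } */

#include <arm_neon.h>

int32x2_t
foo (int32x2_t r, uint8x8_t a, int8x8_t b)
{
  return vusdot_s32 (r, a, b);
}
```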
+
 # Return 1 if the target supports FP16 VFMAL and VFMSL
 # instructions, 0 otherwise.
 # Record the command line options needed.
@@ -4826,6 +4869,45 @@ proc add_options_for_arm_fp16fml_neon { flags } {
 return "$flags $et_arm_fp16fml_neon_flags"
 }
 
+# Return 1 if the target supports BFloat16 SIMD instructions, 0 otherwise.
+# The test is valid for ARM and for AArch64.
+
+proc check_effective_target_arm_v8_2a_bf16_neon_ok_nocache { } {
+global et_arm_v8_2a_bf16_neon_flags
+set et_arm_v8_2a_bf16_neon_flags ""
+
+if { ![istarget arm*-*-*] && ![istarget aarch64*-*-*] } {
+return 0;
+}
+
+foreach flags {"" "-mfloat-abi=hard -mfpu=neon-fp-armv8" "-mfloat-abi=softfp -mfpu=neon-fp-armv8" } {
+if { [check_no_compiler_messages_nocache arm_v8_2a_bf16_neon_ok object {
+#include 
+#if !defined (__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
+#error "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC not defined"
+#endif
+} "$flags -march=armv8.2-a+bf16"] } {
+set et_arm_v8_2a_bf16_neon_flags "$flags -march=armv8.2-a+bf16"
+return 1
+}
+}
+
+return 0;
+}
+
+proc check_effective_target_arm_v8_2a_bf16_neon_ok { } {
+return [check_cached_effective_target arm_v8_2a_bf16_neon_ok \
+check_effective_target_arm_v8_2a_bf16_neon_ok_nocache]
+}
+
+proc add_options_for_arm_v8_2a_bf16_neon { flags } {
+if { ! [check_effective_target_arm_v8_2a_bf16_neon_ok] } {
+return "$flags"
+}
+global et_arm_v8_2a_bf16_neon_flags
+return "$flags $et_arm_v8_2a_bf16_neon_flags"
+}
+
 # Return 1 if the target supports executing ARMv8 NEON instructions, 0
 # otherwise.
 


Re: Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-11 Thread Stam Markianos-Wright


On 12/11/19 3:48 AM, Jeff Law wrote:
> On Mon, 2019-12-09 at 13:40 +0000, Stam Markianos-Wright wrote:
>>
>> On 12/3/19 10:31 AM, Stam Markianos-Wright wrote:
>>>
>>> On 12/2/19 9:27 PM, Joseph Myers wrote:
>>>> On Mon, 2 Dec 2019, Jeff Law wrote:
>>>>
>>>>>> 2019-11-13  Stam Markianos-Wright  <
>>>>>> stam.markianos-wri...@arm.com>
>>>>>>
>>>>>>  * real.c (struct arm_bfloat_half_format,
>>>>>>  encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>>>>>>  * real.h (arm_bfloat_half_format): New.
>>>>>>
>>>>>>
>>>>> Generally OK.  Please consider using "arm_bfloat_half" instead
>>>>> of
>>>>> "bfloat_half" for the name field in the arm_bfloat_half_format
>>>>> structure.  I'm not sure if that's really visible externally,
>>>>> but it
>>> Hi both! Agreed that we want to be conservative. See latest diff
>>> attached with the name field change (also pasted below).
>>
>> .Ping :)
> Sorry if I wasn't clear.  WIth the name change I considered this OK for
> the trunk.  Please install on the trunk.
> 
> If you don't have commit privs let me know.

Ahh ok gotcha! Sorry I'm new here, and yes, I don't have commit 
privileges, yet!

Cheers,
Stam
> 
> 
> Jeff
> 


[PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-12-09 Thread Stam Markianos-Wright


On 12/2/19 4:43 PM, Stam Markianos-Wright wrote:
> 
> 
> On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:
>> Pinging with more correct maintainers this time :)
>>
>> Also would need to backport to gcc7,8,9, but need to get this approved 
>> first!
>>
>> Thank you,
>> Stam
>>
>>
>>  Forwarded Message 
>> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional 
>> branches in Thumb2 (PR91816)
>> Date: Mon, 21 Oct 2019 10:37:09 +0100
>> From: Stam Markianos-Wright 
>> To: Ramana Radhakrishnan 
>> CC: gcc-patches@gcc.gnu.org , nd 
>> , James Greenhalgh , Richard 
>> Earnshaw 
>>
>>
>>
>> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>>>
>>>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>>>> however, on my native Aarch32 setup the test times out when run as part
>>>> of a big "make check-gcc" regression, but not when run individually.
>>>>
>>>> 2019-10-11  Stamatis Markianos-Wright 
>>>>
>>>> * config/arm/arm.md: Update b for Thumb2 range checks.
>>>> * config/arm/arm.c: New function arm_gen_far_branch.
>>>>    * config/arm/arm-protos.h: New function arm_gen_far_branch
>>>> prototype.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2019-10-11  Stamatis Markianos-Wright 
>>>>
>>>>    * testsuite/gcc.target/arm/pr91816.c: New test.
>>>
>>>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>>>> index f995974f9bb..1dce333d1c3 100644
>>>> --- a/gcc/config/arm/arm-protos.h
>>>> +++ b/gcc/config/arm/arm-protos.h
>>>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>>>> cpu_arch_option *,
>>>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>>> +const char * arm_gen_far_branch (rtx *, int,const char * , const 
>>>> char *);
>>>> +
>>>> +
>>>
>>> Lets get the nits out of the way.
>>>
>>> Unnecessary extra new line, need a space between int and const above.
>>>
>>>
>>
>> .Fixed!
>>
>>>>   #endif /* ! GCC_ARM_PROTOS_H */
>>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>>> index 39e1a1ef9a2..1a693d2ddca 100644
>>>> --- a/gcc/config/arm/arm.c
>>>> +++ b/gcc/config/arm/arm.c
>>>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>>>   }
>>>>   } /* Namespace selftest.  */
>>>> +
>>>> +/* Generate code to enable conditional branches in functions over 1 
>>>> MiB.  */
>>>> +const char *
>>>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>>>> +    const char * branch_format)
>>>
>>> Not sure if this is some munging from the attachment but check
>>> vertical alignment of parameters.
>>>
>>
>> .Fixed!
>>
>>>> +{
>>>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>>>> +  char label_buf[256];
>>>> +  char buffer[128];
>>>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>>>> +    CODE_LABEL_NUMBER (tmp_label));
>>>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>>>> +  rtx dest_label = operands[pos_label];
>>>> +  operands[pos_label] = tmp_label;
>>>> +
>>>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , 
>>>> label_ptr);
>>>> +  output_asm_insn (buffer, operands);
>>>> +
>>>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
>>>> label_ptr);
>>>> +  operands[pos_label] = dest_label;
>>>> +  output_asm_insn (buffer, operands);
>>>> +  return "";
>>>> +}
>>>> +
>>>> +
>>>
>>> Unnecessary extra newline.
>>>
>>
>> .Fixed!
>>
>>>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>>>   #endif /* CHECKING_P */
>>>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>>>> index f861c72ccfc..634fd0a59da 100644
>>>> --- a/gcc/config/arm/arm.md
>>>> +++ b/gcc/config/arm/arm.md
>>>> @@ -6686,9 +

Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-09 Thread Stam Markianos-Wright


On 12/3/19 10:31 AM, Stam Markianos-Wright wrote:
> 
> 
> On 12/2/19 9:27 PM, Joseph Myers wrote:
>> On Mon, 2 Dec 2019, Jeff Law wrote:
>>
>>>> 2019-11-13  Stam Markianos-Wright  
>>>>
>>>>     * real.c (struct arm_bfloat_half_format,
>>>>     encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>>>>     * real.h (arm_bfloat_half_format): New.
>>>>
>>>>
>>> Generally OK.  Please consider using "arm_bfloat_half" instead of
>>> "bfloat_half" for the name field in the arm_bfloat_half_format
>>> structure.  I'm not sure if that's really visible externally, but it
>>
> Hi both! Agreed that we want to be conservative. See latest diff 
> attached with the name field change (also pasted below).

.Ping :)
> 

>> Isn't this the same format used by AVX512_BF16 / Intel DL Boost (albeit
>> with Arm and Intel using different rounding modes)?
> 
> Yes it is remarkably similar, but there's really only so much variation 
> you can have with what is half an f32!
> 
> Cheers,
> Stam
> 
> 
>>
> 
> 
> diff --git a/gcc/real.h b/gcc/real.h
> index 0f660c9c671..2b337bb7f7d 100644
> --- a/gcc/real.h
> +++ b/gcc/real.h
> @@ -368,6 +368,7 @@ extern const struct real_format decimal_double_format;
>   extern const struct real_format decimal_quad_format;
>   extern const struct real_format ieee_half_format;
>   extern const struct real_format arm_half_format;
> +extern const struct real_format arm_bfloat_half_format;
> 
> 
>   /* 
> == */
> diff --git a/gcc/real.c b/gcc/real.c
> index 134240a6be9..07b63b6f27e 100644
> --- a/gcc/real.c
> +++ b/gcc/real.c
> @@ -4799,6 +4799,116 @@ decode_ieee_half (const struct real_format *fmt, 
> REAL_VALUE_TYPE *r,
>   }
>   }
> 
> +/* Encode arm_bfloat types.  */
> +static void
> +encode_arm_bfloat_half (const struct real_format *fmt, long *buf,
> +    const REAL_VALUE_TYPE *r)
> +{
> +  unsigned long image, sig, exp;
> +  unsigned long sign = r->sign;
> +  bool denormal = (r->sig[SIGSZ-1] & SIG_MSB) == 0;
> +
> +  image = sign << 15;
> +  sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 8)) & 0x7f;
> +
> +  switch (r->cl)
> +    {
> +    case rvc_zero:
> +  break;
> +
> +    case rvc_inf:
> +  if (fmt->has_inf)
> +    image |= 255 << 7;
> +  else
> +    image |= 0x7fff;
> +  break;
> +
> +    case rvc_nan:
> +  if (fmt->has_nans)
> +    {
> +  if (r->canonical)
> +    sig = (fmt->canonical_nan_lsbs_set ? (1 << 6) - 1 : 0);
> +  if (r->signalling == fmt->qnan_msb_set)
> +    sig &= ~(1 << 6);
> +  else
> +    sig |= 1 << 6;
> +  if (sig == 0)
> +    sig = 1 << 5;
> +
> +  image |= 255 << 7;
> +  image |= sig;
> +    }
> +  else
> +    image |= 0x7fff;
> +  break;
> +
> +    case rvc_normal:
> +  if (denormal)
> +    exp = 0;
> +  else
> +  exp = REAL_EXP (r) + 127 - 1;
> +  image |= exp << 7;
> +  image |= sig;
> +  break;
> +
> +    default:
> +  gcc_unreachable ();
> +    }
> +
> +  buf[0] = image;
> +}
> +
> +/* Decode arm_bfloat types.  */
> +static void
> +decode_arm_bfloat_half (const struct real_format *fmt, REAL_VALUE_TYPE *r,
> +    const long *buf)
> +{
> +  unsigned long image = buf[0] & 0xffff;
> +  bool sign = (image >> 15) & 1;
> +  int exp = (image >> 7) & 0xff;
> +
> +  memset (r, 0, sizeof (*r));
> +  image <<= HOST_BITS_PER_LONG - 8;
> +  image &= ~SIG_MSB;
> +
> +  if (exp == 0)
> +    {
> +  if (image && fmt->has_denorm)
> +    {
> +  r->cl = rvc_normal;
> +  r->sign = sign;
> +  SET_REAL_EXP (r, -126);
> +  r->sig[SIGSZ-1] = image << 1;
> +  normalize (r);
> +    }
> +  else if (fmt->has_signed_zero)
> +    r->sign = sign;
> +    }
> +  else if (exp == 255 && (fmt->has_nans || fmt->has_inf))
> +    {
> +  if (image)
> +    {
> +  r->cl = rvc_nan;
> +  r->sign = sign;
> +  r->signalling = (((image >> (HOST_BITS_PER_LONG - 2)) & 1)
> +   ^ fmt->qnan_msb_set);
> +  r->sig[SIGSZ-1] = image;
> +    }
> +  else
> +    {
> +  r->cl = rvc_inf;
> +  r->sign = sign;
> +    }
> +    }
> +  else
> +    {
> +  r->cl = rvc_normal;

Re: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-03 Thread Stam Markianos-Wright


On 12/2/19 9:27 PM, Joseph Myers wrote:
> On Mon, 2 Dec 2019, Jeff Law wrote:
> 
>>> 2019-11-13  Stam Markianos-Wright  
>>>
>>> * real.c (struct arm_bfloat_half_format,
>>> encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>>> * real.h (arm_bfloat_half_format): New.
>>>
>>>
>> Generally OK.  Please consider using "arm_bfloat_half" instead of
>> "bfloat_half" for the name field in the arm_bfloat_half_format
>> structure.  I'm not sure if that's really visible externally, but it
> 
Hi both! Agreed that we want to be conservative. See latest diff 
attached with the name field change (also pasted below).

> Isn't this the same format used by AVX512_BF16 / Intel DL Boost (albeit
> with Arm and Intel using different rounding modes)?

Yes it is remarkably similar, but there's really only so much variation 
you can have with what is half an f32!

Cheers,
Stam


> 


diff --git a/gcc/real.h b/gcc/real.h
index 0f660c9c671..2b337bb7f7d 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -368,6 +368,7 @@ extern const struct real_format decimal_double_format;
  extern const struct real_format decimal_quad_format;
  extern const struct real_format ieee_half_format;
  extern const struct real_format arm_half_format;
+extern const struct real_format arm_bfloat_half_format;


  /* 
== */
diff --git a/gcc/real.c b/gcc/real.c
index 134240a6be9..07b63b6f27e 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -4799,6 +4799,116 @@ decode_ieee_half (const struct real_format *fmt, 
REAL_VALUE_TYPE *r,
  }
  }

+/* Encode arm_bfloat types.  */
+static void
+encode_arm_bfloat_half (const struct real_format *fmt, long *buf,
+   const REAL_VALUE_TYPE *r)
+{
+  unsigned long image, sig, exp;
+  unsigned long sign = r->sign;
+  bool denormal = (r->sig[SIGSZ-1] & SIG_MSB) == 0;
+
+  image = sign << 15;
+  sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 8)) & 0x7f;
+
+  switch (r->cl)
+{
+case rvc_zero:
+  break;
+
+case rvc_inf:
+  if (fmt->has_inf)
+   image |= 255 << 7;
+  else
+   image |= 0x7fff;
+  break;
+
+case rvc_nan:
+  if (fmt->has_nans)
+   {
+ if (r->canonical)
+   sig = (fmt->canonical_nan_lsbs_set ? (1 << 6) - 1 : 0);
+ if (r->signalling == fmt->qnan_msb_set)
+   sig &= ~(1 << 6);
+ else
+   sig |= 1 << 6;
+ if (sig == 0)
+   sig = 1 << 5;
+
+ image |= 255 << 7;
+ image |= sig;
+   }
+  else
+   image |= 0x7fff;
+  break;
+
+case rvc_normal:
+  if (denormal)
+   exp = 0;
+  else
+  exp = REAL_EXP (r) + 127 - 1;
+  image |= exp << 7;
+  image |= sig;
+  break;
+
+default:
+  gcc_unreachable ();
+}
+
+  buf[0] = image;
+}
+
+/* Decode arm_bfloat types.  */
+static void
+decode_arm_bfloat_half (const struct real_format *fmt, REAL_VALUE_TYPE *r,
+   const long *buf)
+{
+  unsigned long image = buf[0] & 0xffff;
+  bool sign = (image >> 15) & 1;
+  int exp = (image >> 7) & 0xff;
+
+  memset (r, 0, sizeof (*r));
+  image <<= HOST_BITS_PER_LONG - 8;
+  image &= ~SIG_MSB;
+
+  if (exp == 0)
+{
+  if (image && fmt->has_denorm)
+   {
+ r->cl = rvc_normal;
+ r->sign = sign;
+ SET_REAL_EXP (r, -126);
+ r->sig[SIGSZ-1] = image << 1;
+ normalize (r);
+   }
+  else if (fmt->has_signed_zero)
+   r->sign = sign;
+}
+  else if (exp == 255 && (fmt->has_nans || fmt->has_inf))
+{
+  if (image)
+   {
+ r->cl = rvc_nan;
+ r->sign = sign;
+ r->signalling = (((image >> (HOST_BITS_PER_LONG - 2)) & 1)
+  ^ fmt->qnan_msb_set);
+ r->sig[SIGSZ-1] = image;
+   }
+  else
+   {
+ r->cl = rvc_inf;
+ r->sign = sign;
+   }
+}
+  else
+{
+  r->cl = rvc_normal;
+  r->sign = sign;
+  SET_REAL_EXP (r, exp - 127 + 1);
+  r->sig[SIGSZ-1] = image | SIG_MSB;
+}
+}
+
  /* Half-precision format, as specified in IEEE 754R.  */
  const struct real_format ieee_half_format =
{
@@ -4848,6 +4958,33 @@ const struct real_format arm_half_format =
  false,
  "arm_half"
};
+
+/* ARM Bfloat half-precision format.  This format resembles a truncated
+   (16-bit) version of the 32-bit IEEE 754 single-precision floating-point
+   format.  */
+const struct real_format arm_bfloat_half_format =
+  {
+encode_arm_bfloat_half,
+decode_arm_bfloat_half,
+2,
+8,
+8,
+-125,
+128,
+15,
+15,
+0,
+false

[PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-12-02 Thread Stam Markianos-Wright


On 11/15/19 5:26 PM, Stam Markianos-Wright wrote:
> Pinging with more correct maintainers this time :)
> 
> Also would need to backport to gcc7,8,9, but need to get this approved 
> first!
> 
> Thank you,
> Stam
> 
> 
>  Forwarded Message 
> Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional 
> branches in Thumb2 (PR91816)
> Date: Mon, 21 Oct 2019 10:37:09 +0100
> From: Stam Markianos-Wright 
> To: Ramana Radhakrishnan 
> CC: gcc-patches@gcc.gnu.org , nd , 
> James Greenhalgh , Richard Earnshaw 
> 
> 
> 
> 
> On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>>
>>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>>> however, on my native Aarch32 setup the test times out when run as part
>>> of a big "make check-gcc" regression, but not when run individually.
>>>
>>> 2019-10-11  Stamatis Markianos-Wright 
>>>
>>> * config/arm/arm.md: Update b for Thumb2 range checks.
>>> * config/arm/arm.c: New function arm_gen_far_branch.
>>>    * config/arm/arm-protos.h: New function arm_gen_far_branch
>>> prototype.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2019-10-11  Stamatis Markianos-Wright 
>>>
>>>    * testsuite/gcc.target/arm/pr91816.c: New test.
>>
>>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>>> index f995974f9bb..1dce333d1c3 100644
>>> --- a/gcc/config/arm/arm-protos.h
>>> +++ b/gcc/config/arm/arm-protos.h
>>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>>> cpu_arch_option *,
>>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>> +const char * arm_gen_far_branch (rtx *, int,const char * , const 
>>> char *);
>>> +
>>> +
>>
>> Lets get the nits out of the way.
>>
>> Unnecessary extra new line, need a space between int and const above.
>>
>>
> 
> .Fixed!
> 
>>>   #endif /* ! GCC_ARM_PROTOS_H */
>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>>> index 39e1a1ef9a2..1a693d2ddca 100644
>>> --- a/gcc/config/arm/arm.c
>>> +++ b/gcc/config/arm/arm.c
>>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>>   }
>>>   } /* Namespace selftest.  */
>>> +
>>> +/* Generate code to enable conditional branches in functions over 1 
>>> MiB.  */
>>> +const char *
>>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>>> +    const char * branch_format)
>>
>> Not sure if this is some munging from the attachment but check
>> vertical alignment of parameters.
>>
> 
> .Fixed!
> 
>>> +{
>>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>>> +  char label_buf[256];
>>> +  char buffer[128];
>>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>>> +    CODE_LABEL_NUMBER (tmp_label));
>>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>>> +  rtx dest_label = operands[pos_label];
>>> +  operands[pos_label] = tmp_label;
>>> +
>>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , 
>>> label_ptr);
>>> +  output_asm_insn (buffer, operands);
>>> +
>>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
>>> label_ptr);
>>> +  operands[pos_label] = dest_label;
>>> +  output_asm_insn (buffer, operands);
>>> +  return "";
>>> +}
>>> +
>>> +
>>
>> Unnecessary extra newline.
>>
> 
> .Fixed!
> 
>>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>>   #endif /* CHECKING_P */
>>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>>> index f861c72ccfc..634fd0a59da 100644
>>> --- a/gcc/config/arm/arm.md
>>> +++ b/gcc/config/arm/arm.md
>>> @@ -6686,9 +6686,16 @@
>>>   ;; And for backward branches we have
>>>   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or 
>>> -4) + 4).
>>>   ;;
>>> +;; In 16-bit Thumb these ranges are:
>>>   ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving 
>>> (-2040->2048).
>>>   ;; For a 'b' pos_range = 254,  neg_range = -256  giving (-250 
>>> ->256).
>>> +;; In 32-bit Thumb these ranges are:
>>> +;; For a 'b'   +/- 16MB is not checked for.

Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-12-02 Thread Stam Markianos-Wright


On 11/25/19 2:54 PM, Stam Markianos-Wright wrote:
> 
> On 11/15/19 12:02 PM, Stam Markianos-Wright wrote:
>> Hi all,
>>
>> This patch adds support for a new real_format for ARM Brain Floating 
>> Point numbers to the middle end. This is to be used exclusively in the 
>> ARM back-end.
>>
>> The encode_arm_bfloat_half and decode_arm_bfloat_half functions are 
>> provided to satisfy real_format struct requirements, but are never 
>> intended to be called, which is why they are provided without an 
>> explicit test.
>>
>> Details on ARM Bfloat can be found here: 
>> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
>>  
>>
>>
>> Regtested on aarch64-none-elf for sanity.
>>
>> Is this ok for trunk?
> 
> Ping.

>>
>> Also, I do not have commit rights, so could someone commit this on my 
>> behalf?
> 
> Ping.

> 
> Thank you :)
> 
>>
>> Thank you!
>> Stam Markianos-Wright
>>
>>
>> 2019-11-13  Stam Markianos-Wright  
>>
>>    * real.c (struct arm_bfloat_half_format,
>>    encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>>    * real.h (arm_bfloat_half_format): New.
>>
>>


Ping: [GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-11-25 Thread Stam Markianos-Wright

On 11/15/19 12:02 PM, Stam Markianos-Wright wrote:
> Hi all,
> 
> This patch adds support for a new real_format for ARM Brain Floating 
> Point numbers to the middle end. This is to be used exclusively in the 
> ARM back-end.
> 
> The encode_arm_bfloat_half and decode_arm_bfloat_half functions are 
> provided to satisfy real_format struct requirements, but are never 
> intended to be called, which is why they are provided without an 
> explicit test.
> 
> Details on ARM Bfloat can be found here: 
> https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a
>  
> 
> 
> Regtested on aarch64-none-elf for sanity.
> 
> Is this ok for trunk?

Ping.
> 
> Also, I do not have commit rights, so could someone commit this on my 
> behalf?

Ping.

Thank you :)

> 
> Thank you!
> Stam Markianos-Wright
> 
> 
> 2019-11-13  Stam Markianos-Wright  
> 
>    * real.c (struct arm_bfloat_half_format,
>    encode_arm_bfloat_half, decode_arm_bfloat_half): New.
>    * real.h (arm_bfloat_half_format): New.
> 
> 


[PING][PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-11-15 Thread Stam Markianos-Wright
Pinging with more correct maintainers this time :)

Also would need to backport to gcc7,8,9, but need to get this approved 
first!

Thank you,
Stam


 Forwarded Message 
Subject: Re: [PATCH][GCC][ARM] Arm generates out of range conditional 
branches in Thumb2 (PR91816)
Date: Mon, 21 Oct 2019 10:37:09 +0100
From: Stam Markianos-Wright 
To: Ramana Radhakrishnan 
CC: gcc-patches@gcc.gnu.org , nd , 
James Greenhalgh , Richard Earnshaw 




On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>
>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf,
>> however, on my native Aarch32 setup the test times out when run as part
>> of a big "make check-gcc" regression, but not when run individually.
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>  * config/arm/arm.md: Update b for Thumb2 range checks.
>>  * config/arm/arm.c: New function arm_gen_far_branch.
>>  * config/arm/arm-protos.h: New function arm_gen_far_branch
>>  prototype.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>  * testsuite/gcc.target/arm/pr91816.c: New test.
> 
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index f995974f9bb..1dce333d1c3 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>> cpu_arch_option *,
>>   
>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>   
>> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *);
>> +
>> +
> 
> Lets get the nits out of the way.
> 
> Unnecessary extra new line, need a space between int and const above.
> 
> 

.Fixed!

>>   #endif /* ! GCC_ARM_PROTOS_H */
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 39e1a1ef9a2..1a693d2ddca 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>   }
>>   } /* Namespace selftest.  */
>>   
>> +
>> +/* Generate code to enable conditional branches in functions over 1 MiB.  */
>> +const char *
>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>> +const char * branch_format)
> 
> Not sure if this is some munging from the attachment but check
> vertical alignment of parameters.
> 

.Fixed!

>> +{
>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>> +  char label_buf[256];
>> +  char buffer[128];
>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>> +CODE_LABEL_NUMBER (tmp_label));
>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>> +  rtx dest_label = operands[pos_label];
>> +  operands[pos_label] = tmp_label;
>> +
>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
>> +  output_asm_insn (buffer, operands);
>> +
>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
>> label_ptr);
>> +  operands[pos_label] = dest_label;
>> +  output_asm_insn (buffer, operands);
>> +  return "";
>> +}
>> +
>> +
> 
> Unnecessary extra newline.
> 

.Fixed!

>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>   #endif /* CHECKING_P */
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index f861c72ccfc..634fd0a59da 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -6686,9 +6686,16 @@
>>   ;; And for backward branches we have
>>   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4).
>>   ;;
>> +;; In 16-bit Thumb these ranges are:
>>   ;; For a 'b'   pos_range = 2046, neg_range = -2048 giving 
>> (-2040->2048).
>>   ;; For a 'b' pos_range = 254,  neg_range = -256  giving (-250 ->256).
>>   
>> +;; In 32-bit Thumb these ranges are:
>> +;; For a 'b'   +/- 16MB is not checked for.
>> +;; For a 'b' pos_range = 1048574,  neg_range = -1048576  giving
>> +;; (-1048568 -> 1048576).
>> +
>> +
> 
> Unnecessary extra newline.
> 

.Fixed!

>>   (define_expand "cbranchsi4"
>> [(set (pc) (if_then_else
>>(match_operator 0 "expandable_comparison_operator"
>> @@ -6947,22 +6954,42 @@
>>(pc)))]
>> "TARGET_32BIT"
>> "*
>> -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> -{

[GCC][PATCH] Add ARM-specific Bfloat format support to middle-end

2019-11-15 Thread Stam Markianos-Wright
Hi all,

This patch adds support for a new real_format for ARM Brain Floating 
Point numbers to the middle end. This is to be used exclusively in the 
ARM back-end.

The encode_arm_bfloat_half and decode_arm_bfloat_half functions are 
provided to satisfy real_format struct requirements, but are never 
intended to be called, which is why they are provided without an 
explicit test.

Details on ARM Bfloat can be found here: 
https://community.arm.com/developer/ip-products/processors/b/ml-ip-blog/posts/bfloat16-processing-for-neural-networks-on-armv8_2d00_a

Regtested on aarch64-none-elf for sanity.

Is this ok for trunk?

Also, I do not have commit rights, so could someone commit this on my 
behalf?

Thank you!
Stam Markianos-Wright


2019-11-13  Stam Markianos-Wright  

   * real.c (struct arm_bfloat_half_format,
   encode_arm_bfloat_half, decode_arm_bfloat_half): New.
   * real.h (arm_bfloat_half_format): New.


diff --git a/gcc/real.h b/gcc/real.h
index 0f660c9c671..2b337bb7f7d 100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -368,6 +368,7 @@ extern const struct real_format decimal_double_format;
 extern const struct real_format decimal_quad_format;
 extern const struct real_format ieee_half_format;
 extern const struct real_format arm_half_format;
+extern const struct real_format arm_bfloat_half_format;
 
 
 /* == */
diff --git a/gcc/real.c b/gcc/real.c
index 90067f0087b..671a21241d8 100644
--- a/gcc/real.c
+++ b/gcc/real.c
@@ -4799,6 +4799,116 @@ decode_ieee_half (const struct real_format *fmt, REAL_VALUE_TYPE *r,
 }
 }
 
+/* Encode arm_bfloat types.  */
+static void
+encode_arm_bfloat_half (const struct real_format *fmt, long *buf,
+			const REAL_VALUE_TYPE *r)
+{
+  unsigned long image, sig, exp;
+  unsigned long sign = r->sign;
+  bool denormal = (r->sig[SIGSZ-1] & SIG_MSB) == 0;
+
+  image = sign << 15;
+  sig = (r->sig[SIGSZ-1] >> (HOST_BITS_PER_LONG - 8)) & 0x7f;
+
+  switch (r->cl)
+    {
+    case rvc_zero:
+      break;
+
+    case rvc_inf:
+      if (fmt->has_inf)
+	image |= 255 << 7;
+      else
+	image |= 0x7fff;
+      break;
+
+    case rvc_nan:
+      if (fmt->has_nans)
+	{
+	  if (r->canonical)
+	    sig = (fmt->canonical_nan_lsbs_set ? (1 << 6) - 1 : 0);
+	  if (r->signalling == fmt->qnan_msb_set)
+	    sig &= ~(1 << 6);
+	  else
+	    sig |= 1 << 6;
+	  if (sig == 0)
+	    sig = 1 << 5;
+
+	  image |= 255 << 7;
+	  image |= sig;
+	}
+      else
+	image |= 0x7fff;
+      break;
+
+    case rvc_normal:
+      if (denormal)
+	exp = 0;
+      else
+	exp = REAL_EXP (r) + 127 - 1;
+      image |= exp << 7;
+      image |= sig;
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  buf[0] = image;
+}
+
+/* Decode arm_bfloat types.  */
+static void
+decode_arm_bfloat_half (const struct real_format *fmt, REAL_VALUE_TYPE *r,
+			const long *buf)
+{
+  unsigned long image = buf[0] & 0xffff;
+  bool sign = (image >> 15) & 1;
+  int exp = (image >> 7) & 0xff;
+
+  memset (r, 0, sizeof (*r));
+  image <<= HOST_BITS_PER_LONG - 8;
+  image &= ~SIG_MSB;
+
+  if (exp == 0)
+    {
+      if (image && fmt->has_denorm)
+	{
+	  r->cl = rvc_normal;
+	  r->sign = sign;
+	  SET_REAL_EXP (r, -126);
+	  r->sig[SIGSZ-1] = image << 1;
+	  normalize (r);
+	}
+      else if (fmt->has_signed_zero)
+	r->sign = sign;
+    }
+  else if (exp == 255 && (fmt->has_nans || fmt->has_inf))
+    {
+      if (image)
+	{
+	  r->cl = rvc_nan;
+	  r->sign = sign;
+	  r->signalling = (((image >> (HOST_BITS_PER_LONG - 2)) & 1)
+			   ^ fmt->qnan_msb_set);
+	  r->sig[SIGSZ-1] = image;
+	}
+      else
+	{
+	  r->cl = rvc_inf;
+	  r->sign = sign;
+	}
+    }
+  else
+    {
+      r->cl = rvc_normal;
+      r->sign = sign;
+      SET_REAL_EXP (r, exp - 127 + 1);
+      r->sig[SIGSZ-1] = image | SIG_MSB;
+    }
+}
+
 /* Half-precision format, as specified in IEEE 754R.  */
 const struct real_format ieee_half_format =
   {
@@ -4848,6 +4958,33 @@ const struct real_format arm_half_format =
 false,
 "arm_half"
   };
+
+/* ARM Bfloat half-precision format.  This format resembles a truncated
+   (16-bit) version of the 32-bit IEEE 754 single-precision floating-point
+   format.  */
+const struct real_format arm_bfloat_half_format =
+  {
+    encode_arm_bfloat_half,
+    decode_arm_bfloat_half,
+    2,
+    8,
+    8,
+    -125,
+    128,
+    15,
+    15,
+    0,
+    false,
+    true,
+    true,
+    true,
+    true,
+    true,
+    true,
+    false,
+    "bfloat_half"
+  };
+
 
 /* A synthetic "format" for internal arithmetic.  It's the size of the
internal significand minus the two bits needed for proper rounding.




Re: [PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-10-21 Thread Stam Markianos-Wright


On 10/13/19 4:23 PM, Ramana Radhakrishnan wrote:
>>
>> Patch bootstrapped and regression tested on arm-none-linux-gnueabihf;
>> however, on my native AArch32 setup the test times out when run as part
>> of a big "make check-gcc" regression, but not when run individually.
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>  * config/arm/arm.md: Update b for Thumb2 range checks.
>>  * config/arm/arm.c: New function arm_gen_far_branch.
>>  * config/arm/arm-protos.h: New function arm_gen_far_branch
>>  prototype.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2019-10-11  Stamatis Markianos-Wright 
>>
>>  * testsuite/gcc.target/arm/pr91816.c: New test.
> 
>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
>> index f995974f9bb..1dce333d1c3 100644
>> --- a/gcc/config/arm/arm-protos.h
>> +++ b/gcc/config/arm/arm-protos.h
>> @@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const 
>> cpu_arch_option *,
>>   
>>   void arm_initialize_isa (sbitmap, const enum isa_feature *);
>>   
>> +const char * arm_gen_far_branch (rtx *, int,const char * , const char *);
>> +
>> +
> 
> Lets get the nits out of the way.
> 
> Unnecessary extra new line, need a space between int and const above.
> 
> 

Fixed!

>>   #endif /* ! GCC_ARM_PROTOS_H */
>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
>> index 39e1a1ef9a2..1a693d2ddca 100644
>> --- a/gcc/config/arm/arm.c
>> +++ b/gcc/config/arm/arm.c
>> @@ -32139,6 +32139,31 @@ arm_run_selftests (void)
>>   }
>>   } /* Namespace selftest.  */
>>   
>> +
>> +/* Generate code to enable conditional branches in functions over 1 MiB.  */
>> +const char *
>> +arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
>> +const char * branch_format)
> 
> Not sure if this is some munging from the attachment but check
> vertical alignment of parameters.
> 

Fixed!

>> +{
>> +  rtx_code_label * tmp_label = gen_label_rtx ();
>> +  char label_buf[256];
>> +  char buffer[128];
>> +  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
>> +CODE_LABEL_NUMBER (tmp_label));
>> +  const char *label_ptr = arm_strip_name_encoding (label_buf);
>> +  rtx dest_label = operands[pos_label];
>> +  operands[pos_label] = tmp_label;
>> +
>> +  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
>> +  output_asm_insn (buffer, operands);
>> +
>> +  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, 
>> label_ptr);
>> +  operands[pos_label] = dest_label;
>> +  output_asm_insn (buffer, operands);
>> +  return "";
>> +}
>> +
>> +
> 
> Unnecessary extra newline.
> 

Fixed!

>>   #undef TARGET_RUN_TARGET_SELFTESTS
>>   #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
>>   #endif /* CHECKING_P */
>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
>> index f861c72ccfc..634fd0a59da 100644
>> --- a/gcc/config/arm/arm.md
>> +++ b/gcc/config/arm/arm.md
>> @@ -6686,9 +6686,16 @@
>>   ;; And for backward branches we have
>>   ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4).
>>   ;;
>> +;; In 16-bit Thumb these ranges are:
>>   ;; For a 'b'       pos_range = 2046, neg_range = -2048 giving 
>> (-2040->2048).
>>   ;; For a 'b<cond>' pos_range = 254,  neg_range = -256  giving (-250 ->256).
>>   
>> +;; In 32-bit Thumb these ranges are:
>> +;; For a 'b'       +/- 16MB is not checked for.
>> +;; For a 'b<cond>' pos_range = 1048574,  neg_range = -1048576  giving
>> +;; (-1048568 -> 1048576).
>> +
>> +
> 
> Unnecessary extra newline.
> 

Fixed!

>>   (define_expand "cbranchsi4"
>> [(set (pc) (if_then_else
>>(match_operator 0 "expandable_comparison_operator"
>> @@ -6947,22 +6954,42 @@
>>(pc)))]
>> "TARGET_32BIT"
>> "*
>> -  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> -{
>> -  arm_ccfsm_state += 2;
>> -  return \"\";
>> -}
>> -  return \"b%d1\\t%l0\";
>> + if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
>> +  {
>> +arm_ccfsm_state += 2;
>> +return \"\";
>> +  }
>> + switch (get_attr_length (insn))
>> +  {
>> +// Thumb2 16-bit b{cond}
>> +case 2:
>> +
>> +// Thumb2 32-bit b{cond}
>> +case 4: return \"b%d1\\t%l0\";break;
>> +
>> +// Thumb2 b{cond} out of range.  Use unconditional branch.
>> +case 8: return arm_gen_far_branch \
>> +(operands, 0, \"Lbcond\", \"b%D1\t\");
>> +break;
>> +
>> +// A32 b{cond}
>> +default: return \"b%d1\\t%l0\";
>> +  }
> 
> Please fix indentation here.
> 

Fixed together with the changes below.

>> "
>> [(set_attr "conds" "use")
>>  (set_attr "type" "branch")
>>  (set (attr "length")
>> -(if_then_else
>> -   (and (match_test "TARGET_THUMB2")
>> -(and (ge (minus (match_dup 0) (pc)) (const_int -250))
>> - (le (minus (match_dup 0) (pc)) (const_int 256
>> -   (const_int 2)
>> -   (const_int 4)))]
>> +(if_then_else (match_test 

[PATCH][GCC][ARM] Arm generates out of range conditional branches in Thumb2 (PR91816)

2019-10-11 Thread Stam Markianos-Wright
Hi all,

This patch fixes an issue where the compiler generated a conditional 
branch in Thumb2 whose target was beyond the range that b{cond} can encode.

This was originally reported at binutils:
https://sourceware.org/bugzilla/show_bug.cgi?id=24991

And then raised for GCC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91816


As can be seen here:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0489c/Cihfddaf.html

the range of a 32-bit Thumb B{cond} is +/-1MB.

arm.md now checks for this and emits an unconditional branch when the 
jump would be farther than 1MB.

A new test checks this for both beq (generated for "if (a)") and bne 
(generated for "if (a == 1)").

Patch bootstrapped and regression tested on arm-none-linux-gnueabihf; 
however, on my native AArch32 setup the test times out when run as part 
of a big "make check-gcc" regression, but not when run individually.

Patch also regression tested on arm-none-eabi, arm-none-linux-gnueabi 
with no issues.

Also, I don't have commit rights yet, so could someone commit it on my 
behalf?

Thanks,
Stam Markianos-Wright



gcc/ChangeLog:

2019-10-11  Stamatis Markianos-Wright 

* config/arm/arm.md: Update b for Thumb2 range checks.
* config/arm/arm.c: New function arm_gen_far_branch.
* config/arm/arm-protos.h: New function arm_gen_far_branch
prototype.

gcc/testsuite/ChangeLog:

2019-10-11  Stamatis Markianos-Wright 

* testsuite/gcc.target/arm/pr91816.c: New test.
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f995974f9bb..1dce333d1c3 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -570,4 +570,7 @@ void arm_parse_option_features (sbitmap, const cpu_arch_option *,
 
 void arm_initialize_isa (sbitmap, const enum isa_feature *);
 
+const char * arm_gen_far_branch (rtx *, int,const char * , const char *);
+
+
 #endif /* ! GCC_ARM_PROTOS_H */
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 39e1a1ef9a2..1a693d2ddca 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -32139,6 +32139,31 @@ arm_run_selftests (void)
 }
 } /* Namespace selftest.  */
 
+
+/* Generate code to enable conditional branches in functions over 1 MiB.  */
+const char *
+arm_gen_far_branch (rtx * operands, int pos_label, const char * dest,
+			const char * branch_format)
+{
+  rtx_code_label * tmp_label = gen_label_rtx ();
+  char label_buf[256];
+  char buffer[128];
+  ASM_GENERATE_INTERNAL_LABEL (label_buf, dest , \
+			CODE_LABEL_NUMBER (tmp_label));
+  const char *label_ptr = arm_strip_name_encoding (label_buf);
+  rtx dest_label = operands[pos_label];
+  operands[pos_label] = tmp_label;
+
+  snprintf (buffer, sizeof (buffer), "%s%s", branch_format , label_ptr);
+  output_asm_insn (buffer, operands);
+
+  snprintf (buffer, sizeof (buffer), "b\t%%l0%d\n%s:", pos_label, label_ptr);
+  operands[pos_label] = dest_label;
+  output_asm_insn (buffer, operands);
+  return "";
+}
+
+
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::arm_run_selftests
 #endif /* CHECKING_P */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index f861c72ccfc..634fd0a59da 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6686,9 +6686,16 @@
 ;; And for backward branches we have 
 ;;   (neg_range - neg_base_offs + pc_offs) = (neg_range - (-2 or -4) + 4).
 ;;
+;; In 16-bit Thumb these ranges are:
;; For a 'b'       pos_range = 2046, neg_range = -2048 giving (-2040->2048).
;; For a 'b<cond>' pos_range = 254,  neg_range = -256  giving (-250 ->256).
 
+;; In 32-bit Thumb these ranges are:
+;; For a 'b'       +/- 16MB is not checked for.
+;; For a 'b<cond>' pos_range = 1048574,  neg_range = -1048576  giving
+;; (-1048568 -> 1048576).
+
+
 (define_expand "cbranchsi4"
   [(set (pc) (if_then_else
 	  (match_operator 0 "expandable_comparison_operator"
@@ -6947,22 +6954,42 @@
 		  (pc)))]
   "TARGET_32BIT"
   "*
-  if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
-{
-  arm_ccfsm_state += 2;
-  return \"\";
-}
-  return \"b%d1\\t%l0\";
+ if (arm_ccfsm_state == 1 || arm_ccfsm_state == 2)
+  {
+	arm_ccfsm_state += 2;
+	return \"\";
+  }
+ switch (get_attr_length (insn))
+  {
+	// Thumb2 16-bit b{cond}
+	case 2:
+
+	// Thumb2 32-bit b{cond}
+	case 4: return \"b%d1\\t%l0\";break;
+
+	// Thumb2 b{cond} out of range.  Use unconditional branch.
+	case 8: return arm_gen_far_branch \
+		(operands, 0, \"Lbcond\", \"b%D1\t\");
+	break;
+
+	// A32 b{cond}
+	default: return \"b%d1\\t%l0\";
+  }
   "
   [(set_attr "conds" "use")
(set_attr "type" "branch")
(set (attr "length")
-	(if_then_else
-	   (and (match_test "TARGET_THUMB2")
-		(and (ge (minus (match_dup 0) (pc)) (const_int -250))
-		

[GCC][PATCH][AArch64] Update hwcap string for fp16fml in aarch64-option-extensions.def

2019-09-10 Thread Stam Markianos-Wright

Hi all,

This is a minor patch that fixes the entry for the fp16fml feature in 
GCC's aarch64-option-extensions.def.

As can be seen in the Linux sources here 
https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpuinfo.c#L69 
the correct string is "asimdfhm", not "asimdfml".

Cross-compiled and tested on aarch64-none-linux-gnu.

Is this ok for trunk?

Also, I don't have commit rights, so could someone commit it on my behalf?

Thanks,
Stam Markianos-Wright


The diff is:

diff --git a/gcc/config/aarch64/aarch64-option-extensions.def 
b/gcc/config/aarch64/aarch64-option-extensions.def
index 9919edd43d0..60e8f28fff5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -135,7 +135,7 @@ AARCH64_OPT_EXTENSION("sm4", AARCH64_FL_SM4, 
AARCH64_FL_SIMD, \
   /* Enabling "fp16fml" also enables "fp" and "fp16".
  Disabling "fp16fml" just disables "fp16fml".  */
   AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, \
-  AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfml")
+  AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfhm")

   /* Enabling "sve" also enables "fp16", "fp" and "simd".
  Disabling "sve" disables "sve", "sve2", "sve2-aes", "sve2-sha3", 
"sve2-sm4"



gcc/ChangeLog:

2019-09-09  Stamatis Markianos-Wright 

   * config/aarch64/aarch64-option-extensions.def: Updated hwcap
   string for fp16fml.


diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
index 9919edd43d0..60e8f28fff5 100644
--- a/gcc/config/aarch64/aarch64-option-extensions.def
+++ b/gcc/config/aarch64/aarch64-option-extensions.def
@@ -135,7 +135,7 @@ AARCH64_OPT_EXTENSION("sm4", AARCH64_FL_SM4, AARCH64_FL_SIMD, \
 /* Enabling "fp16fml" also enables "fp" and "fp16".
Disabling "fp16fml" just disables "fp16fml".  */
 AARCH64_OPT_EXTENSION("fp16fml", AARCH64_FL_F16FML, \
-		  AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfml")
+		  AARCH64_FL_FP | AARCH64_FL_F16, 0, false, "asimdfhm")
 
 /* Enabling "sve" also enables "fp16", "fp" and "simd".
Disabling "sve" disables "sve", "sve2", "sve2-aes", "sve2-sha3", "sve2-sm4"